Re: high-volume offline processing using cayenne?

From: Arndt Brenschede (a..iamos.de)
Date: Tue Feb 25 2003 - 05:27:51 EST


    Hi Andrus,

    meanwhile I tried the batch-commit feature in
    the current CVS - seems to work as expected (good job!).

    (It took me some time to figure out
    that I have to use "commit" instead
    of "commitChanges" in order to make use
    of it :-) )

    > Hi Arndt,
    >
    > Let's see how Cayenne can address different issues here.
    >
    > 1. Reading. In fact Cayenne is already optimized pretty well for batch
    > reading:
    >
    > http://objectstyle.org/cayenne/userguide/perform/index.html#iterator
    >
    > Using these features instead of raw JDBC has an obvious advantage of
    > reusing all the mapping info you created.

    I understand that, but to get the pipe clean and the last bottleneck
    out of the way, I also have to do the loading of relations in a
    batch-like fashion. On the SQL level, that means, e.g. for a ToMany
    relation, issuing instead of:

    SELECT ... FROM painting t0 where t0.gallery_id = ?

    the corresponding query for a set of objects simultaneously:

    SELECT ... FROM painting t0 where t0.gallery_id IN ( ?, ?, ?, ...)

    That's a big boost (at least on Oracle), particularly in the
    case where "ToMany" usually means "to-very-few" (1,2,3,...),
    as in our case.
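
A minimal sketch of how the IN-list query above could be generated for a batch of parent keys. It is plain Java and not part of Cayenne: the class `InClauseBatcher` and its methods are hypothetical names, `SELECT *` stands in for the elided column list, and the chunk size is an assumption (Oracle, for instance, caps an IN list at 1000 expressions, so long key lists have to be split).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a Cayenne class.
class InClauseBatcher {

    // Builds one SELECT with a "?" placeholder per parent key.
    // Table and column names are assumed to be trusted identifiers.
    static String buildInQuery(String table, String fkColumn, int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT * FROM " + table + " t0 WHERE t0." + fkColumn
                + " IN (" + placeholders + ")";
    }

    // Splits the parent keys into consecutive chunks of at most maxPerQuery,
    // so that each chunk can be bound to one statement.
    static <T> List<List<T>> chunk(List<T> keys, int maxPerQuery) {
        if (maxPerQuery < 1) {
            throw new IllegalArgumentException("chunk size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += maxPerQuery) {
            int end = Math.min(i + maxPerQuery, keys.size());
            chunks.add(new ArrayList<>(keys.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> galleryIds = new ArrayList<>();
        for (int id = 1; id <= 5; id++) {
            galleryIds.add(id);
        }
        // One statement per chunk instead of one statement per gallery.
        for (List<Integer> part : chunk(galleryIds, 2)) {
            System.out.println(
                    buildInQuery("painting", "gallery_id", part.size()) + "  <- " + part);
        }
    }
}
```

With N parent objects and a chunk size of k, this issues roughly N/k statements instead of N, which is where the saving in round trips would come from.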

    I've no clear idea how such a feature could be
    integrated in the API, one possibility would be
    to have a method on DataContext:

       resolveRelationsForObjects( List dataObjects, String relName );

    to trigger the relation-resolving for a list of
    objects of the same type.
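
One way to picture what such a method would have to do after the batched query returns: distribute the fetched child rows back to their parents by foreign key. The sketch below is plain Java under stated assumptions, not Cayenne code. `RelationBatchResolver`, `groupByForeignKey`, and the row-as-`Map` representation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Cayenne class.
class RelationBatchResolver {

    // Groups child rows by the value of their foreign-key column.
    // Every requested parent key gets an entry, so parents without
    // children end up with an empty (but resolved) list.
    static Map<Object, List<Map<String, Object>>> groupByForeignKey(
            List<Object> parentKeys,
            List<Map<String, Object>> childRows,
            String fkColumn) {
        Map<Object, List<Map<String, Object>>> byParent = new LinkedHashMap<>();
        for (Object key : parentKeys) {
            byParent.put(key, new ArrayList<>());
        }
        for (Map<String, Object> row : childRows) {
            List<Map<String, Object>> bucket = byParent.get(row.get(fkColumn));
            if (bucket != null) { // ignore rows for parents we did not ask about
                bucket.add(row);
            }
        }
        return byParent;
    }

    public static void main(String[] args) {
        List<Object> galleryIds = new ArrayList<>();
        galleryIds.add(1);
        galleryIds.add(2);

        // Stand-in for the result of the batched IN query.
        List<Map<String, Object>> paintings = new ArrayList<>();
        Map<String, Object> row = new HashMap<>();
        row.put("gallery_id", 1);
        row.put("title", "Sunrise");
        paintings.add(row);

        System.out.println(groupByForeignKey(galleryIds, paintings, "gallery_id"));
    }
}
```

A real implementation would then set each list as the resolved ToMany relationship on the corresponding data object; how to do that inside the framework is exactly the open question here.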

    (I understand that this is nothing for your 1.0 version,
    but maybe you can give me a hint how to add it as
    a quick&dirty patch...)

    >
    > 2. Batch commits. We discussed that - it should be done by Beta.
    see above.

    > 3. Maintaining low memory footprint. As mentioned earlier,
    > simply throwing away the whole DataContext after each
    > commit will not be a good solution, since you mentioned
    > around 10000 objects that are shared between the batches.
    > So this is the area that will need special handling in
    > Cayenne. I can see a few ways to handle that:
    >
    > a. complete custom handling of ObjectStore cleanup
    > after commit. You can create custom code to remove
    > some entities from cache, and to preserve some other.
    >
    > b. generic solution: having a special "shared" context
    > (EOF people, think EOSharedEditingContext), which is
    > not a *parent*, but rather a *peer* of any other
    > DataContexts. SharedDataContext will probably be
    > read-only (but doesn't have to be). Its important
    > property is that all objects it contains are "shared"
    > and can be accessed from other DataContexts by reference
    > (not by copy like TopLink UnitOfWork does) as if they
    > were local. It also means that local objects can have
    > relationships to objects in the SharedDataContext (but
    > not the other way around).
    >
    > With this you can simply throw away an instance of
    > DataContext after each commit, creating a new one
    > (DataContext by itself is very lightweight, before
    > its cache gets filled in). At the same time "shared"
    > DataContext will stay around, so you won't need to
    > refetch reusable data, and memory footprint will
    > stay constant.
    >
    > I really like (b) - the idea of cleanly separating
    > "configuration" immutable objects from the objects
    > being modified, but still maintaining the same object
    > graph. Unfortunately this is not planned for 1.0 and
    > will probably be included in the later releases.

    I agree that having 2 dedicated "levels" of data-contexts
    (shared-config/worker-context) would be sufficient, but
    I cannot see that this is really simpler than a real
    recursive solution of nested data-contexts. The API
    would simply be:

    DataContext subContext = ctxt.createSubContext();
    ...
    subContext.query(...);
    ...
    subContext.commit(...);
    subContext = null;

    I've no real idea of how difficult that is to implement,
    but I can try to dig into it...
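
To make the nested-context idea a bit more concrete, here is a toy model in plain Java. It is only a sketch of one possible semantics, not Cayenne's design: `ToyContext` and its `get`/`set` methods are hypothetical, objects are reduced to id/value pairs, and "commit" at the root merely stands in for a database write. Only the name `createSubContext` is taken from the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of recursively nested contexts; not a Cayenne class.
class ToyContext {
    private final ToyContext parent;                                // null for the root
    private final Map<String, Object> committed = new HashMap<>();  // root only
    private final Map<String, Object> pending = new HashMap<>();    // uncommitted changes

    ToyContext() {
        this(null);
    }

    private ToyContext(ToyContext parent) {
        this.parent = parent;
    }

    // A sub-context shares the parent's objects by reference (no copying).
    ToyContext createSubContext() {
        return new ToyContext(this);
    }

    // Reads see local changes first, then fall back to the parent chain.
    Object get(String id) {
        if (pending.containsKey(id)) {
            return pending.get(id);
        }
        if (committed.containsKey(id)) {
            return committed.get(id);
        }
        return parent != null ? parent.get(id) : null;
    }

    // Writes stay local until commit.
    void set(String id, Object value) {
        pending.put(id, value);
    }

    // A sub-context pushes its changes one level up; the root "stores" them.
    void commit() {
        if (parent != null) {
            parent.pending.putAll(pending);
        } else {
            committed.putAll(pending);
        }
        pending.clear();
    }

    public static void main(String[] args) {
        ToyContext root = new ToyContext();
        root.set("config", "shared");
        root.commit();

        ToyContext sub = root.createSubContext();
        sub.set("painting-1", "new painting");
        System.out.println(sub.get("config"));      // read falls through to the root
        System.out.println(root.get("painting-1")); // not visible before sub.commit()
        sub.commit();
        System.out.println(root.get("painting-1")); // visible after sub.commit()
    }
}
```

In this model the worker context can be dropped after each commit while the parent keeps the shared objects, which corresponds to the shared-config/worker-context split, just with arbitrary nesting depth.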

    with best regards,

    Arndt

    -- 
    Dr. Arndt Brenschede
    DIAMOS AG
    Innovapark
    Am Limespark 2
    65843 Sulzbach
    

    Tel.: +49 (0) 61 96 - 65 06 - 134
    Fax: +49 (0) 61 96 - 65 06 - 100
    mobile: +49 (0) 151 151 36 134
    mailto:arndt.brensched..iamos.com
    http://www.diamos.com


