Re: high-volume offline processing using cayenne?

From: Arndt Brenschede (a..iamos.de)
Date: Tue Feb 25 2003 - 05:27:51 EST

Next message: Arndt Brenschede: "No need that IteratedQuery blocks the connection"

Previous message: Andrus Adamchik: "Re: Runtime Error: how to find cayenne.xml"
Maybe in reply to: Arndt Brenschede: "high-volume offline processing using cayenne?"
Next in thread: Andrus Adamchik: "Re: high-volume offline processing using cayenne?"
Next in thread: Arndt Brenschede: "Re: Re: high-volume offline processing using cayenne?"
Maybe reply: Arndt Brenschede: "Re: Re: high-volume offline processing using cayenne?"
Reply: Andrus Adamchik: "Re: high-volume offline processing using cayenne?"

Hi Andrus,

Meanwhile I tried the batch-commit feature in
the current CVS, and it seems to work as expected (good job!).

(It took me some time to figure out
that I have to use "commit" instead
of "commitChanges" in order to make use
of it :-) )

> Hi Arndt,
>
> Let's see how Cayenne can address the different issues here.
>
> 1. Reading. In fact Cayenne is already optimized pretty well for batch
> reading:
>
> http://objectstyle.org/cayenne/userguide/perform/index.html#iterator
>
> Using these features instead of raw JDBC has the obvious advantage of
> reusing all the mapping info you created.

I understand that, but to clear the last bottleneck out of the
pipeline, I also have to load relationships in a batch-like
fashion. On the SQL level this means, e.g. for a to-many
relationship, that instead of:

SELECT ... FROM painting t0 where t0.gallery_id = ?

you query the related rows for a whole set of objects simultaneously:

SELECT ... FROM painting t0 where t0.gallery_id IN ( ?, ?, ?, ...)

That's a big boost (at least on Oracle), particularly in the
case where "to-many" usually means "to-very-few" (1, 2, 3, ...),
as in our case.
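For illustration only, here is a minimal Java sketch of generating the batched `IN (?, ?, ...)` statement for a set of parent keys with plain JDBC. The column names other than `gallery_id`, the class `InListBatcher`, and its methods are hypothetical. It also assumes Oracle's limit of 1000 expressions per IN list (ORA-01795), so larger key sets are split into several statements.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class InListBatcher {

    // Oracle rejects IN lists with more than 1000 expressions (ORA-01795),
    // so larger key sets are split into several statements.
    static final int MAX_IN_LIST = 1000;

    /** Appends " IN (?, ?, ..., ?)" with one placeholder per key. */
    static String buildInQuery(String baseSql, int placeholderCount) {
        StringBuilder sb = new StringBuilder(baseSql).append(" IN (");
        for (int i = 0; i < placeholderCount; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('?');
        }
        return sb.append(')').toString();
    }

    /** Splits the parent keys into chunks of at most maxSize elements. */
    static <T> List<List<T>> chunk(List<T> keys, int maxSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += maxSize) {
            chunks.add(new ArrayList<>(keys.subList(i, Math.min(i + maxSize, keys.size()))));
        }
        return chunks;
    }

    /** Binds the keys of one chunk; JDBC parameter indexes start at 1. */
    static void bindKeys(PreparedStatement ps, List<?> keys) throws SQLException {
        for (int i = 0; i < keys.size(); i++) {
            ps.setObject(i + 1, keys.get(i));
        }
    }

    public static void main(String[] args) {
        String base = "SELECT t0.id, t0.gallery_id, t0.name FROM painting t0 WHERE t0.gallery_id";
        List<Integer> galleryIds = List.of(10, 11, 12);
        for (List<Integer> part : chunk(galleryIds, MAX_IN_LIST)) {
            // With a real Connection: prepareStatement(sql), bindKeys(ps, part), executeQuery()
            System.out.println(buildInQuery(base, part.size()));
        }
    }
}
```

Each chunk then costs one round trip instead of one query per gallery.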

I have no clear idea how such a feature could be
integrated into the API; one possibility would be
a method on DataContext:

resolveRelationsForObjects( List dataObjects, String relName );

to trigger relationship resolution for a list of
objects of the same type.
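The method name above is only a proposal. As a hedged sketch of what such a call might do internally after running the batched query, the hypothetical helper below distributes the fetched child rows over their parents. It deliberately creates an empty list for parents without children, so that those relationships can be marked as resolved too and do not trigger a per-object query later. `groupByParent` and the `Painting` record are illustrative names, not Cayenne API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchRelationResolver {

    /**
     * Distributes children fetched by one IN query over their parents.
     * Every requested parent gets an entry, even with no children.
     */
    static <K, C> Map<K, List<C>> groupByParent(Collection<K> parentIds,
                                                List<C> children,
                                                Function<C, K> foreignKey) {
        Map<K, List<C>> result = new LinkedHashMap<>();
        for (K id : parentIds) {
            result.put(id, new ArrayList<>());
        }
        for (C child : children) {
            List<C> bucket = result.get(foreignKey.apply(child));
            if (bucket != null) {   // ignore rows for parents we did not ask for
                bucket.add(child);
            }
        }
        return result;
    }

    record Painting(int id, int galleryId, String name) {}

    public static void main(String[] args) {
        List<Painting> rows = List.of(
                new Painting(1, 10, "A"),
                new Painting(2, 10, "B"),
                new Painting(3, 12, "C"));
        Map<Integer, List<Painting>> byGallery =
                groupByParent(List.of(10, 11, 12), rows, Painting::galleryId);
        byGallery.forEach((gallery, paintings) ->
                System.out.println(gallery + " -> " + paintings.size()));
    }
}
```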

(I understand that this won't make it into your 1.0 version,
but maybe you can give me a hint on how to add it as
a quick & dirty patch...)

>
> 2. Batch commits. We discussed that - it should be done by Beta.
See above.

> 3. Maintaining low memory footprint. As mentioned earlier,
> simply throwing away the whole DataContext after each
> commit will not be a good solution, since you mentioned
> around 10000 objects that are shared between the batches.
> So this is the area that will need special handling in
> Cayenne. I can see a few ways to handle that:
>
> a. complete custom handling of ObjectStore cleanup
> after commit. You can create custom code to remove
> some entities from cache, and to preserve some other.
>
> b. generic solution: having a special "shared" context
> (EOF people, think EOSharedEditingContext), which is
> not a *parent*, but rather a *peer* of any other
> DataContexts. SharedDataContext will probably be
> read-only (but doesn't have to be). Its important
> property is that all objects it contains are "shared"
> and can be accessed from other DataContexts by reference
> (not by copy like TopLink UnitOfWork does) as if they
> were local. It also means that local objects can have
> relationships to objects in the SharedDataContext (but
> not the other way around).
>
> With this you can simply throw away an instance of
> DataContext after each commit, creating a new one
> (DataContext by itself is very lightweight until
> its cache gets filled in). At the same time "shared"
> DataContext will stay around, so you won't need to
> refetch reusable data, and memory footprint will
> stay constant.
>
> I really like (b) - the idea of cleanly separating
> "configuration" immutable objects from the objects
> being modified, but still maintaining the same object
> graph. Unfortunately this is not planned for 1.0 and
> will probably be included in the later releases.
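Option (b) can be modeled, very roughly, with two plain Java classes. This is a sketch of the ownership rule only, not of Cayenne's DataContext: `SharedContext` and `WorkerContext` are hypothetical names, and commits, faulting and relationships are left out. The worker looks up its own objects first and then falls back to the shared peer, which hands out objects by reference rather than by copy; because the shared side holds no reference to any worker, shared objects cannot point back at local ones.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SharedContextDemo {
    public static void main(String[] args) {
        Object gallery = new Object();   // stands in for a shared "configuration" object
        SharedContext shared = new SharedContext(Map.of("gallery:10", gallery));
        for (int batch = 0; batch < 3; batch++) {
            WorkerContext worker = new WorkerContext(shared);   // fresh, empty per batch
            worker.register("painting:" + batch, new Object());
            // ... modify local objects and commit them here ...
            boolean sameInstance = worker.lookup("gallery:10") == gallery;
            System.out.println(worker.localSize() + " " + sameInstance);
        }   // each worker becomes unreachable at the end of its iteration
    }
}

/** Long-lived, read-only store for objects shared by all batches. */
final class SharedContext {
    private final Map<Object, Object> objects;

    SharedContext(Map<?, ?> preloaded) {
        this.objects = Collections.unmodifiableMap(new HashMap<Object, Object>(preloaded));
    }

    Object lookup(Object id) {
        return objects.get(id);
    }
}

/** Short-lived per-batch context, thrown away after each commit. */
final class WorkerContext {
    private final SharedContext shared;   // one-way link: shared never sees workers
    private final Map<Object, Object> local = new HashMap<>();

    WorkerContext(SharedContext shared) {
        this.shared = shared;
    }

    void register(Object id, Object obj) {
        local.put(id, obj);
    }

    /** Local objects first, then the shared peer (by reference, no copy). */
    Object lookup(Object id) {
        Object obj = local.get(id);
        return obj != null ? obj : shared.lookup(id);
    }

    int localSize() {
        return local.size();
    }
}
```

A fresh worker per batch keeps the per-batch cache small, while the shared map is built once and reused, which is the constant-footprint property described above.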

I agree that having two dedicated "levels" of data contexts
(shared config / worker context) would be sufficient, but
I don't see that this is really simpler than a genuinely
recursive solution with nested data contexts. The API
would simply be:

DataContext subContext = ctxt.createSubContext();
...
subContext.query(...);
...
subContext.commit(...);
subContext = null;

I have no real idea how difficult that would be to implement,
but I can try to dig into it...
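A recursive variant can be sketched the same way. The `NestedContext` class below is hypothetical, and its `createSubContext()` only mirrors the proposed call above. It assumes that `commit()` on a sub-context writes that sub-context's own objects straight to the database (represented here by a plain list) and then clears its cache, rather than merging them into the parent as EOF's nested editing contexts do. Under that assumption the parent's cache stays the same size however many batches pass through.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NestedContextDemo {
    public static void main(String[] args) {
        List<String> database = new ArrayList<>();   // stand-in for the real database
        NestedContext root = new NestedContext(null, database);
        root.put("gallery:10", "Louvre");   // long-lived object kept in the parent

        for (int batch = 0; batch < 3; batch++) {
            NestedContext sub = root.createSubContext();
            sub.put("painting:" + batch, "p" + batch);
            Object gallery = sub.get("gallery:10");   // resolved through the parent
            System.out.println("batch " + batch + " sees " + gallery);
            sub.commit();
        }   // each sub-context becomes unreachable at the end of its iteration

        System.out.println(database);      // [painting:0, painting:1, painting:2]
        System.out.println(root.size());   // 1
    }
}

/** Hypothetical recursive context: reads fall through to the parent. */
final class NestedContext {
    private final NestedContext parent;
    private final List<String> database;
    private final Map<String, Object> objects = new HashMap<>();

    NestedContext(NestedContext parent, List<String> database) {
        this.parent = parent;
        this.database = database;
    }

    NestedContext createSubContext() {
        return new NestedContext(this, database);
    }

    void put(String id, Object value) {
        objects.put(id, value);
    }

    Object get(String id) {
        if (objects.containsKey(id)) {
            return objects.get(id);
        }
        return parent != null ? parent.get(id) : null;
    }

    /** Assumption: writes this context's own objects to the database, then drops them. */
    void commit() {
        database.addAll(objects.keySet());
        objects.clear();
    }

    int size() {
        return objects.size();
    }
}
```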

with best regards,

Arndt

-- 
--
Dr. Arndt Brenschede
DIAMOS AG
Innovapark
Am Limespark 2
65843 Sulzbach
Tel.: +49 (0) 61 96 - 65 06 - 134
Fax: +49 (0) 61 96 - 65 06 - 100
mobile: +49 (0) 151 151 36 134
mailto:arndt.brensched..iamos.com
http://www.diamos.com


This archive was generated by hypermail 2.0.0 : Tue Feb 25 2003 - 05:14:46 EST