Re: high-volume offline processing using cayenne?

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Mon Feb 17 2003 - 19:02:09 EST

    Hi Arndt,

    Let's see how Cayenne can address the different issues here.

    1. Reading. In fact Cayenne is already optimized pretty well for batch
    reading:

        http://objectstyle.org/cayenne/userguide/perform/index.html#iterator

    Using these features instead of raw JDBC has the obvious advantage of
    reusing all the mapping information you have already created.
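
    For reference, the iterator-based read from that page looks roughly
    like the sketch below. This is only an illustration: the "Customer"
    entity and the CUSTOMER_ID column are made up, and method names may
    still shift a bit before 1.0.

    import java.util.Map;

    import org.objectstyle.cayenne.CayenneException;
    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.access.ResultIterator;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class IteratedRead {

        public static void main(String[] args) throws CayenneException {
            DataContext ctxt = DataContext.createDataContext();

            // stream data rows one at a time instead of materializing
            // the whole result set in memory
            SelectQuery query = new SelectQuery("Customer");
            ResultIterator it = ctxt.performIteratedQuery(query);

            try {
                while (it.hasNextRow()) {
                    Map row = (Map) it.nextDataRow();

                    // process the row here, or convert it to a DataObject
                    System.out.println(row.get("CUSTOMER_ID"));
                }
            }
            finally {
                // the iterator holds an open JDBC connection - always close it
                it.close();
            }
        }
    }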

    2. Batch commits. We discussed this already; it should be done by Beta.
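
    On the JDBC level this will boil down to the same
    addBatch()/executeBatch() technique you describe below, roughly like
    this (the ACCOUNT table and its columns are made up for the example):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Iterator;
    import java.util.List;

    public class JdbcBatchInsert {

        // one PreparedStatement, many parameter sets, one round trip
        public void insertAccounts(Connection con, List rows) throws SQLException {
            PreparedStatement st = con.prepareStatement(
                    "INSERT INTO ACCOUNT (ACCOUNT_ID, BALANCE) VALUES (?, ?)");
            try {
                for (Iterator i = rows.iterator(); i.hasNext();) {
                    Object[] row = (Object[]) i.next();
                    st.setObject(1, row[0]);
                    st.setObject(2, row[1]);
                    st.addBatch();
                }
                st.executeBatch();
            }
            finally {
                st.close();
            }
        }
    }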

    3. Maintaining a low memory footprint. As mentioned earlier, simply
    throwing away the whole DataContext after each commit will not be a
    good solution by itself, since you mentioned around 10,000 objects
    that are shared between the batches. So this is the area that will
    need special handling in Cayenne. I can see a few ways to handle it:

    a. Completely custom handling of ObjectStore cleanup after commit. You
    write custom code that removes some objects from the cache and
    preserves others (a rough sketch of this appears below, after the
    discussion of (b)).

    b. Generic solution: a special "shared" context (EOF people, think
    EOSharedEditingContext), which is not a *parent*, but rather a *peer*
    of all other DataContexts. The SharedDataContext will probably be
    read-only (though it doesn't have to be). Its important property is
    that all objects it contains are "shared" and can be accessed from
    other DataContexts by reference (not by copy, the way TopLink's
    UnitOfWork does it), as if they were local. It also means that local
    objects can have relationships to objects in the SharedDataContext
    (but not the other way around).

    With this you can simply throw away the DataContext instance after
    each commit and create a new one (a DataContext by itself is very
    lightweight before its cache gets filled in). At the same time the
    "shared" DataContext will stay around, so you won't need to refetch
    reusable data, and the memory footprint will stay constant.
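
    In code, the per-batch lifecycle would look something like the sketch
    below. Again, just an illustration: the Customer class, the
    CUSTOMER_ID column and the shape of the id batches are invented, the
    exact ExpressionFactory call may differ in the current snapshot, and
    without the shared context the master data would still have to be
    refetched or preserved as in (a).

    import java.util.Iterator;
    import java.util.List;

    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.exp.Expression;
    import org.objectstyle.cayenne.exp.ExpressionFactory;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class BatchProcessor {

        // "batches" is a list of primary key lists, roughly 100 ids each;
        // Customer is assumed to be your mapped DataObject subclass
        public void processAll(List batches) {
            for (Iterator i = batches.iterator(); i.hasNext();) {
                List idBatch = (List) i.next();

                // a fresh, lightweight context per batch
                DataContext ctxt = DataContext.createDataContext();

                // batch-read: ... WHERE CUSTOMER_ID IN (?,?,?,...)
                Expression in = ExpressionFactory.inDbExp("CUSTOMER_ID", idBatch);
                List customers = ctxt.performQuery(new SelectQuery(Customer.class, in));

                for (Iterator j = customers.iterator(); j.hasNext();) {
                    Customer c = (Customer) j.next();

                    // ... apply business logic, create 1-2 new objects ...
                }

                ctxt.commitChanges();

                // drop the reference: the context and its ObjectStore become
                // garbage, so the footprint stays flat across batches
            }
        }
    }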

    I really like (b): the idea of cleanly separating immutable
    "configuration" objects from the objects being modified, while still
    maintaining a single object graph. Unfortunately this is not planned
    for 1.0 and will probably be included in a later release.
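
    Going back to (a) for a second, the cleanup could look roughly like
    this. Note that the unregisterObjects() call is an assumption on my
    part - if the current code base doesn't have exactly that hook, the
    same effect would take a bit of custom ObjectStore code.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.objectstyle.cayenne.DataObject;
    import org.objectstyle.cayenne.access.DataContext;

    public class SelectiveCleanup {

        // one long-lived context shared by all batches
        private DataContext ctxt = DataContext.createDataContext();

        // processes one batch of ~100 already-fetched customers
        public void processBatch(List customers) {
            // remember everything that is local to this batch
            List batchObjects = new ArrayList();

            for (Iterator i = customers.iterator(); i.hasNext();) {
                DataObject customer = (DataObject) i.next();
                batchObjects.add(customer);

                // ... modify the customer, create new objects and add
                //     them to batchObjects as well ...
            }

            ctxt.commitChanges();

            // ASSUMPTION: an API that evicts only the per-batch objects;
            // master data is never added to batchObjects, so it stays
            // cached across batches
            ctxt.unregisterObjects(batchObjects);
        }
    }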

    Andrus

    Arndt Brenschede wrote:
    > Andrus Adamchik <andru..bjectstyle.org> wrote on 16.02.03 19:52:38:
    >
    >
    >>Since you are bringing up an interesting scenario for this new feature,
    >>could you describe the flow some more? Are the objects mostly created in
    >>memory and then saved? How big of a transactional scope do you need? I
    >>mean, you don't plan to keep millions of uncommitted objects in memory
    >>at once? Or do you, and you simply write them via batch one by one and
    >>do a commit after that?
    >
    >
    > Hi Andrus,
    >
    > the most demanding problems in terms of performance are
    > business processes that affect e.g. 500,000 out of 5 million
    > customers/accounts. For each customer, we have to read a
    > dozen objects, change a handful, and create 1 or 2 new ones.
    >
    > We built technology prototypes based on either plain JDBC
    > or stored procedures that reach the required performance.
    >
    > In the plain JDBC code, the read direction was optimized
    > using select queries with "WHERE id IN (?,?,?,?,?,...)"
    > to batch-read the objects for a list of primary keys,
    > and the write direction used batch updates/inserts
    > via addBatch()/executeBatch() on prepared statements.
    >
    > There's no significant performance difference between a
    > batch size of 100 and 1000, so think of 100 customers
    > (-> 2000 objects) as the transactional scope
    > (plus some ten thousand master data objects that
    > should stay in memory during the process).
    >
    > The obvious problem in that code is the poor separation
    > of the business logic and the JDBC logic, and the pitfall
    > of simultaneously changing 2 copies of the same object...
    >
    > So we need real O/R mapping with object identity, but
    > want to basically keep the underlying DB access mechanism
    > of batch read/write.
    >
    > It's clear that reading will always require some
    > explicit programming, but having a commit engine that
    > does the writes transparently (and still fast) would
    > be cool...
    >
    > with best regards,
    >
    > Arndt
    >
    >
    > --
    > Dr. Arndt Brenschede
    > DIAMOS AG
    > Innovapark
    > Am Limespark 2
    > 65843 Sulzbach
    >
    > Tel.: +49 (0) 61 96 - 65 06 - 134
    > Fax: +49 (0) 61 96 - 65 06 - 100
    > mobile: +49 (0) 151 151 36 134
    > mailto:arndt.brensched..iamos.com
    > http://www.diamos.com
    >
    >


