Re: high-volume offline processing using cayenne?

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Mon Feb 17 2003 - 19:02:09 EST

    Hi Arndt,

    Let's see how Cayenne can address the different issues here.

    1. Reading. In fact Cayenne is already optimized pretty well for batch
    reading:

        http://objectstyle.org/cayenne/userguide/perform/index.html#iterator

    Using these features instead of raw JDBC has the obvious advantage of
    reusing all the mapping information you have already created.
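
    For reference, the iterator-based read from that page looks roughly
    like the sketch below. This is only an illustration: the "Customer"
    entity and the CUSTOMER_ID column are made up, and method names may
    still shift a bit before 1.0.

    import java.util.Map;

    import org.objectstyle.cayenne.CayenneException;
    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.access.ResultIterator;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class IteratedRead {

        public static void main(String[] args) throws CayenneException {
            DataContext ctxt = DataContext.createDataContext();

            // stream data rows one at a time instead of materializing
            // the whole result set in memory
            SelectQuery query = new SelectQuery("Customer");
            ResultIterator it = ctxt.performIteratedQuery(query);

            try {
                while (it.hasNextRow()) {
                    Map row = (Map) it.nextDataRow();

                    // process the row here, or convert it to a DataObject
                    System.out.println(row.get("CUSTOMER_ID"));
                }
            }
            finally {
                // the iterator holds an open JDBC connection - always close it
                it.close();
            }
        }
    }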

    2. Batch commits. We discussed this already; it should be done by Beta.
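
    On the JDBC level this will boil down to the same
    addBatch()/executeBatch() technique you describe below, roughly like
    this (the ACCOUNT table and its columns are made up for the example):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Iterator;
    import java.util.List;

    public class JdbcBatchInsert {

        // one PreparedStatement, many parameter sets, one round trip
        public void insertAccounts(Connection con, List rows) throws SQLException {
            PreparedStatement st = con.prepareStatement(
                    "INSERT INTO ACCOUNT (ACCOUNT_ID, BALANCE) VALUES (?, ?)");
            try {
                for (Iterator i = rows.iterator(); i.hasNext();) {
                    Object[] row = (Object[]) i.next();
                    st.setObject(1, row[0]);
                    st.setObject(2, row[1]);
                    st.addBatch();
                }
                st.executeBatch();
            }
            finally {
                st.close();
            }
        }
    }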

    3. Maintaining a low memory footprint. As mentioned earlier, simply
    throwing away the whole DataContext after each commit will not be a
    good solution by itself, since you mentioned around 10,000 objects
    that are shared between the batches. So this is the area that will
    need special handling in Cayenne. I can see a few ways to handle it:

    a. Completely custom handling of ObjectStore cleanup after commit. You
    write custom code that removes some objects from the cache and
    preserves others (a rough sketch of this appears below, after the
    discussion of (b)).

    b. Generic solution: a special "shared" context (EOF people, think
    EOSharedEditingContext), which is not a *parent*, but rather a *peer*
    of all other DataContexts. The SharedDataContext will probably be
    read-only (though it doesn't have to be). Its important property is
    that all objects it contains are "shared" and can be accessed from
    other DataContexts by reference (not by copy, the way TopLink's
    UnitOfWork does it), as if they were local. It also means that local
    objects can have relationships to objects in the SharedDataContext
    (but not the other way around).

    With this you can simply throw away the DataContext instance after
    each commit and create a new one (a DataContext by itself is very
    lightweight before its cache gets filled in). At the same time the
    "shared" DataContext will stay around, so you won't need to refetch
    reusable data, and the memory footprint will stay constant.
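
    In code, the per-batch lifecycle would look something like the sketch
    below. Again, just an illustration: the Customer class, the
    CUSTOMER_ID column and the shape of the id batches are invented, the
    exact ExpressionFactory call may differ in the current snapshot, and
    without the shared context the master data would still have to be
    refetched or preserved as in (a).

    import java.util.Iterator;
    import java.util.List;

    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.exp.Expression;
    import org.objectstyle.cayenne.exp.ExpressionFactory;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class BatchProcessor {

        // "batches" is a list of primary key lists, roughly 100 ids each;
        // Customer is assumed to be your mapped DataObject subclass
        public void processAll(List batches) {
            for (Iterator i = batches.iterator(); i.hasNext();) {
                List idBatch = (List) i.next();

                // a fresh, lightweight context per batch
                DataContext ctxt = DataContext.createDataContext();

                // batch-read: ... WHERE CUSTOMER_ID IN (?,?,?,...)
                Expression in = ExpressionFactory.inDbExp("CUSTOMER_ID", idBatch);
                List customers = ctxt.performQuery(new SelectQuery(Customer.class, in));

                for (Iterator j = customers.iterator(); j.hasNext();) {
                    Customer c = (Customer) j.next();

                    // ... apply business logic, create 1-2 new objects ...
                }

                ctxt.commitChanges();

                // drop the reference: the context and its ObjectStore become
                // garbage, so the footprint stays flat across batches
            }
        }
    }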

    I really like (b): the idea of cleanly separating immutable
    "configuration" objects from the objects being modified, while still
    maintaining a single object graph. Unfortunately this is not planned
    for 1.0 and will probably be included in a later release.
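
    Going back to (a) for a second, the cleanup could look roughly like
    this. Note that the unregisterObjects() call is an assumption on my
    part - if the current code base doesn't have exactly that hook, the
    same effect would take a bit of custom ObjectStore code.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.objectstyle.cayenne.DataObject;
    import org.objectstyle.cayenne.access.DataContext;

    public class SelectiveCleanup {

        // one long-lived context shared by all batches
        private DataContext ctxt = DataContext.createDataContext();

        // processes one batch of ~100 already-fetched customers
        public void processBatch(List customers) {
            // remember everything that is local to this batch
            List batchObjects = new ArrayList();

            for (Iterator i = customers.iterator(); i.hasNext();) {
                DataObject customer = (DataObject) i.next();
                batchObjects.add(customer);

                // ... modify the customer, create new objects and add
                //     them to batchObjects as well ...
            }

            ctxt.commitChanges();

            // ASSUMPTION: an API that evicts only the per-batch objects;
            // master data is never added to batchObjects, so it stays
            // cached across batches
            ctxt.unregisterObjects(batchObjects);
        }
    }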

    Andrus

    Arndt Brenschede wrote:
    > Andrus Adamchik <andru..bjectstyle.org> wrote on 16.02.03 19:52:38:
    >
    >
    >>Since you are bringing up an interesting scenario for this new feature,
    >>could you describe the flow some more? Are the objects mostly created in
    >>memory and then saved? How big of a transactional scope do you need? I
    >>mean, you don't plan to keep millions of uncommitted objects in memory
    >>at once? Or do you, and you simply write them via batch one by one and
    >>do a commit after that?
    >
    >
    > Hi Andrus,
    >
    > the most demanding problems in terms of performance are
    > business processes that affect e.g. 500,000 out of 5 million
    > customers/accounts. For each customer, we have to read a
    > dozen objects, change a handful, and create 1 or 2 new ones.
    >
    > We built technology prototypes based on either plain JDBC
    > or stored procedures that reach the required performance.
    >
    > In the plain JDBC code, the read direction was optimized
    > using select queries with "WHERE id IN (?,?,?,?,?,...)"
    > to batch-read the objects for a list of primary keys,
    > and the write direction used batch updates/inserts
    > via addBatch()/executeBatch() on prepared statements.
    >
    > There's no significant performance difference between a
    > batch size of 100 and 1000, so think of 100 customers
    > (-> 2000 objects) as the transactional scope
    > (plus some ten thousand master data objects that
    > should stay in memory during the process).
    >
    > The obvious problem in that code is the poor separation
    > of the business logic and the JDBC logic, and the pitfall
    > of simultaneously changing 2 copies of the same object...
    >
    > So we need real O/R mapping with object identity, but
    > want to basically keep the underlying DB access mechanism
    > of batch read/write.
    >
    > It's clear that reading will always require some
    > explicit programming, but having a commit engine that
    > does the writes transparently (and still fast) would
    > be cool...
    >
    > with best regards,
    >
    > Arndt
    >
    >
    > --
    > Dr. Arndt Brenschede
    > DIAMOS AG
    > Innovapark
    > Am Limespark 2
    > 65843 Sulzbach
    >
    > Tel.: +49 (0) 61 96 - 65 06 - 134
    > Fax: +49 (0) 61 96 - 65 06 - 100
    > mobile: +49 (0) 151 151 36 134
    > mailto:arndt.brensched..iamos.com
    > http://www.diamos.com
    >
    >


