Re: high-volume offline processing using cayenne?

From: Holger Hoffstätte (holge..izards.de)
Date: Sun Feb 16 2003 - 09:43:20 EST

  • Next message: Andrus Adamchik: "Re: Query problem!"

    Hello Arndt,

    good to see that you like Cayenne! :)

    Arndt Nrenschede wrote:
    > For the offline processing, however, we
    > have high volumes (millions) and tight
    > performance constraints, so
    > we cannot deviate much from plain jdbc's
    > performance.

    Understood. Do you use server-side stored procedures, or is this mostly a
    manually written loading/processing/saving process? Which database(s) do
    you use?
    I ask because initial support for stored procedures has been added to CVS
    since the last release.

    > The features needed for that would be
    > (and I think they are not implemented,
    > or at least I didn't find them):
    >
    > - support for JDBC-batch updates during
    > "commitChanges"

    Your wish shall be granted :-)
    Batching is in CVS already and should be automagically used in the next
    version, if your JDBC driver properly supports it. It is already used with
    Oracle for a new regression test application that will be part of the next
    release as well, together with a new commitChanges that makes use of a fk
    constraint resolving framework called ashwood (see the objectstyle web
    pages). All this basically works but is just not fully integrated yet.
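    For anyone unfamiliar with what JDBC batching buys you: instead of one
    network round trip per row, rows are queued with addBatch() and sent in
    chunks with executeBatch(). Below is a minimal sketch of that buffering
    pattern in modern Java; the flush action is abstracted as a Consumer so
    the chunking logic stands on its own, and all names here are
    illustrative, not Cayenne internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the pattern behind PreparedStatement.addBatch()/executeBatch().
// In real JDBC code the flush action would bind each row's parameters,
// call ps.addBatch() per row, and ps.executeBatch() once per chunk.
class BatchBuffer {
    private final int batchSize;
    private final Consumer<List<Object[]>> flushAction;
    private final List<Object[]> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<Object[]>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    // Queue one row of bind parameters; flush when the chunk is full.
    void add(Object[] row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send the pending chunk in one round trip (executeBatch() in JDBC).
    void flush() {
        if (!pending.isEmpty()) {
            flushAction.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

    With a real driver the Consumer would call executeBatch(), so a million
    rows at a batch size of, say, 100 cost 10,000 round trips instead of a
    million.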

    > - re-use of Prepared-Statements

    I think Andrus (Cayenne project lead) thought about this, together with an
    object caching strategy, but this is on the back burner until the new
    commit machinery is firmly in place and the DbAdapter layer has been
    cleaned up some more, in order to better enable DB-specific features.

    I just had a quick look and QueryAssembler.createStatement() looks like
    the place to check for cached PreparedStatements; I can't say offhand how
    much of the query string creation could be optimized away, but probably
    all of it.
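    The caching itself is just a map keyed by SQL text, consulted where the
    statement would otherwise be prepared anew. A minimal sketch, assuming a
    per-connection cache; the factory stands in for
    connection.prepareStatement(sql), and the class and method names are my
    own, not anything in Cayenne's QueryAssembler:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a per-connection statement cache keyed by SQL text, the kind
// of lookup QueryAssembler.createStatement() could consult before
// preparing a new statement. The factory function stands in for
// connection.prepareStatement(sql).
class StatementCache<S> {
    private final Map<String, S> cache = new HashMap<>();
    private final Function<String, S> prepare;

    StatementCache(Function<String, S> prepare) {
        this.prepare = prepare;
    }

    // Return the cached statement for this SQL, preparing it only once.
    S get(String sql) {
        return cache.computeIfAbsent(sql, prepare);
    }
}
```

    Note that cached PreparedStatements are only valid for the connection
    that prepared them, so in a pooled setup the cache has to live per
    connection, not globally.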

    > - more detailed control of what happens
    > to the identity-map (Object-Store) in
    > "commitChanges"
    > The behaviour we need is to fully
    > release the volume data (e.g. accounts),
    > thus allowing them to be garbage-collected,
    > while keeping the master-data
    > (e.g. fund-properties) linked to the
    > identity-map.
    > (would require something like nested
    > DataContexts - or TopLink's "unitOfWork")

    Currently DataContexts are isolated from each other and don't expire
    objects automagically e.g. after a certain time of inactivity, and I think
    that's very unlikely to change until 1.0 - simply because it is very
    difficult to get right (think threading, e.g. in a servlet engine). We
    discussed some of the aspects on cayenne-devel; this should be in the
    mail archives. Maybe Andrus will say a bit more about how you could address
    your problem. I think the simplest way would be to simply throw away the
    entire DataContext and use a fresh one for every large batch;
    alternatively, if you have fine-grained control over which objects should
    'go away', look at DataContext's invalidateObject() and
    unregisterObject(). The latter should let the DataObjects be GC'ed
    properly.
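    The reason unregistering helps is simply reference reachability: the
    context's identity map holds a strong reference to every registered
    object, so nothing registered can be collected. A toy model of that
    mechanism, assuming nothing about Cayenne's actual ObjectStore:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a DataContext's identity map. While an object is
// registered, the map's strong reference keeps it alive; unregistering
// removes that reference so high-volume rows (accounts) can be GC'ed
// while master data (fund properties) stays registered and cached.
class IdentityMap {
    private final Map<Object, Object> registered = new HashMap<>();

    void register(Object id, Object obj) {
        registered.put(id, obj);
    }

    // Drop the context's reference; if the caller holds no other strong
    // reference, the object becomes eligible for garbage collection.
    void unregister(Object id) {
        registered.remove(id);
    }

    Object lookup(Object id) {
        return registered.get(id);
    }
}
```

    Throwing away the whole DataContext is the coarse version of the same
    idea: the entire identity map becomes unreachable at once.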

    Please continue with feedback and suggestions!

    Holger



    This archive was generated by hypermail 2.0.0 : Sun Feb 16 2003 - 09:45:42 EST