Re: high-volume offline processing using cayenne?

From: Holger Hoffstätte (holge..izards.de)
Date: Sun Feb 16 2003 - 09:43:20 EST

  • Next message: Andrus Adamchik: "Re: Query problem!"

    Hello Arndt,

    good to see that you like Cayenne! :)

    Arndt Nrenschede wrote:
    > For the offline processing, however, we
    > have high volumes (millions) and tight
    > performance constraints, so
    > we cannot deviate much from plain jdbc's
    > performance.

    Understood. Do you use server-side stored procedures, or is this mostly a
    manually written loading/processing/saving process? Which database(s) do
    you use?
    I ask because initial support for stored procedures has been added to CVS
    since the last release.

    > The features needed for that would be
    > (and I think they are not implemented,
    > or at least I didn't find them):
    >
    > - support for JDBC-batch updates during
    > "commitChanges"

    Your wish shall be granted :-)
    Batching is in CVS already and should be automagically used in the next
    version, if your JDBC driver properly supports it. It is already used with
    Oracle for a new regression test application that will be part of the next
    release as well, together with a new commitChanges that makes use of a fk
    constraint resolving framework called ashwood (see the objectstyle web
    pages). All this basically works but is just not fully integrated yet.
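    For anyone unfamiliar with what JDBC batching buys you: instead of one
    network round trip per row, rows are queued with addBatch() and sent in
    chunks with executeBatch(). Below is a minimal sketch of that buffering
    pattern in modern Java; the flush action is abstracted as a Consumer so
    the chunking logic stands on its own, and all names here are
    illustrative, not Cayenne internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the pattern behind PreparedStatement.addBatch()/executeBatch().
// In real JDBC code the flush action would bind each row's parameters,
// call ps.addBatch() per row, and ps.executeBatch() once per chunk.
class BatchBuffer {
    private final int batchSize;
    private final Consumer<List<Object[]>> flushAction;
    private final List<Object[]> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<Object[]>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    // Queue one row of bind parameters; flush when the chunk is full.
    void add(Object[] row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send the pending chunk in one round trip (executeBatch() in JDBC).
    void flush() {
        if (!pending.isEmpty()) {
            flushAction.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

    With a real driver the Consumer would call executeBatch(), so a million
    rows at a batch size of, say, 100 cost 10,000 round trips instead of a
    million.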

    > - re-use of Prepared-Statements

    I think Andrus (Cayenne project lead) thought about this, together with an
    object caching strategy, but this is on the back burner until the new
    commit machinery is firmly in place and the DbAdapter layer has been
    cleaned up some more, in order to better enable DB-specific features.

    I just had a quick look and QueryAssembler.createStatement() looks like
    the place to check for cached PreparedStatements; I can't say offhand how
    much of the query string creation could be optimized away, but probably
    all of it.
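    The caching itself is just a map keyed by SQL text, consulted where the
    statement would otherwise be prepared anew. A minimal sketch, assuming a
    per-connection cache; the factory stands in for
    connection.prepareStatement(sql), and the class and method names are my
    own, not anything in Cayenne's QueryAssembler:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a per-connection statement cache keyed by SQL text, the kind
// of lookup QueryAssembler.createStatement() could consult before
// preparing a new statement. The factory function stands in for
// connection.prepareStatement(sql).
class StatementCache<S> {
    private final Map<String, S> cache = new HashMap<>();
    private final Function<String, S> prepare;

    StatementCache(Function<String, S> prepare) {
        this.prepare = prepare;
    }

    // Return the cached statement for this SQL, preparing it only once.
    S get(String sql) {
        return cache.computeIfAbsent(sql, prepare);
    }
}
```

    Note that cached PreparedStatements are only valid for the connection
    that prepared them, so in a pooled setup the cache has to live per
    connection, not globally.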

    > - more detailed control of what happens
    > to the identity-map (Object-Store) in
    > "commitChanges"
    > The behaviour we need is to fully
    > release the volume data (e.g. accounts),
    > thus allowing them to be garbage-collected,
    > while keeping the master-data
    > (e.g. fund-properties) linked to the
    > identity-map.
    > (would require something like nested
    > DataContexts - or TopLink's "unitOfWork")

    Currently DataContexts are isolated from each other and don't expire
    objects automagically e.g. after a certain time of inactivity, and I think
    that's very unlikely to change until 1.0 - simply because it is very
    difficult to get right (think threading, e.g. in a servlet engine). We
    discussed some of the aspects on cayenne-devel; this should be in the
    mail archives. Maybe Andrus will say a bit more about how you could address
    your problem. I think the simplest way would be to simply throw away the
    entire DataContext and use a fresh one for every large batch;
    alternatively, if you have fine-grained control over which objects should
    'go away', look at DataContext's invalidateObject() and
    unregisterObject(). The latter should let the DataObjects be GC'ed
    properly.
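    The reason unregistering helps is simply reference reachability: the
    context's identity map holds a strong reference to every registered
    object, so nothing registered can be collected. A toy model of that
    mechanism, assuming nothing about Cayenne's actual ObjectStore:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a DataContext's identity map. While an object is
// registered, the map's strong reference keeps it alive; unregistering
// removes that reference so high-volume rows (accounts) can be GC'ed
// while master data (fund properties) stays registered and cached.
class IdentityMap {
    private final Map<Object, Object> registered = new HashMap<>();

    void register(Object id, Object obj) {
        registered.put(id, obj);
    }

    // Drop the context's reference; if the caller holds no other strong
    // reference, the object becomes eligible for garbage collection.
    void unregister(Object id) {
        registered.remove(id);
    }

    Object lookup(Object id) {
        return registered.get(id);
    }
}
```

    Throwing away the whole DataContext is the coarse version of the same
    idea: the entire identity map becomes unreachable at once.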

    Please continue with feedback and suggestions!

    Holger



    This archive was generated by hypermail 2.0.0 : Sun Feb 16 2003 - 09:45:42 EST