Re: high-volume offline processing using cayenne?

From: Holger Hoffstätte (holge..izards.de)
Date: Mon Feb 17 2003 - 07:03:31 EST

    Arndt Brenschede wrote:
    > thanks for the quick response!

    any time!

    > It's Oracle in the first place, but with database independence
    > as a required feature (and DB2 in mind). So we consider the

    DB2... thanks for reminding us to write an adapter for that.
    Let us know if you want to volunteer...

    > (ashwood)
    > build, the dependencies are mostly simple). I am curious to see that
    > working.

    So are we, actually. :-)

    > > I just had a quick look and QueryAssembler.createStatement() looks like
    > > the place to check for cached PreparedStatements; can't say offhand how
    > > much the query string creation could be optimized away, probably
    > > completely.
    >
    > The query string creation on the Java side is unlikely to be a
    > performance hit - it is the additional turnaround and the query
    > compilation on the db-server which are costly - so simply replacing
    > Connection.prepareStatement() with a caching wrapper that uses
    > the query string itself as a key into a hashmap should do.

    My first thought too, but it should be an LRUMap (commons-collections
    has one), so that the query cache doesn't grow too large. Could you whip
    up a prototype implementation and test whether it really helps with
    performance? If you check out the cvs tree from SF you can run the unit
    test suite, which should give enough feedback since many queries are
    repetitive.
    Still, if we can get rid of the string fumbling entirely (maybe later),
    all the better.
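
    Purely as an untested sketch (all names here are made up, and an
    access-ordered LinkedHashMap stands in for commons-collections' LRUMap),
    such a caching wrapper could look roughly like this:

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;
        import java.util.LinkedHashMap;
        import java.util.Map;

        // Rough sketch only, not Cayenne API: caches PreparedStatements
        // for one Connection, keyed by the SQL string, evicting the
        // least-recently-used entry once maxSize is exceeded.
        public class CachingStatementFactory {

            private final Connection connection;
            private final Map cache;

            public CachingStatementFactory(Connection connection,
                                           final int maxSize) {
                this.connection = connection;
                // accessOrder=true turns LinkedHashMap into a simple LRU map
                this.cache = new LinkedHashMap(16, 0.75f, true) {
                    protected boolean removeEldestEntry(Map.Entry eldest) {
                        if (size() > maxSize) {
                            try {
                                // evicted statements must be closed,
                                // not just dropped
                                ((PreparedStatement) eldest.getValue()).close();
                            }
                            catch (SQLException ignored) {
                            }
                            return true;
                        }
                        return false;
                    }
                };
            }

            public PreparedStatement prepareStatement(String sql)
                throws SQLException {
                PreparedStatement stmt = (PreparedStatement) cache.get(sql);
                if (stmt == null) {
                    stmt = connection.prepareStatement(sql);
                    cache.put(sql, stmt);
                }
                return stmt;
            }
        }

    Note that the cache has to live per-Connection, since a PreparedStatement
    is tied to the connection that created it.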

    > Timeouts would be too dangerous, as well as any other
    > "non-deterministic" mechanism (think of Reference objects that
    > expire when memory runs low...).

    Curiously there are quite a few cases where these 'non-deterministic'
    mechanisms work as well as or even better than complicated algorithms,
    e.g. 'perfect' round-robin vs. random scheduling.
    In any case it has to be configurable, to avoid thrashing (VM/GC heap
    ping-pong).

    > or the like. A simple solution would be to have the possibility
    > to unregister all objects that were newly registered
    > during such a 100-record cycle (that would be sufficient
    > for our processes).

    Yes - the quick & dirty way, if you have fine-grained control over the
    objects, as seems to be the case here.
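
    Purely for illustration, such a cycle might look like this (untested;
    it assumes the DataContext offers createAndRegisterNewObject() and an
    unregisterObjects(List) along these lines, and "LogEntry" is a made-up
    entity name):

        import java.util.ArrayList;
        import java.util.Iterator;
        import java.util.List;

        import org.objectstyle.cayenne.access.DataContext;

        public class BatchLoader {

            private static final int BATCH_SIZE = 100;

            public void load(DataContext context, Iterator records) {
                List createdThisCycle = new ArrayList();

                while (records.hasNext()) {
                    Object record = records.next();
                    Object entry =
                        context.createAndRegisterNewObject("LogEntry");
                    // ... copy fields from record onto entry ...
                    createdThisCycle.add(entry);

                    if (createdThisCycle.size() == BATCH_SIZE) {
                        // caveat, as you say below: this commits everything
                        // registered in the context, not just this cycle
                        context.commitChanges();
                        context.unregisterObjects(createdThisCycle);
                        createdThisCycle.clear();
                    }
                }

                // flush the final, possibly partial cycle
                if (!createdThisCycle.isEmpty()) {
                    context.commitChanges();
                    context.unregisterObjects(createdThisCycle);
                }
            }
        }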

    > But still, then "commitChanges" will also upload the changes
    > to the objects that are not part of this "100-record-unit-of-work",
    > which may not be what's intended.
    > So a more general solution is to have a "sub-DataContext"
    > that can be committed separately (to commit the changes)
    > and then closed (to unregister the newly created objects).

    I understand, yet it's not so easy. Are you familiar with Enterprise
    Objects Framework (EOF) by NeXT (now Apple), part of WebObjects? It has
    nested contexts: when a child commits its changes, they are written to
    the parent *in memory*. Only when the parent itself commits are all the
    changes written to the data store; this guarantees that both parent and
    child context edits are handled in a transactional way. I'm not sure
    that writing the child to the store while keeping the modified parent
    in limbo is a good idea.
    I currently don't know whether TopLink's UnitOfWork can be nested and what
    consequences this has; I'll just check the docs and think some more about
    it.
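
    To make the EOF-style semantics concrete, the contract would be
    something like this (purely hypothetical, neither Cayenne nor EOF API,
    just the commit behaviour described above):

        public interface EditingContext {

            // Child commits go to the parent's in-memory state,
            // not to the database.
            EditingContext createChildContext();

            // Top-level context: write all accumulated changes to the
            // data store. Child context: merge local changes into the
            // parent, in memory only.
            void commitChanges();

            // Discard local changes and unregister newly created objects.
            void rollbackChanges();
        }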

    thanks
    Holger


