Re: high-volume offline processing using cayenne?

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Sun Feb 16 2003 - 13:53:38 EST


    Arndt,

    Holger already commented on this and I agree with everything he said.

    I personally tried using Cayenne for batch processing (BTW, this was an
    evaluation for a future conversion of a big wireless messaging system
    from TopLink to Cayenne). It was a very specific scenario: reading
    millions of rows from Oracle, processing the info from each row and
    adding the results to an object holding the totals (the business rules
    were actually pretty complex, so a simple GROUP BY didn't work), and
    then saving the resulting report objects (tens or hundreds, not too
    many) to a different set of tables. It worked pretty well, but this was
    optimized for reads, not writes.
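
    For reference, the shape of that read-and-aggregate loop in plain JDBC
    terms - a minimal sketch only, with the connection details, table and
    column names invented for illustration:

        import java.sql.*;
        import java.util.HashMap;
        import java.util.Map;

        public class ReadAggregate {
            public static void main(String[] args) throws Exception {
                // Hypothetical Oracle connection; names are made up.
                Class.forName("oracle.jdbc.driver.OracleDriver");
                Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@localhost:1521:ORCL", "scott", "tiger");
                try {
                    Statement st = con.createStatement();
                    // Stream rows instead of materializing them all at once.
                    st.setFetchSize(1000);
                    ResultSet rs = st.executeQuery(
                        "SELECT account_id, amount FROM messages");

                    // One running total per account; the row data itself is
                    // dropped after each iteration and can be GC'd.
                    Map totals = new HashMap();
                    while (rs.next()) {
                        String key = rs.getString(1);
                        double[] total = (double[]) totals.get(key);
                        if (total == null) {
                            total = new double[1];
                            totals.put(key, total);
                        }
                        // the real (complex) business rules would go here
                        total[0] += rs.getDouble(2);
                    }
                    rs.close();
                    st.close();
                } finally {
                    con.close();
                }
            }
        }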

    As Holger mentioned, we are now adding batch commits using the JDBC
    batch feature - the big step in optimizing writes. Reusing
    PreparedStatements is part of this effort as well. Though we already
    have the backend for this implemented, it is not used by default just
    yet, and the public API is still being refactored.
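
    For anyone unfamiliar with the JDBC feature in question, this is
    roughly the plain-JDBC pattern underneath: one PreparedStatement,
    compiled once and reused for every row, with inserts queued via
    addBatch() and sent in bulk. The table and values are made up for the
    example:

        import java.sql.*;

        public class BatchInsert {
            public static void main(String[] args) throws Exception {
                Class.forName("oracle.jdbc.driver.OracleDriver");
                Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@localhost:1521:ORCL", "scott", "tiger");
                try {
                    con.setAutoCommit(false);

                    // One statement, parsed by the driver only once.
                    PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO report_totals (account_id, total) "
                            + "VALUES (?, ?)");

                    for (int i = 0; i < 100; i++) {
                        ps.setString(1, "account-" + i);
                        ps.setDouble(2, i * 10.0);
                        ps.addBatch(); // queue the row, don't execute yet
                    }

                    ps.executeBatch(); // one bulk round trip for all rows
                    con.commit();
                    ps.close();
                } finally {
                    con.close();
                }
            }
        }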

    Since you are bringing up an interesting scenario for this new feature,
    could you describe the flow some more? Are the objects mostly created
    in memory and then saved? How big a transactional scope do you need? I
    mean, you don't plan to keep millions of uncommitted objects in memory
    at once? Or do you, and you simply write them via batch one by one and
    do a commit after that?
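
    (If it's the latter, I am imagining the usual plain-JDBC chunked
    pattern, something like the sketch below - flush and commit every N
    rows so neither the uncommitted state nor the batch grows without
    bound. Chunk size, table and values are placeholders, of course:)

        import java.sql.*;

        public class ChunkedCommit {
            static final int CHUNK = 1000; // arbitrary chunk size

            public static void main(String[] args) throws Exception {
                Class.forName("oracle.jdbc.driver.OracleDriver");
                Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@localhost:1521:ORCL", "scott", "tiger");
                try {
                    con.setAutoCommit(false);
                    PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO report_totals (account_id, total) "
                            + "VALUES (?, ?)");

                    for (int i = 0; i < 1000000; i++) {
                        ps.setString(1, "account-" + i);
                        ps.setDouble(2, 0.0);
                        ps.addBatch();

                        // Flush and commit every CHUNK rows, so only one
                        // chunk of uncommitted state exists at a time.
                        if ((i + 1) % CHUNK == 0) {
                            ps.executeBatch();
                            con.commit();
                        }
                    }
                    ps.executeBatch(); // leftover rows
                    con.commit();
                    ps.close();
                } finally {
                    con.close();
                }
            }
        }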

    Andrus

    Arndt Nrenschede wrote:
    > Hi,
    >
    > I just had a deeper look at Cayenne, trying
    > to find out whether we could use it for
    > our project (a large back-office system).
    >
    > I got through the examples quickly and it
    > was really nice to see that everything worked
    > as expected (alpha-6).
    >
    > So, for the interactive part of our system
    > I'm sure Cayenne would do the job.
    >
    > For the offline processing, however, we
    > have high volumes (millions of records) and
    > tight performance constraints, so
    > we cannot deviate much from plain JDBC's
    > performance.
    >
    > The features needed for that would be
    > (and I think they are not implemented,
    > or at least I didn't find them):
    >
    > - support for JDBC-batch updates during
    > "commitChanges"
    >
    > - re-use of PreparedStatements
    >
    > - more detailed control of what happens
    > to the identity map (ObjectStore) in
    > "commitChanges".
    > The behaviour we need is to fully
    > release the volume data (e.g. accounts),
    > thus allowing them to be garbage-collected,
    > while keeping the master data
    > (e.g. fund properties) linked to the
    > identity map.
    > (This would require something like nested
    > DataContexts - or TopLink's "unitOfWork".)
    >
    > Could some of the gurus tell me whether
    > there are plans in that direction, or if
    > I just missed something?
    >
    > thanx in advance,
    >
    > Arndt
    >
    > PS: I also evaluated TopLink, but that failed
    > because, while they support batch-writing, they
    > messed it up, and also because their concept of
    > data comparisons (to find out what changed)
    > when committing a unitOfWork turned out to
    > be a CPU-time grave :-(
    >
    >
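
    On the identity-map point in Arndt's message above: until something
    like nested contexts exists, the scoping he describes looks roughly
    like this in plain Java terms. A sketch only - loadChunk and process
    are hypothetical stand-ins for the real fetching and business logic:

        import java.util.HashMap;
        import java.util.Map;

        public class ScopedProcessing {
            public static void main(String[] args) {
                // Master data (e.g. fund properties): loaded once and kept
                // strongly referenced for the life of the job.
                Map fundProperties = new HashMap();

                for (int c = 0; c < 100; c++) {
                    // Volume data (e.g. accounts) for one chunk lives only
                    // inside this iteration...
                    Map accounts = loadChunk(c);
                    process(accounts, fundProperties);
                    // ...so when the iteration ends, nothing references the
                    // accounts any more and they can be garbage-collected,
                    // while fundProperties stays pinned.
                }
            }

            static Map loadChunk(int c) {
                return new HashMap(); // placeholder for the real fetch
            }

            static void process(Map accounts, Map master) {
                // business rules would go here
            }
        }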


