Re: OutOfMemoryError: reading a large number of objects one by one

From: Tomi N/A (hefes..mail.com)
Date: Mon May 14 2007 - 20:37:24 EDT


    2007/5/14, Derek Rendall <derek.rendal..mail.com>:
    > OK, my memory on this stuff is now going back a year or two, but I did do
    > some extensive playing around with exactly this scenario. My example died
    > about 6-7 thousand records in 64 Megs - I found out why, and there seemed
    > to be pretty reasonable reasons for it at the time (something to do with a
    > data row being inflated to an object and then being cached in the object
    > store as an object). As I had some time back then, I ended up creating a version of
    > Cayenne that handled pretty large data sets (I only needed 100 thousand
    > records to be handled), by making some relatively minor adjustments. I did
    > discuss some of the techniques on the mailing lists but I can't seem to find
    > the entries now. I did find a Jira issue:
    > http://issues.apache.org/cayenne/browse/CAY-294

    Thanks for the comment, Derek.
    Your patch would probably serve me just fine for this project (<80k
    records), but I'm looking for a more general approach with future
    projects in mind. More importantly, because I think this is the sort
    of thing an ORM layer should support, and support well (for an
    arbitrary number of records, basically), I'm inclined to look deeper
    into the problem.

    > Try doing the following every 100 records or so (BTW: not sure if this stuff
    > is actually still around :-) YMMV:
    >
    > getDataContext().getObjectStore().unregisterNewObjects();
    > getDataContext().getObjectStore().startTrackingNewObjects();

    My mileage did vary.
    Your suggestion had two effects:
    1.) I got to work with datasets an order of magnitude larger (I'm
    experimenting to see just how much bigger right now).
    2.) I got inexplicable NPEs. For instance: I have a table A and a
    table redundantA which holds cached data about records in A. They use
    the same pk and are kept in sync automatically with triggers. However,
    a.getToRedundantA() gives me null for certain records (always the same
    ones?!). This should not be possible; I've checked, and the data in
    the database is valid, so it's an application problem.
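
    For what it's worth, this is the kind of direct cross-check I've been
    thinking of adding whenever getToRedundantA() comes back null - just a
    sketch, with RedundantA standing in for my real class, "REDUNDANT_A_ID"
    standing in for the shared pk column, and Cayenne 2.0 package names
    assumed:

    import java.util.List;
    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.exp.Expression;
    import org.apache.cayenne.exp.ExpressionFactory;
    import org.apache.cayenne.query.SelectQuery;

    // Fetch the redundantA row directly by the pk it shares with A; if this
    // returns a row while a.getToRedundantA() is null, the data is fine and
    // the problem is on the ObjectStore/relationship side.
    static RedundantA refetchRedundantA(DataContext ctx, Object pkValue) {
            Expression byPk = ExpressionFactory.matchDbExp("REDUNDANT_A_ID", pkValue);
            SelectQuery query = new SelectQuery(RedundantA.class, byPk);
            List results = ctx.performQuery(query);
            return results.isEmpty() ? null : (RedundantA) results.get(0);
    }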

    First of all, I'm rather unnerved by the fact that this occurs in
    such a (seemingly) random fashion: this, coupled with similar
    problems when I set the DataRowStore size to low values like 1000,
    greatly undermines my confidence that the objects I have in memory
    represent the database state. Secondly, I'm interested to know why it
    happens. Here's what I did this time:

    int i = 0;
    while (ri.hasNextRow()) {
            DataRow dr = (DataRow) ri.nextDataRow();
            MyClassA mca = (MyClassA) Util.getCommonDataContext()
                            .objectFromDataRow(MyClassA.class, dr, false);
            // ...do something with mca...
            if (i++ == 99) {
                    // reset tracking every 100 rows, as suggested
                    Util.getCommonDataContext().getObjectStore().unregisterNewObjects();
                    Util.getCommonDataContext().getObjectStore().startTrackingNewObjects();
                    i = 0;
            }
    }

    I'll try to map out the memory consumption by varying the reset limit
    (99) and other parameters...but memory consumption is completely
    irrelevant compared to the problem of unreliable data.
    I'd rather be back at square one than have to worry about whether an
    object is correctly reconstructed after I do objectFromDataRow.
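
    One thing I still want to try, in case it's the unregistering itself
    that breaks the relationship faults: throwing away the whole
    DataContext every N rows instead of poking its ObjectStore. A rough
    sketch of what I mean (MyClassA is as above; the readAll wrapper and
    the Cayenne 2.0 package names are just my assumptions here):

    import org.apache.cayenne.CayenneException;
    import org.apache.cayenne.DataRow;
    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.access.ResultIterator;
    import org.apache.cayenne.query.SelectQuery;

    static void readAll() throws CayenneException {
            DataContext ctx = DataContext.createDataContext();
            ResultIterator ri = ctx.performIteratedQuery(new SelectQuery(MyClassA.class));
            try {
                    int i = 0;
                    while (ri.hasNextRow()) {
                            DataRow dr = (DataRow) ri.nextDataRow();
                            MyClassA mca = (MyClassA) ctx.objectFromDataRow(MyClassA.class, dr, false);
                            // ...do something with mca...
                            if (++i % 100 == 0) {
                                    // drop all cached objects by replacing the context itself
                                    ctx = DataContext.createDataContext();
                            }
                    }
            }
            finally {
                    ri.close();
            }
    }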

    Cheers,
    t.n.a.


