2007/5/14, Derek Rendall <derek.rendal..mail.com>:
> OK, my memory on this stuff is now going back a year or two, but I did do
> some extensive playing around with exactly this scenario. My example died
> at about 6-7 thousand records in 64 Megs - I found out why, and the reasons
> seemed pretty reasonable at the time (something to do with a data row being
> inflated to an object and then cached in the object store as an
> object). As I had some time back then, I ended up creating a version of
> Cayenne that handled pretty large data sets (I only needed 100 thousand
> records to be handled), by making some relatively minor adjustments. I did
> discuss some of the techniques on the mailing lists but I can't seem to find
> the entries now. I did find a Jira issue:
> http://issues.apache.org/cayenne/browse/CAY-294
Thanks for the comment, Derek.
Your patch would probably serve me just fine for this project (<80k
records), but I'm looking for a more general approach with future
projects in mind. More importantly, since I think this is the sort of
thing an ORM layer should support, and support well (for an arbitrary
number of records, basically), I'm inclined to dig deeper into the
problem.
> Try doing the following every 100 records or so (BTW: not sure if this stuff
> is actually still around :-) YMMV:
>
> getDataContext().getObjectStore().unregisterNewObjects();
> getDataContext().getObjectStore().startTrackingNewObjects();
My mileage did vary.
Your suggestion had two effects:
1.) I got to work with datasets an order of magnitude larger (still
experimenting to see just how much bigger).
2.) I got inexplicable NPEs. For instance: I have a table A and a
table redundantA which holds cached data about records in A. They use
the same pk and are kept automatically in sync with triggers. However,
a.getToRedundantA() gives me null for certain records (always the same
ones?!). This should not be possible; I've checked, and the data in
the database is valid, so it's an application problem.
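To make the symptom concrete, this is roughly the check that fails.
It sits inside the loop shown further down; RedundantA and the "ID"
column name are approximations of my actual mapping:

MyClassA a = (MyClassA) Util.getCommonDataContext()
        .objectFromDataRow(MyClassA.class, dr, false);
// every row of A has a matching row in redundantA (same pk, synced by
// triggers), so this relationship should never resolve to null
RedundantA ra = a.getToRedundantA();
if (ra == null) {
    System.err.println("missing redundantA for pk " + dr.get("ID"));
}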
First of all, I'm rather unnerved that this occurs in such a
(seemingly) random fashion: coupled with the similar problems I saw
when setting the DataRowStore size to low values like 1000, it greatly
undermines my confidence that the objects I have in memory represent
the database state. Secondly, I'd like to know why it happens. Here's
what I did this time:
int i = 0;
while (ri.hasNextRow()) {
    DataRow dr = (DataRow) ri.nextDataRow();
    MyClassA mca = (MyClassA) Util.getCommonDataContext()
            .objectFromDataRow(MyClassA.class, dr, false);
    // ... do something with mca ...
    if (i++ == 99) {  // reset new-object tracking every 100 rows
        Util.getCommonDataContext().getObjectStore().unregisterNewObjects();
        Util.getCommonDataContext().getObjectStore().startTrackingNewObjects();
        i = 0;
    }
}
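If the unregister/startTracking trick keeps corrupting relationships,
the next thing I may try is throwing away the whole DataContext every
batch, so nothing can survive the reset. A rough sketch, assuming the
stock DataContext.createDataContext() factory and the same iterated
query as above:

DataContext batch = DataContext.createDataContext();
int i = 0;
while (ri.hasNextRow()) {
    DataRow dr = (DataRow) ri.nextDataRow();
    MyClassA mca = (MyClassA) batch.objectFromDataRow(MyClassA.class, dr, false);
    // ... do something with mca ...
    if (++i % 100 == 0) {
        // discard the context (and its ObjectStore) and start fresh
        batch = DataContext.createDataContext();
    }
}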
I'll try to map out the memory consumption by varying the reset limit
(99) and other parameters (a quick per-batch heap-logging sketch is
below)... but memory consumption is completely irrelevant compared to
the problem of unreliable data. I'd rather be back at square one than
have to worry whether an object is correctly reconstructed after I
call objectFromDataRow.
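For the record, the heap logging I have in mind is nothing
Cayenne-specific, just java.lang.Runtime, dumped whenever the batch
counter resets ("total" would be a running row counter I'd add):

Runtime rt = Runtime.getRuntime();
long used = rt.totalMemory() - rt.freeMemory();
System.out.println("rows=" + total + " heapUsedMB=" + (used / (1024 * 1024)));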
Cheers,
t.n.a.