Paginated Queries and memory usage

From: Derek Rendall (cayenn..sure.com)
Date: Mon Nov 01 2004 - 20:07:28 EST


    Hi

    I have been playing around with Paginated Queries, especially with regard
    to the ability to work through sets of data in the 100,000 - 300,000 record
    range (as our current data layer is required to do). I have added memory
    conservation "options" to the query code in order to achieve the levels of
    memory usage we require. As this is my first go at any Cayenne coding, I
    suspect I have got something wrong :-), so I want to run it past the list
    before releasing it to the rest of my team.

    First let me say, Paginated Queries and the ResultIterator are really
    cool capabilities. However, using a ResultIterator in some places in our
    code would be undesirable because of the type and length of lock that would
    be held while we process the records (although I recognise that what I have
    done may be overkill, and that we could just use a ResultIterator and live
    with the locking and server-side transaction/database connection timeout
    implications :-).

    What I have done is add the (configurable) capability for a query to clean
    out the current page of data when a new page needs to be loaded. The
    default option leaves the process as it is now (as that is ideal for most
    uses people will have for a Paginated Query). The first variation is to
    invalidate the objects in the DataContext. The second (and more dangerous)
    variation is to remove (unregister) the objects. Removing the objects is
    (as I understand things) dangerous because if I have another reference to
    that object in the same DataContext, it will become a transient object,
    which may affect other parts of the code that had already loaded it. That
    being said, when we need this level of memory optimisation, we will/can
    have a dedicated DataContext to work with, so it's not a big problem (for
    us). Having a reference to that object in another DataContext doesn't seem
    to cause any problems that I can see. BTW: in both variations I copy the
    original ID map back into the query's element list (it is smaller than the
    full data row map that is there after the row has been processed). Also,
    processing the list again works fine.
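
    To make the two variations concrete, here is a rough caller-side sketch
    against the stock 1.x DataContext API (invalidateObjects() and
    unregisterObjects()). My actual change sits inside the query code itself,
    so treat the class and method shapes below as purely illustrative:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class PageCleaningSketch {

        public static final int KEEP = 0;       // default: leave pages alone
        public static final int INVALIDATE = 1; // variation 1
        public static final int UNREGISTER = 2; // variation 2 (more dangerous)

        // Walks a paginated result list, releasing each completed page of
        // objects before the next page is faulted in.
        public void scan(DataContext context, Class entityClass,
                int pageSize, int mode) {

            SelectQuery query = new SelectQuery(entityClass);
            query.setPageSize(pageSize); // the list is faulted one page at a time

            List results = context.performQuery(query);
            List page = new ArrayList(pageSize);

            for (Iterator it = results.iterator(); it.hasNext();) {
                page.add(it.next()); // faults in the page holding this object

                if (page.size() == pageSize) {
                    releasePage(context, page, mode);
                    page.clear();
                }
            }

            releasePage(context, page, mode); // last, possibly partial, page
        }

        private void releasePage(DataContext context, List page, int mode) {
            if (mode == INVALIDATE) {
                // Objects go HOLLOW and can be re-faulted on next access.
                context.invalidateObjects(page);
            }
            else if (mode == UNREGISTER) {
                // Objects go TRANSIENT; any other reference to them obtained
                // through this DataContext is affected as well.
                context.unregisterObjects(page);
            }
        }
    }

    Invalidation keeps the objects registered (they can be re-faulted later),
    while unregistering drops them from the context completely, which is where
    the bigger memory win comes from.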

    So what are the results? I ran some simple tests and got some "performance"
    numbers out. These numbers are not to be used as an absolute measurement,
    but as a relative measure of performance (my poor laptop has to run DB2 as
    well as my test code :-).

    Set up: I was loading a fairly complex (inherited) object that had 122696
    records in the database. I ran the test as a JUnit test client with a
    maximum heap setting of 64 meg. My query had a page size of 200, and my
    (shared, but not remote) cache was set to a maximum of 2000 objects. I had
    Info level logging on and used the verbose GC flag.

    The basic test code consumed ~3.5 meg of memory. The basic query (with all
    the ids loaded) consumed a further ~17 meg. I ran each type of test 3
    times and averaged the results (there was little variation between runs).
    Each test simply consisted of iterating through the list (getting a
    reference to each object).
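
    In outline, each test looked something like the following (a hypothetical
    skeleton: the entity name and class names are made up; only the page size,
    the record count, and the 64 meg / verbose GC settings are the real ones):

    import java.util.Iterator;
    import java.util.List;

    import junit.framework.TestCase;

    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.query.SelectQuery;

    // Run with something like: java -Xmx64m -verbose:gc junit.textui.TestRunner ...
    public class PaginatedQueryMemoryTest extends TestCase {

        public void testScanAllRecords() {
            DataContext context = DataContext.createDataContext();

            // "InheritedEntity" stands in for our real (inherited) entity.
            SelectQuery query = new SelectQuery("InheritedEntity");
            query.setPageSize(200);

            List results = context.performQuery(query);

            // All the ids are fetched up front, so the size is known
            // without faulting in a single page.
            assertEquals(122696, results.size());

            // The test body: walk the list, getting a reference to each object.
            int count = 0;
            for (Iterator it = results.iterator(); it.hasNext();) {
                it.next(); // faults the page containing this object
                count++;
            }

            assertEquals(122696, count);
        }
    }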

    Test 1 - The default (current) behaviour
    Result: out of memory error
    Number of records loaded: ~12200
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~470 (note: the GC thrashing hits this number
    hard)

    Test 2 - Invalidate the objects in the previous page
    Result: out of memory error
    Number of records loaded: ~60260
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~672 (note: less effect from GC thrashing)

    Test 3 - Unregister the objects in the previous page
    Result: successfully completed
    Number of records loaded: 122696
    Total memory used (after full GC / before GC): ~23 meg / ~46 meg (note: an
    overall increase in actual memory consumption of 3 meg)
    Records processed per second: ~1200

    Then, just for kicks, I did a comparison with a ResultIterator (converting
    each entry to an object). The base query added no significant amount to the
    3.5 meg used by the test itself.
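
    The iterator version of the scan looked roughly like this (again a sketch;
    the objectFromDataRow() call is spelled from memory of the 1.1 API, and the
    immediate unregister corresponds to Test 3 below):

    import java.util.Collections;

    import org.objectstyle.cayenne.CayenneException;
    import org.objectstyle.cayenne.DataObject;
    import org.objectstyle.cayenne.DataRow;
    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.access.ResultIterator;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class IteratorScanSketch {

        // Streams raw rows, converts each one to a DataObject, and (in the
        // unregister variation) detaches it again straight away.
        public int scan(DataContext context, String entityName, boolean unregister)
                throws CayenneException {

            SelectQuery query = new SelectQuery(entityName);
            ResultIterator rows = context.performIteratedQuery(query);

            int count = 0;
            try {
                while (rows.hasNextRow()) {
                    DataRow row = (DataRow) rows.nextDataRow();

                    // Conversion call as I remember the 1.1 API.
                    DataObject object =
                            context.objectFromDataRow(entityName, row, false);

                    // ... process the object here ...
                    count++;

                    if (unregister) {
                        // Release it immediately; the object goes TRANSIENT.
                        context.unregisterObjects(
                                Collections.singletonList(object));
                    }
                }
            }
            finally {
                // The iterator holds a database connection until closed.
                rows.close();
            }

            return count;
        }
    }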

    Test 1 - The default (current) behaviour, with no invalidation/removal
    other than that provided by the maximum cache size
    Result: out of memory error
    Number of records loaded: ~16500
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~840

    Test 2 - Invalidate the object after processing it
    Result: out of memory error
    Number of records loaded: ~84500
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~1480

    Test 3 - Unregister the object after processing it
    Result: successfully completed
    Number of records loaded: 122696
    Total memory used (after full GC / before GC): ~3.5 meg / ~7 meg
    Records processed per second: ~3600

    So - am I doing something silly? Is there any big gotcha waiting to nail
    me? Would there be any interest in having a look at my code?

    Thanks

    Derek


