Paginated Queries and memory usage

From: Derek Rendall (cayenn..sure.com)
Date: Mon Nov 01 2004 - 20:07:28 EST


    Hi

    I have been playing around with Paginated Queries, especially with regard
    to the ability to work through sets of data in the 100,000 - 300,000 record
    range (as our current data layer is required to do). I have added memory
    conservation "options" to the query code in order to achieve the levels of
    memory usage we require. As this is my first go at any Cayenne coding, I
    suspect I have got something wrong :-), so I want to run it past the list
    before releasing it to the rest of my team.

    First let me say, Paginated Queries and the ResultIterator are really
    cool capabilities. However, using a ResultIterator in some places in our
    code would be undesirable because of the type and length of lock that would
    be held while we process the records (although I recognise that what I have
    done may be overkill, and that we could just use a ResultIterator and live
    with the locking and server-side transaction/database connection timeout
    implications :-).

    What I have done is add the (configurable) capability for a query to clean
    out the current page of data when a new page needs to be loaded. The
    default option leaves the process as it is now (as that is ideal for most
    uses people will have for a Paginated Query). The first variation is to
    invalidate the objects in the DataContext. The second (and more dangerous)
    variation is to remove (unregister) the objects. Removing the objects is
    (as I understand things) dangerous because if I have another reference to
    that object in the same DataContext, it will become a transient object,
    which may affect other parts of the code that had already loaded it. That
    being said, when we need this level of memory optimisation, we will/can
    have a dedicated DataContext to work with, so it's not a big problem (for
    us). Having a reference to that object in another DataContext doesn't seem
    to cause any problems that I can see. BTW: in both variations I copy the
    original ID map back into the query's element list (it is smaller than the
    full data row map that is there after the row has been processed). Also,
    processing the list again works fine.
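
    To make the two variations concrete, here is a rough caller-side sketch
    against the stock 1.x DataContext API (invalidateObjects() and
    unregisterObjects()). My actual change sits inside the query code itself,
    so treat the class and method shapes below as purely illustrative:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class PageCleaningSketch {

        public static final int KEEP = 0;       // default: leave pages alone
        public static final int INVALIDATE = 1; // variation 1
        public static final int UNREGISTER = 2; // variation 2 (more dangerous)

        // Walks a paginated result list, releasing each completed page of
        // objects before the next page is faulted in.
        public void scan(DataContext context, Class entityClass,
                int pageSize, int mode) {

            SelectQuery query = new SelectQuery(entityClass);
            query.setPageSize(pageSize); // the list is faulted one page at a time

            List results = context.performQuery(query);
            List page = new ArrayList(pageSize);

            for (Iterator it = results.iterator(); it.hasNext();) {
                page.add(it.next()); // faults in the page holding this object

                if (page.size() == pageSize) {
                    releasePage(context, page, mode);
                    page.clear();
                }
            }

            releasePage(context, page, mode); // last, possibly partial, page
        }

        private void releasePage(DataContext context, List page, int mode) {
            if (mode == INVALIDATE) {
                // Objects go HOLLOW and can be re-faulted on next access.
                context.invalidateObjects(page);
            }
            else if (mode == UNREGISTER) {
                // Objects go TRANSIENT; any other reference to them obtained
                // through this DataContext is affected as well.
                context.unregisterObjects(page);
            }
        }
    }

    Invalidation keeps the objects registered (they can be re-faulted later),
    while unregistering drops them from the context completely, which is where
    the bigger memory win comes from.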

    So what are the results? I ran some simple tests and got some "performance"
    numbers out. These numbers are not to be used as an absolute measurement,
    but as a relative measure of performance (my poor laptop has to run DB2 as
    well as my test code :-).

    Set up: I was loading a fairly complex (inherited) object that had 122696
    records in the database. I ran the test as a JUnit test client with a
    maximum heap setting of 64 meg. My query had a page size of 200, and my
    (shared, but not remote) cache was set to a maximum of 2000 objects. I had
    Info level logging on and used the verbose GC flag.

    The basic test code consumed ~3.5 meg of memory. The basic query (with all
    the ids loaded) consumed a further ~17 meg. I ran each type of test 3
    times and averaged the results (there was little variation between runs).
    Each test simply consisted of iterating through the list (getting a
    reference to each object).
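
    In outline, each test looked something like the following (a hypothetical
    skeleton: the entity name and class names are made up; only the page size,
    the record count, and the 64 meg / verbose GC settings are the real ones):

    import java.util.Iterator;
    import java.util.List;

    import junit.framework.TestCase;

    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.query.SelectQuery;

    // Run with something like: java -Xmx64m -verbose:gc junit.textui.TestRunner ...
    public class PaginatedQueryMemoryTest extends TestCase {

        public void testScanAllRecords() {
            DataContext context = DataContext.createDataContext();

            // "InheritedEntity" stands in for our real (inherited) entity.
            SelectQuery query = new SelectQuery("InheritedEntity");
            query.setPageSize(200);

            List results = context.performQuery(query);

            // All the ids are fetched up front, so the size is known
            // without faulting in a single page.
            assertEquals(122696, results.size());

            // The test body: walk the list, getting a reference to each object.
            int count = 0;
            for (Iterator it = results.iterator(); it.hasNext();) {
                it.next(); // faults the page containing this object
                count++;
            }

            assertEquals(122696, count);
        }
    }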

    Test 1 - The default (current) behaviour
    Result: out of memory error
    Number of records loaded: ~12200
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~470 (note: the GC thrashing hits this number
    hard)

    Test 2 - Invalidate the objects in the previous page
    Result: out of memory error
    Number of records loaded: ~60260
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~672 (note: less effect from GC thrashing)

    Test 3 - Unregister the objects in the previous page
    Result: successfully completed
    Number of records loaded: 122696
    Total memory used (after full GC / before GC): ~23 meg / ~46 meg (note: an
    overall increase in actual memory consumption of 3 meg)
    Records processed per second: ~1200

    Then, just for kicks, I did a comparison with a ResultIterator (converting
    each entry to an object). The base query added no significant amount to the
    3.5 meg used by the test itself.
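
    The iterator version of the scan looked roughly like this (again a sketch;
    the objectFromDataRow() call is spelled from memory of the 1.1 API, and the
    immediate unregister corresponds to Test 3 below):

    import java.util.Collections;

    import org.objectstyle.cayenne.CayenneException;
    import org.objectstyle.cayenne.DataObject;
    import org.objectstyle.cayenne.DataRow;
    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.access.ResultIterator;
    import org.objectstyle.cayenne.query.SelectQuery;

    public class IteratorScanSketch {

        // Streams raw rows, converts each one to a DataObject, and (in the
        // unregister variation) detaches it again straight away.
        public int scan(DataContext context, String entityName, boolean unregister)
                throws CayenneException {

            SelectQuery query = new SelectQuery(entityName);
            ResultIterator rows = context.performIteratedQuery(query);

            int count = 0;
            try {
                while (rows.hasNextRow()) {
                    DataRow row = (DataRow) rows.nextDataRow();

                    // Conversion call as I remember the 1.1 API.
                    DataObject object =
                            context.objectFromDataRow(entityName, row, false);

                    // ... process the object here ...
                    count++;

                    if (unregister) {
                        // Release it immediately; the object goes TRANSIENT.
                        context.unregisterObjects(
                                Collections.singletonList(object));
                    }
                }
            }
            finally {
                // The iterator holds a database connection until closed.
                rows.close();
            }

            return count;
        }
    }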

    Test 1 - The default (current) behaviour, with no invalidation/removal
    other than that provided by the maximum cache size
    Result: out of memory error
    Number of records loaded: ~16500
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~840

    Test 2 - Invalidate the object after processing it
    Result: out of memory error
    Number of records loaded: ~84500
    Total memory used (after GC / before GC): 64 meg / 64 meg
    Records processed per second: ~1480

    Test 3 - Unregister the object after processing it
    Result: successfully completed
    Number of records loaded: 122696
    Total memory used (after full GC / before GC): ~3.5 meg / ~7 meg
    Records processed per second: ~3600

    So - am I doing something silly? Is there any big gotcha waiting to nail
    me? Would there be any interest in having a look at my code?

    Thanks

    Derek


