Re: paged query slow when fetching big lists

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Fri Jun 22 2007 - 11:10:15 EDT

  • Next message: Craig L Russell: "Re: AW: creating/saving transient objects to database"

    Hi Marcin,

    > * fetch only Pk columns and create all ObjectIds at once, get rid
    > of the iterating process if possible
    > * use already existing method resolveInterval() to fault the
    > required range of records

    This strategy was discussed in the May thread with Ari (the one that
    Michael Gentry mentioned). My vote is +0, meaning that before we make
    this change, I want to confirm first that it has a visible impact on
    performance. Could you possibly make such change locally and see if
    it helps? (Look at SelectQuery.addCustomDbAttribute() to only include
    PK; if you have problems making this change, ping me via the dev list
    - I'll try my best to help).

    > If the creation of ObjectId and getting the results from ResultSet
    > cannot be speed up (because it simply has to happen, and it does
    > not depend on the way it is done), the only choice will be to
    > implement some more complex solution using sql LIMIT statement.

    I'd love to avoid that, as the data you get may as well be different
    the next time you resolve a page, so you may end up with duplicates
    or skipped records. If we ever go this way, we'll probably need to
    make it a user choice (use LIMIT vs. IncrementalFaultList).

    Andrus

    On Jun 22, 2007, at 2:35 AM, Marcin Skladaniec wrote:

    > Hi
    > Recently we have found that fetching a list of 100,000 records
    > using ROP with paging and no cache takes a long time, about 50
    > seconds in our case. We have profiled the cpu usage and the result
    > shows that 99% of time is spent in IncrementalFaultList, within the
    > fillIn() method.
    >
    > The fillIn method works (in my opinion) in a bit strange fashion:
    > it does execute query at once, stores the query result in
    > java.sql.ResultSet, and than iterates through the result either
    > creating the whole DataRow or just ObjectId. If there is a need the
    > DataRows are faulted at the end of the method.
    > From our testing it came up that this bit of code :
    >
    > while (it.hasNextRow()) {
    > elements.add(it.nextObjectId(entity));
    > }
    >
    > is where all the time is spent. Each iteration in this loop takes
    > about 0.5ms, which multiplied by 100,000 takes almost 50 seconds.
    > nextObjectId method consists of two parts: fetching the next result
    > from ResultSet and creating a ObjectId, but I was unable to check
    > which one takes the most time, anyway I think that this approach is
    > somewhat wrong, since always 99% of the records will be fetched as
    > ObjectId and never faulted, so my ideas to enhance this are:
    > * fetch only Pk columns and create all ObjectIds at once, get rid
    > of the iterating process if possible
    > * use already existing method resolveInterval() to fault the
    > required range of records
    > If the creation of ObjectId and getting the results from ResultSet
    > cannot be speed up (because it simply has to happen, and it does
    > not depend on the way it is done), the only choice will be to
    > implement some more complex solution using sql LIMIT statement.
    >
    > I would like to mention that we are using some DataContext
    > decorators and life-cycle callbacks, but I don't believe those are
    > important factors in this case.
    >
    > Whatever is the solution, i think it is pretty crucial that it will
    > be implemented soon, since the usability of the ROP without fast
    > paging is rather low.
    >
    > With regards
    > Marcin
    > -------------------------->
    > ish
    > http://www.ish.com.au
    > Level 1, 30 Wilson Street Newtown 2042 Australia
    > phone +61 2 9550 5001 fax +61 2 9550 4001
    >
    >
    >



    This archive was generated by hypermail 2.0.0 : Fri Jun 22 2007 - 11:10:41 EDT