Re: paged query slow when fetching big lists

From: Marcin Skladaniec (marci..sh.com.au)
Date: Tue Jun 26 2007 - 02:26:40 EDT

  • Next message: Andrus Adamchik: "Re: paged query slow when fetching big lists"

    Hi

    I have done some more profiling and testing.

    executing queries on table with >100000 records, directly on server
    (not on client) gave results as listed below:

    1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    STUDENT") without paging takes 100 sec.
    3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    STUDENT") with page size 10 takes 5 sec.
    4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.

    what more, I found that executing the SQLTemplate does allow to fault
    the objects (I sometimes discover the simplest things last), so I did
    try to check how long it takes for the objects to be faulted:
    1) first object on every page (except first) 30-200ms, rest = 0ms
    2) objects is faulted in 20ms (average)
    3) on first page first object faulted in 200ms, rest ~20 ms,
          on any following page first object faulted in 30-200ms, rest
    0ms (interesting that the first page does not seem to be faulted at all)
    4) no point testing.

    Also I did check if the resizing of the ArrayList which is keeping
    the results does affect the speed, and it does not. (Tried to make
    the ArrayList initial size = 150,000).

    My conclusion is that SelectQuery with paging is usable only for
    fetching less than, say 10,000 records, otherwise the performance is
    to low. With SQLTemplate the performance is much greater. It applies
    to both ROP and 'normal' cayenne, since I made those tests on server.

    Andrus, you mentioned using addCustomDbAttribute to fetch only part
    of the data. I tried to use addCustomDbAttribute("id") on client, it
    resulted in returning the raw dataRows, is there something I can do
    to fetch faulted objects ?

    Our application was designed to use the SelectQuery. If we have to
    change that and use the SQLTemplate instead, there is a lot of work
    for us, including:
    - dealing with adding and concatenating Expressions to the
    SQLTemplate (is there an easy way ?)
    - dealing with declared qualifier (the one set in modeller)
    - possibly more...
    i would really like to avoid all of that, so if you have any ideas on
    how to improve the performance without too much hassle I would really
    appreciate.

    Marcin

    On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:

    > Hi Andrus
    > I had not much time to check, but with the fix the 100k records
    > load in 30 instead of 50 seconds. It is some improvement, but not
    > enough. I'll do some more profiling tomorrow and let you know.
    >
    > By the way, we are using netbeans for profiling, the benefit : it
    > is free. I will evaluate the yourkit as we are moving away from
    > netbeans as a development platform.
    >
    > Marcin
    >
    > On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >
    >> Ari, Marcin --
    >>
    >> going through the code I noticed one inefficiency - the elements
    >> array access is synchronized in 'fillIn' method. Since 'fillIn' is
    >> called from constructor, such synchronization is unneeded and only
    >> slows things down. I just checked a fixed version to trunk. Could
    >> you try it out?
    >>
    >> Andrus
    >>
    >>
    >>
    >> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>
    >>>> Marcin, this thread might be of interest to you ...
    >>>>
    >>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/
    >>>> browser
    >>>>
    >>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>
    >>>
    >>> Yes, this is the same project we are working on. I started some
    >>> performance profiling and Marcin has been able now to take it
    >>> much further. What is it about:
    >>>
    >>>> elements.add(it.nextObjectId(entity));
    >>>
    >>> which is so slow? The code gets a little complex at that point
    >>> and we are having difficulty tracing it through to the exact
    >>> performance problem in the underlying code. Is it the speed of
    >>> adding the object id to the Collection or the speed of creating
    >>> an object id itself? 0.5ms doesn't sound slow, but it doesn't
    >>> scale well.
    >>>
    >>> Andrus, I got the impression from the previous thread that you
    >>> suspected something slightly different. That the performance
    >>> problem might be in the fat query itself, but from our tests this
    >>> isn't the case. If I've got this right, the way it works is:
    >>>
    >>> 1. perform regular query to get all columns but return result in
    >>> iterator
    >>> 2. iterate through first page and build full objects
    >>> 3. iterate through other pages and build just objectids (this is
    >>> the slow part for us)
    >>> 4. when another page is fetched perform another query and fetch
    >>> those objects from the DB
    >>>
    >>> So, getting just primary keys from the DB may not be any faster
    >>> if the performance problem is simply in the construction of
    >>> objectIds. If the performance problem is in the numerous
    >>> resizings of the Collection (each time it runs out of space, then
    >>> it is increased by 50% or 100% in size), then the solution could
    >>> be as simple as figuring out the size of the iterator and sizing
    >>> the collection appropriately from the beginning.
    >>>
    >>> Any ideas on how to discover the exact cause of the performance hit?
    >>>
    >>>
    >>> Ari Maniatis
    >>>
    >>>



    This archive was generated by hypermail 2.0.0 : Tue Jun 26 2007 - 02:27:14 EDT