Re: paged query slow when fetching big lists

From: Marcin Skladaniec (marci..sh.com.au)
Date: Tue Jun 26 2007 - 03:55:17 EDT

  • Next message: Florijan Stamenkovic: "Java Client"

    Hi Andrus !
    Many thanks for that !
    Marcin

    On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:

    > Hi Marcin,
    >
    >> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    >> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >> STUDENT") without paging takes 100 sec.
    >> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >> STUDENT") with page size 10 takes 5 sec.
    >> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
    >
    > I see you still didn't use profiler, but the data you provided
    > seems to finally confirm that at least on Derby not fetching all
    > columns does result in significant speedup (#1 vs. #3). So now it
    > is a question of implementing the right algorithm for the
    > IncrementalFaultList.
    >
    >> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >> part of the data. I tried to use addCustomDbAttribute("id") on
    >> client, it resulted in returning the raw dataRows, is there
    >> something I can do to fetch faulted objects ?
    >
    > We should encapsulate this logic inside IncrementalFaultList on the
    > server.
    >
    >> Our application was designed to use the SelectQuery. If we have to
    >> change that and use the SQLTemplate instead, there is a lot of
    >> work for us, including:
    >
    > Same thing - the right thing to do is to fix it on the server.
    >
    > Let me try to find a spare minute later tonight and implement id-
    > only fetch. I have some large tables in a MySQL5 so I can test the
    > performance in a slightly different environment.
    >
    > Andrus
    >
    >
    >
    > On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
    >> Hi
    >>
    >> I have done some more profiling and testing.
    >>
    >> executing queries on table with >100000 records, directly on
    >> server (not on client) gave results as listed below:
    >>
    >> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    >> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >> STUDENT") without paging takes 100 sec.
    >> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >> STUDENT") with page size 10 takes 5 sec.
    >> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
    >>
    >> what more, I found that executing the SQLTemplate does allow to
    >> fault the objects (I sometimes discover the simplest things last),
    >> so I did try to check how long it takes for the objects to be
    >> faulted:
    >> 1) first object on every page (except first) 30-200ms, rest = 0ms
    >> 2) objects is faulted in 20ms (average)
    >> 3) on first page first object faulted in 200ms, rest ~20 ms,
    >> on any following page first object faulted in 30-200ms, rest
    >> 0ms (interesting that the first page does not seem to be faulted
    >> at all)
    >> 4) no point testing.
    >>
    >> Also I did check if the resizing of the ArrayList which is keeping
    >> the results does affect the speed, and it does not. (Tried to make
    >> the ArrayList initial size = 150,000).
    >>
    >> My conclusion is that SelectQuery with paging is usable only for
    >> fetching less than, say 10,000 records, otherwise the performance
    >> is to low. With SQLTemplate the performance is much greater. It
    >> applies to both ROP and 'normal' cayenne, since I made those tests
    >> on server.
    >>
    >> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >> part of the data. I tried to use addCustomDbAttribute("id") on
    >> client, it resulted in returning the raw dataRows, is there
    >> something I can do to fetch faulted objects ?
    >>
    >> Our application was designed to use the SelectQuery. If we have to
    >> change that and use the SQLTemplate instead, there is a lot of
    >> work for us, including:
    >> - dealing with adding and concatenating Expressions to the
    >> SQLTemplate (is there an easy way ?)
    >> - dealing with declared qualifier (the one set in modeller)
    >> - possibly more...
    >> i would really like to avoid all of that, so if you have any ideas
    >> on how to improve the performance without too much hassle I would
    >> really appreciate.
    >>
    >> Marcin
    >>
    >> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
    >>
    >>> Hi Andrus
    >>> I had not much time to check, but with the fix the 100k records
    >>> load in 30 instead of 50 seconds. It is some improvement, but not
    >>> enough. I'll do some more profiling tomorrow and let you know.
    >>>
    >>> By the way, we are using netbeans for profiling, the benefit :
    >>> it is free. I will evaluate the yourkit as we are moving away
    >>> from netbeans as a development platform.
    >>>
    >>> Marcin
    >>>
    >>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >>>
    >>>> Ari, Marcin --
    >>>>
    >>>> going through the code I noticed one inefficiency - the elements
    >>>> array access is synchronized in 'fillIn' method. Since 'fillIn'
    >>>> is called from constructor, such synchronization is unneeded and
    >>>> only slows things down. I just checked a fixed version to trunk.
    >>>> Could you try it out?
    >>>>
    >>>> Andrus
    >>>>
    >>>>
    >>>>
    >>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>>>
    >>>>>> Marcin, this thread might be of interest to you ...
    >>>>>>
    >>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/
    >>>>>> 200705.mbox/browser
    >>>>>>
    >>>>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>>>
    >>>>>
    >>>>> Yes, this is the same project we are working on. I started some
    >>>>> performance profiling and Marcin has been able now to take it
    >>>>> much further. What is it about:
    >>>>>
    >>>>>> elements.add(it.nextObjectId(entity));
    >>>>>
    >>>>> which is so slow? The code gets a little complex at that point
    >>>>> and we are having difficulty tracing it through to the exact
    >>>>> performance problem in the underlying code. Is it the speed of
    >>>>> adding the object id to the Collection or the speed of creating
    >>>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't
    >>>>> scale well.
    >>>>>
    >>>>> Andrus, I got the impression from the previous thread that you
    >>>>> suspected something slightly different. That the performance
    >>>>> problem might be in the fat query itself, but from our tests
    >>>>> this isn't the case. If I've got this right, the way it works is:
    >>>>>
    >>>>> 1. perform regular query to get all columns but return result
    >>>>> in iterator
    >>>>> 2. iterate through first page and build full objects
    >>>>> 3. iterate through other pages and build just objectids (this
    >>>>> is the slow part for us)
    >>>>> 4. when another page is fetched perform another query and fetch
    >>>>> those objects from the DB
    >>>>>
    >>>>> So, getting just primary keys from the DB may not be any faster
    >>>>> if the performance problem is simply in the construction of
    >>>>> objectIds. If the performance problem is in the numerous
    >>>>> resizings of the Collection (each time it runs out of space,
    >>>>> then it is increased by 50% or 100% in size), then the solution
    >>>>> could be as simple as figuring out the size of the iterator and
    >>>>> sizing the collection appropriately from the beginning.
    >>>>>
    >>>>> Any ideas on how to discover the exact cause of the performance
    >>>>> hit?
    >>>>>
    >>>>>
    >>>>> Ari Maniatis
    >>>>>
    >>>>>
    >>
    >>
    >>
    >



    This archive was generated by hypermail 2.0.0 : Tue Jun 26 2007 - 03:56:07 EDT