Re: paged query slow when fetching big lists

From: Marcin Skladaniec (marci..sh.com.au)
Date: Tue Jun 26 2007 - 23:38:18 EDT

  • Next message: John Armstrong: "Re: Java Client"

    Hi Andrus !

    The fix you committed today works great, I can confirm, in our case
    the 30-50 second SelectQuery now executes in 5-7 seconds !

    There is one but: fix does work only for queries executed on server,
    when I executed the query on (ROP) client, the query takes the same
    amount of time ! Is it possible that the remote calls are using a
    different constructor ? or maybe the isFetchingCustomAttributes()
    returns true for 'remote' SelectQueries, and therefore the
    constructor works as before ?
    Can I ask you to check that out, I could have debug it myself, but
    from previous experience I know, that you know those things straight
    away, and I would only waste my time.

    Marcin

    On 27/06/2007, at 4:32 AM, Andrus Adamchik wrote:

    > Hi Marcin,
    >
    > I have good news (I think). Fetching just id columns inside the
    > IncrementalFaultList indeed speeds things up significantly. I just
    > committed the change to optimize SelectQueries to do just that.
    > Please let me know how does it work for you.
    >
    > Now the profiling details...
    >
    > * I profiled on Derby and MySQL. In both cases fetching a table
    > with 25 columns and 100000 rows took between 3-4 seconds (not as
    > long as in your case, but what's important is relative times I guess)
    > * YourKit clearly showed the bottleneck: ~95% of the 'fillIn'
    > method is spent in the driver code, rewinding the result set (i.e.
    > brining the data from db to the client).
    > * After my change the query time went down to 0.2-0.5 second (0.8
    > if you consider the second query needed to fault the first page).
    > Not bad!
    > * ResultSet reading still remained a bottleneck, but it became
    > faster in absolute terms. And now finally Cayenne-related code
    > (such as DataRow creation) started to show up on the radar (e.g.
    > DataRow constructor taking 3% of the 'fillIn' method time).
    >
    > Andrus
    >
    >
    >
    > On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
    >
    >> Hi Andrus !
    >> Many thanks for that !
    >> Marcin
    >>
    >> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
    >>
    >>> Hi Marcin,
    >>>
    >>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
    >>>> seconds.
    >>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") without paging takes 100 sec.
    >>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 takes 5 sec.
    >>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 and fetching row data rows takes 4.5
    >>>> sec.
    >>>
    >>> I see you still didn't use profiler, but the data you provided
    >>> seems to finally confirm that at least on Derby not fetching all
    >>> columns does result in significant speedup (#1 vs. #3). So now it
    >>> is a question of implementing the right algorithm for the
    >>> IncrementalFaultList.
    >>>
    >>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >>>> part of the data. I tried to use addCustomDbAttribute("id") on
    >>>> client, it resulted in returning the raw dataRows, is there
    >>>> something I can do to fetch faulted objects ?
    >>>
    >>> We should encapsulate this logic inside IncrementalFaultList on
    >>> the server.
    >>>
    >>>> Our application was designed to use the SelectQuery. If we have
    >>>> to change that and use the SQLTemplate instead, there is a lot
    >>>> of work for us, including:
    >>>
    >>> Same thing - the right thing to do is to fix it on the server.
    >>>
    >>> Let me try to find a spare minute later tonight and implement id-
    >>> only fetch. I have some large tables in a MySQL5 so I can test
    >>> the performance in a slightly different environment.
    >>>
    >>> Andrus
    >>>
    >>>
    >>>
    >>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
    >>>> Hi
    >>>>
    >>>> I have done some more profiling and testing.
    >>>>
    >>>> executing queries on table with >100000 records, directly on
    >>>> server (not on client) gave results as listed below:
    >>>>
    >>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
    >>>> seconds.
    >>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") without paging takes 100 sec.
    >>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 takes 5 sec.
    >>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 and fetching row data rows takes 4.5
    >>>> sec.
    >>>>
    >>>> what more, I found that executing the SQLTemplate does allow to
    >>>> fault the objects (I sometimes discover the simplest things
    >>>> last), so I did try to check how long it takes for the objects
    >>>> to be faulted:
    >>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
    >>>> 2) objects is faulted in 20ms (average)
    >>>> 3) on first page first object faulted in 200ms, rest ~20 ms,
    >>>> on any following page first object faulted in 30-200ms,
    >>>> rest 0ms (interesting that the first page does not seem to be
    >>>> faulted at all)
    >>>> 4) no point testing.
    >>>>
    >>>> Also I did check if the resizing of the ArrayList which is
    >>>> keeping the results does affect the speed, and it does not.
    >>>> (Tried to make the ArrayList initial size = 150,000).
    >>>>
    >>>> My conclusion is that SelectQuery with paging is usable only for
    >>>> fetching less than, say 10,000 records, otherwise the
    >>>> performance is to low. With SQLTemplate the performance is much
    >>>> greater. It applies to both ROP and 'normal' cayenne, since I
    >>>> made those tests on server.
    >>>>
    >>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >>>> part of the data. I tried to use addCustomDbAttribute("id") on
    >>>> client, it resulted in returning the raw dataRows, is there
    >>>> something I can do to fetch faulted objects ?
    >>>>
    >>>> Our application was designed to use the SelectQuery. If we have
    >>>> to change that and use the SQLTemplate instead, there is a lot
    >>>> of work for us, including:
    >>>> - dealing with adding and concatenating Expressions to the
    >>>> SQLTemplate (is there an easy way ?)
    >>>> - dealing with declared qualifier (the one set in modeller)
    >>>> - possibly more...
    >>>> i would really like to avoid all of that, so if you have any
    >>>> ideas on how to improve the performance without too much hassle
    >>>> I would really appreciate.
    >>>>
    >>>> Marcin
    >>>>
    >>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
    >>>>
    >>>>> Hi Andrus
    >>>>> I had not much time to check, but with the fix the 100k records
    >>>>> load in 30 instead of 50 seconds. It is some improvement, but
    >>>>> not enough. I'll do some more profiling tomorrow and let you know.
    >>>>>
    >>>>> By the way, we are using netbeans for profiling, the benefit :
    >>>>> it is free. I will evaluate the yourkit as we are moving away
    >>>>> from netbeans as a development platform.
    >>>>>
    >>>>> Marcin
    >>>>>
    >>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >>>>>
    >>>>>> Ari, Marcin --
    >>>>>>
    >>>>>> going through the code I noticed one inefficiency - the
    >>>>>> elements array access is synchronized in 'fillIn' method.
    >>>>>> Since 'fillIn' is called from constructor, such
    >>>>>> synchronization is unneeded and only slows things down. I just
    >>>>>> checked a fixed version to trunk. Could you try it out?
    >>>>>>
    >>>>>> Andrus
    >>>>>>
    >>>>>>
    >>>>>>
    >>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>>>>>
    >>>>>>>> Marcin, this thread might be of interest to you ...
    >>>>>>>>
    >>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/
    >>>>>>>> 200705.mbox/browser
    >>>>>>>>
    >>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>>>>>
    >>>>>>>
    >>>>>>> Yes, this is the same project we are working on. I started
    >>>>>>> some performance profiling and Marcin has been able now to
    >>>>>>> take it much further. What is it about:
    >>>>>>>
    >>>>>>>> elements.add(it.nextObjectId(entity));
    >>>>>>>
    >>>>>>> which is so slow? The code gets a little complex at that
    >>>>>>> point and we are having difficulty tracing it through to the
    >>>>>>> exact performance problem in the underlying code. Is it the
    >>>>>>> speed of adding the object id to the Collection or the speed
    >>>>>>> of creating an object id itself? 0.5ms doesn't sound slow,
    >>>>>>> but it doesn't scale well.
    >>>>>>>
    >>>>>>> Andrus, I got the impression from the previous thread that
    >>>>>>> you suspected something slightly different. That the
    >>>>>>> performance problem might be in the fat query itself, but
    >>>>>>> from our tests this isn't the case. If I've got this right,
    >>>>>>> the way it works is:
    >>>>>>>
    >>>>>>> 1. perform regular query to get all columns but return result
    >>>>>>> in iterator
    >>>>>>> 2. iterate through first page and build full objects
    >>>>>>> 3. iterate through other pages and build just objectids (this
    >>>>>>> is the slow part for us)
    >>>>>>> 4. when another page is fetched perform another query and
    >>>>>>> fetch those objects from the DB
    >>>>>>>
    >>>>>>> So, getting just primary keys from the DB may not be any
    >>>>>>> faster if the performance problem is simply in the
    >>>>>>> construction of objectIds. If the performance problem is in
    >>>>>>> the numerous resizings of the Collection (each time it runs
    >>>>>>> out of space, then it is increased by 50% or 100% in size),
    >>>>>>> then the solution could be as simple as figuring out the size
    >>>>>>> of the iterator and sizing the collection appropriately from
    >>>>>>> the beginning.
    >>>>>>>
    >>>>>>> Any ideas on how to discover the exact cause of the
    >>>>>>> performance hit?
    >>>>>>>
    >>>>>>>
    >>>>>>> Ari Maniatis
    >>>>>>>
    >>>>>>>
    >>>>
    >>>>
    >>>>
    >>>
    >>
    >>
    >>
    >>
    >

    Marcin



    This archive was generated by hypermail 2.0.0 : Tue Jun 26 2007 - 23:38:57 EDT