Re: paged query slow when fetching big lists

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Thu Jul 05 2007 - 08:21:39 EDT

  • Next message: Greg Boshart: "problems with Oracle stored procedure"

    3.0 (SVN trunk), which will hopefully be released fairly soon as 3.0M1

    Andrus

    On Jul 5, 2007, at 3:06 PM, Borut Bolčina wrote:
    > Hello,
    >
    > which version includes this corrections? 2.0.3 or 3.0?
    >
    > Thanks,
    > Borut
    >
    > On 26.6.2007 20:32, Andrus Adamchik wrote:
    >> Hi Marcin,
    >>
    >> I have good news (I think). Fetching just id columns inside the
    >> IncrementalFaultList indeed speeds things up significantly. I just
    >> committed the change to optimize SelectQueries to do just that.
    >> Please let me know how does it work for you.
    >>
    >> Now the profiling details...
    >>
    >> * I profiled on Derby and MySQL. In both cases fetching a table
    >> with 25 columns and 100000 rows took between 3-4 seconds (not as
    >> long as in your case, but what's important is relative times I guess)
    >> * YourKit clearly showed the bottleneck: ~95% of the 'fillIn'
    >> method is spent in the driver code, rewinding the result set (i.e.
    >> brining the data from db to the client).
    >> * After my change the query time went down to 0.2-0.5 second (0.8
    >> if you consider the second query needed to fault the first page).
    >> Not bad!
    >> * ResultSet reading still remained a bottleneck, but it became
    >> faster in absolute terms. And now finally Cayenne-related code
    >> (such as DataRow creation) started to show up on the radar (e.g.
    >> DataRow constructor taking 3% of the 'fillIn' method time).
    >>
    >> Andrus
    >>
    >>
    >>
    >> On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
    >>
    >>> Hi Andrus !
    >>> Many thanks for that !
    >>> Marcin
    >>>
    >>> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
    >>>
    >>>> Hi Marcin,
    >>>>
    >>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
    >>>>> seconds.
    >>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>>> STUDENT") without paging takes 100 sec.
    >>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>>> STUDENT") with page size 10 takes 5 sec.
    >>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>>> STUDENT") with page size 10 and fetching row data rows takes
    >>>>> 4.5 sec.
    >>>>
    >>>> I see you still didn't use profiler, but the data you provided
    >>>> seems to finally confirm that at least on Derby not fetching all
    >>>> columns does result in significant speedup (#1 vs. #3). So now
    >>>> it is a question of implementing the right algorithm for the
    >>>> IncrementalFaultList.
    >>>>
    >>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >>>>> part of the data. I tried to use addCustomDbAttribute("id") on
    >>>>> client, it resulted in returning the raw dataRows, is there
    >>>>> something I can do to fetch faulted objects ?
    >>>>
    >>>> We should encapsulate this logic inside IncrementalFaultList on
    >>>> the server.
    >>>>
    >>>>> Our application was designed to use the SelectQuery. If we have
    >>>>> to change that and use the SQLTemplate instead, there is a lot
    >>>>> of work for us, including:
    >>>>
    >>>> Same thing - the right thing to do is to fix it on the server.
    >>>>
    >>>> Let me try to find a spare minute later tonight and implement id-
    >>>> only fetch. I have some large tables in a MySQL5 so I can test
    >>>> the performance in a slightly different environment.
    >>>>
    >>>> Andrus
    >>>>
    >>>>
    >>>>
    >>>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
    >>>>> Hi
    >>>>>
    >>>>> I have done some more profiling and testing.
    >>>>>
    >>>>> executing queries on table with >100000 records, directly on
    >>>>> server (not on client) gave results as listed below:
    >>>>>
    >>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
    >>>>> seconds.
    >>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>>> STUDENT") without paging takes 100 sec.
    >>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>>> STUDENT") with page size 10 takes 5 sec.
    >>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>>> STUDENT") with page size 10 and fetching row data rows takes
    >>>>> 4.5 sec.
    >>>>>
    >>>>> what more, I found that executing the SQLTemplate does allow to
    >>>>> fault the objects (I sometimes discover the simplest things
    >>>>> last), so I did try to check how long it takes for the objects
    >>>>> to be faulted:
    >>>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
    >>>>> 2) objects is faulted in 20ms (average)
    >>>>> 3) on first page first object faulted in 200ms, rest ~20 ms,
    >>>>> on any following page first object faulted in 30-200ms,
    >>>>> rest 0ms (interesting that the first page does not seem to be
    >>>>> faulted at all)
    >>>>> 4) no point testing.
    >>>>>
    >>>>> Also I did check if the resizing of the ArrayList which is
    >>>>> keeping the results does affect the speed, and it does not.
    >>>>> (Tried to make the ArrayList initial size = 150,000).
    >>>>>
    >>>>> My conclusion is that SelectQuery with paging is usable only
    >>>>> for fetching less than, say 10,000 records, otherwise the
    >>>>> performance is to low. With SQLTemplate the performance is much
    >>>>> greater. It applies to both ROP and 'normal' cayenne, since I
    >>>>> made those tests on server.
    >>>>>
    >>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >>>>> part of the data. I tried to use addCustomDbAttribute("id") on
    >>>>> client, it resulted in returning the raw dataRows, is there
    >>>>> something I can do to fetch faulted objects ?
    >>>>>
    >>>>> Our application was designed to use the SelectQuery. If we have
    >>>>> to change that and use the SQLTemplate instead, there is a lot
    >>>>> of work for us, including:
    >>>>> - dealing with adding and concatenating Expressions to the
    >>>>> SQLTemplate (is there an easy way ?)
    >>>>> - dealing with declared qualifier (the one set in modeller)
    >>>>> - possibly more...
    >>>>> i would really like to avoid all of that, so if you have any
    >>>>> ideas on how to improve the performance without too much hassle
    >>>>> I would really appreciate.
    >>>>>
    >>>>> Marcin
    >>>>>
    >>>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
    >>>>>
    >>>>>> Hi Andrus
    >>>>>> I had not much time to check, but with the fix the 100k
    >>>>>> records load in 30 instead of 50 seconds. It is some
    >>>>>> improvement, but not enough. I'll do some more profiling
    >>>>>> tomorrow and let you know.
    >>>>>>
    >>>>>> By the way, we are using netbeans for profiling, the
    >>>>>> benefit : it is free. I will evaluate the yourkit as we are
    >>>>>> moving away from netbeans as a development platform.
    >>>>>>
    >>>>>> Marcin
    >>>>>>
    >>>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >>>>>>
    >>>>>>> Ari, Marcin --
    >>>>>>>
    >>>>>>> going through the code I noticed one inefficiency - the
    >>>>>>> elements array access is synchronized in 'fillIn' method.
    >>>>>>> Since 'fillIn' is called from constructor, such
    >>>>>>> synchronization is unneeded and only slows things down. I
    >>>>>>> just checked a fixed version to trunk. Could you try it out?
    >>>>>>>
    >>>>>>> Andrus
    >>>>>>>
    >>>>>>>
    >>>>>>>
    >>>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>>>>>>
    >>>>>>>>> Marcin, this thread might be of interest to you ...
    >>>>>>>>>
    >>>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/
    >>>>>>>>> 200705.mbox/browser
    >>>>>>>>>
    >>>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Yes, this is the same project we are working on. I started
    >>>>>>>> some performance profiling and Marcin has been able now to
    >>>>>>>> take it much further. What is it about:
    >>>>>>>>
    >>>>>>>>> elements.add(it.nextObjectId(entity));
    >>>>>>>>
    >>>>>>>> which is so slow? The code gets a little complex at that
    >>>>>>>> point and we are having difficulty tracing it through to the
    >>>>>>>> exact performance problem in the underlying code. Is it the
    >>>>>>>> speed of adding the object id to the Collection or the speed
    >>>>>>>> of creating an object id itself? 0.5ms doesn't sound slow,
    >>>>>>>> but it doesn't scale well.
    >>>>>>>>
    >>>>>>>> Andrus, I got the impression from the previous thread that
    >>>>>>>> you suspected something slightly different. That the
    >>>>>>>> performance problem might be in the fat query itself, but
    >>>>>>>> from our tests this isn't the case. If I've got this right,
    >>>>>>>> the way it works is:
    >>>>>>>>
    >>>>>>>> 1. perform regular query to get all columns but return
    >>>>>>>> result in iterator
    >>>>>>>> 2. iterate through first page and build full objects
    >>>>>>>> 3. iterate through other pages and build just objectids
    >>>>>>>> (this is the slow part for us)
    >>>>>>>> 4. when another page is fetched perform another query and
    >>>>>>>> fetch those objects from the DB
    >>>>>>>>
    >>>>>>>> So, getting just primary keys from the DB may not be any
    >>>>>>>> faster if the performance problem is simply in the
    >>>>>>>> construction of objectIds. If the performance problem is in
    >>>>>>>> the numerous resizings of the Collection (each time it runs
    >>>>>>>> out of space, then it is increased by 50% or 100% in size),
    >>>>>>>> then the solution could be as simple as figuring out the
    >>>>>>>> size of the iterator and sizing the collection appropriately
    >>>>>>>> from the beginning.
    >>>>>>>>
    >>>>>>>> Any ideas on how to discover the exact cause of the
    >>>>>>>> performance hit?
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Ari Maniatis
    >>>>>>>>
    >>>>>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>
    >>>
    >>>
    >>>
    >>>
    >>
    >
    > --
    > <blog-logo.gif>



    This archive was generated by hypermail 2.0.0 : Thu Jul 05 2007 - 08:22:05 EDT