Re: paged query slow when fetching big lists

From: Borut Bolčin (bo..ajdi.si)
Date: Thu Jul 05 2007 - 08:06:53 EDT

  • Next message: Andrus Adamchik: "Re: paged query slow when fetching big lists"

    Hello,

    which version includes this corrections? 2.0.3 or 3.0?

    Thanks,
    Borut

    On 26.6.2007 20:32, Andrus Adamchik wrote:
    > Hi Marcin,
    >
    > I have good news (I think). Fetching just id columns inside the
    > IncrementalFaultList indeed speeds things up significantly. I just
    > committed the change to optimize SelectQueries to do just that. Please
    > let me know how does it work for you.
    >
    > Now the profiling details...
    >
    > * I profiled on Derby and MySQL. In both cases fetching a table with
    > 25 columns and 100000 rows took between 3-4 seconds (not as long as in
    > your case, but what's important is relative times I guess)
    > * YourKit clearly showed the bottleneck: ~95% of the 'fillIn' method
    > is spent in the driver code, rewinding the result set (i.e. brining
    > the data from db to the client).
    > * After my change the query time went down to 0.2-0.5 second (0.8 if
    > you consider the second query needed to fault the first page). Not bad!
    > * ResultSet reading still remained a bottleneck, but it became faster
    > in absolute terms. And now finally Cayenne-related code (such as
    > DataRow creation) started to show up on the radar (e.g. DataRow
    > constructor taking 3% of the 'fillIn' method time).
    >
    > Andrus
    >
    >
    >
    > On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
    >
    >> Hi Andrus !
    >> Many thanks for that !
    >> Marcin
    >>
    >> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
    >>
    >>> Hi Marcin,
    >>>
    >>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    >>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") without paging takes 100 sec.
    >>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 takes 5 sec.
    >>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
    >>>
    >>> I see you still didn't use profiler, but the data you provided seems
    >>> to finally confirm that at least on Derby not fetching all columns
    >>> does result in significant speedup (#1 vs. #3). So now it is a
    >>> question of implementing the right algorithm for the
    >>> IncrementalFaultList.
    >>>
    >>>> Andrus, you mentioned using addCustomDbAttribute to fetch only part
    >>>> of the data. I tried to use addCustomDbAttribute("id") on client,
    >>>> it resulted in returning the raw dataRows, is there something I can
    >>>> do to fetch faulted objects ?
    >>>
    >>> We should encapsulate this logic inside IncrementalFaultList on the
    >>> server.
    >>>
    >>>> Our application was designed to use the SelectQuery. If we have to
    >>>> change that and use the SQLTemplate instead, there is a lot of work
    >>>> for us, including:
    >>>
    >>> Same thing - the right thing to do is to fix it on the server.
    >>>
    >>> Let me try to find a spare minute later tonight and implement
    >>> id-only fetch. I have some large tables in a MySQL5 so I can test
    >>> the performance in a slightly different environment.
    >>>
    >>> Andrus
    >>>
    >>>
    >>>
    >>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
    >>>> Hi
    >>>>
    >>>> I have done some more profiling and testing.
    >>>>
    >>>> executing queries on table with >100000 records, directly on server
    >>>> (not on client) gave results as listed below:
    >>>>
    >>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    >>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") without paging takes 100 sec.
    >>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 takes 5 sec.
    >>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
    >>>>
    >>>> what more, I found that executing the SQLTemplate does allow to
    >>>> fault the objects (I sometimes discover the simplest things last),
    >>>> so I did try to check how long it takes for the objects to be faulted:
    >>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
    >>>> 2) objects is faulted in 20ms (average)
    >>>> 3) on first page first object faulted in 200ms, rest ~20 ms,
    >>>> on any following page first object faulted in 30-200ms, rest
    >>>> 0ms (interesting that the first page does not seem to be faulted at
    >>>> all)
    >>>> 4) no point testing.
    >>>>
    >>>> Also I did check if the resizing of the ArrayList which is keeping
    >>>> the results does affect the speed, and it does not. (Tried to make
    >>>> the ArrayList initial size = 150,000).
    >>>>
    >>>> My conclusion is that SelectQuery with paging is usable only for
    >>>> fetching less than, say 10,000 records, otherwise the performance
    >>>> is to low. With SQLTemplate the performance is much greater. It
    >>>> applies to both ROP and 'normal' cayenne, since I made those tests
    >>>> on server.
    >>>>
    >>>> Andrus, you mentioned using addCustomDbAttribute to fetch only part
    >>>> of the data. I tried to use addCustomDbAttribute("id") on client,
    >>>> it resulted in returning the raw dataRows, is there something I can
    >>>> do to fetch faulted objects ?
    >>>>
    >>>> Our application was designed to use the SelectQuery. If we have to
    >>>> change that and use the SQLTemplate instead, there is a lot of work
    >>>> for us, including:
    >>>> - dealing with adding and concatenating Expressions to the
    >>>> SQLTemplate (is there an easy way ?)
    >>>> - dealing with declared qualifier (the one set in modeller)
    >>>> - possibly more...
    >>>> i would really like to avoid all of that, so if you have any ideas
    >>>> on how to improve the performance without too much hassle I would
    >>>> really appreciate.
    >>>>
    >>>> Marcin
    >>>>
    >>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
    >>>>
    >>>>> Hi Andrus
    >>>>> I had not much time to check, but with the fix the 100k records
    >>>>> load in 30 instead of 50 seconds. It is some improvement, but not
    >>>>> enough. I'll do some more profiling tomorrow and let you know.
    >>>>>
    >>>>> By the way, we are using netbeans for profiling, the benefit : it
    >>>>> is free. I will evaluate the yourkit as we are moving away from
    >>>>> netbeans as a development platform.
    >>>>>
    >>>>> Marcin
    >>>>>
    >>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >>>>>
    >>>>>> Ari, Marcin --
    >>>>>>
    >>>>>> going through the code I noticed one inefficiency - the elements
    >>>>>> array access is synchronized in 'fillIn' method. Since 'fillIn'
    >>>>>> is called from constructor, such synchronization is unneeded and
    >>>>>> only slows things down. I just checked a fixed version to trunk.
    >>>>>> Could you try it out?
    >>>>>>
    >>>>>> Andrus
    >>>>>>
    >>>>>>
    >>>>>>
    >>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>>>>>
    >>>>>>>> Marcin, this thread might be of interest to you ...
    >>>>>>>>
    >>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>>>>>
    >>>>>>>
    >>>>>>> Yes, this is the same project we are working on. I started some
    >>>>>>> performance profiling and Marcin has been able now to take it
    >>>>>>> much further. What is it about:
    >>>>>>>
    >>>>>>>> elements.add(it.nextObjectId(entity));
    >>>>>>>
    >>>>>>> which is so slow? The code gets a little complex at that point
    >>>>>>> and we are having difficulty tracing it through to the exact
    >>>>>>> performance problem in the underlying code. Is it the speed of
    >>>>>>> adding the object id to the Collection or the speed of creating
    >>>>>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't
    >>>>>>> scale well.
    >>>>>>>
    >>>>>>> Andrus, I got the impression from the previous thread that you
    >>>>>>> suspected something slightly different. That the performance
    >>>>>>> problem might be in the fat query itself, but from our tests
    >>>>>>> this isn't the case. If I've got this right, the way it works is:
    >>>>>>>
    >>>>>>> 1. perform regular query to get all columns but return result in
    >>>>>>> iterator
    >>>>>>> 2. iterate through first page and build full objects
    >>>>>>> 3. iterate through other pages and build just objectids (this is
    >>>>>>> the slow part for us)
    >>>>>>> 4. when another page is fetched perform another query and fetch
    >>>>>>> those objects from the DB
    >>>>>>>
    >>>>>>> So, getting just primary keys from the DB may not be any faster
    >>>>>>> if the performance problem is simply in the construction of
    >>>>>>> objectIds. If the performance problem is in the numerous
    >>>>>>> resizings of the Collection (each time it runs out of space,
    >>>>>>> then it is increased by 50% or 100% in size), then the solution
    >>>>>>> could be as simple as figuring out the size of the iterator and
    >>>>>>> sizing the collection appropriately from the beginning.
    >>>>>>>
    >>>>>>> Any ideas on how to discover the exact cause of the performance
    >>>>>>> hit?
    >>>>>>>
    >>>>>>>
    >>>>>>> Ari Maniatis
    >>>>>>>
    >>>>>>>
    >>>>
    >>>>
    >>>>
    >>>
    >>
    >>
    >>
    >>
    >

    -- 
    bbLOG <http://borutb.tuditi.delo.si/>
    



    This archive was generated by hypermail 2.0.0 : Thu Jul 05 2007 - 08:07:24 EDT