Re: paged query slow when fetching big lists

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Tue Jun 26 2007 - 14:32:27 EDT

  • Next message: Marcin Skladaniec: "Re: paged query slow when fetching big lists"

    Hi Marcin,

    I have good news (I think). Fetching just id columns inside the
    IncrementalFaultList indeed speeds things up significantly. I just
    committed the change to optimize SelectQueries to do just that.
    Please let me know how it works for you.
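
    If anyone wants to picture the mechanism, here is a toy sketch in
    plain Java (the names are made up for illustration; this is not the
    actual IncrementalFaultList code): the initial query materializes
    only ids, and a whole page is resolved with a second, narrow query
    the first time any element of that page is accessed.

```java
import java.util.*;

// Toy illustration of an id-only paged list (names are made up;
// this is not Cayenne's IncrementalFaultList API).
public class PagedList {
    private final List<Object> elements;          // ids or resolved rows
    private final Map<Integer, String> database;  // stands in for the STUDENT table
    private final int pageSize;

    public PagedList(Map<Integer, String> database, int pageSize) {
        this.database = database;
        this.pageSize = pageSize;
        // The initial query fetches only the id column -- cheap even for
        // 100,000 rows, because no wide rows cross the wire.
        this.elements = new ArrayList<Object>(database.keySet());
    }

    // Faulting resolves the whole page containing 'index' in one go.
    public String get(int index) {
        Object e = elements.get(index);
        if (e instanceof Integer) {               // unresolved: fault the page
            int start = (index / pageSize) * pageSize;
            int end = Math.min(start + pageSize, elements.size());
            for (int i = start; i < end; i++) {
                // stands in for "SELECT * FROM STUDENT WHERE id IN (...)"
                elements.set(i, database.get(elements.get(i)));
            }
        }
        return (String) elements.get(index);
    }

    public static void main(String[] args) {
        Map<Integer, String> db = new LinkedHashMap<Integer, String>();
        for (int id = 0; id < 100; id++) {
            db.put(id, "student-" + id);
        }
        PagedList list = new PagedList(db, 10);
        System.out.println(list.get(42)); // prints student-42
    }
}
```

    In the real implementation the second query is a qualifier on the
    page's ids; the map lookup above just stands in for it.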

    Now the profiling details...

    * I profiled on Derby and MySQL. In both cases fetching a table with
    25 columns and 100000 rows took 3-4 seconds (not as long as in
    your case, but I guess the relative times are what matter)
    * YourKit clearly showed the bottleneck: ~95% of the 'fillIn' method
    is spent in the driver code, rewinding the result set (i.e. bringing
    the data from the db to the client).
    * After my change the query time went down to 0.2-0.5 seconds (0.8 if
    you count the second query needed to fault the first page). Not bad!
    * ResultSet reading still remained a bottleneck, but it became faster
    in absolute terms. And now finally Cayenne-related code (such as
    DataRow creation) started to show up on the radar (e.g. DataRow
    constructor taking 3% of the 'fillIn' method time).

    Andrus

    On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:

    > Hi Andrus !
    > Many thanks for that !
    > Marcin
    >
    > On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
    >
    >> Hi Marcin,
    >>
    >>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    >>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>> STUDENT") without paging takes 100 sec.
    >>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>> STUDENT") with page size 10 takes 5 sec.
    >>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>> STUDENT") with page size 10 and fetching raw data rows takes 4.5
    >>> sec.
    >>
    >> I see you still didn't use a profiler, but the data you provided
    >> seems to finally confirm that, at least on Derby, not fetching all
    >> columns does result in a significant speedup (#1 vs. #3). So now it
    >> is a question of implementing the right algorithm for the
    >> IncrementalFaultList.
    >>
    >>> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >>> part of the data. I tried to use addCustomDbAttribute("id") on the
    >>> client, but it resulted in raw DataRows being returned. Is there
    >>> something I can do to fetch faulted objects?
    >>
    >> We should encapsulate this logic inside IncrementalFaultList on
    >> the server.
    >>
    >>> Our application was designed to use the SelectQuery. If we have
    >>> to change that and use the SQLTemplate instead, there is a lot of
    >>> work for us, including:
    >>
    >> Same thing - the right thing to do is to fix it on the server.
    >>
    >> Let me try to find a spare minute later tonight and implement the
    >> id-only fetch. I have some large tables in MySQL 5, so I can test
    >> the performance in a slightly different environment.
    >>
    >> Andrus
    >>
    >>
    >>
    >> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
    >>> Hi
    >>>
    >>> I have done some more profiling and testing.
    >>>
    >>> executing queries on a table with >100000 records, directly on the
    >>> server (not on the client), gave the results listed below:
    >>>
    >>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    >>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>> STUDENT") without paging takes 100 sec.
    >>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>> STUDENT") with page size 10 takes 5 sec.
    >>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    >>> STUDENT") with page size 10 and fetching raw data rows takes 4.5
    >>> sec.
    >>>
    >>> What's more, I found that executing the SQLTemplate does allow
    >>> faulting the objects (I sometimes discover the simplest things
    >>> last), so I tried to check how long it takes for the objects to
    >>> be faulted:
    >>> 1) first object on every page (except the first) 30-200ms, rest = 0ms
    >>> 2) each object is faulted in 20ms (average)
    >>> 3) on the first page the first object faulted in 200ms, rest ~20ms;
    >>> on any following page the first object faulted in 30-200ms, rest
    >>> 0ms (interesting that the first page does not seem to be faulted
    >>> at all)
    >>> 4) no point testing.
    >>>
    >>> I also checked whether the resizing of the ArrayList that holds
    >>> the results affects the speed, and it does not (I tried making the
    >>> ArrayList's initial size 150,000).
    >>>
    >>> My conclusion is that SelectQuery with paging is usable only for
    >>> fetching fewer than, say, 10,000 records; otherwise the performance
    >>> is too low. With SQLTemplate the performance is much better. This
    >>> applies to both ROP and 'normal' Cayenne, since I ran those tests
    >>> on the server.
    >>>
    >>> Andrus, you mentioned using addCustomDbAttribute to fetch only
    >>> part of the data. I tried to use addCustomDbAttribute("id") on the
    >>> client, but it resulted in raw DataRows being returned. Is there
    >>> something I can do to fetch faulted objects?
    >>>
    >>> Our application was designed to use the SelectQuery. If we have
    >>> to change that and use the SQLTemplate instead, there is a lot of
    >>> work for us, including:
    >>> - dealing with adding and concatenating Expressions to the
    >>> SQLTemplate (is there an easy way?)
    >>> - dealing with the declared qualifier (the one set in the Modeler)
    >>> - possibly more...
    >>> I would really like to avoid all of that, so if you have any
    >>> ideas on how to improve the performance without too much hassle I
    >>> would really appreciate it.
    >>>
    >>> Marcin
    >>>
    >>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
    >>>
    >>>> Hi Andrus
    >>>> I have not had much time to check, but with the fix the 100k
    >>>> records load in 30 instead of 50 seconds. That is some improvement,
    >>>> but not enough. I'll do some more profiling tomorrow and let you know.
    >>>>
    >>>> By the way, we are using NetBeans for profiling; the benefit:
    >>>> it is free. I will evaluate YourKit as we are moving away
    >>>> from NetBeans as a development platform.
    >>>>
    >>>> Marcin
    >>>>
    >>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >>>>
    >>>>> Ari, Marcin --
    >>>>>
    >>>>> going through the code I noticed one inefficiency: the
    >>>>> elements array access is synchronized in the 'fillIn' method.
    >>>>> Since 'fillIn' is called from the constructor, such
    >>>>> synchronization is unneeded and only slows things down. I just
    >>>>> checked in a fixed version to trunk. Could you try it out?
    >>>>>
    >>>>> Andrus
    >>>>>
    >>>>>
    >>>>>
    >>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>>>>
    >>>>>>> Marcin, this thread might be of interest to you ...
    >>>>>>>
    >>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
    >>>>>>>
    >>>>>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>>>>
    >>>>>>
    >>>>>> Yes, this is the same project we are working on. I started
    >>>>>> some performance profiling and Marcin has now been able to
    >>>>>> take it much further. What is it about:
    >>>>>>
    >>>>>>> elements.add(it.nextObjectId(entity));
    >>>>>>
    >>>>>> which is so slow? The code gets a little complex at that point
    >>>>>> and we are having difficulty tracing it through to the exact
    >>>>>> performance problem in the underlying code. Is it the speed of
    >>>>>> adding the object id to the Collection or the speed of
    >>>>>> creating an object id itself? 0.5ms doesn't sound slow, but it
    >>>>>> doesn't scale well.
    >>>>>>
    >>>>>> Andrus, I got the impression from the previous thread that you
    >>>>>> suspected something slightly different: that the performance
    >>>>>> problem might be in the fat query itself. But from our tests
    >>>>>> this isn't the case. If I've got this right, the way it works is:
    >>>>>>
    >>>>>> 1. perform regular query to get all columns but return result
    >>>>>> in iterator
    >>>>>> 2. iterate through first page and build full objects
    >>>>>> 3. iterate through other pages and build just objectids (this
    >>>>>> is the slow part for us)
    >>>>>> 4. when another page is fetched perform another query and
    >>>>>> fetch those objects from the DB
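
    Steps 1-3 can be sketched in plain Java (simplified types standing
    in for the real query machinery; 'fillIn' here is a made-up
    stand-in, not the actual Cayenne method):

```java
import java.util.*;

// Sketch of the fill-in pass described above: one wide query returns an
// iterator over full rows; the first page keeps full rows, and every
// later row is reduced to just its id (the per-row cost in question).
public class FillInSketch {

    public static List<Object> fillIn(Iterator<Map<String, Object>> rows, int pageSize) {
        List<Object> elements = new ArrayList<Object>();
        while (rows.hasNext()) {
            Map<String, Object> row = rows.next();
            if (elements.size() < pageSize) {
                elements.add(row);              // step 2: full objects for page one
            } else {
                elements.add(row.get("id"));    // step 3: id only -- the slow loop
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        // A tiny in-memory stand-in for the wide STUDENT result set.
        List<Map<String, Object>> table = new ArrayList<Map<String, Object>>();
        for (int i = 0; i < 25; i++) {
            Map<String, Object> row = new HashMap<String, Object>();
            row.put("id", i);
            row.put("name", "student-" + i);
            table.add(row);
        }
        List<Object> elements = fillIn(table.iterator(), 10);
        System.out.println(elements.get(0) instanceof Map); // prints true: first page is full rows
        System.out.println(elements.get(20));               // prints 20: later pages are bare ids
    }
}
```

    Step 4 then re-fetches a page's rows by those stored ids when the
    page is first touched.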
    >>>>>>
    >>>>>> So, getting just primary keys from the DB may not be any
    >>>>>> faster if the performance problem is simply in the
    >>>>>> construction of ObjectIds. If the performance problem is in
    >>>>>> the numerous resizings of the Collection (each time it runs
    >>>>>> out of space it is increased by 50% or 100% in size),
    >>>>>> then the solution could be as simple as figuring out the size
    >>>>>> of the iterator and sizing the collection appropriately from
    >>>>>> the beginning.
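
    The presizing idea is cheap to try: in plain Java it is just the
    ArrayList initial-capacity argument, which avoids the repeated
    grow-and-copy as the list crosses 100,000 elements. A sketch of
    the API (not a benchmark):

```java
import java.util.ArrayList;
import java.util.List;

// Presizing an ArrayList avoids the grow-and-copy cycles that happen
// each time the backing array runs out of space.
public class PresizeExample {

    public static List<Integer> fill(int n, boolean presize) {
        // With presize=true the backing array is allocated once up front;
        // otherwise it grows by roughly 50% whenever it fills up.
        List<Integer> list = presize ? new ArrayList<Integer>(n)
                                     : new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fill(150000, true).size());  // prints 150000 either way;
        System.out.println(fill(150000, false).size()); // only allocation behavior differs
    }
}
```

    As Marcin's test with an initial size of 150,000 suggests, the
    resizing turned out not to be the bottleneck here, so this is a
    sanity check rather than a fix.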
    >>>>>>
    >>>>>> Any ideas on how to discover the exact cause of the
    >>>>>> performance hit?
    >>>>>>
    >>>>>>
    >>>>>> Ari Maniatis
    >>>>>>
    >>>>>>
    >>>
    >>>
    >>>
    >>
    >
    >
    >
    >



    This archive was generated by hypermail 2.0.0 : Tue Jun 26 2007 - 14:32:55 EDT