Re: paged query slow when fetching big lists

From: Marcin Skladaniec (marci..sh.com.au)
Date: Tue Jun 26 2007 - 23:38:18 EDT

Next message: John Armstrong: "Re: Java Client"

Previous message: Andrus Adamchik: "Re: paged query slow when fetching big lists"
In reply to: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Next in thread: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Next in thread: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Reply: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Andrus !

The fix you committed today works great, I can confirm, in our case
the 30-50 second SelectQuery now executes in 5-7 seconds !

There is one but: fix does work only for queries executed on server,
when I executed the query on (ROP) client, the query takes the same
amount of time ! Is it possible that the remote calls are using a
different constructor ? or maybe the isFetchingCustomAttributes()
returns true for 'remote' SelectQueries, and therefore the
constructor works as before ?
Can I ask you to check that out, I could have debug it myself, but
from previous experience I know, that you know those things straight
away, and I would only waste my time.

Marcin

On 27/06/2007, at 4:32 AM, Andrus Adamchik wrote:

> Hi Marcin,
>
> I have good news (I think). Fetching just id columns inside the
> IncrementalFaultList indeed speeds things up significantly. I just
> committed the change to optimize SelectQueries to do just that.
> Please let me know how does it work for you.
>
> Now the profiling details...
>
> * I profiled on Derby and MySQL. In both cases fetching a table
> with 25 columns and 100000 rows took between 3-4 seconds (not as
> long as in your case, but what's important is relative times I guess)
> * YourKit clearly showed the bottleneck: ~95% of the 'fillIn'
> method is spent in the driver code, rewinding the result set (i.e.
> brining the data from db to the client).
> * After my change the query time went down to 0.2-0.5 second (0.8
> if you consider the second query needed to fault the first page).
> Not bad!
> * ResultSet reading still remained a bottleneck, but it became
> faster in absolute terms. And now finally Cayenne-related code
> (such as DataRow creation) started to show up on the radar (e.g.
> DataRow constructor taking 3% of the 'fillIn' method time).
>
> Andrus
>
>
>
> On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
>
>> Hi Andrus !
>> Many thanks for that !
>> Marcin
>>
>> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
>>
>>> Hi Marcin,
>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
>>>> seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5
>>>> sec.
>>>
>>> I see you still didn't use profiler, but the data you provided
>>> seems to finally confirm that at least on Derby not fetching all
>>> columns does result in significant speedup (#1 vs. #3). So now it
>>> is a question of implementing the right algorithm for the
>>> IncrementalFaultList.
>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>>>> part of the data. I tried to use addCustomDbAttribute("id") on
>>>> client, it resulted in returning the raw dataRows, is there
>>>> something I can do to fetch faulted objects ?
>>>
>>> We should encapsulate this logic inside IncrementalFaultList on
>>> the server.
>>>
>>>> Our application was designed to use the SelectQuery. If we have
>>>> to change that and use the SQLTemplate instead, there is a lot
>>>> of work for us, including:
>>>
>>> Same thing - the right thing to do is to fix it on the server.
>>>
>>> Let me try to find a spare minute later tonight and implement id-
>>> only fetch. I have some large tables in a MySQL5 so I can test
>>> the performance in a slightly different environment.
>>>
>>> Andrus
>>>
>>>
>>>
>>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>>>> Hi
>>>>
>>>> I have done some more profiling and testing.
>>>>
>>>> executing queries on table with >100000 records, directly on
>>>> server (not on client) gave results as listed below:
>>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
>>>> seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5
>>>> sec.
>>>>
>>>> what more, I found that executing the SQLTemplate does allow to
>>>> fault the objects (I sometimes discover the simplest things
>>>> last), so I did try to check how long it takes for the objects
>>>> to be faulted:
>>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
>>>> 2) objects is faulted in 20ms (average)
>>>> 3) on first page first object faulted in 200ms, rest ~20 ms,
>>>> on any following page first object faulted in 30-200ms,
>>>> rest 0ms (interesting that the first page does not seem to be
>>>> faulted at all)
>>>> 4) no point testing.
>>>>
>>>> Also I did check if the resizing of the ArrayList which is
>>>> keeping the results does affect the speed, and it does not.
>>>> (Tried to make the ArrayList initial size = 150,000).
>>>>
>>>> My conclusion is that SelectQuery with paging is usable only for
>>>> fetching less than, say 10,000 records, otherwise the
>>>> performance is to low. With SQLTemplate the performance is much
>>>> greater. It applies to both ROP and 'normal' cayenne, since I
>>>> made those tests on server.
>>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>>>> part of the data. I tried to use addCustomDbAttribute("id") on
>>>> client, it resulted in returning the raw dataRows, is there
>>>> something I can do to fetch faulted objects ?
>>>>
>>>> Our application was designed to use the SelectQuery. If we have
>>>> to change that and use the SQLTemplate instead, there is a lot
>>>> of work for us, including:
>>>> - dealing with adding and concatenating Expressions to the
>>>> SQLTemplate (is there an easy way ?)
>>>> - dealing with declared qualifier (the one set in modeller)
>>>> - possibly more...
>>>> i would really like to avoid all of that, so if you have any
>>>> ideas on how to improve the performance without too much hassle
>>>> I would really appreciate.
>>>>
>>>> Marcin
>>>>
>>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>>>
>>>>> Hi Andrus
>>>>> I had not much time to check, but with the fix the 100k records
>>>>> load in 30 instead of 50 seconds. It is some improvement, but
>>>>> not enough. I'll do some more profiling tomorrow and let you know.
>>>>>
>>>>> By the way, we are using netbeans for profiling, the benefit :
>>>>> it is free. I will evaluate the yourkit as we are moving away
>>>>> from netbeans as a development platform.
>>>>>
>>>>> Marcin
>>>>>
>>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>>>
>>>>>> Ari, Marcin --
>>>>>>
>>>>>> going through the code I noticed one inefficiency - the
>>>>>> elements array access is synchronized in 'fillIn' method.
>>>>>> Since 'fillIn' is called from constructor, such
>>>>>> synchronization is unneeded and only slows things down. I just
>>>>>> checked a fixed version to trunk. Could you try it out?
>>>>>>
>>>>>> Andrus
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>>>
>>>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>>>
>>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/
>>>>>>>> 200705.mbox/browser
>>>>>>>>
>>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is the same project we are working on. I started
>>>>>>> some performance profiling and Marcin has been able now to
>>>>>>> take it much further. What is it about:
>>>>>>>
>>>>>>>> elements.add(it.nextObjectId(entity));
>>>>>>>
>>>>>>> which is so slow? The code gets a little complex at that
>>>>>>> point and we are having difficulty tracing it through to the
>>>>>>> exact performance problem in the underlying code. Is it the
>>>>>>> speed of adding the object id to the Collection or the speed
>>>>>>> of creating an object id itself? 0.5ms doesn't sound slow,
>>>>>>> but it doesn't scale well.
>>>>>>>
>>>>>>> Andrus, I got the impression from the previous thread that
>>>>>>> you suspected something slightly different. That the
>>>>>>> performance problem might be in the fat query itself, but
>>>>>>> from our tests this isn't the case. If I've got this right,
>>>>>>> the way it works is:
>>>>>>>
>>>>>>> 1. perform regular query to get all columns but return result
>>>>>>> in iterator
>>>>>>> 2. iterate through first page and build full objects
>>>>>>> 3. iterate through other pages and build just objectids (this
>>>>>>> is the slow part for us)
>>>>>>> 4. when another page is fetched perform another query and
>>>>>>> fetch those objects from the DB
>>>>>>>
>>>>>>> So, getting just primary keys from the DB may not be any
>>>>>>> faster if the performance problem is simply in the
>>>>>>> construction of objectIds. If the performance problem is in
>>>>>>> the numerous resizings of the Collection (each time it runs
>>>>>>> out of space, then it is increased by 50% or 100% in size),
>>>>>>> then the solution could be as simple as figuring out the size
>>>>>>> of the iterator and sizing the collection appropriately from
>>>>>>> the beginning.
>>>>>>>
>>>>>>> Any ideas on how to discover the exact cause of the
>>>>>>> performance hit?
>>>>>>>
>>>>>>>
>>>>>>> Ari Maniatis
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>

Marcin

Next message: John Armstrong: "Re: Java Client"
Previous message: Andrus Adamchik: "Re: paged query slow when fetching big lists"
In reply to: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Next in thread: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Next in thread: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Reply: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2.0.0 : Tue Jun 26 2007 - 23:38:57 EDT