Re: paged query slow when fetching big lists

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Tue Jun 26 2007 - 03:39:35 EDT

    Hi Marcin,

    > 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    > 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") without paging takes 100 sec.
    > 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 takes 5 sec.
    > 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.

    I see you still didn't use a profiler, but the data you provided
    seems to finally confirm that, at least on Derby, not fetching all
    columns results in a significant speedup (#1 vs. #3). So now it is a
    question of implementing the right algorithm in IncrementalFaultList.
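
    For reference, a minimal sketch of the two variants being compared
    (#1 and #3 above), assuming the Student entity class from the tests
    is in scope and a plain server-side DataContext; this is
    illustrative code, not the test code from the thread:

    import java.util.List;

    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.query.SQLTemplate;
    import org.apache.cayenne.query.SelectQuery;

    public class PagingComparison {

        public static void main(String[] args) {
            DataContext context = DataContext.createDataContext();

            // #1: paged SelectQuery, fetches all columns of STUDENT
            SelectQuery full = new SelectQuery(Student.class);
            full.setPageSize(10);
            List fullList = context.performQuery(full);

            // #3: paged SQLTemplate, fetches only the id column
            SQLTemplate idsOnly = new SQLTemplate(Student.class,
                    "SELECT #result('id' 'int') FROM STUDENT");
            idsOnly.setPageSize(10);
            List idList = context.performQuery(idsOnly);

            System.out.println(fullList.size() + " / " + idList.size());
        }
    }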

    > Andrus, you mentioned using addCustomDbAttribute to fetch only part
    > of the data. I tried to use addCustomDbAttribute("id") on the
    > client, but it returned the raw DataRows. Is there something I can
    > do to fetch faulted objects?

    We should encapsulate this logic inside IncrementalFaultList on the
    server.
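
    To make that concrete, here is roughly what the manual version of
    that logic looks like on the server today: an id-only fetch followed
    by faulting individual objects by PK. A sketch only, with the PK
    column name assumed (not stated in the thread) to be ID:

    import java.util.List;
    import java.util.Map;

    import org.apache.cayenne.DataObjectUtils;
    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.query.SelectQuery;

    public class IdOnlyFetchSketch {

        public static void main(String[] args) {
            DataContext context = DataContext.createDataContext();

            // id-only fetch: returns raw DataRows (Maps) holding just the PK
            SelectQuery idQuery = new SelectQuery(Student.class);
            idQuery.addCustomDbAttribute("ID");
            List rows = context.performQuery(idQuery);

            // faulting a single object later, from its PK
            Map row = (Map) rows.get(0);
            Number id = (Number) row.get("ID");
            Student first = (Student) DataObjectUtils.objectForPK(
                    context, Student.class, id.intValue());
            System.out.println(first);
        }
    }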

    > Our application was designed to use the SelectQuery. If we have to
    > change that and use the SQLTemplate instead, there is a lot of work
    > for us, including:

    Same answer: the right place to fix this is on the server.

    Let me try to find a spare minute later tonight and implement the
    id-only fetch. I have some large tables in a MySQL 5 database, so I
    can test the performance in a slightly different environment.

    Andrus

    On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
    > Hi
    >
    > I have done some more profiling and testing.
    >
    > Executing queries on a table with >100,000 records, directly on the
    > server (not on the client), gave the results listed below:
    >
    > 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    > 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") without paging takes 100 sec.
    > 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 takes 5 sec.
    > 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
    >
    > What's more, I found that executing the SQLTemplate does allow the
    > objects to be faulted (I sometimes discover the simplest things
    > last), so I checked how long it takes for the objects to be faulted
    > (the access pattern is sketched after the list):
    > 1) first object on every page (except the first) 30-200 ms, rest = 0 ms
    > 2) objects are faulted in 20 ms (average)
    > 3) on the first page the first object faulted in 200 ms, the rest in
    > ~20 ms; on any following page the first object faulted in 30-200 ms,
    > the rest in 0 ms (interesting that the first page does not seem to
    > be faulted at all)
    > 4) no point testing.
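
    The per-page pattern in those timings comes from the paged list
    itself: with a page size set, performQuery() returns an incremental
    list, and the first get() that crosses into an unresolved page runs
    the extra query. A small sketch of that access pattern (variable
    names are illustrative, not Marcin's actual test code):

    import java.util.List;

    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.query.SelectQuery;

    public class PageAccessSketch {

        public static void main(String[] args) {
            DataContext context = DataContext.createDataContext();

            SelectQuery query = new SelectQuery(Student.class);
            query.setPageSize(10);
            List students = context.performQuery(query);

            long t0 = System.currentTimeMillis();
            Object firstOnPage2 = students.get(10);  // crosses a page boundary: one extra SELECT
            long t1 = System.currentTimeMillis();
            Object secondOnPage2 = students.get(11); // same page, already resolved: ~0 ms
            long t2 = System.currentTimeMillis();

            System.out.println("page fault: " + (t1 - t0)
                    + " ms, in-page access: " + (t2 - t1) + " ms");
        }
    }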
    >
    > I also checked whether the resizing of the ArrayList that holds the
    > results affects the speed, and it does not (I tried setting the
    > ArrayList's initial size to 150,000).
    >
    > My conclusion is that SelectQuery with paging is usable only for
    > fetching fewer than, say, 10,000 records; otherwise the performance
    > is too low. With SQLTemplate the performance is much better. This
    > applies to both ROP and 'normal' Cayenne, since I ran those tests
    > on the server.
    >
    > Andrus, you mentioned using addCustomDbAttribute to fetch only part
    > of the data. I tried to use addCustomDbAttribute("id") on the
    > client, but it returned the raw DataRows. Is there something I can
    > do to fetch faulted objects?
    >
    > Our application was designed to use the SelectQuery. If we have to
    > change that and use the SQLTemplate instead, there is a lot of work
    > for us, including:
    > - dealing with adding and concatenating Expressions for the
    > SQLTemplate (is there an easy way? see the qualifier sketch below)
    > - dealing with the declared qualifier (the one set in the Modeler)
    > - possibly more...
    > I would really like to avoid all of that, so if you have any ideas
    > on how to improve the performance without too much hassle I would
    > really appreciate it.
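
    On the Expression point above, this is the kind of qualifier
    composition that SelectQuery gives for free and that would have to
    be rebuilt as SQL strings with SQLTemplate. A minimal sketch with
    made-up property names (lastName, enrolmentYear are not from the
    thread):

    import org.apache.cayenne.exp.Expression;
    import org.apache.cayenne.exp.ExpressionFactory;
    import org.apache.cayenne.query.SelectQuery;

    public class QualifierSketch {

        // builds a paged query whose qualifier is assembled from optional parts
        public static SelectQuery studentQuery(String namePrefix, Integer minYear) {
            Expression e = ExpressionFactory.likeIgnoreCaseExp("lastName", namePrefix + "%");
            if (minYear != null) {
                e = e.andExp(ExpressionFactory.greaterOrEqualExp("enrolmentYear", minYear));
            }

            SelectQuery query = new SelectQuery(Student.class, e);
            query.setPageSize(10);
            return query;
        }
    }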
    >
    > Marcin
    >
    > On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
    >
    >> Hi Andrus
    >> I did not have much time to check, but with the fix the 100k
    >> records load in 30 seconds instead of 50. It is some improvement,
    >> but not enough. I'll do some more profiling tomorrow and let you know.
    >>
    >> By the way, we are using NetBeans for profiling; the benefit is
    >> that it is free. I will evaluate YourKit as we are moving away
    >> from NetBeans as a development platform.
    >>
    >> Marcin
    >>
    >> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >>
    >>> Ari, Marcin --
    >>>
    >>> Going through the code I noticed one inefficiency: the elements
    >>> array access is synchronized in the 'fillIn' method. Since
    >>> 'fillIn' is called from the constructor, such synchronization is
    >>> unneeded and only slows things down. I just checked in a fixed
    >>> version to trunk. Could you try it out?
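
    The gist of that change, as a standalone sketch (this is not the
    actual IncrementalFaultList source): while a method is only ever
    called from the constructor, no other thread can see the object yet,
    so the synchronized block costs time without adding safety.

    import java.util.ArrayList;
    import java.util.List;

    public class FillInSketch {

        private final List elements = new ArrayList();

        public FillInSketch(List rows) {
            fillIn(rows);
        }

        private void fillIn(List rows) {
            // before: synchronized (elements) { elements.addAll(rows); }
            // after:  no lock needed -- "this" has not escaped the constructor yet
            elements.addAll(rows);
        }
    }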
    >>>
    >>> Andrus
    >>>
    >>>
    >>>
    >>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>>
    >>>>> Marcin, this thread might be of interest to you ...
    >>>>>
    >>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
    >>>>>
    >>>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>>
    >>>>
    >>>> Yes, this is the same project we are working on. I started some
    >>>> performance profiling and Marcin has now been able to take it
    >>>> much further. What is it about:
    >>>>
    >>>>> elements.add(it.nextObjectId(entity));
    >>>>
    >>>> which is so slow? The code gets a little complex at that point
    >>>> and we are having difficulty tracing it through to the exact
    >>>> performance problem in the underlying code. Is it the speed of
    >>>> adding the object id to the Collection or the speed of creating
    >>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't
    >>>> scale well.
    >>>>
    >>>> Andrus, I got the impression from the previous thread that you
    >>>> suspected something slightly different: that the performance
    >>>> problem might be in the fat query itself. But from our tests
    >>>> this isn't the case. If I've got this right, the way it works
    >>>> is (sketched in code after this list):
    >>>>
    >>>> 1. perform regular query to get all columns but return result in
    >>>> iterator
    >>>> 2. iterate through first page and build full objects
    >>>> 3. iterate through other pages and build just objectids (this is
    >>>> the slow part for us)
    >>>> 4. when another page is fetched perform another query and fetch
    >>>> those objects from the DB
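
    Rough shape of steps 1-3 as described (not the actual Cayenne
    source; the iterator method names are approximated from the
    elements.add(it.nextObjectId(entity)) snippet quoted above):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.cayenne.access.ResultIterator;
    import org.apache.cayenne.map.DbEntity;

    public class PagedFillSketch {

        // step 1 happens outside: the full query runs once and is handed in as an iterator
        static List fill(ResultIterator it, DbEntity entity, int pageSize) throws Exception {
            List elements = new ArrayList();
            while (it.hasNextRow()) {
                if (elements.size() < pageSize) {
                    elements.add(it.nextDataRow());        // step 2: full rows for the first page
                } else {
                    elements.add(it.nextObjectId(entity)); // step 3: ids only -- the slow part here
                }
            }
            return elements;
            // step 4 happens later: accessing another page runs a new query that
            // fetches those objects by the stored ids
        }
    }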
    >>>>
    >>>> So, getting just the primary keys from the DB may not be any
    >>>> faster if the performance problem is simply in the construction
    >>>> of ObjectIds. If the performance problem is in the numerous
    >>>> resizings of the Collection (each time it runs out of space, it
    >>>> is increased by 50% or 100% in size), then the solution could be
    >>>> as simple as figuring out the size of the iterator and sizing
    >>>> the collection appropriately from the beginning.
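
    The pre-sizing suggestion in isolation (where the row count comes
    from is left open here; a separate COUNT query is one possibility,
    not something the thread establishes Cayenne does):

    import java.util.ArrayList;
    import java.util.List;

    public class PresizeSketch {

        // allocating the backing array once avoids the repeated grow-and-copy
        // cycles mentioned above; rowCount must be known before filling starts
        static List presizedElements(int rowCount) {
            return new ArrayList(rowCount);
        }
    }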
    >>>>
    >>>> Any ideas on how to discover the exact cause of the performance
    >>>> hit?
    >>>>
    >>>>
    >>>> Ari Maniatis
    >>>>
    >>>>
    >
    >
    >


