Re: paged query slow when fetching big lists

From: Borut Bolčin (bo..ajdi.si)
Date: Thu Jul 05 2007 - 08:06:53 EDT

Next message: Andrus Adamchik: "Re: paged query slow when fetching big lists"

Previous message: Andrus Adamchik: "Re: Cayenne in a Cluster environment ends up in NullPointerException [RESOLVED]"
Next in thread: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Reply: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hello,

which version includes this corrections? 2.0.3 or 3.0?

Thanks,
Borut

On 26.6.2007 20:32, Andrus Adamchik wrote:
> Hi Marcin,
>
> I have good news (I think). Fetching just id columns inside the
> IncrementalFaultList indeed speeds things up significantly. I just
> committed the change to optimize SelectQueries to do just that. Please
> let me know how does it work for you.
>
> Now the profiling details...
>
> * I profiled on Derby and MySQL. In both cases fetching a table with
> 25 columns and 100000 rows took between 3-4 seconds (not as long as in
> your case, but what's important is relative times I guess)
> * YourKit clearly showed the bottleneck: ~95% of the 'fillIn' method
> is spent in the driver code, rewinding the result set (i.e. brining
> the data from db to the client).
> * After my change the query time went down to 0.2-0.5 second (0.8 if
> you consider the second query needed to fault the first page). Not bad!
> * ResultSet reading still remained a bottleneck, but it became faster
> in absolute terms. And now finally Cayenne-related code (such as
> DataRow creation) started to show up on the radar (e.g. DataRow
> constructor taking 3% of the 'fillIn' method time).
>
> Andrus
>
>
>
> On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
>
>> Hi Andrus !
>> Many thanks for that !
>> Marcin
>>
>> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
>>
>>> Hi Marcin,
>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
>>>
>>> I see you still didn't use profiler, but the data you provided seems
>>> to finally confirm that at least on Derby not fetching all columns
>>> does result in significant speedup (#1 vs. #3). So now it is a
>>> question of implementing the right algorithm for the
>>> IncrementalFaultList.
>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only part
>>>> of the data. I tried to use addCustomDbAttribute("id") on client,
>>>> it resulted in returning the raw dataRows, is there something I can
>>>> do to fetch faulted objects ?
>>>
>>> We should encapsulate this logic inside IncrementalFaultList on the
>>> server.
>>>
>>>> Our application was designed to use the SelectQuery. If we have to
>>>> change that and use the SQLTemplate instead, there is a lot of work
>>>> for us, including:
>>>
>>> Same thing - the right thing to do is to fix it on the server.
>>>
>>> Let me try to find a spare minute later tonight and implement
>>> id-only fetch. I have some large tables in a MySQL5 so I can test
>>> the performance in a slightly different environment.
>>>
>>> Andrus
>>>
>>>
>>>
>>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>>>> Hi
>>>>
>>>> I have done some more profiling and testing.
>>>>
>>>> executing queries on table with >100000 records, directly on server
>>>> (not on client) gave results as listed below:
>>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
>>>>
>>>> what more, I found that executing the SQLTemplate does allow to
>>>> fault the objects (I sometimes discover the simplest things last),
>>>> so I did try to check how long it takes for the objects to be faulted:
>>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
>>>> 2) objects is faulted in 20ms (average)
>>>> 3) on first page first object faulted in 200ms, rest ~20 ms,
>>>> on any following page first object faulted in 30-200ms, rest
>>>> 0ms (interesting that the first page does not seem to be faulted at
>>>> all)
>>>> 4) no point testing.
>>>>
>>>> Also I did check if the resizing of the ArrayList which is keeping
>>>> the results does affect the speed, and it does not. (Tried to make
>>>> the ArrayList initial size = 150,000).
>>>>
>>>> My conclusion is that SelectQuery with paging is usable only for
>>>> fetching less than, say 10,000 records, otherwise the performance
>>>> is to low. With SQLTemplate the performance is much greater. It
>>>> applies to both ROP and 'normal' cayenne, since I made those tests
>>>> on server.
>>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only part
>>>> of the data. I tried to use addCustomDbAttribute("id") on client,
>>>> it resulted in returning the raw dataRows, is there something I can
>>>> do to fetch faulted objects ?
>>>>
>>>> Our application was designed to use the SelectQuery. If we have to
>>>> change that and use the SQLTemplate instead, there is a lot of work
>>>> for us, including:
>>>> - dealing with adding and concatenating Expressions to the
>>>> SQLTemplate (is there an easy way ?)
>>>> - dealing with declared qualifier (the one set in modeller)
>>>> - possibly more...
>>>> i would really like to avoid all of that, so if you have any ideas
>>>> on how to improve the performance without too much hassle I would
>>>> really appreciate.
>>>>
>>>> Marcin
>>>>
>>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>>>
>>>>> Hi Andrus
>>>>> I had not much time to check, but with the fix the 100k records
>>>>> load in 30 instead of 50 seconds. It is some improvement, but not
>>>>> enough. I'll do some more profiling tomorrow and let you know.
>>>>>
>>>>> By the way, we are using netbeans for profiling, the benefit : it
>>>>> is free. I will evaluate the yourkit as we are moving away from
>>>>> netbeans as a development platform.
>>>>>
>>>>> Marcin
>>>>>
>>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>>>
>>>>>> Ari, Marcin --
>>>>>>
>>>>>> going through the code I noticed one inefficiency - the elements
>>>>>> array access is synchronized in 'fillIn' method. Since 'fillIn'
>>>>>> is called from constructor, such synchronization is unneeded and
>>>>>> only slows things down. I just checked a fixed version to trunk.
>>>>>> Could you try it out?
>>>>>>
>>>>>> Andrus
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>>>
>>>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>>>
>>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
>>>>>>>>
>>>>>>>>
>>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is the same project we are working on. I started some
>>>>>>> performance profiling and Marcin has been able now to take it
>>>>>>> much further. What is it about:
>>>>>>>
>>>>>>>> elements.add(it.nextObjectId(entity));
>>>>>>>
>>>>>>> which is so slow? The code gets a little complex at that point
>>>>>>> and we are having difficulty tracing it through to the exact
>>>>>>> performance problem in the underlying code. Is it the speed of
>>>>>>> adding the object id to the Collection or the speed of creating
>>>>>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't
>>>>>>> scale well.
>>>>>>>
>>>>>>> Andrus, I got the impression from the previous thread that you
>>>>>>> suspected something slightly different. That the performance
>>>>>>> problem might be in the fat query itself, but from our tests
>>>>>>> this isn't the case. If I've got this right, the way it works is:
>>>>>>>
>>>>>>> 1. perform regular query to get all columns but return result in
>>>>>>> iterator
>>>>>>> 2. iterate through first page and build full objects
>>>>>>> 3. iterate through other pages and build just objectids (this is
>>>>>>> the slow part for us)
>>>>>>> 4. when another page is fetched perform another query and fetch
>>>>>>> those objects from the DB
>>>>>>>
>>>>>>> So, getting just primary keys from the DB may not be any faster
>>>>>>> if the performance problem is simply in the construction of
>>>>>>> objectIds. If the performance problem is in the numerous
>>>>>>> resizings of the Collection (each time it runs out of space,
>>>>>>> then it is increased by 50% or 100% in size), then the solution
>>>>>>> could be as simple as figuring out the size of the iterator and
>>>>>>> sizing the collection appropriately from the beginning.
>>>>>>>
>>>>>>> Any ideas on how to discover the exact cause of the performance
>>>>>>> hit?
>>>>>>>
>>>>>>>
>>>>>>> Ari Maniatis
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>

-- 
bbLOG <http://borutb.tuditi.delo.si/>

Next message: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Previous message: Andrus Adamchik: "Re: Cayenne in a Cluster environment ends up in NullPointerException [RESOLVED]"
Next in thread: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Reply: Andrus Adamchik: "Re: paged query slow when fetching big lists"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2.0.0 : Thu Jul 05 2007 - 08:07:24 EDT