Hi Andrus !
Many thanks for that !
Marcin
On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
> Hi Marcin,
>
>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>> STUDENT") without paging takes 100 sec.
>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>> STUDENT") with page size 10 takes 5 sec.
>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
>
> I see you still didn't use profiler, but the data you provided
> seems to finally confirm that at least on Derby not fetching all
> columns does result in significant speedup (#1 vs. #3). So now it
> is a question of implementing the right algorithm for the
> IncrementalFaultList.
>
>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>> part of the data. I tried to use addCustomDbAttribute("id") on
>> client, it resulted in returning the raw dataRows, is there
>> something I can do to fetch faulted objects ?
>
> We should encapsulate this logic inside IncrementalFaultList on the
> server.
>
>> Our application was designed to use the SelectQuery. If we have to
>> change that and use the SQLTemplate instead, there is a lot of
>> work for us, including:
>
> Same thing - the right thing to do is to fix it on the server.
>
> Let me try to find a spare minute later tonight and implement id-
> only fetch. I have some large tables in a MySQL5 so I can test the
> performance in a slightly different environment.
>
> Andrus
>
>
>
> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>> Hi
>>
>> I have done some more profiling and testing.
>>
>> executing queries on table with >100000 records, directly on
>> server (not on client) gave results as listed below:
>>
>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>> STUDENT") without paging takes 100 sec.
>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>> STUDENT") with page size 10 takes 5 sec.
>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
>>
>> what more, I found that executing the SQLTemplate does allow to
>> fault the objects (I sometimes discover the simplest things last),
>> so I did try to check how long it takes for the objects to be
>> faulted:
>> 1) first object on every page (except first) 30-200ms, rest = 0ms
>> 2) objects is faulted in 20ms (average)
>> 3) on first page first object faulted in 200ms, rest ~20 ms,
>> on any following page first object faulted in 30-200ms, rest
>> 0ms (interesting that the first page does not seem to be faulted
>> at all)
>> 4) no point testing.
>>
>> Also I did check if the resizing of the ArrayList which is keeping
>> the results does affect the speed, and it does not. (Tried to make
>> the ArrayList initial size = 150,000).
>>
>> My conclusion is that SelectQuery with paging is usable only for
>> fetching less than, say 10,000 records, otherwise the performance
>> is to low. With SQLTemplate the performance is much greater. It
>> applies to both ROP and 'normal' cayenne, since I made those tests
>> on server.
>>
>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>> part of the data. I tried to use addCustomDbAttribute("id") on
>> client, it resulted in returning the raw dataRows, is there
>> something I can do to fetch faulted objects ?
>>
>> Our application was designed to use the SelectQuery. If we have to
>> change that and use the SQLTemplate instead, there is a lot of
>> work for us, including:
>> - dealing with adding and concatenating Expressions to the
>> SQLTemplate (is there an easy way ?)
>> - dealing with declared qualifier (the one set in modeller)
>> - possibly more...
>> i would really like to avoid all of that, so if you have any ideas
>> on how to improve the performance without too much hassle I would
>> really appreciate.
>>
>> Marcin
>>
>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>
>>> Hi Andrus
>>> I had not much time to check, but with the fix the 100k records
>>> load in 30 instead of 50 seconds. It is some improvement, but not
>>> enough. I'll do some more profiling tomorrow and let you know.
>>>
>>> By the way, we are using netbeans for profiling, the benefit :
>>> it is free. I will evaluate the yourkit as we are moving away
>>> from netbeans as a development platform.
>>>
>>> Marcin
>>>
>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>
>>>> Ari, Marcin --
>>>>
>>>> going through the code I noticed one inefficiency - the elements
>>>> array access is synchronized in 'fillIn' method. Since 'fillIn'
>>>> is called from constructor, such synchronization is unneeded and
>>>> only slows things down. I just checked a fixed version to trunk.
>>>> Could you try it out?
>>>>
>>>> Andrus
>>>>
>>>>
>>>>
>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>
>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/
>>>>>> 200705.mbox/browser
>>>>>>
>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>
>>>>>
>>>>> Yes, this is the same project we are working on. I started some
>>>>> performance profiling and Marcin has been able now to take it
>>>>> much further. What is it about:
>>>>>
>>>>>> elements.add(it.nextObjectId(entity));
>>>>>
>>>>> which is so slow? The code gets a little complex at that point
>>>>> and we are having difficulty tracing it through to the exact
>>>>> performance problem in the underlying code. Is it the speed of
>>>>> adding the object id to the Collection or the speed of creating
>>>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't
>>>>> scale well.
>>>>>
>>>>> Andrus, I got the impression from the previous thread that you
>>>>> suspected something slightly different. That the performance
>>>>> problem might be in the fat query itself, but from our tests
>>>>> this isn't the case. If I've got this right, the way it works is:
>>>>>
>>>>> 1. perform regular query to get all columns but return result
>>>>> in iterator
>>>>> 2. iterate through first page and build full objects
>>>>> 3. iterate through other pages and build just objectids (this
>>>>> is the slow part for us)
>>>>> 4. when another page is fetched perform another query and fetch
>>>>> those objects from the DB
>>>>>
>>>>> So, getting just primary keys from the DB may not be any faster
>>>>> if the performance problem is simply in the construction of
>>>>> objectIds. If the performance problem is in the numerous
>>>>> resizings of the Collection (each time it runs out of space,
>>>>> then it is increased by 50% or 100% in size), then the solution
>>>>> could be as simple as figuring out the size of the iterator and
>>>>> sizing the collection appropriately from the beginning.
>>>>>
>>>>> Any ideas on how to discover the exact cause of the performance
>>>>> hit?
>>>>>
>>>>>
>>>>> Ari Maniatis
>>>>>
>>>>>
>>
>>
>>
>
This archive was generated by hypermail 2.0.0 : Tue Jun 26 2007 - 03:56:07 EDT