3.0 (SVN trunk), which will hopefully be released fairly soon as 3.0M1
Andrus
On Jul 5, 2007, at 3:06 PM, Borut Bolčina wrote:
> Hello,
>
> which version includes this corrections? 2.0.3 or 3.0?
>
> Thanks,
> Borut
>
> On 26.6.2007 20:32, Andrus Adamchik wrote:
>> Hi Marcin,
>>
>> I have good news (I think). Fetching just id columns inside the
>> IncrementalFaultList indeed speeds things up significantly. I just
>> committed the change to optimize SelectQueries to do just that.
>> Please let me know how does it work for you.
>>
>> Now the profiling details...
>>
>> * I profiled on Derby and MySQL. In both cases fetching a table
>> with 25 columns and 100000 rows took between 3-4 seconds (not as
>> long as in your case, but what's important is relative times I guess)
>> * YourKit clearly showed the bottleneck: ~95% of the 'fillIn'
>> method is spent in the driver code, rewinding the result set (i.e.
>> brining the data from db to the client).
>> * After my change the query time went down to 0.2-0.5 second (0.8
>> if you consider the second query needed to fault the first page).
>> Not bad!
>> * ResultSet reading still remained a bottleneck, but it became
>> faster in absolute terms. And now finally Cayenne-related code
>> (such as DataRow creation) started to show up on the radar (e.g.
>> DataRow constructor taking 3% of the 'fillIn' method time).
>>
>> Andrus
>>
>>
>>
>> On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
>>
>>> Hi Andrus !
>>> Many thanks for that !
>>> Marcin
>>>
>>> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
>>>
>>>> Hi Marcin,
>>>>
>>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
>>>>> seconds.
>>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>>> STUDENT") without paging takes 100 sec.
>>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>>> STUDENT") with page size 10 takes 5 sec.
>>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>>> STUDENT") with page size 10 and fetching row data rows takes
>>>>> 4.5 sec.
>>>>
>>>> I see you still didn't use profiler, but the data you provided
>>>> seems to finally confirm that at least on Derby not fetching all
>>>> columns does result in significant speedup (#1 vs. #3). So now
>>>> it is a question of implementing the right algorithm for the
>>>> IncrementalFaultList.
>>>>
>>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>>>>> part of the data. I tried to use addCustomDbAttribute("id") on
>>>>> client, it resulted in returning the raw dataRows, is there
>>>>> something I can do to fetch faulted objects ?
>>>>
>>>> We should encapsulate this logic inside IncrementalFaultList on
>>>> the server.
>>>>
>>>>> Our application was designed to use the SelectQuery. If we have
>>>>> to change that and use the SQLTemplate instead, there is a lot
>>>>> of work for us, including:
>>>>
>>>> Same thing - the right thing to do is to fix it on the server.
>>>>
>>>> Let me try to find a spare minute later tonight and implement id-
>>>> only fetch. I have some large tables in a MySQL5 so I can test
>>>> the performance in a slightly different environment.
>>>>
>>>> Andrus
>>>>
>>>>
>>>>
>>>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>>>>> Hi
>>>>>
>>>>> I have done some more profiling and testing.
>>>>>
>>>>> executing queries on table with >100000 records, directly on
>>>>> server (not on client) gave results as listed below:
>>>>>
>>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50
>>>>> seconds.
>>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>>> STUDENT") without paging takes 100 sec.
>>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>>> STUDENT") with page size 10 takes 5 sec.
>>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>>>> STUDENT") with page size 10 and fetching row data rows takes
>>>>> 4.5 sec.
>>>>>
>>>>> what more, I found that executing the SQLTemplate does allow to
>>>>> fault the objects (I sometimes discover the simplest things
>>>>> last), so I did try to check how long it takes for the objects
>>>>> to be faulted:
>>>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
>>>>> 2) objects is faulted in 20ms (average)
>>>>> 3) on first page first object faulted in 200ms, rest ~20 ms,
>>>>> on any following page first object faulted in 30-200ms,
>>>>> rest 0ms (interesting that the first page does not seem to be
>>>>> faulted at all)
>>>>> 4) no point testing.
>>>>>
>>>>> Also I did check if the resizing of the ArrayList which is
>>>>> keeping the results does affect the speed, and it does not.
>>>>> (Tried to make the ArrayList initial size = 150,000).
>>>>>
>>>>> My conclusion is that SelectQuery with paging is usable only
>>>>> for fetching less than, say 10,000 records, otherwise the
>>>>> performance is to low. With SQLTemplate the performance is much
>>>>> greater. It applies to both ROP and 'normal' cayenne, since I
>>>>> made those tests on server.
>>>>>
>>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>>>>> part of the data. I tried to use addCustomDbAttribute("id") on
>>>>> client, it resulted in returning the raw dataRows, is there
>>>>> something I can do to fetch faulted objects ?
>>>>>
>>>>> Our application was designed to use the SelectQuery. If we have
>>>>> to change that and use the SQLTemplate instead, there is a lot
>>>>> of work for us, including:
>>>>> - dealing with adding and concatenating Expressions to the
>>>>> SQLTemplate (is there an easy way ?)
>>>>> - dealing with declared qualifier (the one set in modeller)
>>>>> - possibly more...
>>>>> i would really like to avoid all of that, so if you have any
>>>>> ideas on how to improve the performance without too much hassle
>>>>> I would really appreciate.
>>>>>
>>>>> Marcin
>>>>>
>>>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>>>>
>>>>>> Hi Andrus
>>>>>> I had not much time to check, but with the fix the 100k
>>>>>> records load in 30 instead of 50 seconds. It is some
>>>>>> improvement, but not enough. I'll do some more profiling
>>>>>> tomorrow and let you know.
>>>>>>
>>>>>> By the way, we are using netbeans for profiling, the
>>>>>> benefit : it is free. I will evaluate the yourkit as we are
>>>>>> moving away from netbeans as a development platform.
>>>>>>
>>>>>> Marcin
>>>>>>
>>>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>>>>
>>>>>>> Ari, Marcin --
>>>>>>>
>>>>>>> going through the code I noticed one inefficiency - the
>>>>>>> elements array access is synchronized in 'fillIn' method.
>>>>>>> Since 'fillIn' is called from constructor, such
>>>>>>> synchronization is unneeded and only slows things down. I
>>>>>>> just checked a fixed version to trunk. Could you try it out?
>>>>>>>
>>>>>>> Andrus
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>>>>
>>>>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>>>>
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/
>>>>>>>>> 200705.mbox/browser
>>>>>>>>>
>>>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is the same project we are working on. I started
>>>>>>>> some performance profiling and Marcin has been able now to
>>>>>>>> take it much further. What is it about:
>>>>>>>>
>>>>>>>>> elements.add(it.nextObjectId(entity));
>>>>>>>>
>>>>>>>> which is so slow? The code gets a little complex at that
>>>>>>>> point and we are having difficulty tracing it through to the
>>>>>>>> exact performance problem in the underlying code. Is it the
>>>>>>>> speed of adding the object id to the Collection or the speed
>>>>>>>> of creating an object id itself? 0.5ms doesn't sound slow,
>>>>>>>> but it doesn't scale well.
>>>>>>>>
>>>>>>>> Andrus, I got the impression from the previous thread that
>>>>>>>> you suspected something slightly different. That the
>>>>>>>> performance problem might be in the fat query itself, but
>>>>>>>> from our tests this isn't the case. If I've got this right,
>>>>>>>> the way it works is:
>>>>>>>>
>>>>>>>> 1. perform regular query to get all columns but return
>>>>>>>> result in iterator
>>>>>>>> 2. iterate through first page and build full objects
>>>>>>>> 3. iterate through other pages and build just objectids
>>>>>>>> (this is the slow part for us)
>>>>>>>> 4. when another page is fetched perform another query and
>>>>>>>> fetch those objects from the DB
>>>>>>>>
>>>>>>>> So, getting just primary keys from the DB may not be any
>>>>>>>> faster if the performance problem is simply in the
>>>>>>>> construction of objectIds. If the performance problem is in
>>>>>>>> the numerous resizings of the Collection (each time it runs
>>>>>>>> out of space, then it is increased by 50% or 100% in size),
>>>>>>>> then the solution could be as simple as figuring out the
>>>>>>>> size of the iterator and sizing the collection appropriately
>>>>>>>> from the beginning.
>>>>>>>>
>>>>>>>> Any ideas on how to discover the exact cause of the
>>>>>>>> performance hit?
>>>>>>>>
>>>>>>>>
>>>>>>>> Ari Maniatis
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
> --
> <blog-logo.gif>
This archive was generated by hypermail 2.0.0 : Thu Jul 05 2007 - 08:22:05 EDT