Re: paged query slow when fetching big lists

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Tue Jun 26 2007 - 03:39:35 EDT

    Hi Marcin,

    > 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    > 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") without paging takes 100 sec.
    > 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 takes 5 sec.
    > 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.

    I see you still didn't use a profiler, but the data you provided
    seems to finally confirm that, at least on Derby, not fetching all
    columns results in a significant speedup (#1 vs. #3). So now it is a
    question of implementing the right algorithm in IncrementalFaultList.
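
    For reference, a minimal sketch of the two variants being compared
    (#1 and #3 above), assuming the Student entity class from the tests
    is in scope and a plain server-side DataContext; this is
    illustrative code, not the test code from the thread:

    import java.util.List;

    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.query.SQLTemplate;
    import org.apache.cayenne.query.SelectQuery;

    public class PagingComparison {

        public static void main(String[] args) {
            DataContext context = DataContext.createDataContext();

            // #1: paged SelectQuery, fetches all columns of STUDENT
            SelectQuery full = new SelectQuery(Student.class);
            full.setPageSize(10);
            List fullList = context.performQuery(full);

            // #3: paged SQLTemplate, fetches only the id column
            SQLTemplate idsOnly = new SQLTemplate(Student.class,
                    "SELECT #result('id' 'int') FROM STUDENT");
            idsOnly.setPageSize(10);
            List idList = context.performQuery(idsOnly);

            System.out.println(fullList.size() + " / " + idList.size());
        }
    }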

    > Andrus, you mentioned using addCustomDbAttribute to fetch only part
    > of the data. I tried to use addCustomDbAttribute("id") on the
    > client, but it returned the raw DataRows. Is there something I can
    > do to fetch faulted objects?

    We should encapsulate this logic inside IncrementalFaultList on the
    server.
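
    To make that concrete, here is roughly what the manual version of
    that logic looks like on the server today: an id-only fetch followed
    by faulting individual objects by PK. A sketch only, with the PK
    column name assumed (not stated in the thread) to be ID:

    import java.util.List;
    import java.util.Map;

    import org.apache.cayenne.DataObjectUtils;
    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.query.SelectQuery;

    public class IdOnlyFetchSketch {

        public static void main(String[] args) {
            DataContext context = DataContext.createDataContext();

            // id-only fetch: returns raw DataRows (Maps) holding just the PK
            SelectQuery idQuery = new SelectQuery(Student.class);
            idQuery.addCustomDbAttribute("ID");
            List rows = context.performQuery(idQuery);

            // faulting a single object later, from its PK
            Map row = (Map) rows.get(0);
            Number id = (Number) row.get("ID");
            Student first = (Student) DataObjectUtils.objectForPK(
                    context, Student.class, id.intValue());
            System.out.println(first);
        }
    }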

    > Our application was designed to use the SelectQuery. If we have to
    > change that and use the SQLTemplate instead, there is a lot of work
    > for us, including:

    Same answer: the right place to fix this is on the server.

    Let me try to find a spare minute later tonight and implement the
    id-only fetch. I have some large tables in a MySQL 5 database, so I
    can test the performance in a slightly different environment.

    Andrus

    On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
    > Hi
    >
    > I have done some more profiling and testing.
    >
    > Executing queries on a table with >100,000 records, directly on the
    > server (not on the client), gave the results listed below:
    >
    > 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
    > 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") without paging takes 100 sec.
    > 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 takes 5 sec.
    > 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
    > STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
    >
    > What's more, I found that executing the SQLTemplate does allow the
    > objects to be faulted (I sometimes discover the simplest things
    > last), so I checked how long it takes for the objects to be faulted
    > (the access pattern is sketched after the list):
    > 1) first object on every page (except the first) 30-200 ms, rest = 0 ms
    > 2) objects are faulted in 20 ms (average)
    > 3) on the first page the first object faulted in 200 ms, the rest in
    > ~20 ms; on any following page the first object faulted in 30-200 ms,
    > the rest in 0 ms (interesting that the first page does not seem to
    > be faulted at all)
    > 4) no point testing.
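
    The per-page pattern in those timings comes from the paged list
    itself: with a page size set, performQuery() returns an incremental
    list, and the first get() that crosses into an unresolved page runs
    the extra query. A small sketch of that access pattern (variable
    names are illustrative, not Marcin's actual test code):

    import java.util.List;

    import org.apache.cayenne.access.DataContext;
    import org.apache.cayenne.query.SelectQuery;

    public class PageAccessSketch {

        public static void main(String[] args) {
            DataContext context = DataContext.createDataContext();

            SelectQuery query = new SelectQuery(Student.class);
            query.setPageSize(10);
            List students = context.performQuery(query);

            long t0 = System.currentTimeMillis();
            Object firstOnPage2 = students.get(10);  // crosses a page boundary: one extra SELECT
            long t1 = System.currentTimeMillis();
            Object secondOnPage2 = students.get(11); // same page, already resolved: ~0 ms
            long t2 = System.currentTimeMillis();

            System.out.println("page fault: " + (t1 - t0)
                    + " ms, in-page access: " + (t2 - t1) + " ms");
        }
    }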
    >
    > I also checked whether the resizing of the ArrayList that holds the
    > results affects the speed, and it does not (I tried setting the
    > ArrayList's initial size to 150,000).
    >
    > My conclusion is that SelectQuery with paging is usable only for
    > fetching fewer than, say, 10,000 records; otherwise the performance
    > is too low. With SQLTemplate the performance is much better. This
    > applies to both ROP and 'normal' Cayenne, since I ran those tests
    > on the server.
    >
    > Andrus, you mentioned using addCustomDbAttribute to fetch only part
    > of the data. I tried to use addCustomDbAttribute("id") on the
    > client, but it returned the raw DataRows. Is there something I can
    > do to fetch faulted objects?
    >
    > Our application was designed to use the SelectQuery. If we have to
    > change that and use the SQLTemplate instead, there is a lot of work
    > for us, including:
    > - dealing with adding and concatenating Expressions for the
    > SQLTemplate (is there an easy way? see the qualifier sketch below)
    > - dealing with the declared qualifier (the one set in the Modeler)
    > - possibly more...
    > I would really like to avoid all of that, so if you have any ideas
    > on how to improve the performance without too much hassle I would
    > really appreciate it.
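
    On the Expression point above, this is the kind of qualifier
    composition that SelectQuery gives for free and that would have to
    be rebuilt as SQL strings with SQLTemplate. A minimal sketch with
    made-up property names (lastName, enrolmentYear are not from the
    thread):

    import org.apache.cayenne.exp.Expression;
    import org.apache.cayenne.exp.ExpressionFactory;
    import org.apache.cayenne.query.SelectQuery;

    public class QualifierSketch {

        // builds a paged query whose qualifier is assembled from optional parts
        public static SelectQuery studentQuery(String namePrefix, Integer minYear) {
            Expression e = ExpressionFactory.likeIgnoreCaseExp("lastName", namePrefix + "%");
            if (minYear != null) {
                e = e.andExp(ExpressionFactory.greaterOrEqualExp("enrolmentYear", minYear));
            }

            SelectQuery query = new SelectQuery(Student.class, e);
            query.setPageSize(10);
            return query;
        }
    }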
    >
    > Marcin
    >
    > On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
    >
    >> Hi Andrus
    >> I did not have much time to check, but with the fix the 100k
    >> records load in 30 seconds instead of 50. It is some improvement,
    >> but not enough. I'll do some more profiling tomorrow and let you know.
    >>
    >> By the way, we are using NetBeans for profiling; the benefit is
    >> that it is free. I will evaluate YourKit as we are moving away
    >> from NetBeans as a development platform.
    >>
    >> Marcin
    >>
    >> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
    >>
    >>> Ari, Marcin --
    >>>
    >>> Going through the code I noticed one inefficiency: the elements
    >>> array access is synchronized in the 'fillIn' method. Since
    >>> 'fillIn' is called from the constructor, such synchronization is
    >>> unneeded and only slows things down. I just checked in a fixed
    >>> version to trunk. Could you try it out?
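
    The gist of that change, as a standalone sketch (this is not the
    actual IncrementalFaultList source): while a method is only ever
    called from the constructor, no other thread can see the object yet,
    so the synchronized block costs time without adding safety.

    import java.util.ArrayList;
    import java.util.List;

    public class FillInSketch {

        private final List elements = new ArrayList();

        public FillInSketch(List rows) {
            fillIn(rows);
        }

        private void fillIn(List rows) {
            // before: synchronized (elements) { elements.addAll(rows); }
            // after:  no lock needed -- "this" has not escaped the constructor yet
            elements.addAll(rows);
        }
    }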
    >>>
    >>> Andrus
    >>>
    >>>
    >>>
    >>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
    >>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
    >>>>
    >>>>> Marcin, this thread might be of interest to you ...
    >>>>>
    >>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
    >>>>>
    >>>>> Look at the "Paging and SQL queries" thread on May 25.
    >>>>
    >>>>
    >>>> Yes, this is the same project we are working on. I started some
    >>>> performance profiling and Marcin has now been able to take it
    >>>> much further. What is it about:
    >>>>
    >>>>> elements.add(it.nextObjectId(entity));
    >>>>
    >>>> which is so slow? The code gets a little complex at that point
    >>>> and we are having difficulty tracing it through to the exact
    >>>> performance problem in the underlying code. Is it the speed of
    >>>> adding the object id to the Collection or the speed of creating
    >>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't
    >>>> scale well.
    >>>>
    >>>> Andrus, I got the impression from the previous thread that you
    >>>> suspected something slightly different: that the performance
    >>>> problem might be in the fat query itself. But from our tests
    >>>> this isn't the case. If I've got this right, the way it works
    >>>> is (sketched in code after this list):
    >>>>
    >>>> 1. perform regular query to get all columns but return result in
    >>>> iterator
    >>>> 2. iterate through first page and build full objects
    >>>> 3. iterate through other pages and build just objectids (this is
    >>>> the slow part for us)
    >>>> 4. when another page is fetched perform another query and fetch
    >>>> those objects from the DB
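
    Rough shape of steps 1-3 as described (not the actual Cayenne
    source; the iterator method names are approximated from the
    elements.add(it.nextObjectId(entity)) snippet quoted above):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.cayenne.access.ResultIterator;
    import org.apache.cayenne.map.DbEntity;

    public class PagedFillSketch {

        // step 1 happens outside: the full query runs once and is handed in as an iterator
        static List fill(ResultIterator it, DbEntity entity, int pageSize) throws Exception {
            List elements = new ArrayList();
            while (it.hasNextRow()) {
                if (elements.size() < pageSize) {
                    elements.add(it.nextDataRow());        // step 2: full rows for the first page
                } else {
                    elements.add(it.nextObjectId(entity)); // step 3: ids only -- the slow part here
                }
            }
            return elements;
            // step 4 happens later: accessing another page runs a new query that
            // fetches those objects by the stored ids
        }
    }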
    >>>>
    >>>> So, getting just the primary keys from the DB may not be any
    >>>> faster if the performance problem is simply in the construction
    >>>> of ObjectIds. If the performance problem is in the numerous
    >>>> resizings of the Collection (each time it runs out of space, it
    >>>> is increased by 50% or 100% in size), then the solution could be
    >>>> as simple as figuring out the size of the iterator and sizing
    >>>> the collection appropriately from the beginning.
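
    The pre-sizing suggestion in isolation (where the row count comes
    from is left open here; a separate COUNT query is one possibility,
    not something the thread establishes Cayenne does):

    import java.util.ArrayList;
    import java.util.List;

    public class PresizeSketch {

        // allocating the backing array once avoids the repeated grow-and-copy
        // cycles mentioned above; rowCount must be known before filling starts
        static List presizedElements(int rowCount) {
            return new ArrayList(rowCount);
        }
    }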
    >>>>
    >>>> Any ideas on how to discover the exact cause of the performance
    >>>> hit?
    >>>>
    >>>>
    >>>> Ari Maniatis
    >>>>
    >>>>
    >
    >
    >


