Re: Problem concerning Prefetching

From: Jean-Paul Le Fèvre (jean-paul.lefevr..ea.fr)
Date: Mon Jan 28 2008 - 08:31:07 EST

  • Next message: Andrus Adamchik: "Re: Problem concerning Prefetching"

    On Thursday 24 January 2008, Aristedes Maniatis wrote:
    > On 25/01/2008, at 3:10 AM, Jean-Paul Le Fèvre wrote:
    > > I'm currently trying to improve the performances of my queries.
    > > I've added a call to addPrefetch() in my code but the result was not
    > > as
    > > expected : the time to fetch the 42597 rows increased from 35 s
    > > without
    > > addPrefetch() to 56 s with addPrefetch()
    >
    > That is expected. The initial query will take longer with the prefetch
    > since you are getting more data. What will be faster is following
    > relations later since these other records are now already in memory.
    > It is unlikely you'll want to prefetch relations to all 42000 records.

    Actually it is *very likely* that I want to prefetch relations to all
    42000 records (or more) hundreds of times ! So it's worth spending some time
    to optimize my implementation.

    I've made 3 tests so far : one using plain sql, one using cayenne without
    prefetching and the third one using prefetching.
    The first one is done in less that 4 s, the second one in 35 s and the last
    one in more that 50 s.

    To find out how cayenne works I've extracted from the MySQL log file the
    statements sent to the server.
    (I've simplified the actual queries to make things clearer)

    Manually written SQL statement :
    ----------------------

    select galaxies.number, galaxy_component_properties.mass_of_baryons
           from galaxy_component_properties
           join galaxies on galaxy_component_properties.galaxy_id = galaxies.id
           where some_clause
           order by galaxies.number;

    Cayenne generated (without prefetching) SQL statements :
    -------------------------

    SELECT t0.* FROM galaxies t0 WHERE some_clause ORDER BY t0.number;

    then ~40000 queries :
    SELECT t0.* FROM galaxy_component_properties t0
           WHERE t0.galaxy_id = nnnnnn;

    It is clear that issuing more than 40000 times a SELECT to get the related
    properties leads to an execution time of 35 s.

    Cayenne generated (with prefetching) SQL statements :
    -------------------------

    SELECT t0.* FROM galaxies t0 WHERE some_clause ORDER BY t0.number;

    SELECT t0.* FROM galaxy_component_properties t0, galaxies t1
           WHERE t0.galaxy_id = t1.id AND some_clause;

    then ~40000 queries :
    SELECT DISTINCT t0.*
           FROM galaxies t0, galaxy_component_properties t1
           WHERE t0.id = t1.galaxy_id AND (t1.id = nnnnn)

    The first 2 SELECT are expected and should have been sufficient to build my
    collection of galaxies. But they are followed by 40000 queries which make
    performance worse than without prefetching.

    Comments, advice are appreciated ...
    >
    > Investigate paging as possibly a more appropriate optimisation for you.

    I'm gonna try ...

    >
    > Ari Maniatis
    >

    -- 
    ___________________________________________________________________
    

    Jean-Paul Le Fèvre * Mail : LeFevr..onteny.org



    This archive was generated by hypermail 2.0.0 : Mon Jan 28 2008 - 08:31:46 EST