Re: New prefetching algorithms

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Mon Sep 07 2009 - 16:55:34 EDT


    Finally some good news on performance. After tweaking the prefetch
    strategies, I got the following test numbers on PostgreSQL,
    fetching/prefetching a few thousand objects (fewer milliseconds
    means faster processing):

    (disjoint)
    n:1 ... M6 ...... 51 ms
    n:1 ... trunk ... 45 ms

    (joint)
    n:1 ... M6 ...... 100 ms
    n:1 ... trunk ... 45 ms

    (disjoint)
    1:n ... M6 ...... 100 ms
    1:n ... trunk ... 54 ms

    (disjoint)
    n:m ... M6 ...... 54 ms
    n:m ... trunk ... 51 ms

    So the trunk code significantly improves on 3.0M6 when prefetching
    to-many and joint to-one relationships, and somewhat improves on the
    other cases (within the margin of error, I guess).

    Andrus

    On Sep 7, 2009, at 8:53 AM, Andrus Adamchik wrote:

    > Been thinking about the new prefetching model some more and found a
    > glaring performance hole: the most common N:1 prefetch case results
    > in a cartesian product being processed in memory. E.g. if one
    > Artist has 3 Paintings, and the Paintings are fetched with an Artist
    > prefetch, the Artist DB data will be read 3 times. The result will
    > be correct (3 Paintings all pointing to a single Artist object),
    > but processing will be much slower.
    >
    > I will now make another pass over the code to restore the old
    > prefetch strategy for N:1 relationships. Hopefully the resulting
    > code will be tighter than it used to be.
    >
    > Andrus
    >
    >
    > On Sep 6, 2009, at 9:43 PM, Andrus Adamchik wrote:
    >
    >> Good to have a little time again to hack Cayenne internals.
    >>
    >> Just committed a pretty big change to the prefetching algorithm,
    >> motivated by the CAY-1250 bug report. So combining prefetching and
    >> inheritance now works 100%.
    >>
    >> One visible effect of this change is that all disjoint prefetch
    >> queries will now include the IDs of the source side of the
    >> prefetch relationship and a mandatory join to the source entity. In
    >> return for this small inefficiency (increased result set size...
    >> hopefully most IDs are small), we get a bunch of benefits, the main
    >> one being the ability to process related fetched objects in a
    >> consistent manner regardless of the relationship semantics (1..1,
    >> 1..N, N..M). This strategy was used before for flattened
    >> relationships; now it is used for everything. On the other hand,
    >> this change made it possible to optimize some related cases, so all
    >> in all, there may be no performance penalty.
    >>
    >> It is still possible to go back and optimize it further to prevent
    >> the addition of the extra columns to the result set in some cases
    >> (e.g. if both the joined FK and PK are present in the result, only
    >> fetch one of them). I wish we could do that in some central
    >> location (like SelectTranslator) instead of writing endless if/else
    >> in the prefetch processing code.
    >>
    >> Now the prefetch code is easier to make sense of, with fewer
    >> if/else branches. And I am planning to refactor it further.
    >>
    >> Also I came very close to fixing the biggest remaining limitation
    >> of disjoint prefetching:
    >>
    >> https://issues.apache.org/jira/browse/CAY-1025
    >>
    >> Andrus
    >>
    >>
    >
    >
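
The N:1 effect described in the quoted Sep 7 message (one Artist with 3 Paintings, Artist data read 3 times) can be illustrated with a small sketch. Everything here is hypothetical: the Row, Artist and Resolution classes and both methods are stand-ins made up for illustration, not Cayenne classes, and checking the ID before converting the columns is just one possible way to avoid the repeated work, not necessarily what the old or new Cayenne strategy does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are Cayenne classes.
final class ToOnePrefetchSketch {

    // One row of a Painting-to-Artist join: the Artist columns repeat per Painting.
    static final class Row {
        final int paintingId;
        final int artistId;
        final String artistName;

        Row(int paintingId, int artistId, String artistName) {
            this.paintingId = paintingId;
            this.artistId = artistId;
            this.artistName = artistName;
        }
    }

    static final class Artist {
        final String name;

        Artist(String name) {
            this.name = name;
        }
    }

    // Painting -> Artist links, plus how often Artist columns were converted.
    static final class Resolution {
        final Map<Integer, Artist> artistByPaintingId = new HashMap<>();
        int artistColumnReads = 0;
    }

    // Converts the Artist columns on every row, keeping the first Artist per ID.
    static Resolution readEveryRow(List<Row> rows) {
        Resolution result = new Resolution();
        Map<Integer, Artist> artistById = new HashMap<>();
        for (Row row : rows) {
            Artist candidate = new Artist(row.artistName); // repeated work
            result.artistColumnReads++;
            artistById.putIfAbsent(row.artistId, candidate);
            result.artistByPaintingId.put(row.paintingId, artistById.get(row.artistId));
        }
        return result;
    }

    // Checks the ID first and converts the Artist columns only for unseen IDs.
    static Resolution readOncePerArtist(List<Row> rows) {
        Resolution result = new Resolution();
        Map<Integer, Artist> artistById = new HashMap<>();
        for (Row row : rows) {
            Artist artist = artistById.get(row.artistId);
            if (artist == null) {
                artist = new Artist(row.artistName);
                result.artistColumnReads++;
                artistById.put(row.artistId, artist);
            }
            result.artistByPaintingId.put(row.paintingId, artist);
        }
        return result;
    }

    // Three Paintings of one Artist, as in the example from the message.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row(1, 10, "Sample Artist"),
                new Row(2, 10, "Sample Artist"),
                new Row(3, 10, "Sample Artist"));
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println("every row:       " + readEveryRow(rows).artistColumnReads);
        System.out.println("once per artist: " + readOncePerArtist(rows).artistColumnReads);
    }
}
```

Both methods end up with all three Paintings pointing to the same Artist instance; they differ only in how many times the Artist columns are converted.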

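The idea from the quoted Sep 6 message, that disjoint prefetch rows carry the ID of the source side so related objects can be processed the same way for any relationship type, might look roughly like the sketch below. The PrefetchRow class and groupBySourceId method are hypothetical names invented for this illustration, and the targets are plain strings rather than real persistent objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Cayenne prefetch processing code.
final class DisjointPrefetchSketch {

    // One row of the prefetch query: the source object's ID plus the target.
    static final class PrefetchRow {
        final int sourceId;
        final String target;

        PrefetchRow(int sourceId, String target) {
            this.sourceId = sourceId;
            this.target = target;
        }
    }

    // Groups prefetched targets under the ID of the source they belong to.
    // The same loop serves 1..1, 1..N and N..M: a to-one source simply ends
    // up with a single-element list.
    static Map<Integer, List<String>> groupBySourceId(List<PrefetchRow> rows) {
        Map<Integer, List<String>> targetsBySourceId = new LinkedHashMap<>();
        for (PrefetchRow row : rows) {
            targetsBySourceId
                    .computeIfAbsent(row.sourceId, id -> new ArrayList<>())
                    .add(row.target);
        }
        return targetsBySourceId;
    }

    public static void main(String[] args) {
        List<PrefetchRow> rows = Arrays.asList(
                new PrefetchRow(1, "painting A"),
                new PrefetchRow(1, "painting B"),
                new PrefetchRow(2, "painting C"));
        System.out.println(groupBySourceId(rows));
    }
}
```

The cost mentioned in the message shows up here as the extra sourceId value carried on every row.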

