Re: Object Caching

From: Hans Pikkemaat (h.pikkemaa..si-solutions.nl)
Date: Fri Nov 13 2009 - 07:09:41 EST


    Hi,

    No, not completely. Because of the problems I was having with paging in
    combination with SQLTemplate and prefetching, I ran a simple test to see
    how much memory would actually be required to run a big query. This was
    an attempt to estimate the total memory I would need for my 2.5 million
    row query.

    This test did not use paging (because of the earlier problems) but did
    use prefetching (because I don't want to run 2.5 million queries against
    the detail table).

    The problem I saw is the following. The initial query, which accesses
    the main table and does a left join to the detail table, took about
    1 minute. Cayenne then took 2 minutes to construct the objects and
    related objects from the data rows produced by this query. The
    construction does the job correctly, but in my opinion it takes too
    long. If the main query can run in 1 minute and get all the data from
    the database (which is IO and would normally be seen as the
    bottleneck), why does it take 2 minutes to convert this into objects
    and relationships?

    Loading the data into memory goes fairly quickly. After that I only see
    100% CPU and no change in memory occupation (I used a profiler). From
    stack traces I can see that all the time is spent in

        org.apache.cayenne.access.DataDomainQueryAction.runQueryInTransaction()

    and

        org.apache.cayenne.access.DataDomainQueryAction.interceptObjectConversion()

    which finally calls

        org.apache.cayenne.query.PrefetchTreeNode.traverse(PrefetchProcessor)

    which does all the time-consuming work.

    I think there is a piece of code somewhere which traverses a list of
    some kind inefficiently. Maybe a HashMap is used with a key object that
    lacks hashCode/equals methods?
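
    To make that suspicion concrete, here is a minimal sketch in plain Java.
    It is not Cayenne code, and the class names (KeyLookupSketch,
    IdentityOnlyKey, ValueKey) are made up for illustration. It only shows
    that a key class without equals/hashCode falls back to object identity,
    so a lookup with a second, equal-valued key instance misses. Whether
    anything like this happens inside the prefetch code is an assumption I
    have not verified.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyLookupSketch {

    // Hypothetical key WITHOUT equals/hashCode: HashMap uses object identity.
    static final class IdentityOnlyKey {
        final long id;
        IdentityOnlyKey(long id) { this.id = id; }
    }

    // Hypothetical key WITH value-based equals/hashCode.
    static final class ValueKey {
        final long id;
        ValueKey(long id) { this.id = id; }

        @Override
        public boolean equals(Object other) {
            return other instanceof ValueKey && ((ValueKey) other).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    // Looks up a main row using a fresh key instance that carries the same id.
    static boolean identityKeyFound() {
        Map<IdentityOnlyKey, String> parents = new HashMap<>();
        parents.put(new IdentityOnlyKey(42L), "main row 42");
        return parents.containsKey(new IdentityOnlyKey(42L));
    }

    // Same lookup, but the key compares by value.
    static boolean valueKeyFound() {
        Map<ValueKey, String> parents = new HashMap<>();
        parents.put(new ValueKey(42L), "main row 42");
        return parents.containsKey(new ValueKey(42L));
    }

    public static void main(String[] args) {
        System.out.println("identity-only key found: " + identityKeyFound());
        System.out.println("value-based key found:   " + valueKeyFound());
    }
}
```

    If code compensates for such misses by scanning a list for the matching
    parent, the work per detail row grows with the number of main rows,
    which would fit the 100% CPU I am seeing.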

    Answering your points:

    1) As I'm not using paging here, prefetching does help because I don't
    have to query the detail table separately.
    2) Yes, I do need these 2.5 million records, because after some
    investigation I will forward the records I need.
        No, it's clearly not a user interface. I totally agree that Cayenne,
    at least, is not the best tool for this.

    What I initially started with is the iterated query, so that I could
    iterate over the data rows and construct the objects on the fly, which
    would then be forwarded.

    The problem here is that I cannot use prefetching, nor can I manually
    construct relationships. The code is probably there (prefetching uses
    it), but the API does not give me a handle (or an easy way) to use it.
    This effectively leaves me running a separate query for each main
    record, which does not perform.
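
    What I would like to be able to do looks roughly like the sketch below.
    It is plain Java, not the Cayenne API: JoinedRow, MainRecord and
    forwardGrouped are hypothetical names, and it assumes the joined rows
    arrive ordered by the main record's key so that all detail rows of one
    main record are adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class StreamingJoinSketch {

    // Hypothetical flattened row from "main LEFT JOIN detail", ordered by mainId.
    // detailId is null when a main record has no detail rows.
    static final class JoinedRow {
        final long mainId;
        final Long detailId;
        JoinedRow(long mainId, Long detailId) {
            this.mainId = mainId;
            this.detailId = detailId;
        }
    }

    // Hypothetical main object holding its to-many relationship.
    static final class MainRecord {
        final long id;
        final List<Long> detailIds = new ArrayList<>();
        MainRecord(long id) { this.id = id; }
    }

    // Reads rows one at a time and hands each completed MainRecord to the sink,
    // so only the current main record and its details are held by this method.
    static void forwardGrouped(Iterator<JoinedRow> rows, Consumer<MainRecord> sink) {
        MainRecord current = null;
        while (rows.hasNext()) {
            JoinedRow row = rows.next();
            if (current == null || current.id != row.mainId) {
                if (current != null) {
                    sink.accept(current); // previous main record is complete
                }
                current = new MainRecord(row.mainId);
            }
            if (row.detailId != null) {
                current.detailIds.add(row.detailId);
            }
        }
        if (current != null) {
            sink.accept(current); // flush the last main record
        }
    }

    public static void main(String[] args) {
        List<JoinedRow> rows = Arrays.asList(
            new JoinedRow(1, 10L), new JoinedRow(1, 11L),
            new JoinedRow(2, null),
            new JoinedRow(3, 30L));
        forwardGrouped(rows.iterator(),
            m -> System.out.println("main " + m.id + " -> details " + m.detailIds));
    }
}
```

    With something like this, one joined query would be enough and memory
    use would stay flat, because each main record can be forwarded and
    dropped as soon as the next key shows up.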

    Anyway, my conclusion is indeed: don't use Cayenne for processing large
    queries.

    tx

    Hans

    Aristedes Maniatis wrote:
    > On 13/11/09 10:04 PM, Hans Pikkemaat wrote:
    >
    >> I ran some tests using 3.0b with SQLTemplate in combination with
    >> prefetching and found
    >> a possible new problem.
    >>
    >> It seems that when running the query in eg 1 minute, it takes about 2
    >> minutes before cayenne
    >> has constructed the prefetched objects.
    >>
    >> My query produces 2.5 million records. The query will take about 30
    >> minutes. Construction
    >> of the objects will then take an extra hour.
    >>
    >
    >
    > Just to be clear about what you are doing:
    >
    > * Cayenne 3.0 beta 1
    > * SQL template query
    > * prefetching across to-many join
    > * paging on
    >
    > You expect the first query (which gets hollow objects) to NOT include the prefetch JOIN, but when fetching a page of results, it should use the prefetch. Cayenne is constructing the first query which includes the JOIN and that makes it take 30 minutes in your database to return 2.5 million records.
    >
    > Is that correct?
    >
    >
    > My opinions:
    >
    > 1. Does prefetch really help here anyway? You are only getting (say) 100 records at a time, so the extra queries to follow the relations may not be that significant.
    >
    > 2. Do you really want to fetch 2.5 million rows? If so, I assume this is not a user interface :-) Perhaps Cayenne (or any ORM for that matter) is not the best way to batch process that many rows.
    >
    >
    >
    > Ari
    >
    >


