Hi,
No, not completely. Because of the problems I was having with paging in
combination with SQLTemplate and prefetching, I ran a simple test to see
how much memory a big query would actually require. This was an attempt
to estimate the total memory I would need for my 2.5 million record query.
The test does not use paging (because of the earlier problems) but does
use prefetching (because I don't want to run 2.5 million queries against
the detail table).
The problem I saw is the following. The initial query, which accesses the
main table and does a left join to the detail table, took about 1 minute.
Cayenne then took 2 minutes to construct the objects and related objects
from the data rows produced by this query. The construction does the job
correctly, but in my opinion it takes too long. If the main query can run
in 1 minute and get all the data from the database (which is IO and would
normally be seen as the bottleneck), why does it take 2 minutes to convert
this into objects and relationships?
The conversion into memory goes fairly quickly. After that I only see
100% CPU and no change in memory usage (measured with a profiler). From
stack traces I can see that all the time is spent in
org.apache.cayenne.access.DataDomainQueryAction.runQueryInTransaction()
and
org.apache.cayenne.access.DataDomainQueryAction.interceptObjectConversion()
which finally calls
org.apache.cayenne.query.PrefetchTreeNode.traverse(PrefetchProcessor)
which does all the time-consuming work.
I think there is a piece of code somewhere that traverses a list of some
kind inefficiently. Maybe a HashMap is used with a key object that has no
hashCode/equals methods?
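To make that guess concrete, here is a minimal sketch in plain Java. It says nothing about what Cayenne actually does internally; IdentityKey, ValueKey and KeyLookupDemo are hypothetical names made up for this illustration. It shows that a key class without equals/hashCode is compared by object identity, so a lookup with a freshly created key never hits. That by itself is not slow, but any code that then falls back to scanning a list would be.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key types for illustration only; these are not Cayenne classes.
final class IdentityKey {
    final int id;

    IdentityKey(int id) {
        this.id = id;
    }
    // No equals/hashCode: HashMap compares these keys by object identity.
}

final class ValueKey {
    final int id;

    ValueKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ValueKey && ((ValueKey) other).id == id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}

class KeyLookupDemo {
    // Counts lookups that succeed when every key is re-created from its id.
    static int identityHits(int n) {
        Map<IdentityKey, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new IdentityKey(i), "row" + i);
        }
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (map.containsKey(new IdentityKey(i))) {
                hits++;
            }
        }
        return hits;
    }

    // Same experiment with a key that defines equals/hashCode by value.
    static int valueHits(int n) {
        Map<ValueKey, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new ValueKey(i), "row" + i);
        }
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (map.containsKey(new ValueKey(i))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("identity keys: " + identityHits(1000) + " hits");
        System.out.println("value keys:    " + valueHits(1000) + " hits");
    }
}
```

If something like this were the cause, a profiler would be expected to show the time in a linear search rather than in the map itself, which could be checked against the stack traces above.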
Answering your points:
1) As I'm not using paging here, prefetching does help, because I don't
have to query the detail table separately.
2) Yes, I do need these 2.5 million records, because after some
investigation I will forward the records I need.
No, it's clearly not a user interface. I totally agree that Cayenne, at
least, is not the best tool for this.
What I initially started with is the iterated query, so that I could
iterate over the data rows and construct the objects on the fly, which
would then be forwarded.
The problem here is that I cannot use prefetching, nor can I manually
construct relationships. The code is probably there (prefetching uses it),
but the API does not give me a handle (or at least not an easy one) to
use it.
This effectively leaves me running a separate query for each main record,
which does not perform.
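For what it is worth, the grouping I would like to do while iterating can be sketched in plain Java. This is only a sketch under assumptions and uses no Cayenne API: the rows are plain maps standing in for data rows, the column names MAIN_ID and DETAIL_ID are made up, and the joined result is assumed to be ordered by the main key. MainWithDetails and JoinedRowGrouper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical holder for one main record and its detail rows.
final class MainWithDetails {
    final Object mainId;
    final List<Map<String, Object>> details = new ArrayList<>();

    MainWithDetails(Object mainId) {
        this.mainId = mainId;
    }
}

class JoinedRowGrouper {
    // Streams the rows of a "main LEFT JOIN detail" result that is sorted by
    // MAIN_ID (assumed column name) and hands each completed main record to
    // the sink. Returns the number of main records forwarded.
    static int forward(Iterator<Map<String, Object>> rows,
                       Consumer<MainWithDetails> sink) {
        MainWithDetails current = null;
        int count = 0;
        while (rows.hasNext()) {
            Map<String, Object> row = rows.next();
            Object mainId = row.get("MAIN_ID");
            // A change of main key means the previous record is complete.
            if (current == null || !Objects.equals(current.mainId, mainId)) {
                if (current != null) {
                    sink.accept(current);
                    count++;
                }
                current = new MainWithDetails(mainId);
            }
            // A LEFT JOIN yields a null detail key for a main row without details.
            if (row.get("DETAIL_ID") != null) {
                current.details.add(row);
            }
        }
        // Flush the last record.
        if (current != null) {
            sink.accept(current);
            count++;
        }
        return count;
    }
}
```

Only one main record and its details are held at a time, so memory stays flat regardless of the total row count. The part that is missing is turning such a group into real Cayenne objects with their relationships set.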
Anyway, my conclusion is indeed: don't use Cayenne for processing large
queries.
tx
Hans
Aristedes Maniatis wrote:
> On 13/11/09 10:04 PM, Hans Pikkemaat wrote:
>
>> I ran some tests using 3.0b with SQLTemplate in combination with
>> prefetching and found
>> a possible new problem.
>>
>> It seems that when running the query in eg 1 minute, it takes about 2
>> minutes before cayenne
>> has constructed the prefetched objects.
>>
>> My query produces 2.5 million records. The query will take about 30
>> minutes. Construction
>> of the objects will then take an extra hour.
>>
>
>
> Just to be clear about what you are doing:
>
> * Cayenne 3.0 beta 1
> * SQL template query
> * prefetching across to-many join
> * paging on
>
> You expect the first query (which gets hollow objects) to NOT include the prefetch JOIN, but when fetching a page of results, it should use the prefetch. Cayenne is constructing the first query which includes the JOIN and that makes it take 30 minutes in your database to return 2.5 million records.
>
> Is that correct?
>
>
> My opinions:
>
> 1. Does prefetch really help here anyway? You are only getting (say) 100 records at a time, so the extra queries to follow the relations may not be that significant.
>
> 2. Do you really want to fetch 2.5 million rows? If so, I assume this is not a user interface :-) Perhaps Cayenne (or any ORM for that matter) is not the best way to batch process that many rows.
>
>
>
> Ari
>
>
This archive was generated by hypermail 2.0.0 : Fri Nov 13 2009 - 07:10:23 EST