On Sat, 2002-06-15 at 00:49, Andrus wrote:
Hi Robert,
Unfortunately it looks like there is no generic solution that would equally
cover all cases of big queries, which is why I wasn't all that excited
about the solutions I suggested originally :-(. If we are to implement
something in Cayenne, I would like to solve as many cases as we can. So
here are the bottlenecks for the different cases, and the opportunities for
optimization, as I see them:
1. Slow queries with intermediate-size result sets (1000-2000 objects). A
multithreading solution should help optimize that. The bottleneck here would be
the database calls (maybe due to some complex joins, or an underlying
database view), and not memory consumption. The first sketch after the three
cases below shows the kind of thing I mean.
Best Usage Scenario: Web Application or Java Client application
2. Not all results are needed. For instance, a user will probably never need
to browse through 20000 records on a single screen. Paging would nicely
fake "displaying" 20000 records without really displaying them; and if it is
safe to assume that all objects have a simple int PK, the page faults can
just be saved as int[] - very inexpensive. The second sketch after the three
cases below shows what such a paged list might look like.
Best Usage Scenario: Web Application or Java Client application
3. [Your case as I understand it.] The application needs to traverse and
process a huge (and full) result set. In fact, some pieces of Cayenne can be
extended nicely to provide this functionality - in particular the
OperationObserver interface, which is the main callback mechanism used in
query execution. It can be implemented in many different ways, including
some kind of page-by-page access (I guess the interface itself should be
extended with paging support); the third sketch below shows roughly what I
have in mind. Executing other queries in parallel should be possible with a
big enough connection pool, though this needs to be tested.
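
For case 1, here is a very rough sketch of the kind of background fetch I
mean. None of this is existing Cayenne API - the class and method names are
made up for illustration. A worker thread drains a JDBC ResultSet into a
shared list, and the caller can already iterate over the rows fetched so far:

    // Illustration only - not existing Cayenne API. A worker thread copies
    // data rows from an open JDBC ResultSet into a shared list; the caller
    // can read the rows fetched so far while the fetch is still running.
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BackgroundFetch implements Runnable {
        private final ResultSet resultSet;
        private final List rows = Collections.synchronizedList(new ArrayList());
        private volatile boolean done = false;

        public BackgroundFetch(ResultSet resultSet) {
            this.resultSet = resultSet;
        }

        public List getRows() { return rows; }    // safe to read while fetching
        public boolean isDone() { return done; }

        public void run() {
            try {
                while (resultSet.next()) {
                    rows.add(readRow(resultSet));
                }
            }
            catch (SQLException ex) {
                ex.printStackTrace();
            }
            finally {
                done = true;
            }
        }

        // Copies the current row into a Map keyed by column name.
        private Map readRow(ResultSet rs) throws SQLException {
            ResultSetMetaData md = rs.getMetaData();
            Map row = new HashMap();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                row.put(md.getColumnName(i), rs.getObject(i));
            }
            return row;
        }
    }

The caller would start this on a Thread and begin walking getRows() right
away, checking isDone() to know when the full set is in. Of course this only
helps with perceived speed, not with memory.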
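
For case 2, something along these lines could hold the page faults (again,
just an illustration - PagedList and PageResolver are invented names). The
list keeps only the int PKs and resolves a whole page of real objects the
first time any index inside that page is accessed:

    // Illustration only - invented names. Keeps int PKs and resolves
    // objects one page at a time on first access.
    import java.util.AbstractList;

    public class PagedList extends AbstractList {

        // Hypothetical callback that fetches full objects for a page of PKs.
        public interface PageResolver {
            Object[] resolvePage(int[] pks);
        }

        private final int pageSize;
        private final int[] pks;          // one int per row - very inexpensive
        private final Object[] resolved;  // filled in page by page
        private final PageResolver resolver;

        public PagedList(int[] pks, int pageSize, PageResolver resolver) {
            this.pks = pks;
            this.pageSize = pageSize;
            this.resolver = resolver;
            this.resolved = new Object[pks.length];
        }

        public int size() {
            return pks.length;
        }

        public Object get(int i) {
            if (resolved[i] == null) {
                resolvePageContaining(i);
            }
            return resolved[i];
        }

        // Resolves the page that holds index i; pages do not have to be
        // resolved in order.
        private void resolvePageContaining(int i) {
            int start = (i / pageSize) * pageSize;
            int length = Math.min(pageSize, pks.length - start);
            int[] page = new int[length];
            System.arraycopy(pks, start, page, 0, length);
            Object[] objects = resolver.resolvePage(page);
            System.arraycopy(objects, 0, resolved, start, objects.length);
        }
    }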
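
For case 3, the kind of extension to the callback mechanism I am thinking of
would look roughly like the interface below. The names are invented for
illustration - this is not the current OperationObserver - but the idea is
that query execution hands data rows to the observer in batches and never
accumulates the full result in memory:

    // Invented names - a sketch of a streaming/paging callback, not the
    // real OperationObserver interface.
    import java.util.List;

    public interface PagedResultObserver {

        // Preferred batch size; the executing code delivers rows in
        // chunks of roughly this many.
        int getPageSize();

        // Called once per batch of data rows read from the ResultSet.
        // The observer can process and discard the rows, so memory use
        // stays bounded by the page size.
        void nextPage(List dataRows);

        // Called after the last page has been delivered.
        void queryFinished();

        // Called if the query fails midway.
        void queryFailed(Exception ex);
    }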
I'd like to better understand your requirements. Could you describe how the
fetched data is used? In particular:
- Is there a need to traverse relationships of objects being read?
Yes
- Is there a need to preserve the objects after the iteration over the
result set is done (or maybe there is only a need to preserve a small subset)?
No
- Is there a need to modify the objects on the fly while the result set is
open (like JDBC 3.0 "ResultSet.updateRow")?
No
- Is there a UI hooked up to the whole process?
No
Basically it is a batch report that gets written to a file and then
downloaded by the user. To do this in WO we had to use raw rows (which
is something I wanted to mention as another feature) to be able to fetch
all the data. Once we had that data set, we iterated over it taking N
objects at a time, converting the raw rows into objects, putting them
into a new context, and then dropping the context once that set was
done. For each item there would be traversals of the object's paths to
get associated data.
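
Roughly, the loop looked like the sketch below (an outline only - the
abstract methods stand in for the actual WO raw-row and editing-context
calls we used):

    // Outline of the batch loop described above; the abstract methods are
    // stand-ins for the WebObjects calls.
    import java.io.IOException;
    import java.io.Writer;
    import java.util.List;

    public abstract class BatchReport {

        protected abstract List fetchAllRawRows();       // raw rows are cheap to hold
        protected abstract Object newContext();          // fresh context for each batch
        protected abstract Object objectFromRawRow(Object rawRow, Object context);
        protected abstract void writeItem(Object obj, Writer out) throws IOException;

        public void run(Writer out, int batchSize) throws IOException {
            List rawRows = fetchAllRawRows();

            for (int start = 0; start < rawRows.size(); start += batchSize) {
                Object context = newContext();
                int end = Math.min(start + batchSize, rawRows.size());

                for (int i = start; i < end; i++) {
                    Object obj = objectFromRawRow(rawRows.get(i), context);
                    // traversal of the object paths to associated data
                    // happens inside writeItem()
                    writeItem(obj, out);
                }
                // the context is dropped here, so the whole batch can be
                // garbage collected before the next one starts
            }
        }
    }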
Let me know if you need more info.
Andrus
At 11:08 PM 6/14/2002 -0400, Robert John Andersen wrote:
>On Fri, 2002-06-14 at 22:16, Andrus wrote:
>At 09:38 PM 6/14/2002 -0400, Robert John Andersen wrote:
>
> >Is it possible to bust up fetches into segments? I'm trying to do a large
> >fetch and the jvm bombs out with an out of memory exception.
>
>It is interesting that I discussed the same thing with Nikhil just a few
>days ago. Looks like this is a popular issue. I guess I'd put this in as a
>feature request for BETA (ALPHA is coming out on Sunday). Here are some
>snips from those emails categorized by solutions suggested. I'd welcome
>other ideas and comments on the solutions suggested below.
>
>[Multithreading]:
>
>To address slowness of big queries:
>
> >Another interesting thought [...] Perform actual ResultSet processing in a
> >separate thread. Such a thread would populate a list with objects from
> >ResultSet while the list itself is already being accessed by the main
> >thread, and users can preview the first X number of result rows.
>This doesn't really address the issue of big queries; you will still run
>into the memory issue. The other problem is that as the memory
>consumption increases, the performance of the JVM decreases, since it has to go
>through that much more during a GC. If you turn off GC you still get the
>memory consumption problem.
>[Paging ResultSet]:
>
>To address slowness and memory issues (though it will still require a
>primary key fetch):
>
> >[...] Basically this would work almost like faults in WebObjects - when
> >sending a query, one would set a page size = N. First N objects in the
> >query result will be returned fully initialized, the rest will be faults.
> >When an object at index "i" is accessed, a special List implementation
> >that holds the objects can check if this page is resolved, if not, objects
> >from aN <= i <= (a + 1)N will be resolved from faults. Pages don't even
> >have to be in order.
>If you are going to have this address the memory issues, you will need to
>fault out old objects; otherwise you will again run into the memory issue
>once you have traversed the list enough.
>[Trimming ResultSet]:
>
>Addresses memory issues. A simple but efficient way would be to set a "fetch
>limit" on select queries, so that extra rows are just never read from the
>result set, thus protecting memory.
>This does not address the issue when you have to go through all the data,
>though we do use this feature in WO now for data that is presented to the
>user via HTML.
>
>An implementation I would like to see is similar to the faulting list in
>that you can get N at a time, but you control when you get them. It
>leaves the dataset open from the initial fetch and then you call
>successive fetchRows to get your next set. Once the
>ResultSet is exhausted the query is closed. One problem this raises is
>that you should still be able to execute queries while in the middle of this.
>
>Instead of returning N objects you could pass in a delegate that gets
>called every N objects.
>
>Andrus
>