Re: Really large fetches

From: Andrus (andru..bjectstyle.org)
Date: Sat Jun 15 2002 - 00:49:56 EDT

    Hi Robert,

    Unfortunately it looks like there is no single generic solution that would
    cover all cases of big queries equally well, which is why I wasn't all that
    excited about the solutions I suggested originally :-(. If we are to
    implement something in Cayenne, I would like to solve as many cases as we
    can. So here are the bottlenecks for the different cases, and the
    opportunities for optimization, as I see them:

    1. Slow queries with intermediate-size result sets (1000-2000 objects). A
    multithreading solution should help to optimize that. The bottleneck here
    would be the database calls (maybe due to some complex joins, or an
    underlying database view), not memory consumption.

    Best Usage Scenario: Web Application or Java Client application
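
    To make the threading idea a bit more concrete, here is a very rough
    sketch in plain Java (no Cayenne API involved - the RowSource callback
    below is just a placeholder for whatever reads rows out of the ResultSet):
    a worker thread keeps appending rows to a shared list while the caller may
    already be iterating over the first ones.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // A worker thread appends rows to a shared list while the main thread
    // may already be iterating over it to preview the first results.
    public class BackgroundFetch {

        // hypothetical row source - in practice this would wrap a ResultSet
        public interface RowSource {
            Object nextRow(); // returns null when the result set is exhausted
        }

        public static List fetchInBackground(final RowSource source) {
            final List rows = Collections.synchronizedList(new ArrayList());
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    Object row;
                    while ((row = source.nextRow()) != null) {
                        rows.add(row);
                    }
                }
            });
            worker.start();
            return rows; // caller can display whatever has arrived so far
        }
    }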

    2. Not all results are needed. For instance, a user will probably never
    need to browse through 20000 records on a single screen. Paging would
    nicely fake "displaying" 20000 records without really fetching them all.
    Especially if it is safe to assume that all objects have a simple int PK,
    the unresolved page faults can be stored as an int[] - very inexpensive.

    Best Usage Scenario: Web Application or Java Client application
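
    Roughly what I have in mind for such a paged list (the PageResolver hook
    below is only a placeholder for whatever would fetch a batch of objects by
    PK - it is not an existing Cayenne interface):

    import java.util.AbstractList;
    import java.util.List;

    // Only int PKs are kept for unresolved rows; a whole page of real
    // objects is fetched the first time any index on that page is touched.
    public class PagedList extends AbstractList {

        // hypothetical hook that fetches objects for a batch of PKs, in order
        public interface PageResolver {
            List resolve(int[] primaryKeys);
        }

        private final int[] pks;          // one int per row - very cheap
        private final Object[] resolved;  // filled in page by page
        private final int pageSize;
        private final PageResolver resolver;

        public PagedList(int[] pks, int pageSize, PageResolver resolver) {
            this.pks = pks;
            this.pageSize = pageSize;
            this.resolver = resolver;
            this.resolved = new Object[pks.length];
        }

        public int size() {
            return pks.length;
        }

        public Object get(int i) {
            if (resolved[i] == null) {
                resolvePage(i / pageSize);
            }
            return resolved[i];
        }

        private void resolvePage(int page) {
            int from = page * pageSize;
            int to = Math.min(from + pageSize, pks.length);
            int[] pagePks = new int[to - from];
            System.arraycopy(pks, from, pagePks, 0, pagePks.length);
            List objects = resolver.resolve(pagePks);
            for (int j = from; j < to; j++) {
                resolved[j] = objects.get(j - from);
            }
        }
    }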

    3. [Your case as I understand it.] The application needs to traverse and
    process a huge (and full) result set. In fact, some pieces of Cayenne can
    be extended nicely to provide this functionality, in particular the
    OperationObserver interface. This is the main callback mechanism used in
    query execution. It can be implemented in many different ways, including
    some kind of page-by-page access (I guess the interface itself would need
    to be extended with paging support). Executing other queries in parallel
    should be possible with a big enough connection pool, though this needs to
    be tested.
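
    To illustrate the callback idea with something self-contained, here is a
    plain-JDBC sketch (the RowCallback interface and its nextPage() method are
    made up for this example - they are not the actual OperationObserver API).
    Rows are handed to the caller in batches of pageSize and never
    accumulated, so memory stays flat regardless of how big the result set is.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Rows are passed to the callback one batch at a time and never kept
    // around, so memory use does not grow with the size of the result set.
    public class StreamingFetch {

        // hypothetical callback - return false from nextPage() to stop early
        public interface RowCallback {
            boolean nextPage(Object[] rows, int count);
        }

        public static void fetch(String url, String sql, int pageSize,
                                 RowCallback callback) throws Exception {
            Connection con = DriverManager.getConnection(url);
            try {
                Statement st = con.createStatement();
                ResultSet rs = st.executeQuery(sql);
                Object[] page = new Object[pageSize];
                int n = 0;
                while (rs.next()) {
                    page[n++] = rs.getObject(1); // map the row however needed
                    if (n == pageSize) {
                        if (!callback.nextPage(page, n)) {
                            break; // caller asked to stop reading
                        }
                        n = 0;
                    }
                }
                if (n > 0) {
                    callback.nextPage(page, n); // flush the last partial page
                }
                rs.close();
                st.close();
            }
            finally {
                con.close();
            }
        }
    }

    A caller could count, export, or convert rows inside nextPage() and return
    false to bail out early; a page-aware OperationObserver could follow the
    same shape.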

    I'd like to better understand your requirements. Could you describe how
    the fetched data is used? In particular:

    - Is there a need to traverse relationships of objects being read?
    - Is there a need to preserve the objects after the iteration over the
    result set is done (or maybe there is only a need to preserve a small subset)?
    - Is there a need to modify the objects on the fly while the result set is
    open (like JDBC 3.0 "ResultSet.updateRow")?
    - Is there a UI hooked up to the whole process?

    Andrus

    At 11:08 PM 6/14/2002 -0400, Robert John Andersen wrote:
    >On Fri, 2002-06-14 at 22:16, Andrus wrote:
    >At 09:38 PM 6/14/2002 -0400, Robert John Andersen wrote:
    >
    > >Is it possible to bust up fetches into segments? I'm trying to do a large
    > >fetch and the jvm bombs out with an out of memory exception.
    >
    >It is interesting that I discussed the same thing with Nikhil just a few
    >days ago. Looks like this is a popular issue. I guess I'd put this in as a
    >feature request for BETA (ALPHA is coming out on Sunday). Here are some
    >snips from those emails categorized by solutions suggested. I'd welcome
    >other ideas and comments on the solutions suggested below.
    >
    >[Multithreading]:
    >
    >To address slowness of big queries:
    >
    > >Another interesting thought [...] Perform actual ResultSet processing in a
    > >separate thread. Such a thread would populate a list with objects from the
    > >ResultSet while the list itself is already being accessed by the main
    > >thread, and users can preview the first X number of result rows.
    >
    >This doesn't really address the issue of big queries; you will still run
    >into the memory issue. The other problem is that as memory consumption
    >increases, the performance of the JVM decreases, since it has that much
    >more to go through during a GC. If you turn off GC you still get the
    >memory consumption problem.
    >
    >[Paging ResultSet]:
    >
    >To address slowness and memory issues (though it will still require a
    >primary key fetch):
    >
    > >[...] Basically this would work almost like faults in WebObjects - when
    > >sending a query, one would set a page size = N. The first N objects in the
    > >query result will be returned fully initialized, the rest will be faults.
    > >When an object at index "i" is accessed, a special List implementation
    > >that holds the objects can check if its page is resolved; if not, objects
    > >from aN <= i <= (a + 1)N will be resolved from faults. Pages don't even
    > >have to be in order.
    >
    >If you are going to have this address the memory issues, you will need to
    >fault out old objects; otherwise you will again run into the memory issue
    >once you have traversed enough of the list.
    >
    >[Trimming ResultSet]:
    >
    >Addresses memory issues. A simple but efficient way would be to set a
    >"fetch limit" on select queries, so that extra rows are just never read
    >from the result set, thus protecting the memory.
    >
    >This does not address the issue when you have to go through all the data,
    >though we do use this feature in WO now for data that is presented to the
    >user by html.
    >
    >An implementation I would like to see is similar to the faulting list in
    >that you can get N at a time, but you control when you get them. It leaves
    >the dataset open from the initial fetch, and then you call successive
    >fetchRows to get your next set. Once the ResultSet is exhausted the query
    >is closed. One problem this raises is that you should still be able to
    >execute other queries while in the middle of this.
    >
    >Instead of returning N objects you could pass in a delegate that gets
    >called every N objects.
    >
    >Andrus



    This archive was generated by hypermail 2.0.0 : Sat Jun 15 2002 - 00:49:36 EDT