Re: Really large fetches

From: Robert John Andersen (robert_anderse..ridge-point.com)
Date: Tue Jun 18 2002 - 23:06:44 EDT


    On Tue, 2002-06-18 at 00:52, Andrus wrote:

        At 10:48 PM 6/15/2002 -0400, Robert John Andersen wrote:
    >Basically it is a batch report that gets written to a file and then
    >downloaded to the user. To do this in WO we had to use raw rows (which is
    >something I wanted to mention as another feature)
        
        "Raw rows" is one of the "undocumented features" that exist already, though
        I'd put a more friendly API around it before showing it to the public.
        Anyway, for raw rows (returned as java.util.Map objects)
        org.objectstyle.cayenne.access.SelectOperationObserver can be used, and the
        queries should be executed bypassing DataContext:
        
        SelectQuery q = ...;
        DataDomain d = ...;

        // the observer collects the fetched rows as java.util.Map objects
        SelectOperationObserver observer = new SelectOperationObserver();
        d.performQuery(q, observer);
        List rawRows = observer.getResults(q);
        
        Of course, instead of one line of code it requires three and is not
        intuitive. I will hide it under the hood of DataContext within the
        next couple of days.
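
        Just to illustrate where this could go, the friendlier DataContext
        version might look something like this (the method name
        "performRawQuery" is purely hypothetical, nothing like it exists
        yet):

        SelectQuery q = ...;
        DataContext ctxt = ...;

        // hypothetical one-liner: the context would create the
        // observer internally and return the raw rows as Maps
        List rawRows = ctxt.performRawQuery(q);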
        
        
        
    >Once we had that data set, we iterated over it taking N objects at a time,
    >converting the raw row into an object and putting it into a new context
    >and then dropping the context once that set was done.
        
        But you would still read all raw rows at once?

    There are numerous reasons why we had to do it this way for now. It is
    not the best approach, but given the version we are using, the problems
    we had to work around, and the time frame we needed it in, this was the
    only option. And we will probably run into the memory problem again:
    the query we are using for testing has been qualified to limit the
    result set, but that result set is growing at a rate of at least
    several thousand rows per day, and we are not even at the point of
    peak data processing yet ;(.
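
    To sketch the batching we ended up with (simplified from memory, and
    "objectFromRawRow" is just a stand-in for our conversion code):

    List rawRows = ...;   // full result set fetched as raw rows
    int batchSize = 500;  // N objects per batch; the number is arbitrary

    for( int i = 0; i < rawRows.size(); i += batchSize ) {
      // fresh context per batch so processed objects can be collected
      DataContext batchContext = ...;

      int end = Math.min( i + batchSize, rawRows.size() );
      for( int j = i; j < end; j++ ) {
        Map row = (Map) rawRows.get( j );

        // convert the raw row into a DataObject in the batch context
        DataObject obj = objectFromRawRow( batchContext, row );

        // traverse the object paths and write the report data
      }

      // drop the context; the batch objects become garbage
      batchContext = null;
    }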

        
    >For each item there would be traversals of the objects paths to get
    >associated data.
        
        I would think that some kind of prefetching would speed up this process -
        you would get a single query and a single result set. There is a long
        overdue task that we are planning to implement that would do that:
        http://sourceforge.net/pm/task.php?func=detailtask&project_task_id=50293&group_id=48132&group_project_id=18155
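
        To give an idea (a hypothetical API, nothing like this exists yet;
        "Painting" and "toArtist" are just example names from the demo
        model):

        SelectQuery q = new SelectQuery("Painting");

        // hypothetical: resolve the "toArtist" relationship in the
        // same SQL select via a join, so no extra queries are fired
        // when the report code navigates painting.getToArtist()
        q.addPrefetch("toArtist");

        List paintings = ctxt.performQuery(q);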
        
        
        Now we are back to memory consumption. Raw rows or DataObjects, there will
        always be a result set big enough to kill the JVM if read as a single list.
        It looks like for the queries that do in-memory report generation, paging
        over the open ResultSet is unavoidable. Now, we can cover it with a nice
        object wrapper of course. Say something like this (class ResultIterator
        below is a suggested implementation of this feature and is not currently in
        Cayenne):
        
        
        DataContext ctxt = ...;
        SelectQuery q = ...;
        ResultIterator it = null;

        // this would throw a checked exception, since unlike
        // "performQuery" we are dealing with an open result set
        try {
            it = ctxt.performPagedQuery(q);
            while (it.hasNextRow()) {
                Map row = it.nextRow();

                // since we are multithreaded, we can use a different
                // connection to fetch.
                // also, behind the scenes this can be done using paging,
                // and reading relationships one page at a time.
                // user code doesn't have to be aware of that
                Map relatedRow = it.relatedObject(entity, relName);

                // process report data
                // ....
            }
        }
        finally {
            // guard against the query failing before "it" is assigned
            if (it != null) {
                it.close();
            }
        }

    It would be nice to be able to set the page size (which I'm sure you
    omitted just for example's sake), but also to be able to get multiple
    rows back instead of one at a time. Also, since you are using a paging
    mechanism, you wouldn't have to use raw rows; you could just return the
    objects.

    it = ctxt.performPagedQuery( q, 100 ); // or performQuerySet( q, 100 )
    while( it.hasData() ) {
      // nextPage(), or nextSet() which uses the default size set above;
      // could also have nextSet( int )
      List list = it.nextPage();

      // loop through the list of objects and do what you need
    }
    it.close();

    As for being multithreaded, I think that should be an option, since
    you wouldn't need it in all cases.
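
    So maybe a family of method signatures, all hypothetical of course:

    // default page size, single threaded
    it = ctxt.performPagedQuery( q );

    // explicit page size
    it = ctxt.performPagedQuery( q, 100 );

    // opt in to fetching on a separate connection/thread
    it = ctxt.performPagedQuery( q, 100, true );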

        
        Hopefully with each iteration of the while loop, garbage collection will
        get rid of the Maps that are already processed, thus conserving memory.
        This implementation will require additional hooks in the DataNode that
        currently processes ResultSets, as well as some additions to the
        OperationObserver interface. I wouldn't give a date for this feature, but I
        am very interested in trying this out ASAP.
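
        For the sake of discussion, the ResultIterator interface itself
        could be as simple as this (again just a sketch, nothing like this
        is in Cayenne yet):

        public interface ResultIterator {

            // true if there are unread rows in the result set
            boolean hasNextRow() throws CayenneException;

            // reads the next row, transparently fetching the next
            // page when the current one is exhausted
            Map nextRow() throws CayenneException;

            // resolves a relationship for the current row,
            // possibly using a separate connection
            Map relatedObject(ObjEntity entity, String relName)
                throws CayenneException;

            // must be called in a finally block to release the
            // underlying ResultSet and connection
            void close() throws CayenneException;
        }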
        
        Comments?

    See above ;)

        
        Andrus
        
        

        


