Re: Really large fetches

From: Robert John Andersen (robert_anderse..ridge-point.com)
Date: Tue Jun 18 2002 - 23:06:44 EDT


    On Tue, 2002-06-18 at 00:52, Andrus wrote:

        At 10:48 PM 6/15/2002 -0400, Robert John Andersen wrote:
    >Basically it is a batch report that gets written to a file and then
    >downloaded to the user. To do this in WO we had to use raw rows (which is
    >something I wanted to mention as another feature)
        
        "Raw rows" is one of the "undocumented features" that exist already, though
        I'd put a more friendly API around it before showing it to the public.
        Anyway, for raw rows (returned as java.util.Map objects)
        org.objectstyle.cayenne.access.SelectOperationObserver can be used, and the
        queries should be executed bypassing DataContext:
        
        SelectQuery q = ...;
        DataDomain d = ...;

        // the observer collects the fetched rows as java.util.Map objects
        SelectOperationObserver observer = new SelectOperationObserver();
        d.performQuery(q, observer);
        List rawRows = observer.getResults(q);
        
        Of course, instead of one line of code it requires three and is not
        intuitive. I will hide it under the hood of DataContext within the
        next couple of days.
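
        Just to illustrate where this could go, the friendlier DataContext
        version might look something like this (the method name
        "performRawQuery" is purely hypothetical, nothing like it exists
        yet):

        SelectQuery q = ...;
        DataContext ctxt = ...;

        // hypothetical one-liner: the context would create the
        // observer internally and return the raw rows as Maps
        List rawRows = ctxt.performRawQuery(q);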
        
        
        
    >Once we had that data set, we iterated over it taking N objects at a time,
    >converting the raw row into an object and putting it into a new context
    >and then dropping the context once that set was done.
        
        But you would still read all raw rows at once?

    There are numerous reasons why we had to do it this way for now. It is
    not the best approach, but given the version we are using, the problems
    we had to work around, and the time frame we needed it in, this was the
    only option. And we will probably run into the memory problem again:
    the query we are using for testing has been qualified to limit the
    result set, but that result set is growing at a rate of at least
    several thousand rows per day, and we are not even at the point of
    peak data processing yet ;(.
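
    To sketch the batching we ended up with (simplified from memory, and
    "objectFromRawRow" is just a stand-in for our conversion code):

    List rawRows = ...;   // full result set fetched as raw rows
    int batchSize = 500;  // N objects per batch; the number is arbitrary

    for( int i = 0; i < rawRows.size(); i += batchSize ) {
      // fresh context per batch so processed objects can be collected
      DataContext batchContext = ...;

      int end = Math.min( i + batchSize, rawRows.size() );
      for( int j = i; j < end; j++ ) {
        Map row = (Map) rawRows.get( j );

        // convert the raw row into a DataObject in the batch context
        DataObject obj = objectFromRawRow( batchContext, row );

        // traverse the object paths and write the report data
      }

      // drop the context; the batch objects become garbage
      batchContext = null;
    }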

        
    >For each item there would be traversals of the objects paths to get
    >associated data.
        
        I would think that some kind of prefetching would speed up this process -
        you would get a single query and a single result set. There is a long
        overdue task that we are planning to implement that would do that:
        http://sourceforge.net/pm/task.php?func=detailtask&project_task_id=50293&group_id=48132&group_project_id=18155
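
        To give an idea (a hypothetical API, nothing like this exists yet;
        "Painting" and "toArtist" are just example names from the demo
        model):

        SelectQuery q = new SelectQuery("Painting");

        // hypothetical: resolve the "toArtist" relationship in the
        // same SQL select via a join, so no extra queries are fired
        // when the report code navigates painting.getToArtist()
        q.addPrefetch("toArtist");

        List paintings = ctxt.performQuery(q);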
        
        
        Now we are back to memory consumption. Raw rows or DataObjects, there will
        always be a result set big enough to kill the JVM if read as a single list.
        It looks like for the queries that do in-memory report generation, paging
        over the open ResultSet is unavoidable. Now, we can cover it with a nice
        object wrapper of course. Say something like this (class ResultIterator
        below is a suggested implementation of this feature and is not currently in
        Cayenne):
        
        
        DataContext ctxt = ...;
        SelectQuery q = ...;
        ResultIterator it = null;

        // this would throw a checked exception, since unlike
        // "performQuery" we are dealing with an open result set
        try {
            it = ctxt.performPagedQuery(q);
            while (it.hasNextRow()) {
                Map row = it.nextRow();

                // since we are multithreaded, we can use a different
                // connection to fetch.
                // also, behind the scenes this can be done using paging,
                // and reading relationships one page at a time.
                // user code doesn't have to be aware of that
                Map relatedRow = it.relatedObject(entity, relName);

                // process report data
                // ....
            }
        }
        finally {
            // guard against the query failing before "it" is assigned
            if (it != null) {
                it.close();
            }
        }

    It would be nice to be able to set the page size (which I'm sure you
    omitted just for example's sake), but also to be able to get multiple
    rows back instead of one at a time. Also, since you are using a paging
    mechanism, you wouldn't have to use raw rows; you could just return the
    objects.

    it = ctxt.performPagedQuery( q, 100 ); // or performQuerySet( q, 100 )
    while( it.hasData() ) {
      // nextPage(), or nextSet() which uses the default size set above;
      // could also have nextSet( int )
      List list = it.nextPage();

      // loop through the list of objects and do what you need
    }
    it.close();

    As for being multithreaded, I think that should be an option, since
    you wouldn't need it in all cases.
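
    So maybe a family of method signatures, all hypothetical of course:

    // default page size, single threaded
    it = ctxt.performPagedQuery( q );

    // explicit page size
    it = ctxt.performPagedQuery( q, 100 );

    // opt in to fetching on a separate connection/thread
    it = ctxt.performPagedQuery( q, 100, true );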

        
        Hopefully with each iteration of the while loop, garbage collection will
        get rid of the Maps that are already processed, thus conserving memory.
        This implementation will require additional hooks in the DataNode that
        currently processes ResultSets, as well as some additions to the
        OperationObserver interface. I wouldn't give a date for this feature, but I
        am very interested in trying this out ASAP.
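
        For the sake of discussion, the ResultIterator interface itself
        could be as simple as this (again just a sketch, nothing like this
        is in Cayenne yet):

        public interface ResultIterator {

            // true if there are unread rows in the result set
            boolean hasNextRow() throws CayenneException;

            // reads the next row, transparently fetching the next
            // page when the current one is exhausted
            Map nextRow() throws CayenneException;

            // resolves a relationship for the current row,
            // possibly using a separate connection
            Map relatedObject(ObjEntity entity, String relName)
                throws CayenneException;

            // must be called in a finally block to release the
            // underlying ResultSet and connection
            void close() throws CayenneException;
        }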
        
        Comments?

    See above ;)

        
        Andrus
        
        

        


