Re: Really large fetches

From: Andrus (andru..bjectstyle.org)
Date: Tue Jun 18 2002 - 00:52:56 EDT


    At 10:48 PM 6/15/2002 -0400, Robert John Andersen wrote:
    >Basically it is a batch report that gets written to a file and then
    >downloaded to the user. To do this in WO we had to use raw rows (which is
    >something I wanted to mention as another feature)

    "Raw rows" is one of the "undocumented features" that exist already, though
    I'd put a more friendly API around it before showing it to the public.
    Anyway, for raw rows (returned as java.util.Map objects)
    org.objectstyle.cayenne.access.SelectOperationObserver can be used, and the
    queries should be executed bypassing DataContext:

    SelectQuery q = ...;
    DataDomain d = ...;

    SelectOperationObserver observer = new SelectOperationObserver();
    d.performQuery(q, observer);
    List rawRows = observer.getResults(q);

    Of course, instead of one line of code this requires three and is not
    intuitive. I will hide it under the hood of DataContext within the next
    couple of days.
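
    Just to show the direction (the method name below is made up, nothing
    like it exists in DataContext yet), the wrapped call could be as
    simple as:

    // hypothetical one-liner returning raw rows as a List of Maps
    DataContext ctxt = ...;
    List rawRows = ctxt.performRawQuery(q);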

    >Once we had that data set, we iterated over it taking N objects at a time,
    >converting the raw row into an object and putting it into a new context
    >and then dropping the context once that set was done.

    But you would still read all raw rows at once?
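
    If I understand it correctly, the pattern is roughly the sketch below
    (the page size and the conversion helper are made up, just to show
    where the memory goes):

    // rough sketch of the batching described above; "objectFromRawRow"
    // is a made-up helper standing in for whatever conversion was used
    int pageSize = 500;
    for (int i = 0; i < rawRows.size(); i += pageSize) {
        DataContext batchContext = ...; // fresh context per batch
        int end = Math.min(i + pageSize, rawRows.size());
        for (int j = i; j < end; j++) {
            Map row = (Map) rawRows.get(j);
            DataObject obj = objectFromRawRow(batchContext, row);
            // ... traverse relationships, write report lines ...
        }
        // drop the context so the whole batch can be garbage collected
        batchContext = null;
    }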

    >For each item there would be traversals of the objects paths to get
    >associated data.

    I would think that some kind of prefetching would speed up this process -
    you would get a single query and a single result set. There is a
    long-overdue task on our list to implement exactly that:
    http://sourceforge.net/pm/task.php?func=detailtask&project_task_id=50293&group_id=48132&group_project_id=18155
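
    Whatever the final API ends up looking like, the usage I have in mind
    is something along these lines ("addPrefetch" and the entity and
    relationship names are placeholders):

    // sketch only: a prefetching query would resolve the relationship
    // via a join, so the related rows come back in the same result set
    // instead of a separate query per object
    DataContext ctxt = ...;
    SelectQuery q = new SelectQuery("Report");
    q.addPrefetch("toCustomer");
    List results = ctxt.performQuery(q);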

    Now we are back to memory consumption. Raw rows or DataObjects, there will
    always be a result set big enough to kill the JVM if it is read as a single
    list. It looks like for queries that do in-memory report generation, paging
    over an open ResultSet is unavoidable. Of course, we can cover it with a
    nice object wrapper. Say, something like this (the ResultIterator class
    below is a suggested API for this feature and is not currently in Cayenne):

    DataContext ctxt = ...;
    ResultIterator it = null;

    // "performPagedQuery" would throw a checked exception, since unlike
    // "performQuery" we are dealing with an open result set
    try {
        it = ctxt.performPagedQuery(q);
        while (it.hasNextRow()) {
            Map row = it.nextRow();

            // since we are multithreaded, we can use a different connection
            // to fetch related data; behind the scenes this can be done
            // using paging, reading relationships one page at a time -
            // user code doesn't have to be aware of that
            Map relatedRow = it.relatedObject(entity, relName);

            // process report data
            // ....
        }
    }
    finally {
        // "it" may still be null if performPagedQuery failed
        if (it != null) {
            it.close();
        }
    }

    Hopefully, with each iteration of the while loop, garbage collection will
    reclaim the Maps that have already been processed, thus conserving memory.
    This implementation will require additional hooks to the DataNode that
    currently processes ResultSets, as well as some additions to the
    OperationObserver interface. I wouldn't give a date for this feature, but I
    am very interested in trying this out ASAP.
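
    Pulling together the methods used in the example, the interface could
    be as small as this (just a sketch - the exception type and the
    argument types are up for discussion):

    public interface ResultIterator {
        // returns false when the underlying ResultSet is exhausted
        boolean hasNextRow() throws CayenneException;

        // reads the next row as a Map of column values
        Map nextRow() throws CayenneException;

        // resolves a related row, possibly over a separate connection
        Map relatedObject(ObjEntity entity, String relName) throws CayenneException;

        // releases the underlying ResultSet and connection
        void close() throws CayenneException;
    }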

    Comments?

    Andrus


