Re: Batch processing with large data sets

From: Juergen Saar (juerge..saar.org)
Date: Thu Aug 03 2006 - 02:37:16 EDT

    Hi Nikolai,

    please keep me informed ...
    I've tried a lot, but in all cases the reading DataContext ran out
    of memory once the tables had enough data.

    The problem with this root context is that it is not possible to throw it
    out and continue with a new one ...

    --- Juergen ---

    2006/8/3, Nikolai Raitsev <nikolai.raitse..mail.com>:
    >
    > Mike, you are very fast with answering :)
    >
    > Thank you!! I will try it out tomorrow
    >
    > Nikolai
    >
    > 2006/8/2, Mike Kienenberger <mkienen..mail.com>:
    > >
    > > I'm not quite answering your specific question, but one of the things
    > > people have done in the past is to throw out the old Cayenne
    > > DataContext periodically (every N objects written out) and
    > > create a new one.
    > >
    > > Also, you can use performIteratedQuery instead of performQuery to only
    > > fetch a limited number of records for processing at a time.
    > >
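The batch-and-discard pattern Mike describes can be sketched in plain Java, with a simple list standing in for the Cayenne DataContext (all names below are illustrative, not Cayenne API):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedCopy {

    // flush stands in for "commitChanges() then discard the DataContext":
    // pending rows are written to the destination and the batch is cleared,
    // so at most one batch worth of objects is held in memory at a time.
    static void flush(List<String> pending, List<String> destination) {
        destination.addAll(pending);
        pending.clear();
    }

    // Copy n synthetic rows in batches of batchSize; returns rows written.
    static int copyAll(int n, int batchSize) {
        List<String> destination = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pending.add("row-" + i);         // "createAndRegisterNewObject"
            if (pending.size() >= batchSize) {
                flush(pending, destination); // commit every N objects
            }
        }
        flush(pending, destination);         // final partial batch
        return destination.size();
    }

    public static void main(String[] args) {
        System.out.println(copyAll(10500, 1000)); // prints 10500
    }
}
```

The point is only that peak memory is bounded by the batch size rather than by the table size, which is what rotating DataContexts buys in the real Cayenne setting.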
    > > On 8/2/06, Nikolai Raitsev <nikolai.raitse..mail.com> wrote:
    > > > Hello all, I hope, this is my last question...:)
    > > >
    > > > Example:
    > > >
    > > > I would like to perform a copy from table 1 ("InterfaceTable") to
    > > > table 2 ("BaseTable").
    > > >
    > > > On both tables I have defined business objects (with Cayenne Modeler:
    > > > InterfaceObj and BaseObj).
    > > >
    > > > BaseObj has plausibility (validate) methods with which its attributes
    > > > are checked for correctness.
    > > >
    > > > The copying process runs like this:
    > > >
    > > > //open DataContext
    > > > dataContext = DataContext.createDataContext();
    > > >
    > > > //read InterfaceData
    > > > SelectQuery selQueryInterface = new SelectQuery(sClassNameInterface);
    > > > dataObjectsInInterface =
    > > >     new DataObjectList(dataContext.performQuery(selQueryInterface));
    > > >
    > > > //read BaseData
    > > > SelectQuery selQueryBasis = new SelectQuery(sClassNameBasis);
    > > > dataObjectsInBasis =
    > > >     new DataObjectList(dataContext.performQuery(selQueryBasis));
    > > >
    > > > int nSizeBasisTable = dataObjectsInBasis.size();
    > > > int nSize = dataObjectsInInterface.size();
    > > >
    > > > //transfer data
    > > > for (int i = 0; i < nSize; i++)
    > > > {
    > > >     dataObjectInt = (CayenneDataObject) dataObjectsInInterface.get(i);
    > > >
    > > >     if (nSizeBasisTable > 1) //here possible updates
    > > >     {
    > > >         //locate BaseObj with primary keys from dataObjectInt,
    > > >         //update if a BaseObj was found
    > > >         if (locateInBasisData())
    > > >         {
    > > >             dataObjectBasis.setValuesFromInterface(dataObjectInt);
    > > >         }
    > > >     }
    > > >     else //here inserts
    > > >     {
    > > >         dataObjectBasis = (MaDataObjectBasis)
    > > >             dataContext.createAndRegisterNewObject(sClassNameBasis);
    > > >         dataObjectBasis.setValuesFromInterface(dataObjectInt);
    > > >         countInsert++;
    > > >     }
    > > >
    > > >     //Validate Data
    > > >     try
    > > >     {
    > > >         ValidationResult validationResult = new ValidationResult();
    > > >         dataObjectBasis.validateForSave(validationResult);
    > > >         if (validationResult.hasFailures())
    > > >         {
    > > >             //do something with failures
    > > >         }
    > > >     }
    > > >     catch (ValidationException vex)
    > > >     {
    > > >     }
    > > > }
    > > >
    > > > //and now commitChanges
    > > > dataContext.setValidatingObjectsOnCommit(false);
    > > > dataContext.commitChanges();
    > > >
    > > > //end
    > > >
    > > > In the //read BaseData section all data from the "BaseTable" is
    > > > fetched up front so that locateInBasisData is fast (otherwise I
    > > > would need a select query on "BaseTable" for every object from
    > > > "InterfaceTable").
    > > >
    > > > With this procedure I can process 30,000 data records (in both
    > > > tables) in under 10 seconds, problem-free and without a large
    > > > memory footprint.
    > > >
    > > > but...
    > > >
    > > > In extreme cases, however, there can be far more records: 500,000,
    > > > 1,000,000 and so on.
    > > >
    > > > Of course I would like to use my procedure for small data sets as
    > > > well as large ones. So that memory does not run out with large
    > > > data sets, I need a way to cache the data locally in files.
    > > > Is that possible with Cayenne? I mean, is it possible to configure
    > > > Cayenne (a property?) so that its ObjectStore is swapped out to
    > > > files and works with those?
    > > >
    > > > I have read the tips here:
    > > > http://cwiki.apache.org/CAYDOC/performance-tuning.html, but they
    > > > are not a solution for very large datasets (memory can still run
    > > > out)...
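One of the tips on that page, Cayenne's paginated queries (SelectQuery.setPageSize), is aimed at exactly this: only the ids are fetched up front, and full objects are resolved one page at a time on first access. A plain-Java sketch of that lazy-page idea (illustrative names, not Cayenne API):

```java
import java.util.AbstractList;
import java.util.function.IntFunction;

// A list that knows its size up front but materializes full records
// only when a page is first touched -- the idea behind Cayenne's
// paginated result lists.
public class PagedList extends AbstractList<String> {
    private final int size;
    private final int pageSize;
    private final String[] resolved;         // materialized records
    private final IntFunction<String> fetch; // "fetch full row by index"
    int pagesFetched = 0;                    // for demonstration only

    public PagedList(int size, int pageSize, IntFunction<String> fetch) {
        this.size = size;
        this.pageSize = pageSize;
        this.resolved = new String[size];
        this.fetch = fetch;
    }

    @Override public int size() { return size; }

    @Override public String get(int index) {
        if (resolved[index] == null) {       // page fault: resolve whole page
            int start = (index / pageSize) * pageSize;
            int end = Math.min(start + pageSize, size);
            for (int i = start; i < end; i++) {
                resolved[i] = fetch.apply(i);
            }
            pagesFetched++;
        }
        return resolved[index];
    }
}
```

In the real Cayenne setting one would combine paging with the batch-and-discard approach, so that pages already processed can be garbage collected instead of accumulating in the context.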
    > > >
    > > > I hope my question is clear; if not, please mail me ;)
    > > >
    > > > Thanks to all,
    > > >
    > > > Nikolai
    > > >
    > > > P.S. I use a standard Java environment, not an app or web server
    > > >
    > > >
    > >
    >
    >



    This archive was generated by hypermail 2.0.0 : Thu Aug 03 2006 - 02:37:41 EDT