Re: Batch processing with large data sets

From: Nikolai Raitsev (nikolai.raitse..mail.com)
Date: Wed Aug 02 2006 - 18:02:07 EDT


    Mike, you are very fast at answering :)

    Thank you!! I will try it out tomorrow.

    Nikolai

    2006/8/2, Mike Kienenberger <mkienen..mail.com>:
    >
    > I'm not quite answering your specific question, but one thing
    > people have done in the past is to throw out the old Cayenne
    > DataContext periodically (after every N objects written out) and
    > create a new one.
    >
    > Also, you can use performIteratedQuery instead of performQuery to only
    > fetch a limited number of records for processing at a time.
    >
    > On 8/2/06, Nikolai Raitsev <nikolai.raitse..mail.com> wrote:
    > > Hello all, I hope this is my last question... :)
    > >
    > > Example:
    > >
    > > I would like to copy data from table 1 ("InterfaceTable") to table 2
    > > ("BaseTable").
    > >
    > > For both tables I have defined business objects (with Cayenne Modeler:
    > > InterfaceObj and BaseObj).
    > >
    > > BaseObj has plausibility (validate) methods that check its attributes
    > > for correctness.
    > >
    > > The copying process runs like this:
    > >
    > > //open DataContext
    > > dataContext = DataContext.createDataContext ();
    > >
    > > //read InterfaceData
    > > SelectQuery selQueryInterface = new SelectQuery(sClassNameInterface);
    > > dataObjectsInInterface = new DataObjectList(dataContext.performQuery
    > > (selQueryInterface));
    > >
    > > //read BaseData
    > > SelectQuery selQueryBasis = new SelectQuery(sClassNameBasis);
    > > dataObjectsInBasis = new DataObjectList(dataContext.performQuery
    > > (selQueryBasis));
    > >
    > >
    > > int nSizeBasisTable = dataObjectsInBasis.size();
    > >
    > > int nSize = dataObjectsInInterface.size();
    > >
    > > //transfer data
    > >
    > > for(int i = 0; i<nSize; i++)
    > > {
    > > dataObjectInt = (CayenneDataObject) dataObjectsInInterface.get(i);
    > >
    > > if(nSizeBasisTable > 1) //here possible updates
    > > {
    > > //locate BaseObj with primary keys from dataObjectInt,
    > > //update if a BaseObj was found
    > > if(locateInBasisData())
    > > {
    > > dataObjectBasis.setValuesFromInterface(dataObjectInt);
    > > }
    > > }
    > > else//here inserts
    > > {
    > > dataObjectBasis =
    > > (MaDataObjectBasis)dataContext.createAndRegisterNewObject(sClassNameBasis);
    > >
    > > dataObjectBasis.setValuesFromInterface(dataObjectInt);
    > > countInsert++;
    > > }
    > >
    > > //Validate Data
    > > try
    > > {
    > > ValidationResult validationResult = new ValidationResult();
    > >
    > > dataObjectBasis.validateForSave(validationResult);
    > > if(validationResult.hasFailures())
    > > {
    > > //do something with failures
    > > }
    > >
    > > }
    > > catch(ValidationException vex)
    > > {
    > >
    > > }
    > >
    > > }
    > >
    > >
    > > //and now commitChanges
    > > dataContext.setValidatingObjectsOnCommit(false);
    > > dataContext.commitChanges();
    > >
    > > //end
    > >
    > > In the section //read BaseData, all data from "BaseTable" is loaded so
    > > that locateInBasisData becomes fast (otherwise I would need a select
    > > query on "BaseTable" for every object from "InterfaceTable").
    > >
    > > With this procedure I can process 30,000 data records (in both
    > > tables) without problems in under 10 sec and without needing a large
    > > amount of memory.
    > >
    > > but...
    > >
    > > In extreme cases there can be far more data records: 500,000,
    > > 1,000,000… and so on.
    > >
    > > Of course, I would like to use my procedure for both small and large
    > > data sets. So that memory does not run out with large data sets, I
    > > need a way to cache the data locally in files.
    > > Is that possible with Cayenne? I mean, is it possible to configure
    > > Cayenne (via a property?) so that its ObjectStore is swapped out to
    > > files and works from there?
    > >
    > > I have read the tips here:
    > > http://cwiki.apache.org/CAYDOC/performance-tuning.html, but they are not a
    > > solution for very large data sets (memory can still run out)...
    > >
    > > I hope my question is clear; if not, please mail me ;)
    > >
    > > Thanks to all,
    > >
    > > Nikolai
    > >
    > > P.S. I use a standard environment, not an application or web server.
    > >
    > >
    >
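
The two suggestions in Mike's reply (read the source rows one at a time rather than as a full list, and replace the context after every N written objects) can be sketched roughly as follows. This is only an illustration of the batching pattern, not Cayenne code: `UnitOfWork`, `copyInBatches` and `BatchCopySketch` are hypothetical names invented here so the example runs without any library. In a real Cayenne program the iterator would presumably be fed by `performIteratedQuery`, the factory would call `DataContext.createDataContext()`, and `commit()` would map to `commitChanges()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class BatchCopySketch {

    /** Hypothetical stand-in for a unit of work such as a Cayenne DataContext. */
    public interface UnitOfWork<T> {
        void register(T row); // queue one row to be written
        void commit();        // flush everything queued in this unit of work
    }

    /**
     * Reads rows one at a time from the source and writes them through a
     * unit of work that is committed and then discarded after every
     * batchSize rows, so that at most batchSize rows are held at once.
     * Returns the number of units of work that were created.
     */
    public static <T> int copyInBatches(Iterator<T> source,
                                        Supplier<UnitOfWork<T>> contextFactory,
                                        int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int contextsUsed = 0;
        int pending = 0;
        UnitOfWork<T> context = null;

        while (source.hasNext()) {
            if (context == null) {
                // Start a fresh unit of work for the next batch.
                context = contextFactory.get();
                contextsUsed++;
            }
            context.register(source.next());
            pending++;

            if (pending == batchSize) {
                // Commit and drop the reference so the batch can be collected.
                context.commit();
                context = null;
                pending = 0;
            }
        }
        if (context != null) {
            // Commit the final, possibly smaller, batch.
            context.commit();
        }
        return contextsUsed;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<Integer> committedBatchSizes = new ArrayList<>();

        int contexts = copyInBatches(rows.iterator(), () -> new UnitOfWork<Integer>() {
            private final List<Integer> queued = new ArrayList<>();
            @Override public void register(Integer row) { queued.add(row); }
            @Override public void commit() { committedBatchSizes.add(queued.size()); }
        }, 4);

        System.out.println(contexts + " contexts, batch sizes " + committedBatchSizes);
    }
}
```

Note that this sketch covers only the streaming and periodic-commit part. The original procedure also loads all of "BaseTable" into memory for locateInBasisData, and that lookup would need its own strategy for very large tables.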



    This archive was generated by hypermail 2.0.0 : Wed Aug 02 2006 - 18:02:31 EDT