Re: Batch processing with large data sets

From: Mike Kienenberger (mkienen..mail.com)
Date: Wed Aug 02 2006 - 17:54:34 EDT

    I'm not quite answering your specific question, but one thing people
    have done in the past is to throw out the old Cayenne DataContext
    periodically (after every N objects written out) and create a new
    one.
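
    A minimal sketch of that rotation idea, in plain Java so it runs
    without Cayenne on the classpath. The Context interface, the
    RotatingBatch class and the copyAll method are hypothetical names
    standing in for the small part of a DataContext that the loop needs;
    they are not Cayenne API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Sketch: commit and discard the working context after every N objects. */
class RotatingBatch {

    /** Hypothetical stand-in for the part of a DataContext used here. */
    interface Context {
        void register(Object row); // stage one object for writing
        void commit();             // flush everything staged so far
    }

    /**
     * Writes all rows, replacing the context after every batchSize rows so
     * that the old context and the objects it holds become unreachable.
     * Returns the number of contexts that were created.
     */
    static int copyAll(Iterator<?> rows, Supplier<Context> newContext, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int contextsUsed = 0;
        int inBatch = 0;
        Context ctx = null;
        while (rows.hasNext()) {
            if (ctx == null) {           // start a fresh context lazily
                ctx = newContext.get();
                contextsUsed++;
            }
            ctx.register(rows.next());
            if (++inBatch == batchSize) {
                ctx.commit();
                ctx = null;              // drop the reference to the old context
                inBatch = 0;
            }
        }
        if (ctx != null) {
            ctx.commit();                // final, partially filled batch
        }
        return contextsUsed;
    }

    public static void main(String[] args) {
        List<Integer> commitSizes = new ArrayList<>();
        Supplier<Context> factory = () -> new Context() {
            private int staged = 0;
            public void register(Object row) { staged++; }
            public void commit() { commitSizes.add(staged); }
        };
        int used = copyAll(List.of(1, 2, 3, 4, 5, 6, 7).iterator(), factory, 3);
        System.out.println(used + " contexts, commit sizes " + commitSizes);
    }
}
```

    With a real DataContext, the supplier would create a new context and
    commit would map to commitChanges(); the batch size is something to
    tune against the available heap.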

    Also, you can use performIteratedQuery instead of performQuery to only
    fetch a limited number of records for processing at a time.
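
    The shape of an iterated read can be sketched as below. This is
    plain Java under stated assumptions, not the Cayenne API: RowCursor,
    IteratedCopy, forEachRow and cursorOver are hypothetical names that
    model a result you pull one row at a time and must close when done,
    which is the general pattern an iterated query gives you.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch: handle rows one at a time instead of materializing a full list. */
class IteratedCopy {

    /** Hypothetical stand-in for an iterated query result. */
    interface RowCursor extends AutoCloseable {
        boolean hasNext();
        Map<String, Object> next();
        @Override
        void close(); // narrowed so callers need no checked-exception handling
    }

    /**
     * Feeds every row to the handler and always closes the cursor, even if
     * the handler throws. Returns the number of rows handled.
     */
    static long forEachRow(RowCursor cursor, Consumer<Map<String, Object>> handler) {
        long count = 0;
        try (RowCursor c = cursor) {
            while (c.hasNext()) {
                handler.accept(c.next()); // only the current row is referenced here
                count++;
            }
        }
        return count;
    }

    /** In-memory cursor, used only to exercise the loop without a database. */
    static RowCursor cursorOver(List<Map<String, Object>> rows) {
        Iterator<Map<String, Object>> it = rows.iterator();
        return new RowCursor() {
            public boolean hasNext() { return it.hasNext(); }
            public Map<String, Object> next() { return it.next(); }
            public void close() { /* nothing to release for an in-memory list */ }
        };
    }
}
```

    Closing the cursor matters with a real database because an open
    iterated result typically holds a connection until it is released.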

    On 8/2/06, Nikolai Raitsev <nikolai.raitse..mail.com> wrote:
    > Hello all, I hope, this is my last question...:)
    >
    > Example:
    >
    > I would like to perform a copy from table 1 ("InterfaceTable") to table 2
    > ("BaseTable").
    >
    > For both tables I have defined business objects with Cayenne Modeler
    > (InterfaceObj and BaseObj).
    >
    > BaseObj has plausibility (validate) methods that check its attributes
    > for correctness.
    >
    > The copying process runs like this:
    >
    > //open DataContext
    > dataContext = DataContext.createDataContext ();
    >
    > //read InterfaceData
    > SelectQuery selQueryInterface = new SelectQuery(sClassNameInterface);
    > dataObjectsInInterface = new DataObjectList(dataContext.performQuery
    > (selQueryInterface));
    >
    > //read BaseData
    > SelectQuery selQueryBasis = new SelectQuery(sClassNameBasis);
    > dataObjectsInBasis = new DataObjectList(dataContext.performQuery
    > (selQueryBasis));
    >
    >
    > int nSizeBasisTable = dataObjectsInBasis.size();
    >
    > int nSize = dataObjectsInInterface.size();
    >
    > //transfer data
    >
    > for(int i = 0; i<nSize; i++)
    > {
    > dataObjectInt = (CayenneDataObject) dataObjectsInInterface.get
    > (i);
    >
    > if(nSizeBasisTable > 1) //here possible updates
    > {
    > //locate BaseObj by the primary keys from dataObjectInt; update it if found
    > if(locateInBasisData())
    > {
    > dataObjectBasis.setValuesFromInterface(dataObjectInt);
    > }
    > }
    > else//here inserts
    > {
    > dataObjectBasis =
    > (MaDataObjectBasis)dataContext.createAndRegisterNewObject(sClassNameBasis) ;
    >
    > dataObjectBasis.setValuesFromInterface(dataObjectInt);
    > countInsert++;
    > }
    >
    > //Validate Data
    > try
    > {
    > ValidationResult validationResult = new ValidationResult();
    >
    > dataObjectBasis.validateForSave(validationResult);
    > if(validationResult.hasFailures())
    > {
    > //do something with failures
    > }
    >
    > }
    > catch(ValidationException vex)
    > {
    >
    > }
    >
    > }
    >
    >
    > //and now commitChanges
    > dataContext.setValidatingObjectsOnCommit(false);
    > dataContext.commitChanges();
    >
    > //end
    >
    > The //read BaseData section loads all data from "BaseTable" up front so
    > that locateInBasisData is fast (otherwise I would need a select query on
    > "BaseTable" for every object from "InterfaceTable").
    >
    > With this procedure I can process 30,000 data records (in both tables)
    > without problems in under 10 seconds and without needing much memory.
    >
    > but...
    >
    > In extreme cases there can be far more data records: 500,000, 1,000,000,
    > and so on.
    >
    > Of course, I would like to use my procedure for both small and large
    > data sets. So that memory does not run out with large data sets, I need
    > a way to cache the data locally in files.
    > Is that possible with Cayenne? I mean, is it possible to configure
    > Cayenne (with a property?) so that its ObjectStore is written out to
    > files and it works with those?
    >
    > I have read the tips here:
    > http://cwiki.apache.org/CAYDOC/performance-tuning.html, but they are not
    > a solution for very large data sets (where memory can run out)...
    >
    > I hope my question is clear; if not, please mail me ;)
    >
    > Thanks to all,
    >
    > Nikolai
    >
    > P.S. I use a standard environment, not an app or web server.
    >
    >


