Batch processing with large data sets

From: Nikolai Raitsev (nikolai.raitse..mail.com)
Date: Wed Aug 02 2006 - 17:46:08 EDT


    Hello all, I hope this is my last question... :)

    Example:

    I would like to copy data from table 1 ("InterfaceTable") to table 2
    ("BaseTable").

    For both tables I have defined business objects with Cayenne Modeler
    (InterfaceObj and BaseObj).

    BaseObj has plausibility (validate) methods that check its attributes for
    correctness.
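
    For illustration, such a plausibility method looks roughly like this (a
    simplified sketch, not my real code; the "name" attribute and the message
    are only examples, and the imports assume the Cayenne 1.2 package names):

    import org.objectstyle.cayenne.CayenneDataObject;
    import org.objectstyle.cayenne.validation.SimpleValidationFailure;
    import org.objectstyle.cayenne.validation.ValidationResult;

    //sketch only - normally BaseObj would extend the superclass generated
    //by the Modeler, not CayenneDataObject directly
    public class BaseObj extends CayenneDataObject
    {
        public void validateForSave(ValidationResult validationResult)
        {
            super.validateForSave(validationResult);

            //made-up plausibility check on an example "name" attribute
            String name = (String) readProperty("name");
            if (name == null || name.trim().length() == 0)
            {
                validationResult.addFailure(
                    new SimpleValidationFailure(this, "name must not be empty"));
            }
        }
    }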

    The copying process runs like this:

    //open DataContext
    dataContext = DataContext.createDataContext();

    //read InterfaceData
    SelectQuery selQueryInterface = new SelectQuery(sClassNameInterface);
    dataObjectsInInterface = new DataObjectList(dataContext.performQuery(selQueryInterface));

    //read BaseData
    SelectQuery selQueryBasis = new SelectQuery(sClassNameBasis);
    dataObjectsInBasis = new DataObjectList(dataContext.performQuery(selQueryBasis));

    int nSizeBasisTable = dataObjectsInBasis.size();

    int nSize = dataObjectsInInterface.size();

    //transfer data
    for (int i = 0; i < nSize; i++)
    {
        dataObjectInt = (CayenneDataObject) dataObjectsInInterface.get(i);

        if (nSizeBasisTable > 0) //here possible updates
        {
            //locate the BaseObj with the primary keys from dataObjectInt,
            //update it if a BaseObj was found
            if (locateInBasisData())
            {
                dataObjectBasis.setValuesFromInterface(dataObjectInt);
            }
        }
        else //here inserts
        {
            dataObjectBasis =
                (MaDataObjectBasis) dataContext.createAndRegisterNewObject(sClassNameBasis);

            dataObjectBasis.setValuesFromInterface(dataObjectInt);
            countInsert++;
        }

        //validate data
        try
        {
            ValidationResult validationResult = new ValidationResult();

            dataObjectBasis.validateForSave(validationResult);
            if (validationResult.hasFailures())
            {
                //do something with the failures
            }
        }
        catch (ValidationException vex)
        {
            //handle/log validation errors
        }
    }

    //and now commitChanges
    dataContext.setValidatingObjectsOnCommit(false);
    dataContext.commitChanges();

    //end

    In the //read BaseData section all data from the "BaseTable" is loaded up
    front so that locateInBasisData is fast (otherwise I would need a separate
    select query against "BaseTable" for every object from "InterfaceTable").

    With this procedure I can process 30,000 records (in both tables) in under
    10 seconds without problems and without needing much memory.

    but...

    In extreme cases there can be far more records: 500,000, 1,000,000 and so
    on.

    Of course, I would like to use my procedure for small data sets as well as
    for large ones. So that memory does not run out with large data sets, I
    need a way to cache the data locally in files.
    Is that possible with Cayenne? I mean, is it possible to configure Cayenne
    (via a property?) so that its ObjectStore is swapped out to files and
    Cayenne works from there?

    I have read the tips here:
    http://cwiki.apache.org/CAYDOC/performance-tuning.html, but they are not a
    solution for very large data sets (memory can still run out)...
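
    The closest I have come with those tips is a chunked variant along these
    lines (a rough sketch, not tested - the page size, chunk size and the
    helper createOrUpdateBaseObj are placeholders, and the imports assume the
    Cayenne 1.2 package names):

    import java.util.Iterator;
    import java.util.List;

    import org.objectstyle.cayenne.CayenneDataObject;
    import org.objectstyle.cayenne.access.DataContext;
    import org.objectstyle.cayenne.query.SelectQuery;

    DataContext dataContext = DataContext.createDataContext();

    //paginated query: unread pages are held as ids only, not full objects
    SelectQuery selQueryInterface = new SelectQuery(sClassNameInterface);
    selQueryInterface.setPageSize(1000);

    List interfaceObjects = dataContext.performQuery(selQueryInterface);

    int processed = 0;
    for (Iterator it = interfaceObjects.iterator(); it.hasNext();)
    {
        CayenneDataObject dataObjectInt = (CayenneDataObject) it.next();

        //placeholder for the insert/update and validation logic above
        createOrUpdateBaseObj(dataObjectInt);

        //commit in chunks so uncommitted changes do not pile up
        if (++processed % 1000 == 0)
        {
            dataContext.commitChanges();
        }
    }
    dataContext.commitChanges();

    But as far as I can tell the ObjectStore still keeps the already processed
    objects in memory, which is why I am asking whether it can be backed by
    files instead.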

    I hope my question is clear; if not, please mail me ;)

    Thanks to all,

    Nikolai

    P.S. I use a standalone environment, not an app or web server.


