On 23/07/2007, at 11:29 PM, Jean-Paul Le Fèvre wrote:
> Hi,
>
> I'm trying to import a fairly large amount of data into my database.
> The input is an XML file describing more than 10 million objects,
> each with tens of attributes. The application parses the input file,
> creates the Cayenne objects and commits the changes if requested.
We've just written something quite similar. We tested it with
something like 500,000 objects across several tables.
> As you can imagine, I'm having difficulty avoiding out-of-memory
> errors. At this point I'm still unable to load my big input file.
As a starting point, I hope you are using a SAX parser for your XML,
so that the whole document is never held in memory at once.
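
A minimal sketch of what streaming the file with SAX could look like,
using only the JDK's built-in parser. The element name "object" and the
class names are assumptions made for illustration; the real importer
would build one persistent object per element instead of counting.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Handles elements one at a time as the parser streams past them;
// nothing from earlier elements is retained.
class ObjectCountingHandler extends DefaultHandler {
    long count = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        // "object" is a hypothetical element name; substitute the real one.
        if ("object".equals(qName)) {
            count++;
            // A real importer would create one persistent object
            // from attrs here and hand it to the current context.
        }
    }
}

class SaxImportSketch {
    // Parses the stream with the JDK's default SAX parser and
    // returns how many <object> elements were seen.
    static long countObjects(InputStream in) throws Exception {
        ObjectCountingHandler handler = new ObjectCountingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<objects><object id='1'/><object id='2'/></objects>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countObjects(in));
    }
}
```

For a real import the ByteArrayInputStream would be replaced by a
FileInputStream on the input file; the handler itself stays the same.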
>
> To figure out what is happening I'm monitoring the application's
> behavior with jconsole. My tactic is the following: every 10000
> objects (this number is a parameter) I call rollbackChanges() or
> commitChanges().
We committed each object individually (or sometimes a logical group
of objects). That way we had very fine control, and a validation
error cost us only that particular record (or small group). I'm not
sure that committing in batches like this helps.
>
> When I run the program in rollback mode, the memory used oscillates
> between a min and a max value as expected: after each rollback the
> garbage collector is free to clean up the memory.
>
> But in commit mode the amount of memory keeps on increasing and the
> application fails eventually.
Probably because the context continues to fill up with the objects
you are committing; they aren't discarded after the commit. Try
creating a new context (and dropping the old one so the gc can clean
it up) after every couple of thousand records.
In our situation we were able to get away with 256 MB of RAM on the
client (we are running this as ROP) and 512 MB of RAM on the server
(most of which appears to be used by Derby).
Ari
-------------------------->
Aristedes Maniatis
phone +61 2 9660 9700
PGP fingerprint 08 57 20 4B 80 69 59 E2 A9 BF 2D 48 C2 20 0C C8