Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

From: Andrey Razumovsky (razumovsky.andre..mail.com)
Date: Mon May 25 2009 - 05:38:44 EDT

  • Next message: stefcl: "Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery"

    I'm using latest postgres (or mysql) driver. To make a test, I put 1M
    records in db table, then do "select * from that_table". No matter which
    fetch size I tried to set, at the moment of PreparedStastement.execute()
    something about 70-100Mb are allocated (I don't actually remember the
    quantity right now). So, if heap size is small, I get OutOfMemory.
    Otherwise, it's all ok and rows can be quickly iterated. Looking at code on
    postgres site, the only difference is that I didn't bother with autocommit
    mode. I'm not sure Cayenne does, too... Maybe this is the reason why I
    didn't get positive results.

    Andrey

    2009/5/25 Andrus Adamchik <andru..bjectstyle.org>

    > [Forwarding to the dev list. Stéphane, please forward your replies here as
    > well or subscribe to dev if you feel like it]
    >
    > Stéphane, thanks for providing this information. It is encouraging.
    >
    > The mismatch in the results between you and Andrey is exactly why I wanted
    > to do real profiling, as we need to see how the memory allocation changes in
    > the process. OutOfMemory may or may not happen depending on many factors.
    > What we care about here is the dynamics of ResultSet reading.
    >
    > I see that Andrey has already committed this code to trunk, so now we have
    > something to play with. If you don't mind using a trunk build (doing it
    > yourself or grabbing one from Hudson), you can give us the feedback on the
    > implementation.
    >
    > Andrus
    >
    >
    > Begin forwarded message:
    >
    >> From: Stéphane Claret <stefatwor..mail.com>
    >> Date: May 25, 2009 11:58:52 AM GMT+03:00
    >> To: Andrus Adamchik <andru..bjectstyle.org>
    >> Subject: Re: Big selects on PostGres : Configuring
    >> Statement.setFetchSize() of a selectquery
    >>
    >> Hello,
    >>
    >> To answer, setting fetchSize with postgres does make a difference.
    >> If you consider the following code copy-pasted from my test case:
    >>
    >> PreparedStatement stmt =
    >> sqlMapper.getCurrentConnection().prepareStatement("SELECT
    >> itemID,sellingprice,margin,popularityrating,name,att1,att2,att3,att4,att5,longtailpos,timest
    >> FROM itemdetails WHERE profileID= 10 ORDER BY longtailpos");
    >>
    >> stmt.setFetchSize(1000);
    >> stmt.execute();
    >> ResultSet set = stmt.executeQuery();
    >>
    >> int i = 0;
    >>
    >>
    >> while( set.next())
    >> {
    >> i++;
    >> System.out.println(set.getString(1));
    >> }
    >>
    >>
    >> It can iterates through approx 1'500'000 rows without any memory issues
    >> (java process stays at approx 20mo while the loop executes). I can also see
    >> that it takes no more than a few seconds before the first row is printed in
    >> the output console, meaning that results are still being fetched from the DB
    >> at the moment the resultset is accessed.
    >>
    >> Now if I comment the line "stmt.setFetchSize(1000);" in the above code,
    >> execution stops at "ResultSet set = stmt.executeQuery();" for approx 15
    >> seconds and then an OutOfMemoryException is thrown from JDBC code. Also in
    >> cayenne code, that would be before the resultset is even accessed (making
    >> ResultIerator useless).
    >>
    >> A similar problem is exposed here, with a sample stacktrace :
    >>
    >> http://www.nabble.com/java.sql.Statement-generates-java.lang.OutOfMemoryError-in-big-tabe-td21568763.html
    >>
    >> And the reason is detailed here :
    >> http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
    >>
    >> Perhaps other drivers that automatically use server side cursors aren't
    >> affected or just ignore this setting but I can say for sure that this
    >> setting is necessary with postGres.
    >>
    >> Andrey, you say you keep getting OutOfMemory exceptions no matter which
    >> value you set. Do you get these exceptions while iterating through the
    >> results or during the call to executeQuery?
    >> If you get it while looping through the resultset, it could mean that
    >> somehow the rows objects aren't garbage collected, it can happen with
    >> cayenne version 2.0 due to the dataContext holding strong references to the
    >> dataObjects. I didn't want to write a fix for v3 at first, but I realized
    >> that in v2, I had to manually unregister the dataObjects from the context in
    >> order to keep the memory usage in safe areas.
    >>
    >> If I can be of any help, please ask.
    >> Best regards...
    >>
    >> Stéphane
    >>
    >>
    >> 2009/5/25 Andrus Adamchik <andru..bjectstyle.org>
    >>
    >> Somehow I am not surprised. I may also try that on Oracle when the code
    >> becomes available on trunk.
    >>
    >> (BTW not sure that Stephane is subscribed to the dev list, so cc'ying this
    >> message.)
    >>
    >> Andrus
    >>
    >> On May 24, 2009, at 12:52 PM, Andrey Razumovsky wrote:
    >> I should say my tests on Postgres and mysql didn't show any results. If I
    >> use small heap size, i get OutOfMemory no matter which fetch size was set.
    >> Fetch speed and memory usage are the same. Seems drivers just ignore this
    >> parameter. Stephane, did your workaround help you?
    >>
    >> Drivers of hsql, postres, mysql didn't throw any exceptions.. Maybe other
    >> DBMS drivers work well with this param. I'm going to commit today or
    >> tomorrow. Nothing will break anything if we add this ability.
    >>
    >> Andrey
    >>
    >> 2009/5/21 Andrus Adamchik <andru..bjectstyle.org>
    >>
    >> Cool :-)
    >>
    >> On May 21, 2009, at 3:59 PM, Andrey Razumovsky wrote:
    >>
    >> Now when I know of this JDBC feature, I'd prefer to have it in Cayenne
    >> sooner that later. We would also want to have it for other queries than
    >> just
    >> only SelectQuery.
    >>
    >>
    >> Absolutely. This has to be a part of the QueryMetadata on the backend. On
    >> the frontend any query that can potentially select data should have a
    >> corresponding setter.
    >>
    >> Could someone help me testing it against different types
    >> of database if I commit?
    >>
    >>
    >> I can test it on almost all DB's that we support. Of course we should have
    >> Cayenne unit tests that will provide regression (i.e. driver XYZ doesn't
    >> throw UnsupportedOperationException when we call a corresponding JDBC
    >> method).
    >>
    >> In addition to that I'd like to see if there's really memory/speed savings
    >> when using that (i.e. is it really worth it). For that I suggest writing a
    >> JDBC test outside of Cayenne, that can be run in profiler against
    >> different
    >> DB's.
    >>
    >> Andrus
    >>
    >>
    >>
    >>
    >



    This archive was generated by hypermail 2.0.0 : Mon May 25 2009 - 05:39:32 EDT