Fwd: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Mon May 25 2009 - 05:22:17 EDT

  • Next message: stefcl: "Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery"

    [Forwarding to the dev list. Stéphane, please forward your replies
    here as well or subscribe to dev if you feel like it]

    Stéphane, thanks for providing this information. It is encouraging.

    The mismatch in the results between you and Andrey is exactly why I
    wanted to do real profiling, as we need to see how the memory
    allocation changes in the process. OutOfMemory may or may not happen
    depending on many factors. What we care about here is the dynamics of
    ResultSet reading.

    I see that Andrey has already committed this code to trunk, so now we
    have something to play with. If you don't mind using a trunk build
    (doing it yourself or grabbing one from Hudson), you can give us the
    feedback on the implementation.

    Andrus

    Begin forwarded message:
    > From: Stéphane Claret <stefatwor..mail.com>
    > Date: May 25, 2009 11:58:52 AM GMT+03:00
    > To: Andrus Adamchik <andru..bjectstyle.org>
    > Subject: Re: Big selects on PostGres : Configuring
    > Statement.setFetchSize() of a selectquery
    >
    > Hello,
    >
    > To answer, setting fetchSize with postgres does make a difference.
    > If you consider the following code copy-pasted from my test case:
    >
    > PreparedStatement stmt =
    > sqlMapper.getCurrentConnection().prepareStatement("SELECT
    > itemID
    > ,sellingprice
    > ,margin
    > ,popularityrating,name,att1,att2,att3,att4,att5,longtailpos,timest
    > FROM itemdetails WHERE profileID= 10 ORDER BY longtailpos");
    >
    > stmt.setFetchSize(1000);
    > stmt.execute();
    > ResultSet set = stmt.executeQuery();
    >
    > int i = 0;
    >
    >
    > while( set.next())
    > {
    > i++;
    > System.out.println(set.getString(1));
    > }
    >
    >
    > It can iterates through approx 1'500'000 rows without any memory
    > issues (java process stays at approx 20mo while the loop executes).
    > I can also see that it takes no more than a few seconds before the
    > first row is printed in the output console, meaning that results are
    > still being fetched from the DB at the moment the resultset is
    > accessed.
    >
    > Now if I comment the line "stmt.setFetchSize(1000);" in the above
    > code, execution stops at "ResultSet set = stmt.executeQuery();" for
    > approx 15 seconds and then an OutOfMemoryException is thrown from
    > JDBC code. Also in cayenne code, that would be before the resultset
    > is even accessed (making ResultIerator useless).
    >
    > A similar problem is exposed here, with a sample stacktrace :
    > http://www.nabble.com/java.sql.Statement-generates-java.lang.OutOfMemoryError-in-big-tabe-td21568763.html
    >
    > And the reason is detailed here :
    > http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
    >
    > Perhaps other drivers that automatically use server side cursors
    > aren't affected or just ignore this setting but I can say for sure
    > that this setting is necessary with postGres.
    >
    > Andrey, you say you keep getting OutOfMemory exceptions no matter
    > which value you set. Do you get these exceptions while iterating
    > through the results or during the call to executeQuery?
    > If you get it while looping through the resultset, it could mean
    > that somehow the rows objects aren't garbage collected, it can
    > happen with cayenne version 2.0 due to the dataContext holding
    > strong references to the dataObjects. I didn't want to write a fix
    > for v3 at first, but I realized that in v2, I had to manually
    > unregister the dataObjects from the context in order to keep the
    > memory usage in safe areas.
    >
    > If I can be of any help, please ask.
    > Best regards...
    >
    > Stéphane
    >
    >
    > 2009/5/25 Andrus Adamchik <andru..bjectstyle.org>
    > Somehow I am not surprised. I may also try that on Oracle when the
    > code becomes available on trunk.
    >
    > (BTW not sure that Stephane is subscribed to the dev list, so
    > cc'ying this message.)
    >
    > Andrus
    >
    > On May 24, 2009, at 12:52 PM, Andrey Razumovsky wrote:
    > I should say my tests on Postgres and mysql didn't show any results.
    > If I
    > use small heap size, i get OutOfMemory no matter which fetch size
    > was set.
    > Fetch speed and memory usage are the same. Seems drivers just ignore
    > this
    > parameter. Stephane, did your workaround help you?
    >
    > Drivers of hsql, postres, mysql didn't throw any exceptions.. Maybe
    > other
    > DBMS drivers work well with this param. I'm going to commit today or
    > tomorrow. Nothing will break anything if we add this ability.
    >
    > Andrey
    >
    > 2009/5/21 Andrus Adamchik <andru..bjectstyle.org>
    >
    > Cool :-)
    >
    > On May 21, 2009, at 3:59 PM, Andrey Razumovsky wrote:
    >
    > Now when I know of this JDBC feature, I'd prefer to have it in Cayenne
    > sooner that later. We would also want to have it for other queries
    > than
    > just
    > only SelectQuery.
    >
    >
    > Absolutely. This has to be a part of the QueryMetadata on the
    > backend. On
    > the frontend any query that can potentially select data should have a
    > corresponding setter.
    >
    > Could someone help me testing it against different types
    > of database if I commit?
    >
    >
    > I can test it on almost all DB's that we support. Of course we
    > should have
    > Cayenne unit tests that will provide regression (i.e. driver XYZ
    > doesn't
    > throw UnsupportedOperationException when we call a corresponding JDBC
    > method).
    >
    > In addition to that I'd like to see if there's really memory/speed
    > savings
    > when using that (i.e. is it really worth it). For that I suggest
    > writing a
    > JDBC test outside of Cayenne, that can be run in profiler against
    > different
    > DB's.
    >
    > Andrus
    >
    >
    >



    This archive was generated by hypermail 2.0.0 : Mon May 25 2009 - 05:22:58 EDT