Paging and SQL queries

From: Aristedes Maniatis (ar..sh.com.au)
Date: Fri May 25 2007 - 01:11:31 EDT

  • Next message: Andrus Adamchik: "Re: unsubscribe"

    We have recently been testing our 3 tier Cayenne app with a larger
    number of records and finding some performance problems with scale.
    In particular we have a test database with 60,000 student records
    (the production db will have 120,000). We are using paging (of 25
    records) to ensure that when a user brings up the list of students on
    the client we don't load them all across the network. However we are
    still seeing load times of around 40 seconds. At the moment we are
    using objects, not DataRows on the client.

    One part of the problem appears to occur on the server. The SQL
    Cayenne generates looks like this:

    SELECT * FROM Student t0 WHERE (t0.isDeleted IS NULL) OR
    (t0.isDeleted = 0) ORDER BY t0.firstName

    although the * is actually a list of every field in the table. We
    timed this query against the db and even directly without Cayenne it
    takes 16 seconds to return all the data. I presume the next 24
    seconds Cayenne spends parsing the results into DataRows or Objects.
    Instead, if we executed:

    SELECT t0.id FROM Student t0 WHERE (t0.isDeleted IS NULL) OR
    (t0.isDeleted = 0) ORDER BY t0.firstName

    Then it returns in about 2 seconds (when performing this query
    directly against the db and not through Cayenne).

    Q1: So, my first question is why is Cayenne doing this? If paging is
    switched on, shouldn't Cayenne be fetching only the primary keys and
    then faulting in the particular records it needs in full?

    We can see paging working with the query which Cayenne executes next:

    SELECT * t0 WHERE ((t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR
    (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id
    = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR
    (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id
    = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR
    (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?)) AND
    ((t0.isDeleted IS NULL) OR (t0.isDeleted = ?)) [bind: 18569, 18521,
    18253, 18060, 17932, 16451, 16395, 15967, 15819, 14773, 14492, 14317,
    14280, 13088, 12464, 11183, 10957, 10378, 10219, 9686, 9554, 9435,
    9155, 8611, 8038, 0]

    25 records are paged into Cayenne as expected. But Cayenne has
    already fetched into memory (server-side) all the records in the
    original query.

    Q2: Would it more more efficient from a db perspective to use the
    "t0.id IS IN (?, ?, ?...)" style syntax here instead or does it make
    no difference? I don't know enough about SQL optimisation to know
    either way, but at least the SQL is shorter (which might be good!).

    Cheers
    Ari Maniatis

    -------------------------->
    ish
    http://www.ish.com.au
    Level 1, 30 Wilson Street Newtown 2042 Australia
    phone +61 2 9550 5001 fax +61 2 9550 4001
    GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A





    This archive was generated by hypermail 2.0.0 : Fri May 25 2007 - 01:12:17 EDT