We have recently been testing our 3 tier Cayenne app with a larger
number of records and finding some performance problems with scale.
In particular we have a test database with 60,000 student records
(the production db will have 120,000). We are using paging (of 25
records) to ensure that when a user brings up the list of students on
the client we don't load them all across the network. However we are
still seeing load times of around 40 seconds. At the moment we are
using objects, not DataRows on the client.
One part of the problem appears to occur on the server. The SQL
Cayenne generates looks like this:
SELECT * FROM Student t0 WHERE (t0.isDeleted IS NULL) OR
(t0.isDeleted = 0) ORDER BY t0.firstName
although the * is actually a list of every field in the table. We
timed this query against the db and even directly without Cayenne it
takes 16 seconds to return all the data. I presume the next 24
seconds Cayenne spends parsing the results into DataRows or Objects.
Instead, if we executed:
SELECT t0.id FROM Student t0 WHERE (t0.isDeleted IS NULL) OR
(t0.isDeleted = 0) ORDER BY t0.firstName
Then it returns in about 2 seconds (when performing this query
directly against the db and not through Cayenne).
Q1: So, my first question is why is Cayenne doing this? If paging is
switched on, shouldn't Cayenne be fetching only the primary keys and
then faulting in the particular records it needs in full?
We can see paging working with the query which Cayenne executes next:
SELECT * t0 WHERE ((t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR
(t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id
= ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR
(t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id
= ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR
(t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?)) AND
((t0.isDeleted IS NULL) OR (t0.isDeleted = ?)) [bind: 18569, 18521,
18253, 18060, 17932, 16451, 16395, 15967, 15819, 14773, 14492, 14317,
14280, 13088, 12464, 11183, 10957, 10378, 10219, 9686, 9554, 9435,
9155, 8611, 8038, 0]
25 records are paged into Cayenne as expected. But Cayenne has
already fetched into memory (server-side) all the records in the
original query.
Q2: Would it more more efficient from a db perspective to use the
"t0.id IS IN (?, ?, ?...)" style syntax here instead or does it make
no difference? I don't know enough about SQL optimisation to know
either way, but at least the SQL is shorter (which might be good!).
Cheers
Ari Maniatis
-------------------------->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001 fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
This archive was generated by hypermail 2.0.0 : Fri May 25 2007 - 01:12:17 EDT