Use of DISTINCT in "to-many" qualifiers [Was: Multiple flattened relationships...]

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Tue May 13 2003 - 12:28:03 EDT

  • Next message: Laszlo Spoor: "Re: Use of DISTINCT in "to-many" qualifiers [Was: Multiple flattened relationships...]"

    Laszlo,

    I kind of expected such result, but your message prompted me to look for
    an alternative solution. I did some research on various queries
    performance on Oracle. In my case there are 2 tables with to-many from one
    to another, both having around 100000 to 200000 rows. I tried running 3
    flavors of the same query, directly on Oracle, with no JDBC involved:

    1. joining on to-many with no DISTINCT
    2. joining on to-many with DISTINCT
    3. Using correlated subquery with EXISTS

    (1) was the fastest - 2 seconds in my case, but produced incorrect results
    (duplicate rows), (2) was the slowest - 4 seconds. (3) was in between (1)
    and (2) - 3 to 4 seconds. So using correlated subquery with EXISTS may
    yield to faster queries in some cases. Though I believe that playing with
    indexes may alter the results dramatically either way.

    We may look into adding this optimization in the future. It will have to
    be optional though, since many databases do not support correlated
    subqueries and EXISTS (MySQL prior to 4.*).

    Andrus

    > Hi Andrus,
    >
    > About the 'DISTINCT' part that I would write you back about.
    >
    > To be sure I formulated my case well, I had to look it up in a couple of
    > guides. As you will notice I will use the Oracle database as the
    > datastore, simply because I have the most experience with this database.
    >
    > I'd got confirmed that the most important reason not to use DISTINCT is
    > the fact that is forces the database to do a 'SORT UNIQUE' which can
    > cost more disk reads and more processing time when reading the results.
    >
    > This information comes from the 'Oracle 9i Performance and Tuning Guide,
    > p2-14):
    > The distinct part results in a SORT-UNIQUE action... SORT UNIQUE occurs
    > if a user specifies a DISTINCT clause (Example 2û15)or if an operation
    > requires unique values for the next step (Example 2û16). Example 2û15
    > DISTINCT Clause Causing SORT UNIQUE
    >
    > SELECT DISTINCT last_name, first_name
    > FROM per_all_people_f
    > WHERE full_name LIKE :b1;
    >
    > Plan
    > --------------------------------------------------
    > SELECT STATEMENT
    > SORT UNIQUE
    > TABLE ACCESS BY INDEX ROWID PER_ALL_PEOPLE_F
    > INDEX RANGE SCAN PER_PEOPLE_F_N54
    >
    > Another, not technical, reason is that in general it is a bad practice
    > to use
    > DISTINCT, because the query writer 'admits' that his 'where-logic' is
    > not strong
    > enough to return the results that are required. However, I don't quite
    > think this is 100% valid for Cayenne, because Cayenne can not always
    > tell if that is
    > the case.
    >
    > Still I do feel that the developer can have the option leaving the
    > distinct out
    > by using query.setDistinct(false), because it gives somewhat more space
    > to tuning.
    >
    > Regards, Laszlo
    >
    > _________________________________________________________________
    > MSN 8 with e-mail virus protection service: 2 months FREE*
    > http://join.msn.com/?page=features/virus



    This archive was generated by hypermail 2.0.0 : Tue May 13 2003 - 12:28:03 EDT