[LONG] reasons why "advanced" expressions are needed (Was: [Newbie] Advanced expressions)

From: Giulio Cesare Solaroli (slrgcs..bn-italy.com)
Date: Tue Sep 30 2003 - 13:18:19 EDT

Next message: Andrus Adamchik: "Re: DataContext delegate?"

Previous message: Giulio Cesare Solaroli: "Re: [Newbie] Advanced expressions"
In reply to: Andrus Adamchik: "Re: [Newbie] Advanced expressions"
Next in thread: Andrus Adamchik: "Re: DataContext delegate?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Monday, Sep 29, 2003, at 18:52 Europe/Rome, Andrus Adamchik wrote:

> Hi,
>
> I would really like to see many things that you suggested in Cayenne
> (and don't really object to the reset :-)). Per your discussion with
> Mike, in any possible implementation, the main issue to address is
> cross-db compatibility. This can be done in Cayenne, since Query +
> Expression are abstractions translated to SQL using
> org.objectstyle.cayenne.access.QueryTranslator. Subclass of
> QueryTranslator can be returned by DbAdapter, so e.g. EXISTS can be
> substituted by WHERE count(1) > 0. Default implementation should
> normally use the least common denominator.
>
> On Monday, September 29, 2003, at 05:41 AM, Giulio Cesare Solaroli
> wrote:
>> In EOF we have written a few qualifiers (in Cayenne they are called
>> expressions, if I get it right) like:
>> - empty relationship;
>> - not empty relationship;
>> - no match for value;
>
> I guess we need to define expression semantics first, and then see how
> the SQL translation can be done. Something like Expression.IS_EMPTY,
> Expression.IS_NOT_EMPTY?
>
>
>> - match all values;
>
> IN Expression already takes care of that.

Here are a few examples I hope can illustrate how a naive
implementation of the above expressions can lead to differences between
the common logic and the achieved results.

Let me try to use a simplified auction scenario; everything is fantasy
here, so don't take it too seriously. It is just for illustration
purposes.

EMPTY RELATIONSHIP EXPRESSION
-------------------------------
Suppose we have a "Item" entity and "Bid"; "Bid" has a to-one
relationship to "Item" ("to_item") and "Item" has a to-many
relationship with "Bid" ("to_bids").

Now, suppose we want to find all the items that have no bids; What I
would like to do is write something like this:

        expression = EmptyRelationshipExpression("to_bids");
        query = new SelectQuery(Item.class, expression);
        result = dataContext.performQuery(query);

This should generate the following SQL statement:

      select t0.id_item, t0....
      from item t0
        where not exists (select t1.id_bid
                         from t1 bid
                         where t0.id_item = t1.id_item)

This problem could be much simpler if a simple to-one relationship (or
list of to-one relationships) had to be followed in order to define the
expression. In these situations, a "simple" noMatchExp would be
perfectly fine, but I think that delegating to the user the choice of
which expression to use is plain wrong. To the user the semantic of the
two expressions are just the same; the problem is that, due to how the
entities are related to each other (and how the tables are arranged on
the DB) two differente solutions needs to be used. But this is a task
that the code should take care of, as it has all the elements to
correctly choose the right way to build the SQL statement.

NOT EMPTY RELATIONSHIP
------------------------
Almost like the empty relationship qualifier, but in the SQL statement
the where clause looks like "where exists (....)".

MATCH ALL VALUES
------------------
> IN Expression already takes care of that.

I have not checked deeply what SQL will be generated by the IN
Expression, but I guess it will fail to produce the "right" statements
if a to-many relationship is involved.

Example: let's take the structure described above and add a "User"
entity.

Item <-->> Bid <<--> User

The "Bid" has a to-one relationship to the "User" entity ("to_user");
the "User" entity has a to-many relationship to the "Bid" entity
("to_bids").

Now, suppose I am interested in the following expression: "all items
that have bids done by User_A and User_B"

The code I would like to write is something like this:

        users = new Vector();
        users.add(User_A);
        users.add(User_B);
        expression = MatchAllValuesExpression("to_bids.to_user", users);
        query = new SelectQuery(Item.class, expression);
        result = dataContext.performQuery(query);

This should generate the following SQL statement:

      select t0.id_item, t0....
      from item t0, bid t1, bid t2, user t3, user t4
        where t0.id_item = t1.id_item
        and t1.id_user = t3.id_user
        and t3.id_user = <id User_A>
        and t0.id_item = t2.id_item
        and t2.id_user = t3.id_user
        and t3.id_user = <id User_B>

As you can see, in oder to achieve the same semantic that the
expression above implies, here we need to "duplicate" the "to_bids"
relationship, in order to generate a set of distinct joins for every
value of the values array.

The problem gets a little bit tougher if many to-many relationships are
involved in the path to the information I want to check.
Let's add one more entity, "Appraisal", with two to-one relationship to
the "User" entity, one for the author of the appraisal, "to_author",
and the other to the subject of the appraisal "to_subject", both with
the relevant inverse to-many relationships ("to_published_appraisal"
and "to_about_appraisal").

| |<- to_item ---| |--- to_user ->| |<------- to_author
----------| |------- to_subject ------>| |
| Item | | Bid | | User |
| Appraisal | | User |
| |-- to_bids ->>| |<<- to_bids --| |--
to_published_appraisal -->| |<<-- to_about_appraisal --|
|

I hope this "sketch" makes the arrangement clear enough.

Now, I want get the "items with bids done by users that have received
an appraisal posted by User_A and User_B"
(this could sound quite artificial, and it may even be, but this same
condition happened to us solving a real problem, otherwise I would have
never found this problem in the first place).

The code looks quite the same:

        users = new Vector();
        users.add(User_A);
        users.add(User_B);
        expression =
MatchAllValuesExpression("to_bids.to_user.to_about_appraisal.to_author",
  users);
        query = new SelectQuery(Item.class, expression);
        result = dataContext.performQuery(query);

This COULD (and beware, could, as there is an hidden ambiguity in the
following solution) generate the following SQL statement:

      select t0.id_item, t0....
      from item t0, bid t1, bid t2, user t3, user t4, appraisal t5,
appraisal t6, user t7, user t8
        where t0.id_item = t1.id_item
        and t1.id_user = t3.id_user
        and t3.id_user = t5.id_subject
        and t5.id_author = t7.id_user
        and t7.id_user = <id User_A>
        and t0.id_item = t2.id_item
        and t2.id_user = t4.id_user
        and t4.id_user = t6.id_subject
        and t6.id_author = t8.id_user
        and t8.id_user = <id User_B>

Here the whole relationships followed by the expression have been
duplicated. But the results will not be quite right; can you see why?

The problem is that the above query finds "items with bids done by
users that have received an appraisal posted by User_A AND items with
bids done by users that have received an appraisal posted by User_B".
This is a completely different question!

How should the right SQL statement look like? (I leave the same table
aliases as in the previous query to simplify the comparison)

      select t0.id_item, t0....
      from item t0, bid t1, user t3, appraisal t5, appraisal t6, user
t7, user t8
        where t0.id_item = t1.id_item
        and t1.id_user = t3.id_user
        and t3.id_user = t5.id_subject
        and t5.id_author = t7.id_user
        and t7.id_user = <id User_A>
        and t3.id_user = t6.id_subject
        and t6.id_author = t8.id_user
        and t8.id_user = <id User_B>

Can you spot the difference? Basically, you should avoid to split the
to-many relationships until the Appraisal entity.
This means that the original code does not give enough information to
the library to decide how to behave. This because both the above
queries are reasonable and can be useful in different contextes.

Now the code could be changed to add the new information:

        users = new Vector();
        users.add(User_A);
        users.add(User_B);
        expression =
MatchAllValuesExpression("to_bids.to_user.to_about_appraisal.to_author",
  users);
        expression.dontSplitUpTo("to_bids.touser");
        query = new SelectQuery(Item.class, expression);
        result = dataContext.performQuery(query);

This syntax is quite random, but should give an idea of what is needed.
In this case, the default implementation of the expression could try to
split the relationship on the first to-many relationship found
following the given path. If the dontSplitUpTo method is invoiced, the
expression will split the first to-many relationship found AFTER the
path passed in the method call.

Have I been clear enough? Can you see my point?

NO MATCH FOR VALUE
--------------------
Here the situation si almost like the "empty relationship" situation,
but instead of having a simple joing inside of the not exists where
clause, you can have any expression.

I avoid further description and examples as this post is getting too
long, but I could explain further if needed.

>> [...]

I hope this LONG message have shown you my point.

Now the main question is: which is the best place, given Cayenne
current architecture, to place the logic to handle these cases?
I will be very pleased to "migrate" our current EOF/Objective-C
implementation into Cayenne if this is achievable with a reasonable
effort.

Thank you very much for you patience.

Giulio Cesare Solaroli

Next message: Andrus Adamchik: "Re: DataContext delegate?"
Previous message: Giulio Cesare Solaroli: "Re: [Newbie] Advanced expressions"
In reply to: Andrus Adamchik: "Re: [Newbie] Advanced expressions"
Next in thread: Andrus Adamchik: "Re: DataContext delegate?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2.0.0 : Tue Sep 30 2003 - 13:18:03 EDT