|
|
|
I would like to see a use case where this makes sense to do. I have a counterexample that may indicate that we should not do this:
Let's consider an example where we have bugs and products, where each bug may relate to many products, and each product may have many bugs, thereby providing the many-to-many relationship. We then have tables bugs, products, and bugs_products, where bugs_products is the intersection table between bugs and products. Irrelevant pseudo-schema information omitted for brevity: CREATE TABLE products ( product_id PRIMARY KEY ); CREATE TABLE bugs ( bug_id PRIMARY KEY ); CREATE TABLE bugs_products ( bug_id REFERENCES bugs(bug_id), product_id REFERENCES products(product_id), PRIMARY KEY(bug_id, product_id) ); Given this structure, it should be impossible that there exists more than one identical row in the bugs_products table. When is it okay that the intersection table has multiple identical records? Even if such cases exist and are not "edge" cases, why should the framework component automatically perform DISTINCT? I for one would want to know if my database tables did not comply with normalization My feeling is that such automagical behavior is dangerous because it hides problems in the underlying database. Maybe there are compelling reasons to do this, but I'm not seeing it yet? I came across the problem with this scenario:
CREATE TABLE article( article_id PRIMARY KEY ); CREATE TABLE users ( user_id PRIMARY KEY ); CREATE TABLE comments ( comment_id PRIMARY KEY article_id REFERENCES bugs(bug_id), user_id REFERENCES products(product_id), comment ); In this scenario, each user may comment more than once on an article, but if I want to create a list of all articles that a given user has commented on, then I wouldn't want multiple article records in the result set. (I've probably designed the database incorrectly !) Given that I've got this schema, I'd be happy if I could choose to have findManyToManyRowset() use "distinct" situations like this. Thanks, Rob, for illustrating the problem you are trying to solve! This helps a lot.
I do not see a problem with your schema at all, and clearly you need the ability to select the distinct rows to solve the problem of creating the list. My concern is that we likely should not make retrieving distinct rows an implementation detail that cannot be overridden, but I agree that we should at least provide the ability to get distinct rows, in order to be able to solve problems like this, which probably are quite common. We're marking this one as "won't fix" for the following reasons:
Group By, Having, Order By, etc are useful too, but to me are clearly enhancements rather than a case that the function doesn't actually work as expected.
array_unique doesn't work on an array of arrays, unless you are thinking of another function? Even then, you'd still end up with an array of arrays rather than a Zend_Db_Rowset which is more useful, so I don't think that's a very practical solution. The only solution is to forget about the Zend_Db_Table relationship code completely and write your own code using Zend_Db_Select. Obviously I disagree with marking this as "won't fix". Adding $select->distinct() to the current findManyToManyRowset() doesn't actually break the usage of the function for the schema that Darby pointed out. If it ever did "hide" a problem, the problem it is hiding is that the database engine doesn't handle the concept of a primary key! I don't think that this is enough of a reason for making findManyToManyRowset() useless for a large set of use-cases. i.e. I can't actually think of a use-case where you would ever want the current functionality of findManyToManyRowset(). Does anyone ever want the set of destination records to include duplicates? Even the examples in the documentation exhibit this problem as the code in the examples doesn't produce the expected result if the same user reports (or is assigned to) more than one bug against a given product. i.e. in example 9.73, $productsRowset contains the same product multiple times if a user ever reports more than one bug against the same product. Ditto for example 9.74 and 9.75. (in http://framework.zend.com/manual/en/zend.db.table.relationships.html Clearly, you don't want to add an additional parameter to the function signature (which I'm fine with!), so I think that the distinct should be enforced as it doesn't break anything and will ensure that the dataset returned matches the user's expectations. I would suggest that the findManyToManyRowset() currently provides expected default behavior, since it provides all matches through the intersection table. The fundamental difference between the example I provided and the example you provided is that in your example includes multiple matching records in the intersection table, and rightfully so.
Of course, array_unique() isn't exactly what you need, and you may have to write a little code to get the desired functional effect of creating a list of distinct article rows where each identified article has at least one comment authored by the user of the queried source row. I disagree that the only solution is to forget about the relationship code. The above example is easily workable with a foreach and an array. Maybe there are other ways to solve it, too. But yes, I think using Zend_Db_Select is also a reasonable solution. Let's consider then, making DISTINCT the default behavior. If we were to do this, then how to solve the following problem? /Given a user row, how to use findManyToManyRowset() to get the articles on which the user has commented, along with a count of how many times the user commented on each article?/ Of course, the current implementation provides the solution already. But if we were to change the default behavior to DISTINCT, how now does a user with the above requirement solve the problem using the same method? You have not lost at all - and I appreciate your good sense of humor.
$articlesCommentCount = array();
foreach ($userRow->findManyToManyRowset(...) as $articleRow) {
++$articlesCommentCount[$articleRow->id];
}
// $articlesCommentCount contains the number of comments that the user has posted, indexed per article ID
This provides the extra information that would be lost by using DISTINCT and hopefully also solves the problem you presented. What do you think? Deadpan British humour FTW
The basics of the foreach() approach work fine if you don't have too many records to go through. I'd hate to do it on an 8000 odd result set! The main problem is that the subsequent foreach() in the view now iterates over an array rather than a Zend_Db_Rowset so instead of doing: foreach($this->articles as $article) { echo "<li>" . $article->title . "</li>\n"; } you do: foreach($this->articles as $article) { echo "<li>" . $article['title'] . "</li>\n"; } which breaks the convention we have here that "all iterations over database records are over objects". Hence you end up putting the type in to the variable name ($this->articleArray) which is just "messy" in my opinion! However, now that you've mentioned ORDER BY, I'm interested! I've realised that the order I get the results out in only works cos the id is broadly equivalent to date added, but alphabetically would be nice too! Ditto LIMIT would be useful for paging... Now we are clearly in "feature-request land"! I'll have to do some serious thinking on the best way to do this sort of stuff. I can see two obvious routes: 1. adding $select as a new last parameter to the function call so that the user can set up the select object first before the function does it's bit. Or, 2. follow Zend_Db_Controller_Front's lead with setParam() to allow setting stuff for the object independently of the actual function call. There's probably other, better, ways once we all start mulling over problem too! To me, the prevailing reason not to make DISTINCT be the default behavior of the many-to-many query is performance: to reduce the result set to DISTINCT entries requires the RDBMS to sort the result set, and this can incur a performance hit depending on the RDBMS brand. So that why I'd leave the default as non-distinct (or indistinct?).
One user on the fw-db mailing list said that using GROUP BY performs better than DISTINCT, but I wouldn't use that because it relies on nonstandard MySQL behavior. To get good performance, one would group by the primary key of the destination table, so as to take advantage of the primary key index. But that would leave out the other columns from the destination table, even though they are listed in the select-list. This is permitted in MySQL, but not in standard SQL, so it would break usage in some other RDBMS brands. Can we postpone this enhancement (as well as ordering, limiting, etc.) to a future version? I'd like to slow down the feature expansion in Zend_Db_Table until we can get our hands around the current functionality. I can imagine an interface for declaring a "template" Zend_Db_Select object to use in Table and Row methods. But this feature deserves to go through the proposal process. It should be that we can implement it without breaking backward compatibility, so we could do it post-ZF1.0. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
Index: Abstract.php
===================================================================
— Abstract.php (revision 4462)
+++ Abstract.php (working copy)
@@ -657,7 +657,8 @@
$select = $db->select()
->from(array('i' => $interName), array())
+ ->join(array('m' => $matchName), $joinCond, '*')
+ ->distinct();
$callerMap = $this->_prepareReference($intersectionTable, $this->_getTable(), $callerRefRule);