ZF-5554: records missed in returned when doing a search with two or more strings with AND
Description
I have found cases that some records has missed in return to a search with two or more strings with AND criteria (even using generic search API); I've traced into the code and found the following facts:
- the score of a MultiTerm search becomes zero (the terms when searched separately can return results, i.e. score > 0)
- some of the $this->_termsFreqs[$termId][$docId] using in the for loop of member function "_conjunctionScore" is null and caused $reader->getSimilarity()->tf($this->_termsFreqs[$termId][$docId]) [MultiTerm.php line 473] to return 0
- the "used" $docsFilter created in [MultiTerm.php line 330] in member function "_calculateConjunctionResult" caused the last loop for the termFreqs failed to be assigned values
As I have only limited time for reverse engine, I can't fully understand the use of "DocsFilter" at the moment, I am wondering whether this can be corrected by simply modify the member function as follows:
private function _calculateConjunctionResult(Zend_Search_Lucene_Interface $reader)
{
$this->_resVector = null;
if (count($this->_terms) == 0) {
$this->_resVector = array();
}
// Order terms by selectivity
$docFreqs = array();
$ids = array();
foreach ($this->_terms as $id => $term) {
$docFreqs[] = $reader->docFreq($term);
$ids[] = $id; // Used to keep original order for terms with the same selectivity and omit terms comparison
}
array_multisort($docFreqs, SORT_ASC, SORT_NUMERIC,
$ids, SORT_ASC, SORT_NUMERIC,
$this->_terms);
$docsFilter = new Zend_Search_Lucene_Index_DocsFilter();
foreach ($this->_terms as $termId => $term) {
$termDocs = $reader->termDocs($term, $docsFilter);
}
// Treat last retrieved docs vector as a result set
// (filter collects data for other terms)
$this->_resVector = array_flip($termDocs);
$docsFilter = new Zend_Search_Lucene_Index_DocsFilter(); // <---------------------- added to re-initialize the $docsFilter
foreach ($this->_terms as $termId => $term) {
$this->_termsFreqs[$termId] = $reader->termFreqs($term, $docsFilter);
}
}
Comments
Posted by Sjoerd van Noort (seven007) on 2009-05-29T02:06:01.000+0000
Reinitialising the DocsFilter as mentioned here solves the Undefined offset notice from issue ZF-5545 for me. Although I don't understand how/if this influences search result. It seems to me, although I don't understand the issue fully, that with this fix the function $reader->termFreqs has to redo some work because it does not have the Docfilter data from the previous $reader->termDoc calls