ZF-5554: Records missing from results when searching for two or more strings combined with AND

Description

I have found cases where some records are missing from the results of a search for two or more strings combined with AND (even when using the generic search API). I traced through the code and found the following:

  1. The score of a MultiTerm search becomes zero, even though each term returns results (score > 0) when searched separately.
  2. Some of the $this->_termsFreqs[$termId][$docId] values used in the for loop of the member function "_conjunctionScore" are null, which causes $reader->getSimilarity()->tf($this->_termsFreqs[$termId][$docId]) [MultiTerm.php line 473] to return 0.
  3. The already "used" $docsFilter created in the member function "_calculateConjunctionResult" [MultiTerm.php line 330] prevents the last loop from assigning values to the termFreqs.
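
To make fact 2 concrete, here is a minimal, self-contained sketch. It is not the Zend code: tf() and conjunctionScore() are hypothetical stand-ins, and I am assuming a square-root term-frequency factor and a score that sums the per-term contributions (weights, norms and boost are left out). It only shows how missing frequency entries drive the combined score to zero.

```php
<?php
// Hypothetical stand-in for the similarity's tf(); assumes a square-root
// term-frequency factor, so a missing (null) frequency contributes 0.
function tf($freq): float
{
    return sqrt((float) $freq);
}

// Toy conjunction score for one document (assumption: a plain sum of the
// per-term tf values). $termsFreqs has the same shape as
// $this->_termsFreqs[$termId][$docId].
function conjunctionScore(array $termsFreqs, int $docId): float
{
    $score = 0.0;
    foreach ($termsFreqs as $termId => $freqs) {
        // An entry that was never assigned is read as null.
        $freq = isset($freqs[$docId]) ? $freqs[$docId] : null;
        $score += tf($freq);
    }
    return $score;
}

// Document 7 contains both terms, and both frequencies were assigned.
$complete = array(0 => array(7 => 4), 1 => array(7 => 9));

// Same document, but the frequency entries were never filled in.
$missing = array(0 => array(), 1 => array());

$scoreComplete = conjunctionScore($complete, 7);
$scoreMissing  = conjunctionScore($missing, 7);
```

With the complete data the document gets a positive score. With the missing entries it scores zero, even though it matches both terms.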

As I have only limited time for reverse engineering, I don't fully understand the use of "DocsFilter" at the moment. I am wondering whether this can be corrected by simply modifying the member function as follows:

private function _calculateConjunctionResult(Zend_Search_Lucene_Interface $reader)
{
    $this->_resVector = null;

    if (count($this->_terms) == 0) {
        $this->_resVector = array();
    }

    // Order terms by selectivity
    $docFreqs = array();
    $ids      = array();
    foreach ($this->_terms as $id => $term) {
        $docFreqs[] = $reader->docFreq($term);
        $ids[]      = $id; // Used to keep original order for terms with the same selectivity and omit terms comparison
    }
    array_multisort($docFreqs, SORT_ASC, SORT_NUMERIC,
                    $ids,      SORT_ASC, SORT_NUMERIC,
                    $this->_terms);

    $docsFilter = new Zend_Search_Lucene_Index_DocsFilter();
    foreach ($this->_terms as $termId => $term) {
        $termDocs = $reader->termDocs($term, $docsFilter);
    }
    // Treat last retrieved docs vector as a result set
    // (filter collects data for other terms)
    $this->_resVector = array_flip($termDocs);

    // Added: re-initialize $docsFilter before collecting the term frequencies
    $docsFilter = new Zend_Search_Lucene_Index_DocsFilter();
    foreach ($this->_terms as $termId => $term) {
        $this->_termsFreqs[$termId] = $reader->termFreqs($term, $docsFilter);
    }
}

Comments

Reinitialising the DocsFilter as described here resolves the "Undefined offset" notice from issue ZF-5545 for me, although I don't understand how, or whether, this influences the search results. It seems to me, although I don't fully understand the issue, that with this fix $reader->termFreqs has to redo some work, because it no longer has the DocsFilter data collected by the previous $reader->termDocs calls.