ZF-7665: MultiSearcher gives a document id out of range error

Description

Using Zend_Search_Lucene_Interface_MultiSearcher gives an error when the find() function is used. The reason is that find() searches through each index and increments each result's document id based on its position in the combined results. Commenting out that for loop in Zend/Search/Lucene/MultiSearcher.php (lines 467 to 471) fixes the error. Although this feature was in the 1.8 release, is it production ready? (There is no mention of it in the documentation, and the namespacing doesn't follow the coding standards.)
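The failure mode described above can be reproduced in isolation. The following is only a minimal sketch: `FakeIndex` and `FakeHit` are hypothetical stand-ins, not the Zend classes, and it assumes that a hit resolves its document against the index it came from.

```php
<?php
// Hypothetical stand-in for a single index; NOT Zend_Search_Lucene.
class FakeIndex
{
    private $docs;

    public function __construct(array $docs)
    {
        $this->docs = $docs;
    }

    public function count()
    {
        return count($this->docs);
    }

    public function getDocument($id)
    {
        // Ids are only valid inside this index's own range.
        if ($id < 0 || $id >= count($this->docs)) {
            throw new OutOfRangeException('Document id is out of the range.');
        }
        return $this->docs[$id];
    }
}

// Hypothetical stand-in for a query hit that keeps a link to its index.
class FakeHit
{
    public $id;
    private $index;

    public function __construct(FakeIndex $index, $id)
    {
        $this->index = $index;
        $this->id = $id;
    }

    public function getDocument()
    {
        // The id is looked up in the hit's own index, not in a combined one.
        return $this->index->getDocument($this->id);
    }
}

$first  = new FakeIndex(array('a', 'b', 'c'));
$second = new FakeIndex(array('x', 'y'));

$hit = new FakeHit($second, 1);     // valid: $second holds ids 0 and 1
$unshifted = $hit->getDocument();   // resolves inside $second

$hit->id += $first->count();        // id becomes 4, as in a combined numbering
try {
    $hit->getDocument();
    $failed = false;
} catch (OutOfRangeException $e) {
    $failed = true;                 // 4 is outside $second's range
}
```

In this sketch the lookup only succeeds while the hit keeps the id it was given by its own index.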

Comments

Line 476 (ZF 1.9.6) $indexShift += $index->count();

Should be changed to something like this: $indexShift += count($hits);

The id offset should be incremented by the number of hits, not by the complete size of the index. The above worked for me. Removing the loop might create duplicate ids when using multiple indexes in different locations.

"$indexShift += count($hits);" is still wrong, but it will work if you have hits in only one of the indexes. As far as I can see, id is the internal id of the document in its index and cannot be changed in any way, or you will get this error. Each hit has a link to its index, so you can retrieve the document from the right index. You can determine which index a hit came from by comparing $hit->getIndex() with your indexes, or by adding an unindexed field holding the entity type to all your documents. I have a field "entity" to determine which kind of result a hit is.
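As a rough illustration of the "entity" field approach, the sketch below groups hits by that field. `groupHitsByEntity` is a hypothetical helper, the hits are plain `stdClass` objects rather than real query hits, and it assumes the stored field is readable as a property of each hit.

```php
<?php
// Hypothetical helper: groups hits by the value of their "entity" property.
// $hits may be any list of objects exposing a public "entity" property.
function groupHitsByEntity(array $hits)
{
    $groups = array();
    foreach ($hits as $hit) {
        $groups[$hit->entity][] = $hit;
    }
    return $groups;
}

// Stand-in hits for illustration only; the values are made up.
function makeEntityHit($entity, $score)
{
    $hit = new stdClass();
    $hit->entity = $entity;
    $hit->score = $score;
    return $hit;
}

$groups = groupHitsByEntity(array(
    makeEntityHit('article', 0.9),
    makeEntityHit('comment', 0.6),
    makeEntityHit('article', 0.3),
));
```

Here the document ids are never touched; the kind of result is read from the hit itself.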

I don't know how to add a patch here (and I didn't actually patch, I extended the class), but here is my find function without that bug, which also merges the results to preserve sorting by score. It may be buggy as I just wrote it, but it seems to work.

public function find($query)
{
    if (count($this->_indices) == 0) {
        return array();
    }

    $hitsList = array();

    foreach ($this->_indices as $index) {
        $hits = $index->find($query);
        if (count($hits) > 0) { //Keep only non-empty result lists
            $hitsList[] = $hits;
        }
    }

    $partsCount = count($hitsList);
    //No results in any index
    if ($partsCount == 0) {
        return array();
    } elseif ($partsCount == 1) { //Only one index in results
        return $hitsList[0];
    }

    /** Merging arrays with results to preserve sorting
     *  (something like mergesort with any count of parts)
     */
    $result = array();
    $values = array();//Max values of score in all arrays

    foreach ($hitsList as $index => $hits) {
        $values[$index] = $hits[0]->score;
    }

    do {
        $max = reset($values);
        $maxkey = key($values);
        foreach ($values as $index => $value) {
            if ($value > $max) {
                $max = $value;
                $maxkey = $index;
            }
        }
        //Now we know which list holds the best score
        $result[] = array_shift($hitsList[$maxkey]);
        if (empty($hitsList[$maxkey])) {
            unset($values[$maxkey]);
        } else {
            $values[$maxkey] = reset($hitsList[$maxkey])->score;
        }
    } while(!empty($values));
    return $result;
}

This is still an issue in 1.11.7.

I extended Zend_Search_Lucene_Interface_MultiSearcher and overrode the find method with the code below. This is a stripped-down merge sort.


public function find($query)
{
    if (count($this->_indices) == 0) {
        return array();
    }

    $hitsList = array();
    foreach ($this->_indices as $index) {
        $hits = $index->find($query);

        $hitsList = $this->merge($hits, $hitsList);
    }

    return $hitsList;
}

protected function merge(&$leftList, &$rightList)
{
    $results = array();
    while(0 < count($leftList) && 0 < count($rightList)) {
        // Each list is sorted by descending score, so take the higher head first
        if ($leftList[0]->score >= $rightList[0]->score) {
            $results[] = array_shift($leftList);
        } else {
            $results[] = array_shift($rightList);
        }
    }

    $results = count($leftList) > count($rightList) 
        ? array_merge($results, $leftList) : array_merge($results, $rightList);

    return $results;
}
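
A merge of two score-sorted hit lists can be exercised in isolation with plain objects. This is only a sketch: `mergeByScore` and `makeHit` are hypothetical names, and it assumes each input list is already ordered from highest to lowest score.

```php
<?php
// Hypothetical standalone helper, not part of Zend_Search_Lucene.
// Merges two lists that are each sorted by descending score.
function mergeByScore(array $left, array $right)
{
    $results = array();
    $i = 0;
    $j = 0;
    while ($i < count($left) && $j < count($right)) {
        // Take whichever head has the higher score.
        if ($left[$i]->score >= $right[$j]->score) {
            $results[] = $left[$i++];
        } else {
            $results[] = $right[$j++];
        }
    }
    // At most one of the two slices below is non-empty.
    return array_merge($results, array_slice($left, $i), array_slice($right, $j));
}

// Stand-in hit carrying only a score; the values are made up.
function makeHit($score)
{
    $hit = new stdClass();
    $hit->score = $score;
    return $hit;
}

$merged = mergeByScore(
    array(makeHit(0.9), makeHit(0.4)),
    array(makeHit(0.7), makeHit(0.1))
);

$scores = array();
foreach ($merged as $hit) {
    $scores[] = $hit->score;
}
```

Working on copies with index counters, rather than `array_shift` on references, leaves the caller's arrays untouched.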