ZF-5545: Undefined offset notice in Search/Lucene/Search/Query/MultiTerm.php

Description

When doing a search using boolean operators and more than one search term:

+PHP +Zend

Or

PHP AND Zend

A PHP notice is thrown: PHP Notice: Undefined offset: 2511 in /path/to/Zend/Search/Lucene/Search/Query/MultiTerm.php on line 467

Please note that the number 2511 changes for each "hit". The first notice has the lowest number, and the last notice the highest number.

I've not been able to spot any problems relating to this notice, other than the fact that it's quite annoying to look at in the error log.

Comments

I just received the same error. It looks like it's a problem with my data.

Notice: Undefined offset: 39996 in Z:\Search\Lucene\Search\Query\MultiTerm.php on line 467

My table has 50 rows. Some rows' data (because of encoding and hyphenation) seems to contain characters that aren't representable as, or convertible to, Latin characters. I'm not sure whether that is the problem, but anyway:

Here is an example of my search. The # stands for a Danish character, ø (o with a slash through it), which shows up as >> because of encoding problems:

Koffe S#rensen A

In search/query/Boolean.php I did a print_r($this->_subqueries), and here are some highlights:

-----------------------------------------------------

[_terms:private] => Array
            (
                [0] => Zend_Search_Lucene_Index_Term Object
                    (
                        [field] => name
                        [text] => rensen
                    )

                [1] => Zend_Search_Lucene_Index_Term Object
                    (
                        [field] => name
                        [text] => s
                    )

                [2] => Zend_Search_Lucene_Index_Term Object
                    (
                        [field] => name
                        [text] => a
                    )

            )

-----------------------------------------------------

[_termInfoCache:private] => Array
            (
                [name�s] =>
                [name�rensen] =>
                [name�koffe] =>
                [name�a] => Zend_Search_Lucene_Index_TermInfo Object
                    (
                        [docFreq] => 4
                        [freqPointer] => 0
                        [proxPointer] => 0
                        [skipOffset] => 0
                        [indexPointer] =>
                    )

            )

-----------------------------------------------------

I don't get this error when I do the same search against data from the same table that consists entirely of normal Latin characters.

I'm seeing this too with 1.7.7, after upgrading from 1.6.2, so the indexes were created in 1.6.2. On a site where the index likely contains non-Latin UTF-8 content I see the notice; where the content is strictly Latin characters I don't. Does this mean the analyzer has switched to UTF-8 and I have to compile the PHP mbstring library in (which I have avoided so far)?

more information

A development server where this works does have mbstring compiled into PHP. The error appears on the production server, which has no mbstring. I also downloaded 1.7.0 and the same error appears there, so this was introduced between 1.6.2 and 1.7.0.

I looked at the index with Luke, and the Undefined offset numbers correspond to the Doc. Id in Luke.

Each time it throws the 'Notice' warning the result corresponding to the Doc Id is not returned.
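
To compare the offsets in the notices with the Doc. Ids shown in Luke, the notices can be collected programmatically instead of being read out of the error log. The following is only a minimal sketch in plain PHP: collectUndefinedOffsets() is a hypothetical helper (not part of Zend Framework), and the loop over $freqs is a stand-in for the real search call, which is an assumption made purely for illustration.

```php
<?php
// Hypothetical helper: runs $work and returns the offsets named in any
// "Undefined offset" notices raised while it runs.
function collectUndefinedOffsets(callable $work)
{
    $offsets = array();
    set_error_handler(function ($errno, $errstr) use (&$offsets) {
        // PHP 5/7 wording: "Undefined offset: N"
        // PHP 8 wording:   "Undefined array key N"
        if (preg_match('/Undefined (?:offset: |array key )(\d+)/', $errstr, $m)) {
            $offsets[] = (int) $m[1];
            return true;  // handled here, keep it out of the log
        }
        return false;     // anything else goes to PHP's normal handling
    });
    try {
        $work();
    } finally {
        restore_error_handler();
    }
    return $offsets;
}

// Stand-in for the search (assumption): frequencies exist for docs 1 and 4
// only, so reading docs 2511 and 39996 triggers the notice.
$freqs = array(1 => 3, 4 => 1);
$missing = collectUndefinedOffsets(function () use ($freqs) {
    foreach (array(1, 2511, 4, 39996) as $docId) {
        $unused = $freqs[$docId];
    }
});

print_r($missing);
```

In a real check, the closure would wrap the actual search call, and the collected numbers could then be checked against the ids missing from the result set.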

A new PHP build, with or without mbstring, made no difference on the existing index.

Last thing I can think of for the day: this happens when searching for multiple terms in the same field as well as for separate single terms in two different fields.

i.e.

content:foo AND content:bar = error

title:foo AND content:bar = error

content:bar = ok

Is this related to ZF-5554?

Quick hack for this issue: check whether [$termId][$docId] is set in the _termsFreqs array.

Replace the code block starting at line 472 in MultiTerm.php with this block:


// Only apply the term-frequency factor when a frequency was recorded
// for this term/document pair; otherwise skip the missing offset.
if (isset($this->_termsFreqs[$termId][$docId])) {
    $score += $reader->getSimilarity()->tf($this->_termsFreqs[$termId][$docId]) *
              $this->_weights[$termId]->getValue() *
              $reader->norm($docId, $term->field);
} else {
    $score += $this->_weights[$termId]->getValue() *
              $reader->norm($docId, $term->field);
}

The issue is still occurring in version 1.9.2.

Gianluca Zumaglini's snippet does indeed 'fix' the notice, and the results are displayed again, but I don't know why $this->_termsFreqs ends up empty.

Hi, up to version 1.10.2, the $docsFilter initialized on line 343 of MultiTerm.php is used a second time on line 352. This second use reduces the objects held in that instance. The same $docsFilter instance is later used in the function termDocs of the class SegmentInfo, where it reduces the results through a check like "if (isset($filter[$docId])) {".

Adding "$docsFilter = new Zend_Search_Lucene_Index_DocsFilter();" on line 351 of MultiTerm.php resolves the problem.

Fixed.