ZF-5545: Undefined offset notice in Search/Lucene/Search/Query/MultiTerm.php

Issue Type: Bug Created: 2009-01-14T06:15:38.000+0000 Last Updated: 2010-08-20T11:05:34.000+0000 Status: Resolved Fix version(s): - 1.10.8 (25/Aug/10)

Reporter: Thomas Løcke (tl_cld) Assignee: Alexander Veremyev (alexander) Tags: - Zend_Search_Lucene

Related issues: - ZF-7789



When doing a search using boolean operators and more than one search terms:

+PHP +Zend



A PHP notice is thrown: PHP Notice: Undefined offset: 2511 in /path/to/Zend/Search/Lucene/Search/Query/MultiTerm.php on line 467

Please note that the number 2511 changes for each "hit". The first notice has the lowest number, and the last notice the highest number.

I've not been able to spot any problems relating to this notice, other than the fact that it's quite annoying to look at in the errorlog.


Posted by Carl Simons (cancausecancer) on 2009-03-14T10:03:39.000+0000

I just received same error. It looks like it's a problem with my data.

Notice: Undefined offset: 39996 in Z:\Search\Lucene\Search\Query\MultiTerm.php on line 467

My table has 50 rows, some row's data (because of encoding and hyphenations) seem to have characters which aren't representable or convertible to any latin characters, I'm not sure if that is a prob, but anyways,

Here is an example of my search, the # is actually a Danish character, O with a slash thru it, but because of encoding probs, shows up as >> :

Koffe S#rensen A

In search/query/Boolean.php I did a print_r($this -> _subqueries) and here are some highlights:

[_terms:private] => Array ( [0] => Zend_Search_Lucene_Index_Term Object ( [field] => name [text] => rensen )

                [1] => Zend_Search_Lucene_Index_Term Object
                        [field] => name
                        [text] => s

                [2] => Zend_Search_Lucene_Index_Term Object
                        [field] => name
                        [text] => a


[_termInfoCache:private] => Array ( [name�s] => [name�rensen] => [name�koffe] => [name�a] => Zend_Search_Lucene_Index_TermInfo Object ( [docFreq] => 4 [freqPointer] => 0 [proxPointer] => 0 [skipOffset] => 0 [indexPointer] => )


I don't get this error when I do the same search with data from the same table that is all normal latin chars.

Posted by Garth Gillespie (garth.gillespie) on 2009-03-27T12:55:38.000+0000

I'm seeing this too with 1.7.7. Upgrading from 1.6.2 - so indexes were created in 1.6.2. On a site where there is likely non-latin UTF-8 content in the index I see this - where content is strictly latin characters I'm not seeing it. Does this mean a switch in the analyzer to utf8 and i have to compile in the php mb library (which so far I have avoided)?

Posted by Garth Gillespie (garth.gillespie) on 2009-03-27T14:55:58.000+0000

more information

a development server where this works does have mbstring compiled into php. this error appears on the production server without mbstring. also downloaded 1.7.0 and the same error appears there - so this was introduced between 1.6.2 and 1.7.0.

i looked at the index with Luke and the Undefined offset numbers correspond to the Doc. Id in Luke.

Each time it throws the 'Notice' warning the result corresponding to the Doc Id is not returned.

Posted by Garth Gillespie (garth.gillespie) on 2009-03-27T16:54:50.000+0000

new php with or without mbstring didn't make a difference on existing index

Posted by Garth Gillespie (garth.gillespie) on 2009-03-27T17:05:46.000+0000

last thing I can think of for the day - this happens when searching both multiple terms in the same field as well as a separate single terms in two different fields.


content:foo AND content:bar = error

title:foo AND content:bar = error

content:bar = ok

Posted by Garth Gillespie (garth.gillespie) on 2009-03-27T23:39:51.000+0000

Is this related to ZF-5554?

Posted by Gianluca Zumaglini (gzumaglini) on 2009-04-24T07:16:04.000+0000

quick hack for this issue, checks if [$termId][$docId] isset in _termsFreqs array.

replace codeblock starting at line 472 in MultiTerm.php with this block ...

<pre class="highlight">
if (isset($this->_termsFreqs[$termId][$docId])) {           
    $score += $reader->getSimilarity()->tf($this->_termsFreqs[$termId][$docId]) *
    $this->_weights[$termId]->getValue() *
    $reader->norm($docId, $term->field);                      
else {
    $score += $this->_weights[$termId]->getValue() *
    $reader->norm($docId, $term->field);

Posted by Jachim Coudenys (coudenysj) on 2009-12-04T07:14:37.000+0000

The issue is still occuring in version 1.9.2.

The snippet of Gianluca Zumaglini is indeed 'fixing' the notice and actually displaying the results again, but I don't know why the $this->_termsFreqs becomes empty.

Posted by Hying (hying) on 2010-03-16T10:09:48.000+0000

Hi, till version 1.10.2, the $docsFilter initialized in line 343 of MultiTerm.php is used a second time in line 352. This second usage reduce the objects in this instance. The same $docsFilter instance is later used in the funktion termDocs of the class SegmentInfo and reduce the results calling somthing like "if (isset($filter[$docId])) {"

Adding "$docsFilter = new Zend_Search_Lucene_Index_DocsFilter();" in line 351 of resolves the problem.

Posted by Alexander Veremyev (alexander) on 2010-07-29T15:08:33.000+0000


Have you found an issue?

See the Overview section for more details.


© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.