Issues

ZF-7518: Range Query doesn't work anyway

Description

Querying a Lucene index with the range syntax does not work.

```php
<?php
require_once 'Zend/Search/Lucene.php';

$root = '/tmp/';

// unlink() does not expand wildcards, so delete the old index files one by one
$oldFiles = glob($root . 'data/indexes/test/*');
if ($oldFiles) {
    foreach ($oldFiles as $file) {
        unlink($file);
    }
}

$index = Zend_Search_Lucene::create($root . 'data/indexes/test');

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Keyword('test', 9));
$index->addDocument($doc);
$index->commit();

$hits = $index->find('test:[1 TO 10]');
echo count($hits)." hits\n";
foreach ($hits as $hit) {
    echo "  -> $hit->score - $hit->id\n";
}

$hits = $index->find('test:9');
echo count($hits)." hits\n";
foreach ($hits as $hit) {
    echo "  -> $hit->score - $hit->id\n";
}

$hits = $index->find('test:11');
echo count($hits)." hits\n";
foreach ($hits as $hit) {
    echo "  -> $hit->score - $hit->id\n";
}
```

Actual output:

```
0 hits
1 hits
  -> 0.30685281944005 - 0
0 hits
```

Expected output:

```
1 hits
  -> 1 - 0
1 hits
  -> 1 - 0
0 hits
```

Comments

Out of interest: your string is numeric.

Did you try setting the default analyzer to one that supports numbers? Does that change anything? (For example: Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num)

I haven't looked into the Range Query code yet, but some other queries run a tokenizer over the search term under certain circumstances. A numeric term tokenized with the default analyzer (which is text-only) could end up as an empty string, hence the empty result.

Only a guess.

{quote}Did you try setting the default analyzer to a one supporting numbers? Does that change anything? (For example: Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num ){quote}

The result is the same, even if I use the utf8num analyzer.

Confirmed: the problem seems to lie much deeper. A numeric tokenizer is indeed necessary for this range query to work, HOWEVER it still leads to an empty result. Even more interesting: with an in-memory index (as provided by the patch in ZF-7736), the query works! That means the numbers somehow get lost when the index is written to disk or read back, so there seems to be quite a fundamental flaw deep down there. Hope that helps you guys pinpoint it faster...

Here is the code I used:

```php
<?php
require_once 'Zend/Search/Lucene.php';

$root = "/tmp/";

// use a numeric tokenizer
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive()
);

// unlink() does not expand wildcards, so delete the old index files one by one
$oldFiles = glob($root . "data/indexes/test/*");
if ($oldFiles) {
    foreach ($oldFiles as $file) {
        unlink($file);
    }
}

//$index = Zend_Search_Lucene::create($root."data/indexes/test");
// use an in-memory index (needs the patch from ZF-7736)
$index = new Zend_Search_Lucene_TempIndex();

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Keyword('test', 9));
$index->addDocument($doc);
$index->commit();
$index->optimize();

$querystring = 'test:[1 TO 10]';
$query = Zend_Search_Lucene_Search_QueryParser::parse($querystring);
print_r($query);
$hits = $index->find($query);
//$hits = $index->find('test:[1 TO 10]');
echo count($hits)." hits\n";
foreach ($hits as $hit) {
    echo "  -> $hit->score - $hit->id\n";
}

$hits = $index->find('test:9');
echo count($hits)." hits\n";
foreach ($hits as $hit) {
    echo "  -> $hit->score - $hit->id\n";
}

$hits = $index->find('test:11');
echo count($hits)." hits\n";
foreach ($hits as $hit) {
    echo "  -> $hit->score - $hit->id\n";
}

exit(0);
```

Result:

```
Zend_Search_Lucene_Search_Query_Boolean Object
(
    [_subqueries:private] => Array
        (
            [0] => Zend_Search_Lucene_Search_Query_Range Object
                (
                    [_lowerTerm:private] => Zend_Search_Lucene_Index_Term Object
                        (
                            [field] => test
                            [text] => 1
                        )

                    [_upperTerm:private] => Zend_Search_Lucene_Index_Term Object
                        (
                            [field] => test
                            [text] => 10
                        )

                    [_field:private] => test
                    [_inclusive:private] => 1
                    [_matches:private] => 
                    [_boost:private] => 1
                    [_weight:protected] => 
                    [_currentColorIndex:private] => 0
                )

        )

    [_signs:private] => Array
        (
            [0] => 
        )

    [_resVector:private] => 
    [_coord:private] => 
    [_boost:private] => 1
    [_weight:protected] => 
    [_currentColorIndex:private] => 0
)
1 hits
  -> 0.30685281944 - 0
1 hits
  -> 0.30685281944 - 0
0 hits
```

Another juicy little detail: with the field type set to "Text" instead of "Keyword", the range query still doesn't work. And, as expected with a non-numeric tokenizer, the parsed query becomes:

```
Zend_Search_Lucene_Search_Query_MultiTerm Object
(
    [_terms:private] => Array
        (
            [0] => Zend_Search_Lucene_Index_Term Object
                (
                    [field] => 
                    [text] => test
                )

            [1] => Zend_Search_Lucene_Index_Term Object
                (
                    [field] => 
                    [text] => to
                )

        )

    [_signs:private] => Array
        (
            [0] => 
            [1] => 
        )

    [_resVector:private] => 
    [_termsFreqs:private] => Array
        (
        )

    [_coord:private] => 
    [_weights:private] => Array
        (
        )

    [_boost:private] => 1
    [_weight:protected] => 
    [_currentColorIndex:private] => 0
)
0 hits
0 hits
0 hits
```

Note that with a non-numeric tokenizer the numbers went missing, and the rest of the range query was interpreted as a MultiTerm query, which is at least an interesting undocumented fallback behaviour ;)
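The MultiTerm dump above fits the analyzer guess made earlier in the thread: run through a letters-only tokenizer, `test:[1 TO 10]` keeps only `test` and `to`, and a bare `9` yields no token at all. The following PHP sketch models that behaviour; `textOnlyTokens()` and `textNumTokens()` are hypothetical helpers that approximate a text-only and a text-plus-digits analyzer, not the library's own analyzer classes.

```php
<?php
// Simplified model (assumption) of two analyzers: a letters-only one and one that
// also keeps digits. Both lower-case their tokens. These helpers are hypothetical
// and only approximate the Zend_Search_Lucene analyzers mentioned in this thread.

function textOnlyTokens($text)
{
    // Keep runs of ASCII letters; digits and punctuation are dropped.
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

function textNumTokens($text)
{
    // Keep runs of ASCII letters and digits.
    preg_match_all('/[a-zA-Z0-9]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

echo json_encode(textOnlyTokens('9')), "\n";               // []
echo json_encode(textOnlyTokens('test:[1 TO 10]')), "\n";  // ["test","to"]
echo json_encode(textNumTokens('test:[1 TO 10]')), "\n";   // ["test","1","to","10"]
```

Under the same model, the prefix `200` in `dateposted:200*` would be reduced to nothing by the letters-only tokenizer, which might explain why the "At least 3 non-wildcard characters" exception reported below fires for a pattern that visibly starts with three characters. That is a guess based on the model, not a reading of the library source.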

I know this may not seem related at first, because I'm not using a range query here; however, I was having similar luck with the range query. I suspect the behavior has less to do with range queries and more to do with searching secondary text fields whose values either begin with digits or contain only digits. I began by working with what is essentially a date string. To test my hypothesis, I programmatically prepended 'a', 'aa' and 'aaa' to the date string before submitting the document to the indexer. This gets a little weird, but I think this series of crashes and successes is telling, and indicates that this isn't just a problem with range queries:

USING THIS TO START WITH (.....):

```
php > include('Zend/Search/Lucene.php');
php > $indexPath = '/data/lucindex';
php > $index = Zend_Search_Lucene::open($indexPath);
```

CHECK THIS OUT:

```
.....
php > $hits = $index->find( 'dateposted:20090308' );
php > echo count($hits);
0
php > $hits = $index->find( '20090308' );
php > echo count($hits);
0
php > $hits = $index->find( 'dateposted:200*' );

Fatal error: Uncaught exception 'Zend_Search_Lucene_Exception' with message 'At least 3 non-wildcard characters are required at the beginning of pattern.' in /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Wildcard.php:145
Stack trace:
#0 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Preprocessing/Term.php(190): Zend_Search_Lucene_Search_Query_Wildcard->rewrite(Object(Zend_Search_Lucene))
#1 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Boolean.php(143): Zend_Search_Lucene_Search_Query_Preprocessing_Term->rewrite(Object(Zend_Search_Lucene))
#2 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene.php(922): Zend_Search_Lucene_Search_Query_Boolean->rewrite(Object(Zend_Search_Lucene))
#3 [internal function]: Zend_Search_Lucene->find('dateposted:200*')
#4 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Proxy.php(346): call_user_func_array(Array, Array)
#5 p in /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Wildcard.php on line 145

......
php > $hits = $index->find('aaa20090308');
php > echo count($hits);
26
php >
```

FURTHERMORE:

```
.....
php > $hits = $index->find('a200');

Fatal error: Uncaught exception 'Zend_Search_Lucene_Exception' with message 'At least 3 non-wildcard characters are required at the beginning of pattern.' in /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Wildcard.php:145
Stack trace:
#0 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Preprocessing/Term.php(190): Zend_Search_Lucene_Search_Query_Wildcard->rewrite(Object(Zend_Search_Lucene))
#1 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Preprocessing/Term.php(104): Zend_Search_Lucene_Search_Query_Preprocessing_Term->rewrite(Object(Zend_Search_Lucene))
#2 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Boolean.php(143): Zend_Search_Lucene_Search_Query_Preprocessing_Term->rewrite(Object(Zend_Search_Lucene))
#3 /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene.php(922): Zend_Search_Lucene_Search_Query_Boolean->rewrite(Object(Zend_Search_Lu in /usr/local/lib/php/ZendFramework-1.10.0-minimal/library/Zend/Search/Lucene/Search/Query/Wildcard.php on line 145
```

AND FINALLY:

```
......
php > $hits = $index->find('dateposted:[aaa20090201 TO aaa20090401');
php > echo count($hits);
56
```

I hope this is found useful to someone who knows their way around the codebase.

I'm facing the same problem... Has anyone resolved or fixed this one?

Fixed.

BTW, the following has to be noted: range queries use lexicographical order, not numeric order. In this order "9" is greater than "10". Pad numbers to the same length for range queries to work correctly: '  1', '  7', '  9', ' 45', '346'
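A minimal PHP sketch of the ordering problem and the padding fix, assuming that plain byte-wise string comparison (`strcmp()`) is a fair stand-in for the index's term order; it does not call Zend_Search_Lucene itself. `padNumber()` and `inLexRange()` are hypothetical helpers. The sketch pads with `'0'`, whereas the comment above pads with spaces; both characters sort before the digits 1 to 9 in ASCII, so either works for the comparison.

```php
<?php
// Sketch: term ranges are compared as strings, so numbers must share one width.
// padNumber() and inLexRange() are hypothetical helpers, not Zend_Search_Lucene API.

// Left-pad a number to a fixed width (values wider than $width are left unchanged).
function padNumber($number, $width = 3, $padChar = '0')
{
    return str_pad((string) $number, $width, $padChar, STR_PAD_LEFT);
}

// Inclusive range check using byte-wise (lexicographical) string comparison.
function inLexRange($value, $lower, $upper)
{
    return strcmp($value, $lower) >= 0 && strcmp($value, $upper) <= 0;
}

// Unpadded: "9" > "10" lexicographically, so 9 falls outside [1 TO 10].
var_dump(inLexRange('9', '1', '10'));                             // bool(false)

// Padded to the same width: "009" lies between "001" and "010".
var_dump(inLexRange(padNumber(9), padNumber(1), padNumber(10)));  // bool(true)

$values = array('1', '7', '9', '45', '346');

$unpadded = $values;
sort($unpadded, SORT_STRING);
echo implode(', ', $unpadded), "\n";  // 1, 346, 45, 7, 9

$padded = array_map('padNumber', $values);
sort($padded, SORT_STRING);
echo implode(', ', $padded), "\n";    // 001, 007, 009, 045, 346
```

Because `str_pad()` leaves longer values untouched, the width has to cover the largest number that will ever be indexed, and the query bounds must be padded the same way as the stored values.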