ZF-85: Query Parser not handling field names with underscores

Issue Type: Bug
Created: 2006-06-21T06:49:13.000+0000
Last Updated: 2007-07-05T14:43:09.000+0000
Status: Resolved
Fix version(s): 0.7.0 (18/Jan/07)

Reporter: Alex Tearse (reefnet_alex)
Assignee: Alexander Veremyev (alexander)
Tags: Zend_Search_Lucene



Parsing a query such as:

$query = Zend_Search_Lucene_Search_QueryParser::parse('title:bob');

correctly creates a private term with the field set to title and the text set to bob:

[_term:private] => Zend_Search_Lucene_Index_Term Object ( [field] => title [text] => bob )

However, if the field name contains an underscore, the query is tokenized at the underscore. For example:

$query = Zend_Search_Lucene_Search_QueryParser::parse('title_en:bob');

gives:

[_terms:private] => Array
    (
        [0] => Zend_Search_Lucene_Index_Term Object ( [field] => contents [text] => title )
        [1] => Zend_Search_Lucene_Index_Term Object ( [field] => en [text] => bob )
    )


This may be expected behaviour; maybe underscores should be banned from index field names. I was using them because we have a bilingual collection, so my fields are title_en, title_gd, contents_en, contents_gd, etc. (which has been fun, given that I am also working around the inability to set a default field!)

But it strikes me that using ctype_alnum to decide on token types in QueryTokenizer may be a tad stricter than necessary.

This is an esoteric one, I'm sure. I'm probably the only person in the world who has used underscores in field names :)


Posted by Jayson Minard (jayson) on 2006-07-09T01:00:19.000+0000

Is anything happening on this issue? If so, set a fix version with the expected time frame for the fix; otherwise, assign it to Alex. Thanks. I'm setting it for 0.3.0 in the meantime.

Posted by Lyubomir Petrov (lpetrov) on 2006-09-12T19:11:30.000+0000

I found the problem. Tomorrow I will submit the fixed version of the tokenizer here.

Posted by Lyubomir Petrov (lpetrov) on 2006-09-13T13:29:02.000+0000

Here is the diff:

Index: C:/Apps/www/lib/3rdparty/zend_framework/library/Zend/Search/Lucene/Search/QueryTokenizer.php

--- C:/Apps/www/lib/3rdparty/zend_framework/library/Zend/Search/Lucene/Search/QueryTokenizer.php (revision 5742)
+++ C:/Apps/www/lib/3rdparty/zend_framework/library/Zend/Search/Lucene/Search/QueryTokenizer.php (revision 5743)
@@ -64,7 +64,7 @@

     $currentToken = '';
     for ($count = 0; $count < strlen($inputString); $count++) {
-        if (ctype_alnum( $inputString{$count} )) {
+        if (ctype_alnum( $inputString{$count} ) || $inputString{$count} == "_") {
             $currentToken .= $inputString{$count};
         } else {
             // Previous token is finished
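To illustrate why title_en:bob was split into two terms, here is a minimal PHP sketch of the character-classification loop from the QueryTokenizer diff. It is not the real Zend_Search_Lucene_Search_QueryTokenizer: the function name tokenizeWords and the $allowUnderscore switch are hypothetical, and the sketch assumes that any non-word character simply ends the current token, with only ":" kept as a separate token. The real tokenizer recognises more query syntax than this. The sketch uses $input[$i] instead of the $input{$i} offset syntax from the diff, because curly-brace string offsets were removed in PHP 8.

```php
<?php
// Hypothetical, simplified sketch of the tokenizer loop; NOT the real
// Zend_Search_Lucene_Search_QueryTokenizer. $allowUnderscore toggles
// between the original rule (ctype_alnum only) and the patched rule.
function tokenizeWords(string $input, bool $allowUnderscore): array
{
    $tokens = [];
    $currentToken = '';
    for ($i = 0; $i < strlen($input); $i++) {
        $char = $input[$i];
        // Word characters accumulate into the current token.
        if (ctype_alnum($char) || ($allowUnderscore && $char === '_')) {
            $currentToken .= $char;
            continue;
        }
        // Any other character finishes the current token...
        if ($currentToken !== '') {
            $tokens[] = $currentToken;
            $currentToken = '';
        }
        // ...and ':' is kept as its own token (assumption of this sketch);
        // other characters, such as an unaccepted '_', are dropped here.
        if ($char === ':') {
            $tokens[] = $char;
        }
    }
    // Flush the last token, if any.
    if ($currentToken !== '') {
        $tokens[] = $currentToken;
    }
    return $tokens;
}

print_r(tokenizeWords('title_en:bob', false)); // title, en, :, bob
print_r(tokenizeWords('title_en:bob', true));  // title_en, :, bob
```

With the original rule, "title" and "en" become separate words, so "title" falls back to the default field and "en" is taken as the field name for "bob", matching the reported output. Once the underscore counts as a word character, title_en stays a single field name, which is what the one-line patch aims for.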

Posted by Alexander Veremyev (alexander) on 2006-09-13T17:32:55.000+0000

The issue is already fixed. Please take the current SVN version (…). Sorry that I missed your first comment, so you did work that was already done.

And welcome to development team! :)

PS: Have you already signed the CLA? (



© 2006-2018 by Zend, a Rogue Wave Company. Made by awesome contributors.
