ZF-623: Query terms for Keyword type fields should not be tokenized

Issue Type: Improvement
Created: 2006-12-06T09:53:52.000+0000
Last Updated: 2012-05-05T03:21:29.000+0000
Status: Closed
Fix version(s):
Reporter: Lukas Zapletal (lzap)
Assignee: Adam Lundrigan (adamlundrigan)
Tags:
  • Zend_Search_Lucene
  • zf-caretaker-adamlundrigan

Related issues:
  • ZF-2982



When you try to find documents by their id, path, or any other keyword-type field whose value contains characters other than [A-Za-z0-9], you get no results. Example:

document XY with path=/some/file.txt

query: path:"/some/file.txt" -> no results

document XY with path=abc

query: path:"abc" -> ok

query: path:abc -> ok too


Posted by Lukas Zapletal (lzap) on 2006-12-06T10:03:10.000+0000

From the mailing list:

It seems the query is always parsed and tokenized. It should not parse and tokenize those fields that are marked as KEYWORDs. Is it possible to implement this? If not, there could be a method like findRaw that finds documents but doesn't analyze and tokenize the query.

Posted by Bill Karwin (bkarwin) on 2006-12-09T16:47:55.000+0000

Assigning to Alexander.

Posted by Alexander Veremyev (alexander) on 2006-12-13T18:36:26.000+0000

The query parser always uses the default analyzer to tokenize or normalize terms and phrases.

The query parser is index-independent, so it can't "know" which fields should be tokenized.

Moreover, the index doesn't store information about which fields were tokenized and which weren't.
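The mismatch can be illustrated with a minimal, self-contained sketch. It does not use Zend_Search_Lucene at all: `toyAnalyze()`, `lookup()` and the `$index` array are hypothetical stand-ins, and the analyzer is assumed to lowercase its input and split on anything other than letters and digits.

```php
<?php
// Hypothetical stand-in for a text analyzer: lowercases the input and
// splits it on every run of characters outside [a-z0-9].
function toyAnalyze(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Toy index: a keyword field stores each value as one untokenized term.
$index = [
    'path' => [
        '/some/file.txt' => 'document XY',
        'abc'            => 'document Z',
    ],
];

// Returns the documents whose stored term equals one of the given terms.
function lookup(array $index, string $field, array $terms): array
{
    $hits = [];
    foreach ($terms as $term) {
        if (isset($index[$field][$term])) {
            $hits[] = $index[$field][$term];
        }
    }
    return $hits;
}

// The analyzed query no longer contains the stored term.
$analyzed   = toyAnalyze('/some/file.txt');
$viaParser  = lookup($index, 'path', $analyzed);

// The raw, unanalyzed term matches the stored keyword.
$viaRawTerm = lookup($index, 'path', ['/some/file.txt']);

// A purely alphanumeric keyword survives analysis unchanged.
$plain      = lookup($index, 'path', toyAnalyze('abc'));
```

In this toy model the analyzed query yields the terms `some`, `file` and `txt`, none of which equals the stored term `/some/file.txt`, while `abc` passes through the analyzer unchanged and still matches.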

Thus keywords containing non-alphanumeric characters can only be added to a query through the API:

<pre class="highlight">
// Parse the free-text part of the user's query as usual.
$parsedQuery = Zend_Search_Lucene_Search_QueryParser::parse($query);

$query = new Zend_Search_Lucene_Search_Query_Boolean();
$query->addSubquery($parsedQuery, true /* required */);

// Build the keyword term directly so that it bypasses the analyzer.
$keywordTerm = new Zend_Search_Lucene_Index_Term('/my/cool/path', 'path');
$keywordQuery = new Zend_Search_Lucene_Search_Query_Term($keywordTerm);

$query->addSubquery($keywordQuery, true /* required */);
</pre>

It's also possible to extend the query language so that a query can signal which field is a keyword field.

E.g. "bla/bla/bla" is tokenized, but 'bla/bla/bla' isn't. It looks reasonable from a PHP point of view :), but I am not sure that it's common practice for search engines...
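As a rough sketch of how such a quoting rule might be recognized, the hypothetical helper below reads a single `field:"value"` or `field:'value'` clause and reports whether the value should be tokenized. `parseQuotedClause()` is an illustration only; it is not part of Zend_Search_Lucene and does not handle escaping or full query syntax.

```php
<?php
// Hypothetical helper, not part of Zend_Search_Lucene: reads one
// field:"value" or field:'value' clause. Double quotes mean the value
// should go through the analyzer; single quotes mean it should be used
// verbatim as a single keyword term.
function parseQuotedClause(string $clause): ?array
{
    // \2 requires the closing quote to match the opening one.
    if (!preg_match('/^(\w+):(["\'])(.*)\2$/', $clause, $m)) {
        return null; // not a quoted field clause
    }
    return [
        'field'    => $m[1],
        'value'    => $m[3],
        'tokenize' => $m[2] === '"',
    ];
}

$phrase  = parseQuotedClause('path:"bla/bla/bla"'); // tokenize is true
$keyword = parseQuotedClause("path:'bla/bla/bla'"); // tokenize is false
$bare    = parseQuotedClause('path:abc');           // null, no quotes
```

A caller could then build a `Zend_Search_Lucene_Search_Query_Term` directly for clauses where `tokenize` is false and hand everything else to the regular query parser.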

Posted by Adam Lundrigan (adamlundrigan) on 2012-05-05T03:21:29.000+0000

This issue is very, very old and unlikely to be implemented in ZFv1.
