
ZF-623: Query terms for Keyword type fields should not be tokenized

Description

When you try to find documents by their id, path, or any other Keyword field whose value contains characters other than [A-Za-z0-9], you get no results. Example:

document XY with path=/some/file.txt

query: path:"/some/file.txt" -> no results

document XY with path=abc

query: path:"abc" -> ok
query: path:abc -> ok too

Comments

From the mailing list:

It seems the query is always parsed and tokenized. Fields marked as Keyword should not be parsed and tokenized. Is it possible to implement this? If not, there could be a method like findRaw() that finds documents without analyzing and tokenizing the query.

Assigning to Alexander.

The query parser always uses the default analyzer to tokenize or normalize terms and phrases.

The query parser is index-independent, so it cannot "know" which fields should be tokenized.

Moreover, the index does not store information about which fields were tokenized and which were not.
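The mismatch from the description can be sketched in a few lines of plain PHP. The function approximateTokenize() is hypothetical: it lowercases the text and splits it on every character outside [a-z0-9], roughly matching the reporter's observation. It is not Zend_Search_Lucene's analyzer code. The point is only that the Keyword field stores "/some/file.txt" as one term, while the parsed query never contains that term.

```php
<?php
// Hypothetical approximation of a lowercasing text analyzer that treats
// every run of characters outside [a-z0-9] as a token separator.
// This is NOT the Zend_Search_Lucene implementation.
function approximateTokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// A Keyword field stores the whole value as a single, untokenized term.
$storedKeywordTerm = '/some/file.txt';

// The query parser first runs the query text through the analyzer.
$queryTokens = approximateTokenize('/some/file.txt');

// The stored term is not among the query tokens, so no term or phrase
// built from those tokens can match it.
$matches = in_array($storedKeywordTerm, $queryTokens, true);

echo implode(',', $queryTokens), PHP_EOL;
var_dump($matches);

// A purely alphanumeric value passes through the analyzer unchanged.
var_dump(in_array('abc', approximateTokenize('abc'), true));
```

This is why path:abc works while path:"/some/file.txt" does not.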

Thus, keywords containing non-alphanumeric characters can only be added to a query through the API:

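require_once 'Zend/Search/Lucene.php';

// Parse the free-text part of the query as usual (terms go through the default analyzer)
$parsedQuery = Zend_Search_Lucene_Search_QueryParser::parse($queryString);

$query = new Zend_Search_Lucene_Search_Query_Boolean();
$query->addSubquery($parsedQuery, true /* required */);

// Build the keyword term directly, so '/my/cool/path' is matched as one untokenized term
$keywordTerm = new Zend_Search_Lucene_Index_Term('/my/cool/path', 'path');
$keywordQuery = new Zend_Search_Lucene_Search_Query_Term($keywordTerm);

$query->addSubquery($keywordQuery, true /* required */);

// $index is assumed to be an index opened with Zend_Search_Lucene::open()
$hits = $index->find($query);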

It would also be possible to extend the query language so that a query can signal which field is a keyword field.

For example, "bla/bla/bla" would be tokenized, but 'bla/bla/bla' would not. That looks reasonable from a PHP point of view :), but I am not sure it is common practice for search engines...
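A pre-parser for the proposed single-quote convention might extract field:'value' clauses before the rest of the string reaches the query parser. The function splitKeywordClauses() and the single-quote syntax are hypothetical illustrations only; ZF1 provides neither.

```php
<?php
// Hypothetical pre-parser: field:'value' marks an untokenized keyword term,
// everything else is left for the normal query parser.
function splitKeywordClauses(string $query): array
{
    $keywords = [];

    // Remove each field:'value' clause (value without single quotes)
    // and collect it as a raw field/text pair.
    $rest = preg_replace_callback(
        "/(\\w+):'([^']*)'/",
        function (array $m) use (&$keywords) {
            $keywords[] = ['field' => $m[1], 'text' => $m[2]];
            return '';
        },
        $query
    );

    // Collapse the whitespace left behind by the removed clauses.
    return ['keywords' => $keywords, 'rest' => trim(preg_replace('/\s+/', ' ', $rest))];
}

$result = splitKeywordClauses("title:report path:'/some/file.txt'");
print_r($result);
```

Each extracted pair could then be wrapped in Zend_Search_Lucene_Index_Term and Zend_Search_Lucene_Search_Query_Term and added as a required subquery, as in the API workaround above, while the remaining text goes to Zend_Search_Lucene_Search_QueryParser::parse(). A real implementation would also have to handle escaped quotes and single quotes inside double-quoted phrases.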

This issue is very, very old and unlikely to be implemented in ZFv1.