Issues

ZF-2982: Keyword field are not searchable using the QueryParser if not lowercase

Issue Type: Bug Created: 2008-03-27T09:08:20.000+0000 Last Updated: 2012-05-05T03:15:59.000+0000 Status: Closed Fix version(s): Reporter: Patrick ALLAERT (pallaert) Assignee: Adam Lundrigan (adamlundrigan) Tags: - Zend_Search_Lucene

  • state:need-feedback
  • zf-caretaker-adamlundrigan
  • zf-crteam-padraic
  • zf-crteam-priority
  • zf-crteam-review

Related issues: - ZF-623

Attachments: - ZF-2982.patch

Description

I have to enter my keyword lowercase to find them with the QueryParser, otherwise I get no results.

How to reproduce:

addField(Zend\_Search\_Lucene\_Field::Keyword('firstname', 'Patrick')); $doc->addField(Zend\_Search\_Lucene\_Field::Keyword('lastname', 'Allaert')); $index->addDocument($doc); // ------------> OOPS, expecting "1" in both cases echo count($index->find(Zend\_Search\_Lucene\_Search\_QueryParser::parse('Patrick'))) . "\\n"; // print "0" echo count($index->find(Zend\_Search\_Lucene\_Search\_QueryParser::parse('patrick'))) . "\\n"; // print "0" $index = Zend\_Search\_Lucene::create('test2'); $doc = new Zend\_Search\_Lucene\_Document(); $doc->addField(Zend\_Search\_Lucene\_Field::Keyword('firstname', 'patrick')); $doc->addField(Zend\_Search\_Lucene\_Field::Keyword('lastname', 'allaert')); $index->addDocument($doc); // OK, here i correctly have "1" in both cases echo count($index->find(Zend\_Search\_Lucene\_Search\_QueryParser::parse('Patrick'))) . "\\n"; // print "1" echo count($index->find(Zend\_Search\_Lucene\_Search\_QueryParser::parse('patrick'))) . "\\n"; // print "1" ?>

Comments

Posted by Wil Sinclair (wil) on 2008-03-27T16:04:42.000+0000

Can you please assess and categorize as necessary?

Posted by Patrick ALLAERT (pallaert) on 2008-11-08T07:28:51.000+0000

Test reproducing this bug

Posted by Adam Lundrigan (adamlundrigan) on 2011-09-23T14:11:40.000+0000

I could not reproduce this bug exactly as the OP shows; I get a slightly different result:

These unit tests pass against trunk:

<pre class="highlight">
Index: tests/Zend/Search/Lucene/SearchTest.php
===================================================================
--- tests/Zend/Search/Lucene/SearchTest.php     (revision 24462)
+++ tests/Zend/Search/Lucene/SearchTest.php     (working copy)
@@ -486,4 +486,36 @@

         Zend_Search_Lucene::setResultSetLimit($storedResultSetLimit);
     }
+
+    /**
+     * @group ZF-2982
+     */
+    public function testKeywordFieldSearchableUsingQueryParserWhenTermNotLowercase()
+    {
+        $index = Zend_Search_Lucene::create('test');
+
+        $doc = new Zend_Search_Lucene_Document();
+        $doc->addField(Zend_Search_Lucene_Field::Keyword('firstname', 'Patrick'));
+
+        $index->addDocument($doc);
+
+        $this->assertTrue(count($index->find(Zend_Search_Lucene_Search_QueryParser::parse('Patrick'))) == 1);
+        $this->assertTrue(count($index->find(Zend_Search_Lucene_Search_QueryParser::parse('patrick'))) == 0);
+    }
+
+    /**
+     * @group ZF-2982
+     */
+    public function testKeywordFieldSearchableUsingQueryParserWhenTermIsLowercase()
+    {
+        $index = Zend_Search_Lucene::create('test');
+
+        $doc = new Zend_Search_Lucene_Document();
+        $doc->addField(Zend_Search_Lucene_Field::Keyword('firstname', 'patrick'));
+
+        $index->addDocument($doc);
+
+        $this->assertTrue(count($index->find(Zend_Search_Lucene_Search_QueryParser::parse('Patrick'))) == 1);
+        $this->assertTrue(count($index->find(Zend_Search_Lucene_Search_QueryParser::parse('patrick'))) == 1);
+    }
 }

So, when the keyword contains a capital letter ('Patrick') the search is exact ('Patrick' matches, 'patrick' doesn't match). However, when the keyword is all lowercase ('patrick') the search is not exact (both 'Patrick' and 'patrick' match)

ie: I added this unit test to prove the capital-letter point, and it passes:

<pre class="highlight">
/**
 * @group ZF-2982
 */
public function testKeywordFieldSearchableUsingQueryParserWhenTermNotLowercase2()
{
    $index = Zend_Search_Lucene::create('test');

    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Keyword('firstname', 'paTrIcK'));        
    $index->addDocument($doc);

    $this->assertTrue(count($index->find(Zend_Search_Lucene_Search_QueryParser::parse('Patrick'))) == 0);
    $this->assertTrue(count($index->find(Zend_Search_Lucene_Search_QueryParser::parse('patrick'))) == 0);
}

'paTrIcK' matches neither 'Patrick' nor 'patrick'.

I believe the issue stems from the way case-insensitive searches are performed: {quote} The standard analyzer may transform the query string to lower case for case-insensitivity, remove stop-words, and stem among other transformations.

The API method doesn't transform or filter input terms in any way. It's therefore more suitable for computer generated or untokenized fields. {quote} (source: http://framework.zend.com/manual/en/…)

The simplest workaround is to make sure that all your keyword field values are lowercased (strtolower).

The current setup appears to be that if the index keyword value is all lowercase, the match is treated as case-insensitive, and if it contains any upper-case characters it's treated as case-sensitive. Seems counter-intuitive, but is this a bug? If so, is it something that should be fixed in ZFv1 so late in it's lifecycle? Or should we push the issue up to ZF2 for resolution? It's been around since at least 1.5, so chances are anyone encountering it have workarounds for it already in place.

Thoughts?

Posted by Patrick ALLAERT (pallaert) on 2011-09-23T14:36:52.000+0000

I indeed have a work around named "Solr". Zend Search Lucene is definetely not an option for anyone serious about using search engine technology. A workaround is difficult as you may not know whether your user, when entering a term, is about to search case sensitively or not. So yes, I consider this is a bug. But honestly, I don't care anymore about Zend_Search :)

Have you found an issue?

See the Overview section for more details.

Copyright

© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.

Contacts