ZF-2055: numerical values (e.g. phone number) are not searchable

Description

Numerical values, e.g. a phone number, are not searchable. The field types "text", "keyword" and "unstored" were tested. The following PHPUnit test case demonstrates the problem:



<?php
set_include_path('.' . PATH_SEPARATOR . '/opt/lampp/lib/php/' . PATH_SEPARATOR . '../application/library/');

require_once 'PHPUnit/Extensions/PerformanceTestCase.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Search/Query/Boolean.php';

/**
 * This TestCase tests the search behavior in fields of type "keyword",
 * "text" and "unstored".
 */
class BugExploitTest extends PHPUnit_Extensions_PerformanceTestCase{
    private $index = null;
    
    // private $numericalValue = 'Zziqwez'; // found in fields of any type
    // private $numericalValue = 'Hallowe'; // not found in field of type "text". why? 
    private $numericalValue = '12345678'; // not found
    
    /**
    * Creates an index and adds a document to it.
    */
    protected function setUp() {
        try{
            $this->index = Zend_Search_Lucene::open("/tmp/index");
        }catch(Exception $e){
            $this->index = Zend_Search_Lucene::create("/tmp/index");
        }
        
        $doc = new Zend_Search_Lucene_Document();       
        $doc->addField(Zend_Search_Lucene_Field::Keyword('keyword', $this->numericalValue));
        $doc->addField(Zend_Search_Lucene_Field::Text('text', $this->numericalValue));
        $doc->addField(Zend_Search_Lucene_Field::UnStored('unstored', $this->numericalValue));
        $this->index->addDocument($doc);
    }

    /**
    * Shutting down the index
    */ 
    protected function tearDown() {
        $this->index->commit();
        unset($this->index);
    }
    
    /**
    * Searching in the field of type "keyword".
    * Our index should have one document at least. 
    */
    public function testSearchKeyword(){
        $this->searchInField('keyword');
    }
    
    /**
    * Searching in the field of type "text".
    * Our index should have two documents at least.
    * (tearDown does not delete any documents)
    */
    public function testSearchText(){
        $this->searchInField('text');
    }
    
    /**
    * Searching in the field of type "unstored".
    * Our index should have two documents at least.
    * (tearDown does not delete any documents)
    */
    public function testSearchUnStored(){
        $this->searchInField('unstored');
    }
    
    private function searchInField($fieldName){
        $userQuery = Zend_Search_Lucene_Search_QueryParser::parse($this->numericalValue);
        Zend_Search_Lucene::setDefaultSearchField($fieldName);
        $hits = $this->index->find($userQuery);
        // after adding a document we expect one search result at least
        $this->assertNotEquals(0, count($hits));
    }
}
?>


Comments

Assigned to Alexander

By default, the analyzer looks for text (a-zA-Z) only. You can change this to include numbers by using:


Zend_Search_Lucene_Analysis_Analyzer::setDefault(
  new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive()
  );
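
To see why the digits disappear, here is a minimal sketch in plain PHP. It does not call Zend_Search_Lucene at all; it only mimics a letters-only tokenizer and a letters-and-digits tokenizer with regular expressions. The helper name sketchTokenize and both patterns are my own assumptions for illustration, not the library's code.

```php
<?php
// Illustration only: plain PHP that mimics two tokenization rules.
// sketchTokenize() and the patterns below are hypothetical, not library code.

/**
 * Lowercases $text and returns every substring that matches $pattern.
 */
function sketchTokenize($text, $pattern)
{
    preg_match_all($pattern, strtolower($text), $matches);
    return $matches[0];
}

$lettersOnly      = '/[a-z]+/';    // assumed: what a text-only analyzer keeps
$lettersAndDigits = '/[a-z0-9]+/'; // assumed: what a text-and-number analyzer keeps

// A purely numerical value produces no tokens at all with the letters-only rule.
$phoneTokensDefault = sketchTokenize('12345678', $lettersOnly);

// With digits allowed, the value survives as a single token.
$phoneTokensTextNum = sketchTokenize('12345678', $lettersAndDigits);

// In mixed input, the letters-only rule silently drops the number.
$mixedTokensDefault = sketchTokenize('Call 12345678 now', $lettersOnly);

var_export($phoneTokensDefault); // empty array
var_export($phoneTokensTextNum); // one token: '12345678'
var_export($mixedTokensDefault); // 'call' and 'now'
```

If the same rule is applied both when indexing and when parsing the query, a number never makes it into the index and never makes it into the query either, so nothing can match.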

If you need more flexibility in searching, try creating your own analyzer based on e.g. Zend_Search_Lucene_Analysis_Analyzer_Common_Text.

That said, the default setting to search for text only is perhaps confusing for first-time users.

I had a slightly different problem. I was trying to search for words_with_underscores in a Keyword field. The Keyword field is indexed but not tokenized, so I expected an exact match when I searched for words_with_underscores. Instead, a search for 'words with underscores' was performed, which yielded no matches because the Keyword field wasn't tokenized. My solution was to create a ...TextCode.... analyzer.
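
The mismatch can be sketched in plain PHP without the library. The helper sketchQueryTokens and the underscore-aware pattern are hypothetical stand-ins that I picked for illustration; they are not the analyzer mentioned above, only a guess at the kind of rule such an analyzer would need.

```php
<?php
// Illustration only: why a tokenized query cannot hit an untokenized term.
// sketchQueryTokens() and both patterns are hypothetical, not library code.

/**
 * Lowercases $queryString and returns every substring that matches $pattern.
 */
function sketchQueryTokens($queryString, $pattern)
{
    preg_match_all($pattern, strtolower($queryString), $matches);
    return $matches[0];
}

// A Keyword field stores the whole value as one term.
$indexedTerm = 'words_with_underscores';

// Letters-only rule: the underscores act as separators.
$defaultTokens = sketchQueryTokens('words_with_underscores', '/[a-z]+/');

// Assumed rule that treats digits and underscores as part of a token.
$codeTokens = sketchQueryTokens('words_with_underscores', '/[a-z0-9_]+/');

// An exact term match only succeeds if a query token equals the stored term.
$matchesWithDefault = in_array($indexedTerm, $defaultTokens, true);
$matchesWithCode    = in_array($indexedTerm, $codeTokens, true);

var_export($defaultTokens);      // 'words', 'with', 'underscores'
var_export($matchesWithDefault); // false
var_export($matchesWithCode);    // true
```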

Hope this helps!

Ciao, Marc.

Thanks, Marc, for your helpful comment. I inserted your code in the setUp block and the tests pass!

The default text analyzer skips numbers.

So you can either set another analyzer:


Zend_Search_Lucene_Analysis_Analyzer::setDefault(
   new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

or use the Keyword field type.

In the second case, you have to use the search API to search through keyword fields:


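// Parse the user's query string as usual ($queryString is assumed to be defined)
$subquery1 = Zend_Search_Lucene_Search_QueryParser::parse($queryString);

// Build the term query by hand so the keyword value bypasses the analyzer
$term  = new Zend_Search_Lucene_Index_Term('12345678', 'keyword');
$subquery2 = new Zend_Search_Lucene_Search_Query_Term($term);

// Combine both subqueries into one boolean query
$finalQuery = new Zend_Search_Lucene_Search_Query_Boolean();
$finalQuery->addSubquery($subquery1);
$finalQuery->addSubquery($subquery2);

$hits  = $index->find($finalQuery);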