Zend Framework

Numerical values (e.g. phone numbers) are not searchable

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.0.2
  • Fix Version/s: 1.8.0
  • Component/s: Zend_Search_Lucene
  • Labels:
    None
  • Fix Version Priority:
    Nice to Have

Description

Numerical values such as phone numbers are not searchable. Fields of type "text", "keyword" and "unstored" were tested. The following PHPUnit test case reproduces the problem:

<?php
set_include_path('.' . PATH_SEPARATOR . '/opt/lampp/lib/php/' . PATH_SEPARATOR . '../application/library/');

require_once 'PHPUnit/Extensions/PerformanceTestCase.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Search/Query/Boolean.php';

/**
 * This TestCase tests the search behavior in fields of type "keyword",
 * "text" and "unstored".
 */
class BugExploitTest extends PHPUnit_Extensions_PerformanceTestCase{
	private $index = null;
	
	// private $numericalValue = 'Zziqwez'; // found in fields of any type
	// private $numericalValue = 'Hallowe'; // not found in field of type "text". why? 
	private $numericalValue = '12345678'; // not found
	
	/**
	* Creates an index and adds a document to it.
	*/
	protected function setUp() {
		try{
			$this->index = Zend_Search_Lucene::open("/tmp/index");
		}catch(Exception $e){
			$this->index = Zend_Search_Lucene::create("/tmp/index");
		}
		
		$doc = new Zend_Search_Lucene_Document();		
		$doc->addField(Zend_Search_Lucene_Field::Keyword('keyword', $this->numericalValue));
		$doc->addField(Zend_Search_Lucene_Field::Text('text', $this->numericalValue));
		$doc->addField(Zend_Search_Lucene_Field::UnStored('unstored', $this->numericalValue));
		$this->index->addDocument($doc);
	}

	/**
	* Shuts down the index.
	*/ 
	protected function tearDown() {
		$this->index->commit();
		unset($this->index);
	}
	
	/**
	* Searching in the field of type "keyword".
	* Our index should have one document at least. 
	*/
	public function testSearchKeyword(){
		$this->searchInField('keyword');
	}
	
	/**
	* Searching in the field of type "text".
	* Our index should have two documents at least. 
	* (tearDown doesn't delete any documents)
	*/
	public function testSearchText(){
		$this->searchInField('text');
	}
	
	/**
	* Searching in the field of type "unstored".
	* Our index should have two documents at least. 
	* (tearDown doesn't delete any documents)
	*/
	public function testSearchUnStored(){
		$this->searchInField('unstored');
	}
	
	private function searchInField($fieldName){
		$userQuery = Zend_Search_Lucene_Search_QueryParser::parse($this->numericalValue);
		Zend_Search_Lucene::setDefaultSearchField($fieldName);
		$hits = $this->index->find($userQuery);
		// after adding a document we expect one search result at least
		$this->assertNotEquals(0, count($hits));
	}
}
?>

Activity

Thomas Weidner added a comment -

Assigned to Alexander

Marc Boeren added a comment -

The default analyzer handles letters (a-zA-Z) only. You can change it to include numbers with:


Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive()
);


If you need more flexibility in searching, try creating your own analyzer based on e.g. Zend_Search_Lucene_Analysis_Analyzer_Common_Text.

That said, the default setting to search for text only is perhaps confusing for first-time users.
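To illustrate why the default analyzer misses a value like '12345678', here is a minimal sketch that emulates the two analyzers with plain regular expressions. It assumes that Common_Text tokenizes runs of [a-zA-Z] and Common_TextNum tokenizes runs of [a-zA-Z0-9], and that the CaseInsensitive variants lowercase each token. It does not call Zend_Search_Lucene itself.

```php
<?php
// Emulation only: the patterns below are assumptions about how the
// Text and TextNum analyzers split input, not the library's own code.
const TEXT_PATTERN     = '/[a-zA-Z]+/';
const TEXT_NUM_PATTERN = '/[a-zA-Z0-9]+/';

// Return the lowercased tokens matched by $pattern, as a
// CaseInsensitive analyzer would emit them.
function tokenize(string $text, string $pattern): array
{
    preg_match_all($pattern, $text, $matches);
    return array_map('strtolower', $matches[0]);
}

// A digits-only value yields no tokens at all with the letters-only pattern,
// so there is nothing to index and nothing to search for.
echo json_encode(tokenize('12345678', TEXT_PATTERN)), PHP_EOL;      // []
echo json_encode(tokenize('12345678', TEXT_NUM_PATTERN)), PHP_EOL;  // ["12345678"]

// Mixed input: only the letters survive the default pattern.
echo json_encode(tokenize('Call 0800 1234', TEXT_PATTERN)), PHP_EOL;      // ["call"]
echo json_encode(tokenize('Call 0800 1234', TEXT_NUM_PATTERN)), PHP_EOL;  // ["call","0800","1234"]
```

The same analyzer is applied both when documents are indexed and when query strings are parsed. Switching the default to TextNum therefore affects both sides.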

I had a slightly different problem. I was trying to search for words_with_underscores in a Keyword field. A Keyword field is indexed but not tokenized, so I expected an exact match when I searched for words_with_underscores. Instead, the query was tokenized into 'words with underscores', which yielded no matches because the keyword field itself wasn't tokenized. My solution was to create a ...TextCode.... analyzer.
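The mismatch can be sketched with two token patterns. This is a hypothetical illustration, not Marc's actual TextCode analyzer. The letters-only pattern stands in for the default Text analyzer, and the pattern that also accepts digits and underscores is an assumed "code-friendly" alternative.

```php
<?php
// Hypothetical sketch: compares a letters-only token pattern with an
// assumed TextCode-style pattern. Neither is taken from Zend_Search_Lucene.
function tokens(string $text, string $pattern): array
{
    preg_match_all($pattern, $text, $m);
    return $m[0];
}

$query = 'words_with_underscores';

// Letters-only: the underscores act as separators, giving three terms.
$lettersOnly = tokens($query, '/[a-zA-Z]+/');

// Assumed TextCode-style pattern: the whole identifier stays one term.
$codeFriendly = tokens($query, '/[a-zA-Z0-9_]+/');

// A Keyword field stores its value as one untokenized term, so only the
// second tokenization produces a term equal to it.
$keywordTerm = $query;

echo implode(' | ', $lettersOnly), PHP_EOL;             // words | with | underscores
echo implode(' | ', $codeFriendly), PHP_EOL;            // words_with_underscores
var_dump(in_array($keywordTerm, $lettersOnly, true));   // bool(false)
var_dump(in_array($keywordTerm, $codeFriendly, true));  // bool(true)
```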

Hope this helps!

Ciao, Marc.

Wladimir Schwitin added a comment -

Thanks, Marc, for your helpful comment. I inserted your code in the setUp block and the test succeeds!

Alexander Veremyev added a comment -

The default text analyzer skips numbers.

So you can either set another analyzer:

Zend_Search_Lucene_Analysis_Analyzer::setDefault(
   new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

or use the Keyword field type.

In the second case you have to use the search API to search through keyword fields, because the query parser runs the query string through the analyzer:

// $queryString is the user's query string
$subquery1 = Zend_Search_Lucene_Search_QueryParser::parse($queryString);

// Keyword fields are not tokenized, so match the exact term directly
$term  = new Zend_Search_Lucene_Index_Term('12345678', 'keyword');
$subquery2 = new Zend_Search_Lucene_Search_Query_Term($term);

$finalQuery = new Zend_Search_Lucene_Search_Query_Boolean();
$finalQuery->addSubquery($subquery1);
$finalQuery->addSubquery($subquery2);

$hits  = $index->find($finalQuery);
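Why the term query succeeds where the parsed query fails can be shown with a toy in-memory emulation. This is not Zend_Search_Lucene. It assumes that Text fields and parsed query strings both pass through a letters-only analyzer, and that a Keyword field is indexed as a single untokenized term. All function names here are hypothetical.

```php
<?php
// Toy emulation of the behavior described above; the function names are
// hypothetical and do not exist in Zend_Search_Lucene.
function analyze(string $text): array
{
    // Assumed default analyzer: runs of letters only, lowercased.
    preg_match_all('/[a-zA-Z]+/', $text, $m);
    return array_map('strtolower', $m[0]);
}

$value = '12345678';

// Terms stored per field for a single document.
$index = [
    'text'    => analyze($value),  // []: digits are skipped
    'keyword' => [$value],         // ['12345678']: stored verbatim
];

// Parsed query: the query string is analyzed first. In this toy, an
// analyzed query with no terms matches nothing.
function parsedQueryMatches(array $index, string $field, string $query): bool
{
    $terms = analyze($query);
    return $terms !== [] && array_intersect($terms, $index[$field]) !== [];
}

// Term query: the exact term is looked up without any analysis.
function termQueryMatches(array $index, string $field, string $term): bool
{
    return in_array($term, $index[$field], true);
}

var_dump(parsedQueryMatches($index, 'text', $value));     // bool(false)
var_dump(parsedQueryMatches($index, 'keyword', $value));  // bool(false)
var_dump(termQueryMatches($index, 'keyword', $value));    // bool(true)
```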
