ZF-11009: Highlighting does not work as expected (accented characters)

Issue Type: Bug Created: 2011-01-30T04:11:22.000+0000 Last Updated: 2011-01-30T04:11:22.000+0000 Status: Open Fix version(s): Reporter: Jacek Kobus ( Assignee: Alexander Veremyev (alexander) Tags: - Zend_Search_Lucene

Related issues: Attachments:


The query get's tokenized twice ( ? ). First time in class Zend_Search_Lucene_Search_Query_Preprocessing_Term on line 299:

<pre class="highlight">
$tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize($this->_word, $this->_encoding);

Now everything's okay - $this->_encoding = utf-8. But what happens next is that we have another ->tokenize() method call in Zend_Search_Lucene_Document_Html on line 427 (inside foreach):

<pre class="highlight">
$wordsToHighlightList[] = $analyzer->tokenize($wordString);

but the analyzer is missing encoding (line 424):

<pre class="highlight">
$analyzer = Zend_Search_Lucene_Analysis_Analyzer::getDefault();

... and unlike during the first tokenize call I get corrupted string, and in effect - no highlight. In the analyzer the encoding appears as an empty string. When I change the code from line 427 to:

<pre class="highlight">
$wordsToHighlightList[] = $analyzer->tokenize($wordString, 'utf-8');

.. everything works fine.


<pre class="highlight">
$query = "jaźń";

try {
    $index = Zend_Search_Lucene::open($this->getOption('indexDirectory'));
}catch(Zend_Search_Lucene_Exception $e){
    $index = Zend_Search_Lucene::create($this->getOption('indexDirectory'));

    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive()

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::text('title', 'żółć gęślą jaźń ąćź','utf-8'));

$query = Zend_Search_Lucene_Search_QueryParser::parse($query);

try {
    $results = $index->find($query);
    foreach ($results as $result){
        echo $query->highlightMatches($result->title, 'utf-8') . '<br></br>';
}catch(Exception $e){}


No comments to display

Have you found an issue?

See the Overview section for more details.


© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.