ZF-2857: Zend_Search_Lucene_Query::highlightMatches doesn't take the Analyzers encoding into account

Issue Type: Bug
Created: 2008-03-11T04:57:01.000+0000
Last Updated: 2012-08-31T09:13:44.000+0000
Status: Open
Fix version(s):
Reporter: Stefan Oestreicher (dlx)
Assignee: Alexander Veremyev (alexander)
Tags: - Zend_Search_Lucene

Related issues: - ZF-3626



Zend_Search_Lucene_Search_Query::highlightMatches doesn't ensure that the text being highlighted has the same encoding that the analyzer uses to extract the tokens. This results in wrong token offsets and ultimately breaks the highlighting.

To reproduce, pass a multibyte string to the method while using the default analyzer. The issue can be worked around by manually converting the text to ASCII//TRANSLIT before invoking highlightMatches.

Currently the code in Zend_Search_Lucene_Document_Html::_highlightTextNode looks like this:

<pre class="highlight">
$analyzer = Zend_Search_Lucene_Analysis_Analyzer::getDefault();
// Converts from _doc->encoding to ASCII//TRANSLIT
$analyzer->setInput($node->nodeValue, $this->_doc->encoding);
foreach ($matchedTokens as $token) {
    // Cut text after matched token
    $node->splitText($token->getEndOffset()); // uses wrong character offset
    // ...
}
</pre>
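
To make the offset drift concrete, here is a minimal sketch that needs no Zend classes, only the dom and iconv extensions. The `transliterate()` helper is a hypothetical stand-in for the ASCII//TRANSLIT conversion (the real iconv output depends on platform and locale), and `splitAt()` is likewise just an illustration helper. It assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical stand-in for iconv($enc, 'ASCII//TRANSLIT', $text): the real
// output varies by platform and locale, so a fixed map keeps this deterministic.
function transliterate(string $text): string
{
    return strtr($text, ['ü' => 'ue', 'ö' => 'oe']);
}

// Split a fresh text node holding $text at $offset and return both halves.
function splitAt(string $text, int $offset): array
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $p    = $doc->appendChild($doc->createElement('p'));
    $node = $p->appendChild($doc->createTextNode($text));
    $tail = $node->splitText($offset); // offset is counted in characters
    return [$node->nodeValue, $tail->nodeValue];
}

$original = 'für Köln am Rhein';       // text stored in the DOM node
$ascii    = transliterate($original);  // text the analyzer would tokenize

// End offset of the token "Koeln", measured in the converted text.
$endInAscii = strpos($ascii, 'Koeln') + strlen('Koeln');

// End offset of the same word, measured in the text the node actually holds.
$endInOriginal = iconv_strpos($original, 'Köln', 0, 'UTF-8')
               + iconv_strlen('Köln', 'UTF-8');

[$wrongHead]   = splitAt($original, $endInAscii);
[$correctHead] = splitAt($original, $endInOriginal);

var_dump($endInAscii, $endInOriginal, $wrongHead, $correctHead);
```

Each transliterated character that expands (here "ü" and "ö") shifts every later offset by one, so the split lands two characters past the end of the matched word.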

I suggest adding a method to the analyzer that converts any text to its internal encoding, and invoking it before creating the Zend_Search_Lucene_Document_Html instance, like this (in Zend_Search_Lucene_Search_Query::highlightMatches):

<pre class="highlight">
$input = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->encode($inputHTML);
$doc = Zend_Search_Lucene_Document_Html::loadHTML($input);
</pre>
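
A rough sketch of what such an `encode()` method might look like, written as a standalone stub so it runs without Zend Framework. `SketchAnalyzer` is a hypothetical class and `encode()` is only the method proposed in this report, not an existing Zend_Search_Lucene API. It also assumes an iconv implementation that supports the //TRANSLIT suffix (glibc or GNU libiconv).

```php
<?php
// Hypothetical analyzer stub. encode() is the method proposed in this report;
// it does not exist in Zend_Search_Lucene.
final class SketchAnalyzer
{
    // Encoding the default analyzer converts its input to before tokenizing.
    private const INTERNAL_ENCODING = 'ASCII//TRANSLIT';

    public function encode(string $text, string $sourceEncoding = 'UTF-8'): string
    {
        $converted = iconv($sourceEncoding, self::INTERNAL_ENCODING, $text);
        if ($converted === false) {
            throw new RuntimeException('Cannot convert input to the analyzer encoding');
        }
        return $converted;
    }
}

$analyzer = new SketchAnalyzer();
$input    = $analyzer->encode('<p>für Köln</p>');

// With Zend Framework 1 loaded, the converted text would then be passed on:
// $doc = Zend_Search_Lucene_Document_Html::loadHTML($input);
var_dump($input); // exact transliteration depends on platform and locale
```

Because the document is built from the already converted text, the offsets reported by the analyzer and the offsets used by splitText refer to the same string. The trade-off is that the highlighted output contains the transliterated text rather than the original characters.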


Posted by Wil Sinclair (wil) on 2008-03-25T20:31:42.000+0000

Please categorize/fix as needed.

Posted by WIlliam Bailey (wb-hornbill) on 2008-05-13T00:51:30.000+0000



Posted by WIlliam Bailey (wb-hornbill) on 2008-05-13T00:59:50.000+0000

For example - POP3 - will get email via telnet. With Exchange and Lotus Notes we use client software to connect/retrieve email.


