ZF-6041: Query Highlighting has problems with non-ASCII characters
Description
Testcase:
<?php
error_reporting(E_ALL); // E_ALL already includes E_NOTICE
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Search/QueryParser.php';
/**
* The following gives a notice locally:
* Notice: iconv() [function.iconv]: Detected an illegal character in input
* string in library/Zend/Search/Lucene/Field.php on line 221
*/
$query = Zend_Search_Lucene_Search_QueryParser::parse('*test*', 'utf-8');
/**
* This should output the input string unchanged, "Übergroßes Bild - Test",
* but it doesn't
*/
echo "\n\n"
. html_entity_decode(strip_tags($query->highlightMatches('Übergroßes Bild - Test')), ENT_COMPAT, 'UTF-8')
. "\n\n";
Comments
Posted by Alexander Veremyev (alexander) on 2009-04-30T02:57:46.000+0000
This issue is actually a duplicate of [ZF-3629]. The problem is in the DOMText::splitText() method: it needs a binary (byte) offset instead of a UTF-8 character offset.
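
The distinction between a byte offset and a character offset can be illustrated with the test string from the report. This is only a sketch of the general mismatch, not the actual Zend_Search_Lucene code or its fix. The variable names are made up for illustration, and it assumes the mbstring extension is available. The string is written with hex escapes so the result does not depend on the file's encoding.

```php
<?php
// "Übergroßes Bild - Test" as UTF-8 bytes: Ü = C3 9C, ß = C3 9F.
$text   = "\xC3\x9Cbergro\xC3\x9Fes Bild - Test";
$needle = 'Test';

// strpos() counts bytes; "Ü" and "ß" each occupy two bytes in UTF-8.
$byteOffset = strpos($text, $needle);

// mb_strpos() counts characters when told the string is UTF-8.
$charOffset = mb_strpos($text, $needle, 0, 'UTF-8');

// Converting between the two kinds of offset:
$charFromByte = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
$byteFromChar = strlen(mb_substr($text, 0, $charOffset, 'UTF-8'));

var_dump($byteOffset);   // int(20)
var_dump($charOffset);   // int(18)
var_dump($charFromByte); // int(18)
var_dump($byteFromChar); // int(20)
```

With pure ASCII input both offsets are identical, which would be consistent with the problem appearing only for non-ASCII text: passing one kind of offset to an API that expects the other splits the string at the wrong position.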