Issues

ZF-6041: Query Highlighting has problems with non-ASCII characters

Description

Testcase:


<?php
error_reporting(E_ALL | E_NOTICE);

require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Search/QueryParser.php';

/**
 * The following gives a notice locally:
 * Notice:  iconv() [function.iconv]: Detected an illegal character in input
 * string in library/Zend/Search/Lucene/Field.php on line 221
 */
$query = Zend_Search_Lucene_Search_QueryParser::parse('*test*', 'utf-8');

/**
 * This should output "Übergroßes Bild - Test", but it doesn't
 */
echo "\n\n"
     . html_entity_decode(strip_tags($query->highlightMatches('Übergroßes Bild - Test')), ENT_COMPAT, 'UTF-8')
     . "\n\n";

Comments

This issue is actually a duplicate of [ZF-3629]. The problem is in the DOMText::splitText() method: it needs a binary (byte) offset instead of a UTF-8 character offset.
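
For illustration only, here is a minimal sketch of why the two kinds of offset diverge for the string in the testcase. It is not taken from Zend_Search_Lucene and does not call DOMText::splitText(); it only shows that a character offset and a byte offset differ as soon as multi-byte UTF-8 characters precede the match. It assumes the file is saved as UTF-8 and that the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// Illustrative sketch only; not code from Zend_Search_Lucene.
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$text   = 'Übergroßes Bild - Test';
$needle = 'Test';

// Offset counted in UTF-8 characters ("Ü" and "ß" count as one each).
$charOffset = mb_strpos($text, $needle, 0, 'UTF-8');

// Offset counted in bytes ("Ü" and "ß" take two bytes each in UTF-8).
$byteOffset = strpos($text, $needle);

// One way to convert a character offset into a byte offset:
// take the prefix by characters and measure its length in bytes.
$converted = strlen(mb_substr($text, 0, $charOffset, 'UTF-8'));

printf("chars: %d, bytes: %d, converted: %d\n", $charOffset, $byteOffset, $converted);
// prints "chars: 18, bytes: 20, converted: 20"
```

With pure ASCII input the two offsets are identical, which would explain why the problem only shows up with non-ASCII characters.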