ZF-3743: ShortWords token filter not working with utf-8 charset

Issue Type: Improvement
Created: 2008-07-24T12:44:01.000+0000
Last Updated: 2012-11-20T20:52:37.000+0000
Status: Closed
Fix version(s):
Reporter: Hugues Lismonde (hlidotbe)
Assignee: None
Tags: - Zend_Search_Lucene

Related issues:
Attachments: - ShortWordsUtf8.php


When the ShortWords token filter is used with the UTF-8 Analyser, it fails to skip short tokens that contain multi-byte UTF-8 characters.

For example, with a minimum length of 2, the token "à" (common in French) is not skipped, because strlen counts bytes rather than characters and returns 2 for it.
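
The difference between the two length functions can be checked directly. The snippet below is only an illustration, not part of the original report; it assumes the iconv extension is available and writes "à" as explicit UTF-8 bytes so the result does not depend on the encoding of the source file.

```php
<?php
// "à" encoded as UTF-8 is the two bytes 0xC3 0xA0.
$token = "\xC3\xA0";

$bytes = strlen($token);                 // counts bytes
$chars = iconv_strlen($token, 'UTF-8');  // counts characters in the given charset

echo "strlen: $bytes, iconv_strlen: $chars\n"; // prints "strlen: 2, iconv_strlen: 1"
```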

The solution would be to make a ShortWordsUtf8 that uses iconv_strlen instead of strlen.
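
A minimal sketch of that idea follows. It is not the attached ShortWordsUtf8.php and it does not extend the real Zend_Search_Lucene filter classes: the class name ShortWordsUtf8Sketch is hypothetical, and normalize() here takes a plain string instead of a token object so that the example runs on its own. The fallback to strlen for invalid UTF-8 is likewise an assumption of this sketch.

```php
<?php
// Hypothetical stand-in for the proposed filter; not the attached file.
final class ShortWordsUtf8Sketch
{
    private $minLength;

    public function __construct($minLength = 2)
    {
        $this->minLength = $minLength;
    }

    // Returns the term unchanged, or null when it should be skipped.
    public function normalize($term)
    {
        // Count characters, not bytes. iconv_strlen returns false on invalid input.
        $length = @iconv_strlen($term, 'UTF-8');
        if ($length === false) {
            // Assumption of this sketch: fall back to the byte length.
            $length = strlen($term);
        }
        return $length < $this->minLength ? null : $term;
    }
}

$filter = new ShortWordsUtf8Sketch(2);
$kept = array();
foreach (array("\xC3\xA0", "la", "maison") as $term) {
    if ($filter->normalize($term) !== null) {
        $kept[] = $term;
    }
}
// $kept now holds "la" and "maison"; the one-character token "à" was skipped.
```

In the real filter, the same length check would presumably be applied to the term text of the token passed to normalize().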


Posted by Hugues Lismonde (hlidotbe) on 2008-07-24T12:46:07.000+0000

Working ShortWordsUtf8 using iconv_strlen instead of strlen (based on ShortWord.php from release-1.5.2)

Posted by Rob Allen (rob) on 2012-11-20T20:52:37.000+0000

Bulk change of all issues last updated before 1st January 2010 as "Won't Fix".

Feel free to re-open and provide a patch if you want to fix this issue.
