Issue Type: Improvement Created: 2008-07-24T12:44:01.000+0000 Last Updated: 2012-11-20T20:52:37.000+0000 Status: Closed Fix version(s): Reporter: Hugues Lismonde (hlidotbe) Assignee: None Tags: - Zend_Search_Lucene
Related issues: Attachments: - ShortWordsUtf8.php
When using the ShortWords token filter with the UTF-8 Analyser, it fails to skip tokens containing UTF-8 characters.
For example, with a length of 2, the token "à" (common in french) is not skipped because strlen returns 2.
The solution would be to make a ShortWordsUtf8 that uses iconv_strlen instead of strlen.
Posted by Hugues Lismonde (hlidotbe) on 2008-07-24T12:46:07.000+0000
Working ShortWordsUtf8 using iconv_strlen instead of strlen (based on ShortWord.php from release-1.5.2)
Posted by Rob Allen (rob) on 2012-11-20T20:52:37.000+0000
Bulk change of all issues last updated before 1st January 2010 as "Won't Fix".
Feel free to re-open and provide a patch if you want to fix this issue.
Have you found an issue?
See the Overview section for more details.