ZF-10150: Zend_Date broken when using mbstring with overloading
Description
When outputting Zend_Date::DATE_LONG in locales with UTF-8 in the format string (such as 'zh') the output is returning half the string with only half the last character.
This is only when mbstring.func_overload is enabled for strlen and the internal encoding is set to UTF-8.
mbstring.internal_encoding = UTF-8
mbstring.func_overload = 7
Test case:
$date = new Zend_Date('2008-12-10');
echo $date->get(Zend_Date::DATE_LONG, 'zh');
Output: 2008年10�
Expected: 2008年10月12日
This issue appears to still be happening in SVN trunk.
Faulty function: Zend_Date::_toToken() _toToken loops over the format string using strlen() to determine the length of the string, and uses $part[$i] to access the individual bytes. The issue comes about when strlen is overloaded by turning on mbstring.func_overload. This causes strlen to return the number of UTF-8 characters in the string, not the number of bytes.
Rejected solution: use mb_strlen($part, '8bit') in place of strlen($part) to make sure it returns the number of bytes in the string regardless of the current internal encoding. NOTE: This solution was rejected because use of mb_ functions is not acceptable in the Zend_Date.
New proposed solution: use isset($part[$i]) in place of strlen($part) in the for loop. This is the same check that is used further down that same function to make sure it does not read off the end of the string when reading of the current position.
Comments
Posted by Michael Cooper (mic159) on 2010-07-13T21:54:14.000+0000
Attached proposed patch
Posted by Michael Cooper (mic159) on 2010-07-13T21:56:47.000+0000
Added relevant php.ini configuration settings.
Posted by Thomas Weidner (thomas) on 2010-07-13T23:30:50.000+0000
Patch not accepted. Usage of mb* is not allowed within these components.
Posted by Michael Cooper (mic159) on 2010-07-14T00:35:43.000+0000
Attached alternative solution patch.
Alternative solution: Use isset($part[$i]) instead to detect the end of the string.
isset() is also used further down in the function to detect the end of the string.
Posted by Thomas Weidner (thomas) on 2010-07-29T01:41:36.000+0000
Verification: I just added a unittest.
The problem does not occur when mbstring is used without overloading and it does also not occur when mbstring isn't available at all.
In all cases this method works as expected.
Only when mbstring is used with overloading then things do not work anymore because utf8 is no longer handled as utf8.
Posted by Thomas Weidner (thomas) on 2010-07-29T01:42:32.000+0000
Changed issue header
Posted by Michael Cooper (mic159) on 2010-07-29T02:30:22.000+0000
Updated description with new proposed solution.
Posted by Thomas Weidner (thomas) on 2010-07-29T05:35:34.000+0000
Fixed with r22713 Thank you for your time and report