Description
When outputting Zend_Date::DATE_LONG in a locale whose format string contains multibyte UTF-8 characters (such as 'zh'), the output is truncated: only about half the string is returned, and it ends partway through a multibyte character.
This happens only when mbstring.func_overload is enabled for strlen and the internal encoding is set to UTF-8.
mbstring.internal_encoding = UTF-8
mbstring.func_overload = 7
Test case:
$date = new Zend_Date('2008-12-10');
echo $date->get(Zend_Date::DATE_LONG, 'zh');
Output:
2008年10�
Expected:
2008年10月12日
This issue appears to still be happening in SVN trunk.
Faulty function:
Zend_Date::_toToken()
_toToken loops over the format string using strlen() to determine the length of the string, and uses $part[$i] to access the individual bytes.
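The issue arises when strlen is overloaded by turning on mbstring.func_overload. strlen then returns the number of UTF-8 characters in the string, not the number of bytes. Because that count is smaller than the byte length whenever the string contains multibyte characters, the byte-indexed loop stops before the end of the string and can stop in the middle of a character.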
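The mismatch can be illustrated with a small sketch. mbstring.func_overload was removed in PHP 8.0, so the sketch assumes the mbstring extension is loaded and uses mb_strlen($part, 'UTF-8') to stand in for an overloaded strlen. The format string is an illustrative assumption, not necessarily the one stored in the 'zh' locale data.

```php
<?php
// Illustrative format string (an assumption, not taken from the locale data).
$part = 'yyyy年M月d日';

$byteLength = strlen($part);              // 15 bytes when strlen is not overloaded
$charLength = mb_strlen($part, 'UTF-8');  // 9 characters: what an overloaded strlen would report

// Byte-indexed loop bounded by the character count, as happens under overloading.
$seen = '';
for ($i = 0; $i < $charLength; ++$i) {
    $seen .= $part[$i];
}

// $seen holds only the first 9 bytes: 'yyyy年M' plus the first byte of '月'.
echo bin2hex($seen), "\n";
```

The loop never reaches the remaining six bytes, which matches the truncated output shown in the test case.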
Rejected solution:
Use mb_strlen($part, '8bit') in place of strlen($part) so that the number of bytes in the string is returned regardless of the current internal encoding.
NOTE: This solution was rejected because use of mb_ functions is not acceptable in the Zend_Date.
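For reference, the rejected approach would look roughly like the sketch below. It assumes the mbstring extension is loaded, and the format string is again only an illustration.

```php
<?php
// Illustrative format string (an assumption, not taken from the locale data).
$part = 'yyyy年M月d日';

// The '8bit' encoding treats every byte as one character,
// so this is a byte count whatever the internal encoding is.
$byteLength = mb_strlen($part, '8bit');

// Walk the string byte by byte using the byte count as the bound.
$copy = '';
for ($i = 0; $i < $byteLength; ++$i) {
    $copy .= $part[$i];
}
```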
New proposed solution:
Use isset($part[$i]) in place of the strlen($part) comparison as the for loop condition.
This is the same check that is used further down in the same function to make sure it does not read off the end of the string when reading ahead of the current position.
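A minimal sketch of the isset-based loop follows. The function name collectBytes is hypothetical and the body is not the actual Zend_Date::_toToken() code; it only shows that the loop bound no longer depends on strlen.

```php
<?php
// Hypothetical helper for illustration only; not part of Zend_Date.
function collectBytes($part)
{
    $bytes = array();
    // isset() on a string offset is true only while $i is a valid byte position,
    // so the loop ends exactly at the end of the string without calling strlen().
    for ($i = 0; isset($part[$i]); ++$i) {
        $bytes[] = $part[$i];
    }
    return $bytes;
}

// Illustrative format string (an assumption, not taken from the locale data).
$bytes = collectBytes('yyyy年M月d日');
echo count($bytes), "\n";
```

Because isset() tests whether the offset exists rather than whether its value is truthy, a '0' byte in the string does not end the loop early.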
A proposed patch is attached.