Issues

ZF-10150: Zend_Date broken when using mbstring with overloading

Description

When outputting Zend_Date::DATE_LONG in locales with UTF-8 characters in the format string (such as 'zh'), the output is cut off partway through the string and ends with only part of the last character.

This only happens when mbstring.func_overload is enabled for strlen and the internal encoding is set to UTF-8.


mbstring.internal_encoding = UTF-8
mbstring.func_overload = 7

Test case:


$date = new Zend_Date('2008-12-10');
echo $date->get(Zend_Date::DATE_LONG, 'zh');

Output: 2008年10�

Expected: 2008年10月12日

This issue appears to still be happening in SVN trunk.

Faulty function: Zend_Date::_toToken(). _toToken loops over the format string using strlen() to determine the length of the string, and uses $part[$i] to access the individual bytes. The issue arises when strlen is overloaded by turning on mbstring.func_overload. strlen then returns the number of UTF-8 characters in the string, not the number of bytes, so the loop stops before it has read every byte.
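To illustrate the mismatch described above, here is a minimal sketch. It does not reproduce the real _toToken() body, and it does not need mbstring.func_overload: the two lengths are hard-coded to show what a byte-indexed loop does when its bound is a character count. The format string and the copyBytes() helper are assumptions made for illustration only.

```php
<?php
// Assumed format string, similar in shape to the 'zh' long date pattern.
$part = "y年M月d日";

$byteLength = 12; // 3 ASCII letters + 3 CJK characters at 3 bytes each in UTF-8
$charLength = 6;  // what an overloaded strlen() would report for this string

// Hypothetical helper: copies the first $length bytes one at a time,
// the way a loop bounded by strlen() and indexed with $part[$i] would.
function copyBytes($part, $length)
{
    $out = '';
    for ($i = 0; $i < $length; ++$i) {
        $out .= $part[$i];
    }
    return $out;
}

$full = copyBytes($part, $byteLength);      // the whole string
$truncated = copyBytes($part, $charLength); // "y年M" plus the first byte of "月"
```

With the character count as the bound, the loop ends one byte into the fourth character, which matches the shape of the truncated output in the test case.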

Rejected solution: use mb_strlen($part, '8bit') in place of strlen($part) to make sure it returns the number of bytes in the string regardless of the current internal encoding. NOTE: This solution was rejected because use of mb_ functions is not acceptable in Zend_Date.

New proposed solution: use isset($part[$i]) in place of strlen($part) in the for loop. This is the same check that is used further down in that same function to make sure it does not read off the end of the string when looking ahead of the current position.
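A minimal sketch of the isset() loop condition, assuming nothing about the rest of _toToken(). The countBytes() helper is hypothetical and only shows that the loop visits every byte without calling strlen(), so an overloaded strlen() cannot affect it.

```php
<?php
// Hypothetical helper, not the actual _toToken() body: walks a string byte by
// byte and stops as soon as the offset no longer exists.
function countBytes($part)
{
    $count = 0;
    for ($i = 0; isset($part[$i]); ++$i) {
        ++$count;
    }
    return $count;
}

$bytes = countBytes("y年M月d日"); // 12 bytes in UTF-8
```

isset() on a string offset checks only whether the offset exists, so a "0" byte in the string does not end the loop early.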

Comments

Attached proposed patch

Added relevant php.ini configuration settings.

Patch not accepted. Usage of mb* is not allowed within these components.

Attached alternative solution patch.

Alternative solution: Use isset($part[$i]) instead to detect the end of the string.

isset() is also used further down in the function to detect the end of the string.

Verification: I just added a unit test.

The problem does not occur when mbstring is used without overloading, nor does it occur when mbstring isn't available at all.

In all cases this method works as expected.

Only when mbstring is used with overloading do things stop working, because UTF-8 is then no longer handled as UTF-8.

Changed issue header

Updated description with new proposed solution.

Fixed with r22713. Thank you for your time and report.