Issues

ZF-10150: Zend_Date broken when using mbstring with overloading

Issue Type: Bug Created: 2010-07-13T21:38:26.000+0000 Last Updated: 2010-07-29T05:37:24.000+0000 Status: Resolved Fix version(s): - 1.11.0 (02/Nov/10)

Reporter: Michael Cooper (mic159) Assignee: Thomas Weidner (thomas) Tags: - Zend_Date

Related issues: Attachments: - Zend.patch

Description

When outputting Zend_Date::DATE_LONG in locales with UTF-8 in the format string (such as 'zh') the output is returning half the string with only half the last character.

This is only when mbstring.func_overload is enabled for strlen and the internal encoding is set to UTF-8.

<pre class="highlight">
mbstring.internal_encoding = UTF-8
mbstring.func_overload = 7

Test case:

<pre class="highlight">
$date = new Zend_Date('2008-12-10');
echo $date->get(Zend_Date::DATE_LONG, 'zh');

Output: 2008年10�

Expected: 2008年10月12日

This issue appears to still be happening in SVN trunk.

Faulty function: Zend_Date::_toToken() _toToken loops over the format string using strlen() to determine the length of the string, and uses $part[$i] to access the individual bytes. The issue comes about when strlen is overloaded by turning on mbstring.func_overload. This causes strlen to return the number of UTF-8 characters in the string, not the number of bytes.

Rejected solution: use mb_strlen($part, '8bit') in place of strlen($part) to make sure it returns the number of bytes in the string regardless of the current internal encoding. NOTE: This solution was rejected because use of mb_ functions is not acceptable in the Zend_Date.

New proposed solution: use isset($part[$i]) in place of strlen($part) in the for loop. This is the same check that is used further down that same function to make sure it does not read off the end of the string when reading of the current position.

Comments

Posted by Michael Cooper (mic159) on 2010-07-13T21:54:14.000+0000

Attached proposed patch

Posted by Michael Cooper (mic159) on 2010-07-13T21:56:47.000+0000

Added relevant php.ini configuration settings.

Posted by Thomas Weidner (thomas) on 2010-07-13T23:30:50.000+0000

Patch not accepted. Usage of mb* is not allowed within these components.

Posted by Michael Cooper (mic159) on 2010-07-14T00:35:43.000+0000

Attached alternative solution patch.

Alternative solution: Use isset($part[$i]) instead to detect the end of the string.

isset() is also used further down in the function to detect the end of the string.

Posted by Thomas Weidner (thomas) on 2010-07-29T01:41:36.000+0000

Verification: I just added a unittest.

The problem does not occur when mbstring is used without overloading and it does also not occur when mbstring isn't available at all.

In all cases this method works as expected.

Only when mbstring is used with overloading then things do not work anymore because utf8 is no longer handled as utf8.

Posted by Thomas Weidner (thomas) on 2010-07-29T01:42:32.000+0000

Changed issue header

Posted by Michael Cooper (mic159) on 2010-07-29T02:30:22.000+0000

Updated description with new proposed solution.

Posted by Thomas Weidner (thomas) on 2010-07-29T05:35:34.000+0000

Fixed with r22713 Thank you for your time and report

Have you found an issue?

See the Overview section for more details.

Copyright

© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.

Contacts