ZF-9242: Zend_Locale_Format seriously broken with non-english locale regex format (french)

Issue Type: Bug Created: 2010-02-23T08:53:43.000+0000 Last Updated: 2012-09-24T09:05:36.000+0000 Status: Resolved Fix version(s): - 1.10.3 (01/Apr/10)

Reporter: Nicolas Grevet (nyko18) Assignee: Thomas Weidner (thomas) Tags: - Zend_Locale

Related issues: Attachments:


Is Zend_Validate_Int seriously broken with non-english locales or is it just me? Because we're facing two different problems.

  • In order to validate an integer using the french format (with spaces as thousands separator, ie: 8 000), the framework expects us to input digits separated with non-breakable spaces (ie: 8(\xA0)000). Is this a sad joke? Because I don't even know how to input a non-breakable space using my keyboard. Regular spaces are not allowed. Sure you can still avoid thousands separators, but in this case, a second problem appears..

(this issue is discussed here, I can understand it's not really an issue, but still:

  • So you can't validate a french space-separated number, let's use a number with no separators at all. Unfortunately, that does not seem to be a solution either, since you didn't specified the 'u' modifier in the preg_match() that validates the number format in Zend_Locale_Format (line 503). And it seems that the non-breakable space is a special UTF-8 character that you can't use in UTF-8 regex without the 'u' modifier. At least, adding the 'u' modifier makes it work like a charm.

So basically, you can't use integers in french.


Posted by Thomas Weidner (thomas) on 2010-02-25T14:13:26.000+0000

Issue not reproducable.

Running the following testcode I get "true":

<pre class="highlight">
$date = new Zend_Validate_Int('fr');
var_dump($date->isValid('1 234'));

Regarding space as separator: This will not be implemented as it's a false behaviour. As you can see in the above example code using a non-breaking space as separator is accepted and returns true. Note that this works even without using /u as non-breaking space is no special utf8 character.

When you want to allow your users to enter non-integers (with whitespaces) and have them accepted as integer then you need to add a filter which converts nb-space to space.

As example, your proposed change would allow the user to enter "123 456 789"... in this case you would never know if he entered 3 independend numbers or if he entered 3 seperated numers.

Posted by Nicolas Grevet (nyko18) on 2010-02-26T01:11:54.000+0000

Tried your code, I created a blank php file and put this inside:

<pre class="highlight">
require_once 'Zend/Validate/Int.php';

$date = new Zend_Validate_Int('fr');
var_dump($date->isValid('1 234'));

All of them gives me "false". Are you using UTF-8 in your project ?

Posted by Ramon Henrique Ornelas (ramon) on 2010-02-26T03:10:37.000+0000

@Thomas See, maybe might help.

<pre class="highlight">

Returned group *Â *. Problem with whitespace in group, my project is utf-8. This is occurring with values that contain special characters in fr.xml.

infinity returned ∞

Posted by Nicolas Grevet (nyko18) on 2010-02-26T03:36:14.000+0000

I guess this is a totally different problem, probably occuring because your project encoding is wrongly set. We have a full UTF-8 project here, and Zend_Locale_Data::getList('fr','symbols') returns the expected result.

Posted by Thomas Weidner (thomas) on 2010-02-26T15:46:53.000+0000

@Nicolas: FR defines "A0" as grouping, and not "20". And chr does not support extended ascii codes.

Therefor non of your examples work.

Your first does not define grouping (it's not localized) Your second uses space (20) as grouping (which is not french) Your third uses chr(160) which does not return the proper "A0" as it does not support extended ascii codes.

Regarding your qestion: Working with localization you always have to use UTF-8.

@Ramon: I know how the components work which I added to ZF. And you should use UTF-8 like described in the manual. I18n works only with UTF-8, the CLDR is in UTF-8 and also the returned values are UTF-8. So when you are getting "â" instead of " " then you have set something wrong (probably ISO instead of UTF8)

Posted by Nicolas Grevet (nyko18) on 2010-03-01T00:45:25.000+0000

Ok, at this point I guess this is more a problem of wether you live in France or not. Nobody (and when I say nobody, I mean nobody) in France ever use the space separator when dealing with the web and most of the tech industry. The space separator is a beautification of numbers in case you're working with very large numbers, it is rarely used to input anything, just to display. Preventing people to input non-separated digits is like preventing them to input text without punctuation, it's weird. How are we supposed to use the Zend components for this case? The framework is already slow enough for us to add yet as many filters as we have numeric input fields. I understand that the 'standards' say this is the right way to proceed, but it's not like it's the only way of proceeding with numbers in France.

Posted by Nicolas Grevet (nyko18) on 2010-03-01T01:51:29.000+0000

Thanks to someone on the Mailing List, I found we should use the 'digits' validator if we want to validate numbers without separators. As these are used 99% of the time in french, I guess we'll just use this one.

Posted by Nicolas Grevet (nyko18) on 2010-03-01T02:37:20.000+0000

By the way, this does not mean there's no problem at all. The regular expression returned by the Locale is supposed to allow us to use no separators at all, just like we could expect.

<pre class="highlight">
/^[++]{0,1}[0-9]{1,3}(\ {0,1}[0-9]{3})*(\,{1}[0-9]{1,}){0,1}$/ 

If you look carefully, there's a "\ {0,1}", which means zero or one. I don't think the Locale format is the problem, and I never said so. The problem is an encoding issue (like I said in the initial report), the non-breakable space is in the Zend/Locale/Data/fr.xml line 3954 (xpath: //ldml/numbers/symbols/group), and this file is UTF-8. Validating this regex wth a preg_match() that doesn't have the unicode flag is a mistake. By the way, adding a 'u' modifier to the regex makes it work!

Posted by Thomas Weidner (thomas) on 2010-03-01T12:19:54.000+0000

Reopened to check the new information

Posted by Thomas Weidner (thomas) on 2010-03-05T14:59:10.000+0000

Fixed with r21349. Thanks for the detailed info.

Posted by Ciprian Popovici (cpopovici) on 2012-09-24T09:05:36.000+0000

Zend/Locale/Data/fr.xml in SVN seems to have had 0xA0 replaced with a normal space, but official packages everywhere (including Zend's own ZendFramework-1.12.0.tar.gz and Linux distro packages) still seem to ship with the broken version.

As a result, this bug is still very much alive and well in the wild.

Have you found an issue?

See the Overview section for more details.


© 2006-2018 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.