ZF-9242: Zend_Locale_Format seriously broken with non-english locale regex format (french)

Description

Is Zend_Validate_Int seriously broken with non-english locales or is it just me? Because we're facing two different problems.

  • In order to validate an integer using the french format (with spaces as thousands separator, ie: 8 000), the framework expects us to input digits separated with non-breakable spaces (ie: 8(\xA0)000). Is this a sad joke? Because I don't even know how to input a non-breakable space using my keyboard. Regular spaces are not allowed. Sure you can still avoid thousands separators, but in this case, a second problem appears..

(this issue is discussed here, I can understand it's not really an issue, but still: http://framework.zend.com/issues/browse/ZF-7175)

  • So you can't validate a french space-separated number, let's use a number with no separators at all. Unfortunately, that does not seem to be a solution either, since you didn't specified the 'u' modifier in the preg_match() that validates the number format in Zend_Locale_Format (line 503). And it seems that the non-breakable space is a special UTF-8 character that you can't use in UTF-8 regex without the 'u' modifier. At least, adding the 'u' modifier makes it work like a charm.

So basically, you can't use integers in french.

Comments

Issue not reproducable.

Running the following testcode I get "true":


$date = new Zend_Validate_Int('fr');
var_dump($date->isValid('1 234'));

Regarding space as separator: This will not be implemented as it's a false behaviour. As you can see in the above example code using a non-breaking space as separator is accepted and returns true. Note that this works even without using /u as non-breaking space is no special utf8 character.

When you want to allow your users to enter non-integers (with whitespaces) and have them accepted as integer then you need to add a filter which converts nb-space to space.

As example, your proposed change would allow the user to enter "123 456 789"... in this case you would never know if he entered 3 independend numbers or if he entered 3 seperated numers.

Tried your code, I created a blank php file and put this inside:


set_include_path('/usr/lib/php5/zend/trunk/library');
require_once 'Zend/Validate/Int.php';

$date = new Zend_Validate_Int('fr');
var_dump($date->isValid('1234'));
var_dump($date->isValid('1 234'));
var_dump($date->isValid('1'.chr(160).'234'));

All of them gives me "false". Are you using UTF-8 in your project ?

@Thomas See, maybe might help.


Zend_Locale_Data::getList('fr','symbols')

Returned group *Â *. Problem with whitespace in group, my project is utf-8. This is occurring with values that contain special characters in fr.xml.

infinity returned ∞

I guess this is a totally different problem, probably occuring because your project encoding is wrongly set. We have a full UTF-8 project here, and Zend_Locale_Data::getList('fr','symbols') returns the expected result.

@Nicolas: FR defines "A0" as grouping, and not "20". And chr does not support extended ascii codes.

Therefor non of your examples work.

Your first does not define grouping (it's not localized) Your second uses space (20) as grouping (which is not french) Your third uses chr(160) which does not return the proper "A0" as it does not support extended ascii codes.

Regarding your qestion: Working with localization you always have to use UTF-8.

@Ramon: I know how the components work which I added to ZF. And you should use UTF-8 like described in the manual. I18n works only with UTF-8, the CLDR is in UTF-8 and also the returned values are UTF-8. So when you are getting "â" instead of " " then you have set something wrong (probably ISO instead of UTF8)

Ok, at this point I guess this is more a problem of wether you live in France or not. Nobody (and when I say nobody, I mean nobody) in France ever use the space separator when dealing with the web and most of the tech industry. The space separator is a beautification of numbers in case you're working with very large numbers, it is rarely used to input anything, just to display. Preventing people to input non-separated digits is like preventing them to input text without punctuation, it's weird. How are we supposed to use the Zend components for this case? The framework is already slow enough for us to add yet as many filters as we have numeric input fields. I understand that the 'standards' say this is the right way to proceed, but it's not like it's the only way of proceeding with numbers in France.

Thanks to someone on the Mailing List, I found we should use the 'digits' validator if we want to validate numbers without separators. As these are used 99% of the time in french, I guess we'll just use this one.

By the way, this does not mean there's no problem at all. The regular expression returned by the Locale is supposed to allow us to use no separators at all, just like we could expect.


/^[++]{0,1}[0-9]{1,3}(\ {0,1}[0-9]{3})*(\,{1}[0-9]{1,}){0,1}$/ 

If you look carefully, there's a "\ {0,1}", which means zero or one. I don't think the Locale format is the problem, and I never said so. The problem is an encoding issue (like I said in the initial report), the non-breakable space is in the Zend/Locale/Data/fr.xml line 3954 (xpath: //ldml/numbers/symbols/group), and this file is UTF-8. Validating this regex wth a preg_match() that doesn't have the unicode flag is a mistake. By the way, adding a 'u' modifier to the regex makes it work!

Reopened to check the new information

Fixed with r21349. Thanks for the detailed info.

{{Zend/Locale/Data/fr.xml}} in SVN seems to have had {{0xA0}} replaced with a normal space, but official packages everywhere (including Zend's own {{ZendFramework-1.12.0.tar.gz}} and Linux distro packages) still seem to ship with the broken version.

As a result, this bug is still very much alive and well in the wild.