Zend Framework

Zend_Locale_Format seriously broken with non-english locale regex format (french)

Details

  • Type: Bug Bug
  • Status: Resolved Resolved
  • Priority: Blocker Blocker
  • Resolution: Fixed
  • Affects Version/s: 1.10.1
  • Fix Version/s: 1.10.3
  • Component/s: Zend_Locale
  • Labels:
    None

Description

Is Zend_Validate_Int seriously broken with non-english locales or is it just me? Because we're facing two different problems.

  • In order to validate an integer using the french format (with spaces as thousands separator, ie: 8 000), the framework expects us to input digits separated with non-breakable spaces (ie: 8(\xA0)000). Is this a sad joke? Because I don't even know how to input a non-breakable space using my keyboard. Regular spaces are not allowed. Sure you can still avoid thousands separators, but in this case, a second problem appears..

(this issue is discussed here, I can understand it's not really an issue, but still: http://framework.zend.com/issues/browse/ZF-7175)

  • So you can't validate a french space-separated number, let's use a number with no separators at all. Unfortunately, that does not seem to be a solution either, since you didn't specified the 'u' modifier in the preg_match() that validates the number format in Zend_Locale_Format (line 503). And it seems that the non-breakable space is a special UTF-8 character that you can't use in UTF-8 regex without the 'u' modifier. At least, adding the 'u' modifier makes it work like a charm.

So basically, you can't use integers in french.

Activity

Hide
Thomas Weidner added a comment -

Issue not reproducable.

Running the following testcode I get "true":

$date = new Zend_Validate_Int('fr');
var_dump($date->isValid('1 234'));

Regarding space as separator:
This will not be implemented as it's a false behaviour.
As you can see in the above example code using a non-breaking space as separator is accepted and returns true. Note that this works even without using /u as non-breaking space is no special utf8 character.

When you want to allow your users to enter non-integers (with whitespaces) and have them accepted as integer then you need to add a filter which converts nb-space to space.

As example, your proposed change would allow the user to enter "123 456 789"... in this case you would never know if he entered 3 independend numbers or if he entered 3 seperated numers.

Show
Thomas Weidner added a comment - Issue not reproducable. Running the following testcode I get "true":
$date = new Zend_Validate_Int('fr');
var_dump($date->isValid('1 234'));
Regarding space as separator: This will not be implemented as it's a false behaviour. As you can see in the above example code using a non-breaking space as separator is accepted and returns true. Note that this works even without using /u as non-breaking space is no special utf8 character. When you want to allow your users to enter non-integers (with whitespaces) and have them accepted as integer then you need to add a filter which converts nb-space to space. As example, your proposed change would allow the user to enter "123 456 789"... in this case you would never know if he entered 3 independend numbers or if he entered 3 seperated numers.
Hide
Nicolas Grevet added a comment -

Tried your code, I created a blank php file and put this inside:


set_include_path('/usr/lib/php5/zend/trunk/library');
require_once 'Zend/Validate/Int.php';

$date = new Zend_Validate_Int('fr');
var_dump($date->isValid('1234'));
var_dump($date->isValid('1 234'));
var_dump($date->isValid('1'.chr(160).'234'));

{/code}

All of them gives me "false".
Are you using UTF-8 in your project ?

Show
Nicolas Grevet added a comment - Tried your code, I created a blank php file and put this inside:

set_include_path('/usr/lib/php5/zend/trunk/library'); require_once 'Zend/Validate/Int.php'; $date = new Zend_Validate_Int('fr'); var_dump($date->isValid('1234')); var_dump($date->isValid('1 234')); var_dump($date->isValid('1'.chr(160).'234')); {/code} All of them gives me "false". Are you using UTF-8 in your project ?
Hide
Ramon Henrique Ornelas added a comment - - edited

@Thomas
See, maybe might help.

Zend_Locale_Data::getList('fr','symbols')

Returned group *Â *.
Problem with whitespace in group, my project is utf-8.
This is occurring with values that contain special characters in fr.xml.

infinity returned ∞

Show
Ramon Henrique Ornelas added a comment - - edited @Thomas See, maybe might help.
Zend_Locale_Data::getList('fr','symbols')
Returned group * *. Problem with whitespace in group, my project is utf-8. This is occurring with values that contain special characters in fr.xml. infinity returned ∞
Hide
Nicolas Grevet added a comment -

I guess this is a totally different problem, probably occuring because your project encoding is wrongly set. We have a full UTF-8 project here, and Zend_Locale_Data::getList('fr','symbols') returns the expected result.

Show
Nicolas Grevet added a comment - I guess this is a totally different problem, probably occuring because your project encoding is wrongly set. We have a full UTF-8 project here, and Zend_Locale_Data::getList('fr','symbols') returns the expected result.
Hide
Thomas Weidner added a comment - - edited

@Nicolas:
FR defines "A0" as grouping, and not "20". And chr does not support extended ascii codes.

Therefor non of your examples work.

Your first does not define grouping (it's not localized)
Your second uses space (20) as grouping (which is not french)
Your third uses chr(160) which does not return the proper "A0" as it does not support extended ascii codes.

Regarding your qestion: Working with localization you always have to use UTF-8.

@Ramon:
I know how the components work which I added to ZF.
And you should use UTF-8 like described in the manual.
I18n works only with UTF-8, the CLDR is in UTF-8 and also the returned values are UTF-8. So when you are getting "â" instead of " " then you have set something wrong (probably ISO instead of UTF8)

Show
Thomas Weidner added a comment - - edited @Nicolas: FR defines "A0" as grouping, and not "20". And chr does not support extended ascii codes. Therefor non of your examples work. Your first does not define grouping (it's not localized) Your second uses space (20) as grouping (which is not french) Your third uses chr(160) which does not return the proper "A0" as it does not support extended ascii codes. Regarding your qestion: Working with localization you always have to use UTF-8. @Ramon: I know how the components work which I added to ZF. And you should use UTF-8 like described in the manual. I18n works only with UTF-8, the CLDR is in UTF-8 and also the returned values are UTF-8. So when you are getting "â" instead of " " then you have set something wrong (probably ISO instead of UTF8)
Hide
Nicolas Grevet added a comment -

Ok, at this point I guess this is more a problem of wether you live in France or not. Nobody (and when I say nobody, I mean nobody) in France ever use the space separator when dealing with the web and most of the tech industry. The space separator is a beautification of numbers in case you're working with very large numbers, it is rarely used to input anything, just to display. Preventing people to input non-separated digits is like preventing them to input text without punctuation, it's weird. How are we supposed to use the Zend components for this case? The framework is already slow enough for us to add yet as many filters as we have numeric input fields. I understand that the 'standards' say this is the right way to proceed, but it's not like it's the only way of proceeding with numbers in France.

Show
Nicolas Grevet added a comment - Ok, at this point I guess this is more a problem of wether you live in France or not. Nobody (and when I say nobody, I mean nobody) in France ever use the space separator when dealing with the web and most of the tech industry. The space separator is a beautification of numbers in case you're working with very large numbers, it is rarely used to input anything, just to display. Preventing people to input non-separated digits is like preventing them to input text without punctuation, it's weird. How are we supposed to use the Zend components for this case? The framework is already slow enough for us to add yet as many filters as we have numeric input fields. I understand that the 'standards' say this is the right way to proceed, but it's not like it's the only way of proceeding with numbers in France.
Hide
Nicolas Grevet added a comment -

Thanks to someone on the Mailing List, I found we should use the 'digits' validator if we want to validate numbers without separators. As these are used 99% of the time in french, I guess we'll just use this one.

Show
Nicolas Grevet added a comment - Thanks to someone on the Mailing List, I found we should use the 'digits' validator if we want to validate numbers without separators. As these are used 99% of the time in french, I guess we'll just use this one.
Hide
Nicolas Grevet added a comment -

By the way, this does not mean there's no problem at all. The regular expression returned by the Locale is supposed to allow us to use no separators at all, just like we could expect.


/^[++]{0,1}[0-9]{1,3}(\ {0,1}[0-9]{3})*(\,{1}[0-9]{1,}){0,1}$/
{/code}
If you look carefully, there's a "\ {0,1}", which means zero or one. I don't think the Locale format is the problem, and I never said so. The problem is an encoding issue (like I said in the initial report), the non-breakable space is in the Zend/Locale/Data/fr.xml line 3954 (xpath: //ldml/numbers/symbols/group), and this file is UTF-8. Validating this regex wth a preg_match() that doesn't have the unicode flag is a mistake. By the way, adding a 'u' modifier to the regex makes it work!

Show
Nicolas Grevet added a comment - By the way, this does not mean there's no problem at all. The regular expression returned by the Locale is supposed to allow us to use no separators at all, just like we could expect.

/^[++]{0,1}[0-9]{1,3}(\ {0,1}[0-9]{3})*(\,{1}[0-9]{1,}){0,1}$/ {/code} If you look carefully, there's a "\ {0,1}", which means zero or one. I don't think the Locale format is the problem, and I never said so. The problem is an encoding issue (like I said in the initial report), the non-breakable space is in the Zend/Locale/Data/fr.xml line 3954 (xpath: //ldml/numbers/symbols/group), and this file is UTF-8. Validating this regex wth a preg_match() that doesn't have the unicode flag is a mistake. By the way, adding a 'u' modifier to the regex makes it work!
Hide
Thomas Weidner added a comment -

Reopened to check the new information

Show
Thomas Weidner added a comment - Reopened to check the new information
Hide
Thomas Weidner added a comment -

Fixed with r21349.
Thanks for the detailed info.

Show
Thomas Weidner added a comment - Fixed with r21349. Thanks for the detailed info.

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: