ZF-710: _parseDate() - Erratic behavior when the optional $locale is not given

Description

Many methods relying on Zend_Locale_Format::_parseDate(), including Zend_Date*, yield erratic behavior. The problem relates to how months are handled in the data parsed, when no locale is explicitly provided. We could use the default locale, but then many functions would silently produce "strange" results, if that was not the intended locale.

In several methods in Zend_Date, the optional parameter $locale either does not default to $this->_Locale, or the methods invoked pass along the "false" value for the $locale parameter, where some do correctly default to $this->_Locale and others do not (e.g. static methods that lack access to $this).

I will commit a patch soon that addresses these issues, and resolves several broken unit tests.

Invoking non-static methods on existing instances of Zend_Date should default to the locale of that instance. Using any other locale would not be intuitive. Separately, we should also consider the default value for $gmt in similar circumstances.

Comments

See Fisheye link for changeset information.

The functions which pass along the "false" locale parameter call a lowered function where the locale is set properly.

But most of the functions where the patch was added had no unit tests for now. Non of the existing unit tests were broken... and I got non reported from others.

I agree with the handling of default locale... the default useage should be clear.

Depending to Default GMT... This could lead to massive problems...

Think of a user setting GMT to TRUE... $date->new Zend_Date('01.01.2000 00:00:00', TRUE, 'de_AT');

Now he will have massive problems when calculating with hours. $date->addHour(1); will add no hour because de_AT has GMT +1, and as GMT is TRUE +1 will be calculated back to GMT. +1 hour de_AT is equal to 0 hours GMT.

Same problem with many other functions. $date->addTime('01:00:00'); will also produce an error behavior... as GMT is set to TRUE.

GMT should not be set to a default other than the one from the function... The constructor GMT is only relevant for knowing if the date/time is a GMT or a localized time.

Related to the Patch 710:

The Zend_Locale_xxx functions will now have false behavior.

These functions should not be related to a Zend_Locale object. They do not need to have access to an object, they only need the identifier of the locale.

Example:

$array = Zend_Locale::getNumber('1,000.19', 'de_AT');

should produce the same output as

$locale = new Zend_Locale('de_AT'); $array = Zend_Locale::getNumber('1,000.19', $locale);

As the Zend_Locale_xxx functions are static they should also not be related to objects only because they need the identifier.

Related to the preg_match_all useage:

Dates can be given with UTF8 content... for example german 10.Jänner.2000 or even with arabic month or day names. So the useage of preg_match_all is necessary until we have PHP6.0.

The first patch addressed the following unit test failures, reproducible with: svn export -r 2583 http://framework.zend.com/svn/framework/trunk/ zftrunk-2583 After the patch, all locale/date unit tests passed.

{quote}There were 6 failures:

1) testCompareTime(Zend_DateTest) Failed asserting that

<<

<<

<<

<<

<

I agree with the latest changeset SVN 2592.

But related to preg_match_all..

Expect someone has a localized date. 10.März.2000 but he wrote 10.Mräz.2000 because of misstyping. Or someone having a date string with other content added.

Are you sure the second, third or forth byte of a UTF8 character can never be 30-39 hex ????

I am not... 0430-0439 is related to the normal cyrilic alphabet, 0530-0539 is related to the armenian alphabet, 0630-0630 is related to the arabic alphabet... and so on.

Another way would be to use the iconv_ string functions but I expect them to be as slow as preg_match.

Or we define in the docu that misstyping and/or the useage of other content than date names can corrupt the getDate/getTime functions. Then we could strip the useage of preg.

RFC 3629 shows all bytes of extended UTF-8 characters must begin with "10", thus preventing "second, third, or fourth bytes of a UTF8 character" from ever being 30-39 hex:

http://tools.ietf.org/html/rfc3629 - section 3

I already knew this...

I just checked the UTF8 hex table and realized that UTF8 chars would never have hex 00-79... so all numers will be parsed correct. That was my mistake

But this has nothing to do with the leading 10... the second, third and 4th character are always from 80 to ff.

Ok... I will look into changing this behavior as expected soon...

I should clarify that this is a performance optimization, and therefore low priority.