History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: ZF-1641
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Andries Seutens
Reporter: Darby Felton
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Google issue summary
Zend Framework

PCRE UTF-8 Support Unavailable on Some Platforms

Created: 27/Jun/07 12:06 PM   Updated: 05/Jul/07 02:44 PM
Component/s: Zend_Filter, Zend_Validate
Affects Version/s: 1.0.0 RC3
Fix Version/s: 1.0.0

Time Tracking:
Not Specified

Tags:
Participants: Andries Seutens, Darby Felton, Graham Anderson and Joshua L Ross


 Description  « Hide
Filter and validation classes such as Zend_Filter_Alpha are known to make use of the /u PCRE pattern modifier, but UTF-8 support is not compiled by default into various PHP distributions.

 All   Comments   Work Log   Change History   FishEye   Crucible      Sort Order: Ascending order - Click to sort in descending order
Darby Felton - 27/Jun/07 12:41 PM
Original comment by Christian Gräfe:

I can see why this change was made, but for me it is causing problems on certain platforms. On some, the extendes PCRE syntax "\p{}" doesn't seem to match anything. I could quite figure out what exactly is causing this problem. Up to now, I tried the following platforms:

Not working:
Fedora Core 5, PHP 5.1.6, PCRE 6.3
Fedora Core 6, PHP 5.1.6, PCRE 6.6

Working:
Fedora 7, PHP 5.2.2, PCRE 7.0
Solaris 10 x86, PHP 5.1.5, PCRE 6.6
Debian Etch, PHP 4.4.4, PCRE 6.7

Maybe, you could amend the docs to state the exact prerequisites to get Zend_Filter_Alnum et al. to work. That would surely help me a lot.



Joshua L Ross - 27/Jun/07 01:16 PM
I am also have this issue on

RHEL5, PHP 5.1.6, PCRE 6.6

I check redhat default build spec for PCRE 6.6 and --enable-utf8 is specified. At this point I am not 100% sure it is UTF-8 related but I do know since ZF1.0RC3 the filters Zend_Filter_Alnum and Zend_Filter_Alpha are not matching strings that should be valid.


Joshua L Ross - 27/Jun/07 01:24 PM
These patterns continually fail regardless of the value input:

$pattern = '/[^\p{L}\p{N}' . ($this->allowWhiteSpace ? '\s' : '') . ']/u';
$pattern = '/[^\p{L}\p{N}' . ($this->allowWhiteSpace ? '\s' : '') . ']
/';

So I modified the pattern to use the old alnum which works both with and without utf-8 specified. So these patterns are working:

$pattern = '/[^[:alnum:]' . ($this->allowWhiteSpace ? '\s' : '') . ']/u';
$pattern = '/[^[:alnum:]' . ($this->allowWhiteSpace ? '\s' : '') . ']/';


Darby Felton - 27/Jun/07 01:39 PM
The Unicode property patterns (the ones beginning with \p) will not work if UTF-8 support is unavailable in PCRE.

From the PCRE man pages:

UTF-8 SUPPORT

       To build PCRE with support for UTF-8 character strings, add

         --enable-utf8

       to  the  configure  command.  Of  itself, this does not make PCRE treat
       strings as UTF-8. As well as compiling PCRE with this option, you  also
       have  have to set the PCRE_UTF8 option when you call the pcre_compile()
       function.

UTF-8 AND UNICODE PROPERTY SUPPORT

       From  release  3.3,  PCRE  has  had  some support for character strings
       encoded in the UTF-8 format. For release 4.0 this was greatly  extended
       to  cover  most common requirements, and in release 5.0 additional sup-
       port for Unicode general category properties was added.

       In order process UTF-8 strings, you must build PCRE  to  include  UTF-8
       support  in  the  code,  and, in addition, you must call pcre_compile()
       with the PCRE_UTF8 option flag. When you do this, both the pattern  and
       any  subject  strings  that are matched against it are treated as UTF-8
       strings instead of just strings of bytes.

       If you compile PCRE with UTF-8 support, but do not use it at run  time,
       the  library will be a bit bigger, but the additional run time overhead
       is limited to testing the PCRE_UTF8 flag occasionally, so should not be
       very big.

       If PCRE is built with Unicode character property support (which implies
       UTF-8 support), the escape sequences \p{..}, \P{..}, and  \X  are  sup-
       ported.  The available properties that can be tested are limited to the
       general category properties such as Lu for an upper case letter  or  Nd
       for  a  decimal number, the Unicode script names such as Arabic or Han,
       and the derived properties Any and L&. A full  list  is  given  in  the
       pcrepattern documentation. Only the short names for properties are sup-
       ported. For example, \p{L} matches a letter. Its Perl synonym,  \p{Let-
       ter},  is  not  supported.   Furthermore,  in Perl, many properties may
       optionally be prefixed by "Is", for compatibility with Perl  5.6.  PCRE
       does not support this.

Graham Anderson - 27/Jun/07 05:22 PM
From testing on my platforms ( openSUSE/SUSE-OSS 10.0, 10.1, 10.2 ), I have
the following results:

Up to ( but not including ) apache2-mod_php5-5.2.0-10.rpm testing against the
PCRE expression in current Zend_Filter_Alnum passes.

From mod_php5 version that ships with openSUSE 10.2 ( 5.2.0-10 ) to the
current available in the openSUSE build service, ( 5.2.3-29.1 ), testing
against the pattern fails.

On openSUSE mod_php5-5.2.0-10 and greater is built against the system PCRE
library and not the PHP bundled one, on opensuse 10.2 this is pcre-6.7-21.
While patterns with UTF-8 options match against this system library outside
of PHP; they do not, for whatever reason match when used inside of PHP.

Workaround, on openSUSE 10.2 an upgrade of the system PCRE to the latest
available from the openSUSE build service ( pcre-7.1-16 ) will allow pattern
matching inside of PHP to function as expected.


Andries Seutens - 28/Jun/07 07:29 AM
Issue resolved in r5468

Darby Felton - 28/Jun/07 08:30 AM
Zend_Filter_Alpha and Zend_Filter_Digits also use Unicode property matching.

Andries Seutens - 28/Jun/07 08:51 AM
Sorry, I forgat about those ..... It should be fixed now. Graham Anderson or anyone else running a none UTF-8 enabled system, could you please run the unit tests for: Zend_Filter_AllTests?

I'll await your response before i close the issue.


Andries Seutens - 28/Jun/07 09:30 AM
Thanks to Julian Davchev for testing this:


Here stuff on local machine:

svn update
U tests/Zend/Filter/AlphaTest.php
U library/Zend/Filter/Alnum.php
U library/Zend/Filter/Digits.php
U library/Zend/Filter/Alpha.php
Updated to revision 5469.
At revision 5469.

jmut@dexter:/storage/www/frameworks/zendframework$ php
tests/Zend/Filter/AllTests.php
PHPUnit 3.0.0 by Sebastian Bergmann.

.........................................
.........................................
...........................

Time: 00:00

OK (109 tests)