
Key: ZF-269
Type: Improvement
Status: Resolved
Resolution: Incomplete
Priority: Major
Assignee: Darby Felton
Reporter: Georg von der Howen
Votes: 0
Watchers: 6
Zend Framework

Support UTF-8 string filtering and validation

Created: 19/Jul/06 02:18 AM   Updated: 05/Jul/07 02:43 PM
Component/s: Zend_Filter, Zend_Validate
Affects Version/s: 0.1.5
Fix Version/s: 1.0.0 RC3

Time Tracking:
Not Specified

Participants: Andries Seutens, Bill Karwin, Christopher Thompson, Darby Felton, Gavin, Georg von der Howen, Mark Evans and Thomas Weidner


Description
If you try to filter or validate a string that contains, for example, a German umlaut, and the server uses UTF-8 encoding, you will find that filter and validation classes do not support such characters.

Comments
Mark Evans - 21/Jul/06 07:02 AM
The problem with the mb_* functions is that they require a non-standard extension to be enabled.

So we would need to check that the mb_* functions are available before trying to use them, which could get very messy.
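
A minimal sketch of the availability check described above, assuming a hypothetical helper named utf8_length() (it is not part of Zend Framework). It uses mb_strlen() when the mbstring extension is loaded and otherwise falls back to counting the bytes that are not UTF-8 continuation bytes, which gives the character count for valid UTF-8 input.

```php
<?php
// Hypothetical helper for illustration only; not a Zend Framework function.
function utf8_length($string)
{
    // Prefer mbstring when the extension is available.
    if (function_exists('mb_strlen')) {
        return mb_strlen($string, 'UTF-8');
    }

    // Fallback: remove UTF-8 continuation bytes (10xxxxxx) and count what is left.
    // This assumes $string is valid UTF-8.
    return strlen(preg_replace('/[\x80-\xBF]/', '', $string));
}
```

Every string operation that needs this kind of guard would have to repeat the pattern, which is the messiness the comment refers to.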


Darby Felton - 24/Jul/06 11:25 AM
Original comment by Lenny:

Zend_Filter does not have a filter for checking strings that contain accented characters.
Does a solution exist?
If not, I will code it!


Georg von der Howen - 25/Jul/06 01:03 AM
Mark, I see the problem. It might still be worthwhile to consider, though, because as far as I understand, ZF is supposed to be fully UTF-8 compatible, and this is a major break. An alternative might be to have something like Zend_Filter_Mb for those of us with a non-ASCII tongue. What do you think?

Christopher Thompson - 25/Jul/06 06:17 PM
Perhaps a better solution would be to make the replace function configurable in some way. This could be done in several ways, but implementing a replace() function within the class might be the place to start. At least then it could be overridden, and that function would be the central place for whatever implementation seems best.
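
A rough sketch of this idea, using hypothetical class names (Example_Filter_Alpha and Example_Filter_Alpha_Utf8 are illustrations, not Zend Framework classes). The base class routes all replacement through one protected method, and a subclass overrides only that method to swap in a UTF-8 aware implementation.

```php
<?php
// Hypothetical classes for illustration only; not part of Zend Framework.
class Example_Filter_Alpha
{
    public function filter($value)
    {
        return $this->_replace((string) $value);
    }

    // Single, central place where the replacement happens.
    protected function _replace($value)
    {
        // Default behaviour: keep ASCII letters only.
        return preg_replace('/[^a-zA-Z]/', '', $value);
    }
}

class Example_Filter_Alpha_Utf8 extends Example_Filter_Alpha
{
    protected function _replace($value)
    {
        // \p{L} matches any Unicode letter. This needs a PCRE build with UTF-8
        // and Unicode property support; preg_replace() returns null on invalid UTF-8.
        return preg_replace('/[^\p{L}]/u', '', $value);
    }
}
```

With the input "Müller 42", the base filter strips the bytes of the umlaut as well as the space and digits, while the subclass keeps the whole name.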

Mark Evans - 26/Jul/06 03:38 AM
Reading the PHP manual, it looks like if we changed from preg_replace to ereg_replace, the mbstring extension could automatically overload the functions.

See http://www.php.net/manual/en/ref.mbstring.php#mbstring.overload

Not sure whether this is a possible solution.


Georg von der Howen - 26/Jul/06 04:11 AM
Mark's idea sounds good - but I also see a potential problem.

In my experience with shared hosting packages, you don't normally have access to php.ini. This would then be similar to mb_* being a non-standard extension.

But maybe you can also enable the overloading per directory in .htaccess, which would probably help most people in shared hosting environments.


Gavin - 02/Aug/06 07:37 PM
Currently, my best guess is that a large percentage of deployment environments (e.g. many web hosters) do not provide the mbstring PHP extension. Thus, the only reliable alternative I've seen is: http://framework.zend.com/wiki/x/sgo

Bill Karwin - 13/Nov/06 03:23 PM
Changing fix version to 0.9.0.

Darby Felton - 28/Mar/07 11:31 AM
Updated the issue details.

Related original comment from Gavin:

Summarizing from historical threads:

  • the mbstring extension is not deprecated and is the recommended way to fully UTF-8 enable an entire PHP application
  • the mbstring extension should not be required in ZF /library core code (the extension is not a ZF requirement)
  • the /u modifier for PCRE does not work in all conditions and situations (see past topic threads and pcre.org for details)
  • test suites might make use of more "tools" (e.g. mbstring) than are available in ZF /library core, but the tests then need to be optional
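
For readers unfamiliar with the /u modifier mentioned in the list above, here is a minimal sketch of a letters-only check that does not depend on mbstring. The function name is_alpha_utf8() is hypothetical. The check relies on the PCRE library having been built with UTF-8 and Unicode property support, which is one of the conditions under which /u may not work.

```php
<?php
// Hypothetical helper for illustration only; not a Zend Framework function.
function is_alpha_utf8($value)
{
    // \A and \z anchor the whole string; \p{L} matches any Unicode letter.
    // preg_match() returns false (not 0) if the pattern cannot be compiled
    // or if $value is not valid UTF-8, so that case is treated as "not valid".
    $result = @preg_match('/\A\p{L}+\z/u', $value);

    return $result === 1;
}
```

Treating a false return as a validation failure keeps the check safe on malformed input, at the cost of hiding a missing PCRE feature behind a plain "invalid" result.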

Thomas Weidner - 28/Mar/07 12:05 PM
Just a small comment:

With

Zend_Locale_Data::getContent($locale, 'characters');

you can retrieve a list of the characters that are accepted and officially supported within this language/locale.

German, for example, returns "[a-z ä ö ü ß]", English returns "[a-z]" and Greek returns "[ΐά-ώ]".

Maybe this can be useful for you.
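
One way such a character list might be used for validation, as a sketch only. The helper matches_locale_characters() is hypothetical, and the set strings are copied from the comment above rather than fetched through Zend_Locale_Data. It assumes the returned value has the bracketed form quoted above and contains no characters that would need escaping in a regular expression.

```php
<?php
// Hypothetical helper for illustration only; not a Zend Framework function.
// $characterSet is assumed to look like "[a-z ä ö ü ß]".
function matches_locale_characters($value, $characterSet)
{
    // The members are separated by spaces; remove them so the string
    // can serve directly as a PCRE character class.
    $class = preg_replace('/\s+/u', '', $characterSet);

    // Requires a PCRE build with UTF-8 support for the /u modifier.
    return preg_match('/\A' . $class . '+\z/u', $value) === 1;
}
```

The sets quoted in the comment are lowercase only, so "Müller" with a capital M would be rejected unless the input is lowercased first or the set is extended.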


Gavin - 28/Mar/07 12:59 PM
A link to the discussion thread summarizing the community's consensus:

http://www.nabble.com/forum/ViewPost.jtp?post=6490854&framed=y&skin=16154


Andries Seutens - 12/Apr/07 09:29 AM
Those who actually need the mbstring functions will most probably have the mb_* functions enabled. My idea is to work with "locales", and if no locale is provided, we will use "default behaviour".

Andries Seutens - 18/Jun/07 09:08 AM
Darby, isn't this one resolved?

Darby Felton - 18/Jun/07 10:49 AM
Since the issue does not name a specific unresolved item to address, I mark the resolution as incomplete. The issue can be reopened, naming a specific problem [set], if necessary.