ZF-269: Support UTF-8 string filtering and validation

Description

If you try to filter or validate a string that contains, for example, a German umlaut, and the server uses UTF-8 encoding, you will find that filter and validation classes do not support such characters.

Comments

The problem with the mb_* functions is that they require a non-standard extension to be enabled.

So we would need to check that the mb_* functions are available before trying to use them, which could get very messy.
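A minimal sketch of the kind of availability check described above. The function names `example_strlen` and `example_strlen_fallback` are hypothetical and not part of Zend Framework; only `function_exists()`, `mb_strlen()` and `preg_match_all()` are real PHP functions.

```php
<?php
// Hypothetical fallback: counts UTF-8 characters without mbstring by
// counting every byte that is not a continuation byte (0x80-0xBF).
// Assumes the input is valid UTF-8.
function example_strlen_fallback($value)
{
    return preg_match_all('/[^\x80-\xBF]/', $value, $matches);
}

// Hypothetical wrapper: uses mbstring when it is loaded, otherwise the fallback.
function example_strlen($value)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($value, 'UTF-8');
    }
    return example_strlen_fallback($value);
}
```

Every call site that wants multibyte behaviour would need a branch like this, which is the messiness the comment refers to.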

Original comment by Lenny:

Zend_Filter does not have a filter for checking strings that contain accented characters. Does a solution exist? If not, I will code it!

Mark, I see the problem. It might still be worthwhile considering, though, because AFAIU the ZF is supposed to be fully UTF-8 compatible, and this is a major break. An alternative might be to have something like Zend_Filter_Mb for those of us with a non-ASCII tongue. What do you think?

Perhaps a better solution would be to allow the replace function to be configurable in some way. This could be done in several ways, but implementing a replace() function within the class might be the place to start. At least then it could be overloaded, and that function would be a central place to make whatever implementation seems best.
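One way the overridable replace() idea could look, as a sketch only. The class names `Example_Filter_Alpha` and `Example_Filter_Alpha_Utf8`, the `$_pattern` property and the `_replace()` method are all hypothetical and do not reflect the actual Zend_Filter API. The subclass assumes a PCRE build with UTF-8 and Unicode property support.

```php
<?php
// Hypothetical base filter: strips everything that is not an ASCII letter.
class Example_Filter_Alpha
{
    // Character class of everything to remove (no delimiters).
    protected $_pattern = '[^a-zA-Z]';

    public function filter($value)
    {
        return $this->_replace($this->_pattern, '', (string) $value);
    }

    // Single place where the replacement happens, so subclasses can override it.
    protected function _replace($pattern, $replacement, $subject)
    {
        return preg_replace('/' . $pattern . '/', $replacement, $subject);
    }
}

// Hypothetical UTF-8 variant: keeps any Unicode letter.
class Example_Filter_Alpha_Utf8 extends Example_Filter_Alpha
{
    protected $_pattern = '[^\p{L}]';

    protected function _replace($pattern, $replacement, $subject)
    {
        // The /u modifier makes PCRE treat pattern and subject as UTF-8.
        // preg_replace() returns null if the subject is not valid UTF-8.
        return preg_replace('/' . $pattern . '/u', $replacement, $subject);
    }
}
```

With this layout, an mbstring-based subclass could override `_replace()` in the same way without touching `filter()`.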

Reading the php manual, it looks like if we changed from preg_replace to ereg_replace the mbstring extension can automatically overload the functions.

see http://php.net/manual/en/…

Not sure whether this is a possible solution?

Mark's idea sounds good - but I also see a potential problem.

In my experience with shared hosting packages, you don't normally have access to the php.ini. This would then be similar to mb_* being a non-standard extension.

But maybe you can also set the overloading per directory in .htaccess, which would probably help most people in shared hosting environments.

Currently, my best guess is that a large percentage of deployment environments (e.g. many web hosters) do not provide the mbstring PHP extension. Thus, the only reliable alternative I've seen is: http://framework.zend.com/wiki/x/sgo

Changing fix version to 0.9.0.

Updated the issue details.

Related original comment from [~gavin]:

Summarizing from historical threads:

* the mbstring extension is not deprecated and is the recommended way to fully UTF-8 enable an entire PHP application
* the mbstring extension should not be required in ZF /library core code (the extension is not a ZF requirement)
* the /u modifier for PCRE does not work in all conditions and situations (see past topic threads and pcre.org for details)
* test suites might make use of more "tools" (e.g. mbstring) than are available in ZF /library core, but the tests then need to be optional
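The points about not requiring mbstring and about the /u modifier not working everywhere suggest detecting PCRE's Unicode support at runtime and falling back otherwise. A rough sketch, assuming hypothetical function names `example_pcre_has_unicode` and `example_is_alpha` (neither exists in ZF):

```php
<?php
// Hypothetical probe: preg_match() returns false (and raises a warning,
// suppressed here) when PCRE lacks UTF-8 or Unicode property support.
function example_pcre_has_unicode()
{
    return @preg_match('/\pL/u', 'a') === 1;
}

// Hypothetical validator: true if $value consists only of letters.
function example_is_alpha($value)
{
    if (example_pcre_has_unicode()) {
        // \p{L} matches any Unicode letter; \z anchors at the very end.
        return preg_match('/^\p{L}+\z/u', $value) === 1;
    }
    // Fallback: ASCII letters only.
    return preg_match('/^[a-zA-Z]+\z/', $value) === 1;
}
```

The same probe could be used in a test suite to skip Unicode-specific tests when the support is missing, which keeps those tests optional.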

Just a small comment:

With ``` you can receive a list of the characters that are accepted and officially supported within this language/locale.

German, for example, returns "[a-z ä ö ü ß]", English returns "[a-z]", and Greek returns "[ΐά-ώ]".

Maybe this can be useful for you.

A link to the discussion thread summarizing the community's consensus:

http://nabble.com/forum/ViewPost.jtp/…

Those who actually need the mb_string functions will most probably have the mb_* functions enabled. My idea is to work with "locales", and if one does not provide a locale, we will use "default behaviour".
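A sketch of how the locale idea with a default fallback might work. The function `example_alpha_filter` and its lookup table are hypothetical; the German and English character sets are taken from the comment above, and a real implementation would presumably read them from locale data. The locale branch assumes PCRE with UTF-8 support.

```php
<?php
// Hypothetical filter: keeps only the letters allowed for the given locale.
function example_alpha_filter($value, $locale = null)
{
    // Hypothetical table of allowed characters per locale, written as
    // PCRE code point escapes so the source file encoding does not matter.
    $classes = array(
        'de' => 'a-z\x{E4}\x{F6}\x{FC}\x{DF}', // a-z plus ä ö ü ß
        'en' => 'a-z',
    );

    if ($locale === null || !isset($classes[$locale])) {
        // Default behaviour: ASCII letters only, matched byte by byte.
        return preg_replace('/[^a-z]/i', '', $value);
    }

    // Locale given: match case-insensitively in UTF-8 mode.
    return preg_replace('/[^' . $classes[$locale] . ']/iu', '', $value);
}
```

Callers that never pass a locale keep the existing ASCII behaviour, so nothing changes for them.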

Darby, isn't this one resolved?

Since the issue does not name a specific unresolved item to address, I mark the resolution as incomplete. The issue can be reopened, naming a specific problem [set], if necessary.