ZF-269: Support UTF-8 string filtering and validation

Issue Type: Improvement Created: 2006-07-19T02:18:05.000+0000 Last Updated: 2007-07-05T14:43:15.000+0000 Status: Resolved Fix version(s): - 1.0.0 RC3 (23/Jun/07)

Reporter: Georg von der Howen (ghowen) Assignee: Darby Felton (darby) Tags: - Zend_Filter

  • Zend_Validate

Related issues: - ZF-1483



If you try to filter or validate a string that contains, for example, a German umlaut, and the server uses UTF-8 encoding, you will find that filter and validation classes do not support such characters.


Posted by Mark Evans (sparky) on 2006-07-21T07:02:45.000+0000

The problem with the mb_* functions are they require a non standard extension to be enabled.

So we would need to check to make sure the mb_* functions are available before trying to use them which could get very messy.

Posted by Darby Felton (darby) on 2006-07-24T11:25:10.000+0000

Original comment by Lenny:

the Zend_Filter does not have a filter for checked the character strings containing of the accents. A solution exists ? So, if not i code it !

Posted by Georg von der Howen (ghowen) on 2006-07-25T01:03:50.000+0000

Mark, I see the problem. Although it might still be worthwile considering it because AFAIU the ZF is supposed to be full UTF-8 compatible, where this is a major break. An alternative might be to have something like Zend_Filter_Mb for those of us with a non ASCII tongue. What do you think?

Posted by Christopher Thompson (christopher) on 2006-07-25T18:17:35.000+0000

Perhaps a better solution would be to allow the replace function to be configurable in some way. This could be done in several ways, but implementing a replace() function within the class might be the place to start. At least then it could be overloaded and that function would be a place to centrally make whatever implmentation seems best.

Posted by Mark Evans (sparky) on 2006-07-26T03:38:21.000+0000

Reading the php manual, it looks like if we changed from preg_replace to ereg_replace the mbstring extension can automatically overload the functions.


not sure if this is a possible solution?

Posted by Georg von der Howen (ghowen) on 2006-07-26T04:11:27.000+0000

Mark's idea sounds good - but I also see a potential problem.

From my experience with shared hosting packages you don't normally have access to the php.ini. This would then be similiar to mb_* being a non-standard extension.

But maybe you can also set overwriting per directory in .htaccess, which would probably help most people in shared hosting environments.

Posted by Gavin (gavin) on 2006-08-02T19:37:30.000+0000

Currently, my best guess is that a large percentage of deployment environments (e.g. many web hosters) do not provide the mbstring PHP extension. Thus, the only reliable alternative I've seen is:

Posted by Bill Karwin (bkarwin) on 2006-11-13T15:23:36.000+0000

Changing fix version to 0.9.0.

Posted by Darby Felton (darby) on 2007-03-28T11:31:18.000+0000

Updated the issue details.

Related original comment from [~gavin]:

Summarizing from historical threads:

* the mbstring extension is not deprecated and is the recommended way to fully UTF-8 enable an entire PHP application
* the mbstring extension should not be required in ZF /library core code (the extension is not a ZF requirement)
* the /u modifier for PCRE does not work in all conditions and situations (see past topic threads and for details)
* test suites might make use of more "tools" (e.g. mbstring) than are available in ZF /library core, but the tests then need to be optional

Posted by Thomas Weidner (thomas) on 2007-03-28T12:05:57.000+0000

Just a small comment:

With ``` you can receive a list of chars which are accepted and official supported within this language/locale.

German for example returns "[a-z ä ö ü ß]", english retunrs "[a-z]" and greek returns "[ΐά-ώ]"

Maybe this can be usefull for you.

Posted by Gavin (gavin) on 2007-03-28T12:59:26.000+0000

A link to the discussion thread summarizing the community's consensus:…

Posted by Andries Seutens (andries) on 2007-04-12T09:29:25.000+0000

Those wo actually need the mb_string functions will most probably have the mb_* functions enabled. My idea is to work with "locales", and if one does not provide a locale, we will use "default behaviour".

Posted by Andries Seutens (andries) on 2007-06-18T09:08:22.000+0000

Darby, isn't this one resolved?

Posted by Darby Felton (darby) on 2007-06-18T10:49:07.000+0000

Since the issue does not name a specific unresolved item to address, I mark the resolution as incomplete. The issue can be reopened, naming a specific problem [set], if necessary.

Have you found an issue?

See the Overview section for more details.


© 2006-2018 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.