Zend Framework

Problems with German umlauts since ZF 1.5

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Blocker
  • Resolution: Fixed
  • Affects Version/s: 1.5.0
  • Fix Version/s: 1.5.2
  • Component/s: Zend_Filter
  • Labels:
    None
  • Fix Version Priority:
    Must Have

Description

We have seen some strange behaviour with Zend_Filter_Alnum and
Zend_Filter_Alpha since ZF 1.5. Everything works correctly with ZF 1.0.4.

The following line of code
echo Zend_Filter_Alnum::filter("Lüge"); // in utf-8 encoding
delivers Lge instead of Lüge. Up to ZF 1.0.x the result was correct.

Corresponding code block in Zend/Filter/Alnum.php:
<code>
if (!self::$_unicodeEnabled) {
    // POSIX named classes are not supported, use alternative a-zA-Z0-9 match
    $pattern = '/[^a-zA-Z0-9' . $whiteSpace . ']/';
} else if (extension_loaded('mbstring')) {
    // Unicode safe filter for the value with mbstring
    $pattern = '/[^[:alnum:]' . $whiteSpace . ']/u';
} else {
    // Unicode safe filter for the value without mbstring
    $pattern = '/[^\p{L}\p{N}' . $whiteSpace . ']/u';
}
</code>


The problem is that we have mbstring enabled, and this line doesn't work:
$pattern = '/[^[:alnum:]' . $whiteSpace . ']/u';
but this line does:
$pattern = '/[^\p{L}\p{N}' . $whiteSpace . ']/u';
The internal encoding is correctly set to UTF-8.

The question is why the [:alnum:] line doesn't work, and why it is useful to handle this case differently when the extension is enabled.
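
The two Unicode branches quoted above can be compared outside the framework. The following is a minimal sketch, not the filter's actual code path: it assumes $whiteSpace is empty and calls preg_replace directly. Whether [:alnum:] matches "ü" under the /u modifier depends on the PCRE build and PHP version, so that result is dumped rather than assumed.

```php
<?php
// Sketch only: reproduces the two Unicode branches outside of Zend_Filter_Alnum.
// $whiteSpace is assumed to be empty here (white space not allowed).
$whiteSpace = '';
$input = "Lüge"; // this file must be saved as UTF-8

// Branch taken when mbstring is loaded (per the quoted code block).
$posixPattern = '/[^[:alnum:]' . $whiteSpace . ']/u';
// Branch taken when mbstring is not loaded.
$propertyPattern = '/[^\p{L}\p{N}' . $whiteSpace . ']/u';

$posixResult = preg_replace($posixPattern, '', $input);
$propertyResult = preg_replace($propertyPattern, '', $input);

// Depends on the PCRE build and PHP version, so no output is claimed here.
var_dump($posixResult);
// "ü" is a letter (\p{L}), so this branch keeps the whole word "Lüge".
var_dump($propertyResult);
```

If the first dump shows "Lge", the installed PCRE treats [:alnum:] as ASCII-only even with /u, which matches the behaviour reported here.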

Activity

Satoru Yoshida added a comment -

Hello, Dominik.
Is the mbstring extension used with German?

In the previous version, before the change in ZF-2107, only the "^\p{L}\p{N}" pattern was used.

But I found that it causes an error with Japanese: all Japanese characters pass through the filter.
So I changed it, because I thought the mbstring extension was used only with languages that have many multibyte characters.

But if the mbstring extension is also used with German (or Czech, Polish, etc.), the ZF-2107 change causes your problem.

Do you have any ideas for an alternative to "if (extension_loaded('mbstring'))"?
It might be better to check the language of the locale in the if statement.

Wil Sinclair added a comment -

Please categorize/fix as needed.

Dominik Bors added a comment -

Hello Satoru,

Sorry for my late answer.

I don't think the mbstring extension is needed for German; it shouldn't be. But the mbstring extension is often installed by default.

I think we should add a second condition to the if statement, maybe something that checks whether the default Zend_Locale is Japanese.

Best regards
Dominik

Satoru Yoshida added a comment -

Hello, Dominik.
Thank you for your reply.

OK, I will try to add a condition using Zend_Locale.
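
A locale-based condition of the kind discussed in this thread could look roughly like the sketch below. It is only an illustration and not necessarily what was committed: the function name chooseAlnumPattern, its parameters, and the one-entry language list are hypothetical, and the language code is passed in as a plain string so the example runs without the framework.

```php
<?php
// Hypothetical helper: selects the pattern from a language code instead of
// from extension_loaded('mbstring'). Names and language list are illustrative.
function chooseAlnumPattern($language, $unicodeEnabled, $whiteSpace = '')
{
    // Languages for which "alphanumeric" is assumed to mean a-z, A-Z, 0-9 only.
    $asciiOnlyLanguages = array('ja');

    if (!$unicodeEnabled) {
        // No Unicode support in PCRE: plain ASCII match.
        return '/[^a-zA-Z0-9' . $whiteSpace . ']/';
    }
    if (in_array($language, $asciiOnlyLanguages, true)) {
        // Unicode-aware matching, but only ASCII letters and digits are kept.
        return '/[^a-zA-Z0-9' . $whiteSpace . ']/u';
    }
    // Default: keep every Unicode letter and number.
    return '/[^\p{L}\p{N}' . $whiteSpace . ']/u';
}

$german = preg_replace(chooseAlnumPattern('de', true), '', 'Lüge');
$japanese = preg_replace(chooseAlnumPattern('ja', true), '', '日本語abc123');

var_dump($german);   // "Lüge": umlauts survive for a German locale
var_dump($japanese); // "abc123": Japanese characters are filtered out
```

Inside the filter, the language code would presumably come from Zend_Locale, for example via its getLanguage() method.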

Kirill added a comment -

I have the same problem with Russian characters:

$filter = new Zend_Filter_Alnum(true);
Zend_Debug::dump($filter->filter('это странненько - mbstring enabled'));
//string(19) "   mbstring enabled"
$filter = new Zend_Filter_Alnum(true);
Zend_Debug::dump($filter->filter('это странненько - mbstring disabled'));
//string(48) "это странненько  mbstring disabled"

In my case the filter uses $pattern = '/[^[:alnum:]' . $whiteSpace . ']/u';
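
The string(48) in the second dump is a byte count, not a character count, because each Cyrillic letter takes two bytes in UTF-8. The sketch below checks this with plain preg_replace; using '\s' in place of $whiteSpace is an assumption about what the filter inserts when white space is allowed.

```php
<?php
// Kirill's input, filtered with the pattern from the non-mbstring branch.
// '\s' stands in for $whiteSpace and is an assumption, not confirmed here.
$input = 'это странненько - mbstring disabled'; // file must be saved as UTF-8
$output = preg_replace('/[^\p{L}\p{N}\s]/u', '', $input);

// strlen() counts bytes; matching /./u counts UTF-8 characters.
$byteCount = strlen($output);
$charCount = preg_match_all('/./us', $output, $matches);

var_dump($byteCount); // int(48)
var_dump($charCount); // int(34)
```

Only the hyphen is removed, so the two spaces around it remain next to each other in the output.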

Satoru Yoshida added a comment -

Thank you for the information, Kirill.

Satoru Yoshida added a comment -

Resolved in SVN r9266

