ZF-5757: Zend_Translate strips capital Umlaut

Description

When a translation contains a capital Umlaut as first letter, this letter is stripped by Zend_Translate

Comments

You would be the first person since this component exists with such a problem. Zend_Translate returns the translation as it is written. It does wether change the encoding nor the content from the file.

Please give some additional data. I expect that you have a encoding or a view problem.

Hello Thomas,

I use the CSV adapter, something like this: $adapter = new Zend_Translate('csv', $pathToCsvFile, 'de');

If I print_r() the $adapter, I already can see that the capital umlauts have disappeared.

When I add a blank in the CSV file before a capital umlaut, the blank and the capital umlaut are stripped. When I add another capital umlaut before the capital umlaut, BOTH are stripped. When I add an underscore before a capital umlaut, everything is fine (except for the unwanted underscore, of course).

I have checked with a different adapter (array) and everything is okay.

So, I guess there must be a problem with the CSV adapter.

This problem only occurs when the capital umlaut is the first character of the translation.

Cheers, LowTower.

The problem is not the adapter.

The adapter uses php's fgetcsv method to read the csv file. So when there would be a problem with stripped chars the problem would also be within PHP itself.

For me it seems that the CSV file you created is broken. For example not using UTF-8 but ISO-8859-1. Or not using the UTF-8 syntax (which means that each sentence is like "sentence";"answer".

See a file created by Excel to see the difference to your file.

Hello Thomas,

utf-8 charset of the csv file was correct.

The CSV looked like something;translation without the double quotes.

With double quotes, everything is fine.

But, a far as I know, the csv specification doesn't require double quotes for correctness of csv files.

Cheers, LowTower.

Have you tried this manually by reading the file with fgetcsv ?

Because a note from the PHP manual itself reads: "Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function. "

This means if the encoding of the file does not match the encoding of the server it will not be read correct.

Still, as mentioned before, this is not a problem of ZF and is mentioned in the manual.

This is a PHP issue which can not be solved by ZF. Closing as Won't Fix.

I get exactly the same problem when exchanging ZF 1.5.2 with 1.7.6 - no other changes. It's a Debian Lenny box with current PHP version and setting LOCALES () to empty, en_US.UTF-8 or de_DE.UTF-8 made no change.

So this is clearly a change in ZF.

With ZF 1.6 the community decided that we have to change the internal implementation of the CSV Adapter to use fgetcsv because it's faster to rely on php methods than on self written methods.

As all data, which is stored, comes from fgetcsv, a pure php function, there is no way to fix this behaviour without changing again to the internal implementation.

As my question, with the check what fgetcsv returns, was not answered, I must assume that this problem is a php only problem and not a problem of ZF's implementation.

For known issues of fgetcsv look into php's manual (where I copied out this locale theme).

As there is nothing which I can do, this issue will stay at won't fix.

Okay, that change in 1.6 explains the behavior. I tested it with fgetcsv and got the same results as with ZF 1.7.6. So the workaround would be to update all csv's from key;value to "key";"value" which could be done by importing and exporting each csv into and from Open Office. Excel has problems with UTF-8.

Furthermore it seams that the Adapter.php from Translate seams to have changed for the "scan" option. With 1.5.2 it was kind of relaxed and would use translations from other csv's as it find the terms there. Which had the nice side effect, that you don't had to store so much double. This behavior has changed as well between 1.5.2 and 1.7.6 (probably with 1.6), since it now seams very strict in the scan mode. Translations of a term do only appear if they exist in that very file it is supposed to be through the MVC logic.

No, the scan option has not changed. It is not necessary to store messages multiple times.