Issues

ZF-8375: Properly handle upper case language name in TMX

Description

At present, the TMX file adapter stores whatever language name the file provides, using it with its case preserved as the key in the translation array.

Therefore, if the file contains "EN", it will have $_data['EN']['test message'] = "test message - english";

However, a proper locale string must be "en". So, if the XML file contains "EN", the adapter does not find the message when queried with locale "en".

It will return "test message" instead of "test message - english". Changing "EN" to "en" in the XML is not practical, because the file is generated by an editor that writes the language in upper case.
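The failure described above can be reproduced with a plain PHP array, because array keys are case-sensitive. This is only a minimal sketch: `$data` and `lookup()` are hypothetical stand-ins for the adapter's internal table and lookup, not Zend Framework code.

```php
<?php
// Hypothetical stand-in for the adapter's translation table,
// keyed exactly as the TMX file spells the language ("EN").
$data = array(
    'EN' => array('test message' => 'test message - english'),
);

// Hypothetical lookup helper: falls back to the message ID
// itself when no translation is stored for the locale.
function lookup(array $data, $locale, $messageId)
{
    if (isset($data[$locale][$messageId])) {
        return $data[$locale][$messageId];
    }
    return $messageId;
}

echo lookup($data, 'en', 'test message'), "\n"; // test message
echo lookup($data, 'EN', 'test message'), "\n"; // test message - english
```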

ZF should do one of the following:

  1. Convert locale to lower case for user.
  2. Reject the XML file as invalid (since the locale cannot be set to "EN").

Comments

  1. Cannot be done: it would disallow en_US, because that would be converted to en_us, creating the same problem for the region as before for the language.

  2. Cannot be done, as invalid files are ignored while processing a directory search.

For 1. it makes sense that the last two are upper cased. However, is it not possible to make sure that we store en_us as en_US by simply splitting the string and making sure that first part is always lower cased and second part is always upper cased?

So, to elaborate, how about if we change line # 114 on Tmx.php from:

$this->_tuv = $attrib['xml:lang'];

to:

$tuv_array = explode("", $attrib['xml:lang']); $this->_tuv = strtolower($tuv_array[0]). (($tuv_array[1])?"".strtoupper($tuv_array[1]):"");

Sorry, here is the properly formatted code for the "to:" part:

{quote}
$tuv_array = explode("_", $attrib['xml:lang']);
$this->_tuv = strtolower($tuv_array[0]) . (!empty($tuv_array[1]) ? "_" . strtoupper($tuv_array[1]) : "");
{quote}

This does not work, as a locale can also include other information. We expect the xml:lang attribute to hold the full locale information, not only the language.

xml:lang could for example look like this: ar_Arab_JE or de_DE_Punji
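To see why, the two-part normalisation proposed earlier in the thread can be run against these identifiers. In the sketch below, the function name `normalizeTwoParts()` is hypothetical and the logic is wrapped in a function only for illustration.

```php
<?php
// The proposed two-part rule: lower-case the first part,
// upper-case the second, and ignore everything after it.
function normalizeTwoParts($lang)
{
    $parts = explode('_', $lang);
    return strtolower($parts[0])
         . (!empty($parts[1]) ? '_' . strtoupper($parts[1]) : '');
}

echo normalizeTwoParts('EN'), "\n";          // en
echo normalizeTwoParts('en_us'), "\n";       // en_US
echo normalizeTwoParts('ar_Arab_JE'), "\n";  // ar_ARAB
echo normalizeTwoParts('de_DE_Punji'), "\n"; // de_DE
```

The last two results show the problem: the script part is treated as a region and the trailing parts are silently dropped.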

Hi Thomas,

Thanks for your quick reply and attention. I apologize if I am totally missing the point.

In any case, how about just simply changing the language part "EN" to lower case "en":

{quote} $this->_tuv = strtolower($tuv_array[0]) . (!empty($tuv_array[1]) ? "_" . implode("_", array_slice($tuv_array, 1)) : ""); {quote}

I think the code definitely needs to change one way or the other. If the case cannot be changed to lower case, then the XML should not be processed. In my opinion, it does not make sense to process and store XML data if it cannot be accessed.

I agree with xml:lang being a locale identifier. That's the reason why I did not close this issue.

But, and this is more important for you, a locale is never uppercased.

This means that EN will also not be recognised afterwards. en-us, en_us, en-US, en_US, en-us-Latn, en-us-ISO8859 and so on would then be recognised and switched to en_US.
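A rough sketch of the kind of matching described here, covering only the identifiers listed above. `normalizeLocaleId()` is a hypothetical name, and nothing in this thread states that the actual change is implemented this way. The language part is deliberately left untouched, so an upper-case "EN" stays "EN".

```php
<?php
// Assumption: split on "-" or "_", keep the language part as given,
// upper-case the second part as the region, ignore anything after it.
function normalizeLocaleId($id)
{
    $parts = preg_split('/[-_]/', $id);
    $language = $parts[0];
    if (!isset($parts[1]) || $parts[1] === '') {
        return $language;
    }
    return $language . '_' . strtoupper($parts[1]);
}

$ids = array('en-us', 'en_us', 'en-US', 'en_US', 'en-us-Latn', 'en-us-ISO8859');
foreach ($ids as $id) {
    // Every identifier in this list maps to en_US.
    echo $id, ' => ', normalizeLocaleId($id), "\n";
}
```

This sketch does not handle identifiers whose second part is a script, such as ar_Arab_JE. A real implementation would need to check the parts against known locale data.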

In future, a notice will be raised when an unidentified locale is found, but the data will still be added. You may need this feature when you extend the base class, and you can turn off the notice with an already existing option.

Thanks, Thomas. That makes sense. If at all possible, it would be great if you can keep the update to the class such that a person extending the tmx adapter can override behavior for matching locale by overriding a method.

Either way, I appreciate your time and please keep up the good work.

New feature implemented with r19261 as described before.