ZF-3305: Flaw in file scanning

Description

We found some unexpected behaviour with Zend Translate today. Everything worked in our dev environment, but as soon as we moved to production we got an exception: "Internal error :Language (en_CA) has to be added before it can be used." We use the Gettext adapter with the option 'scan' => Zend_Translate::LOCALE_FILENAME.

It's a bit complicated so please bear with me, but here is how we managed to discover the bug: the locale "en_CA" indeed did not exist: we have en_US.mo and fr_FR.mo in our locale directory. Yet in dev, it had always worked. After using the Zend Studio Debugger to dig pretty deep within Zend_Translate_Adapter, we saw that the file scan also read our hidden .svn files and directories. Of course, those files are not present in prod, so the behaviour was different.

On lines 123-127 of Zend/Translate/Adapter.php, in our dev environment:

- the file iterator finds a file such as "en_US.mo.svn-base" in a directory such as "locale/.svn/text-base/"
- it removes the last element after the period (lines 123-124), $filename becomes "en_US.mo"
- Zend_Locale::isLocale (line 126) determines that "en_US.mo" is not a real locale, but the first part of the string before the underscore is and keeps that ("en"). So the $locale is now "en".

Then on line 146, we have $this->_addTranslationData("locale/.svn/text-base/en_US.mo.svn-base", "en"): the hidden .svn file is indeed a valid .mo file (I don't know much about the internal working of svn but it's no doubt a svn cache file identical to the latest actual file), but it's loaded into the "en" locale of $this->_locale.

So our application ends up loading the hidden svn file for en_US into the "en" locale. Since en_CA does not exist, it falls back to "en", which is indeed loaded in the _translate array and which, I'll remind you, contains the en_US translations from the en_US.mo.svn-base file!

However, in production, since we don't have the svn file, neither the "en_CA" nor the defaulting "en" .mo files are found, so an Exception is thrown instead.
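
The difference between the two environments can be reproduced with a small sketch. This is not the Zend code: `deriveLocale()`, `scannedLocales()` and the `KNOWN_LOCALES` list are hypothetical stand-ins that only mimic the behaviour described above (strip after the last period, then fall back to the part before the underscore).

```php
<?php
// Hypothetical stand-in for Zend_Locale::isLocale(): a tiny fixed list,
// not the real Zend lookup.
const KNOWN_LOCALES = ['en', 'fr', 'en_US', 'en_CA', 'fr_FR'];

// Mimics the reported behaviour: drop everything after the last period,
// then degrade to the part before the first underscore when the remainder
// is not itself a known locale.
function deriveLocale(string $basename): ?string
{
    $pos  = strrpos($basename, '.');
    $name = ($pos === false) ? $basename : substr($basename, 0, $pos);
    if (in_array($name, KNOWN_LOCALES, true)) {
        return $name;
    }
    $language = explode('_', $name)[0];
    return in_array($language, KNOWN_LOCALES, true) ? $language : null;
}

// Returns the sorted, unique locales that a scan over $paths would load.
function scannedLocales(array $paths): array
{
    $locales = [];
    foreach ($paths as $path) {
        $locale = deriveLocale(basename($path));
        if ($locale !== null) {
            $locales[$locale] = true;
        }
    }
    $result = array_keys($locales);
    sort($result);
    return $result;
}

$prod = ['locale/en_US.mo', 'locale/fr_FR.mo'];
$dev  = array_merge($prod, ['locale/.svn/text-base/en_US.mo.svn-base']);

// scannedLocales($dev)  gives ['en', 'en_US', 'fr_FR']
// scannedLocales($prod) gives ['en_US', 'fr_FR']
```

Only the dev scan produces an "en" entry, which is why a request for en_CA silently works there and throws in production.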

I realise this is obscure and possibly difficult to reproduce, but my suggestion would be to change the file-scanning behaviour to ignore hidden files (files that start with a period), or at least add the option to ignore it. Of course on our end we can patch our production environment to have a default en.mo file or to catch the Exception, but the fact remains that the code behaviour is different depending on the environment, and that's a Bad Thing. :) Since many development teams use svn, I find it plausible that this problem could affect many more people and there's really no reason why the file scanner should read hidden files, is there?
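
A minimal sketch of the suggested filter, assuming "hidden" simply means that some path segment starts with a period. The helper names `hasHiddenSegment()` and `visibleFiles()` are made up for illustration and are not part of Zend_Translate.

```php
<?php
// True when any segment of $path starts with a period ('.svn', '.cache.mo').
// The special entries '.' and '..' are not treated as hidden.
function hasHiddenSegment(string $path): bool
{
    foreach (preg_split('#[/\\\\]#', $path) as $segment) {
        if ($segment === '' || $segment === '.' || $segment === '..') {
            continue;
        }
        if ($segment[0] === '.') {
            return true;
        }
    }
    return false;
}

// Keeps only the paths that contain no hidden segment.
function visibleFiles(array $paths): array
{
    $visible = array_filter($paths, function (string $path): bool {
        return !hasHiddenSegment($path);
    });
    return array_values($visible);
}
```

Applied to the scan results before any locale detection, this would drop "locale/.svn/text-base/en_US.mo.svn-base" and keep "locale/en_US.mo", so dev and production would see the same set of files.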

Thanks, Véronique

Comments

Another suggestion, in case the ignoring of hidden files is not an acceptable solution, would be to replace the lines 123-125 (which only remove the portion of the filename after the last dot), with this:

$filename = substr((string)$info, 0, strpos((string)$info, '.'));

This would make $filename be "en_US" instead of "en_US.mo" in the above example, which is the intended behaviour. That way, Zend_Locale::isLocale won't transform the "en_US" into "en", like it does with "en_US.mo".
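
To make the difference concrete, here is a small comparison of the two ways of cutting the name. Both function names are hypothetical. Note one assumption that goes beyond the one-liner above: when the name contains no period, strpos() returns false and substr() would then yield an empty string, so the sketch guards that case.

```php
<?php
// Behaviour as described in the report: cut at the last period.
function stripLastExtension(string $name): string
{
    $pos = strrpos($name, '.');
    return $pos === false ? $name : substr($name, 0, $pos);
}

// Suggested behaviour: cut at the first period, leaving names without
// a period unchanged.
function stripFromFirstDot(string $name): string
{
    $pos = strpos($name, '.');
    return $pos === false ? $name : substr($name, 0, $pos);
}

// stripLastExtension('en_US.mo.svn-base') gives 'en_US.mo'
// stripFromFirstDot('en_US.mo.svn-base')  gives 'en_US'
```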

Hope that makes sense...

Filename scanning first checks whether the file name itself is a locale... "en_US.mo". Then it explodes the file name on the separators ".", "_" and "-". We cannot simply strip everything after the first "." because many people use a syntax like "file.en.mo".
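
As an illustration of why token splitting handles "file.en.mo" where cutting at the first "." would leave only "file", here is a rough sketch. It is not the adapter's actual logic: `localeFromTokens()` and the `LANGUAGES` and `REGIONS` lists are assumptions made for the example.

```php
<?php
// Hypothetical stand-ins for a real locale database.
const LANGUAGES = ['en', 'fr'];
const REGIONS   = ['US', 'CA', 'FR'];

// Splits the file name on '.', '_' and '-', then returns the first language
// token, joined with the following token when that token is a known region.
function localeFromTokens(string $basename): ?string
{
    $tokens = preg_split('/[._-]/', $basename, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $i => $token) {
        if (!in_array($token, LANGUAGES, true)) {
            continue;
        }
        $next = $tokens[$i + 1] ?? null;
        if ($next !== null && in_array($next, REGIONS, true)) {
            return $token . '_' . $next;
        }
        return $token;
    }
    return null;
}

// localeFromTokens('file.en.mo')        gives 'en'
// localeFromTokens('en_US.mo.svn-base') gives 'en_US'
```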

As for hidden files... there is actually no safe way, across all OSes, to rely on the "hidden" file attribute.

The main problem, as I see it, is that you have not defined a default locale for translations, to be used when the requested locale is not available.
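
On the application side, such a fallback could look roughly like the following. This is a sketch under assumptions: `resolveLocale()` is a hypothetical helper, not a Zend_Translate method, and the list of loaded locales is passed in by hand.

```php
<?php
// Picks the locale to use from the loaded ones: the exact match first, then
// the bare language, then the application default. Returns null when none
// of them is loaded.
function resolveLocale(string $requested, array $loaded, ?string $default = null): ?string
{
    $candidates = [$requested, explode('_', $requested)[0]];
    if ($default !== null) {
        $candidates[] = $default;
    }
    foreach ($candidates as $candidate) {
        if (in_array($candidate, $loaded, true)) {
            return $candidate;
        }
    }
    return null;
}

// resolveLocale('en_CA', ['en_US', 'fr_FR'], 'en_US') gives 'en_US'
// resolveLocale('en_CA', ['en_US', 'fr_FR'])          gives null
```

With an explicit default, the outcome no longer depends on whether a stray "en" entry happened to be loaded during the scan.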

Btw: The line numbers you gave do not match the current trunk / release... please try with 1.5.2, as there was another scanning problem which has since been solved. In that version your line numbers point to comments, empty lines and an else...

Well, I did set the "affected version" to 1.5.0, so my report refers to that, which is the version we use (I didn't even know 1.5.2 was out, it wasn't announced on the RSS feed...? anyway). But I just checked the 1.5.2 version, and there the lines are 147 to 152. The entire addTranslation() method is identical to version 1.5.0, so whatever was changed was not in this method.

I'll add that while it's true that we also had a bug on our end (no handling of default locales), the fact remains that this is still unexpected behaviour, and we didn't notice our own bug until we went to prod (since the behaviour is different there). Perhaps an option with an ignore pattern could do the trick. Is there any OS where directories starting with dots are not hidden by default, anyway?

Please test the latest SVN trunk.

I've changed the scanning logic. It no longer degrades the locale where that can be avoided, so your original problem of en_US degrading to en will no longer occur.

The original flaw in filename scanning has been fixed. Check out r9511 or later for the fix.

Updating for the 1.6.0 release.