Issues

ZF-3439: TMX Adapter shifts some source ISO-8859-1 diacritic characters into double-byte UTF-8 output

Issue Type: Bug Created: 2008-06-11T09:13:08.000+0000 Last Updated: 2008-09-02T10:38:51.000+0000 Status: Resolved Fix version(s): - 1.6.0 (02/Sep/08)

Reporter: Pádraic Brady (padraic) Assignee: Thomas Weidner (thomas) Tags: - Zend_Translate

Related issues: Attachments:

Description

This specific bug relates to the Zend_Translate TMX Adapter. It may also apply to other XML translation adapters.

It appears from a cursory test that the Adapter's use of xml_parser_create() creates a situation where in the incoming character encoding of the source input, is transferred to UTF-8 when output. This behaviour is enabled by default and documented in the PHP manual.

As a result ISO-8859-1 diacritics in the input source are transformed into their equivelant double-byte UTF-8 counterparts (i.e. the base character + diacritic). This breaks translations using ISO-8859-1 diacritics in the TMX XML format, when the HTML output is sent to the client browser using an ISO-8859-1 Content-Type.

A simple test is sufficient to prove the encoding shift.

Assuming an ISO-8859-1 encoding TMX file (see http://svn.astrumfutura.org/zfblog/branches/… ), the following does a simple strlen() byte count on the output from a single translated umlaut.

<pre class="highlight">require_once 'Zend/Translate.php';
$translate = new Zend_Translate('tmx', 'translate_de.xml', 'de'); // ISO-8859-1 encoded source
$translatedUmlaut = $translate->_('umlaut');
echo strlen($translatedUmlaut); // 2, expected 1 byte only

Notably, no browser here so that part of encoding is irrelevant. Instead a simple internal check shows the output umlaut is a double-byte character when output. Throwing this into a browser with a Content-Type set to the ISO-8859-1 charset results in "Ã ¼" being output.

The rawest solution is to locate the xml_parser_call() used in the TMX adapter and pass it the optional output encoding parameter set to "ISO-8859-1" which ensures encoding is untouched, and corrects the above problem. I have not done a similar check against other XML based adapters, but presumably whereever xml_parser_create() turns up this possible problem will follow it.

Comments

Posted by Thomas Weidner (thomas) on 2008-06-12T00:47:07.000+0000

This new feature has been added with r9676

Posted by Wil Sinclair (wil) on 2008-09-02T10:38:51.000+0000

Updating for the 1.6.0 release.

Have you found an issue?

See the Overview section for more details.

Copyright

© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.

Contacts