ZF-588: Can't draw special characters (ie. euro symbol) on linux

Description

Trying to draw the euro symbol with a line like this:

$page->drawText(chr(128), 100, 100);

I get a correctly printed euro symbol using apache 2.0.54, php 5.1.2 on windows, but only a set of white spaces if using apache and php (same versions) on linux (tried ubuntu 6 and fedora 5). I also tried different combinations of encodings with no success.

Comments

Assigning to Alexander.

Looks like at least on my linux systems (ubuntu and fedora core 5), there's a problem in Zend_Pdf_Page->drawText(). This row: $textObj = new Zend_Pdf_Element_String($this->_font->encodeString($text, $charEncoding)); calls encodeString() which tries to encode the text to CP1252 charset with no success. I tried a text with characters like 'à' or 'ì' (which are common in my language) and they don't get encoded. If I change the line above to $textObj = new Zend_Pdf_Element_String($text);

In my ubuntu system the output of iconv_get_encoding() says ISO-8859-1 for all three kinds. Hope this helps

Sorry, there's a missing part in my last message:

If I change the line above to: $textObj = new Zend_Pdf_Element_String($text); I get correct accented characters

{{Zend_Pdf_Resource_Font::encodeString()}} is necessary because the actual character encoding method used in the binary stream of the PDF file is WinAnsiEncoding, which is CP-1252.

CP-1252 is over a dozen years old and so predates the Euro; there is no Euro symbol character in the CP-1252 character map. This is why, in the current code base, you were unable to draw that character. There is an effort underway to replace the dated WinAnsiEncoding method with a Unicode-based system which will completely eliminate these kinds of problems.

Your results from bypassing {{encodeString()}} are interesting though... it should not have worked. Or, at least, if it did, there should be problems when viewing the PDF on other, non-Microsoft platforms. I'll take a closer look at what's happening here and try to come up with a reasonable explanation and/or fix.

In the interim, can you attach your sample PDF file (the one that works) to this issue? I'm curious to see if the PDF works properly on my Mac.

I attached two files: - test_encodeString.pdf: generated with original Zend Framework - test_no_encodeString.pdf: generated after having modified Zend_Pdf_Resource_Font::encodeString() this way: public function encodeString($string, $charEncoding) { return $string; }

Text is generated calling drawText this way: $pg->drawText("a -> à", $x, $y); $pg->drawText("e -> è", $x, $y); $pg->drawText("i -> ì", $x, $y); $pg->drawText("euro -> ".chr(128), $x, $y);

I can see the symbols in the second file and not in the first (I open the files both on linux and windows, they look the same). I noticed that the characters 'à', etc. are recognized as invalid by the iconv function in encodeString. For example, if I remove the //IGNORE statement for the output encoding, trying to render the text:

Test àà write

The result is:

Test

While leaving the //IGNORE statement the result is

Test write

Thanks for your help, Frank

P.S. According to this: http://en.wikipedia.org/wiki/Windows-1252 It looks that the euro symbol is included in this encoding, along with accented letters like 'à' and 'ì'. In fact I produced pdf documents with the euro symbol with tools like gs.

Thanks

This doesn't appear to have been fixed in 1.5.0. Please update if this is not correct.

Actually I solved the euro symbol problem like this in 1.5:

$page->drawText(html_entity_decode("€", ENT_COMPAT, "UTF-8"), $x, $y, "UTF-8");

Not perfect but it works, I think there's also a problem with source code file encoding while specifying non standard characters. Thanks, Frank

Sorry, I noticed that the editor converted my code, the first argument of html_entity_decode must be the entity for the euro symbol $ e u r o ;

PDF standard fonts support smal set of single byte encodings (See PDF Reference, Appendix D). They are generally equal to Latin1 (CP1252).

BUT! You have to specify character encoding for the string provided to drawText() method:


$font = Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_COURIER);
$pdfPage->setFont($font, 36)
        ->drawText('Euro sign - €', 72, 720, 'UTF-8')
        ->drawText('Text with umlauts - à è ì', 72, 650, 'UTF-8');

You also must be sure, that string is actualy uses specified encoding.

[ZF-3650] documentation task should describe this more clear.