Issues

ZF-3912: Incorrect encoding of 'subject' in Chinese e-mail, sent with Zend_Mail

Issue Type: Bug Created: 2008-08-09T05:28:02.000+0000 Last Updated: 2008-10-06T08:44:24.000+0000 Status: Resolved Fix version(s): Reporter: Jonathan Maron (jonathan_maron) Assignee: Nico Edtinger (nico) Tags: - Zend_Mail

Related issues: - ZF-1950

Attachments: - r10901.patch

Description

With the following code, I wish to send an e-mail with Chinese in the subject:

<pre class="highlight">// ZendFramework-1.5.2

$mail = new Zend_Mail('UTF-8');

$mail->setFrom('info@example.org');
$mail->addTo('sam@example.org');
$mail->setSubject('机器视觉组件生产商');
$mail->setBodyText('机器视觉组件生产商 机器视觉组件生产商');
$mail->send();

However, the subject of the sent e-mail is garbled and displayed incorrectly in e-mail client (Thunderbird).

<pre class="highlight">

See full e-mail, which is sent, is below:

From - Sat Aug 9 14:17:45 2008 X-Account-Key: account2 X-UIDL: BNb"!%58"!UQ]!!J>'"! X-Mozilla-Status: 0001 X-Mozilla-Status2: 00000000 Return-Path: xxx@xxx.org X-Original-To: yyy@xxx.org Delivered-To: yyy@xxx.org Received: from xxx.xxx.org (localhost [127.0.0.1]) by localhost (Postfix) with ESMTP id B6B67501847 for yyy@xxx.org; Sat, 9 Aug 2008 14:14:08 +0200 (CEST) To: yyy@xxx.org Subject: =?UTF-8?Q?=E6=9C=BA=E5=99=A8=E8=A7=86=E8=A7=89=E7=BB=84=E4=BB=B6=E7=94=9F=E4=BA=A7= =E5=95=86?= From: "" info@example.org Date: Sat, 09 Aug 2008 14:27:55 +0200 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: 20080809122755.C0CAB8C056@xxx.xxx.org

=E6=9C=BA=E5=99=A8=E8=A7=86=E8=A7=89=E7=BB=84=E4=BB=B6=E7=94=9F=E4=BA=A7=E5= =95=86 =E6=9C=BA=E5=99=A8=E8=A7=86=E8=A7=89=E7=BB=84=E4=BB=B6=E7=94=9F=E4= =BA=A7=E5=95=86

<pre class="highlight">

It seems that the method Zend_Mail::_encodeHeader() is causing the problem. 

E-Mail with a Chinese subject is sent correctly, when the following code is used for the method:

protected function _encodeHeader($value) { if (Zend_Mime::isPrintable($value)) { return $value; } else { return '=?' . $this->_charset . '?B?' . base64_encode($value) . '?='; } } ```

Jacky Chen posted this solution on Fri, 28 Dec 2007 17:21:26 -0800 at:

http://mail-archive.com/fw-general@lists.zend.com/…

Can his version of Zend_Mail::_encodeHeader() be used to update the official version?

In the meantime, I am subclassing Zend_Mail, overloading Zend_Mail::_encodeHeader() with his version, as I need to send e-mail with Chinese subjects and *also* From and To addresses - Zend_Mail::_encodeHeader() is used in multiple methods.

Thank you for your kind consideration of this change.

Jonathan Maron

Comments

Posted by Tomas Markauskas (tamole) on 2008-08-09T15:05:34.000+0000

This solution would work only for short headers, because the RFC does not allow the encoded words in the headers to be longer than 75 characters. The string has to be split to avoid problems with multibyte characters and therefore has to be split between those characters. I had the same problem and now I have a solution which fixes it for utf-8 string (other charsets later).

I will post a patch in the next few days. It should also close a few other tickets related to Zend_Mail/Zend_Mime.

Posted by Jonathan Maron (jonathan_maron) on 2008-08-09T21:58:05.000+0000

Hello Thomas

Thank you very much for your prompt answer.

I very much look forward to seeing your patch.

Would it be possible for you to send me a copy already?

TIA

Jonathan Maron

Posted by Tomas Markauskas (tamole) on 2008-08-13T13:42:08.000+0000

Patch for Zend_Mail/Zend_Mime (@10901)

This fixes the described problem (and a few more).

Posted by Tomas Markauskas (tamole) on 2008-08-13T13:51:24.000+0000

What was the problem:

From http://tools.ietf.org/html/rfc2047#section-2

An encoded-word may not be more than 75 characters long, including charset, encoding, encoded-text, and delimiters. If it is desirable to encode more text than will fit in an encoded-word of 75 characters, multiple encoded-words (separated by CRLF SPACE) may be used.

While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more encoded-words is limited to 76 characters.

From http://tools.ietf.org/html/rfc2047#section-5

The 'encoded-text' in an 'encoded-word' must be self-contained; 'encoded-text' MUST NOT be continued from one 'encoded-word' to another. This implies that the 'encoded-text' portion of a "B" 'encoded-word' will be a multiple of 4 characters long; for a "Q" 'encoded-word', any "=" character that appears in the 'encoded-text' portion will be followed by two hexadecimal characters.

Each 'encoded-word' MUST encode an integral number of octets. The 'encoded-text' in each 'encoded-word' must be well-formed according to the encoding specified; the 'encoded-text' may not be continued in the next 'encoded-word'. (For example, "=?charset?Q?=?= =?charset?Q?AB?=" would be illegal, because the two hex digits "AB" must follow the "=" in the same 'encoded-word'.)

Each 'encoded-word' MUST represent an integral number of characters. A multi-octet character may not be split across adjacent 'encoded- word's.

There is still one problem with my patch: the first line of the header will probably exceed the length of 76 characters because we don't count the length of the header name. Zend_Mime::encodeQuotedPrintableHeader should get the length of the header name, so it could reduce the length of the first "encoded-word". I haven't seen any cases where this would cause big problems, so I haven't resolved it.

Posted by Tomas Markauskas (tamole) on 2008-08-16T04:57:15.000+0000

It would probably fix these two issues too.

Posted by Niko Sams (nikosams) on 2008-09-02T01:47:12.000+0000

Patch worked out fine for me. Thanks.

Posted by Tomas Markauskas (tamole) on 2008-09-02T10:07:33.000+0000

All these issues seem to be related.

Posted by Tomáš Fejfar (tomas.fejfar@gmail.com) on 2008-09-11T10:24:54.000+0000

Someone should fix this, it's critical bug for everyone usin non-ascii charset...

Posted by Tomas Markauskas (tamole) on 2008-09-17T05:53:11.000+0000

I would like to know if there's any need for other character sets than UTF-8. Another option to deal with multibyte characters would be to convert all multibyte character sets to UTF-8 in encodeQuotedPrintableHeader (see attached patch).

Have you found an issue?

See the Overview section for more details.

Copyright

© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.

Contacts