ZF-3912: Incorrect encoding of 'subject' in Chinese e-mail, sent with Zend_Mail

Description

With the following code, I wish to send an e-mail with Chinese in the subject:

// ZendFramework-1.5.2

$mail = new Zend_Mail('UTF-8');

$mail->setFrom('info@example.org');
$mail->addTo('sam@example.org');
$mail->setSubject('机器视觉组件生产商');
$mail->setBodyText('机器视觉组件生产商 机器视觉组件生产商');
$mail->send();

However, the subject of the sent e-mail is garbled and displayed incorrectly in the e-mail client (Thunderbird).



The full e-mail, as sent, is shown below:

From - Sat Aug 9 14:17:45 2008
X-Account-Key: account2
X-UIDL: BNb"!%58"!UQ]!!J>'"!
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-Path: xxx@xxx.org
X-Original-To: yyy@xxx.org
Delivered-To: yyy@xxx.org
Received: from xxx.xxx.org (localhost [127.0.0.1]) by localhost (Postfix) with ESMTP id B6B67501847 for yyy@xxx.org; Sat, 9 Aug 2008 14:14:08 +0200 (CEST)
To: yyy@xxx.org
Subject: =?UTF-8?Q?=E6=9C=BA=E5=99=A8=E8=A7=86=E8=A7=89=E7=BB=84=E4=BB=B6=E7=94=9F=E4=BA=A7= =E5=95=86?=
From: "" info@example.org
Date: Sat, 09 Aug 2008 14:27:55 +0200
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: 20080809122755.C0CAB8C056@xxx.xxx.org

=E6=9C=BA=E5=99=A8=E8=A7=86=E8=A7=89=E7=BB=84=E4=BB=B6=E7=94=9F=E4=BA=A7=E5= =95=86 =E6=9C=BA=E5=99=A8=E8=A7=86=E8=A7=89=E7=BB=84=E4=BB=B6=E7=94=9F=E4= =BA=A7=E5=95=86



It seems that the method Zend_Mail::_encodeHeader() is causing the problem. 

E-mail with a Chinese subject is sent correctly when the method is replaced with the following code:

protected function _encodeHeader($value)
{
    if (Zend_Mime::isPrintable($value)) {
        return $value;
    } else {
        return '=?' . $this->_charset . '?B?' . base64_encode($value) . '?=';
    }
}

Jacky Chen posted this solution on Fri, 28 Dec 2007 17:21:26 -0800 at:

http://mail-archive.com/fw-general@lists.zend.com/…

Can his version of Zend_Mail::_encodeHeader() be used to update the official version?

In the meantime, I am subclassing Zend_Mail and overriding Zend_Mail::_encodeHeader() with his version, as I need to send e-mail with Chinese subjects and *also* Chinese From and To names. Zend_Mail::_encodeHeader() is used in multiple methods.

Thank you for your kind consideration of this change.

Jonathan Maron

Comments

This solution works only for short headers, because RFC 2047 does not allow an encoded-word in a header to be longer than 75 characters. A longer string has to be split into several encoded-words, and the split must fall between characters so that no multibyte character is cut in half. I had the same problem and now have a solution that fixes it for UTF-8 strings (other charsets later).

I will post a patch in the next few days. It should also close a few other tickets related to Zend_Mail/Zend_Mime.
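For illustration, the splitting described in this comment could look roughly like the sketch below. It is not the patch announced here: `encodeHeaderBase64Utf8()` is a hypothetical stand-alone helper, it assumes the input is valid UTF-8, and it uses "B" (base64) encoding as in Jacky Chen's version.

```php
<?php
// Hypothetical helper, not part of Zend_Mail or Zend_Mime.
// Encodes a UTF-8 string as one or more "B" encoded-words of at most
// $maxWordLength characters, breaking only between whole characters.
function encodeHeaderBase64Utf8($value, $maxWordLength = 75)
{
    $prefix = '=?UTF-8?B?';
    $suffix = '?=';

    // Characters left for the base64 payload of each encoded-word.
    $payloadChars = $maxWordLength - strlen($prefix) - strlen($suffix);
    // Base64 turns 3 octets into 4 characters, so this is the octet budget.
    $maxOctets = (int) floor($payloadChars / 4) * 3;

    // Split into whole UTF-8 characters; returns false on invalid UTF-8.
    $chars = preg_split('//u', $value, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }

    $words = array();
    $chunk = '';
    foreach ($chars as $char) {
        // Start a new encoded-word before the octet budget is exceeded.
        if ($chunk !== '' && strlen($chunk) + strlen($char) > $maxOctets) {
            $words[] = $prefix . base64_encode($chunk) . $suffix;
            $chunk = '';
        }
        $chunk .= $char;
    }
    if ($chunk !== '') {
        $words[] = $prefix . base64_encode($chunk) . $suffix;
    }

    // Adjacent encoded-words are separated by CRLF SPACE (RFC 2047, section 2).
    return implode("\r\n ", $words);
}
```

With the default limit each encoded-word carries at most 45 octets, so the nine-character subject from this ticket (27 octets) still fits into a single encoded-word, while longer subjects are folded onto continuation lines.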

Hello Thomas

Thank you very much for your prompt answer.

I very much look forward to seeing your patch.

Would it be possible for you to send me a copy already?

TIA

Jonathan Maron

Patch for Zend_Mail/Zend_Mime (@10901)

This fixes the described problem (and a few more).

What was the problem:

From http://tools.ietf.org/html/rfc2047#section-2

An encoded-word may not be more than 75 characters long, including charset, encoding, encoded-text, and delimiters. If it is desirable to encode more text than will fit in an encoded-word of 75 characters, multiple encoded-words (separated by CRLF SPACE) may be used.

While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more encoded-words is limited to 76 characters.

From http://tools.ietf.org/html/rfc2047#section-5

The 'encoded-text' in an 'encoded-word' must be self-contained; 'encoded-text' MUST NOT be continued from one 'encoded-word' to another. This implies that the 'encoded-text' portion of a "B" 'encoded-word' will be a multiple of 4 characters long; for a "Q" 'encoded-word', any "=" character that appears in the 'encoded-text' portion will be followed by two hexadecimal characters.

Each 'encoded-word' MUST encode an integral number of octets. The 'encoded-text' in each 'encoded-word' must be well-formed according to the encoding specified; the 'encoded-text' may not be continued in the next 'encoded-word'. (For example, "=?charset?Q?=?= =?charset?Q?AB?=" would be illegal, because the two hex digits "AB" must follow the "=" in the same 'encoded-word'.)

Each 'encoded-word' MUST represent an integral number of characters. A multi-octet character may not be split across adjacent 'encoded-word's.
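The rules quoted above can be turned into a small checker, which makes it easier to see why the Subject header from the description is rejected. This is only a sketch: `checkEncodedWord()` is a hypothetical function, and its whole-character test only covers UTF-8.

```php
<?php
// Hypothetical checker, not part of Zend_Mail or Zend_Mime.
// Returns a list of RFC 2047 problems found in a single encoded-word.
function checkEncodedWord($word)
{
    $problems = array();
    if (strlen($word) > 75) {
        $problems[] = 'longer than 75 characters';
    }
    if (!preg_match('/^=\?([^?\s]+)\?([BbQq])\?([^?]*)\?=$/', $word, $m)) {
        $problems[] = 'not of the form =?charset?encoding?encoded-text?=';
        return $problems;
    }
    list(, $charset, $encoding, $text) = $m;

    $syntaxOk = true;
    if (preg_match('/\s/', $text)) {
        $problems[] = 'encoded-text contains whitespace';
        $syntaxOk = false;
    }
    if (strtoupper($encoding) === 'Q') {
        // Every "=" must be followed by two hexadecimal digits.
        if (preg_match('/=(?![0-9A-Fa-f]{2})/', $text)) {
            $problems[] = '"=" not followed by two hex digits';
            $syntaxOk = false;
        }
        // Decode: "_" stands for a space, "=XX" for one octet.
        $octets = preg_replace_callback(
            '/=([0-9A-Fa-f]{2})/',
            function ($hex) { return chr(hexdec($hex[1])); },
            str_replace('_', ' ', $text)
        );
    } else {
        // A "B" encoded-text must be a multiple of 4 characters long.
        $octets = base64_decode($text, true);
        if (strlen($text) % 4 !== 0 || $octets === false) {
            $problems[] = 'encoded-text is not well-formed base64';
            $syntaxOk = false;
        }
    }
    // Integral number of characters: only checked for UTF-8 in this sketch.
    if ($syntaxOk && strcasecmp($charset, 'UTF-8') === 0
        && preg_match('//u', $octets) !== 1) {
        $problems[] = 'does not decode to whole UTF-8 characters';
    }
    return $problems;
}
```

Applied to the encoded-word in the Subject header of the reported e-mail, the checker flags both the length and the "=" followed by a space inside the encoded-text.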

There is still one problem with my patch: the first line of the header will probably exceed the length of 76 characters because we don't count the length of the header name. Zend_Mime::encodeQuotedPrintableHeader should get the length of the header name, so it could reduce the length of the first "encoded-word". I haven't seen any cases where this would cause big problems, so I haven't resolved it.
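One way the remaining problem might be addressed is to pass the header name into the encoder and shorten the first encoded-word accordingly. The sketch below is an assumption about how that could work, not the attached patch: `qEncodeChar()` and `encodeQHeaderUtf8()` are hypothetical names, the input is assumed to be valid UTF-8, and only letters and digits are left unencoded.

```php
<?php
// Hypothetical helpers, not part of Zend_Mime.

// "Q"-encodes one UTF-8 character (conservative: only A-Z, a-z, 0-9 stay literal).
function qEncodeChar($char)
{
    if (preg_match('/^[A-Za-z0-9]$/', $char)) {
        return $char;
    }
    if ($char === ' ') {
        return '_';
    }
    $out = '';
    for ($i = 0; $i < strlen($char); $i++) {
        $out .= sprintf('=%02X', ord($char[$i]));
    }
    return $out;
}

// Builds a complete, folded header line. The first encoded-word is shortened
// so that "Name: " plus the encoded-word stays within $maxLineLength.
function encodeQHeaderUtf8($headerName, $value, $maxLineLength = 76)
{
    $prefix = '=?UTF-8?Q?';
    $suffix = '?=';
    $overhead = strlen($prefix) + strlen($suffix);

    $chars = preg_split('//u', $value, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }

    // First line carries "Name: "; continuation lines carry one leading space.
    // An encoded-word itself may never exceed 75 characters.
    $budget = min(75, $maxLineLength - strlen($headerName) - 2) - $overhead;
    $continuationBudget = min(75, $maxLineLength - 1) - $overhead;

    $words = array();
    $current = '';
    foreach ($chars as $char) {
        $encoded = qEncodeChar($char);
        // Limitation: a single character is always emitted, even if a very
        // long header name leaves no room for it on the first line.
        if ($current !== '' && strlen($current) + strlen($encoded) > $budget) {
            $words[] = $prefix . $current . $suffix;
            $current = '';
            $budget = $continuationBudget;
        }
        $current .= $encoded;
    }
    if ($current !== '') {
        $words[] = $prefix . $current . $suffix;
    }
    return $headerName . ': ' . implode("\r\n ", $words);
}
```

For example, `encodeQHeaderUtf8('Subject', '机器视觉组件生产商')` puts six characters on the first line and the remaining three on a continuation line, and both lines stay within 76 characters.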

It would probably fix these two issues too.

Patch worked out fine for me. Thanks.

All these issues seem to be related.

Someone should fix this; it's a critical bug for everyone using a non-ASCII charset.

I would like to know if there's any need for character sets other than UTF-8. Another option to deal with multibyte characters would be to convert all multibyte character sets to UTF-8 in encodeQuotedPrintableHeader (see attached patch).
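The conversion step proposed here could be as small as the following sketch. It is not the attached patch: `headerValueToUtf8()` is a hypothetical helper, and it assumes PHP's iconv extension is available and knows the source charset.

```php
<?php
// Hypothetical helper, not part of Zend_Mime; assumes the iconv extension.
// Converts a header value from $charset to UTF-8 so that the encoder only
// ever has to split UTF-8 strings.
function headerValueToUtf8($value, $charset)
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $value; // already UTF-8, nothing to do
    }
    // iconv() returns false if the charset is unknown or the input is invalid.
    $converted = @iconv($charset, 'UTF-8', $value);
    if ($converted === false) {
        throw new InvalidArgumentException(
            'Cannot convert header value from ' . $charset . ' to UTF-8'
        );
    }
    return $converted;
}
```

The trade-off is that the header is then always announced as UTF-8, whatever charset the Zend_Mail object was created with.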