Issues

ZF-1688: Long header lines containing non-printable characters are corrupted

Description

The following test case generates a mail with a non-readable subject line:


<?php
require_once('Zend/Mail.php');
require_once('Zend/Mail/Transport/Smtp.php');
$mail = new Zend_Mail();
$mail->setSubject("Das ist eine Nachricht mit deutschen Ümläuten im Betreff und einer absichtlich extralangen Betreffzeile, statt dessen ohne Text");
$mail->setBodyText('no text');
$mail->addTo('recipient@domain.com');
$mail->send(new Zend_Mail_Transport_Smtp());
?>

=?iso-8859-1?Q?Das=20ist=20eine=20Nachricht=20mit=20deutschen=20=DCml=E4uten=20im=20Betreff=20und=20einer=20abs=?ichtlich=20extralangen=20Betreffzeile,=20statt=20dessen=20ohne=20Text?=

The cause of this behaviour is easy to locate: {{Zend_Mail::setSubject()}} first replaces line breaks with question marks, then {{_encodeHeader()}} wraps lines longer than 74 characters *only if they contain non-printable characters*, and afterwards {{_storeHeader()}} again replaces line breaks with question marks, including the ones that the wrapping step just inserted.

The problem applies to any type of header, not just the subject line.
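The corruption can be reproduced without Zend_Mail at all. The sketch below is a hypothetical stand-in, not the framework's code: it takes a quoted-printable string that already contains a soft line break ("=" followed by a newline) and applies the same kind of strtr() replacement described above. The variable names are assumptions made for illustration.

```php
<?php
// Quoted-printable text after wrapping: "=" + newline marks a soft line break.
$wrapped = "einer=20abs=\nichtlich=20extralangen";

// The same kind of replacement described above: CR, LF and TAB become "?".
$stored = strtr($wrapped, "\r\n\t", '???');

// The soft line break has turned into the "=?" seen in the broken subject.
echo $stored, PHP_EOL;
```

The "=?" in the middle of the encoded text is what mail clients then misread as the start of a new encoded-word.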

Comments

After digging a bit, I'd like to suggest having a look into Pear::Mail_Mime, and how RFC2047 conformance is ensured there. Using the same method for encoding header lines and body parts might not be sufficient since the requirements are different.

Assigning to [~nico] to initiate issue review.

I have the same issue with long Russian (non-printable) subjects. Is there any progress on fixing this bug? I hope it will be fixed in 1.0.1.

Postponed because encoding will (most likely) be changed to iconv_mime_encode(), which should also fix this issue.

same fix

Please look at my attached mail. The break in the header (compare X-Original-Subject with Subject) means that mails with Cyrillic characters can't be decoded and get moved to junk.

What's the current status of this bugfix? It seems that this error still occurs in ZF 1.5. Long subject lines will break the email encoding.

I am unsure whether I should create a new issue or not.

http://nabble.com/ZF-1.5---Zend_Mail-and-long-subj…

I can confirm the same problem. Examining the headers, it appears that an extra newline is inserted into the header when the line exceeds the length at which ZF wraps the message.

A year has passed by since the fix was postponed. Now ZF 1.6 is out, and guess what? Still no fix. For non-English people this issue is a blocker for using the Zend_Mail class (and most probably the Zend Framework) at all.

I have to agree with Willy on some points: this bug report is a year old, it is one of those with the most votes, and it is a big deal breaker for most non-English users. Pointing out the bug in the #zftalk.dev channel doesn't seem to have changed anything.

Somehow this is disappointing.

Yes, it is indeed disappointing, not only to users who are non-English, but even more so to people who develop with ZF and want to use Zend_Mail in a product that targets non-English audiences. If I were to commit some time to fixing this, would we be able to submit patches to the ZF project?

Nico, what is your current take on this issue? It has a lot of votes, so I think we should address it ASAP.

,Wil

Still nothing ...

There is some kind of comic relief in this though :) We seemingly can't 'communicate' with the guy who wrote Zend_Mail! :)

Seems so. Let's see if I can find the time to fix this, but I do not think so.

We could write our own! Zend_Mail_More! I think though, that a proper solution would lie in separating headers from the body before any kind of decoding takes place. I offered to dedicate some company time to fixing this (we're currently using imap as a fallback to solve this) since a contained solution would be nice; I just want some guarantee that it will not be wasted work. Our time, like everyone's, is expensive.

Some reliable alternatives then:

  1. the very old html_mime_mail class found here: http://www.phpguru.org/static/mime.mail.html

  2. The PHP IMAP implementations. http://php.net/IMAP

  3. MailParse http://pear.php.net/package/mailparse

Zend Mail is simply not fit for production with this fatal flaw

I am starting to notice a pattern with bugs I encounter in ZF:

  - bug is already reported a year ago

  - various solutions are proposed in the bug report

  - a lot of people are suffering

  - new ZF releases add all sorts of candy, but fail to address the open bugs

Hello,

If this can help some of you, for my part I set up the following solution:

<?php

class My_Mail extends Zend_Mail {

public function __construct() {
    parent::__construct();
}

/**
 * Sets the subject of the message
 *
 * @param   string    $subject
 * @return  Zend_Mail Provides fluent interface
 * @throws  Zend_Mail_Exception
 */
public function setSubject($subject)
{
    Zend_Debug::dump($subject, 'sujet 1');
    if ($this->_subject === null) {
        $subject = strtr($subject,"\r\n\t",'???');
        $this->_subject = $subject;
        $this->_storeHeader('Subject', $this->_subject);
    } else {
        /**
         * @see Zend_Mail_Exception
         */
        require_once 'Zend/Mail/Exception.php';
        throw new Zend_Mail_Exception('Subject set twice');
    }
    return $this;
}    

}

I know that this code is not safe, but the subjects are hard-coded within the application. Do not use this if your subject is set by a user through a form.

It works for me while waiting for a better solution.

Best regards,

I'm sorry the code is badly formatted; I'll try again with the 'code' option.


<?php

class My_Mail extends Zend_Mail {
    
    public function __construct() {
        parent::__construct();
    }
    
    /**
     * Sets the subject of the message
     *
     * @param   string    $subject
     * @return  Zend_Mail Provides fluent interface
     * @throws  Zend_Mail_Exception
     */
    public function setSubject($subject)
    {
        if ($this->_subject === null) {
            $subject = strtr($subject,"\r\n\t",'???');
            $this->_subject = $subject;
            $this->_storeHeader('Subject', $this->_subject);
        } else {
            /**
             * @see Zend_Mail_Exception
             */
            require_once 'Zend/Mail/Exception.php';
            throw new Zend_Mail_Exception('Subject set twice');
        }
        return $this;
    }    
}

Best regards

From RFC 1522:


While there is no limit to the length of a multiple-line header
field, each line of a header field that contains one or more
encoded-words is limited to 76 characters.

Here's my take on this. Tested and seems to work fine. I had some problems when the subject contains only umlauts (such as a 128-character string consisting only of "äüöß"), which threw an iconv error (error code 7), but AFAIK this is related to a bug in the iconv extension.


    protected function _encodeHeader($value)
    {
        if (Zend_Mime::isPrintable($value)) {
            return $value;
        } else {

            $mimePrefs = array(
                'scheme'           => 'Q';
                'input-charset'    => $this->_charset;
                'output-charset'   => $this->_charset;
                'line-length'      => 74;
                'line-break-chars' => "\n";
            );

            $value = iconv_mime_encode('DUMMY', $value, $mimePrefs);
            $value = preg_replace("#^DUMMY\:\ #", "", $value);

            return $value;
        }
    }

I hope this helps you guys until this bug is officially fixed.

You might want to replace the semicolons ( ; ) with commas ( , )


            $mimePrefs = array(
                'scheme'           => 'Q',
                'input-charset'    => $this->_charset,
                'output-charset'   => $this->_charset,
                'line-length'      => 74,
                'line-break-chars' => "\n"
            );

Thanks a lot Thorsten, it works fine. ;)

The fix works great. Why isn't this patch done in release 1.7?? Cost me 2 freaking hours to find this page and solve my problem.

And why bug-hunt day doesn't touch this? :(

//sorry for my english

Unfortunately there is also a bug in the iconv_mime_encode function, which makes Thorsten Suckow-Homberg's fix unreliable :(

http://bugs.php.net/bug.php?id=43314

and +1 for

"I am starting to notice a pattern with bugs I encounter in ZF: bug is already reported a year ago various solutions are proposed in the bug report a lot of people are suffering new ZF releases add all sorts of candy, but fail to address the open bugs "

Okay, so I decided to investigate the issue and found the problem. It's not Zend_Mail alone but also Zend_Mime, and the problem consists of two bugs. :)

It is a bit complicated and not so easy to explain, so this became a fairly long read. Please feel free to correct me or ask questions if something is incorrect or unclear!

I'll start with the RFC definitions that will be used for reference when pointing out the issues and that help to understand the problem a bit better. For clarification: an encoded header value such as =?ISO-8859-1?Q?aaaaaa?= is referred to as an encoded-word. The string "aaaaaa" is referred to as the encoded-text, and "=?ISO-8859-1?Q?" and "?=" carry the delimiters, charset and encoding; I refer to these three as DCE.

RFC definitions:

  1. The key and value need to be composed of printable US-ASCII characters. To accomplish this you have to encode the header value either with quoted-printable or base64.

  2. An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.

  3. While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more 'encoded-word's is limited to 76 characters. It doesn't have to be exactly 76 chars long but should be about that length; a few chars more or less don't hurt.

  4. The 'encoded-text' may not be continued in the next 'encoded-word'.

  5. Each value MUST represent an integral number of characters. A multi-octet character may not be split across adjacent 'encoded-word's.
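The two length limits in the list above can be checked mechanically. The following is only a rough sketch, assuming PHP 7 or later: findHeaderLimitViolations() is a hypothetical helper (it is not part of Zend Framework), and its regular expression is a simplified approximation of the encoded-word grammar.

```php
<?php
/**
 * Hypothetical helper: returns a list of problems found in a (possibly folded)
 * header value. An empty array means neither length limit was exceeded.
 */
function findHeaderLimitViolations(string $headerValue): array
{
    $problems = [];
    foreach (preg_split('/\r?\n/', $headerValue) as $index => $line) {
        $lineNo = $index + 1;

        // Simplified encoded-word pattern: =?charset?encoding?encoded-text?=
        preg_match_all('/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/', $line, $matches);
        $words = $matches[0];

        // The 76 character limit only applies to lines containing encoded-words.
        if ($words !== [] && strlen($line) > 76) {
            $problems[] = "line $lineNo: longer than 76 characters";
        }
        foreach ($words as $word) {
            if (strlen($word) > 75) {
                $problems[] = "line $lineNo: encoded-word longer than 75 characters";
            }
        }
    }
    return $problems;
}
```

A helper like this only catches length problems; it does not detect a multi-octet character that was split across two encoded-words.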

Phew, in words that everyone understands this means:

  1. Take the header line and encode its value.

  2. Calculate how long a line would be when it includes the current DCE.

  3. Split the encoded value into chunks with a newline at the calculated length WITHOUT breaking an encoded char.

  4. Place the DCE at the start and end of every line.

  5. ???

  6. PROFIT!
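The steps above can be sketched in plain PHP without the iconv or mbstring extensions. This is a minimal illustration under stated assumptions, not the code that was committed to Zend_Mime: encodeHeaderQ() is a hypothetical name, the input is assumed to be valid UTF-8, PHP 7 or later is assumed, and the length of the header name on the first line is not taken into account. Unlike mb_encode_mimeheader, it encodes the whole value, including the plain ASCII parts.

```php
<?php
/**
 * Hypothetical sketch: Q-encodes a UTF-8 header value into one or more
 * encoded-words and never splits a character across two of them.
 */
function encodeHeaderQ(string $value, int $maxWordLength = 75): string
{
    $prefix = '=?UTF-8?Q?';
    $suffix = '?=';
    // Space left for encoded-text once the DCE is accounted for.
    $room = $maxWordLength - strlen($prefix) - strlen($suffix);

    // Split into whole characters; the /u modifier makes PCRE treat input as UTF-8.
    $chars = preg_split('//u', $value, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }

    $words   = [];
    $current = '';
    foreach ($chars as $char) {
        // Encode one character: ASCII letters and digits stay, every other byte becomes =XX.
        $encoded = '';
        foreach (str_split($char) as $byte) {
            $encoded .= preg_match('/[A-Za-z0-9]/', $byte) ? $byte : sprintf('=%02X', ord($byte));
        }
        // Start a new encoded-word if this character would not fit any more.
        if ($current !== '' && strlen($current) + strlen($encoded) > $room) {
            $words[] = $prefix . $current . $suffix;
            $current = '';
        }
        $current .= $encoded;
    }
    if ($current !== '') {
        $words[] = $prefix . $current . $suffix;
    }

    // Encoded-words are separated by CRLF SPACE.
    return implode("\r\n ", $words);
}

echo encodeHeaderQ('This is än germän multiline sübject with randöm ümläuts.'), PHP_EOL;
```

Encoding character by character keeps the bookkeeping simple: a word is closed before a character is added, so no "=XX" sequence and no multi-octet character can end up cut in half.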

Let's take the following example subject: This is än germän multiline sübject with randöm ümläuts.

After doing all four steps from above, the header value should look like this (I used the method mb_encode_mimeheader for this result):

This is =?UTF-8?Q?=C3=A4n=20germ=C3=A4n=20multiline=20s=C3=BCbject=20with?=
 =?UTF-8?Q?=20rand=C3=B6m=20=C3=BCml=C3=A4uts=2E?=

This is how it looks after the string has been encoded with Zend_Mime::encodeQuotedPrintable. Notice the length, and the missing space at the start of the second line which is required by point 2 of the definition list. The string gets split into chunks that are too big because the method does not take the DCE length into account.

This is =C3=A4n germ=C3=A4n multiline s=C3=BCbject with rand=C3=B6m =C3=BC=
ml=C3=A4uts.

And this is the final result after Zend_Mail::_encodeHeader has done its job:

=?UTF-8?Q?This=20is=20=C3=A4n=20germ=C3=A4n=20multiline=20s=C3=BCbject=20with=20rand=C3=B6m=20=C3=BC=
ml=C3=A4uts.?=

Besides the not so tragic length error, you will see that the DCE is missing at the end of the first line (?=) and at the start of the second ( =?UTF-8?Q?)! As stated above, all encoded-words need to be encased by the delimiters, charset and encoding syntax, and this includes encoded-words on new lines! _encodeHeader simply concatenates the corresponding DCE at the start and end of the whole string. This is the first major problem. A not-so-good solution would be to replace every newline char with the DCE and a newline to get the desired result. With an ISO charset this could work, but not with UTF-8 or other multibyte charsets, and here the second issue comes into play.

When looking at the Zend_Mime code you will see that the code for splitting only checks whether it found an equals sign; if so, it starts a new line. Now take a look at the first umlaut we have, the "ä" that got encoded into "=C3=A4" because it consists of two bytes. Imagine that umlaut started exactly at position 73 of a string: Zend_Mime would break the char apart ( ....=C3= \n A4 ), breaking our encoded string and resulting in a borked email. This is the second major issue, because point 5 from above states that a multi-octet char may not be split across adjacent encoded-words.
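The effect of cutting between the two bytes of a character can be shown with a few lines of plain PHP. This is a simplified illustration, not the Zend_Mime splitting code: the fixed width of 7 and the sample string are chosen only so that the cut lands between "=C3" and "=A4".

```php
<?php
// "aaaaäbbbb" in UTF-8, quoted-printable encoded: the "ä" is the two bytes C3 A4.
$encoded = 'aaaa=C3=A4bbbb';

// Naive wrapping at a fixed width of 7 cuts between "=C3" and "=A4".
$chunks = str_split($encoded, 7);

// Decode each chunk on its own, the way each encoded-word is decoded separately.
$decoded = array_map('quoted_printable_decode', $chunks);

// preg_match() with the /u modifier only succeeds on valid UTF-8 input.
$valid = array_map(function (string $s): bool {
    return preg_match('//u', $s) === 1;
}, $decoded);

var_dump($chunks, $valid);
```

Neither chunk decodes to valid UTF-8 on its own, which is why a client that decodes each encoded-word separately ends up with garbage.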

Let's recap the errors:

  1. When an encoded value gets split onto multiple lines, the DCE params are missing.

  2. Zend_Mime::encodeQuotedPrintable is not multibyte compatible.

  3. Zend_Mime is not built for encoding email headers!

It would take a lot of effort to fix both issues, but I think it's possible. It definitely would need its own encoding method which is built for mail header encoding.

A stopgap solution, as long as the iconv method is broken, would be to use the much better mbstring methods. Please note that you need to explicitly activate the mbstring extension! This is the replacement method we use at work right now:


protected function _encodeHeader($value)
{
    if (!Zend_Mime::isPrintable($value)) {
        $value = mb_encode_mimeheader($value, $this->_charset, 'Q', "\n", 74);
    }
    
    return $value;
}

To be honest i encourage everyone to use the above method because there can be other issues with Zend_Mime and the way email headers get encoded. Also note that Zend_Mime works perfectly fine for the body encoding!

I really hope i explained the issue so we can start to discuss further steps.

Awww F*CK, why cant you edit your own posts?

The above posted version is not the one i wanted to post, it includes some spelling errors and stuff. So please bear with me.

*sigh* I also noticed the stupid replacement of newlines and tabs with question marks in most of the set* methods. I would suggest replacing the three question marks at the start of _encodeHeader with one space.

Oh, and for everyone who is interested, these are the RFCs I used: http://tools.ietf.org/html/rfc5322 and http://tools.ietf.org/html/rfc2047

mb_encode_mimeheader works fine with Ota's example. I also need a base64 encoding scheme, so the code should be something like the one below.

The iconv_mime_encode trick by Thorsten also works, but it will return shorter lines since it also counts 'DUMMY:' towards the line length.

I don't think there is a good solution with Zend_Mime::encode, since the '?', ' ', '_' replacement will overflow the line length. We need to extend Zend_Mime::encode.

I hope the following code can help you.


protected static $_encodingScheme = Zend_Mime::ENCODING_QUOTEDPRINTABLE;

public static function setDefaultEncodingScheme($encodingScheme)
{
    // should check here if quotedprintable or base64?
    self::$_encodingScheme = $encodingScheme;
}
public static function getDefaultEncodingScheme()
{
    return self::$_encodingScheme;
}

protected function _encodeHeader($value)
{
  if (Zend_Mime::isPrintable($value)) {
      return $value;
  } else {
      $encodingScheme = self::getDefaultEncodingScheme();
      $schemeChar = $encodingScheme == Zend_Mime::ENCODING_BASE64 ? 'B' : 'Q';
      
      if (function_exists('mb_encode_mimeheader')) {
          $encodedValue = mb_encode_mimeheader($value, $this->_charset, $schemeChar);
      } else if (function_exists('iconv_mime_encode')) {
          // shorter but no problem
          $encodedValue = iconv_mime_encode('X', $value, array(
              'scheme'           => $schemeChar,
              'output-charset'   => $this->_charset,
              'line-break-chars' => "\r\n ",
          ));
          // strip the dummy field name "X: " again
          $encodedValue = preg_replace("#^X\:\ #", "", $encodedValue);
      } else {
          $prefix = '=?' . $this->_charset . '?' . $schemeChar . '?';
          $suffix = '?=';
          // leave room for the prefix and suffix on every line
          $lineLength = Zend_Mime::LINELENGTH - strlen($prefix) - strlen($suffix);
          $lineEndChar = Zend_Mime::LINEEND;
          if ($encodingScheme == Zend_Mime::ENCODING_BASE64) {
              $quotedValue = Zend_Mime::encodeBase64($value, $lineLength, $lineEndChar);
          } else if ($encodingScheme == Zend_Mime::ENCODING_QUOTEDPRINTABLE) {
              $quotedValue = Zend_Mime::encodeQuotedPrintable($value, $lineLength, $lineEndChar);
              // FIXME This replacement may exceed the max line length! Maybe Zend_Mime::encode should have additional parameters to set the replace character set
              $quotedValue = str_replace(array('?', ' ', '_'), array('=3F', '=20', '=5F'), $quotedValue);
          } else {
              throw new Zend_Mail_Exception('Unsupported encoding scheme "' . $encodingScheme . '"');
          }
          }
          $quotedValue = str_replace($lineEndChar, $suffix . "\r\n " . $prefix, $quotedValue);
          $encodedValue = $prefix . $quotedValue . $suffix;
      }
      
      return $encodedValue;
  }
}

Hello, Ota Mares.

mb_encode_mimeheader requires the mbstring extension to be loaded.

I usually use the extension for handling Japanese, but I do not know whether Europeans and other people use it.

Do you know? If it is not a problem to require the mbstring extension, I will try to fix this issue.

Twk's method would be a nice solution if there weren't the problems with the iconv or Zend_Mime methods. (Btw. why the hell do you people use preg_replace for removing the first x chars? Use substr instead.)
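The substr suggestion can be sketched as follows. The encoded string is a hand-written example of the "name: value" form that iconv_mime_encode returns, not real output, and the variable names are assumptions made for illustration.

```php
<?php
// Hand-written example in the "field-name: value" form returned by iconv_mime_encode().
$fieldName = 'X';
$encoded   = 'X: =?UTF-8?Q?=C3=A4?=';

// Drop the field name plus the ": " separator; no regular expression is needed.
$value = substr($encoded, strlen($fieldName) + 2);

echo $value, PHP_EOL;
```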

Satoru, I cannot speak for all Europeans :) but most hosting servers have it activated, and it's still pretty easy to install. Anyway, by default the extension is not loaded!

Currently we are in a kind of dilemma, because iconv_mime_encode has a serious bug that results in broken MIME headers if an unspecified string length is reached. See the link above from Andrei Nikolov; I wouldn't use that method until it's fixed. Zend_Mime is not built for email header encoding, and the mbstring approach needs the extension to be present.

To create a fix that is extension independent you need to rewrite parts of Zend_Mail, mainly removing the (to me inexplicable) replacement of newlines with question marks (I would change that anyway), and, as stated above, create a new encoding method for email headers. The second part would be the hardest, mainly because you need to study all the RFC documents and apply the rules for email MIME headers.

Solved in SVN r13598

I would be happy if you could evaluate this fix. I have not copied it to the release-1.7 branch yet.

Additional change in SVN r13602.

Adds _filterOther() function instead of strtr("\r\n\t", '???') .

I will evaluate the fix during the course of this day.

Overall i like the changes, you also fixed some other bugs i wanted to report/look into. :)

Still, I suggest temporarily removing the iconv and zend_mail code parts and making zend_mail mbstring dependent until the specific problems/bugs are fixed. The iconv method has a very frustrating bug that can cause more harm than good, and zend_mime will break the encoded string when utf8 is used; both problems were stated more than once in the comments.

I found out that you should set the internal encoding of mbstring before using the mb_encode_mimeheader method, otherwise you will get some weird results from it. (This is also stated in the PHP documentation.) Also, I interpreted the documentation wrong: the fifth param "indent" is not the overall line length but the length of the header key. It's optional, so I would remove it.

protected function _encodeHeader($value)
{
    if (!Zend_Mime::isPrintable($value)) {
        // pick the encoding to use. quoted-printable or base64
        $encoding = ($this->_encodingOfHeaders === Zend_Mime::ENCODING_QUOTEDPRINTABLE) ? 'Q' : 'B';
        
        // set mb internal encoding, this should be the same encoding as used for mb_encode_mimeheader
        mb_internal_encoding($this->_charset);
        
        $value = mb_encode_mimeheader($value, $this->_charset, $encoding, Zend_Mime::LINEEND);
    }
    
    return $value;
}

Oh and could you please change all "encodingofHeaders" instances to "headerEncoding", the former is just wrong :x

Yoshida-san, thanks for committing.

Ota, the iconv_ issue (http://bugs.php.net/bug.php?id=43314) has the status "No Feedback". That would be a reason why the problem has not been fixed for a long time. If you know a reproducible input/output value pair with the latest PHP release build, could you please add it there?

mb_internal_encoding is necessary, and you probably need to restore the previous value afterwards.


$current_internal_encoding = mb_internal_encoding();
mb_internal_encoding($this->_charset);        
$value = mb_encode_mimeheader($value, $this->_charset, $encoding, Zend_Mime::LINEEND);
mb_internal_encoding($current_internal_encoding);

Additional commit in SVN r13612.

Hi, Ota Mares, thank you for evaluating.

First, I found that the 'Q' encoding causes an error in the iconv_mime_encode function, so I now check the return value of the function.

Second, I added the mb_internal_encoding() call.

Third, I removed the fifth parameter from the mb_encode_mimeheader() call.

Last, I changed _encodingOfHeaders to _headerEncoding.

Thanks for your points!

Hi, twk, thank you for the proposed logic.

I added it as follows: $formerEncoding = mb_internal_encoding() .

They help me a lot!

Bug has been reopened as mbstring is not allowed to be used for this fix.

I have therefore added two new functions to Zend_Mime which encode MIME headers correctly. I have added unit tests to both the Zend_Mail and Zend_Mime test suites for this functionality and committed the fixes to the SVN repo.

Before I close this bug again, please re-check that the fixes are working for you.

Hi, Benjamin. I find that the lineLength parameter of encodeQuotedPrintableHeader() has no effect.

encodeBase64Header is OK, thank you. :-)

Hello Satoru,

I have added a test case (testLineLengthInQuotedPrintableHeaderEncoding()) to Zend_MimeTest which checks the lineLength parameter.

Could you add your test case that produces the failure?

But: there is one case in which no line break will occur at the given line length: when the complete line consists only of encoded chars (non US-ASCII), because in this case the algorithm doesn't know where to cut.

Hi, Benjamin. I attach code to reproduce it.

The code contains 4 patterns.

  B-1. Zend_Mime::encodeBase64Header

  Q-1. Zend_Mime::encodeQuotedPrintableHeader

  B-2. Base64 by mb_encode_mimeheader

  Q-2. QuotedPrintable by mb_encode_mimeheader

The results of B-1 and B-2 are almost equal, but Q-1 differs from Q-2. The difference is that the result of Q-1 contains no linefeeds.

For example, the result of Q-1:

=?UTF-8?Q?=E3=81=84=E3=82=8D=E3=81=AF=E3=81=AB=E3=81=BB=E3=81=B8=E3=81=A8=E3=81=A1=E3=82=8A=E3=81=AC=E3=82=8B=E3=82=92=E3=82=8F=E3=81=8B=E3=82=88=E3=81=9F=E3=82=8C=E3=81=9D=E3=81=A4=E3=81=AD=E3=81=AA=E3=82=89=E3=82=80?=

The result of Q-2:

=?UTF-8?Q?=E3=81=84=E3=82=8D=E3=81=AF=E3=81=AB=E3=81=BB=E3=81=B8=E3=81=A8?=
 =?UTF-8?Q?=E3=81=A1=E3=82=8A=E3=81=AC=E3=82=8B=E3=82=92=E3=82=8F=E3=81=8B?=
 =?UTF-8?Q?=E3=82=88=E3=81=9F=E3=82=8C=E3=81=9D=E3=81=A4=E3=81=AD=E3=81=AA?=
 =?UTF-8?Q?=E3=82=89=E3=82=80?=

I hope it will help for you.

Hi, Benjamin. I made a fix to the encodeBase64Header() function in SVN r13625, and I think this issue is now solved.

I think it is better to use base64 encoding rather than quoted-printable encoding in the case of my last comment.

If the header value contains only multibyte characters, no new line is inserted; but inserting a new line might destroy a multibyte character that straddles the end of one line and the start of the next.

You are correct, I guess base64 encoding is the way to go for multibyte characters.
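For a header that consists only of multibyte characters, base64 encoded-words can still be cut safely if the cut is made on the raw characters before encoding. The following is a rough sketch and not the committed encodeBase64Header() implementation: encodeHeaderB() is a hypothetical name, the input is assumed to be valid UTF-8, and PHP 7 or later is assumed.

```php
<?php
/**
 * Hypothetical sketch: B-encodes a UTF-8 header value into encoded-words of at
 * most $maxWordLength characters, each holding only whole characters.
 */
function encodeHeaderB(string $value, int $maxWordLength = 75): string
{
    $prefix = '=?UTF-8?B?';
    $suffix = '?=';
    $room   = $maxWordLength - strlen($prefix) - strlen($suffix);

    // Base64 turns every 3 raw bytes into 4 output characters.
    $maxBytes = intdiv($room, 4) * 3;

    // Split into whole characters so that no multi-octet character is cut.
    $chars = preg_split('//u', $value, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }

    $words   = [];
    $current = '';
    foreach ($chars as $char) {
        // Close the current word before it would grow past the byte budget.
        if ($current !== '' && strlen($current) + strlen($char) > $maxBytes) {
            $words[] = $prefix . base64_encode($current) . $suffix;
            $current = '';
        }
        $current .= $char;
    }
    if ($current !== '') {
        $words[] = $prefix . base64_encode($current) . $suffix;
    }

    // Encoded-words are separated by CRLF SPACE.
    return implode("\r\n ", $words);
}

echo encodeHeaderB('いろはにほへとちりぬるをわかよたれそつねならむ'), PHP_EOL;
```

Because the budget is counted in raw bytes and applied per character, every encoded-word decodes to complete characters on its own.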

I have made another commit for this issue, breaking only at spaces and at sentence-ending characters such as .!: and so on. This is to support clients that produce a space when words are broken between lines.

Since I got no complaints about the fix I am resolving the issue; it will be included in the next minor release, 1.8.

I copied to 1.7 branch at SVN r13886.

Sorry, not in 1.7.4. I think it will be released in next minor or major.

This will be in 1.7.5