Issues

ZF-8103: Massive speedup in Zend_Mime::encodeQuotedPrintable()

Description

A massive speedup is possible in Zend_Mime::encodeQuotedPrintable(). If PHP's imap extension is enabled, the function imap_8bit() should be used instead of doing the encoding in PHP code. Here's a little script showing the speedup.


$client = new Zend_Http_Client("http://de.wikipedia.org/w/index.php?title=PHP&printable=yes");

$string = $client->request()->getBody();

echo "Starting encoding with imap_8bit() (5000 times)\n";
$startTime = time();
for ($i = 0; $i < 5000; $i++){
   imap_8bit($string);
}
$durationImap_8bit = time()-$startTime;
echo "Duration: " . $durationImap_8bit . " seconds\n";
echo "Starting encoding with Zend_Mime::encodeQuotedPrintable() (5000 times)\n";
$startTime = time();
for ($i = 0; $i < 5000; $i++){
   Zend_Mime::encodeQuotedPrintable($string, Zend_Mime::LINELENGTH, Zend_Mime::LINEEND);
}
$durationZendMime = time()-$startTime;
echo "Duration: " . $durationZendMime . " seconds\n\n";

echo "imap_8bit() is " . ($durationZendMime-$durationImap_8bit) . " seconds faster than Zend_Mime!\n";
echo "It only needs " . ceil((100/$durationZendMime)*$durationImap_8bit) . "% of time";

OUTPUT:

Starting encoding with imap_8bit() (5000 times)
Duration: 7 seconds
Starting encoding with Zend_Mime::encodeQuotedPrintable() (5000 times)
Duration: 273 seconds

imap_8bit() is 266 seconds faster than Zend_Mime!
It only needs 3% of time
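Since time() only has one-second resolution, shorter runs are easier to compare with microtime(true). The following is a minimal sketch of such a timing helper, not the script used for the numbers above; benchmarkEncoder() is a hypothetical name, and the sample string is an assumption standing in for the fetched page.

```php
<?php
// Hypothetical helper (not part of Zend Framework): calls $fn($input)
// $iterations times and returns the elapsed wall-clock time in seconds.
function benchmarkEncoder($fn, $input, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($fn, $input);
    }
    return microtime(true) - $start;
}

// Assumed sample data; the original script used a downloaded HTML page.
$sample = str_repeat("Gr\xC3\xBC\xC3\x9Fe = 100% text\r\n", 200);

// Only time the encoders that are actually present in this environment.
if (function_exists('imap_8bit')) {
    printf("imap_8bit(): %.4f seconds\n",
        benchmarkEncoder('imap_8bit', $sample, 1000));
}
if (class_exists('Zend_Mime')) {
    printf("Zend_Mime::encodeQuotedPrintable(): %.4f seconds\n",
        benchmarkEncoder(array('Zend_Mime', 'encodeQuotedPrintable'), $sample, 1000));
}
```

Both guards keep the sketch runnable on systems without the IMAP extension or without Zend Framework on the include path; in that case it simply prints nothing.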

Possible changes for Zend_Mime::encode():


public static function encode($str, $encoding, $EOL = self::LINEEND)
    {
        switch ($encoding) {
            case self::ENCODING_BASE64:
                return self::encodeBase64($str, self::LINELENGTH, $EOL);

            case self::ENCODING_QUOTEDPRINTABLE:
                if (function_exists('imap_8bit')){
                    return imap_8bit($str);
                }
                return self::encodeQuotedPrintable($str, self::LINELENGTH, $EOL);

            default:
                /**
                 * @todo 7Bit and 8Bit is currently handled the same way.
                 */
                return $str;
        }
    }

Greetings, Jonas

Comments

Your proposition is very interesting, Jonas.

But I worry that using the IMAP extension would go against the rule about extensions. The rule is written in the PHP Coding Standard (draft).

cf:

http://framework.zend.com/wiki/display/…

The page says, "Zend Framework components may use an extension only if that extension is enabled by default in the PHP.net binary distribution supported by the current release of Zend Framework."

Therefore, unfortunately, I think we cannot use the extension at present. :-(

If the extension is not available, it is not used. There is no problem I think.

I have the same "problem" with the bad performance...

What about Zend_Db_Adapter_Sqlsrv? This class even uses a 3rd party extension (sqlsrv.dll from Microsoft Corp.) which is currently only available on Win32.

The Coding Standard should not be applied in this case because:
1. it's a draft
2. there are other classes that don't follow it
3. the Zend_Mime class will still work without the extension enabled

An extension that is not in the default build may be used as a soft dependency, AFAIK. BTW, adapters are a bad example because they invert the dependency (you use an adapter because you have database X).

As long as imap/c-client doesn't have problems, such as output that differs from the built-in methods or platform-specific issues, it could be used if the benefit is big enough. The given example doesn't really test a real-life issue if the page content is not fetched inside of the loop, so the benefit hasn't been measured yet.

Hi, all ;)

I would appreciate it if you could give me several more weeks to set up the unit tests.

I think the tests should be able to check the code both when the imap extension is available and when it is not.
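One way such a check could look is a round-trip test that runs against imap_8bit() when it is present and reports a skip otherwise. This is only a sketch under assumptions: qpRoundTrips() is a hypothetical helper, and it relies on quoted_printable_decode(), which has long been part of PHP core, as the reference decoder.

```php
<?php
// Hypothetical helper (not part of Zend Framework): returns true when
// decoding the output of $encoder yields the original input again.
function qpRoundTrips($encoder, $input)
{
    $encoded = call_user_func($encoder, $input);
    return quoted_printable_decode($encoded) === $input;
}

// Assumed test data: non-ASCII bytes, a literal "=", a hard line break
// and a line long enough to force soft line breaks.
$input = "Gr\xC3\xBC\xC3\x9Fe = 100%\r\n" . str_repeat('long line ', 20);

if (function_exists('imap_8bit')) {
    echo qpRoundTrips('imap_8bit', $input)
        ? "imap_8bit: ok\n"
        : "imap_8bit: MISMATCH\n";
} else {
    echo "imap_8bit: skipped (IMAP extension not loaded)\n";
}
```

The same helper accepts any callable, so the pure-PHP encoder could be checked with array('Zend_Mime', 'encodeQuotedPrintable') regardless of which extensions are loaded.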

Hi Satoru,

can you tell me something about the progress on this issue? I would be very happy if I didn't have to change the code every time I update the framework ;)

Greetings

PHP 5.3 adds a new function, quoted_printable_encode(), as the counterpart to the existing quoted_printable_decode().

php.net:

  • encode: Returns a quoted printable string created according to RFC2045, section 6.7. This function is similar to imap_8bit(), except this one does not require the IMAP module to work.

  • decode: This function returns an 8-bit binary string corresponding to the decoded quoted printable string (according to RFC2045, section 6.7, not RFC2821, section 4.5.2, so additional periods are not stripped from the beginning of line). This function is similar to imap_qprint(), except this one does not require the IMAP module to work.
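A short usage example of the two functions, as a minimal sketch; the sample string is an assumption and is written with byte escapes so that it is UTF-8 regardless of the file encoding.

```php
<?php
// "Grüße = 100%" as UTF-8 bytes.
$original = "Gr\xC3\xBC\xC3\x9Fe = 100%";

// Non-ASCII bytes and the literal "=" are escaped as =XX.
$encoded = quoted_printable_encode($original);
echo $encoded, "\n"; // prints "Gr=C3=BC=C3=9Fe =3D 100%"

// Decoding restores the original byte string.
$decoded = quoted_printable_decode($encoded);
var_dump($decoded === $original); // prints "bool(true)"
```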

Sorry, I have been inactive since last April.

Any news on this? Or should I ask: Will it ever be fixed?

Hello?!? Could anyone please be so kind and put the changes in the trunk? Greets

The original patch could use a couple of improvements before going into trunk. First, the optimization should be applied in Zend_Mime::_encodeQuotedPrintable() because this makes it available for calls to Zend_Mime::encodeQuotedPrintable() as well as ::encode().

Also, as Marc Bennewitz noted, PHP 5.3 now includes quoted_printable_encode(). I have benchmarked this and it is even faster than using the IMAP extension. So, IMO, the ideal solution here is to test for quoted_printable_encode(), then test for imap_8bit(), then fall back to the existing behavior. Speed tests from the benchmark (with 5000 iterations):

imap_8bit() is 145.0137 seconds faster than Zend_Mime!
It is 42.36 times faster

quoted_printable_encode() is 145.3856 seconds faster than Zend_Mime!
It is 47.39 times faster

Attaching two files:

  • Revised benchmark script which includes testing of quoted_printable_encode(), and

  • Revised patch which:

      • Moves the fix to Zend_Mime::_encodeQuotedPrintable()

      • Modifies the behavior to try quoted_printable_encode(), then fall back to imap_8bit(), then fall back to existing internal encoding behavior.

      • Adds a note to the documentation about how enabling the IMAP extension on PHP < 5.3 will result in a speedup.

      • Adds a test to tests/Zend/MimeTest.php to ensure that calls optimized to these faster PHP functions as well as the internal ability of Zend_Mime to handle quoted-printable encoding are tested no matter what platform or extensions are active when the tests are run.

If someone could test this patch by running tests/Zend/MimeTest on PHP5.2 that would be great as I've only tested this on PHP5.3.

Targeting for next mini release since the patch is backwards-compatible and fixes a performance bottleneck.

I've noticed that this patch needs a modification. Instead of directly returning the result of imap_8bit() or quoted_printable_encode(), you have to replace the line endings "\r\n" with $EOL, which defaults to "\n". Without this modification the line endings in the mail are "\r\r\n". When this mail is sent through an SMTP server that tries to repair broken line endings (e.g. PowerMTA), the line endings become "\r\n\r\n". This leads to misformatted (HTML) mails. So the function should look like this:

private static function _encodeQuotedPrintable($str, $EOL = self::LINEEND)
{
    // PHP or extension functions are much faster, use if available.
    // Both emit "\r\n" line endings, so convert them to $EOL.
    if (function_exists('quoted_printable_encode')) {
        return str_replace("\r\n", $EOL, quoted_printable_encode($str));
    } elseif (function_exists('imap_8bit')) {
        return str_replace("\r\n", $EOL, imap_8bit($str));
    } else {
        return self::_encodeQuotedPrintableInternal($str);
    }
}

Greets, Jonas

@Jonas: Good catch. Hadn't noticed that the PHP methods do the line breaks for you. This was at odds with how Zend_Mime allows specifying custom lineLength and lineBreak, so I refactored the patch a little more to support it.

Also added a few additional tests that should have been in there to confirm lineLength and lineBreak behavior.

Revised patch is attached.
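For readers without access to the attachment, here is one possible way to combine the native encoder with a custom line length and line break. It is a sketch under stated assumptions and not necessarily what the attached patch does: qpEncodeWrapped() is a hypothetical name, and the approach (strip the native "=\r\n" soft breaks, then re-wrap without splitting an =XX escape) is an illustration only.

```php
<?php
// Hypothetical function (not part of Zend Framework): quoted-printable
// encoding via the native encoder, re-wrapped to a custom line length
// and line ending.
function qpEncodeWrapped($str, $lineLength = 76, $lineEnd = "\n")
{
    // An "=XX" escape plus the trailing "=" needs at least 4 characters.
    $lineLength = max(4, (int) $lineLength);

    // The native encoder inserts "=\r\n" soft breaks; remove them.
    // A literal "=" is always encoded as "=3D", so this is unambiguous.
    $encoded = str_replace("=\r\n", '', quoted_printable_encode($str));

    $out = array();
    // The remaining "\r\n" sequences are hard line breaks from the input.
    foreach (explode("\r\n", $encoded) as $line) {
        while (strlen($line) > $lineLength) {
            $cut = $lineLength - 1; // leave room for the trailing "="
            // Move the cut back if it would split an "=XX" escape.
            if ($line[$cut - 1] === '=') {
                $cut -= 1;
            } elseif ($line[$cut - 2] === '=') {
                $cut -= 2;
            }
            $out[] = substr($line, 0, $cut) . '=';
            $line = substr($line, $cut);
        }
        $out[] = $line;
    }
    return implode($lineEnd, $out);
}

echo qpEncodeWrapped(str_repeat("\xC3\xA4b=", 10), 20, "\n"), "\n";
```

Re-wrapping in PHP costs some of the speed gained from the native function, so a real implementation might only take this path when the requested line length differs from the native default.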