Issues

ZF-2098: Problem with Content-Length value using raw POST data with mbstring extension

Description

When using Zend_Http_Client to POST raw data, the strlen() function is called to determine the value for the Content-length header. However, when using the mbstring extension and overriding the string functions (mbstring.func_overload = 4), the strlen() call (which is overridden using mb_strlen()) must be made with the second argument of "ASCII". With string functions overridden by mbstring, the byte count of the raw POST data is less than its actual size as multibyte characters will be counted a single character.

The patch below corrects the problem by performing an ini_get() and checking the value to see if the bit for string function overriding is set. If it is, the Content-length will be set accordingly by calling $this->setHeaders('content-length', strlen($this->raw_post_data, 'ASCII'));.

This patch was generated against trunk as of 10/24/2007.


--- library/Zend/Http/Client.php    (revision 6682)
+++ library/Zend/Http/Client.php    (working copy)
@@ -913,7 +913,15 @@

         // If we have raw_post_data set, just use it as the body.
         if (isset($this->raw_post_data)) {
-            $this->setHeaders('Content-length', strlen($this->raw_post_data));
+            // If user is not overriding string functions with mbstring, the
+            // call to strlen() with the data is sufficient.  Otherwise,
+            // ensure that we have the ASCII byte count for the POST data.
+            if (!ini_get('mbstring.func_overload') & 4) {
+                $this->setHeaders('content-length', strlen($this->raw_post_data));
+            } else {
+                $this->setHeaders('content-length', strlen($this->raw_post_data, 'ASCII'));
+             }
+
             return $this->raw_post_data;
         }

Comments

Please categorize/fix as needed.

First: the bitmask is 2 for str functions not 4.

Second, I cannot reproduce this issue, either with a setting in php.ini oder with ini_set. my testcase looks as following:

This is added into Zend_Http_Client_TestAdapterTest.

    /**
     * Tests that overriding strlen with mb_strlen via php.ini option is recognized.
     *
     * @group ZF-2098
     */
    public function testStrlenFunctionOverrideByMbstringIsRecognized()
    {
        $client = new Zend_Http_Client('http://example.com', array('adapter' => $this->adapter));
        $string = "asdföäü€";

        $previousSetting = ini_get("mbstring.func_overload");

        ini_set("mbstring.func_overload", 0);
        $client->setRawData($string);
        $client->request("POST");

        echo $client->getLastRequest();

        ini_set("mbstring.func_overload", 2);
        $client->setRawData($string);
        $client->request("POST");

        echo $client->getLastRequest();
    }

any additional feedback or a fixed testsuite?

A thread was started on the forums about this bug about a year ago:

http://nabble.com/Utility-functions:-Calculating-b…

I'm looking to try and reproduce it again. The particular use case I was running into was when POSTing the raw contents of a binary file to a virus scanning HTTP service.

In that case, if the file was, say, 500 bytes, the Content-Length header was being set to less than 500 bytes because certain multi-byte characters were being counted as 1 byte instead of 2.

I'm taking a look into this now to try to reproduce.

I was able to reproduce this using an older version of Zend Framework and 1.6.2 on a RHEL4 machine.

My file is 19,620 bytes on disk. $string is the value of file_get_contents() for the file on disk. In this case, I've got mbstring.func_overload set to 6.

print "ASCII strlen = " . strlen($string) . "\n"; print "mb strlen = " . mb_strlen($string) . "\n"; print "mb strlen ASCII = " . mb_strlen($string, 'ASCII') . "\n";

...yields:

ASCII strlen = 13389 mb strlen = 13389 mb strlen ASCII = 19620

...which means that Content-Length is being set to 13,389 bytes in the HTTP request:

POST / HTTP/1.1 Host: localhost Connection: close Accept-encoding: gzip, deflate Content-Type: application/x-www-form-urlencoded User-Agent: Zend_Http_Client Content-Length: 13389

This is using PHP 5.2.2. Note that I was NOT able to reproduce this on my Mac running Leopard and PHP 5.2.6, which is surprising.

Hope this helps! Sorry, though, it's been quite some time since I've looked into this issue.

I was able to reproduce and create some tests for this, and it seem to depend on the current locale. I will try to share some tests and perhaps some fix code later this week.

Surprisingly, I actually fixed this in rev. 17041.

I set fix version. I find this at SVN r17118 in 1.9 branch.