ZF-7493: ZendAMF serialization slow?

Description

I've compared ZendAMF to AMFPHP, by returning a dataset with 5000+ rows from a table, using the exact same code (both in PHP, and AS3). ZendAMF averages a 16 second return time, whereas AMFPHP returns in 3 seconds. I'm guessing the added time is coming from the process of serializing data to send back to Flash.

Comments

I'd like to confirm this issue. I spent the day trying to chase down why the performance difference was so drastic. With 20000 rows/3 columns: AMFPHP takes approx. 5 seconds, ZendAMF approx. 70 seconds.

I was able to narrow it down to the Serializer class (AMF3 object/array writing), probably related to the Stream class. I was also able to confirm this was not an "output" issue at the Apache/PHP level to the browser.

I'd like to confirm this issue too: we don't send thousands of rows, but we create complex objects with 10 or more arrays, e.g. a class person with var vehicles: Array; var groups: Array; var a1: Array; var a2: Array; and so on. Each array also contains some complex objects, but not many. It takes more than 3 seconds to get one person object from PHP to Flex.

I just got finished working on a project which showed a drastic performance hit when switching from AMFphp to Zend_Amf. Our serialization performance exploded from 2.3 seconds to 25.5 seconds.

I'm going to attach two patches to this issue which brought the serialization time down to roughly the same speed as AMFphp.

Any comments on whether these patches help others would be appreciated.

These two patches were tested on PHP 5.3 and are against Zend AMF 1.9.6. After applying both, the serialization time for our project, which contains many multi-dimensional arrays, decreased from 25.5 seconds to 2.3 seconds, which matches the speed of the older AMFphp serialization.

Amf.noref-writeString.diff:

This patch changes Zend_Amf to always write strings without doing an array_search to see if a string has already been written. This is how AMFphp handled strings and did result in the download size increasing from around 179kb to 287kb for our project, but that's acceptable to us for the increased serialization speed. Perhaps a better approach would be to use some sort of hash for string lookups to decrease the number of array elements array_search must search through.

Amf.data-as-refs.diff:

This patch passes data by reference rather than by value in the serialization routines. This prevents making copies of large data while doing serialization.

I will run this through the unit tests tonight. If everything passes I will submit a patch for the next mini release. Mark, thanks for your patch!!

After running through the AllTests.php unit tests from http://framework.zend.com/svn/framework/…, the attached patch provides the same results as testing against Zend_Amf 1.9.6 with the data-as-refs patch applied.

The noref-writeString patch causes 3 errors, but I believe this is due to not using references to some strings (larger payload, much reduced response time).

Zend Framework requires that we follow PHP strict mode standards. If we apply this patch I get all kinds of "Only variables should be passed by reference" errors, so I guess we need to jump into this and figure out where we don't need to explicitly pass by reference because PHP is already passing it that way.
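For anyone unfamiliar with that message, here is a minimal sketch of what triggers it. The functions `byteLength` and `makePayload` are hypothetical and not part of Zend_Amf; they only stand in for a serializer method that takes its argument by reference. The message appears when the argument is the result of a function call rather than a variable (an E_STRICT warning on PHP 5.3, a notice on later versions).

```php
<?php
// Hypothetical function with a by-reference parameter, standing in for a
// serializer method changed by a pass-by-reference patch.
function byteLength(&$data)
{
    return strlen($data);
}

// Hypothetical producer of some data to serialize.
function makePayload()
{
    return str_repeat('x', 10);
}

// Collect diagnostics in an array instead of printing them.
$messages = array();
set_error_handler(function ($errno, $errstr) use (&$messages) {
    $messages[] = $errstr;
    return true;
});

// A function result is not a variable, so PHP raises the diagnostic here.
byteLength(makePayload());

// Assigning to a variable first avoids it.
$payload = makePayload();
byteLength($payload);

restore_error_handler();
print_r($messages);
```

The usual fix is either the intermediate variable shown above or dropping the `&` from signatures that never modify their argument.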

Wade, did you apply patch #3 [Amf.Response_Body_By_Value.diff] when you ran your tests? I'm not seeing any of the "Only variables should be passed by reference" errors when I run the tests with all three patches applied.

Without patch #3 I get many of the passed by reference errors and a summary of "Tests: 160, Assertions: 289, Failures: 2, Errors: 72."

With all three patches I get a summary of "OK (160 tests, 416 assertions)" which matches what I get when run against 1.9.6.

Wade, if you're still getting the "Only variables should be passed by reference" errors with the previous 3 patches, here's an initial try at passing non-objects by reference as the first parameter and php objects (objects, dates, xml, etc) as a new third parameter of the writeTypeMarker serialization methods.

I haven't gotten to do much testing with this latest patch, but it does pass the unit tests and initial testing of our project.

So you're suggesting applying both the Amf.Response_Body_By_Value.diff and Amf.Combined-NoObjectsByRef.diff patches?

Nope, just Amf.Combined-NoObjectsByRef.diff, it looks like.

Wade, I had some time to do more testing on this over the weekend.

I created a unit test for serializing a large array and after using this test, I believe the only patch that is needed is the one to not reference strings.

My previous patch [Amf.noref-writeString.diff] causes some test failures due to not referencing any strings which some of the tests expect, so here's a new patch based on a recommendation that in_array can be very slow as more elements are added to an array while checking if an array key exists is roughly constant.

Hopefully this much simpler patch will fix this performance issue and it passes the unit tests.

Patches: Amf.perform.ref-writeString.diff : Use the string as the array key and store the reference number as the value for much quicker lookup performance.

Amf.ResponseTest.php.diff : Unit test to make sure the large array serialization time hasn't ballooned by a factor of 10. (Is there a better way of testing the speed other than comparing against a "high enough" number that works on today's hardware?)

largeArrayData.bin : This is simply my test dataset compressed with gzcompress that consists of several large arrays containing almost . It is 624kB in size though, so maybe this isn't acceptable to include for unit testing.

Using the latest zendfw beta and the patch provided on your side, I get the following error using zamfbrowser:



There was an error loading the server's info.  Error: (mx.rpc.events::FaultEvent)#0
  bubbles = false
  cancelable = true
  currentTarget = (mx.rpc.remoting.mxml::RemoteObject)#1
    channelSet = (mx.messaging::ChannelSet)#2
      authenticated = false
      channelIds = (Array)#3
        [0] (null)
      channels = (Array)#4
        [0] (mx.messaging.channels::AMFChannel)#5
          authenticated = false
          channelSets = (Array)#6
            [0] (mx.messaging::ChannelSet)#2
          connected = true
          connectTimeout = -1
          enableSmallMessages = true
          endpoint = "http://............../gateway.php"
          failoverURIs = (Array)#7
          id = (null)
          mpiEnabled = false
          netConnection = (flash.net::NetConnection)#8
            client = (mx.messaging.channels::AMFChannel)#5
            connected = false
            maxPeerConnections = 8
            objectEncoding = 3
            proxyType = "none"
            uri = "http://............../gateway.php"
          piggybackingEnabled = false
          polling = false
          pollingEnabled = true
          pollingInterval = 3000
          protocol = "http"
          reconnecting = false
          recordMessageSizes = false
          recordMessageTimes = false
          requestTimeout = -1
          uri = "http://............../gateway.php"
          url = "http://............../gateway.php"
          useSmallMessages = false
      clustered = false
      connected = true
      currentChannel = (mx.messaging.channels::AMFChannel)#5
      initialDestinationId = (null)
      messageAgents = (Array)#9
        [0] (mx.rpc::AsyncRequest)#10
          authenticated = false
          autoConnect = true
          channelSet = (mx.messaging::ChannelSet)#2
          clientId = (null)
          connected = true
          defaultHeaders = (null)
          destination = "AMF"
          id = "2D1CCD62-E448-16FD-79B8-4465C9CE0DDB"
          reconnectAttempts = 0
          reconnectInterval = 0
          requestTimeout = -1
          subtopic = ""
    concurrency = "multiple"
    destination = "AMF"
    endpoint = "http://............../gateway.php"
    getServices = (mx.rpc.remoting.mxml::Operation)#11
      argumentNames = (Array)#12
      arguments = (Object)#13
      concurrency = "multiple"
      lastResult = (null)
      makeObjectsBindable = true
      name = "getServices"
      service = (mx.rpc.remoting.mxml::RemoteObject)#1
      showBusyCursor = true
    makeObjectsBindable = true
    operations = (Object)#14
      getServices = (mx.rpc.remoting.mxml::Operation)#11
    requestTimeout = -1
    showBusyCursor = true
    source = "ZendAmfServiceBrowser"
  eventPhase = 2
  fault = (mx.rpc::Fault)#15
    content = (Object)#16
    errorID = 0
    faultCode = "Server.Acknowledge.Failed"
    faultDetail = "Was expecting mx.messaging.messages.AcknowledgeMessage, but received null"
    faultString = "Didn't receive an acknowledge message"
    message = "faultCode:Server.Acknowledge.Failed faultString:'Didn't receive an acknowledge message' faultDetail:'Was expecting mx.messaging.messages.AcknowledgeMessage, but received null'"
    name = "Error"
    rootCause = (null)
  headers = (null)
  message = (mx.messaging.messages::ErrorMessage)#17
    body = (Object)#16
    clientId = (null)
    correlationId = "06927F94-0A5F-A7AC-EAA6-4465C9CFB388"
    destination = ""
    extendedData = (null)
    faultCode = "Server.Acknowledge.Failed"
    faultDetail = "Was expecting mx.messaging.messages.AcknowledgeMessage, but received null"
    faultString = "Didn't receive an acknowledge message"
    headers = (Object)#18
    messageId = "A2AAC85B-DA13-57AE-45A9-4465D0C4D16C"
    rootCause = (null)
    timestamp = 0
    timeToLive = 0
  messageId = "A2AAC85B-DA13-57AE-45A9-4465D0C4D16C"
  statusCode = 0
  target = (mx.rpc.remoting.mxml::RemoteObject)#1
  token = (mx.rpc::AsyncToken)#19
    message = (mx.messaging.messages::RemotingMessage)#20
      body = (Array)#21
      clientId = (null)
      destination = "AMF"
      headers = (Object)#22
        DSEndpoint = (null)
        DSId = "nil"
      messageId = "06927F94-0A5F-A7AC-EAA6-4465C9CFB388"
      operation = "getServices"
      source = "ZendAmfServiceBrowser"
      timestamp = 0
      timeToLive = 0
    responders = (null)
    result = (null)
  type = "fault"

print_r-ing the result from ->handle() i get this:

```
Zend_Amf_Response_Http Object
(
    [_objectEncoding:protected] => 3
    [_bodies:protected] => Array
        (
            [0] => Zend_Amf_Value_MessageBody Object
                (
                    [_targetUri:protected] => /2/onResult
                    [_responseUri:protected] =>
                    [_data:protected] => Zend_Amf_Value_Messaging_AcknowledgeMessage Object
                        (
                            [correlationId] => 06927F94-0A5F-A7AC-EAA6-4465C9CFB388
                            [clientId] => 5A02856A-1B19-BEA9-AB02-000012DDD1D1
                            [destination] =>
                            [messageId] => 5CBFBDB2-E06E-A688-7232-0000064516F9
                            [timestamp] => 126386792600
                            [timeToLive] => 0
                            [headers] => stdClass Object
                                (
                                )

                            [body] => <methods>....correctxml....</methods>
                        )

                )

        )

    [_headers:protected] => Array
        (
        )

    [_outputStream:protected] => Zend_Amf_Parse_OutputStream Object
        (
            [_stream:protected] =>
```

Please ignore my previous post. The problems I encountered can be traced back to errors I made myself and to my usage of zamfbrowser.

Although, I found another issue. I'm not sure if this is a bug or intended behaviour: http://framework.zend.com/issues/browse/ZF-8870

thanks for your brilliant work!

This issue still affects ZendFramework 1.10.2.

The only change needed to greatly increase performance is to use an associative array and array_key_exists instead of array_search when writing referenced strings in the AMF3 serializer.

The strict type checking used in array_search is unnecessary since only string data is ever passed to the writeString method and checking if an array key exists is much faster than searching an array as the array becomes increasingly large.
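The difference can be sketched in isolation. The snippet below is not Zend_Amf code: the helper functions `refByArraySearch` and `refByKeyLookup` are hypothetical names that mimic only the string-table part of `writeString`. It checks that both approaches hand out the same reference numbers, while the second avoids a linear scan on every call.

```php
<?php
// Hypothetical stand-ins for the string-table logic in writeString();
// the names and signatures are illustrative, not part of Zend_Amf.

// Original approach: strings stored as values, located with a linear scan.
function refByArraySearch(array &$table, $string)
{
    $ref = array_search($string, $table, true); // scans the whole table
    if ($ref === false) {
        $table[] = $string;
    }
    return $ref;
}

// Patched approach: strings stored as keys, reference number as the value.
function refByKeyLookup(array &$table, $string)
{
    $ref = array_key_exists($string, $table) ? $table[$string] : false; // hash lookup
    if ($ref === false) {
        $table[$string] = count($table);
    }
    return $ref;
}

$input = array('id', 'name', 'id', 'email', 'name', 'id');

$searchTable = array();
$keyTable    = array();
$searchRefs  = array();
$keyRefs     = array();
foreach ($input as $s) {
    $searchRefs[] = refByArraySearch($searchTable, $s);
    $keyRefs[]    = refByKeyLookup($keyTable, $s);
}

// Both variants report false for a new string or the same reference index.
var_dump($searchRefs === $keyRefs);
```

With thousands of distinct strings, the first variant does work proportional to the table size on every call, which would be consistent with the slowdowns reported above.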

Patch is below:

--- Parse/Amf3/Serializer.orig.php 2010-01-18 12:34:23.000000000 -0600
+++ Parse/Amf3/Serializer.php 2010-03-04 13:56:36.000000000 -0600
@@ -212,31 +212,31 @@
     /**
      * Send string to output stream
      *
      * @param  string $string
      * @return Zend_Amf_Parse_Amf3_Serializer
      */
     public function writeString($string)
     {
         $len = strlen($string);
         if(!$len){
             $this->writeInteger(0x01);
             return $this;
         }
 
-        $ref = array_search($string, $this->_referenceStrings, true);
+        $ref = array_key_exists($string, $this->_referenceStrings) ? $this->_referenceStrings[$string] : false;
         if($ref === false){
-            $this->_referenceStrings[] = $string;
+            $this->_referenceStrings[$string] = count($this->_referenceStrings);
             $this->writeBinaryString($string);
         } else {
             $ref <<= 1;
             $this->writeInteger($ref);
         }
 
         return $this;
     }
 
     /**
      * Send ByteArray to output stream
      *
      * @param  string|Zend_Amf_Value_ByteArray $data
      * @return Zend_Amf_Parse_Amf3_Serializer

What about moving the serialization out to Zend_Serializer_Adapter_Amf[0|1]? This could minimize the include files for Zend_Amf and speed up the AMF* serializers of Zend_Serializer.

For now, Zend_Serializer_Adapter_Amf* needs to instantiate 3 classes for serializing:
* Zend_Serializer_Adapter_Amf*
* Zend_Amf_Parse_OutputStream
* Zend_Amf_Parse_Amf*_Serializer


$stream     = new Zend_Amf_Parse_OutputStream();
$serializer = new Zend_Amf_Parse_Amf*_Serializer($stream);
$serializer->writeTypeMarker($value);
return $stream->getStream();

and on unserializing, too:
* Zend_Serializer_Adapter_Amf*
* Zend_Amf_Parse_InputStream
* Zend_Amf_Parse_Amf*_Deserializer


$stream       = new Zend_Amf_Parse_InputStream($value);
$deserializer = new Zend_Amf_Parse_Amf*_Deserializer($stream);
return $deserializer->readTypeMarker();

-> This could be handled similarly to the PythonPickle serializer.

Patches reviewed and applied to trunk and 1.10 release branch.

Matthew, I just downloaded 1.10.4 and it seems that the patch from Mark Reidenbach hasn't been applied. I am looking at Serializer.php line 231, function writeString. The string is not stored as a key but as a value. Can you please check it out? Thanks

I also downloaded 1.10.4 today.

As Philippe mentioned, the patch from Mark Reidenbach mentioned at 04/Mar/10 12:09 PM does not seem to have been applied. I manually made the changes to 1.10.4 using the details provided by Mark and noticed substantial performance improvements.

On a 70,000 row table the query took 13 seconds as opposed to 60+ seconds prior to the patch being applied.

I also tried Mark's patch from 04/Mar/10. It's only slightly faster here though. Getting 400 objects I go from 2300ms -> 2000ms. Is the count part of his patch really necessary? I've modified it, and it appears to work and also appears to be slightly faster.

-        $ref = array_search($string, $this->_referenceStrings, true);
+        $ref = array_key_exists($string, $this->_referenceStrings) ? $this->_referenceStrings[$string] : false;
         if($ref === false){
-            $this->_referenceStrings[] = $string;
+            $this->_referenceStrings[$string] = 1;

Upon further testing it looks like the count part of Mark's patch is indeed needed so please disregard my addition.
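A small sketch of why the count matters, assuming (as in AMF3) that a string reference is the index of the string in the table of previously written strings. The function `buildRefs` is hypothetical and only models the table, not the actual stream writing.

```php
<?php
// Illustrative only: a tiny string table keyed by string, as in the patch.
// $useCount toggles between the patch's count() value and a constant 1.
function buildRefs(array $strings, $useCount)
{
    $table = array();
    $refs  = array();
    foreach ($strings as $s) {
        if (array_key_exists($s, $table)) {
            $refs[] = $table[$s];   // index written for a repeated string
        } else {
            $table[$s] = $useCount ? count($table) : 1;
            $refs[]    = null;      // first occurrence: string is written inline
        }
    }
    return $refs;
}

$strings = array('a', 'b', 'c', 'a', 'c');

print_r(buildRefs($strings, true));   // repeats refer to indexes 0 and 2
print_r(buildRefs($strings, false));  // repeats both refer to index 1, i.e. 'b'
```

With the constant value, every repeated string would be decoded on the client as whatever string happens to sit at index 1, so the payload would be silently wrong whenever strings repeat.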

Patch from March 4, 2010 updated against 1.10.5 as an attachment.

Is the patch applied in 1.10.6?