ZF-11401: Use of $s3 getObjectsByBucket() when using a delimiter parameter does not provide access to CommonPrefix resultset.

Description

When you want to retrieve Objects within a bucket you can emulate a normal filesystem by using the params "prefix" and "delimiter".

If I were to use:


 $s3 = new Zend_Service_Amazon_S3($aws_key, $aws_secret_key);
 $s3->getObjectsByBucket("my-bucket");

Then I would get all of the keys in this bucket. Such as "folder/", "folder/childFolder/anotherFolder", "someFile.jpg", "anotherImage.jpg".

However, if I only wanted the immediate children in order to emulate a "Folders" and "Files" hierarchy like you would see in a normal filesystem, then you would use a delimiter param of "/". This then aggregates the keys which have "/" in their key name and adds them to the "CommonPrefix".

However, the getObjectsByBucket() mthod ignores the "CommonPrefix" result in the XML response object, and only checks if the Contents is set. As a consequence, it does not provide access to the CommonPrefix anywhere.

This is a XML response. There is a CommonPrefixes key in this XML object, but it is ignored, making the use of $params for the most part useless.


object(SimpleXMLElement)#265 (8) {
  ["Name"]=>
  string(7) "samplebucket"
  ["Prefix"]=>
  object(SimpleXMLElement)#267 (0) {
  }
  ["Marker"]=>
  object(SimpleXMLElement)#268 (0) {
  }
  ["MaxKeys"]=>
  string(3) "100"
  ["Delimiter"]=>
  string(1) "/"
  ["IsTruncated"]=>
  string(5) "false"
  ["Contents"]=>
  array(2) {
    [0]=>
    object(SimpleXMLElement)#269 (6) {
      ["Key"]=>
      string(8) "myobject"
      ["LastModified"]=>
      string(24) "2011-05-23T14:05:42.000Z"
      ["ETag"]=>
      string(34) ""aefaf7502d52994c3b01957636a3cdd2""
      ["Size"]=>
      string(1) "8"
      ["Owner"]=>
      object(SimpleXMLElement)#275 (2) {
        ["ID"]=>
        string(64) "5021341ae50ddc73d3c34c6ae80d86d4f03b20cc71bbc99a6cddd720c2"
        ["DisplayName"]=>
        string(9) "username"
      }
      ["StorageClass"]=>
      string(8) "STANDARD"
    }
    [1]=>
    object(SimpleXMLElement)#270 (6) {
      ["Key"]=>
      string(15) "rowPosition.jpg"
      ["LastModified"]=>
      string(24) "2011-05-20T13:29:49.000Z"
      ["ETag"]=>
      string(34) ""49e20234ad87244150e6092a47998b8a7d9""
      ["Size"]=>
      string(6) "294142"
      ["Owner"]=>
      object(SimpleXMLElement)#275 (2) {
        ["ID"]=>
        string(64) "50ca3cfbc61b7ae5021345ddc73d3c34c6ae80d86d4f03b20cc71bbc99a6cddd720c2"
        ["DisplayName"]=>
        string(9) "chrisacky"
      }
      ["StorageClass"]=>
      string(8) "STANDARD"
    }
  }
  ["CommonPrefixes"]=>
  array(4) {
    [0]=>
    object(SimpleXMLElement)#271 (1) {
      ["Prefix"]=>
      string(13) "anhoterhfols/"
    }
    [1]=>
    object(SimpleXMLElement)#272 (1) {
      ["Prefix"]=>
      string(6) "asdaf/"
    }
    [2]=>
    object(SimpleXMLElement)#273 (1) {
      ["Prefix"]=>
      string(7) "folder/"
    }
    [3]=>
    object(SimpleXMLElement)#274 (1) {
      ["Prefix"]=>
      string(9) "original/"
    }
  }
}

Comments

Hi Chris, my proposal is to add a boolean optional parameter to the getObjectsByBucket() method of Zend_Service_Amazon_S3 to specify if include or not the CommonPrefix values in the result list. This is the change in the source code of Zend_Service_Amazon_S3:


public function getObjectsByBucket($bucket, $params = array(), $commonPrefix=false)
{
  $response = $this->_makeRequest('GET', $bucket, $params);
  if ($response->getStatus() != 200) {
    return false;
  }
  $xml = new SimpleXMLElement($response->getBody());
  $objects = array();
  if (isset($xml->Contents)) {
    foreach ($xml->Contents as $contents) {
      foreach ($contents->Key as $object) {
        $objects[] = (string)$object;
      }
    }
  }
  if ($commonPrefix && isset($xml->CommonPrefixes)) {
    foreach ($xml->CommonPrefixes as $prefix) {
      foreach ($prefix->Prefix as $object) {
        $objects[] = (string)$object;
      }
    }
  }
  return $objects;
}

The benefits of using a prefix is to be able to partition the main Objects listings from your prefix. Mixing them in the $objects response means that you can't easily retrieve the Objects and the grouped Common Prefixes.

Also, lets say that you were to return a two $objects arrays... It might be better to check if $params has a prefix key. I don't know if Amazon AWS returns a XML key if no prefix matches. If this is the case then what you have proposed works okay, but if no prefixes match and they decide not to return a CommonPrefix XML key, then it might throw an unexpected result back as a response.

Would it be bad for backwards compatibility to check if $commonPrefix boolean is set, and if it is, then return an array back like


array(
'objects' => $objects,
'prefixes' => $prefixes
);

Only people utilising the new boolean will be affected so it shouldn't affect BC and it will mean that the prefixes are clearly partitioned from the main objects resultset. (Which is in my opinion important).

Following your considerations I think is better to create a new method to retrieve Objects and CommonPrefixes values (something like getObjectsAndPrefixesByBucket()). Your idea to get the result as associative array of 'objects' and 'prefixes' is very good. In this way we can easily distinguish between objects and prefixes.

Added the getObjectsAndPrefixesByBucket() method in S3.php (trunk, commit 24095). Try it and let me know, thanks.

This commit seems to not have made it into the 'release-1.11' branch on SVN. Any good reason? I really need to use this additional method, or I'll have to ditch Zend_Service_Amazon_S3 altogether.. :(

The new method appears to be present now (v1.12.1), so can this issue therefore be marked as resolved and closed?

@Jon You are right, we can close this issue.

Thanks for your feedback!

Can the 'Fix Version' be added above? I'm not sure exactly when this was actually introduced..

@Jon Done. :)