<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

Zend Framework: Zend_Uri Component Proposal

Proposed Component Name Zend_Uri
Developer Notes http://framework.zend.com/wiki/display/ZFDEV/Zend_Uri
Proposers Shahar Evron
Zend Liaison TBD
Revision 1.0 - 2 August 2010: Ready for community review
0.1 - 26 July 2010: Initial Draft (wiki revision: 38)

1. Overview

Zend_Uri is the Zend Framework component responsible for representing URIs (Uniform Resource Identifiers) as objects.

In Zend Framework 1.x, Zend_Uri was mostly used to represent and validate URIs of specific schemes, and the only scheme implemented was HTTP. In addition, Zend_Uri 1.0 could not represent partial or relative URIs or URIs of arbitrary schemes, and it provided no tools for resolving, encoding or normalizing URIs.

This proposal describes a set of changes and improvements (effectively a complete rewrite) to Zend_Uri for Zend Framework 2.0 that will address the deficiencies mentioned above.

2. References

  • Preview code is available on the zend-uri-se branch at git://arr.gr/zf2.git

3. Component Requirements, Constraints, and Acceptance Criteria

  • Zend\Uri will allow representation of generic URIs as objects
  • Zend\Uri will allow programmatic composition of URIs using getter/setter methods to various URI parts
  • Zend\Uri will allow creation of URI objects through string parsing
  • Zend\Uri will allow representation of partial and relative URIs
  • Zend\Uri will closely follow RFC-3986 definitions for URI syntax
  • Zend\Uri will always produce RFC-3986 compliant URIs when converting URI objects back to a string
  • Zend\Uri will attempt to be flexible when parsing string URIs and accept invalid URIs or URI parts if these can be encoded into a syntactically valid URI
  • Zend\Uri will provide subclasses for representation of scheme-specific URIs
  • Zend\Uri will allow users to easily create their own scheme-specific classes
  • Zend\Uri scheme subclasses may enforce additional validation rules on URIs
  • Zend\Uri scheme subclasses may provide additional protocol or scheme specific APIs
  • Zend\Uri will allow automatic string-to-object conversion using the factory pattern
  • Zend\Uri will provide API for resolving relative URIs
  • Zend\Uri will provide API for normalizing URI strings
  • Zend\Uri will provide API for converting absolute URIs into relative URIs based on a shared absolute base URI
  • Zend\Uri will provide generic methods for validating and encoding different URI parts
  • Zend\Uri will not provide an interface for strict validation or encoding of URI strings.
    • These operations should be provided by Zend\Validate and Zend\Filter classes that may internally rely on Zend\Uri encoding and validation methods.

4. Dependencies on Other Framework Components

  • Zend\Validate\Hostname
  • Zend\Validate\Ip
  • Zend\Exception

5. Theory of Operation

Subclassing

The Zend\Uri component will provide a concrete class (Zend\Uri\Uri) implementing RFC-3986 compatible Generic URI Syntax parsing, composition, resolution, validation, encoding and normalization of URIs. This class can be used to represent any compliant URI, including scheme-specific URIs and partial or relative URIs.

In addition, Zend\Uri will provide a set of subclasses of Zend\Uri\Uri (initially Zend\Uri\Http and Zend\Uri\File) that will only be capable of representing URIs of specific schemes, and will enforce validation rules beyond those defined by the Generic Syntax RFC. These subclasses may still be able to represent partial or relative URIs, as long as they comply with any rules imposed by the scheme.

Parsing and Composition

URI parsing and composition will follow the parsing and composition rules defined in the RFC. The aim is to be relatively lax when parsing string URIs and when setting URI parts through accessor methods, and to accept input as long as it can eventually be encoded into a valid URI when the object is converted back to a string.

For example, the URI file:///C:/Program Files/Zend will be accepted by the parser despite the fact that its path component (C:/Program Files/Zend) contains an invalid space character. When the URI is composed back into a string and normalized, it will be represented as file:///C:/Program%20Files/Zend, which is a valid and RFC-compliant URI.
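The encoding step in this example can be sketched in plain PHP as follows. This is only an illustration under stated assumptions: encodePath() is a hypothetical helper, not part of the proposed API, and the sketch assumes PHP 7 or later. It percent-encodes every character that RFC 3986 does not allow literally in a path, and leaves allowed characters and existing escape sequences untouched.

```php
<?php
/**
 * Hypothetical helper (not part of the proposal): percent-encode every
 * character that may not appear literally in a URI path.
 */
function encodePath(string $path): string
{
    // Allowed literally: unreserved characters, sub-delims, ':', '@' and '/'.
    // A '%' is kept only when it starts a valid two-digit escape sequence.
    $pattern = '/(?:[^A-Za-z0-9\-._~!$&\'()*+,;=:@\/%]|%(?![A-Fa-f0-9]{2}))+/';

    return preg_replace_callback(
        $pattern,
        function (array $match): string {
            // rawurlencode() produces RFC 3986 style escapes (%20, not '+').
            return rawurlencode($match[0]);
        },
        $path
    );
}

echo 'file://' . encodePath('/C:/Program Files/Zend'), "\n";
```

Because valid escape sequences are preserved, running the helper on an already encoded path should leave it unchanged, which is the property that makes lax parsing safe to combine with strict composition.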

Zend\Uri will refuse to parse a string or accept a part set through one of the mutator methods only if the input can never be unambiguously converted into a valid URI part.

For example, the following will not be allowed:

This is because the scheme of a URI may never contain spaces, and the URI syntax rules do not define a means of representing a space character in the scheme part.

In contrast, the following will be allowed:

This is because, although the ' ' and '#' characters may not be used literally in the query part of a URI, they can be encoded as '%20' and '%23' respectively when the URI is re-composed.
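The distinction between input that must be rejected and input that can be repaired may be illustrated with the following minimal sketch. The function names isValidScheme() and encodeQuery() are assumptions made for this illustration and are not part of the proposed API; the sample inputs are likewise illustrative. PHP 7 or later is assumed.

```php
<?php
// Hypothetical checks illustrating the rule described above.

// RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// There is no escape mechanism for the scheme, so invalid input is rejected.
function isValidScheme(string $scheme): bool
{
    return preg_match('/^[A-Za-z][A-Za-z0-9+\-.]*$/D', $scheme) === 1;
}

// The query part does have an escape mechanism, so characters that are not
// allowed literally are percent-encoded instead of being rejected.
function encodeQuery(string $query): string
{
    $pattern = '/(?:[^A-Za-z0-9\-._~!$&\'()*+,;=:@\/?%]|%(?![A-Fa-f0-9]{2}))+/';

    return preg_replace_callback($pattern, function (array $m): string {
        return rawurlencode($m[0]);
    }, $query);
}

var_dump(isValidScheme('ht tp'));          // bool(false)
echo encodeQuery('q=foo bar#baz'), "\n";   // q=foo%20bar%23baz
```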

Relative URI Resolution

One of the common tasks to perform with URIs is resolving relative URIs, that is, merging a base URI with a relative URI to form a canonical representation of the relative URI. Unlike Zend_Uri 1.0, the new implementation will expose an API for resolving a (possibly relative) URI against an absolute base URI to form an absolute URI.

Additionally, Zend\Uri will expose an API to perform the opposite operation: "subtract" a common base URI from an absolute URI to form a relative reference.

Both methods can be useful, for example, when composing or parsing HTML pages and when creating links in portable applications.
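For reference, the resolution direction can be sketched by following the algorithm in RFC 3986, section 5.2. The code below is a self-contained illustration, not the proposed Zend\Uri API: the function names splitUri(), removeDotSegments() and resolveReference() are assumptions, and PHP 7 or later is assumed. The opposite operation (creating a relative reference) is not covered here.

```php
<?php
// Hypothetical sketch of RFC 3986 section 5.2 (reference resolution).

/** Split a URI reference using the regular expression from RFC 3986, Appendix B. */
function splitUri(string $uri): array
{
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s', $uri, $m);
    $m += array_fill(0, 10, '');   // preg_match omits trailing unmatched groups
    return [
        'scheme'    => $m[1] !== '' ? $m[2] : null,   // null means "undefined"
        'authority' => $m[3] !== '' ? $m[4] : null,
        'path'      => $m[5],
        'query'     => $m[6] !== '' ? $m[7] : null,
        'fragment'  => $m[8] !== '' ? $m[9] : null,
    ];
}

/** Remove "." and ".." segments (RFC 3986, section 5.2.4). */
function removeDotSegments(string $path): string
{
    $output = '';
    while ($path !== '') {
        if (strpos($path, '../') === 0) {
            $path = substr($path, 3);
        } elseif (strpos($path, './') === 0 || strpos($path, '/./') === 0) {
            $path = substr($path, 2);
        } elseif ($path === '/.') {
            $path = '/';
        } elseif (strpos($path, '/../') === 0 || $path === '/..') {
            $path = ($path === '/..') ? '/' : substr($path, 3);
            $pos = strrpos($output, '/');                 // drop the last output segment
            $output = ($pos === false) ? '' : substr($output, 0, $pos);
        } elseif ($path === '.' || $path === '..') {
            $path = '';
        } else {
            $next = strpos($path, '/', 1);                // move one segment to the output
            $output .= ($next === false) ? $path : substr($path, 0, $next);
            $path = ($next === false) ? '' : substr($path, $next);
        }
    }
    return $output;
}

/** Resolve $ref against the absolute URI $base (RFC 3986, sections 5.2.2 and 5.3). */
function resolveReference(string $base, string $ref): string
{
    $b = splitUri($base);
    $t = splitUri($ref);
    if ($t['scheme'] === null) {
        $t['scheme'] = $b['scheme'];
        if ($t['authority'] === null) {
            $t['authority'] = $b['authority'];
            if ($t['path'] === '') {
                $t['path'] = $b['path'];
                $t['query'] = $t['query'] ?? $b['query'];
            } elseif ($t['path'][0] !== '/') {
                // Merge: replace everything after the last "/" of the base path.
                $slash = strrpos($b['path'], '/');
                $prefix = ($slash === false) ? '' : substr($b['path'], 0, $slash + 1);
                if ($b['authority'] !== null && $b['path'] === '') {
                    $prefix = '/';
                }
                $t['path'] = $prefix . $t['path'];
            }
        }
    }
    $t['path'] = removeDotSegments($t['path']);

    return ($t['scheme'] !== null ? $t['scheme'] . ':' : '')
        . ($t['authority'] !== null ? '//' . $t['authority'] : '')
        . $t['path']
        . ($t['query'] !== null ? '?' . $t['query'] : '')
        . ($t['fragment'] !== null ? '#' . $t['fragment'] : '');
}

echo resolveReference('http://a/b/c/d;p?q', '../g'), "\n";   // http://a/b/g
```

The base URI and expected result in the usage line are taken from the reference resolution examples in RFC 3986, section 5.4.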

Normalization

Zend\Uri\Uri and its subclasses will expose an API to normalize URI objects. This normalization method should be used, for example, before comparing two URI strings to check whether they are equivalent.

For example, the following URLs, while syntactically different, are semantically equivalent:

http://www.example.com:80/?foo=b%61r
HTTP://www.example.com?foo=bar

The normalization API will allow the user to compare these two URIs by normalizing them using the normalization rules defined in the RFC (and possibly scheme-specific normalization added in Zend\Uri\Uri subclasses). In the example above, both URIs would normalize to: http://www.example.com/?foo=bar

Normalizing URIs will include:

  • Converting the scheme to lower case
  • Removing the port if it is equal to the scheme's default port
  • Decoding any percent-encoded characters which do not need to be encoded
  • Replacing an empty path with '/' in URIs that have an authority part
  • Converting percent-encoding hexadecimal characters to upper case
  • Removing empty port, query or fragment parts
  • Additional scheme specific normalization (e.g. in HTTP URLs lower-casing the host name)
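
The steps listed above can be sketched as a single function. This is a rough illustration only: normalizeUri() and its default-port table are assumptions, not the proposed implementation, and for brevity the sketch lower-cases the host for every scheme. PHP 7 or later is assumed.

```php
<?php
// Hypothetical sketch of the normalization steps listed above.
function normalizeUri(string $uri): string
{
    $defaultPorts = ['http' => 80, 'https' => 443];   // assumed table, not exhaustive

    // Split into components (adapted from the regex in RFC 3986, Appendix B).
    preg_match('~^(?:([^:/?#]+):)?(//[^/?#]*)?([^?#]*)(\?[^#]*)?(#.*)?~s', $uri, $m);
    $m += array_fill(0, 6, '');   // preg_match omits trailing unmatched groups

    // Upper-case escape sequences and decode those encoding unreserved characters.
    $fixEscapes = function (string $part): string {
        return preg_replace_callback('/%([A-Fa-f0-9]{2})/', function (array $e): string {
            $char = chr(hexdec($e[1]));
            return preg_match('/^[A-Za-z0-9\-._~]$/D', $char) ? $char : '%' . strtoupper($e[1]);
        }, $part);
    };

    $scheme = strtolower($m[1]);
    $authority = $m[2];
    if ($authority !== '') {
        // Separate optional user info, host and optional port.
        preg_match('~^//(?:(.*)@)?(.*?)(?::(\d*))?$~sD', $authority, $a);
        $a += array_fill(0, 4, '');
        $port = $a[3];
        if ($port !== '' && (int) $port === ($defaultPorts[$scheme] ?? null)) {
            $port = '';   // drop the scheme's default port
        }
        $authority = '//' . ($a[1] !== '' ? $a[1] . '@' : '')
            . $fixEscapes(strtolower($a[2]))
            . ($port !== '' ? ':' . $port : '');   // an empty port is dropped too
    }

    $path = $fixEscapes($m[3]);
    if ($path === '' && $authority !== '') {
        $path = '/';   // empty path with an authority becomes "/"
    }
    $query = ($m[4] === '?') ? '' : $fixEscapes($m[4]);      // drop an empty query
    $fragment = ($m[5] === '#') ? '' : $fixEscapes($m[5]);   // drop an empty fragment

    return ($scheme !== '' ? $scheme . ':' : '') . $authority . $path . $query . $fragment;
}

var_dump(normalizeUri('HTTP://www.example.com?foo=bar') === normalizeUri('http://www.example.com:80/?foo=b%61r'));
```

Comparing the normalized strings, as in the last line, is the intended use: two URIs are treated as equivalent when their normalized forms are identical.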

Automatic Scheme-specific Class Selection

Zend\Uri will provide a factory-pattern class (Zend\Uri\UriFactory) to which users can pass URI strings. Depending on the scheme of the URI string, the factory method will return an instance of a scheme-specific class representing the URI, if such a class is registered with UriFactory. By default, the scheme-specific classes provided by Zend\Uri will be registered with the factory class. Users who wish to implement their own URI classes will be able to register additional scheme-specific subclasses of Zend\Uri\Uri (or override any pre-registered schemes) with the factory class.

If the URI string is relative and does not specify a scheme, users may specify a default scheme to fall back to (e.g. when parsing an HTML page fetched using HTTP, relative href links are assumed to be of the HTTP scheme).

If the factory method does not find an appropriate scheme class, or is unable to detect the URI scheme and no default scheme was specified, it will fall back to the generic syntax Zend\Uri\Uri class.
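The selection logic described in this section might look roughly like the sketch below. All class and method names here (GenericUri, HttpUri, SketchUriFactory, register(), factory()) are stand-ins invented for the illustration and should not be read as the proposed API. PHP 7.1 or later is assumed.

```php
<?php
// Hypothetical sketch of scheme-based class selection with a generic fallback.
class GenericUri
{
    public $uri;

    public function __construct(string $uri)
    {
        $this->uri = $uri;
    }
}

class HttpUri extends GenericUri
{
}

class SketchUriFactory
{
    /** @var string[] map of lower-case scheme => class name */
    private static $schemeClasses = ['http' => HttpUri::class, 'https' => HttpUri::class];

    /** Register (or override) the class used for a scheme. */
    public static function register(string $scheme, string $class): void
    {
        if (!is_a($class, GenericUri::class, true)) {
            throw new InvalidArgumentException($class . ' must extend ' . GenericUri::class);
        }
        self::$schemeClasses[strtolower($scheme)] = $class;
    }

    /** Create a URI object, falling back to $defaultScheme and then to the generic class. */
    public static function factory(string $uri, ?string $defaultScheme = null): GenericUri
    {
        // The scheme is whatever precedes the first ":", if it matches the RFC 3986 grammar.
        $scheme = preg_match('/^([A-Za-z][A-Za-z0-9+\-.]*):/', $uri, $m) ? $m[1] : $defaultScheme;
        $class = self::$schemeClasses[strtolower((string) $scheme)] ?? GenericUri::class;

        return new $class($uri);
    }
}

echo get_class(SketchUriFactory::factory('HTTP://example.com/')), "\n";    // HttpUri
echo get_class(SketchUriFactory::factory('/relative/path')), "\n";         // GenericUri
echo get_class(SketchUriFactory::factory('/relative/path', 'http')), "\n"; // HttpUri
```

Keeping the registry keyed by lower-case scheme means lookups are case-insensitive, which matches the case-insensitivity of URI schemes.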

Scheme-specific Functionality

In addition to the above, and to the enforcement of specific validation rules, Zend\Uri\Uri subclasses may also expose additional scheme-specific functionality.

For example, the File scheme class may expose methods to convert a Win32 or Unix file path into a file:/// URI.
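A conversion of this kind could be sketched as follows. The function names are assumptions made for the illustration, only absolute paths are handled, and UNC paths are deliberately left out. PHP 7 or later is assumed.

```php
<?php
// Hypothetical helpers illustrating file path to file: URI conversion.

/** Percent-encode each path segment while keeping the "/" separators. */
function encodeSegments(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

function unixPathToFileUri(string $path): string
{
    if ($path === '' || $path[0] !== '/') {
        throw new InvalidArgumentException('Expected an absolute Unix path');
    }
    return 'file://' . encodeSegments($path);
}

function win32PathToFileUri(string $path): string
{
    $path = str_replace('\\', '/', $path);
    // Only drive-letter paths such as "C:/dir/file" are handled in this sketch.
    if (!preg_match('~^([A-Za-z]):(/.*)?$~sD', $path, $m)) {
        throw new InvalidArgumentException('Expected an absolute path with a drive letter');
    }
    // The colon after the drive letter is kept literal; the rest is encoded.
    return 'file:///' . $m[1] . ':' . encodeSegments($m[2] ?? '');
}

echo win32PathToFileUri('C:\\Program Files\\Zend'), "\n";   // file:///C:/Program%20Files/Zend
echo unixPathToFileUri('/home/user/My Docs/a.txt'), "\n";   // file:///home/user/My%20Docs/a.txt
```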

6. Milestones / Tasks

The implementation and general availability of this change should be coordinated with the release schedule of Zend Framework 2.0, with the following milestones:

  • Milestone 1: [PARTIALLY DONE] design notes will be published here
  • Milestone 2: [PARTIALLY DONE] A working prototype with 80%+ unit test coverage is checked into the developer's git repository
  • Milestone 3: Working prototypes of scheme specific classes are checked into the developer's git repository
  • Milestone 4: Complete unit test coverage and documentation are checked into the developer's git repository
  • Milestone 5: Fixes to components relying on Zend_Uri (namely Zend\Http\Client) are checked into the developer's git repository
  • Milestone 6: Zend\Filter and Zend\Validate classes for URI normalization and validation are made available
  • Milestone 7: Developer branch is merged into the public git repository in time for preview releases of Zend Framework 2.0

7. Class Index

  • Zend\Uri\Uri
  • Zend\Uri\Http
  • Zend\Uri\File
  • Zend\Uri\UriFactory

Exception classes:

  • Zend\Uri\Exception
  • Zend\Uri\InvalidUriException
  • Zend\Uri\InvalidUriPartException
  • Zend\Uri\InvalidUriClassException

Filter / Validator implementations (optional):

  • Zend\Filter\Uri - Zend Filter for normalizing URI strings
  • Zend\Validator\Uri - Zend Validator for validating URIs

8. Use Cases

UC-01: URI parsing
UC-02: Accessing different URI parts
UC-03: Resolving relative URIs
UC-04: Creating configuration-based URIs depending on environment
UC-05: Normalizing URIs
UC-06: Creating a relative reference from an absolute URI
UC-07: Extracting HTTP URLs from an HTML page
UC-08: Using the Factory class with custom URI classes
UC-09: Example of using Zend\Uri to get files from an FTP server
UC-10: Accessing and Encoding various HTTP URI parts (part of an HTTP client implementation)

9. Class Skeletons

The Zend\Uri\Uri class
The Zend\Uri\UriFactory class

]]></ac:plain-text-body></ac:macro>
