
<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

Zend Framework: Zend_Utf8 Component Proposal

Proposed Component Name Zend_Utf8
Developer Notes
Proposers Andrea Ercolino
Zend Liaison TBD
Revision 1.0 - 11 January 2011: Initial Draft. (wiki revision: 28)

Table of Contents

1. Overview

Zend_Utf8 is a simple component that provides escaping and unescaping of UTF-8 strings. It is intended to replace code that already exists in ZF but is embedded in the Zend_Json and Zend_Serializer components. I've recently published a post about it at my site:

The Zend_Utf8 class is really simple, fully coded, and, I hope, ready for delivery. Note that as of the latest release, 1.11.2, the UTF-8 escaping feature in Zend_Json still doesn't handle all possible UTF-8 characters: it lacks any support for the so-called extended Unicode characters, those with a code point between 0x10000 and 0x10FFFF. This class supports the full Unicode range.
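To make the gap concrete, here is a minimal sketch of how an extended code point can be written as JSON-style `\uXXXX` escapes by splitting it into a UTF-16 surrogate pair. The function names `codePointToSurrogatePair()` and `escapeCodePoint()` are hypothetical and chosen only for illustration; they are not the Zend_Json or Zend_Utf8 API.

```php
<?php
// Hypothetical helpers for illustration only; not part of Zend_Json or Zend_Utf8.

// Splits a code point in 0x10000..0x10FFFF into a UTF-16 surrogate pair.
function codePointToSurrogatePair($codePoint)
{
    if ($codePoint < 0x10000 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('Not an extended code point');
    }
    $offset = $codePoint - 0x10000;        // 20 significant bits remain
    $high   = 0xD800 | ($offset >> 10);    // top 10 bits
    $low    = 0xDC00 | ($offset & 0x3FF);  // bottom 10 bits
    return array($high, $low);
}

// Formats one code point as one or two \uXXXX escapes.
function escapeCodePoint($codePoint)
{
    if ($codePoint < 0x10000) {
        return sprintf('\u%04x', $codePoint);
    }
    list($high, $low) = codePointToSurrogatePair($codePoint);
    return sprintf('\u%04x\u%04x', $high, $low);
}

echo escapeCodePoint(0xE9), "\n";     // U+00E9, a single escape
echo escapeCodePoint(0x1D11E), "\n";  // U+1D11E, needs a surrogate pair
```

An escaper that only ever emits a single four-digit escape has no way to represent the second case, which is the limitation described above.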

Encoding PHP values to another string format, such as JSON, can require escaping UTF-8 characters; likewise, decoding can require unescaping them. I think this sufficiently justifies a class for basic UTF-8 support in the Zend Framework. Once this class is available, the Zend_Json and Zend_Serializer components should be refactored to call Zend_Utf8 methods where needed.

2. References

3. Component Requirements, Constraints, and Acceptance Criteria

4. Dependencies on Other Framework Components

5. Theory of Operation

Zend_Utf8 exposes six static functions: two main functions for escaping and unescaping strings, and four ancillary functions for mapping UTF-8 characters to Unicode integers and back. The main functions themselves show how the ancillary functions are used, so I'll describe only the usage of the main functions.
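As an illustration of the kind of mapping the ancillary functions perform, the following sketch converts a single UTF-8 character to its Unicode integer and back. The names `utf8CharToCodePoint()` and `codePointToUtf8()` are assumptions made for this example, not the proposal's actual method names, and the decoder is deliberately minimal: it does not reject overlong or otherwise malformed sequences.

```php
<?php
// Hypothetical names; the proposal's real ancillary functions may differ.

// Decodes one UTF-8 encoded character (1 to 4 bytes) into its code point.
// Minimal sketch: malformed or overlong sequences are not detected.
function utf8CharToCodePoint($char)
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1 && $bytes[0] < 0x80) {
        return $bytes[0];                       // plain ASCII
    }
    // Payload masks for the lead byte of 2-, 3-, and 4-byte sequences.
    $masks = array(2 => 0x1F, 3 => 0x0F, 4 => 0x07);
    if (!isset($masks[$count])) {
        throw new InvalidArgumentException('Expected a single UTF-8 character');
    }
    $codePoint = $bytes[0] & $masks[$count];
    for ($i = 1; $i < $count; $i++) {
        // Each continuation byte contributes its low 6 bits.
        $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codePoint;
}

// Encodes a code point (0..0x10FFFF) as a UTF-8 byte string.
function codePointToUtf8($codePoint)
{
    if ($codePoint < 0 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('Code point out of Unicode range');
    }
    if ($codePoint < 0x80) {
        return chr($codePoint);
    }
    if ($codePoint < 0x800) {
        return chr(0xC0 | ($codePoint >> 6))
             . chr(0x80 | ($codePoint & 0x3F));
    }
    if ($codePoint < 0x10000) {
        return chr(0xE0 | ($codePoint >> 12))
             . chr(0x80 | (($codePoint >> 6) & 0x3F))
             . chr(0x80 | ($codePoint & 0x3F));
    }
    return chr(0xF0 | ($codePoint >> 18))
         . chr(0x80 | (($codePoint >> 12) & 0x3F))
         . chr(0x80 | (($codePoint >> 6) & 0x3F))
         . chr(0x80 | ($codePoint & 0x3F));
}

printf("%X\n", utf8CharToCodePoint("\xF0\x9D\x84\x9E")); // four-byte character
```

The two functions are inverses of each other for well-formed input, so a round trip through both returns the original bytes.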

In the use cases I'm going to use the following functions:


Options change the default behavior and must be passed as an associative array.

key | type | description | example
extendedUseSurrogate | boolean | Controls how an extended Unicode character is represented: TRUE = a surrogate pair, FALSE = a code point. In the former case there are two calls to the write/read handlers (one for each member of the pair), and in the latter just one. |
write | handler | |
read | handler | |
filters | array | |
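The effect of extendedUseSurrogate on the number of handler calls can be sketched as follows. Only the option keys `extendedUseSurrogate` and `write` come from the table; the function `emitCodePoint()`, the handler signature (one integer in, one string out), and the default values are assumptions made for this example and may not match the actual class.

```php
<?php
// Hypothetical sketch. Only the option keys are taken from the proposal;
// emitCodePoint(), the handler signature, and the defaults are assumptions.
function emitCodePoint($codePoint, array $options = array())
{
    $defaults = array(
        'extendedUseSurrogate' => true,  // assumed default
        'write' => function ($unit) {
            return sprintf('\u%04x', $unit);
        },
    );
    $options = array_merge($defaults, $options);  // caller's keys win
    $write   = $options['write'];

    if ($codePoint >= 0x10000 && $options['extendedUseSurrogate']) {
        $offset = $codePoint - 0x10000;
        // Two handler calls: one per member of the surrogate pair.
        return $write(0xD800 | ($offset >> 10))
             . $write(0xDC00 | ($offset & 0x3FF));
    }
    // One handler call with the whole code point.
    return $write($codePoint);
}

// A write handler that counts its calls and uses an eight-digit escape.
$calls = 0;
$longEscape = function ($unit) use (&$calls) {
    $calls++;
    return sprintf('\U%08x', $unit);
};

echo emitCodePoint(0x1D11E), "\n";
echo emitCodePoint(0x1D11E, array(
    'extendedUseSurrogate' => false,
    'write'                => $longEscape,
)), "\n";
```

With the surrogate option on, the handler only ever sees 16-bit values, which suits formats limited to four-digit escapes; with it off, the handler receives the full code point and can choose a wider notation.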

6. Milestones / Tasks

  • Milestone 1: [DONE] Working prototype
  • Milestone 2: Unit tests exist, work, and are checked into SVN.
  • Milestone 3: Initial documentation exists.

7. Class Index

  • Zend_Utf8_Exception
  • Zend_Utf8

8. Use Cases

9. Class Skeletons

