
<ac:macro ac:name="note"><ac:parameter ac:name="title">Under Construction</ac:parameter><ac:rich-text-body>
<p>Superseded by Zend_Locale_Utf8.</p></ac:rich-text-body></ac:macro>

<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

Zend Framework: Zend_UTF8 Component Proposal

Proposed Component Name: Zend_UTF8
Developer Notes: http://framework.zend.com/wiki/display/ZFDEV/Zend_UTF8
Proposers: André Hoffmann
Revision: 1.0 - 2 August 2006: initial release of the Zend_UTF8 proposal (wiki revision: 23)


1. Overview

Until Zend raises the minimum requirement to PHP 6, we need a solution for handling UTF-8 strings. That way we won't be rushed into upgrading to PHP 6, and ZF can support both PHP 5 and PHP 6 for as long as needed.

Zend_UTF8 is a library of UTF-8 constants (PCRE patterns) and functions, namely:

  • functions that convert strings to ASCII (at least for letters that have an ASCII equivalent) and to HTML entities
  • UTF-8-aware versions of standard string functions: strtolower()/strtoupper(), strlen(), strstr(), ord()/chr(), str_replace() and so on
  • other functions, depending on what other components need (if you are a developer, or even a user, and think you can't live without a specific string function, please let me know)
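As an illustration of the ord()/chr() entries in this list, the following is a minimal pure-PHP sketch that works without mbstring. The names utf8_ord() and utf8_chr() are hypothetical, not a settled Zend_UTF8 API; the sketch assumes the UTF-8 rules of RFC 3629 (one to four bytes, no overlong forms, no surrogates, nothing above U+10FFFF).

```php
<?php
// Hypothetical UTF-8-aware ord()/chr() that need no mbstring extension.

// Return the code point of the first character of $str, or false if the
// leading bytes are not a well-formed UTF-8 sequence.
function utf8_ord($str)
{
    if ($str === '') {
        return false;
    }
    $b0 = ord($str[0]);
    if ($b0 < 0x80) {                        // 0xxxxxxx: ASCII
        return $b0;
    }
    if ($b0 >= 0xC2 && $b0 <= 0xDF) {        // 110xxxxx: 2-byte sequence
        $len = 2; $cp = $b0 & 0x1F;
    } elseif ($b0 >= 0xE0 && $b0 <= 0xEF) {  // 1110xxxx: 3-byte sequence
        $len = 3; $cp = $b0 & 0x0F;
    } elseif ($b0 >= 0xF0 && $b0 <= 0xF4) {  // 11110xxx: 4-byte sequence
        $len = 4; $cp = $b0 & 0x07;
    } else {
        return false;                        // continuation or invalid lead byte
    }
    if (strlen($str) < $len) {
        return false;                        // truncated sequence
    }
    for ($i = 1; $i < $len; $i++) {
        $b = ord($str[$i]);
        if (($b & 0xC0) !== 0x80) {          // trailing bytes must be 10xxxxxx
            return false;
        }
        $cp = ($cp << 6) | ($b & 0x3F);
    }
    // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
    $min = array(2 => 0x80, 3 => 0x800, 4 => 0x10000);
    if ($cp < $min[$len] || ($cp >= 0xD800 && $cp <= 0xDFFF) || $cp > 0x10FFFF) {
        return false;
    }
    return $cp;
}

// Encode a code point as UTF-8, or return false if it is not encodable.
function utf8_chr($cp)
{
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return false;
    }
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}
```

For example, utf8_ord("\xC3\xA9") returns 233 (U+00E9), and utf8_chr(0x20AC) returns the three bytes of the euro sign. Returning false for malformed input, instead of guessing, keeps bad bytes from outside sources from silently turning into wrong code points.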

2. References

3. Component Requirements, Constraints, and Acceptance Criteria

  • should provide basic functions for UTF-8 handling in PHP 5
  • should not require the mbstring extension, but should use it when installed
  • should not require a UTF-8-enabled PCRE extension, but should use one when available
  • should make it possible to support both PHP 5 and PHP 6
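The requirements above suggest a fallback chain: use mbstring if it is loaded, otherwise a UTF-8-enabled PCRE, otherwise plain PHP. A minimal sketch for strlen() follows. utf8_strlen() is a hypothetical name, and the PCRE probe (matching one two-byte character against /^.$/u) is just one assumed way of detecting UTF-8 support.

```php
<?php
// Hypothetical UTF-8 strlen(): mbstring first, then PCRE with /u, then PHP.

function utf8_strlen($str)
{
    // 1. mbstring installed: let it do the work.
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // 2. PCRE built with UTF-8 support: count matches of "any character".
    //    preg_match_all() returns false if $str is not valid UTF-8.
    if (@preg_match('/^.$/u', "\xC3\xA9") === 1) {
        $n = preg_match_all('/./su', $str, $unused);
        if ($n !== false) {
            return $n;
        }
    }
    // 3. Pure-PHP fallback: every byte that is not a continuation byte
    //    (10xxxxxx) starts a new character.
    $count = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}
```

The three branches only agree on well-formed UTF-8; for invalid input each one counts differently, so a real implementation would probably validate first. The feature probes could also be run once and cached instead of on every call.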

4. Dependencies on Other Framework Components

  • Zend_UTF8_Exception
  • mbstring (optional; used when installed)

5. Theory of Operation

Phases:

  1. ZF supports PHP 5 only and uses Zend_UTF8 to provide PHP 6-style UTF-8 support
  2. ZF supports both PHP 5 and PHP 6 and lets Zend_UTF8 decide whether to use the workarounds or the PHP 6 wrapper
  3. ZF supports PHP 6 only: Zend_UTF8 should be removed to gain a little performance

Zend_UTF8 is not meant to replicate PHP 6's UTF-8 support completely, but only to add some frequently used and important functions to PHP 5 as a temporary solution.

6. Milestones / Tasks


7. Class Index

  • Zend_UTF8
  • Zend_UTF8_PHP5
  • Zend_UTF8_PHP5_Library
  • Zend_UTF8_PHP6
  • Zend_UTF8_PHPx_String
  • Zend_UTF8_PHPx_CharacterClass
  • Zend_UTF8_Exception
    (x is either 5 or 6)

8. Use Cases

Usage by other components
Usage by users
Usage of Zend_UTF8_String by users
Usage of Zend_UTF8_PHPx_CharacterClass

9. Class Skeletons

Class diagram: Zend_UTF8_v4.JPG

Please note that I have yet to define which functions will be available; for now, strtolower() is the only example, to illustrate the planned design.
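The following sketch shows one possible reading of the class index above, with strtolower() as the only function. Zend_UTF8 and Zend_UTF8_PHP5 are taken from the index; the Zend_UTF8_Backend interface and the setBackend() method are assumptions made for this example, and the non-mbstring fallback only lowercases ASCII, standing in for a real table-driven mapping.

```php
<?php
// One possible facade/backend split. Zend_UTF8_Backend and setBackend()
// are hypothetical; only strtolower() is shown.

interface Zend_UTF8_Backend
{
    public function strtolower($str);
}

// Phase 1 backend: PHP 5 workarounds.
class Zend_UTF8_PHP5 implements Zend_UTF8_Backend
{
    public function strtolower($str)
    {
        if (function_exists('mb_strtolower')) {
            return mb_strtolower($str, 'UTF-8');
        }
        // Placeholder fallback: lowercase ASCII letters only. The bytes A-Z
        // never occur inside multi-byte UTF-8 sequences, so this cannot
        // corrupt the string; a table-driven mapping would go here.
        return strtr($str, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                           'abcdefghijklmnopqrstuvwxyz');
    }
}

// Static facade that the rest of the framework would call.
class Zend_UTF8
{
    /** @var Zend_UTF8_Backend|null */
    private static $backend = null;

    // Lets tests, or a later PHP 6 wrapper, replace the backend.
    public static function setBackend(Zend_UTF8_Backend $backend)
    {
        self::$backend = $backend;
    }

    public static function strtolower($str)
    {
        if (self::$backend === null) {
            // Phase 1 default. In phase 2, this is where the choice between
            // the PHP 5 workarounds and a PHP 6 wrapper would be made.
            self::$backend = new Zend_UTF8_PHP5();
        }
        return self::$backend->strtolower($str);
    }
}
```

Callers only ever use Zend_UTF8::strtolower(), so moving from phase 1 to phase 2 would mean changing the default-backend selection (for example, to a Zend_UTF8_PHP6 wrapper), and phase 3 would mean replacing those calls with native ones.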

]]></ac:plain-text-body></ac:macro>

]]></ac:plain-text-body></ac:macro>

  1. Aug 03, 2006

    <p>Although this proposal is not complete, nor marked "ready for review", some other component authors are waiting for a decision regarding this proposal.</p>

    <p>How many people truly need mbstring (because iconv doesn't work for them), yet don't have mbstring and can't install PHP with the mbstring extension?</p>

    <p>Even if a UTF8-enabled PCRE had no problems, it is not a replacement for the mbstring extension ( <a class="external-link" href="http://www.php.net/mbstring">http://www.php.net/mbstring</a> ). So we are left with the alternatives of implementing something like the Zend_Utf8 proposal or requiring mbstring. </p>

  2. Aug 04, 2006

    <blockquote>
    <p>> From: Gavin Vess<br />
    > Sent: Thursday, August 03, 2006 5:57 PM<br />
    > Subject: Re: <ac:link><ri:page ri:content-title="fw-general" /></ac:link> Zend_UTF8?<br />
    > <br />
    > Since this issue clearly impacts many others, I am escalating the<br />
    > Zend_Utf8 proposal to "Proposals Under Review" category in the wiki.<br />
    > <br />
    > How many truly need mbstring (iconv doesn't work), but don't <br />
    > have mbstring and can't install PHP with the mbstring extension?<br />
    > <br />
    > Even if a UTF8-enabled PCRE had no problems, it is not a <br />
    > replacement for the mbstring extension ( <br />
    > <a class="external-link" href="http://www.php.net/mbstring">http://www.php.net/mbstring</a> ). So we are left with the <br />
    > alternatives of implementing something like the Zend_Utf8 <br />
    > proposal or requiring mbstring.<br />
    > <br />
    > I vote for Zend_Utf8, provided we continue our ZF minimalist <br />
    > philosophy. I will defer to Alexander V., who is far more <br />
    > familiar with UTF issues than I. He is on vacation and returns soon.<br />
    > <br />
    > Cheers,<br />
    > Gavin</p></blockquote>

  3. Aug 04, 2006

    <blockquote>
    <p>Subject: RE: <ac:link><ri:page ri:content-title="fw-general" /></ac:link> Zend_UTF8?<br />
    Date: Thu, 3 Aug 2006 18:40:49 -0700<br />
    From: Andi Gutmans &lt;andi@zend.com&gt;</p>

    <p>Yep I agree. In general making a fully Unicode framework is out of the scope<br />
    for 1.0. I believe we will not be able to fully tackle that before we have<br />
    PHP 6.</p>

    <p>That said, I definitely see value in Zend_Utf8 for application developer's,<br />
    and I see value for certain parts of the framework (Search, Locale) to<br />
    leverage that functionality. It should be clear though that the end-goal<br />
    will not be to leverage this functionality later. We will need to make<br />
    tactical improvements which allow mbstring users to leverage the framework<br />
    in an acceptable way, and then when PHP 6 comes along we should be able to<br />
    declare victory.</p>

    <p>Just to be clear, using strstr() and other functions in various components<br />
    such as Zend_Http_Client is therefore acceptable and will not be tackled for<br />
    1.0. mbstring overloads some of these functions, so in most cases we'll be<br />
    fine. In general, mbstring users are used to sweating a bit and we will of<br />
    course consider tiny patches they might need us to do down the road.</p></blockquote>

    1. Aug 06, 2006

      <p>I figured I would post my <span style="text-decoration: line-through;">quick</span> thoughts on this. Although I have not submitted any code to ZF yet, it interests me greatly. Unicode and internationalization are also my primary interests in software development.</p>

      <p>I wrote PEAR's I18N_UnicodeString a while back, and I sent André an email offering my help if he has any questions about how I did things. Just to clarify: my code was meant as a proof of concept and was never intended to be released as a stable, recommended implementation. It also dates from before the big Unicode push.</p>

      <p>However, I definitely agree with Andi that it is outside the scope of any framework to expect its developers/users to use Unicode/UTF-8 strings primarily until Unicode support is in C-space (i.e. until PHP 6). My I18N_UnicodeString code has been useful for parsing UTF-8 strings from outside sources such as network protocols, but that is a far cry from providing a robust and complete string library that every developer should use exclusively.</p>

      <p>Also, I would like to point out that, in my view, it is outside the scope of a userspace Unicode/UTF-8 library to convert the case of characters, as case conversion depends on the locale and soon becomes bloated with lookup tables.</p>

      <p>So in the meantime before PHP6 I think ZF could benefit from a UTF8/Unicode <em>utility</em> library that can help with parsing outside strings that are going to come in as UTF8. I would be happy to assist André with this if he would like. When PHP6 starts going into beta perhaps the code can be updated to use PHP6's internals if it can find them and then eventually this class can be deprecated in favor of straight PHP6.</p>

      1. Aug 06, 2006

        <p>Also, I would like to point out that, in my view, it is outside the scope of a userspace Unicode/UTF-8 library to convert the case of characters, as case conversion depends on the locale and soon becomes bloated with lookup tables.<br />
        >Upper- and lowercase letters don't exist in every language, so I wouldn't consider this bloat.<br />
        >But since this is an important feature, we should definitely implement it (one use case: comparing passwords or other strings case-insensitively).</p>

        <p>So in the meantime before PHP6 I think ZF could benefit from a UTF8/Unicode utility library that can help with parsing outside strings that are going to come in as UTF8. I would be happy to assist André with this if he would like.<br />
        >I sure would like that. In my opinion, we need to finish this component quickly, because other components are waiting for it.<br />
        >Together we'll have a better chance. First, you might want to look at the proposal as it stands and comment on or change parts where you think you have a better approach.<br />
        >You should also read the 'thread' on the mailing list, even though it has become far too long.</p>

        <p>When PHP6 starts going into beta perhaps the code can be updated to use PHP6's internals if it can find them and then eventually this class can be deprecated in favor of straight PHP6.<br />
        >If you had read the proposal you'd know that this is exactly what I was going for <ac:emoticon ac:name="wink" /></p>

  4. Aug 13, 2006

    <p>Hi,<br />
    just wanted to let you know what I'm doing now (concerning Zend_UTF8):<br />
    I started writing basic functions like chr() and ord() and found that the design of Zend_UTF8 is not ideal, so I'm about to change it a little.</p>

    <p>Nevertheless, I don't think the design is the critical point of this proposal. The real question is whether to wait for PHP 6 or to implement a component like this, so I hope the redesign won't get in the way of the decision to approve or reject it.</p>

    <p>PS: I did some benchmark testing, which shows that the strtolower function of Zend_UTF8 is about 8-9 times slower than PHP 5's strtolower() (without UTF-8). I currently support the whole Unicode 5.0 standard, so I could strip some data out of the <a href="http://www.unicode.org/Public/UNIDATA/CaseFolding.txt">Unicode Character Database</a> (I already stripped out the comments) to gain some performance, at the cost of UTF-32/16 support (the scope is UTF-8).<br />
    In my opinion, the extra time is acceptable, considering that the function should only be used internally by people who know what they are doing.</p>

    <p>So far,<br />
    André</p>
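The case-conversion question from the Aug 06 comments and the CaseFolding.txt benchmark above both come down to lookup tables. Below is a minimal sketch of simple case folding built from lines in CaseFolding.txt's format (`<code>; <status>; <mapping>; # <name>`): only the C (common) and S (simple) entries are kept, so every character maps to exactly one character, and the table is applied with strtr(). The cf_* names are hypothetical, and $excerpt holds only a few entries in that format rather than the full file.

```php
<?php
// Simple case folding from CaseFolding.txt-formatted lines.
// cf_* names are hypothetical; $excerpt is a tiny subset of the real file.

// Encode one code point (assumed valid) as UTF-8.
function cf_utf8($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Build a UTF-8 => UTF-8 map from lines such as
// "0041; C; 0061; # LATIN CAPITAL LETTER A" (e.g. file('CaseFolding.txt')).
function cf_load_table(array $lines)
{
    $table = array();
    foreach ($lines as $line) {
        $hash = strpos($line, '#');
        if ($hash !== false) {
            $line = substr($line, 0, $hash);   // drop the trailing comment
        }
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $fields = array_map('trim', explode(';', $line));
        // Keep C (common) and S (simple) only: one character to one character.
        if ($fields[1] !== 'C' && $fields[1] !== 'S') {
            continue;
        }
        $table[cf_utf8(hexdec($fields[0]))] = cf_utf8(hexdec($fields[2]));
    }
    return $table;
}

// strtr() works on bytes, which is safe for valid UTF-8: a complete
// character's byte sequence can only match at a character boundary.
function cf_fold($str, array $table)
{
    return strtr($str, $table);
}

// Case-insensitive comparison via folding.
function cf_equals($a, $b, array $table)
{
    return cf_fold($a, $table) === cf_fold($b, $table);
}

$excerpt = array(
    '0041; C; 0061; # LATIN CAPITAL LETTER A',
    '00C4; C; 00E4; # LATIN CAPITAL LETTER A WITH DIAERESIS',
    '00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S',
    '0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE',
    '03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA',
    '03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA',
);
$table = cf_load_table($excerpt);
```

With this excerpt, cf_equals() treats capital sigma (U+03A3) and final sigma (U+03C2) as equal, which covers the case-insensitive comparison use case mentioned earlier. Skipping the F (full) and T (Turkic) entries means ß is not expanded to "ss" and the Turkish dotted capital I gets no special handling; that locale dependence is the limitation raised in the Aug 06 comment.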

  5. Sep 11, 2006

    <p>What do you think: could this module also be responsible for recognizing alphabetic characters and digits (UTF-8 versions of ctype_alpha(), ctype_digit() and similar functions)?</p>

    <p>It's very important for the Zend_Search component, for example.</p>
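One way to answer this without shipping extra tables is PCRE's Unicode property classes. The sketch below assumes PCRE with UTF-8 and Unicode property support, which, as the requirements note, cannot be taken for granted on every installation; the function names are hypothetical.

```php
<?php
// Hypothetical UTF-8 counterparts of ctype_alpha()/ctype_digit() using
// PCRE Unicode properties (requires PCRE with UTF-8 property support).

function utf8_ctype_alpha($str)
{
    // \p{L}: any letter in any script. As with ctype_alpha(), '' is not alpha.
    // \z anchors at the very end; $ would also accept a trailing newline.
    return preg_match('/^\p{L}+\z/u', $str) === 1;
}

function utf8_ctype_digit($str)
{
    // \p{Nd}: decimal digits in any script (e.g. Arabic-Indic digits),
    // unlike ctype_digit(), which only accepts ASCII 0-9.
    return preg_match('/^\p{Nd}+\z/u', $str) === 1;
}
```

Invalid UTF-8 makes preg_match() return false, so both functions also return false for it. Whether non-ASCII digits should count as digits is a design decision; replacing \p{Nd} with [0-9] would restore the ASCII-only behavior.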

  6. Sep 14, 2006

    <ac:macro ac:name="note"><ac:parameter ac:name="title">Zend Feedback</ac:parameter><ac:rich-text-body><p>Due to a very complicated situation involving the mbstring extension, PHP 5, PHP 6, UTF-8 complexity, performance issues and several other factors, we recommend the following:</p>

    <ol>
    <li>The Zend_Locale_Utf8 helper class will provide a minimalistic implementation of functions absolutely needed internally by Zend_Locale related classes.</li>
    <li>Zend_Search may also use Zend_Locale_Utf8, again trying to minimize the number of "UTF8" functions used.</li>
    <li>Zend_Utf8 may move to the Laboratory, and include more functions to experiment with and discover what works, is popular, and most needed by others.</li>
    <li>Later, we can examine other places in the ZF that absolutely need these very few functions in Zend_Locale_Utf8. Then we can consider re-factoring code, or promoting Zend_Locale_Utf8 for use by other Zend components.</li>
    </ol>
    </ac:rich-text-body></ac:macro>