Issues

ZF-10686: Space or tag in <body> causes loadHTML to fail on the title

Description

Document body.

Comments

The following patch fixes the issue:

--- Html.php 2010-11-15 19:13:29.000000000 +0100
+++ /usr/share/php/Zend/Search/Lucene/Document/Html.php 2010-11-15 19:12:57.000000000 +0100
@@ -102,7 +102,7 @@
             // Document encoding is not recognized
 
             /** @todo improve HTML vs HTML fragment recognition */
-            if (preg_match('/<html>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
+            if (preg_match('/<html(.*)>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
                 // It's an HTML document
                 // Add additional HEAD section and recognize document
                 $htmlTagOffset = $matches[0][1] + strlen($matches[0][0]);

Sorry, the actual patch is the following (the previous one had one \ too many):

--- Html.php 2010-11-15 19:13:29.000000000 +0100
+++ /usr/share/php/Zend/Search/Lucene/Document/Html.php 2010-11-15 19:12:57.000000000 +0100
@@ -102,7 +102,7 @@
             // Document encoding is not recognized
 
             /** @todo improve HTML vs HTML fragment recognition */
-            if (preg_match('/<html>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
+            if (preg_match('/<html(.*)>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
                 // It's an HTML document
                 // Add additional HEAD section and recognize document
                 $htmlTagOffset = $matches[0][1] + strlen($matches[0][0]);

Fixed in r23391 and merged to release branch 1.11 in r23392. Thanks.
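
For readers who want to see what the pattern change in the patch does, here is a minimal sketch. The sample markup and the variable names other than $htmlData are made up for illustration, and the third pattern, /<html[^>]*>/i, is an assumption added for comparison; the ticket does not say which pattern the committed fix uses. The offset is computed the same way as in the patched code.

```php
<?php
// Hypothetical single-line input whose <html> tag carries an attribute.
$htmlData = '<html lang="en"><head><title>Page title</title></head>'
          . '<body> Document body.</body></html>';

// A pattern that accepts only a bare <html> tag finds nothing here.
$before = preg_match('/<html>/i', $htmlData, $m1, PREG_OFFSET_CAPTURE);

// Pattern from the patch: "<html", then anything, then ">".
// (.*) is greedy, so the match runs to the LAST ">" on the same line.
$after = preg_match('/<html(.*)>/i', $htmlData, $m2, PREG_OFFSET_CAPTURE);

// Stricter variant (assumption, not from the ticket): stop at the first ">".
$strict = preg_match('/<html[^>]*>/i', $htmlData, $m3, PREG_OFFSET_CAPTURE);

// Offset just past the matched text, as computed in the patched code.
$offsetAfter  = $m2[0][1] + strlen($m2[0][0]);
$offsetStrict = $m3[0][1] + strlen($m3[0][0]);

// $before is 0; $after and $strict are 1.
// $offsetStrict is 16, the position right after '<html lang="en">'.
// $offsetAfter equals strlen($htmlData), because the greedy match
// swallowed the whole single-line document.
```

On input where the opening tag sits on its own line, the patched pattern and the stricter variant give the same offset, since . does not match a newline without the s modifier. The difference only shows up when more markup follows the tag on the same line.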