ZF-10686: Space or tag in <body> causes loadHTML to fail on the title
- Issue Type:
- Bug
- Created:
- 2010-11-15T10:14:47.000+0000
- Last Updated:
- 2010-11-19T01:53:55.000+0000
- Status:
- Resolved
- Fix version(s):
- Reporter:
-
Sylvestre Ledru (sylvestre)
- Assignee:
-
Ramon Henrique Ornelas (ramon)
- Tags:
- Related issues:
- Attachments:
Description
Document body.
Comments
Posted by Sylvestre Ledru (sylvestre) on 2010-11-15T10:16:02.000+0000
The following patch fixes the issue:
--- Html.php 2010-11-15 19:13:29.000000000 +0100 +++ /usr/share/php/Zend/Search/Lucene/Document/Html.php 2010-11-15 19:12:57.000000000 +0100 @@ -102,7 +102,7 @@ // Document encoding is not recognized
- if (preg_match('//i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) { + if (preg_match('/<html(.*)>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) { // It's an HTML document // Add additional HEAD section and recognize document $htmlTagOffset = $matches[0][1] + strlen($matches[0][0]);
Posted by Sylvestre Ledru (sylvestre) on 2010-11-15T12:34:16.000+0000
Sorry, the actual patch is (a \ was too much)
— Html.php 2010-11-15 19:13:29.000000000 +0100 +++ /usr/share/php/Zend/Search/Lucene/Document/Html.php 2010-11-15 19:12:57.000000000 +0100 @@ -102,7 +102,7 @@ // Document encoding is not recognized
/** @todo improve HTML vs HTML fragment recognition */
if (preg_match('//i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) { + if (preg_match('/<html(.*)>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) { // It's an HTML document // Add additional HEAD section and recognize document $htmlTagOffset = $matches[0][1] + strlen($matches[0][0]);
Posted by Ramon Henrique Ornelas (ramon) on 2010-11-19T01:53:52.000+0000
Fixed in r23391 merged to release branch 1.11 r23392 - thanks.