Issues

ZF-10686: Space or attribute in the <html> tag causes loadHTML to fail on the title

Issue Type: Bug
Created: 2010-11-15T10:14:47.000+0000
Last Updated: 2010-11-19T01:53:55.000+0000
Status: Resolved
Fix version(s): 1.11.1 (30/Nov/10)

Reporter: Sylvestre Ledru (sylvestre)
Assignee: Ramon Henrique Ornelas (ramon)
Tags: Zend_Search_Lucene

Related issues: Attachments:

Description

Document body.

Comments

Posted by Sylvestre Ledru (sylvestre) on 2010-11-15T10:16:02.000+0000

The following patch fixes the issue:

    --- Html.php 2010-11-15 19:13:29.000000000 +0100
    +++ /usr/share/php/Zend/Search/Lucene/Document/Html.php 2010-11-15 19:12:57.000000000 +0100
    @@ -102,7 +102,7 @@
                 // Document encoding is not recognized

                 /** @todo improve HTML vs HTML fragment recognition */
    -            if (preg_match('/<html>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
    +            if (preg_match('/<html(.*)>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
                     // It's an HTML document
                     // Add additional HEAD section and recognize document
                     $htmlTagOffset = $matches[0][1] + strlen($matches[0][0]);

Posted by Sylvestre Ledru (sylvestre) on 2010-11-15T12:34:16.000+0000

Sorry, the actual patch is the following (there was one \ too many):

    --- Html.php 2010-11-15 19:13:29.000000000 +0100
    +++ /usr/share/php/Zend/Search/Lucene/Document/Html.php 2010-11-15 19:12:57.000000000 +0100
    @@ -102,7 +102,7 @@
                 // Document encoding is not recognized

                 /** @todo improve HTML vs HTML fragment recognition */
    -            if (preg_match('/<html>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
    +            if (preg_match('/<html(.*)>/i', $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
                     // It's an HTML document
                     // Add additional HEAD section and recognize document
                     $htmlTagOffset = $matches[0][1] + strlen($matches[0][0]);
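To see what the patch changes in practice, here is a minimal standalone sketch. It assumes the pre-patch pattern matched only the literal `<html>` tag; the helper `htmlTagEndOffset()` and the sample documents are hypothetical and not part of Zend_Search_Lucene. The helper only repeats the `preg_match` call and the offset arithmetic from the patched lines.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): returns the offset
// just past the opening <html ...> tag, or null if the pattern does not match.
function htmlTagEndOffset($pattern, $htmlData)
{
    if (preg_match($pattern, $htmlData, $matches, PREG_OFFSET_CAPTURE)) {
        // Same arithmetic as the patched code: match offset + match length.
        return $matches[0][1] + strlen($matches[0][0]);
    }
    return null;
}

$oldPattern = '/<html>/i';     // assumed pre-patch pattern
$newPattern = '/<html(.*)>/i'; // pattern from the patch

// Illustrative inputs only; the opening tag sits on its own line.
$samples = array(
    'bare'      => "<html>\n<head><title>Test</title></head>\n</html>",
    'space'     => "<html >\n<head><title>Test</title></head>\n</html>",
    'attribute' => "<html lang=\"en\">\n<head><title>Test</title></head>\n</html>",
);

foreach ($samples as $name => $html) {
    printf(
        "%-10s old=%s new=%s\n",
        $name,
        var_export(htmlTagEndOffset($oldPattern, $html), true),
        var_export(htmlTagEndOffset($newPattern, $html), true)
    );
}
```

With the assumed old pattern, only the bare `<html>` sample is recognized as a full document; the patched pattern also matches the tag when it contains a space or an attribute. Note that `(.*)` is greedy and `.` does not cross newlines, so when the whole document sits on one line the match runs to the last `>` on that line; a stricter class such as `[^>]*` would stop at the first `>`, though that is not what the patch above uses.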

Posted by Ramon Henrique Ornelas (ramon) on 2010-11-19T01:53:52.000+0000

Fixed in r23391, merged to release branch 1.11 in r23392. Thanks.
