ZF-4252: Zend_Search_Lucene_Document_Html adds spaces after HTML tags

Description

When converting an HTML document to text, the class Zend_Search_Lucene_Document_Html adds a space after each DOM node, and therefore after each HTML closing tag.

For example, HTML that renders as "ZendFramework" (no space, but with a tag boundary between "Zend" and "Framework") is returned as "Zend Framework". A search for "ZendFramework" (no space) then fails to find this document.
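To make the reported behavior concrete, here is a minimal Python sketch. It is not the PHP code in Zend_Search_Lucene_Document_Html; it only mimics the described rule of emitting a space after every closing tag. The markup `<b>Zend</b>Framework` and the names `NaiveTextExtractor` and `naive_text` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Hypothetical sketch: appends a space after every closing tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        # Every node boundary becomes a word boundary, as in the report.
        self.parts.append(" ")


def naive_text(html):
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())
```

With this rule, `naive_text("<b>Zend</b>Framework")` yields `"Zend Framework"`, so an index built from it has no `ZendFramework` term to match.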

Comments

I have confirmed this issue. Attached is a patch and unit tests.

Processing tags without adding a space merges separate words into one in some cases (e.g., within table markup).

This new patch omits the space only for inline tags.
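The inline-aware rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual patch: the `INLINE_TAGS` set is a hypothetical, partial list of inline elements, and `InlineAwareTextExtractor` and `html_to_text` are hypothetical names. Non-inline (block or table) tags add a space at both their start and end, while inline tags add nothing.

```python
from html.parser import HTMLParser

# Hypothetical, partial list of inline elements; not taken from the patch.
INLINE_TAGS = {
    "a", "abbr", "b", "big", "cite", "code", "em", "font",
    "i", "small", "span", "strong", "sub", "sup", "u",
}


class InlineAwareTextExtractor(HTMLParser):
    """Sketch: separates words only at non-inline tag boundaries."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        # Block/table tags (e.g. <p>, <td>) start a new word.
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        # Inline tags such as </b> keep adjacent text joined.
        if tag not in INLINE_TAGS:
            self.parts.append(" ")


def html_to_text(html):
    parser = InlineAwareTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())
```

Here `html_to_text("<b>Zend</b>Framework")` keeps `"ZendFramework"` intact, while adjacent table cells such as `<td>Zend</td><td>Framework</td>` still come out as two words. That addresses both the original report and the table-markup concern.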

Fixed.

Great patch, Christopher! [ZF-8740] is still on my plate :)