ZF-4252: Zend_Search_Lucene_Document_Html adds spaces for HTML tags
Description
When converting an HTML document to text, the class Zend_Search_Lucene_Document_Html adds a space after each DOM node, and therefore after each HTML closing tag.
For example, HTML that renders as "ZendFramework" (without a space) will be returned as "Zend Framework". As a result, the search query "ZendFramework" (no space) won't find this document.
Comments
Posted by Christopher Thomas (cwt137) on 2010-01-21T06:09:28.000+0000
I have confirmed this issue. Attached is a patch and unit tests.
Posted by Alexander Veremyev (alexander) on 2010-01-21T08:36:51.000+0000
Processing tags without the additional space merges several words into one in some cases (e.g. within table markup).
Posted by Christopher Thomas (cwt137) on 2010-03-16T21:13:58.000+0000
This new patch removes the space only when the tag is an inline tag.
Posted by Alexander Veremyev (alexander) on 2010-04-18T12:10:23.000+0000
Fixed.
Great patch, Christopher! [ZF-8740] is still on my plate :)
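
For readers following the thread, the inline-versus-block distinction discussed above can be illustrated with a minimal sketch. This is not the attached patch and not Zend Framework code: it is written in Python with the standard html.parser module, and the INLINE_TAGS set, the TextExtractor class, and the html_to_text function are hypothetical names chosen for this example. The sample inputs are also assumptions, since the exact markup from the original report is not shown here.

```python
from html.parser import HTMLParser

# Assumption: an illustrative subset of inline tags, not the list used by the real patch.
INLINE_TAGS = {"a", "abbr", "b", "code", "em", "i", "small",
               "span", "strong", "sub", "sup", "u"}


class TextExtractor(HTMLParser):
    """Collects text, inserting a separator only around non-inline tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Block-level openers (p, td, br, ...) separate words.
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        # Inline closers (b, span, ...) must not split a word.
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace and trim the ends.
        return " ".join("".join(self.parts).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    # Inline tag inside a word: no space is introduced.
    print(html_to_text("<p><b>Zend</b>Framework</p>"))
    # Table cells: words stay separated.
    print(html_to_text("<table><tr><td>Zend</td><td>Framework</td></tr></table>"))
```

With this approach, an inline tag inside a word leaves the word intact for indexing, while table cells and other block-level elements still produce separate tokens, which addresses both concerns raised in the comments.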