The Zend_Search_Lucene_Document_Html class uses the DOMDocument::loadHTML() and DOMDocument::loadHTMLFile() methods to parse the source HTML, so the HTML does not need to be well formed or to be XHTML. It is, however, sensitive to the encoding specified by the "meta http-equiv" header tag. The class recognizes the document title, the body, and the meta tags in the document header. The 'title' field holds the /html/head/title value. It is stored within the index, tokenized, and available for search. The 'body' field holds the body content of the HTML file or string. It does not include scripts, comments, or attributes.
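To make the parsing behaviour concrete, here is a minimal sketch that uses PHP's DOM extension directly. It is not the Zend_Search_Lucene_Document_Html implementation; the sample markup, the XPath expressions, and the variable names are assumptions chosen for illustration. It shows that DOMDocument::loadHTML() accepts markup that is not well formed, that the title can be read from /html/head/title, and that body text can be collected without scripts, comments, or attributes.

```php
<?php
// Illustrative sketch only: plain DOMDocument/DOMXPath, not Zend_Search_Lucene.

// Deliberately malformed HTML: <p> and <b> are never closed, and neither
// </body> nor </html> is present.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '<meta name="keywords" content="lucene, search">'
      . '<title>Sample page</title></head>'
      . '<body><p class="intro">Hello <b>world'
      . '<script>var hidden = 1;</script><!-- a comment -->';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);              // tolerant HTML parser, no XHTML required
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The title is the string value of /html/head/title.
$title = $xpath->evaluate('string(/html/head/title)');

// Body text: text nodes under <body>, skipping anything inside <script>.
// Comments and attributes are not text nodes, so they are left out as well.
$parts = array();
foreach ($xpath->query('/html/body//text()[not(ancestor::script)]') as $node) {
    $parts[] = $node->nodeValue;
}
$body = trim(implode('', $parts));

// Header meta tags that carry a name attribute, as name => content pairs.
$meta = array();
foreach ($xpath->query('/html/head/meta[@name]') as $tag) {
    $meta[$tag->getAttribute('name')] = $tag->getAttribute('content');
}

echo $title, "\n", $body, "\n";
print_r($meta);
```

The charset declared in the "meta http-equiv" tag is what the parser uses to decode the input, which is why a wrong or missing declaration can garble non-ASCII text in the extracted fields.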