ZF-6088: Mixing 'find' and 'addDocument' with empty fields results in wrong indexes
Description
If you mix "find" and "addDocument" (e.g. if you want to update an existing document), and some of the documents are filled with empty strings, this may result in a wrong index. Here is an example:
// Start over with a fresh index
passthru("rm -rf test.index");
$index = Zend_Search_Lucene::create("text.index");
// The list if entries we want to index
$entries = array(
'asdf',
'abc',
'def',
'ghj',
'klm',
'nop',
'qrs',
'uvw',
'abc',
'', // If this field is not empty, everything is fine!
'hij',
);
foreach ($entries AS $key => $entry) {
// Find and delete existing documents
if ($old = $index->find("pk:$key")){
// Note that this code is never reached in this example
$index->delete($old->id);
}
// Add new document
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnIndexed("pk", $pk));
$doc->addField(Zend_Search_Lucene_Field::Text("contents", $entry));
$index->addDocument($doc);
}
foreach ($index->find("asdf") AS $r) {
echo "{$r->contents} ($r->score}\n";
}
// Expected result: asdf (1)
// Actual result: ghj (1)
Comments
Posted by Roland Tapken (roland) on 2009-03-22T08:22:43.000+0000
This seems to have nothing to do with empty values. I'm having this problem on real-life-data even if I'm filtering empty fields. The only thing that seems to help is to set the following parameters:
$index->setMaxBufferedDocs(PHP_INT_MAX); $index->setMaxMergeDocs(1); $index->setMergeFactor(100);Posted by Roland Tapken (roland) on 2009-03-22T08:26:16.000+0000
Sorry, even that doesn't solved the problem.
Posted by Roland Tapken (roland) on 2009-03-22T08:36:22.000+0000
I found out that this problem does occur if an field is empty or totally omitted. The only solutions seems to be to add dummy data if an field is empty:
// Remove an existing entry if ($index->find('pk:'.$this->id)) { $index->delete($hit->id); } $doc = new Zend_Search_Lucene_Document(); $doc->addField(Zend_Search_Lucene_Field::UnIndexed('pk', $this->id)); $fields = array('first_name', 'last_name', 'company', 'address', 'city'); foreach ($fields AS $field) { if ("" != ($value = trim($this->$field))) { $doc->addField(Zend_Search_Lucene_Field::UnStored($field, $value, sfConfig::get('sf_charset'))); }else { // Field is empty, add dummy data $doc->addField(Zend_Search_Lucene_Field::UnStored($field, "null", sfConfig::get('sf_charset'))); } } $index->addDocument($doc);Posted by Alexander Veremyev (alexander) on 2009-11-23T12:20:16.000+0000
Fixed.
Thenks for really helpful issue description!