Issues

ZF-6088: Mixing 'find' and 'addDocument' with empty fields results in wrong indexes

Issue Type: Bug
Created: 2009-03-22T08:01:23.000+0000
Last Updated: 2009-11-23T12:20:16.000+0000
Status: Resolved
Fix version(s): 1.9.6 (24/Nov/09)

Reporter: Roland Tapken (roland)
Assignee: Alexander Veremyev (alexander)
Tags: Zend_Search_Lucene

Description

If you mix "find" and "addDocument" (e.g. to update an existing document) and some of the documents contain fields that are empty strings, the resulting index may be wrong: a search can return a different document than the one that actually contains the term. Here is an example:

<pre class="literal">
require_once 'Zend/Search/Lucene.php';

// Start over with a fresh index
passthru("rm -rf test.index");
$index = Zend_Search_Lucene::create("test.index");

// The list of entries we want to index
$entries = array(
  'asdf',
  'abc',
  'def',
  'ghj',
  'klm',
  'nop',
  'qrs',
  'uvw',
  'abc',
  '', // If this field is not empty, everything is fine!
  'hij',
);

foreach ($entries as $key => $entry) {
  // Find and delete existing documents; find() returns an array of hits
  foreach ($index->find("pk:$key") as $old) {
    // Note that this code is never reached in this example
    $index->delete($old->id);
  }

  // Add new document
  $doc = new Zend_Search_Lucene_Document();
  $doc->addField(Zend_Search_Lucene_Field::UnIndexed("pk", $key));
  $doc->addField(Zend_Search_Lucene_Field::Text("contents", $entry));
  $index->addDocument($doc);
}

foreach ($index->find("asdf") as $r) {
  echo "{$r->contents} ({$r->score})\n";
}

// Expected result: asdf (1)
// Actual result: ghj (1)
</pre>
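
The update pattern the report describes (look up the old document by its key, delete it, then add the replacement) can be isolated in a small helper. The following is only a sketch: `replaceByKey` and the `InMemoryIndex` stand-in are hypothetical names and are not part of Zend_Search_Lucene. The stand-in mimics just the `find()`, `delete()` and `addDocument()` calls used above so that the snippet runs without the framework.

```php
<?php
// Hypothetical stand-in for an index. It mimics only the find(), delete()
// and addDocument() calls used in the report; it is NOT Zend_Search_Lucene.
class InMemoryIndex
{
    private $docs = array();   // internal id => array of field values
    private $nextId = 0;

    // Accepts only "field:value" queries and returns hit objects exposing ->id.
    public function find($query)
    {
        list($field, $value) = explode(':', $query, 2);
        $hits = array();
        foreach ($this->docs as $id => $doc) {
            if (isset($doc[$field]) && (string) $doc[$field] === $value) {
                $hit = new stdClass();
                $hit->id = $id;
                $hits[] = $hit;
            }
        }
        return $hits;
    }

    public function delete($id)
    {
        unset($this->docs[$id]);
    }

    public function addDocument(array $doc)
    {
        $this->docs[$this->nextId++] = $doc;
    }

    public function count()
    {
        return count($this->docs);
    }
}

// Delete every document whose key field matches, then add the replacement.
// find() returns a list of hits, so each hit is deleted individually.
// Returns the number of documents that were deleted.
function replaceByKey($index, $keyField, $key, array $doc)
{
    $deleted = 0;
    foreach ($index->find($keyField . ':' . $key) as $hit) {
        $index->delete($hit->id);
        $deleted++;
    }
    $index->addDocument($doc);
    return $deleted;
}
```

With a real Zend_Search_Lucene index the key field would have to be indexed (for example with `Zend_Search_Lucene_Field::Keyword`) for the lookup to match anything. `UnIndexed` fields are stored but not searchable, which is why the delete branch in the example above is never reached.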

Comments

Posted by Roland Tapken (roland) on 2009-03-22T08:22:43.000+0000

This seems to have nothing to do with empty values. I'm having this problem with real-life data even when I filter out empty fields. The only thing that seems to help is setting the following parameters:

<pre class="literal">
$index->setMaxBufferedDocs(PHP_INT_MAX);
$index->setMaxMergeDocs(1);
$index->setMergeFactor(100);
</pre>

Posted by Roland Tapken (roland) on 2009-03-22T08:26:16.000+0000

Sorry, even that doesn't solve the problem.

Posted by Roland Tapken (roland) on 2009-03-22T08:36:22.000+0000

I found out that this problem does occur if a field is empty or omitted entirely. The only solution seems to be to add dummy data whenever a field is empty:

<pre class="literal">
// Remove any existing entries; find() returns an array of hits
foreach ($index->find('pk:' . $this->id) as $hit) {
  $index->delete($hit->id);
}

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('pk', $this->id));
$fields = array('first_name', 'last_name', 'company', 'address', 'city');
foreach ($fields as $field) {
  $value = trim($this->$field);
  if ("" === $value) {
    // Field is empty, add dummy data
    $value = "null";
  }
  $doc->addField(Zend_Search_Lucene_Field::UnStored($field, $value, sfConfig::get('sf_charset')));
}

$index->addDocument($doc);
</pre>
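
The dummy-data workaround can also be separated from the indexing code, so that the substitution is done in one place before any fields are created. This is a minimal sketch under stated assumptions: `fillEmptyFields` is a hypothetical helper name, the record is assumed to be a plain array rather than the object used above, and the `'null'` placeholder is simply the value chosen in this comment.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): returns one value per
// requested field, substituting a placeholder for empty or missing values.
function fillEmptyFields(array $record, array $fields, $placeholder = 'null')
{
    $values = array();
    foreach ($fields as $field) {
        // Missing keys are treated the same way as empty strings
        $value = isset($record[$field]) ? trim((string) $record[$field]) : '';
        $values[$field] = ($value === '') ? $placeholder : $value;
    }
    return $values;
}

$values = fillEmptyFields(
    array('first_name' => ' Ann ', 'company' => '   '),
    array('first_name', 'last_name', 'company')
);
// $values is array('first_name' => 'Ann', 'last_name' => 'null', 'company' => 'null')
```

Each resulting value could then be passed to `Zend_Search_Lucene_Field::UnStored()` as in the snippet above. Note that the placeholder is indexed like any other text, so a search for "null" would match every document that had an empty field.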

Posted by Alexander Veremyev (alexander) on 2009-11-23T12:20:16.000+0000

Fixed.

Thanks for the really helpful issue description!
