ZF-6283: Zend_Search_Lucene indexing memory usage improvement.
Description
The issue was originally reported by Jurriën Stutterheim:
Zend_Search_Lucene memory leak?
I ran into a problem indexing a relatively small number of records (+/- 50k). Memory usage slowly rises from roughly 45MB to the memory limit of 128MB (using the default merge factor of 10). When the indexing process hits the limit, I've only indexed roughly 15k documents. With a merge factor of 5 I manage to squeeze in a few thousand extra records before running out of memory again. When I execute the same code with the $index->addDocument() call commented out, I happily iterate over all 50k records using only 47MB of memory.
Strangely enough, I've never experienced this in a previous application, where I indexed 150k records (roughly the same size as the current ones) without a problem. Could there be a memory leak of some sort? Or is this expected? I'm using Zend Server 4.0.1 on Mac OS X 10.5.6 with mbstring enabled.
Comments
Posted by Alexander Veremyev (alexander) on 2009-04-13T07:31:16.000+0000
Such behavior is possible in some cases:
1. PHP doesn't detect cyclic object references and doesn't destroy object structures that contain them. It only checks whether an object is still referenced (using a reference counter) and destroys the object when the number of references drops to 0. Zend_Search_Lucene shouldn't create such structures, but it's better to check again.
2. Some string operations (like the .= operator) produce a high level of free-memory fragmentation, which increases overall memory usage. This should also be checked.
3. Index growth also increases memory usage.
Could you perform the following experiments?
1. b) Open the index in a separate script and check memory usage:
2. Check whether the growing memory usage is caused by document objects not being destroyed. Use the same document object for every addDocument() call and compare memory usage during indexing with the current memory utilization.
3. If the above experiments don't give enough information, try to track document object creation and destruction (add a destructor to the document class and print debug output whenever a document is created or destroyed).
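Experiment 3 could look roughly like the following PHP sketch. To keep it self-contained, TrackedDocument is a hypothetical stand-in rather than Zend_Search_Lucene_Document, and the addDocument() call is only indicated in a comment. The class counts live instances in its constructor and destructor. It also builds a deliberate self-reference to illustrate point 1: such an object is not destroyed when its last variable is unset. The gc_collect_cycles() call assumes PHP 5.3 or later; on older versions the cyclic objects would simply stay in memory.

```php
<?php
// Hypothetical stand-in for a document object (NOT Zend_Search_Lucene_Document).
// It counts live instances so that objects which are never destroyed show up
// as a non-zero counter.
class TrackedDocument
{
    public static $live = 0;

    // Used only to build a deliberate reference cycle below.
    public $self = null;

    public function __construct()
    {
        self::$live++;
    }

    public function __destruct()
    {
        self::$live--;
    }
}

// Non-cyclic case: the reference count drops to 0 on unset(),
// so the destructor runs immediately.
for ($i = 0; $i < 1000; $i++) {
    $doc = new TrackedDocument();
    // In the real script: $index->addDocument($doc);
    unset($doc);
}
$liveAfterPlain = TrackedDocument::$live;

// Cyclic case: each object refers to itself, so its reference count
// never reaches 0 and unset() does not destroy it.
for ($i = 0; $i < 3; $i++) {
    $doc = new TrackedDocument();
    $doc->self = $doc;
    unset($doc);
}
$liveAfterCycles = TrackedDocument::$live;

// PHP 5.3+ only: explicitly run the cycle collector to reclaim
// the unreachable cyclic objects.
gc_collect_cycles();
$liveAfterGc = TrackedDocument::$live;

printf("plain: %d, cyclic: %d, after gc: %d\n", $liveAfterPlain, $liveAfterCycles, $liveAfterGc);
```

In a real indexing script, the same counter would go on whatever document class is actually passed to addDocument(); a counter that keeps growing during indexing would suggest that documents are being retained somewhere.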
PS: Which analyzer do you use for indexing? If it's a UTF-8 analyzer, this may be related to [ZF-4997].
Posted by Jurrien Stutterheim (norm2782) on 2009-04-15T15:24:31.000+0000
I'll try to get you the result of 1.a later today. In the meantime:
1.b: int(1693916) int(2097152)
Is the document retained in the index somehow?
Index setup:
The problem also occurred without the UTF-8 analyzer, though :)
Posted by Jurrien Stutterheim (norm2782) on 2009-04-15T18:43:27.000+0000
Indexing performed better on this run than before: I managed to squeeze in roughly 45,500 records before hitting the 128MB memory limit. I'm not sure why it performed better this time, though.
Memory usage:
int(133245448) int(134217728)