Issue Type: Improvement Created: 2007-11-13T11:37:33.000+0000 Last Updated: 2008-12-04T13:56:00.000+0000 Status: Closed Fix version(s): - 1.7.0 (17/Nov/08)
Reporter: Alexander Veremyev (alexander) Assignee: Alexander Veremyev (alexander) Tags: - Zend_Search_Lucene
Related issues: Attachments:
skipData information is stored within Lucene index. It's actually a "sub-index" of term documents list
Processing this info may help with a performance of some special query types.
If we process phrase query or multiterm query with several required terms and one term has very low selectivity (high cardinality), then we can process other terms first to limit result set. SkipData processing allows to avoid full document list scan for these high cardinality terms.
That makes sense in the case of huge indices (hundreds of thousands documents) and queries with terms having extremely low selectivity ('a', 'the', 'in', 'is', ...) StopWords analyzer may be used as workaround for this problem.
Posted by Wil Sinclair (wil) on 2008-12-04T13:36:45.000+0000
I believe this was implemented for 1.7. If not, please reopen, Alex.
Posted by Alexander Veremyev (alexander) on 2008-12-04T13:56:00.000+0000
Yeah. That's closed. There were some additional ideas concerning skipData usage for performance improvement, so I didn't close these issues.
But I'll create another issue if these ideas become more concrete.
Have you found an issue?
See the Overview section for more details.