Issues

ZF-2189: skipData processing

Description

skipData information is stored within the Lucene index. It is effectively a "sub-index" of each term's document list.

Processing this information may improve the performance of some special query types.

If we process a phrase query or a multi-term query with several required terms, and one term has very low selectivity (high cardinality), then we can process the other terms first to limit the result set. skipData processing then makes it possible to avoid a full scan of the document list for these high-cardinality terms.
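To illustrate the idea, here is a minimal Python sketch, not the Zend_Search_Lucene implementation. It assumes a simplified single-level skip table that records every N-th posting of the high-cardinality term; the names `build_skips`, `advance`, `intersect` and the interval value are hypothetical, and the actual Lucene skipData layout on disk differs. The rare term drives the loop, and the skip table lets the common term's document list jump forward instead of being read entry by entry.

```python
from bisect import bisect_right

SKIP_INTERVAL = 16  # hypothetical value, chosen only for this sketch


def build_skips(postings, interval=SKIP_INTERVAL):
    """Record (doc_id, position) for every `interval`-th posting."""
    return [(postings[i], i) for i in range(0, len(postings), interval)]


def advance(postings, skips, pos, target):
    """Return (index, steps) for the first posting >= target at or after pos.

    `steps` counts the postings that had to be scanned one by one.
    """
    # Last skip entry whose doc id is <= target; jumping there cannot
    # pass over any posting that is >= target.
    k = bisect_right(skips, (target, len(postings))) - 1
    if k >= 0 and skips[k][1] > pos:
        pos = skips[k][1]
    steps = 0
    while pos < len(postings) and postings[pos] < target:
        pos += 1
        steps += 1
    return pos, steps


def intersect(rare, common, skips):
    """Intersect two sorted doc-id lists, driving from the rare term."""
    result, pos, examined = [], 0, 0
    for doc in rare:
        pos, steps = advance(common, skips, pos, doc)
        examined += steps
        if pos == len(common):
            break  # the common term has no further documents
        if common[pos] == doc:
            result.append(doc)
    return result, examined
```

Calling `intersect(rare, common, build_skips(common))` returns the matching document ids together with the number of postings of the common term that were actually scanned, which stays small relative to the full list when the rare term has few documents.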

That makes sense for huge indices (hundreds of thousands of documents) and for queries with terms having extremely low selectivity ('a', 'the', 'in', 'is', ...). A StopWords analyzer may be used as a workaround for this problem.

Comments

I believe this was implemented for 1.7. If not, please reopen, Alex.

Yeah. That's closed. There were some additional ideas about using skipData to improve performance, so I hadn't closed these issues.

But I'll create another issue if these ideas become more concrete.