ZF-3683: Performance improvement: reuse token object in lowercase filter
A small performance improvement is possible in the code that builds indexes. In the nextToken() method of class Zend_Search_Lucene_Analysis_Analyzer_Common_Text, the following line of code is executed:
$token = $this->normalize(new Zend_Search_Lucene_Analysis_Token($str, $pos, $endpos));
This calls the normalize() method of Zend_Search_Lucene_Analysis_TokenFilter_LowerCase. One Token object is created as the argument to normalize(). Inside normalize(), a second Token object is created that is an exact copy of the first, except that its text is converted to lower case.
As a result, two Token objects are created for every token, which has some impact on performance (about 7% on the examples that I've looked at). One very simple fix would be to change normalize() so that, instead of creating a second object, it just updates the text in the object that is passed to it. This would also require a change to Token.php to allow the text field to be set. I expect that there are more architecturally pleasing ways to fix this.
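A minimal sketch of the simple fix, to make the idea concrete. It does not use the real Zend classes: SketchToken and SketchLowerCaseFilter are simplified stand-ins, and setTermText() is a hypothetical setter standing for the change to Token.php suggested above.

```php
<?php
// Simplified stand-in for Zend_Search_Lucene_Analysis_Token (not the real class).
class SketchToken
{
    private $termText;
    private $startOffset;
    private $endOffset;

    public function __construct($text, $start, $end)
    {
        $this->termText    = $text;
        $this->startOffset = $start;
        $this->endOffset   = $end;
    }

    public function getTermText()
    {
        return $this->termText;
    }

    public function getStartOffset()
    {
        return $this->startOffset;
    }

    public function getEndOffset()
    {
        return $this->endOffset;
    }

    // Hypothetical setter: the addition to Token.php proposed in this issue.
    public function setTermText($text)
    {
        $this->termText = $text;
    }
}

// Simplified stand-in for the lower-case token filter.
class SketchLowerCaseFilter
{
    // Updates the text of the token passed in and returns that same object,
    // so no second token is allocated.
    public function normalize(SketchToken $token)
    {
        $token->setTermText(strtolower($token->getTermText()));
        return $token;
    }
}

$filter = new SketchLowerCaseFilter();
$source = new SketchToken('Hello', 0, 5);
$result = $filter->normalize($source);
```

Because normalize() now modifies its argument, a caller that kept its own reference to the token would see the lower-cased text. In the nextToken() line quoted above the token is created inline and passed straight to normalize(), so nothing else holds a reference to it there.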