ZF-7721: Weak check of chunked body structure may cause infinite loop

Description

Zend_Http_Response::decodeChunkedBody() checks the format of a chunked HTTP response with a weak regular expression:

/^([\da-fA-F]+)[^\r\n]*\r\n/sm

As can be seen, it accepts practically any line that starts with a hexadecimal number, which leaves a lot of room for errors. It also doesn't comply with the format specification (http://tools.ietf.org/html/rfc2616#section-3.6.1).
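
To illustrate how permissive the current pattern is, here is a minimal sketch that runs it against a few inputs. The sample strings are made up for illustration and are not taken from the reported sites. Note that with the m modifier, ^ also matches at the start of any later line, not only at the start of the body.

```php
<?php
// The pattern quoted above, as used to match the chunk-size line.
$weak = '/^([\da-fA-F]+)[^\r\n]*\r\n/sm';

// Hypothetical inputs, chosen only to show what the pattern accepts.
$samples = array(
    'valid chunk header'      => "1a\r\n",
    'hex followed by garbage' => "deadbeef is not a size\r\n",
    'hex line after text'     => "<html>\r\nbad\r\n",
);

// Record the captured "chunk size" for each sample, or null if no match.
$results = array();
foreach ($samples as $label => $body) {
    $results[$label] = preg_match($weak, $body, $m) ? $m[1] : null;
}

foreach ($results as $label => $size) {
    echo $label, ': ', var_export($size, true), "\n";
}
```

All three samples match. The captured sizes are 1a, deadbeef and bad, so ordinary text that merely begins with hex-looking characters is treated as a chunk header.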

Incorrect handling of the chunked format may lead to an infinite loop, as can be seen in the provided real-world example. The proposed solution is to replace the format-checking regular expression with a stricter one, for example:

/^([\da-fA-F]+)\s*(;[^()\<>@,;:\\"\/\[\]\?={}\t\s]+=[^\r\n]*)?\r\n/sm
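
A stricter check in the spirit of this proposal could look roughly like the sketch below. It is not the framework's code and not necessarily the exact expression intended above. The helper name parseChunkSizeLine() is hypothetical, the token and quoted-string sub-patterns are my reading of RFC 2616, and allowing spaces or tabs after the size is an assumption made for leniency. The pattern is anchored with \A so that only the very start of the remaining body can match.

```php
<?php
/**
 * Hypothetical helper: parse the chunk-size line at the start of $body.
 * Returns array('size' => int, 'headerLength' => int) or null if the
 * body does not start with a well-formed chunk-size line.
 */
function parseChunkSizeLine($body)
{
    // token (RFC 2616, section 2.2): no control chars, no separators.
    $token  = '[^\x00-\x20\x7f-\xff()<>@,;:\\\\"\/\[\]?={}]+';
    // quoted-string: double-quoted text with backslash escapes.
    $quoted = '"(?:[^"\\\\\r\n]|\\\\.)*"';
    // chunk-extension: zero or more ";name" or ";name=value" parts.
    $ext    = '(?:;' . $token . '(?:=(?:' . $token . '|' . $quoted . '))?)*';
    // Assumption: tolerate trailing spaces/tabs after the hex size.
    $strict = '/\A([\da-fA-F]+)[ \t]*' . $ext . '\r\n/';

    if (!preg_match($strict, $body, $m)) {
        return null;
    }
    return array(
        'size'         => hexdec($m[1]),
        'headerLength' => strlen($m[0]),
    );
}

$plain     = parseChunkSizeLine("1a\r\nabc");
$extended  = parseChunkSizeLine("1a;name=value\r\n");
$padded    = parseChunkSizeLine("1a  \r\n");
$garbage   = parseChunkSizeLine("deadbeef is not a size\r\n");
$laterLine = parseChunkSizeLine("<html>\r\nbad\r\n");
```

With this sketch the first three calls are accepted and the last two return null, so a caller could throw an exception instead of continuing to loop over a malformed body.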

Comments

Real-world example of a page that causes an infinite loop when processed by decodeChunkedBody()

Confirmed (1.10.8). Our site crawler hangs every day. During investigation I found that the decodeChunkedBody() function causes an infinite loop. E.g. try calling $response->getBody() on the http://ir{.}kvh{.}com URL or on ir{.}stanleyblackanddecker{.}com.

I can confirm this issue as well.

@Shahar: What do you think about the improved regex provided by Alexander?