Zend Framework

Segmentaion fault on preg_replace in Zend_Db_Statement

Details

  • Type: Bug Bug
  • Status: Reopened Reopened
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: 1.6.2, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.7.8, 1.7.9, 1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4, 1.8.5, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.9.4, 1.9.5, 1.9.6, 1.9.7, 1.9.8, 1.10.0, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.10.5, 1.10.6, 1.10.7, 1.10.8, 1.11.0, 1.11.1, 1.11.2, 1.11.3, 1.11.4
  • Fix Version/s: None
  • Component/s: Zend_Db

Description

This seems not to exactly be a Zend Framework bug, but more likely PHP or PRCE lib bug, but I'll post it here for your opinion.

On line 190 of Zend_Db_Statement is the following preg_replace:

$sql = preg_replace("/$q($qe|\\\\{2}|[^$q])*$q/", '', $sql);

When the sql passed is longer than about 4100 characters the process dies with the following error in the log:

[notice] child pid 7426 exit signal Segmentation fault (11)

I can drill it down to be this part of the regex that fails: /([^'])*/

The content in the sql i UTF8 encoded text - all characters are ascii though. I don't know if that might be of any help.

It seems to be somewhat related to this filed php bug:

http://bugs.php.net/bug.php?id=46551

I'm using PHP 5.2.6-0.dotdeb.1 with Suhosin-Patch 0.9.6.2 (cli) (built: May 2 2008 10:24:03) on a debian-server. I will upgrade to latest snapshot tomorrow to see if that helps.

  1. ZF-5063.patch
    21/Sep/09 8:43 AM
    0.7 kB
    Vincent de Lau
  2. ZF-7911_ungreedy.patch
    17/Oct/11 1:22 AM
    2 kB
    Adam Lundrigan

Issue Links

Activity

Hide
Asger Hallas added a comment -

I guess it has something to do with this: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=476419

Show
Asger Hallas added a comment - I guess it has something to do with this: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=476419
Hide
Ralph Schindler added a comment -

This appears to be an issue within php itself and not an issue in ZF. Also appears to have been fixed in php.

Show
Ralph Schindler added a comment - This appears to be an issue within php itself and not an issue in ZF. Also appears to have been fixed in php.
Hide
Michael Rehbein added a comment -

It appears that it is actually an issue with pcre, which has been flagged as "won't fix".

See: http://bugs.exim.org/show_bug.cgi?id=797

Raising the process stack limit helps handle larger queries, but doesn't "fix" the issue.

// simple "query" that triggers segfault.
$sql = str_pad("'", 6000, 'a');

// should throw invalid sql, actually segfaults
$dbAdapter->query($sql);

Is there another way to write code to prevent large queries from segfaulting php since there doesn't appear to be a fix coming for the pcre library?

Show
Michael Rehbein added a comment - It appears that it is actually an issue with pcre, which has been flagged as "won't fix". See: http://bugs.exim.org/show_bug.cgi?id=797 Raising the process stack limit helps handle larger queries, but doesn't "fix" the issue. // simple "query" that triggers segfault. $sql = str_pad("'", 6000, 'a'); // should throw invalid sql, actually segfaults $dbAdapter->query($sql); Is there another way to write code to prevent large queries from segfaulting php since there doesn't appear to be a fix coming for the pcre library?
Hide
Vincent de Lau added a comment -

This issue can be fixed or better, worked around by changing the regexp slightly. Basically, the (..|..|[^.])* will 'eat' every character between quotes one-by-one, incrementing the stack every character. Adding a + sign after the negated character class will eat as much characters as it can, significantly reducing the stack usage.

I've attached a patch that will fix this issue, and also will correct http://framework.zend.com/issues/browse/ZF-3025

Show
Vincent de Lau added a comment - This issue can be fixed or better, worked around by changing the regexp slightly. Basically, the (..|..|[^.])* will 'eat' every character between quotes one-by-one, incrementing the stack every character. Adding a + sign after the negated character class will eat as much characters as it can, significantly reducing the stack usage. I've attached a patch that will fix this issue, and also will correct http://framework.zend.com/issues/browse/ZF-3025
Hide
Vincent de Lau added a comment - - edited

submitted patch also fix a wrong fix for issue ZF-3025

Show
Vincent de Lau added a comment - - edited submitted patch also fix a wrong fix for issue ZF-3025
Hide
Vincent de Lau added a comment -

I'd really like to see this fix in 1.9.4 when ZF-7911 is not ready yet.

Show
Vincent de Lau added a comment - I'd really like to see this fix in 1.9.4 when ZF-7911 is not ready yet.
Hide
Patrick Schwisow added a comment -

The patch works, why hasn't this fix made it to production in the last six months?

Show
Patrick Schwisow added a comment - The patch works, why hasn't this fix made it to production in the last six months?
Hide
Brian Holub added a comment -

this cost me a couple hours. please bump up the priority on this one (the patch works), the seg fault might not be ZF's... fault.. but it's extremely painful.

Show
Brian Holub added a comment - this cost me a couple hours. please bump up the priority on this one (the patch works), the seg fault might not be ZF's... fault.. but it's extremely painful.
Hide
Daniel Kreuer added a comment -

This Issue still exists in ZF.

Show
Daniel Kreuer added a comment - This Issue still exists in ZF.
Hide
Georgy Korshunov added a comment -

the patch doesn't work for me. also it's not correct though adding '' after $qe means 'last character of $qe repeated one or more times'. this would be better: $sql = preg_replace("/$d(($de)|(\\\\{2})+|([^$d])+)*$d/", '', $sql);
and the best solution for me is just making the pattern ungreedy.
i'm not an expert in PCRE, but as i read on wikipedia, the PCRE engine would work faster with ungreedy patterns. and with this pattern greediness actually doesn't matter

Show
Georgy Korshunov added a comment - the patch doesn't work for me. also it's not correct though adding '' after $qe means 'last character of $qe repeated one or more times'. this would be better: $sql = preg_replace("/$d(($de)|(\\\\{2})+|([^$d])+)*$d/", '', $sql); and the best solution for me is just making the pattern ungreedy. i'm not an expert in PCRE, but as i read on wikipedia, the PCRE engine would work faster with ungreedy patterns. and with this pattern greediness actually doesn't matter
Hide
Misha Krassovski added a comment -

Guys, does anybody know how to fight this problem? I have RHE6, ZF 1.11.3, and issue is still there.
Published patch fixes nothing as well as others published in ZF-7911
Today is 10/3/2011, meaning it has been in the air for almost 3 years now!!!!!!

Show
Misha Krassovski added a comment - Guys, does anybody know how to fight this problem? I have RHE6, ZF 1.11.3, and issue is still there. Published patch fixes nothing as well as others published in ZF-7911 Today is 10/3/2011, meaning it has been in the air for almost 3 years now!!!!!!
Hide
Georgy Korshunov added a comment -

@Misha Krassovski
besides patches published here, there are two workarounds that worked perfectly for me:
1) make the pattern ungreedy by adding /U (more at http://php.net/manual/en/reference.pcre.pattern.modifiers.php)
here you need to change ZF code, but it will help with all your queries
2) use the adapter directly with those queries that fail, i.e:
$db->getAdapter()>getConnection()>exec($query);

Show
Georgy Korshunov added a comment - @Misha Krassovski besides patches published here, there are two workarounds that worked perfectly for me: 1) make the pattern ungreedy by adding /U (more at http://php.net/manual/en/reference.pcre.pattern.modifiers.php) here you need to change ZF code, but it will help with all your queries 2) use the adapter directly with those queries that fail, i.e: $db->getAdapter()>getConnection()>exec($query);
Hide
Adam Lundrigan added a comment -

Are there any known side-effects to switching the preg_replace call to ungreedy? That seems like the simplest solution, assuming we can come up with a test to prove it actually solves the issue.

Show
Adam Lundrigan added a comment - Are there any known side-effects to switching the preg_replace call to ungreedy? That seems like the simplest solution, assuming we can come up with a test to prove it actually solves the issue.
Hide
Vincent de Lau added a comment - - edited

The problem with this issue, is that it is not something that can reliably be unit tested. The segfault will only appear on certain systems when enough data is passed, where 'enough' depends on the system and how PHP and PCRE are build.

The patch that I suggested for this issue has been (and probably still is) in production. The patch would probably not fix the problem entirely, if enough data is passed, the query might still fail. I feel that if ZF-7911 is solved properly, this issue will go away or become even more unlikely.

In the mean time, I've got a new job where ZF is not used (yet).

Show
Vincent de Lau added a comment - - edited The problem with this issue, is that it is not something that can reliably be unit tested. The segfault will only appear on certain systems when enough data is passed, where 'enough' depends on the system and how PHP and PCRE are build. The patch that I suggested for this issue has been (and probably still is) in production. The patch would probably not fix the problem entirely, if enough data is passed, the query might still fail. I feel that if ZF-7911 is solved properly, this issue will go away or become even more unlikely. In the mean time, I've got a new job where ZF is not used (yet).
Hide
Adam Lundrigan added a comment -

Could someone here who is experiencing this issue please try the patch i've attached (ZF-7911_ungreedy.patch) and report back the result?
It's a modified version of the fix suggested in ZF-7911, with the addition of the ungreedy modifier as suggested here by Georgy Korshunov

Show
Adam Lundrigan added a comment - Could someone here who is experiencing this issue please try the patch i've attached (ZF-7911_ungreedy.patch) and report back the result? It's a modified version of the fix suggested in ZF-7911, with the addition of the ungreedy modifier as suggested here by Georgy Korshunov
Hide
Georgy Korshunov added a comment -

Adam Lundrigan, it worked for me

Show
Georgy Korshunov added a comment - Adam Lundrigan, it worked for me
Hide
Dvir Azulay added a comment -

Just casting my opinion on this - I've also wasted a couple of hours banging my head with this issue a few good months ago when it appeared on the zf-based product I was working on, but it only happened on my local env (which was Windows XP) and not on our linux production env. Nevertheless, important and really annoying issue that takes a while to drill down to.

Show
Dvir Azulay added a comment - Just casting my opinion on this - I've also wasted a couple of hours banging my head with this issue a few good months ago when it appeared on the zf-based product I was working on, but it only happened on my local env (which was Windows XP) and not on our linux production env. Nevertheless, important and really annoying issue that takes a while to drill down to.
Hide
Tilman added a comment -

I just ran into this issue with a very long query (inserting a PHP serialized array) with ZF 1.11.11 on PHP 5.3.3 and PHP 5.3.6. I tried out the ungreedy patch. I can't tell if the code in _stripQuoted() is now still doing what it is supposed to do, but the segmentation fault on preg_replace did go away.

This is a 3 year old bug, and knowing from experience now, it is pretty nasty once you run into it. I don't even remember running into segfault issues with PHP before. It took me a while to find the culprit and then this bug report.

If the maintainers are not willing to fix the problem, at least put a disclaimer in the manual of Zend_Db, that one should refrain from using Zend_Db_Statement, if they need to run certain long queries.

Show
Tilman added a comment - I just ran into this issue with a very long query (inserting a PHP serialized array) with ZF 1.11.11 on PHP 5.3.3 and PHP 5.3.6. I tried out the ungreedy patch. I can't tell if the code in _stripQuoted() is now still doing what it is supposed to do, but the segmentation fault on preg_replace did go away. This is a 3 year old bug, and knowing from experience now, it is pretty nasty once you run into it. I don't even remember running into segfault issues with PHP before. It took me a while to find the culprit and then this bug report. If the maintainers are not willing to fix the problem, at least put a disclaimer in the manual of Zend_Db, that one should refrain from using Zend_Db_Statement, if they need to run certain long queries.
Hide
Joern Heissler added a comment -

I agree that the bug is in libpcre, but with a different regex, the probability of segfault can be reduced.
The below fix is mysql specific and also addresses ZF-7911.
libpcre still goes into a recursion, but the depth seems to be only a fraction of the original.
A proper fix would be in the libpcre. Or ZF has to write its own FSM. But doing it in pure php is probably slow.

class Company_Db_Statement_Pdo_Mysql extends Zend_Db_Statement_Pdo
{
    /* remove "foo""b\ar\"b\\az" and 'foo''b\ar\'b\\az' style quoted strings from sql queries. `backticks` are not touched. Specific for mysql syntax.
     * for `foo``bar` style quotes, that regex part could be added (not tested):
     *    `([^`]+|``)*?`(?!`)
     */
    protected function _stripQuoted($sql)
    {
        $regex = "/\"([^\\\\\"]+|\\\\.|\"\")*?\"(?!\")|'([^\\\\']+|\\\\.|'')*?'(?!')/";
        return preg_replace($regex, '', $sql);
    }
}

$db = Zend_Db::factory('PDO_MYSQL', $config->params);
$db->setStatementClass('Company_Db_Statement_Pdo_Mysql');
Show
Joern Heissler added a comment - I agree that the bug is in libpcre, but with a different regex, the probability of segfault can be reduced. The below fix is mysql specific and also addresses ZF-7911. libpcre still goes into a recursion, but the depth seems to be only a fraction of the original. A proper fix would be in the libpcre. Or ZF has to write its own FSM. But doing it in pure php is probably slow.
class Company_Db_Statement_Pdo_Mysql extends Zend_Db_Statement_Pdo
{
    /* remove "foo""b\ar\"b\\az" and 'foo''b\ar\'b\\az' style quoted strings from sql queries. `backticks` are not touched. Specific for mysql syntax.
     * for `foo``bar` style quotes, that regex part could be added (not tested):
     *    `([^`]+|``)*?`(?!`)
     */
    protected function _stripQuoted($sql)
    {
        $regex = "/\"([^\\\\\"]+|\\\\.|\"\")*?\"(?!\")|'([^\\\\']+|\\\\.|'')*?'(?!')/";
        return preg_replace($regex, '', $sql);
    }
}

$db = Zend_Db::factory('PDO_MYSQL', $config->params);
$db->setStatementClass('Company_Db_Statement_Pdo_Mysql');

People

Vote (18)
Watch (18)

Dates

  • Created:
    Updated: