Issues

ZF-5063: Segmentaion fault on preg_replace in Zend_Db_Statement

Issue Type: Bug Created: 2008-11-25T12:55:05.000+0000 Last Updated: 2012-05-16T22:15:43.000+0000 Status: Resolved Fix version(s): - 1.12.0 (27/Aug/12)

Reporter: Asger Hallas (asgerhallas) Assignee: Bart McLeod (mcleod@spaceweb.nl) Tags: - Zend_Db

  • state:need-feedback
  • zf-caretaker-adamlundrigan
  • zf-crteam-padraic
  • zf-crteam-priority
  • zf-crteam-review

Related issues: - ZF-3025

Attachments: - ZF-5063.patch

Description

This seems not to exactly be a Zend Framework bug, but more likely PHP or PRCE lib bug, but I'll post it here for your opinion.

On line 190 of Zend_Db_Statement is the following preg_replace:

$sql = preg_replace("/$q($qe|\\{2}|[^$q])*$q/", '', $sql);

When the sql passed is longer than about 4100 characters the process dies with the following error in the log:

[notice] child pid 7426 exit signal Segmentation fault (11)

I can drill it down to be this part of the regex that fails: /([^'])*/

The content in the sql i UTF8 encoded text - all characters are ascii though. I don't know if that might be of any help.

It seems to be somewhat related to this filed php bug:

http://bugs.php.net/bug.php?id=46551

I'm using PHP 5.2.6-0.dotdeb.1 with Suhosin-Patch 0.9.6.2 (cli) (built: May 2 2008 10:24:03) on a debian-server. I will upgrade to latest snapshot tomorrow to see if that helps.

Comments

Posted by Asger Hallas (asgerhallas) on 2008-11-25T13:50:26.000+0000

I guess it has something to do with this: http://bugs.debian.org/cgi-bin/bugreport.cgi/…

Posted by Ralph Schindler (ralph) on 2009-01-11T21:53:50.000+0000

This appears to be an issue within php itself and not an issue in ZF. Also appears to have been fixed in php.

Posted by Michael Rehbein (tech13) on 2009-06-19T14:25:15.000+0000

It appears that it is actually an issue with pcre, which has been flagged as "won't fix".

See: http://bugs.exim.org/show_bug.cgi?id=797

Raising the process stack limit helps handle larger queries, but doesn't "fix" the issue.

// simple "query" that triggers segfault. $sql = str_pad("'", 6000, 'a');

// should throw invalid sql, actually segfaults $dbAdapter->query($sql);

Is there another way to write code to prevent large queries from segfaulting php since there doesn't appear to be a fix coming for the pcre library?

Posted by Vincent de Lau (vdelau) on 2009-09-21T08:40:54.000+0000

This issue can be fixed or better, worked around by changing the regexp slightly. Basically, the (..|..|[^.])* will 'eat' every character between quotes one-by-one, incrementing the stack every character. Adding a + sign after the negated character class will eat as much characters as it can, significantly reducing the stack usage.

I've attached a patch that will fix this issue, and also will correct http://framework.zend.com/issues/browse/ZF-3025

Posted by Vincent de Lau (vdelau) on 2009-09-21T08:41:51.000+0000

submitted patch also fix a wrong fix for issue ZF-3025

Posted by Vincent de Lau (vdelau) on 2009-09-30T02:51:15.000+0000

I'd really like to see this fix in 1.9.4 when ZF-7911 is not ready yet.

Posted by Patrick Schwisow (patrick.schwisow) on 2010-03-23T10:37:44.000+0000

The patch works, why hasn't this fix made it to production in the last six months?

Posted by Brian Holub (bholub) on 2010-06-27T12:04:31.000+0000

this cost me a couple hours. please bump up the priority on this one (the patch works), the seg fault might not be ZF's... fault.. but it's extremely painful.

Posted by Daniel Kreuer (dkreuer) on 2011-03-11T00:28:12.000+0000

This Issue still exists in ZF.

Posted by Georgy Korshunov (kipelovets) on 2011-05-13T20:12:45.000+0000

the patch doesn't work for me. also it's not correct though adding '+' after $qe means 'last character of $qe repeated one or more times'. this would be better: $sql = preg_replace("/$d(($de)+|(\\{2})+|([^$d])+)*$d/", '', $sql); and the best solution for me is just making the pattern ungreedy. i'm not an expert in PCRE, but as i read on wikipedia, the PCRE engine would work faster with ungreedy patterns. and with this pattern greediness actually doesn't matter

Posted by Misha Krassovski (mkrassovski) on 2011-10-03T21:35:08.000+0000

Guys, does anybody know how to fight this problem? I have RHE6, ZF 1.11.3, and issue is still there. Published patch fixes nothing as well as others published in ZF-7911 Today is 10/3/2011, meaning it has been in the air for almost 3 years now!!!!!!

Posted by Georgy Korshunov (kipelovets) on 2011-10-05T09:09:28.000+0000

@Misha Krassovski besides patches published here, there are two workarounds that worked perfectly for me: 1) make the pattern ungreedy by adding /U (more at http://php.net/manual/en/…) here you need to change ZF code, but it will help with all your queries 2) use the adapter directly with those queries that fail, i.e: $db->getAdapter()->getConnection()->exec($query);

Posted by Adam Lundrigan (adamlundrigan) on 2011-10-05T14:37:26.000+0000

Are there any known side-effects to switching the preg_replace call to ungreedy? That seems like the simplest solution, assuming we can come up with a test to prove it actually solves the issue.

Posted by Vincent de Lau (vdelau) on 2011-10-05T15:33:38.000+0000

The problem with this issue, is that it is not something that can reliably be unit tested. The segfault will only appear on certain systems when enough data is passed, where 'enough' depends on the system and how PHP and PCRE are build.

The patch that I suggested for this issue has been (and probably still is) in production. The patch would probably not fix the problem entirely, if enough data is passed, the query might still fail. I feel that if ZF-7911 is solved properly, this issue will go away or become even more unlikely.

In the mean time, I've got a new job where ZF is not used (yet).

Posted by Adam Lundrigan (adamlundrigan) on 2011-10-17T01:22:18.000+0000

Could someone here who is experiencing this issue please try the patch i've attached (ZF-7911_ungreedy.patch) and report back the result? It's a modified version of the fix suggested in ZF-7911, with the addition of the ungreedy modifier as suggested here by [~kipelovets]

Posted by Georgy Korshunov (kipelovets) on 2011-10-17T11:24:14.000+0000

[~adamlundrigan], it worked for me

Posted by Dvir Azulay (dvirazulay) on 2011-10-18T20:17:20.000+0000

Just casting my opinion on this - I've also wasted a couple of hours banging my head with this issue a few good months ago when it appeared on the zf-based product I was working on, but it only happened on my local env (which was Windows XP) and not on our linux production env. Nevertheless, important and really annoying issue that takes a while to drill down to.

Posted by Tilman (tilman) on 2012-01-03T21:49:29.000+0000

I just ran into this issue with a very long query (inserting a PHP serialized array) with ZF 1.11.11 on PHP 5.3.3 and PHP 5.3.6. I tried out the ungreedy patch. I can't tell if the code in _stripQuoted() is now still doing what it is supposed to do, but the segmentation fault on preg_replace did go away.

This is a 3 year old bug, and knowing from experience now, it is pretty nasty once you run into it. I don't even remember running into segfault issues with PHP before. It took me a while to find the culprit and then this bug report.

If the maintainers are not willing to fix the problem, at least put a disclaimer in the manual of Zend_Db, that one should refrain from using Zend_Db_Statement, if they need to run certain long queries.

Posted by Joern Heissler (joernh) on 2012-01-11T19:27:34.000+0000

I agree that the bug is in libpcre, but with a different regex, the probability of segfault can be reduced. The below fix is mysql specific and also addresses ZF-7911. libpcre still goes into a recursion, but the depth seems to be only a fraction of the original. A proper fix would be in the libpcre. Or ZF has to write its own FSM. But doing it in pure php is probably slow.

<pre class="highlight">
class Company_Db_Statement_Pdo_Mysql extends Zend_Db_Statement_Pdo
{
    /* remove "foo""b\ar\"b\\az" and 'foo''b\ar\'b\\az' style quoted strings from sql queries. `backticks` are not touched. Specific for mysql syntax.
     * for `foo``bar` style quotes, that regex part could be added (not tested):
     *    `([^`]+|``)*?`(?!`)
     */
    protected function _stripQuoted($sql)
    {
        $regex = "/\"([^\\\\\"]+|\\\\.|\"\")*?\"(?!\")|'([^\\\\']+|\\\\.|'')*?'(?!')/";
        return preg_replace($regex, '', $sql);
    }
}

$db = Zend_Db::factory('PDO_MYSQL', $config->params);
$db->setStatementClass('Company_Db_Statement_Pdo_Mysql');

Posted by Bart McLeod (mcleod@spaceweb.nl) on 2012-03-13T23:41:48.000+0000

Added a patch that tries to solve the following: - adds unit tests as proposed by Vincent de Lau with some supposed improvements where they seemed wrong - completes the _stripQuoted function, as it was quite incomplete if tested against the new unit test - used the U modifier in the regexes that need it to avoid the segfault that I could reproduce over and over again

Please review and give your opinion. If I get no complaints I will commit in about two weeks from now.

Posted by Joern Heissler (joernh) on 2012-03-14T20:33:59.000+0000

If I get no complaints

SELECT ''?`;

expected result: SELECT ; I get: SELECT '?`;

Also, \ has no special meaning within `` style quotes in mysql. i.e. \ or \\ are both valid and different.

Posted by Joern Heissler (joernh) on 2012-03-14T20:36:38.000+0000

\ or \ \ (without the blank) are both valid and different.

Posted by Bart McLeod (mcleod@spaceweb.nl) on 2012-03-14T21:30:59.000+0000

Joern, can you be more specific. It's good that you complain / comment. I'm just the other developer trying to fix something that has been broken for many years. I don't expect it to be fixed overnight.

From what I have understood from the research of Vincent de Lau, is that no attempt is made to validate or improve the query. The query is assumed to be correct and identifiers are stripped, so that the query can be parsed. Quoted identifiers have to be stripped as well.

From the research of others I have understood that a backtick can't be escaped with a backslash, only with an extra backtick. Please use [] to quote strings in comments here, as Vincent did, to avoid confusion.

When complaining about expected results, please do so in a format that I can readily use in the unit test.

In your case, this would be: Expected:

<pre class="highlight">
in: [SELECT '`'`?`;]
out: [SELECT ;]

Actual:

<pre class="highlight">
in: [SELECT '`'`?`;]
out: [SELECT '?`;]

I can explain why you get what you do not expect: the two mathing backticks are stripped first. What remains is the actual output. Why is this wrong? Can you explain that? If so, I'll by happy to fix it.

Either way, it doesn't segfault anymore, while it covers the most obvious use cases. I will welcome edge cases to be solved in separate issues if needed. Maybe we should even add an adapter specific way to override the order and format of the regexes if needed, but it seems to me that solving the segfault issue and the common use cases would already help a lot of people.

I find it hard to find a realistic use case for the format you supplied, do you know of any?

Posted by Joern Heissler (joernh) on 2012-03-14T22:28:52.000+0000

I'll try to make a more realistic example. Not sure if anyone would actually do it, but it looks like someone might. I hope that now it is obvious why the actual output is wrong.

Expected:

<pre class="highlight">
in: [SELECT '?`' `x`, col `y` FROM t WHERE u = ?;]
out: [SELECT  , col  FROM t WHERE u = ?;]

Actual:

<pre class="highlight">
in: [SELECT '?`' `x`, col `y` FROM t WHERE u = ?;]
out: [SELECT '?xy` FROM t WHERE u = ?;]

Here is another unrealistic use case which segfaults:

<pre class="highlight">
<?php

$regex = "/'(\\\\'|[^'])*'/U";
$sql = "'";
for ($i = 0; $i < 1000000; ++$i) $sql .= "\\x";
$sql .= "'";
$sql = preg_replace($regex, '', $sql);

I wish I could come up with a perfect solution myself, but everything I try is either deadly slow, segfaults, would require libpcre changes or would require making a C module for stripping quotes :-)

Posted by Bart McLeod (mcleod@spaceweb.nl) on 2012-03-15T07:56:21.000+0000

Thanks for your explanation and the excellent formatting of your code examples.

I still have trouble understanding, because I think the query is wrong and we do not attempt to validate the query. In this input:

<pre class="highlight">[SELECT '?`' `x`, col `y` FROM t WHERE u = ?;]

the first backtick is not allowed in my opinion and that's why is goes 'wrong' with the output. Or should a backtick not be considered a quote if inside single quotes and that is what I'm missing? Why would one want a single backtick inside the single quotes? I still find it very, very hard to imagine a use case for that.

I could imagine something like this:

<pre class="highlight">
[INSERT INTO `pcre` (`test`) VALUES ('In MySQL, the backtick (`) is used to quoted identifiers')]

Will I be in trouble with this?:

<pre class="highlight">
[INSERT INTO `pcre` (`test`) VALUES ('In MySQL, the backtick (`) is used to quoted identifiers, and here is another backtick: ` ...ooops')]

I admit that I will have to look deeper into examples like these and make sure I am confident about how these should be formatted to be valid queries and then what the expected output should be. Any help is still appreciated. Even if it doesn't get perfect, It would help if it would improve, or?

Your segfaulting examples worries me, although it is unrealistic. Maybe I should not be worried, in php you can't do everything either. Like allocating memory inside an infinite loop will certainly crash php.

Posted by Joern Heissler (joernh) on 2012-03-15T09:59:55.000+0000

{quote} Thanks for your explanation and the excellent formatting of your code examples. {quote} I'm trying my best to learn the formatting language of this bug tracker :-)

{quote} Or should a backtick not be considered a quote if inside single quotes and that is what I'm missing? Why would one want a single backtick inside the single quotes? I still find it very, very hard to imagine a use case for that. {quote}

Inside quotes (whatever the type) all other quotes lose their special meaning and are just ordinary characters. So '?' is a string consisting of a question mark and a backtick. My query would return two columns, named x and y, where the value of the former is always '?' and the latter's value comes from the database:

<pre class="literal">
mysql> create table t (u int unsigned not null, col varchar(20) not null);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t values (10,'ten'), (20,'twenty'), (42,'fortytwo');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT '?`' `x`, col `y` FROM t WHERE u = 20;
+----+--------+
| x  | y      |
+----+--------+
| ?` | twenty |
+----+--------+
1 row in set (0.00 sec)

And your two queries are perfectly valid in mysql:

<pre class="literal">
mysql> create table pcre (test varchar(255));
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO `pcre` (`test`) VALUES ('In MySQL, the backtick (`) is used to quoted identifiers');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `pcre` (`test`) VALUES ('In MySQL, the backtick (`) is used to quoted identifiers, and here is another backtick: ` ...ooops');
Query OK, 1 row affected (0.00 sec)

mysql> select * from pcre;
+----------------------------------------------------------------------------------------------------+
| test                                                                                               |
+----------------------------------------------------------------------------------------------------+
| In MySQL, the backtick (`) is used to quoted identifiers                                           |
| In MySQL, the backtick (`) is used to quoted identifiers, and here is another backtick: ` ...ooops |
+----------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Of course my examples appear to be unrealistic. But they can get very real when user-provided data is sent to the database and people can cause your application to segfault (resulting in DoS) or worse. I believe that with unrealistic code this bug can allow a user to at least inject random data into the database or tamper with SELECT results.

Posted by Bart McLeod (mcleod@spaceweb.nl) on 2012-03-15T11:30:36.000+0000

Thank you very much. You just clarified a key point I missed: the fact that inside any type of quotes all other quotes lose their meaning entirely. It is obvious, but I overlooked it's importance for this function completely. I'll have to redo my homework. It will have to wait for a while, but in any case, I won't commit my patches until I have give it additional attention.

Posted by Bart McLeod (mcleod@spaceweb.nl) on 2012-03-16T22:05:29.000+0000

By changing the order by which the quoted identifiers and values are stripped, I now have your use cases - and mine - convered by the unit test. So, maybe I solved the related ZF-7911 issue.

As far as the segfault is concerned: I tested that with 9000 chars. It segfaulted. I added the Ungreedy modifier, it stopped segfaulting. But if I try with only a few more characters, like 9500, it segfaults just the same, even with the /U modifier. No sophisticated string needs to be inserted to reproduce it: I just insert 9500 times the [a] character:

<pre class="highlight">
$db = Zend_Db_Table::getDefaultAdapter();
$value = str_repeat('a', 9500);
$query = "INSERT INTO `pcre` (`test`) VALUES ('$value')";
$db->query($query);

Posted by Bart McLeod (mcleod@spaceweb.nl) on 2012-05-10T22:42:57.000+0000

I commited some changes to trunk using issue ZF-7911. This does not mean everything is fixed and fine now. However, my fix adds a test that should cover what has been asked for here and in ZF-7911.

Today I did another commit with a slightly changed regex. While the old regex would segfault near 9,500 characters, the new regex segfaults near 65,000 characters. It's still not good, but we are improving.

I also want to point everyone at this new bugreport in the PCRE library, there is some usefull information in there, that we might be able to use in order to get regular errors from PCRE instead of segfaults and to allow us to increase the memory limit of PCRE based on the memory limit we set for php:

http://bugs.exim.org/show_bug.cgi?id=1233

I would appreciate help from anyone who has experience compiling php. What I would like to do is change it so that: - PCRE uses heap instead of stack - PCRE is configured with a memory limit based on the memory limit of php, so that if we increase the memory limit for php, we also increase the memory limit for PCRE and so that we get regular errors instead of segfaults.

Posted by Bart McLeod (mcleod@spaceweb.nl) on 2012-05-16T22:15:43.000+0000

I committed a new regex a while ago, that passes all unit tests added in ZF-7911. With this regex, you can use a query of around 65,000 characters if you use the default settings of php.

In addition, by asking the PCRE developers, I found there is a setting you can put in php.ini. pcre.recursion_limit and pcre.backtrack_limit

I experimented with those, and I found that if I set the pcre.recursion_limit to 1000, which is a very low value, compared to the default of 100k, it 'just works' (all the unit tests pass). You get a regular memory_limit exceeded php error if you use too long a query, but you can increase the memory_limit until you get a MySQL has gone away error (because the maximum packet size was exceeded). In either case, it is no longer Zend_Db_Statement that is the problem. I could insert 100k utf8 characters before MySQL gave up.

See also: http://bugs.exim.org/show_bug.cgi?id=1233

If you still experience problems even with the new regex and tweaked settings, feel free to reopen this issue.

Note that if you set the recursion limit to low, to 10 for example, the resulting replacement string will be empty and you will have no query anymore. In that case, the unit tests will not pass.

Have you found an issue?

See the Overview section for more details.

Copyright

© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.

Contacts