Issues

ZF-7911: Zend_Db_Statement::_stripQuoted seems not to be complete

Description

While investigating http://framework.zend.com/issues/browse/ZF-5063, I noted that there were some inconsistencies in Zend_Db_Statement::_stripQuoted. I'll repeat what I've mailed to the list:

Last week we ran in an issue with a segfault caused by the preg_replace statements in Zend_Db_Statements::_stripQuoted (issues ZF-5063[1] and ZF-7585[2]). While trying to find a work-around, I discovered that this function is broken. The fix for issue ZF-3025[3] seems to be applied wrong (r9727).

The fix for my issue could be to modify the regular expression. Instead of the repetition, my replacement relies on assertions. During initial testing (running a 1MB query) it seems that this would not rely on the stack too much, reducing the chance of a segfault. This would need to be tested further. In the patch below, I restored the original replacement of quoted identifiers that was lost in r9727.

However, there are still some issues: - In my regexp, I assume that a quote can be escaped by repeating the quote or by prefixing it with a backslash. A backslash itself can be escaped by repeating it. I'm not sure if these are safe assumptions for all RDBMS's. - The patch should be tested thoroughly, although I'm quite confident about it. - MySQL accepts both single and double quotes for values. This is not accounted for, nor was it in the old version. I'm not sure how this is for other RDBMS's. Would it be safe to strip out all quote style: single ('), double (") and backtick (`)? I could imagine that for most systems this would be OK. Individual adapters could always override the default function and provide their own.

Comments

Initial patch, doesn't cover all problems mentioned in the issue.

Both issues are related to this function. Fixing this issue should also fix these.

Lets work together on this. I need to see some unit tests to understand what use cases are triggering the bad behavior.

The other issue described I feel like might be a PHP problem itself. Lets discuss in #zftalk.dev when you're available.

First a specification: The function should strip all locations where prepared statement parameters (? and :name) should be ignored. The obvious places where this could occur are within strings and identifiers. Identifiers with strange characters are usually also quoted. The quotes used can vary per SQL dialect. Common quotes include 'single' and "double" quotes and backticks. To include the quote character in a string, they usally can be escaped. This can be done by repeating the quote or by prefixing it with an escape character. This escape character is usually a backslash ().

h4. Some research: for MySQL: - identifiers can be escaped with backticks, escape can only be done by doubling the backtick. A backslash is not treated as a specials character. - strings should be escaped with single or double quotes, escaping can be done by doubling and escaping with a backslash

common other dialects (SQL-92?): - identifiers can be escaped with double quotes, escaping is done by doubling - strings should be escaped with single quotes, escaping is done by doubling

h4. Assumptions I think it would be save to generalize in saying that string enclosed in either single quotes, double quotes and backticks are to be stripped. A doubled quote character, or a character prefixed with a backslash should not be considered the ending quote. What happens if a dialect doesn't support a quoting style?

  • If doubling is not a valid escape, the whole sequence would be invalid, but stripped not the less: ['a''b']
  • If a backslash is not a valid escape, it will just be stripped. Again, it may fail for incorrect queries: ['a\'b']
  • If a specific quoting style is not supported, it should not appear in the query.

h4. Test case I've taken a quick look at the unit tests, but unfortunately can not spend enough time to create them. I'll provide some information and test cases so that someone else can maybe create unit tests for them.

This test set is as it would work for MySQL. Other RDBMS might have other conventions. To avoid confusion, I'm enclosing the strings in [] brackets.

in: [SELECT * FROM table]
out: [SELECT * FROM table]

in: [SELECT * FROM `table`]
out: [SELECT * FROM ]

in: [SELECT * FROM `strange:table`]
out: [SELECT * FROM ]

in: [SELECT * FROM `strange`table`]
out: [SELECT * FROM table`] (bad query)

in: [SELECT * FROM `strange``table`]
out: [SELECT * FROM ]

in: [SELECT * FROM `strange
table`]
out: [SELECT * FROM table`] (bad query)

in: [SELECT * FROM `strange\`table`]
out: [SELECT * FROM ]

in: [SELECT * FROM `strange\``table`]
out: [SELECT * FROM table`] (bad query)

in: [SELECT * FROM `strange\
table`] out: [SELECT * FROM ] in: [SELECT 'value' AS identifier] out: [SELECT AS identifier] in: [SELECT 'strange:value' AS identifier] out: [SELECT AS identifier] in: [SELECT 'strange'value' AS identifier] out: [SELECT value' AS identifier] (bad query) in: [SELECT 'strange''value' AS identifier] out: [SELECT AS identifier] in: [SELECT 'strange'''value' AS identifier] out: [SELECT value' AS identifier] (bad query) in: [SELECT 'strange\'value' AS identifier] out: [SELECT AS identifier] in: [SELECT 'strange\''value' AS identifier] out: [SELECT value' AS identifier] (bad query) in: [SELECT 'strange'''value' AS identifier] out: [SELECT AS identifier] in: [SELECT "value" AS identifier] out: [SELECT AS identifier] in: [SELECT "strange:value" AS identifier] out: [SELECT AS identifier] in: [SELECT "strange"value" AS identifier] out: [SELECT value" AS identifier] (bad query) in: [SELECT "strange""value" AS identifier] out: [SELECT AS identifier] in: [SELECT "strange"""value" AS identifier] out: [SELECT value" AS identifier] (bad query) in: [SELECT "strange\"value" AS identifier] out: [SELECT AS identifier] in: [SELECT "strange\""value" AS identifier] out: [SELECT value" AS identifier] (bad query) in: [SELECT "strange"""value" AS identifier] out: [SELECT AS identifier]

h4. References

http://dev.mysql.com/doc/refman/… http://dev.mysql.com/doc/refman/… http://publib.boulder.ibm.com/infocenter/rbhelp/… http://msdn.microsoft.com/en-us/library/ms179899(SQL.90).aspx

Modified and improved patch; may not solve all issues, but completes the majority. (Patch from xenforo.)

As far as I can tell, the patch was never committed. Is there any particular reason?

Initially, I think this was not committed because there are no unit tests.

A couple of observations: * The patch attached by Matthew does not handle the case where values are quoted using double-quotes. Does that need to be added? * The patch attached by Matthew makes the assumption that backslash is only escaping method. Is that a safe assumption? * It appears that this case is not handled properly:


IN: SELECT 'strange\'''value' AS identifier
OUT: SELECT AS identifier

My expectation is that it would behave the same as:


IN: SELECT 'strange\'value' AS identifier
OUT: SELECT AS identifier

as it also has an even number of unescaped quotes, however with Matthew's patch applied it works like this:


IN: SELECT 'strange\'''value' AS identifier
OUT: SELECT value' AS identifier

I've attached a patch with a first try at implementing a unit test to cover the "Test Case" part of Vincent's 30/Sep/09 post.

See my comment in ZF-5063. It's MySQL-specific, though.

I am building a unit test using the input from Vincent de Lau, also using his research. I think that since he states that a backtick in MySQL can only be escaped with a backtick, the following example input / output pair is wrong:


in: [SELECT * FROM `strange\`table`]
out: [SELECT * FROM ]

I would expect:


in: [SELECT * FROM `strange\`table`]
out: [SELECT * FROM `strange\]

because only the last pair of backticks matches. The backslash in this case has no special meaning.

But of course, I can be wrong. What I currently get when unit testing is:


in: [SELECT * FROM `strange\`table`]
out: [SELECT * FROM table`]

This probably makes even more sense, since if we do not consider the backtick a special character, the first two backticks match and the last one doesn't.

A similar thing happens with the next example:


in: [SELECT * FROM `strange\``table`]
out: [SELECT * FROM table`]

Since now we have a real backtick escape sequence (), the second and third backtick that form the sequence should not be considered and so the first and last backticks match, so the input / output would be: <pre class="highlight"><code> in: [SELECT * FROM `strange\table] out: [SELECT * FROM] </code></pre> Maybe the 4th and 5th example are just accidentally swapped, yet the sixth example shows a similar behavior, the current implementation does this, contrary to the example given, and it makes sense: <pre class="highlight"><code> in: [SELECT * FROMstrange\table] out: [SELECT * FROM table] ``` The fourth backtick in this case isn't escaped, so it matches the first and the last one remains.

I finally ended up with these expected input / output scenario's and I had to change quite a bit inside the function to make these all pass:


in: [SELECT * FROM `strange`table1`]
out: [SELECT * FROM table1`]
    
in: [SELECT * FROM `strange``table2`]
out: [SELECT * FROM ]
    
in: [SELECT * FROM `strange

table3] out: [SELECT * FROM table3]

in: [SELECT * FROM strange\table4] out: [SELECT * FROM table4]

in: [SELECT * FROM strange\``table5] out: [SELECT * FROM ]

in: [SELECT * FROM strange\<pre class="highlight"><code>table6] out: [SELECT * FROM table6`]

in: [SELECT 'value7' AS identifier] out: [SELECT AS identifier]

in: [SELECT 'strange:value8' AS identifier] out: [SELECT AS identifier]

in: [SELECT 'strange'value9' AS identifier] out: [SELECT value9' AS identifier]

in: [SELECT 'strange''value10' AS identifier] out: [SELECT AS identifier]

in: [SELECT 'strange'''value11' AS identifier] out: [SELECT value11' AS identifier]

in: [SELECT 'strange\'value12' AS identifier] out: [SELECT AS identifier]

in: [SELECT 'strange\''value13' AS identifier] out: [SELECT value13' AS identifier]

in: [SELECT 'strange'''value14' AS identifier] out: [SELECT value14' AS identifier]

in: [SELECT "value15" AS identifier] out: [SELECT AS identifier]

in: [SELECT "strange:value16" AS identifier] out: [SELECT AS identifier]

in: [SELECT "strange"value17" AS identifier] out: [SELECT value17" AS identifier]

in: [SELECT "strange""value18" AS identifier] out: [SELECT AS identifier]

in: [SELECT "strange"""value19" AS identifier] out: [SELECT value19" AS identifier]

in: [SELECT "strange\"value20" AS identifier] out: [SELECT AS identifier]

in: [SELECT "strange\""value21" AS identifier] out: [SELECT value21" AS identifier]

in: [SELECT "strange"""value22" AS identifier] out: [SELECT value22" AS identifier]

Added a test case, not yet in the patch, but this passes as well:


in: [SELECT 'strange\'''value23' AS identifier]
out: [SELECT  AS identifier]

Added a patch that adds most recent unit test and hopefully better implementation. It still segfaults, but this should be complete. Please review.

I committed the patch that contains all tests proposed here and updated code for _stripQuoted that satisfies all of those tests. The seqfault issue still exists above 9,000 characters. I will try to get this resolved using issue ZF-5063.