Issues

ZF-220: PostgreSQL: Zend_Db cannot bind PHP array for column of array SQL datatype

Description

The array syntax for a sql array is '{1,2,3}'::int[] for example. Zend_Db cannot properly bind to an inline array from a php array.

Current workaround is to avoid the prepared statement route and implode the array concatenating the query.

This is confirmed to affect operation with PGSQL, though I suspect it may affect all drivers.

:bindname::int[] results in 'array value must start with "{" or dimension information' error from the db. adding {:bindname}::int[] will result in a problem preparing the statement.

This may be a problem / limitation with the underlying pdo implementation.

Comments

Changing fix version to 0.8.0.

Arrays are not SQL datatypes. This is a PostgreSQL proprietary datatype.

InterBase/Firebird also supports a multidimensional array datatype, but does not support SQL syntax for literal values. Those products supply an API for converting structured application-level arrays into the special type of BLOB in the RDBMS that stores arrays.

Oracle has a VARRAY datatype. One can use this to make a user-defined datatype for a column that contains multiple values. But the syntax for specifying a literal value of that column is different than the syntax used by PostgreSQL.

Microsoft SQL Server, MySQL, and SQLite do not support an array or any other sort of multi-value datatype.

In general, multi-value datatypes are eschewed by the first normal form of relational database design.

The syntax is highly useful in sequence generation and the repeatable ordering of lists. It's also highly useful for performing IN/ANY predicate matching.

My specific use is for use with the generate_series() postgresql function.

Observe.

=# SELECT ('{a,b,c}'::varchar[])[i] AS demo FROM generate_series(1,3) s(i);

demo

a b c (3 rows)

Fixing this would be highly useful.

What happens if you do the following:


$db->query('SELECT (:bindname::varchar[])[i] AS demo FROM generate_series(1,3)',
    array('bindname' => '{' . implode(',', $phpArray) . '}' ));

The other choice would be to implement new code so that the PostgreSQL adapter recognizes arrays:


$db->query('SELECT (:bindname::varchar[])[i] AS demo FROM generate_series(1,3)',
    array('bindname' => (array) $phpArray));

And generates the '{a,b,c}' value. Ideally it should also generate the suffix ::varchar[] portion, but it would be hard for the adapter to infer the right SQL type to use.

The PostgreSQL JDBC driver solves this by requiring the user to pass an object of the class java.sql.Array, containing a collection of objects which the driver can examine and determine their class.

This may not be easy to do in PHP, since PHP is very loosely typed. How can the driver tell the correct SQL datatype to use? One option is to always assume it's ::varchar[], and let the SQL engine do type conversions if necessary.

Things get even more complicated if we want to support multi-dimensional arrays. Or PHP associative arrays.

I think this is an 80/20 rule thing. Single dimensional arrays of numbers or varchars should be easy to do and cover 80%+ of uses.

Your first idea does actually work.

    public function testFunction() {
      $phpArray = array('a','b','c');
      $ret = $this->_db->query('SELECT (:bindname::varchar[])[i] AS demo FROM generate_series(1,3) s(i)', array('bindname' => '{' . implode(',', $phpArra

y) . '}' )); print_r(iterator_to_array($ret)); }

Array ( [0] => Array ( [demo] => a )

[1] => Array
    (
        [demo] => b
    )

[2] => Array
    (
        [demo] => c
    )

)

(Note the s(i) is important)

The latter is however what I'm looking for. It should either bind natively to int[] or varchar[], depending if the values in the array are numbers... defaulting to varchar if not... (might be a good case to add a cidr catch too)

We could use isInt from filter to do a quick check on the array and determine if any of the values arent numbers.

ideally the syntax would be

SELECT (:bindname)[i] AS demo FROM generate_series(1,3) s(i)

and bind in 'bindname'=>(array) $phpArray;

a implicit ::type shouldnt fail it, and if you provide a int and :: cast it, pgsql should be smart enough to do that convert.

I'm most concerned however, with each element being properly escaped, which I'm not sure any of these solutions address yet.

K

In proper server-side parameterized queries, escaping is not necessary. The value should be sent to the server separately from the SQL query, and there is no opportunity for SQL injection. Malicious values cannot go beyond the bounds of a quoted string, because the bound value is combined into a pre-parsed representation of the SQL query (probably some kind of pseudo-compiled opcodes) internal to the RDBMS. That's what is so nice about using bound parameters to protect against SQL injection.

However, this mechanism is spoiled if the client simulates parameter binding by interpolating values into the SQL string before sending it to the server.

Anyway, back to arrays.

Regarding usage of isint(), we cannot always infer that the array should be an array of int[] just because the array elements consist of strings of digits.

For example, what if am inserting values to a column that is an array of dates, and I give it a PHP array of strings in 'YYYYMMDD' format. This is a valid date literal format in PostgreSQL -- if it's enclosed in quotes. But if Zend_Db infers that digits mean it's an integer, this would construct '{a,b,c}'::int[] instead of '{a,b,c}'::date[].

The 80/20 rule assumes that the 20% that isn't supported fails gracefully, but the above has the potential to give a very confusing error message, or even to insert wrong data.

Instead of an array then, why not create an array wrapping class that has an explicit type setting method. ArrayObject extended would probably do the trick.. Zend_Bound_Object... ?

Other than that I'm not sure there's a good solution with php's lack of strong typing.

Since this is a solution specific to PostgreSQL, I would expect the class to be something like Zend_Db_Adapter_Pdo_Pgsql_ArrayValue.

Regarding the escaping issue, now that I think about it, you may be right that it's an issue, because it must be interpolated into the SQL string in the client side. That is, unless PDO also understands how to send an ArrayObject to the PostgreSQL server, which I doubt.

This means that the ZF PostgreSQL driver needs to duplicate the parsing logic in PDO to handle both positional and named parameters. One might look at the PostgreSQL JDBC driver code to see how they do it.

To be frank, this is a nontrivial implementation for an obscure use case specific to an RDBMS that is not as widely used as others supported by ZF. Unless you write this class and the patch to use it, it's going to be prioritized very low, and it's not likely to get done by ZF 1.0.

I'll see if I can talk to Wez about it getting into core PDO.

I talked to some of the PGSQL project guys and they seem to imply that the C API has all the mechanisms for binding an array to a prepared query.

The PDO PGSQL driver is going to need some work to support PGSQL 8.2 (especially regarding INSERT RETURNING syntax) so maybe I can get this piggy-backed there.

If not, I'm happy to help on a patch, but I don't today have a really strong handle on how Zend_Db* works internally today.

K

Please categorize/fix as needed.

Assigning to Ralph since he maintains Zend_Db now. Ralph, please evaluate whether this is an issue we will address.

I am inclined to say that this should be a feature request for either the PDO PGSQL driver or the next version of PDO (whatever its called).

What do you think Kevin?

If not, can we close out?

-ralph

I guess it comes down to whether the zfw adapter is adding value to PDO or just repackaging it. If its the former, the feature request should stay open such that we can improve on the core infrastructure. If the latter, sure, close the ticket.

Unassigning myself, postponed till 2.0 development.

Reassigned to component maintainer