(Un)Support for prepared statements in Ruby on Rails

One of the many security problems generally found in web applications is the so-called SQL injection bug. If one isn’t careful, then an attacker can make the web application execute arbitrary SQL statement. There are two ways to prevent SQL injection bugs:

  1. Escape input before putting it into the final SQL statement. Ruby on Rails will automatically do this for you in most cases.
  2. Use prepared statements. In this scheme, one sends a statement containing placeholders to the DBMS. Then one binds arguments to the placeholders. Finally, one executes the prepared statements (with the bound arguments). The second and third phase can be repeated indefinitely, thus reusing the same prepared statement. Here is an example in pseudo code:
    Ruby
    1. # The ‘?’ is a placeholder
    2. statement = db.prepare("SELECT * FROM foo WHERE bread = ? AND cake = ?")
    3.  
    4. # Binds the first ‘?’ to "foo", and the second ‘?’ to "no’ SQL injection possible", then
    5. # executes the statement. This has the same effect as running:
    6. #
    7. #    SELECT * FROM foo WHERE brad = ‘foo’ AND cake = ‘no’ SQL injection possible’
    8. #
    9. # (Note that the string escape character is database-specific, this is only an example.)
    10. statement.execute("foo", "no’ SQL injection possible")
    11.  
    12. # We reuse the same prepared statement and execute it multiple times, with
    13. # different parameters.
    14. statement.execute("hello", "world")
    15. statement.execute("apple", "pear")
    16. statement.execute("a", "b")

    Using prepared statements is generally preferred. It looks better and one cannot forget to escape an argument. Many DBMSes will also compile and optimize prepared statements internally, so if you want to execute many similar statements, then using prepared statements will give you a performance boost. Databases like Oracle are heavily dependent on prepared statements, and will perform badly if you don’t use them.

Unforunately Ruby on Rails’s does not support prepared statements at all. People have blogged about this in the past. A quote from Greg Luck:

We have two production applications running on Ruby. And how is it. Well, despite being perhaps no more than 5% of the functionality of our applications, Ruby on Rails is the number one consumer of Oracle CPU and logical gets. Why? Rails does not support prepared statements, so Oracle has to reparse every time.

EDIT: someone told me that this has been fixed in Rails 1.2, by using the “cursor=similar” option or something.

This is a shame. Because it doesn’t only cause problems for the database, but also for Rails itself. If an argument is large (for example, a large binary blob), then Rails will use a lot of memory trying to escape the string. Some people have been bitten by this before. Using prepared statements will solve a lot of problems, so I began working on this.

How does ActiveRecord work right now?

Here is a diagram of ActiveRecord’s current architecture:
ActiveRecord’s current architecture
(SVG version)

A Rails application only interfaces with ActiveRecord::Base (and other public ActiveRecord classes). Internally, ActiveRecord::Base aggregates an AbstractAdapter object (in the connection field), which abstracts away database-specific code. An AbstractAdapter is an abstract class. For example, PostgreSQLAdapter is one of its child classes, and implements PostgreSQL support. Each AbstractAdapter, in turn, aggregates an ‘raw connection’ object, which is a Ruby class that directly interacts with the database server.
An AbstractAdapter implements multiple interfaces, such as DatabaseStatements (for executing database statements) and Quoting (for escaping arguments for use in an SQL statement). Each AbstractAdapter child class has a different implementation for these interfaces.

ActiveRecord::Base methods, such as find(), work by constructing an SQL query, which they then execute. For example, the find() method can be described in pseudo code, as follows:

Ruby
  1. def find(options)
  2.     sql = ""
  3.     # Appends "SELECT #{options[:select]} FROM #{table_name}" to _sql_
  4.     create_select(sql, options)
  5.     # Appends "WHERE a = ‘b’ AND c = ‘d’" to _sql_
  6.     create_condition(sql, options)
  7.     …
  8.     # Appends "GROUP BY #{options[:group]}" to _sql_
  9.     create_group_by(sql, options)
  10.  
  11.     # Now we have a full SQL statement.
  12.     return @connection.select_all(sql)
  13. end

If we look at create_condition(), we see something like this:

Ruby
  1. def create_condition(sql, options)
  2.     sql < < "WHERE #{sanitize_sql(options[:conditions])}"
  3. end

The sanitize_sql function is an important function ActiveRecord’s SQL statement building process. Its job is to bind parameters into a part of an SQL statement (and escaping strings where necessary). For example, all of these:

Ruby
  1. sanitize_sql(["a = ? AND b = ?", "hello", "world"])
  2. sanitize_sql(["a = %s AND b = %s", "hello", "world"])
  3. sanitize_sql({:a => "hello", b => "world"})

return "a = 'hello' AND b = 'world'".

Adding support for prepared statements

Most databases support prepared statements, but I have heard that not all do. Furthermore, some Ruby database modules have no support for prepared statements! Prior to version 0.7, the Ruby-PostgreSQL module doesn't support prepared statements at all. So my design will have to take this into account.

The new architecture will look as follows:
ActiveRecord with prepared statements
I have introduced 3 new classes: StatementBuilder (an abstract base class), StandardStatementBuilder and PreparedStatementBuilder. StatementBuilder allows one to build an SQL statement incrementally, possibly with arguments. It is supposed to replace ActiveRecord::Base's sanitize_sql() function.
StandardStatementBuilder is almost exactly the same as sanitize_sql, but in a different API. One can retrieve the final SQL statement (with bounded arguments) by calling statement().
PreparedStatementBuilder is similar, but saves the SQL statement (with placeholders) and the arguments in different fields. Separating the SQL statement and its arguments is required for database adapters that support prepared statements.

AbstractAdapter has gained a new method, new_statement(). This is a factory method which creates a new StatementBuilder. Depending on the specific AbstractAdapter child class (which may or may not support prepared statements), this method may create a StandardStatementBuilder, PreparedStatementBuilder, or some other StatementBuilder class. The default implementation of this method creates a StandardStatementBuilder, but MySQLAdapter will create a PreparedStatementBuilder if the underlying Ruby Mysql module supports prepared statements.

ActiveRecord::Base uses AbstractAdapter.new_connection() to create a StatementBuilder. It then uses that StatementBuilder object to build a complete SQL statement. Finally, it passes the StatementBuilder object to one of the DatabaseStatement methods for execution. Notice that the DatabaseStatement methods now accept both a StatementBuilder object and a String object. The latter is for backwards-compatibility.

The DatabaseStatement methods are implemented by the (concrete) AbstractAdapter child classes, so it goes without saying that the code for using StatementBuilder will go there.

Here's an example in pseudo code which demonstrates StatementBuilder's usage in ActiveRecord::Base:

Ruby
  1. def find(options)
  2.     statement = connection.new_statement(self.class.table_name)
  3.     create_select(statement, options)
  4.     create_condition(statement, options)
  5.     ...
  6.     return @connection.select_all(statement)
  7. end
  8.  
  9. def create_condition(statement, options)
  10.     statement.add_fragment(options[:conditions])
  11. end

And MySQLAdapter's execute() method (which executes an SQL statement) looks a bit like this:

Ruby
  1. def execute(statement)
  2.     if statement.is_a?(PreparedStatementBuilder)
  3.         st = @driver.prepare(statement.statement)
  4.         st.execute(*statement.arguments)
  5.         return st.result
  6.     else
  7.         # We call 'to_s' here because _statement_ can either be a StandardStatement
  8.         # or a String. This will ensure that we get the SQL statement string.
  9.         # StandardStatement.to_s() is an alias for statement()
  10.         return @driver.execute(statement.to_s)
  11.     end
  12. end

Incidentally, this architecture makes ActiveRecord::Base cleaner, since a lot of code has been moved to separate classes.

But where is the patch?

The patch isn't finished yet. At the moment, only the MySQL adapter supports prepared statements. In ActiveRecord::Base, only the create() method uses StatementBuilder - everything else still builds SQL statements with sanitize_sql(). It passes all MySQL unit tests though.

I'm posting this story on my blog so that the Ruby on Rails core team can review my work, and see whether they accept this architecture. I've submitted an email to the rubyonrails-core mailing list.

I'm also working on prepared statements support in the PostgreSQL adapter, but unfortunately this adapter has many bugs, so it fails the unit tests even without my changes. There is a patch which fixes a lot of bugs in this adapter, and it's two months old, but it hasn't been applied or even reviewed yet. I can't finish PostgreSQL support until this patch has been accepted.

FAQ

"Is there any reason why we *need* prepared statements?"

There is at least one such reason: handling binary data in select queries in PostgreSQL. In SELECT queries in ActiveRecord, binary strings cannot ("cannot", not "are not") be escaped properly, and the whole query fails. I was bitten by this 2 weeks ago, and it forced me to use non-binary columns.

"But am I to understand StatementBuilder code is going to
replace (cleanup) all the string hacks currently involved in building SQL
queries? Will the usage of prepared statements in MySQL/PostgreSQL become
implicit with this?"

The use of prepared statements in MySQL/PostgreSQL will
completely replace the argument escaping stuff. Database adapters that
don't support prepared statements (or haven't implemented support yet)
will silently fallback to argument escaping, as is done now.

"Will compiled prepared statements be reused?"

Yes, at least in the MySQL adapter. I wrote a cache class so that other adapters can easily implement caching of prepared statements. The cache has a maximum size and will remove items based on least-recently-used, similar to how some CPU caches work.
However, in MySQL, caching prepared statements doesn't give you much of a performance boost. When running the ActiveRecord unit tests, the cache hit rate is 65%, but resulted in no noticeable performance boost (or loss).

"Will querying with prepared statements be slower?"

On Oracle, no, or so I have heard. Even if one doesn't cache prepared statements, Oracle will be much faster.
On MySQL, yes. Running the ActiveRecord unit tests is approximately 40% slower when using prepared statements. However, I added an optimization to the MySQL adapter. It will fallback to not using prepared statements if it detects that no argument is larger than 32 KB. This way, I have entirely eliminted the performance penalty in the average case, while reducing Ruby memory usage in case one of the arguments is a large binary blob.

20 Comments »

  1. Hongli said,

    May 14, 2007 @ 7:06 pm

    Note to self: talk to ‘tpope’ on IRC, he can help me get the patch accepted.

  2. michel v said,

    May 15, 2007 @ 1:06 pm

    While we wait for the patch to hopefully land on the tree, do you think there is any way to somewhat refactor the MySQL-specific patches into a readily usable plugin?

  3. Hongli said,

    May 15, 2007 @ 5:15 pm

    It is possible, but would take some time. It changes a lot of ActiveRecord internals so the plugin will be ugly.

  4. Wes Gamble said,

    May 22, 2007 @ 6:32 am

    Looks great! I’ve been concerned about prepared statements in ActiveRecord for a long time now! I was hoping to be able to work on adding support some day, but have not had any time. Your approach looks great. What can I do to help?

    Adding prepared statement support to Rails is like a complete no-brainer. It’s hard to understand when I see that people seem to push back on it.

    Wes

  5. Hongli said,

    May 22, 2007 @ 9:30 am

    Hello Wes. Thank you for your offer to help. :) Actually there is something that you could help me with. The people on rubyonrails-core seem to be a bit sceptical about the advantages of prepared statements, so I’m looking for benchmarks which show in which situations prepared benchmarks will give a noticeable performance improvement. Everybody outside rubyonrails-core seem to say that prepared statements are Good and Good For Performance, but so far I couldn’t find any hard evidence at all.
    I wrote a little test script which inserts 300.000 rows into MySQL, and the version with prepared statements performed the same as the version without prepared statements. So in this simple case, there was no performance improvement. Someone pointed out that performance may improve if the SQL statement is complex (e.g. has many table joins).

    Could you try to find such a benchmark, or, if you can’t find one, to create one yourself? I’m mostly interested in MySQL and PostgreSQL benchmarks. I would greatly appreciated if you can do that, thanks. :)

  6. Daniel Berger said,

    May 22, 2007 @ 5:36 pm

    Here’s a great article from Oracle you’ll want to look at that discusses bind parameters and Rails:

    http://www.oracle.com/technology/pub/articles/mearelli-optimizing-oracle-rails.html

    As for MySQL, Eric Hodel tells me that the MySQL server clears the query cache as soon as you run an INSERT or DELETE, which makes it pretty useless IMHO.

  7. doug livesey said,

    May 24, 2007 @ 8:30 am

    Hi — is there any plan to support the entire suite of DBs supported by Rails?
    I was a very bad man in an earlier life, so I have to work with SQL Server, and have had to deal with all sorts of little peculiarities — particularly in migrations.
    Anyway, this looks fantastic!
    Cheers,
    Doug.

  8. Hongli said,

    May 24, 2007 @ 8:59 am

    doug livesey: I would like to, but I don’t have access to anything but MySQL and PostgreSQL (and perhaps SQLite).

  9. Charles Oliver Nutter said,

    June 1, 2007 @ 7:28 pm

    Hey were were talking on #jruby today about how to support prepared statements, and wham, someone pointed us to this blog post. This is exactly what we had in mind–feeding fragments to a statement builder, composing the actual statements later, possibly using a prepared statement (JIT style I’d guess, since we don’t want to compile every random query that comes through). This is great work…and there’s a new wrinkle I’d like to toss in.

    It would be trivial for us to use prepared statements in the backend code of the ActiveRecord JDBC adapter, since the API for such is exactly the same all the time…and almost identical to the API for normal statements. If you’ve got something working for MySQL, it may be possible to get it running for all databases in AR-JDBC in a very short time. I think that would be an excellent way to try this out across the board and see how well it works.

    If you get a chance, please contact me or join the JRuby or JRuby-extras (rubyforge) mailing lists…we could probably come up with something really cool.

  10. Matadon said,

    July 23, 2007 @ 10:51 pm

    This is exactly the sort of thing I had in mind, although more with bound variables than prepared statements. I use PostgreSQL as a backend, and Rails’ performance, especially with large JOINs and big chunks of data, leaves quite a bit to be desired. If you want a hand in testing and/or coding for Postgres, let me know!

  11. Ned Wolpert said,

    December 8, 2007 @ 7:26 pm

    What is the latest status of prepared statements and ActiveRecord? Is it still in patch form? Could the patch be turned into a plugin?

    Need any help?

  12. Hongli said,

    December 8, 2007 @ 8:02 pm

    I haven’t worked on this for a while now. But the source code is available on the following SVN repostory: http://public.railsplugins.net/repos/prepared-statements/trunk/
    This code is based on Rails revision 6737 (about 6 months ago).

  13. Ned Wolpert said,

    December 9, 2007 @ 12:29 am

    Do you think its possible to transform the patch into a plugin? The repo has tons of stuff in it… I assume only part of the activerecord side needs to occur. I’m willing to try and create a plugin against Rails 2.0 codebase… though I could see keeping it up to date with activerecord changes may be… interesting.

  14. Hongli said,

    December 9, 2007 @ 1:03 am

    Yes it’s possible. So far I’ve resisted the idea though: my patch changes a lot of ActiveRecord internals. Plugin-izing all this will mean overwriting a lot of private methods. If the Rails team changes even one line of code to ActiveRecord then it’s likely that the plugin will break.

    You can see exactly what has changed by comparing http://public.railsplugins.net/repos/prepared-statements/trunk/ with http://public.railsplugins.net/repos/prepared-statements/vendor/rails-r6737/
    The code will probably have to be updated in order to be Rails 2.0 compatible.

  15. Mike M said,

    June 25, 2008 @ 6:51 pm

    Hi,

    What is the status of the patch?

    Mike

  16. Wes Gamble said,

    December 1, 2008 @ 9:06 pm

    Any more news on this patch? Still very interested…

  17. Hongli said,

    December 1, 2008 @ 9:23 pm

    I’ve given up on this patch. There are not enough advantages to make it worthwhile.

  18. Barath said,

    April 29, 2009 @ 1:52 pm

    Hi,

    I am curious about your final comment. Could you make clear the reason why you think there is not enough performance advantages. Did you try to ‘benchmark’ your changes with a database/driver that caches prepared statements.

    .

    Barath

  19. Sam Homer said,

    March 7, 2010 @ 3:01 pm

    The DB2 database adapter for Rails 3 supports prepared statements:
    http://www.ibm.com/developerworks/data/library/techarticle/dm-1001db2rubyonrailsapps/

    Now if only somebody could add this feature to the PostgreSQL adapter as well. To me the main reason to use prepared statements is not performance but security. In addition, the very nature of a rails app makes it a prime candidate for prepared statements: Rails basically fills templates with live data, and every template invocation calls the same queries, only with different parameters.

  20. Joe Van Dyk said,

    February 4, 2011 @ 7:18 pm

    BTW, Rails 3.1 supports prepared statements for find_one(id) on postgresql! I would love to see more support for it.

RSS feed for comments on this post · TrackBack URI

Leave a Comment