Archive for May, 2007

RailsPlugins.net launched

I found it cumbersome to search the net for Ruby on Rails plugins that I need, every time I start a new Rails project. Today I launched a new website:


www.railsplugins.net

This is an index of Ruby on Rails plugins. There is nothing on it yet, so if you have a plugin, please submit it. :) This website was written in less than 24 hours in Ruby on Rails.

As with every index site, you can rate each plugin and post comments. This website also has a unique feature: favorites. Each plugin that you rate is automatically added to your favorites list (under the assumption that one only rates plugins he/she uses). The plugins that you have given the highest ratings will show up on top of your favorites list. So if there are plugins that you use over and over again, give them a high rating and you can easily find them by going to railsplugins.net.

Comments (4)

(Un)Support for prepared statements in Ruby on Rails

One of the many security problems generally found in web applications is the so-called SQL injection bug. If one isn’t careful, then an attacker can make the web application execute arbitrary SQL statement. There are two ways to prevent SQL injection bugs:

  1. Escape input before putting it into the final SQL statement. Ruby on Rails will automatically do this for you in most cases.
  2. Use prepared statements. In this scheme, one sends a statement containing placeholders to the DBMS. Then one binds arguments to the placeholders. Finally, one executes the prepared statements (with the bound arguments). The second and third phase can be repeated indefinitely, thus reusing the same prepared statement. Here is an example in pseudo code:
    Ruby
    1. # The ‘?’ is a placeholder
    2. statement = db.prepare("SELECT * FROM foo WHERE bread = ? AND cake = ?")
    3.  
    4. # Binds the first ‘?’ to "foo", and the second ‘?’ to "no’ SQL injection possible", then
    5. # executes the statement. This has the same effect as running:
    6. #
    7. #    SELECT * FROM foo WHERE brad = ‘foo’ AND cake = ‘no’ SQL injection possible’
    8. #
    9. # (Note that the string escape character is database-specific, this is only an example.)
    10. statement.execute("foo", "no’ SQL injection possible")
    11.  
    12. # We reuse the same prepared statement and execute it multiple times, with
    13. # different parameters.
    14. statement.execute("hello", "world")
    15. statement.execute("apple", "pear")
    16. statement.execute("a", "b")

    Using prepared statements is generally preferred. It looks better and one cannot forget to escape an argument. Many DBMSes will also compile and optimize prepared statements internally, so if you want to execute many similar statements, then using prepared statements will give you a performance boost. Databases like Oracle are heavily dependent on prepared statements, and will perform badly if you don’t use them.

Unforunately Ruby on Rails’s does not support prepared statements at all. People have blogged about this in the past. A quote from Greg Luck:

We have two production applications running on Ruby. And how is it. Well, despite being perhaps no more than 5% of the functionality of our applications, Ruby on Rails is the number one consumer of Oracle CPU and logical gets. Why? Rails does not support prepared statements, so Oracle has to reparse every time.

EDIT: someone told me that this has been fixed in Rails 1.2, by using the “cursor=similar” option or something.

This is a shame. Because it doesn’t only cause problems for the database, but also for Rails itself. If an argument is large (for example, a large binary blob), then Rails will use a lot of memory trying to escape the string. Some people have been bitten by this before. Using prepared statements will solve a lot of problems, so I began working on this.

How does ActiveRecord work right now?

Here is a diagram of ActiveRecord’s current architecture:
ActiveRecord’s current architecture
(SVG version)

A Rails application only interfaces with ActiveRecord::Base (and other public ActiveRecord classes). Internally, ActiveRecord::Base aggregates an AbstractAdapter object (in the connection field), which abstracts away database-specific code. An AbstractAdapter is an abstract class. For example, PostgreSQLAdapter is one of its child classes, and implements PostgreSQL support. Each AbstractAdapter, in turn, aggregates an ‘raw connection’ object, which is a Ruby class that directly interacts with the database server.
An AbstractAdapter implements multiple interfaces, such as DatabaseStatements (for executing database statements) and Quoting (for escaping arguments for use in an SQL statement). Each AbstractAdapter child class has a different implementation for these interfaces.

ActiveRecord::Base methods, such as find(), work by constructing an SQL query, which they then execute. For example, the find() method can be described in pseudo code, as follows:

Ruby
  1. def find(options)
  2.     sql = ""
  3.     # Appends "SELECT #{options[:select]} FROM #{table_name}" to _sql_
  4.     create_select(sql, options)
  5.     # Appends "WHERE a = ‘b’ AND c = ‘d’" to _sql_
  6.     create_condition(sql, options)
  7.     …
  8.     # Appends "GROUP BY #{options[:group]}" to _sql_
  9.     create_group_by(sql, options)
  10.  
  11.     # Now we have a full SQL statement.
  12.     return @connection.select_all(sql)
  13. end

If we look at create_condition(), we see something like this:

Ruby
  1. def create_condition(sql, options)
  2.     sql < < "WHERE #{sanitize_sql(options[:conditions])}"
  3. end

The sanitize_sql function is an important function ActiveRecord’s SQL statement building process. Its job is to bind parameters into a part of an SQL statement (and escaping strings where necessary). For example, all of these:

Ruby
  1. sanitize_sql(["a = ? AND b = ?", "hello", "world"])
  2. sanitize_sql(["a = %s AND b = %s", "hello", "world"])
  3. sanitize_sql({:a => "hello", b => "world"})

return “a = ‘hello’ AND b = ‘world’”.

Adding support for prepared statements

Most databases support prepared statements, but I have heard that not all do. Furthermore, some Ruby database modules have no support for prepared statements! Prior to version 0.7, the Ruby-PostgreSQL module doesn’t support prepared statements at all. So my design will have to take this into account.

The new architecture will look as follows:
ActiveRecord with prepared statements
I have introduced 3 new classes: StatementBuilder (an abstract base class), StandardStatementBuilder and PreparedStatementBuilder. StatementBuilder allows one to build an SQL statement incrementally, possibly with arguments. It is supposed to replace ActiveRecord::Base’s sanitize_sql() function.
StandardStatementBuilder is almost exactly the same as sanitize_sql, but in a different API. One can retrieve the final SQL statement (with bounded arguments) by calling statement().
PreparedStatementBuilder is similar, but saves the SQL statement (with placeholders) and the arguments in different fields. Separating the SQL statement and its arguments is required for database adapters that support prepared statements.

AbstractAdapter has gained a new method, new_statement(). This is a factory method which creates a new StatementBuilder. Depending on the specific AbstractAdapter child class (which may or may not support prepared statements), this method may create a StandardStatementBuilder, PreparedStatementBuilder, or some other StatementBuilder class. The default implementation of this method creates a StandardStatementBuilder, but MySQLAdapter will create a PreparedStatementBuilder if the underlying Ruby Mysql module supports prepared statements.

ActiveRecord::Base uses AbstractAdapter.new_connection() to create a StatementBuilder. It then uses that StatementBuilder object to build a complete SQL statement. Finally, it passes the StatementBuilder object to one of the DatabaseStatement methods for execution. Notice that the DatabaseStatement methods now accept both a StatementBuilder object and a String object. The latter is for backwards-compatibility.

The DatabaseStatement methods are implemented by the (concrete) AbstractAdapter child classes, so it goes without saying that the code for using StatementBuilder will go there.

Here’s an example in pseudo code which demonstrates StatementBuilder’s usage in ActiveRecord::Base:

Ruby
  1. def find(options)
  2.     statement = connection.new_statement(self.class.table_name)
  3.     create_select(statement, options)
  4.     create_condition(statement, options)
  5.     …
  6.     return @connection.select_all(statement)
  7. end
  8.  
  9. def create_condition(statement, options)
  10.     statement.add_fragment(options[:conditions])
  11. end

And MySQLAdapter’s execute() method (which executes an SQL statement) looks a bit like this:

Ruby
  1. def execute(statement)
  2.     if statement.is_a?(PreparedStatementBuilder)
  3.         st = @driver.prepare(statement.statement)
  4.         st.execute(*statement.arguments)
  5.         return st.result
  6.     else
  7.         # We call ‘to_s’ here because _statement_ can either be a StandardStatement
  8.         # or a String. This will ensure that we get the SQL statement string.
  9.         # StandardStatement.to_s() is an alias for statement()
  10.         return @driver.execute(statement.to_s)
  11.     end
  12. end

Incidentally, this architecture makes ActiveRecord::Base cleaner, since a lot of code has been moved to separate classes.

But where is the patch?

The patch isn’t finished yet. At the moment, only the MySQL adapter supports prepared statements. In ActiveRecord::Base, only the create() method uses StatementBuilder - everything else still builds SQL statements with sanitize_sql(). It passes all MySQL unit tests though.

I’m posting this story on my blog so that the Ruby on Rails core team can review my work, and see whether they accept this architecture. I’ve submitted an email to the rubyonrails-core mailing list.

I’m also working on prepared statements support in the PostgreSQL adapter, but unfortunately this adapter has many bugs, so it fails the unit tests even without my changes. There is a patch which fixes a lot of bugs in this adapter, and it’s two months old, but it hasn’t been applied or even reviewed yet. I can’t finish PostgreSQL support until this patch has been accepted.

FAQ

“Is there any reason why we *need* prepared statements?”

There is at least one such reason: handling binary data in select queries in PostgreSQL. In SELECT queries in ActiveRecord, binary strings cannot (”cannot”, not “are not”) be escaped properly, and the whole query fails. I was bitten by this 2 weeks ago, and it forced me to use non-binary columns.

“But am I to understand StatementBuilder code is going to
replace (cleanup) all the string hacks currently involved in building SQL
queries? Will the usage of prepared statements in MySQL/PostgreSQL become
implicit with this?”

The use of prepared statements in MySQL/PostgreSQL will
completely replace the argument escaping stuff. Database adapters that
don’t support prepared statements (or haven’t implemented support yet)
will silently fallback to argument escaping, as is done now.

“Will compiled prepared statements be reused?”

Yes, at least in the MySQL adapter. I wrote a cache class so that other adapters can easily implement caching of prepared statements. The cache has a maximum size and will remove items based on least-recently-used, similar to how some CPU caches work.
However, in MySQL, caching prepared statements doesn’t give you much of a performance boost. When running the ActiveRecord unit tests, the cache hit rate is 65%, but resulted in no noticeable performance boost (or loss).

“Will querying with prepared statements be slower?”

On Oracle, no, or so I have heard. Even if one doesn’t cache prepared statements, Oracle will be much faster.
On MySQL, yes. Running the ActiveRecord unit tests is approximately 40% slower when using prepared statements. However, I added an optimization to the MySQL adapter. It will fallback to not using prepared statements if it detects that no argument is larger than 32 KB. This way, I have entirely eliminted the performance penalty in the average case, while reducing Ruby memory usage in case one of the arguments is a large binary blob.

Comments (14)