validates_uniqueness_of does not guarantee uniqueness

Using validates_uniqueness_of in conjunction with ActiveRecord::Base#save does not guarantee the absence of duplicate record insertions, because uniqueness checks at the application level are inherently prone to race conditions. For example, suppose that two users try to post a Comment at the same time, and a Comment’s title must be unique. At the database level, the actions performed by these users could be interleaved in the following manner:

              User 1                 |               User 2
 ------------------------------------+--------------------------------------
 # User 1 checks whether there's     |
 # already a comment with the title  |
 # 'My Post'. This is not the case.  |
 SELECT * FROM comments              |
 WHERE title = 'My Post'             |
                                     |
                                     | # User 2 does the same thing and also
                                     | # infers that his title is unique.
                                     | SELECT * FROM comments
                                     | WHERE title = 'My Post'
                                     |
 # User 1 inserts his comment.       |
 INSERT INTO comments                |
 (title, content) VALUES             |
 ('My Post', 'hi!')                  |
                                     |
                                     | # User 2 does the same thing.
                                     | INSERT INTO comments
                                     | (title, content) VALUES
                                     | ('My Post', 'hello!')
                                     |
                                     | # ^^^^^^
                                     | # Boom! We now have a duplicate
                                     | # title!
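
For concreteness, here is a minimal sketch of the model in question (hypothetical code, using the names from the example above):

    class Comment < ActiveRecord::Base
      # Before saving, Rails runs the SELECT shown above to check
      # whether a comment with the same title already exists.
      validates_uniqueness_of :title
    end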

This could even happen if you use transactions with the ‘serializable’ isolation level. There are several ways to get around this problem:

  • By locking the database table before validating, and unlocking it after saving. However, table locking is very expensive, and is thus not recommended.
  • By locking a lock file before validating, and unlocking it after saving. This does not work if you’ve scaled your Rails application across multiple web servers (because they cannot share lock files, or cannot do so efficiently), and is thus not recommended.
  • By creating a unique index on the field, using ActiveRecord::ConnectionAdapters::SchemaStatements#add_index. In the rare case that a race condition occurs, the database will guarantee the field’s uniqueness.

    When the database catches such a duplicate insertion, ActiveRecord::Base#save will raise an ActiveRecord::StatementInvalid exception. You can either let this error propagate (which will result in the default Rails exception page being shown), or you can catch it and restart the transaction (e.g. by telling the user that the title already exists, and asking him to enter a different one). This technique is also known as optimistic concurrency control. A sketch of this approach follows below.

    Active Record currently provides no way to distinguish unique index constraint errors from other types of database errors, so you will have to parse the (database-specific) exception message to detect such a case.
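
    For concreteness, here is a sketch of this approach (hypothetical class, table, and column names; the /Duplicate entry/ pattern is an assumption that matches MySQL’s wording only):

        # Migration: enforce uniqueness at the database level.
        class AddUniqueIndexToCommentsTitle < ActiveRecord::Migration
          def self.up
            add_index :comments, :title, :unique => true
          end

          def self.down
            remove_index :comments, :title
          end
        end

        # In the controller action that creates the comment:
        begin
          Comment.create!(:title => params[:title], :content => params[:content])
        rescue ActiveRecord::StatementInvalid => e
          # There is no dedicated exception class for constraint violations,
          # so we inspect the database-specific message.
          raise unless e.message =~ /Duplicate entry/
          flash[:error] = 'That title is already taken, please choose another.'
          render :action => 'new'
        end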

I’ve just contributed this documentation to docrails, so you’ll see it in Rails 2.2’s validates_uniqueness_of API documentation.

16 Comments

  1. Nikos Dimitrakopoulos said,

    September 20, 2008 @ 2:59 pm

    Hm… I didn’t know this, but isn’t this a good reason to also do some of the business logic at a lower level, a.k.a. the database itself?

    I mean, OK, you may lose some convenience, like a DB-implementation-agnostic application (which is rarely achieved anyway, based on my small experience so far), and create some duplicate code (validations in AR plus constraints in the DB), etc. But on the other hand you *know* and are *sure* that your data is safe and solid and that the logic of the application is always enforced. If the DB goes boom (an exception raised because of a race condition that Rails couldn’t handle but the DB constraints caught), you know that something is wrong either in your code or in your libraries’ code (because you DO watch the error logs for exceptions, don’t you?), and you don’t let the app enter an invalid state without ever knowing.

    This is just my opinion though…

  2. Eric said,

    September 20, 2008 @ 5:23 pm

    “By creating a unique index on the field, using ActiveRecord::ConnectionAdapters::SchemaStatements#add_index. In the rare case that a race condition occurs, the database will guarantee the field’s uniqueness.”

    In my experience, the common cause seems to be a user double-clicking a form’s submit button in rapid succession (which isn’t really that rare). In addition to the last solution you outline, I think it’s worthwhile to use JavaScript to disable these forms on submit. Thanks for highlighting this problem — it can be the source of some nasty bugs.

  3. Hongli said,

    September 20, 2008 @ 6:11 pm

    Hm… I didn’t know this, but isn’t this a good reason to also do some of the business logic at a lower level, a.k.a. the database itself?

    Depends on what you mean. I don’t consider adding a unique constraint to the database to be “business logic”; it’s merely defining my data store. I don’t see any reason to put business logic into the database: other than potential race conditions when constructing unique values (which can be prevented by using the correct constraints), nothing can go wrong by keeping business logic in the application layer.

  4. Nikos Dimitrakopoulos said,

    September 21, 2008 @ 7:19 pm

    I think that *up to a point* we’re saying the same thing.

    Using ActiveRecord::ConnectionAdapters::SchemaStatements#add_index with :unique => true will create a unique index (which means that we’ll have a unique constraint anyway). What *I* had in mind was just a simple UNIQUE constraint (without the index). Through a quick search in AR/Migrations’ API I noticed that there is no such option/helper (which would be really handy), so that leaves us with the option of running SQL code directly for the creation of the table (at least in Postgres you can’t add a UNIQUE constraint AFTER you have created the table, and I think this is also the case with MySQL, but I’m not quite sure). An example can be found here: http://www.postgresql.org/docs/8.1/static/ddl-constraints.html#AEN2016

    So, what’s the difference between UNIQUE and UNIQUE INDEX after all? In short: *almost* none – only semantics. Here are two interesting links with a few more technical details:
    http://archives.postgresql.org/pgsql-general/2007-02/msg01642.php
    http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqls.doc/sqls285.htm

    So it comes down (at least for me) to this: using a UNIQUE INDEX is much more practical and does the job without much complication (the #add_index method, the ability to add and remove it *after* the table is created), but the UNIQUE constraint (without the INDEX) is more semantically correct and *may* save us from unexpected behaviors (as outlined in the second link, though that may not be the case for other databases).

    In both cases we do rely on the database to do some sort of “validation” on top of (or below, if you prefer) the application layer. This is what I meant when I said that it may be a good thing to put some business logic in the DB, but we may understand differently what “business logic” is :)
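
    For example, the raw-SQL route could look like this (just a sketch, untested; Postgres syntax, with the table layout borrowed from the article):

        class CreateComments < ActiveRecord::Migration
          def self.up
            # Declare the UNIQUE constraint in CREATE TABLE itself,
            # since we are bypassing the schema helpers anyway.
            execute <<-SQL
              CREATE TABLE comments (
                id      SERIAL PRIMARY KEY,
                title   VARCHAR(255) UNIQUE,
                content TEXT
              )
            SQL
          end

          def self.down
            drop_table :comments
          end
        end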

    P.S.: Wouldn’t a :unique option for column definitions (like :null, :default, etc.) be a nice addition? I may open a ticket on Lighthouse later…

  5. Manfred said,

    September 22, 2008 @ 8:36 am

    I think you mean “race conditions” instead of “racing conditions”. Nice find though.

  6. Fjan said,

    September 22, 2008 @ 1:25 pm

    I don’t think this is true. By default ActiveRecord wraps all validations inside a transaction for precisely this reason, so as long as you put the validates_uniqueness_of inside the model where it belongs, there should be no problem. You can try it by adding ‘BEGIN’ and ‘COMMIT’ to your example; the second user should fail.

  7. Hongli said,

    September 22, 2008 @ 1:36 pm

    Fjan, transactions don’t help you here, even if you use serializable isolation. Try it out for yourself. Unless user 2’s first SELECT statement blocks until user 1’s transaction has completed (which PostgreSQL doesn’t do), transactions won’t prevent the duplicate insertion.
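
    Here is a sketch (hypothetical code) of the check-then-insert that the validation boils down to; nothing in the transaction stops two processes from both passing the check:

        Comment.transaction do                                        # BEGIN
          # The SELECT from the timeline in the article:
          if Comment.find(:first, :conditions => { :title => 'My Post' }).nil?
            # The INSERT:
            Comment.create!(:title => 'My Post', :content => 'hi!')
          end
        end                                                           # COMMIT
        # Both processes can run the SELECT before either INSERT commits;
        # each sees no existing row, so both inserts go through.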

  8. Typical Rubyist said,

    September 23, 2008 @ 4:40 pm

    Hahaha, use JS to disable double-clicks? Are you serious? If you rely on the client, you will lose.

  9. Hongli said,

    September 23, 2008 @ 5:58 pm

    Typical Rubyist: it isn’t such a bad idea. Of course it shouldn’t be the *only* protection, but when implemented it can drastically reduce the chance of duplicate inserts.

  10. Eric said,

    September 24, 2008 @ 7:34 pm

    Typical Rubyist: if you had read my comment, you would have seen that I recommend client-side prevention _in addition to_ backend constraints. On many applications, the client-side component will prevent the vast majority of these problems, making for a much better visitor experience.

    As a general rule, it’s a good idea to re-read a comment before you flat out insult it.

  11. Henrik N said,

    October 29, 2008 @ 1:52 pm

    This is how I prevent double submits client-side: http://henrik.nyh.se/2008/07/jquery-double-submission

  12. fernando said,

    November 26, 2008 @ 12:55 am

    You didn’t discover anything at all. This race condition has been known for a long time. I posted something about it 6 months ago at the following link: http://www.ruby-forum.com/topic/154078

    A good workaround is to use a SELECT FOR UPDATE wrapped in a transaction, which will lock the database.

  13. Hongli said,

    November 26, 2008 @ 12:59 am

    Fernando, I never claimed I discovered anything. I merely documented it.

  14. eno said,

    March 4, 2009 @ 6:29 pm

    Hell, no. “A good workaround is to use a SELECT FOR UPDATE wrapped in a transaction, which will lock the database” is a workaround, and a non-working one at that, because

    * it breaks any logic based on transactions (unless you run a database that supports nested transactions), and
    * it doesn’t work anyway because, erm, it just doesn’t work. Just try it manually in two MySQL consoles in parallel.

    What *does* work instead is locking the database; see (if you are interested) the related posts here: http://1rad.wordpress.com/2009/02/23/0x17-atomic-science-revisited/

  15. Paul Russell said,

    April 8, 2011 @ 6:44 am

    Can anyone explain /why/ setting the transaction isolation level to serialised doesn’t help here? I’m not really interested in a ‘try it yourself’ response; I’m interested in an explanation that doesn’t involve the database violating the rules of serialised transactions (i.e. that it MUST yield results identical to the transactions having been executed one after another). The information in this post has propagated far and wide (including here, which is where I picked it up: http://stackoverflow.com/questions/5577562/rails-maximum-number-of-associated-records/5580367#5580367) as a result of being added to the Rails docs, and I just can’t get my head around how this can be true! Help! :)

  16. Hongli said,

    April 8, 2011 @ 9:34 am

    I see it as a limitation of SQL. It has no explicit support for “upsert” commands (a name for the concept “update, or insert if it does not exist”, which is what this use case is all about). When the two SELECTs in the above example are executed, it is impossible to know beforehand whether they are regular SELECTs or part of an upsert. Databases try to optimize for the highest concurrency, so they treat them like regular SELECTs and don’t isolate.

    Running SELECT … FOR UPDATE, on the other hand, does work, and it signals to the database that an upsert is imminent. However, this is a non-standard SQL extension.
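
    A rough sketch (hypothetical code, using Rails 2.x’s :lock find option) of the SELECT … FOR UPDATE approach:

        Comment.transaction do
          existing = Comment.find(:first,
                                  :conditions => { :title => 'My Post' },
                                  :lock => true)   # emits SELECT ... FOR UPDATE
          if existing.nil?
            Comment.create!(:title => 'My Post', :content => 'hi!')
          end
        end
        # Whether this blocks a concurrent insert when no matching row exists
        # yet depends on how the database locks index gaps; try it yourself
        # in two parallel consoles before relying on it.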
