default_value_for Rails plugin: declaratively define default values for ActiveRecord models

We’ve just released default_value_for, a plugin for declaratively defining default values for ActiveRecord models.


Who has experience with Qt 4 on OS X?

I’ve had pretty bad experiences with wxWidgets. Not that the toolkit itself is bad, but it’s lacking functionality and polish in various unexpected and awkward ways. The biggest turn-off for me is that it doesn’t support buttons with icons. Not only do I want my GUI apps to work well, I also want them to look nice and to integrate well into the environment. On Linux/GNOME, having buttons with icons is almost essential - not having icons just makes the GUI look plain and ugly. On Windows it can make a big difference as well when it comes to UI aesthetics. The wxWidgets developers commented that they won’t implement this because not all platforms (e.g. Windows) support it. I personally think this is nonsense - Delphi has supported buttons with icons since version 1.0 (for Windows 3.1). Besides, why not just implement it on platforms that do support it, and document it as such? This is already the case for things such as the flat button style.

Another thing I don’t like about wxWidgets is how it forces one to build the GUI top-down. One must first construct a parent container before one can construct child widgets. It’s not possible to construct an invisible child widget and then later on attach that onto a parent. This seems to be a design decision influenced by limitations in Windows.
wxWidgets also has the tendency to lay out the GUI differently on different platforms. I usually develop wxWidgets applications on Linux, and port them to Windows later on. What usually happens is that the GUI looks fine on Linux, but totally breaks on Windows: buttons are laid out differently, controls have the wrong size, etc. I usually end up having to fix the GUI code for Windows. Apparently wxWidgets has different layout implementations for different platforms, and they behave in subtly different ways.

The list can go on and on. But generally, wxWidgets feels clunky and awkward except for simple and standard user interfaces without a lot of dynamics. The differences in layout and resize behavior on different platforms seem to be bigger than the differences in CSS implementations in different browsers (with the exception of IE of course).

Qt 4 seems to be a good cross-platform GUI toolkit and doesn’t suffer from these issues. It looks very nice on Linux. However, I’ve seen Mac people flaming Qt for looking “totally miserable” on OS X. I couldn’t find any screenshots of Qt 4 on OS X so I can’t confirm whether that’s true. Does anybody have experience with Qt on OS X, and can show me some screenshots?


Who’s running Phusion Passenger in production?

An interesting thread appeared on the Phusion Passenger mailing list, in which a user asked who’s running Phusion Passenger in production. We’re actually very interested as well, seeing as we’re currently building a new website for Phusion. Please drop a note at the mailing list (or here, though the mailing list is preferred) if you’re running it in production as well.


validates_uniqueness_of does not guarantee uniqueness

Using validates_uniqueness_of in conjunction with ActiveRecord::Base#save does not guarantee the absence of duplicate record insertions, because uniqueness checks on the application level are inherently prone to race conditions. For example, suppose that two users try to post a Comment at the same time, and a Comment’s title must be unique. At the database level, the actions performed by these users could be interleaved in the following manner:

              User 1                 |               User 2
 ------------------------------------+--------------------------------------
 # User 1 checks whether there's     |
 # already a comment with the title  |
 # 'My Post'. This is not the case.  |
 SELECT * FROM comments              |
 WHERE title = 'My Post'             |
                                     |
                                     | # User 2 does the same thing and also
                                     | # infers that his title is unique.
                                     | SELECT * FROM comments
                                     | WHERE title = 'My Post'
                                     |
 # User 1 inserts his comment.       |
 INSERT INTO comments                |
 (title, content) VALUES             |
 ('My Post', 'hi!')                  |
                                     |
                                     | # User 2 does the same thing.
                                     | INSERT INTO comments
                                     | (title, content) VALUES
                                     | ('My Post', 'hello!')
                                     |
                                     | # ^^^^^^
                                     | # Boom! We now have a duplicate
                                     | # title!
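
To make the interleaving concrete, here is a minimal Ruby sketch that replays it against an in-memory array instead of a real database. The `comments` array and the `title_free` lambda are made up for illustration; they only stand in for the SELECT and INSERT steps in the diagram.

```ruby
# In-memory stand-in for the comments table (no database involved).
comments = []

# Application-level uniqueness check, i.e. the SELECT step.
title_free = lambda { |title| comments.none? { |c| c[:title] == title } }

# Both users run their check before either of them inserts.
user1_ok = title_free.call('My Post')   # User 1: SELECT
user2_ok = title_free.call('My Post')   # User 2: SELECT

# Each user inserts because his own check passed.
comments << { :title => 'My Post', :content => 'hi!' }    if user1_ok
comments << { :title => 'My Post', :content => 'hello!' } if user2_ok

duplicates = comments.count { |c| c[:title] == 'My Post' }
puts "Rows titled 'My Post': #{duplicates}"
```

Both checks see an empty table, so both inserts go through and the title ends up stored twice.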

This could even happen if you use transactions with the ‘serializable’ isolation level. There are several ways to get around this problem:

  • By locking the database table before validating, and unlocking it after saving. However, table locking is very expensive, and is thus not recommended.
  • By locking a lock file before validating, and unlocking it after saving. This does not work if you’ve scaled your Rails application across multiple web servers (because they cannot share lock files, or cannot do that efficiently), and is thus not recommended.
  • By creating a unique index on the field, using ActiveRecord::ConnectionAdapters::SchemaStatements#add_index. In the rare case that a race condition occurs, the database will guarantee the field’s uniqueness.

    When the database catches such a duplicate insertion, ActiveRecord::Base#save will raise an ActiveRecord::StatementInvalid exception. You can either choose to let this error propagate (which will result in the default Rails exception page being shown), or you can catch it and restart the transaction (e.g. by telling the user that the title already exists, and asking him to re-enter the title). This technique is also known as optimistic concurrency control.

    Active Record currently provides no way to distinguish unique index constraint errors from other types of database errors, so you will have to parse the (database-specific) exception message to detect such a case.
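
As a rough sketch of what that message parsing could look like, the helper below matches an exception message against a few patterns. The helper name `unique_violation?` is hypothetical, and the message fragments are assumptions about what MySQL, PostgreSQL and SQLite report; check them against the database and adapter version you actually use.

```ruby
# Hypothetical helper: true if the exception message looks like a unique
# index violation. The patterns below are assumptions, not an official list.
def unique_violation?(error)
  patterns = [
    /Duplicate entry/i,                             # MySQL (assumed wording)
    /duplicate key.*violates unique constraint/i,   # PostgreSQL (assumed wording)
    /is not unique|UNIQUE constraint failed/i       # SQLite (assumed wording)
  ]
  patterns.any? { |pattern| error.message =~ pattern }
end

# The unique index itself would be created in a migration, for example:
#
#   add_index :comments, :title, :unique => true
#
# Intended use around a save (needs Rails, so it is only shown as a comment):
#
#   begin
#     comment.save
#   rescue ActiveRecord::StatementInvalid => e
#     raise unless unique_violation?(e)
#     comment.errors.add(:title, 'has already been taken')
#   end

# Stand-alone demonstration with plain exceptions and made-up messages:
duplicate_error = StandardError.new("Mysql::Error: Duplicate entry 'My Post' for key 2")
other_error     = StandardError.new("Mysql::Error: Table 'blog.comments' doesn't exist")
puts unique_violation?(duplicate_error)
puts unique_violation?(other_error)
```

Any error that does not match is re-raised, so unrelated database failures are not silently reported as a duplicate title.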

I’ve just contributed this documentation to docrails, so you’ll see it in Rails 2.2’s validates_uniqueness_of API documentation.


wiki.rubyonrails.org was down

Today wiki.rubyonrails.org was briefly down, for like 30 minutes or so.

I thought it might be a problem in Phusion Passenger, seeing that the wiki is running on it. I wanted to restart Apache, but I decided to look in the wiki log files before doing that.

It turned out the server ran out of disk space. I truncated the Apache log files, which were consuming 40 GB or so. After that, everything went back to normal.


daemon_controller: a library for robust daemon management

Phusion has recently released a library for robust daemon management. Check it out. Description and tutorials are available on that page.


Re: Strange HTTP header?

Yesterday I challenged people to look at sandbox.phusion.nl’s HTTP headers and check whether they notice anything weird. The HTTP response header of the front page is:

HTTP/1.1 200 OK
Server: nginx/0.6.32

X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.1.0
X-Runtime: 0.00173

Wow, I got a lot more responses than I expected.

Sorry guys, there’s a reason why I didn’t post this on the Phusion blog, but on my personal blog instead. :) Chu Yeow said:

Wow Passenger on Nginx (I think that’s it - doubt you’d run Nginx on top of Apache+Passenger ;)).

Well actually… we are running Nginx on top of Apache+Passenger. :)

The first reaction of many people is probably “WTF, are you out of your mind? Why would you do such a thing?” Let me explain a little bit about our server.

Initial motivation: security

This server is shared by many users, including a few which we don’t fully trust. It not only runs Rails applications but also a bunch of PHP applications, and in the not too distant past some mod_perl applications. In the usual Apache setup, all those PHP/mod_perl applications will run under the same user and have the same rights. This means that there is no security between different people’s web applications: Jane’s PHP script can read Joe’s forum database password file. Not so nice.

server_setup1.jpg

Now, how do we solve this? These days, server virtualization is the latest hype: just give Joe and Jane different virtual machines! But virtualization wastes a lot of memory. Joe and Jane’s websites are really low-traffic compared to mine. The server “only” has 1 GB of RAM, and allocating a fixed amount of RAM to each virtual machine (which must be at least 128 MB for a more or less usable server OS) is really wasteful.

Our solution was simple. Each user got his own Apache installation and runs all his web applications under his own user account. Users cannot read from and write to other users’ home folders. Each of these backend Apache installations is firewalled, and a frontend web server proxies requests to these backend Apache installations.

server_setup2.jpg

But the setup is of course not limited to one-Apache-per-real-user. blog.phusion.nl is running on Wordpress, which doesn’t exactly have a good security track record. My personal Wordpress installation had been hacked once: apparently some spam bot changed the file upload folder to /tmp and put a .exe in there. It also disabled Akismet. I wouldn’t be surprised if someone one day finds a remote shell code execution vulnerability. One really wouldn’t want to run Wordpress with the same rights as all the other web applications. So we gave Wordpress its own user account and Apache installation. Wordpress is now completely sandboxed and cannot do any harm to the other websites.

Efficiency

Indeed, what about efficiency? We’ve been using this setup for almost 2 years now, and it’s actually running quite well. Not too long ago, this server hosted a website which got about 30 000 unique visitors per day (about 120 000 requests per day on this server; we load balanced that website over multiple web servers) and it was able to handle the load with ease. We noticed no delay in response times compared to when the website was running on the frontend web server directly. That said, we did go through several stages of optimization:

  1. A long long time ago, the frontend web server was Apache 1.3, which proxies requests via mod_accel. mod_accel is like mod_proxy, but you can specify a list of URI extensions that it won’t proxy. For example, you can tell mod_accel only to proxy requests that don’t end with .css, .jpg, .png, etc.
  2. Unfortunately Apache 1.3 was ancient and not well-supported, so we switched to Apache 2 with mod_proxy instead. mod_proxy provides no way to skip proxying certain URIs, so we had to live with this. Performance was acceptable, though the backend web servers are being hit harder than before because static asset requests are now also being proxied.
  3. Apache 2 proved to be too memory- and CPU-hungry for a reverse proxy, so we switched the frontend web server to Lighttpd instead. This reduced our CPU- and memory usage dramatically. We configured Lighttpd to serve static assets directly, so that the backend web servers are only there to serve PHP.
  4. Unfortunately Lighttpd leaks memory: after a few days, memory usage would jump to 200 MB. From time to time it would also “go out of control” and consume 100% CPU, although it was still serving requests just fine. 2 days ago I finally got tired of that, and replaced Lighttpd with Nginx.

Finally, we used Apache with the worker MPM and Phusion Passenger development version (from the git repository) for hosting our Rails applications. The worker MPM, which uses a combination of threads and processes, is a lot more memory-efficient than the default prefork MPM, which only uses processes. This is our Apache worker MPM setup:

StartServers             1
ThreadsPerChild         10
MaxClients              10
MinSpareThreads          1
MaxSpareThreads          1
MaxRequestsPerChild  50000
ThreadStackSize     500000

This tells Apache to use only one process. That process is multi-threaded and will have 10 threads for serving requests. Furthermore, each thread will have a stack size of 500 KB. The default system stack size is usually something along the lines of 8 MB, so setting such a small stack size reduces Apache’s VM size a lot. 500 KB has proven to be sufficient for Apache.
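
For a sense of scale, the stack reservation can be worked out in a few lines of Ruby. This is only back-of-the-envelope arithmetic on the settings above: the 8 MB default is the rough figure mentioned in the text, not a measured value, and stack space counts towards virtual memory size rather than resident memory.

```ruby
threads          = 10                # ThreadsPerChild
configured_stack = 500_000           # ThreadStackSize, in bytes
default_stack    = 8 * 1024 * 1024   # assumed 8 MB system default, in bytes

configured_total = threads * configured_stack
default_total    = threads * default_stack

puts "Configured stacks: %.1f MB" % (configured_total / 1024.0 / 1024.0)
puts "Default stacks:    %.1f MB" % (default_total / 1024.0 / 1024.0)
```

Under these assumptions the ten threads reserve about 4.8 MB of stack instead of 80 MB.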

Now, let’s compare the memory usage between Nginx and our Apache installation:

USER   PID   %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root   11700  0.0  0.2  3452  2012  ??  Is    6:23AM   0:00.00 nginx: master process /usr/local/sbin/nginx
www    11701  0.0  0.3  3452  2880  ??  S     6:23AM   2:28.91 nginx: worker process (nginx)
www    11702  0.0  0.3  3452  2880  ??  S     6:23AM   2:47.10 nginx: worker process (nginx)
app    82548  0.0  0.3  7656  3572  ??  Ss   Tue03PM   0:05.79 /home/app/apache/bin/httpd -k start
app    89467  0.0  0.4 10144  4632  ??  I     5:11AM   0:02.45 /home/app/apache/bin/httpd -k start

The server’s running on FreeBSD, not Linux, so we can’t measure memory usage excluding any copy-on-write savings (i.e. the private dirty RSS). But let’s compare the total Resident Set Sizes (RSS):

  • Nginx: 7772 KB (7.6 MB)
  • Apache: 8204 KB (8.0 MB)

Not a big difference.
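
The totals can be reproduced by summing the RSS column of the ps output shown earlier. The sketch below does that in Ruby; grouping the lines by whether they mention nginx is a shortcut that only works for this particular listing.

```ruby
ps_output = <<-PS
root   11700  0.0  0.2  3452  2012  ??  Is    6:23AM   0:00.00 nginx: master process /usr/local/sbin/nginx
www    11701  0.0  0.3  3452  2880  ??  S     6:23AM   2:28.91 nginx: worker process (nginx)
www    11702  0.0  0.3  3452  2880  ??  S     6:23AM   2:47.10 nginx: worker process (nginx)
app    82548  0.0  0.3  7656  3572  ??  Ss   Tue03PM   0:05.79 /home/app/apache/bin/httpd -k start
app    89467  0.0  0.4 10144  4632  ??  I     5:11AM   0:02.45 /home/app/apache/bin/httpd -k start
PS

rss_by_server = Hash.new(0)
ps_output.each_line do |line|
  fields = line.split
  server = line.include?('nginx') ? 'Nginx' : 'Apache'
  rss_by_server[server] += fields[5].to_i   # sixth column is RSS, in KB
end

rss_by_server.each do |server, kb|
  puts "%s: %d KB (%.1f MB)" % [server, kb, kb / 1024.0]
end
```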

Apache’s slowness and Nginx’s performance, both overrated?

People commented:

Pretty fast runtime?

X-Runtime: 0.00171

and

Damn!! It is impossible!
0.00173 per request?!

So the Rails application is running in Apache and behind an Nginx reverse proxy, and it’s still fast.

FastCGI

Lighttpd and Nginx both support PHP via FastCGI, so why didn’t we use that instead? The answer is ease of use. Setting up a PHP-FastCGI process pool for every user is quite a hassle. Plus, the user might be running CGI or mod_perl applications as well. Giving each user his own Apache installation is by far the easiest way. Apache also supports .htaccess, which Lighttpd and Nginx don’t support. Wordpress’s URI rewriting feature writes mod_rewrite rules to .htaccess. Configuring the same rules in Lighttpd was a total pain, and I wouldn’t want to do that again.

Conclusion

I believe that all the fuss about web server performance is usually overrated. As we can see, Apache can be memory-efficient. Running Rails applications on Phusion Passenger behind an Nginx reverse proxy is viable. You just need to know how to tweak and mix-and-match the two.

What we’re doing is not very unlike proxying to a Mongrel cluster from Nginx. Instead of proxying to a Mongrel cluster, we proxy to Apache. This still makes Rails deployment a lot easier because Phusion Passenger will take care of managing the Rails processes for me. The only redundant thing that I have to do now is set up 2 virtual host definitions: one in Apache and one in Nginx.

Moral of the story: it’s all HTTP, you can proxy everything in any way you want. Some people on the Phusion Passenger mailing list asked how to horizontally scale Phusion Passenger. The answer is: the same way you’re used to when you were using Mongrel clusters.

This also shows that it is possible to run multiple Apache installations on the same server. It’s only a matter of specifying different configuration files for each installation. It seems that a lot of people aren’t aware of that. In a recent Google talk about Rails scalability, a speaker claimed that there is a limit to the amount of hardware resources that Apache can utilize. He said that if you have 16 cores and 20 GB of RAM, one Apache instance cannot utilize all those resources, and that in order to make full use of your hardware, one must virtualize. But why? It’s easier and more efficient to run multiple Apache instances on the same machine.

By the way, we use the following Nginx config snippet for Phusion Passenger-powered hosts:

proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_redirect http://localhost/ $scheme://$http_host/;

server {
    listen 80;
    server_name sandbox.phusion.nl;
    root /u/apps/sandbox/current/public;
    location / {
        proxy_redirect http://localhost:1234/ $scheme://$http_host/;
        if (!-f $request_filename) {
            proxy_pass http://localhost:1234;
            break;
        }
        if ($request_method != GET) {
            proxy_pass http://localhost:1234;
            break;
        }
    }
}

This forwards all non-static-asset requests to Apache. Static assets are served directly by Nginx.
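
The routing rule expressed by the two `if` blocks can be restated as a small Ruby predicate. This is only a model of the decision, not something Nginx runs; the method name `proxy_to_apache?` is made up for illustration.

```ruby
# Mirrors the two `if` blocks in the location: proxy when the request does
# not map to an existing file, or when the method is anything other than GET.
def proxy_to_apache?(request_method, file_exists)
  !file_exists || request_method != 'GET'
end

[
  ['GET',  true],    # existing static file
  ['GET',  false],   # no such file on disk
  ['POST', true]     # non-GET request, even if a file with that name exists
].each do |method, exists|
  target = proxy_to_apache?(method, exists) ? 'Apache' : 'Nginx (static file)'
  puts "#{method} file_exists=#{exists} -> #{target}"
end
```

Only a GET request for a file that exists under the document root is answered by Nginx itself; everything else goes to the backend.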


Strange HTTP header?

I challenge you to type the following command:

curl -i http://sandbox.phusion.nl/ | head

This shows the HTTP output of http://sandbox.phusion.nl/ (which is, unsurprisingly, a Rails app), including HTTP headers.

Do you notice anything strange about this HTTP header? ;) (I’m not going to comment for a few days. I’ll let you guys speculate.)


Making Ruby’s garbage collector copy-on-write friendly, part 8

Hi folks, it has been a while since the last “Making Ruby’s garbage collector copy-on-write friendly” post. Many things have happened in the meantime, and my copy-on-write work is now usable (and used) in production environments, but it seems that there is still confusion. So I’ve decided to write a new post which explains the situation.

Copy-on-write updates

In March I submitted my work to the Ruby core mailing list. There has been some discussion. As a result, various people, including myself, have made a number of improvements.

The improvements are as follows:

  • The copy-on-write friendly garbage collector is now a few % faster thanks to various micro-optimizations.
  • The mark table implementation is now pluggable.

    On Windows, a copy-on-write friendly garbage collector is totally useless because fork() is not supported on Windows. Furthermore, not all Ruby applications call fork(). So I’ve made two mark table implementations: one based on the old one (which marks objects directly by setting a flag on the object) and a copy-on-write friendly one. It is now possible to change the mark table implementation during runtime by calling GC.copy_on_write_friendly = (boolean value).

    This has huge performance implications. The copy-on-write friendly mark table makes the garbage collector about 0%-20% slower, depending on the application and the workload. However, the non-copy-on-write friendly mark table is enabled by default, so by default there is only a 1% performance penalty. This performance penalty comes from the fact that marking an object now requires a function call which sets the mark flag, instead of setting the mark flag directly. But I think 1% is acceptable.

  • Various little bugs in the debugging code have been fixed.
  • Ninh and I are working on a scientific paper regarding the copy-on-write work.
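
The idea behind the pluggable mark table can be illustrated with a toy model in plain Ruby. This is a conceptual sketch only: the real implementation lives in the interpreter’s C code, and every class and method name below is invented for the example.

```ruby
require 'set'

# Toy heap object; `marked` stands in for the flag stored inside the object.
HeapObject = Struct.new(:name, :marked)

# Old behaviour: marking writes into the object itself, which would dirty
# the memory page that the object lives on.
class InObjectMarkTable
  def mark(obj)
    obj.marked = true
  end

  def marked?(obj)
    obj.marked
  end
end

# Copy-on-write friendly behaviour: marks are kept in a separate structure,
# so the objects themselves are never written to during marking.
class SeparateMarkTable
  def initialize
    @marks = Set.new
  end

  def mark(obj)
    @marks << obj.object_id
  end

  def marked?(obj)
    @marks.include?(obj.object_id)
  end
end

# The mark phase only talks to whichever table is plugged in.
def run_mark_phase(objects, table)
  objects.each { |obj| table.mark(obj) }
end

objects = [HeapObject.new('a', false), HeapObject.new('b', false)]

separate_table = SeparateMarkTable.new
run_mark_phase(objects, separate_table)
untouched = objects.none? { |obj| obj.marked }
puts "Objects untouched by the separate table: #{untouched}"

run_mark_phase(objects, InObjectMarkTable.new)
written = objects.all? { |obj| obj.marked }
puts "Objects written to by the in-object table: #{written}"
```

On an interpreter that includes the patch, the switch described above would be flipped with `GC.copy_on_write_friendly = true`; guarding the call with `GC.respond_to?(:copy_on_write_friendly=)` keeps a script working on a stock Ruby.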

Unfortunately the discussion stranded. Matz had some concerns about performance, which is why I made the mark table implementation pluggable. I will re-submit the patch for further evaluation when the time is right.

Ruby Enterprise Edition

Many of you have probably heard of Ruby Enterprise Edition. There has been, and still is, a lot of fuss about the name. But that’s intentional and is all part of the plan — if people make a fuss about the name then it means we’re not in the Zone of Mediocrity. :)

What is Ruby Enterprise Edition? People thought it was a closed source product, but in fact the website’s front page and download page have the following huge sticker:

(We actually added this sticker after we saw that people thought it was going to be closed source.)

In one sentence:
Ruby Enterprise Edition is an easy to install Ruby interpreter that includes, among other things, my copy-on-write work.

Facts and myths:

  • It’s open source, not closed source. It’s freely available to all.
  • It’s not an entirely new Ruby implementation. It’s based on the official Ruby interpreter (MRI), version 1.8.6-p286. This means that all your existing Ruby applications are compatible with Ruby Enterprise Edition.
  • It does not only include my copy-on-write work. There’s more. Read on.
  • It is not a hostile fork, but a friendly one. The work included in Ruby Enterprise Edition is meant to be merged back to upstream at some point in the future.
  • The copy-on-write work has been submitted to the Ruby core team in the past.
  • Phusion Passenger is very well-integrated with Ruby Enterprise Edition. If you use Phusion Passenger in combination with Ruby Enterprise Edition, then your Rails applications will transparently use 33% less memory and will be faster, as if it’s magic. You don’t need to do anything special, it just works.

    The only condition is that you must not be using conservative spawning in your application. But if you don’t know what conservative spawning is then you’re not using it, and you’ll have nothing to worry about.

Why was Ruby Enterprise Edition made?

Consider the following facts:

  • My copy-on-write work can potentially save a lot of memory in Rails applications.
  • The patch has been submitted to upstream, but hasn’t been accepted yet.
  • There is a demand for lower memory usage in Rails applications, right now, not X months/years in the future.

Given the circumstances, and to satisfy the demands (including that of ourselves), we have decided that it would be best to maintain our own Ruby fork which includes these patches.

You might be wondering: Why not just release the patch? Why create a fork?

The answer is user friendliness. Telling people to download Ruby’s source code and apply a patch is not user friendly. In fact, to many people, it’s downright scary. Imagine that you want a transparent and easy way to make your Rails applications “magically” use 33% less memory. Which of the following instructions would you prefer?

Use Phusion Passenger to deploy your application. Then download the Ruby interpreter source code from www.ruby-lang.org. Download it and extract the tarball. Then, download this patch and apply it with this and that command. Then, run ‘./configure --prefix=/somewhere’. Make sure that /somewhere is not /usr in order to prevent overwriting your old Ruby installation, you don’t want that to happen. Then type ‘make’, and then ‘sudo make install’. Then download RubyGems, extract it, and type ‘sudo /somewhere/bin/ruby setup.rb’ in the RubyGems source folder. Then type ‘/somewhere/bin/gem install rails’ to install Ruby on Rails and whatever other gems you might need.

or:

Use Phusion Passenger to deploy your application. Then download Ruby Enterprise Edition. Run the installer and follow the instructions. Done.

The first one contains a lot of caveats. Many many things can go wrong. Many many people aren’t experienced in installing Ruby from source. It’s just easier if there’s a vendor that takes care of everything for you. And we are that vendor.

We want Phusion Passenger and everything surrounding it to have a “just works” experience.

So if it’s not just the copy-on-write work, then what else does Ruby Enterprise Edition include?

  • This one is huge: by using Google’s tcmalloc, an alternative memory allocator, Ruby becomes 20% faster even with the copy-on-write friendly garbage collector! Furthermore, tcmalloc seems to be more copy-on-write friendly than ptmalloc2, Linux’s default memory allocator, so by using tcmalloc we can save even more memory!
    We discovered this shortly after submitting the patch to the Ruby core mailing list. So Ruby Enterprise Edition also includes tcmalloc.
  • Ruby Enterprise Edition includes an easy-to-use installer which takes care of installing tcmalloc, Ruby, RubyGems and important/useful gems for you. It also teaches you how to tell Phusion Passenger to use Ruby Enterprise Edition instead of normal Ruby.
  • In the future we might include more patches that might be useful in production environments.

Who’s already using Ruby Enterprise Edition?

I’m not sure because we’ve never asked our users. But the Ruby on Rails Wiki is running on it, and it has been great. I’ve been monitoring the Wiki for a while now, and ever since we switched it to Phusion Passenger + Ruby Enterprise Edition, it has been rock-solid (before, it used to crash often). We also observed a great reduction in memory usage.

Michael Koziarski, a Rails core developer, runs Phusion Passenger with Ruby Enterprise Edition on his blog. He said that he downgraded his server because Phusion Passenger + Ruby Enterprise Edition saved him so much memory.

Final words

I hope this post has shed some light on matters. I’m just a little surprised that there’s all this confusion going on because all of this is also documented on the Ruby Enterprise Edition website’s FAQ. eustaquiorangel.com recently interviewed me and asked similar questions. You should check it out.

I’m also a little surprised that people seem to be reluctant about installing Ruby Enterprise Edition. If I have the choice between two products A and B, and B is the same as A but is much more efficient and is easy to install, then I’d choose B.
Is it that people are suspicious about our claims? We’ve published a performance and memory usage comparison. Anybody can read this comparison, perform it himself, and check whether our claims are true. Everything we claim is verifiable so I don’t understand what there is to be suspicious about.

Please feel free to post your thoughts on this, I’d really like to hear what people have to say.


CERN Rap

