Making Ruby’s garbage collector copy-on-write friendly, part 8
Hi folks, it has been a while since the last “Making Ruby’s garbage collector copy-on-write friendly” post. Many things have happened in the meantime, and my copy-on-write work is now usable (and used) in production environments, but it seems that there is still some confusion about it. So I’ve decided to write a new post which explains the situation.
Copy-on-write updates
In March I submitted my work to the Ruby core mailing list. There has been some discussion. As a result, various people, including myself, have made a number of improvements.
The improvements are as follows:
- The copy-on-write friendly garbage collector is now a few % faster thanks to various micro-optimizations.
- The mark table implementation is now pluggable.
On Windows, a copy-on-write friendly garbage collector is totally useless because fork() is not supported on Windows. Furthermore, not all Ruby applications call fork(). So I’ve made two mark table implementations: one based on the old one (which marks objects directly by setting a flag on the object) and a copy-on-write friendly one. It is now possible to change the mark table implementation during runtime by calling GC.copy_on_write_friendly = (boolean value).
This has huge performance implications. The copy-on-write friendly mark table makes the garbage collector about 0%-20% slower, depending on the application and the workload. However, the non-copy-on-write friendly mark table is enabled by default, so by default there is only a 1% performance penalty. This performance penalty comes from the fact that marking an object now requires a function call which sets the mark flag, instead of setting the mark flag directly. But I think 1% is acceptable.
- Various little bugs in the debugging code have been fixed.
- Ninh and I are working on a scientific paper regarding the copy-on-write work.
Unfortunately the discussion stalled. Matz had some concerns about performance, which is why I made the mark table implementation pluggable. I will re-submit the patch for further evaluation when the time is right.
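To make the two mark table strategies a bit more concrete, here is a toy model in Ruby. It is only a sketch and not the interpreter code, which lives in C inside the garbage collector. The names `Obj`, `InlineMarkTable`, `ExternalMarkTable` and `mark_reachable` are made up for this illustration. The first table sets a flag on the object itself, so marking writes to the memory the object lives in. The second keeps the marks in a separate structure, so marking never writes to the objects and their memory pages can stay shared after fork().

```ruby
require 'set'

# Toy stand-in for a heap object. `marked` plays the role of the
# mark flag that the old collector stores inside the object.
Obj = Struct.new(:name, :refs, :marked)

# Strategy 1 (hypothetical name): mark by setting a flag on the object.
class InlineMarkTable
  def mark(obj)
    obj.marked = true
  end

  def marked?(obj)
    obj.marked == true
  end
end

# Strategy 2 (hypothetical name): keep marks outside the objects,
# so marking never writes to the objects themselves.
class ExternalMarkTable
  def initialize
    @marks = Set.new
  end

  def mark(obj)
    @marks << obj.object_id
  end

  def marked?(obj)
    @marks.include?(obj.object_id)
  end
end

# The mark phase only talks to the table interface, which is what
# makes the implementation swappable.
def mark_reachable(root, table)
  stack = [root]
  until stack.empty?
    obj = stack.pop
    next if table.marked?(obj)
    table.mark(obj)
    stack.concat(obj.refs)
  end
end

# On Ruby Enterprise Edition the real switch is the setter described
# above. Stock Ruby does not define it, hence the guard.
GC.copy_on_write_friendly = true if GC.respond_to?(:copy_on_write_friendly=)
```

Because the mark phase goes through `mark` and `marked?` instead of touching the flag directly, every mark costs a method call. That mirrors the small penalty described above for the default, non-copy-on-write friendly table.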
Ruby Enterprise Edition
Many of you have probably heard of Ruby Enterprise Edition. There has been, and still is, a lot of fuss about the name. But that’s intentional and is all part of the plan — if people make a fuss about the name then it means we’re not in the Zone of Mediocrity.
What is Ruby Enterprise Edition? People thought it’s a closed source product, but in fact the website’s front page and download page have the following huge sticker:

(We actually added this sticker after we saw that people thought it was going to be closed source.)
In one sentence:
Ruby Enterprise Edition is an easy to install Ruby interpreter that includes, among other things, my copy-on-write work.
Facts and myths:
- It’s open source, not closed source. It’s freely available to all.
- It’s not an entirely new Ruby implementation. It’s based on the official Ruby interpreter (MRI), version 1.8.6-p286. This means that all your existing Ruby applications are compatible with Ruby Enterprise Edition.
- It does not only include my copy-on-write work. There’s more. Read on.
- It is not a hostile fork, but a friendly one. The work included in Ruby Enterprise Edition is meant to be merged back to upstream at some point in the future.
- The copy-on-write work has been submitted to the Ruby core team in the past.
- Phusion Passenger is very well-integrated with Ruby Enterprise Edition. If you use Phusion Passenger in combination with Ruby Enterprise Edition, then your Rails applications will transparently use 33% less memory and will be faster, as if it’s magic. You don’t need to do anything special, it just works.
The only condition is that you must not be using conservative spawning in your application. But if you don’t know what conservative spawning is then you’re not using it, and you’ll have nothing to worry about.
Why was Ruby Enterprise Edition made?
Consider the following facts:
- My copy-on-write work can potentially save a lot of memory in Rails applications.
- The patch has been submitted to upstream, but hasn’t been accepted yet.
- There is a demand for lower memory usage in Rails applications, right now, not X months/years in the future.
Given the circumstances, and to satisfy the demands (including that of ourselves), we have decided that it would be best to maintain our own Ruby fork which includes these patches.
You might be wondering: Why not just release the patch? Why create a fork?
The answer is user friendliness. Telling people to download Ruby’s source code and apply a patch is not user friendly. In fact, to many people, it’s downright scary. Imagine that you want a transparent and easy way to make your Rails applications “magically” use 33% less memory. Which of the following instructions would you prefer?
Use Phusion Passenger to deploy your application. Then download the Ruby interpreter source code from www.ruby-lang.org. Download it and extract the tarball. Then, download this patch and apply it with this and that command. Then, run ‘./configure --prefix=/somewhere’. Make sure that /somewhere is not /usr in order to prevent overwriting your old Ruby installation, you don’t want that to happen. Then type ‘make’, and then ‘sudo make install’. Then download RubyGems, extract it, and type ‘sudo /somewhere/bin/ruby setup.rb’ in the RubyGems source folder. Then type ‘/somewhere/bin/gem install rails’ to install Ruby on Rails and whatever other gems you might need.
or:
Use Phusion Passenger to deploy your application. Then download Ruby Enterprise Edition. Run the installer and follow the instructions. Done.
The first one contains a lot of caveats. Many many things can go wrong. Many many people aren’t experienced in installing Ruby from source. It’s just easier if there’s a vendor that takes care of everything for you. And we are that vendor.
We want Phusion Passenger and everything surrounding it to have a “just works” experience.
So if it’s not just the copy-on-write work, then what else does Ruby Enterprise Edition include?
- This one is huge: by using Google’s tcmalloc, an alternative memory allocator, Ruby becomes 20% faster even with the copy-on-write friendly garbage collector! Furthermore, tcmalloc seems to be more copy-on-write friendly than ptmalloc2, Linux’s default memory allocator, so by using tcmalloc we can save even more memory!
We discovered this shortly after submitting the patch to the Ruby core mailing list. So Ruby Enterprise Edition also includes tcmalloc.
- Ruby Enterprise Edition includes an easy-to-use installer which takes care of installing tcmalloc, Ruby, RubyGems and important/useful gems for you. It also teaches you how to tell Phusion Passenger to use Ruby Enterprise Edition instead of normal Ruby.
- In the future we might include more patches that might be useful in production environments.
Who’s already using Ruby Enterprise Edition?
I’m not sure because we’ve never asked our users. But the Ruby on Rails Wiki is running on it, and it has been great. I’ve been monitoring the Wiki for a while now, and ever since we switched it to Phusion Passenger + Ruby Enterprise Edition, it has been rock-solid (before, it used to crash often). We also observed a great reduction in memory usage.
Michael Koziarski, a Rails core developer, runs Phusion Passenger with Ruby Enterprise Edition on his blog. He said that he downgraded his server because Phusion Passenger + Ruby Enterprise Edition saved him so much memory.
Final words
I hope this post has shed some light on matters. I’m just a little surprised that there’s all this confusion going on because all of this is also documented on the Ruby Enterprise Edition website’s FAQ. eustaquiorangel.com recently interviewed me and asked similar questions. You should check it out.
I’m also a little surprised that people seem to be reluctant to install Ruby Enterprise Edition. If I have the choice between two products A and B, and B is the same as A but is much more efficient and is easy to install, then I’d choose B.
Is it that people are suspicious about our claims? We’ve published a performance and memory usage comparison. Anybody can read this comparison, perform it themselves, and check whether our claims are true. Everything we claim is verifiable, so I don’t understand what there is to be suspicious about.
Please feel free to post your thoughts on this, I’d really like to hear what people have to say.

TaQ said,
August 17, 2008 @ 3:54 pm
Hey, great post. As I said on my article, I think the more clear the things are, less chance to allow people to think (and write) bull**** about it. Sometimes looks like we need some repetition trying to make things more clear, but it’s a little price to pay. It avoids gossips on a fine community like the Ruby one.
Cheers!
Stephen Sykes said,
August 17, 2008 @ 8:00 pm
Hi! Interesting about tcmalloc being such a benefit. But I am on x86-64 and you say in the REE FAQ that the tcmalloc does not work on 64bit platforms.
However, I notice that the current docs for tcmalloc say that tcmalloc itself works fine, it’s just some of the other parts of the google perftools that can cause problems/deadlocks. Therefore is it still the case that there is no performance benefit for 64bit?
Hongli said,
August 17, 2008 @ 9:09 pm
We’ve had no luck with tcmalloc on 64-bit platforms so far, so the installer doesn’t install tcmalloc on such platforms.
Brian Smith said,
August 18, 2008 @ 2:06 am
What are your plans regarding the Python support for Phusion Passenger? For a long time I’ve been interested in doing something very similar for Python to what you are doing in Ruby EE/Passenger. It looks like your Python support is at the “it works” stage but it doesn’t seem to be optimized like the Ruby support.
Also, how do you feel about derivative works? I have some ideas for architectural changes to Passenger to improve IPC performance that are very disruptive. I’d like to share those changes publicly even if you guys don’t want to merge them.
Hongli said,
August 18, 2008 @ 6:38 am
I personally have no plans to implement Python support because I don’t write web applications in Python, but I would welcome any contributions for improving Python support.
Please feel free to share your ideas, I’d very much like to hear them. Especially if they’re architecturally disruptive — that just means that you must be up to something.
J. Ryan Sobol said,
August 18, 2008 @ 7:07 am
Thank you for your hard work! More importantly, thank you for contributing back to the community!
The transparency of Phusion *will* most certainly aid in early adoption. I only hope the branch in the trunk wraps tightly around the base.
Kent Sibilev said,
August 18, 2008 @ 8:08 pm
Thank you for your great work! I’ve switched all my production sites to Passenger/REE which resulted in a considerable performance boost.
Phil Murray said,
August 20, 2008 @ 2:06 am
Have you done any comparisons of tcmalloc with jemalloc? jemalloc is FreeBSD’s new malloc implementation, that the Mozilla project have also started using for their builds of Firefox etc.
Hongli said,
August 20, 2008 @ 8:31 am
Yes I’ve tried jemalloc for a short while. I didn’t write down the results but I noticed that tcmalloc is more efficient when it comes to performance and copy-on-write friendliness.
roger said,
October 16, 2008 @ 7:28 am
Re: speed drop when copy_on_write is turned on. I was thinking one way to overcome this would be [hear me out]
Require that heaps be aligned on certain boundaries, like say 2K boundaries [1]. Then for every entry within the heap that is located at exactly a multiple of 2K, have it not belong to the freelist, nor ever be used as a ruby object, but instead be a pointer back to the root of the heap. Thus locating the heap on which an object resides [to locate the mark table for that heap] is able to be looked up in constant time — something like *(object_address & (~2K))
Also doesn’t add much memory, at all, though it does add some complexity.
I could code it up if you’d like. Or does it already do this?
Thanks!
-=R
[xmalloc does this?]
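The lookup roger describes can be sketched roughly as follows. This is a simulation with plain integers standing in for addresses, not interpreter code, and it says nothing about whether the collector already works this way. `HEAP_ALIGNMENT`, `Heap`, `heap_base`, `build_back_pointers` and `heap_for` are hypothetical names, and a Ruby Hash stands in for reading the back pointer out of memory. For a power-of-two alignment the mask is written as `~(alignment - 1)`.

```ruby
# Hypothetical alignment: heaps start on 2 KiB boundaries, as in the comment.
HEAP_ALIGNMENT = 2048

# Toy heap descriptor. `start` must be a multiple of HEAP_ALIGNMENT.
Heap = Struct.new(:start, :size, :mark_table)

# Clearing the low bits of an address gives the aligned boundary
# at or below it. This is a constant-time operation.
def heap_base(address, alignment = HEAP_ALIGNMENT)
  address & ~(alignment - 1)
end

# Reserve the slot at every aligned boundary inside each heap and store
# a back pointer to the owning heap there. The Hash simulates memory.
def build_back_pointers(heaps, alignment = HEAP_ALIGNMENT)
  slots = {}
  heaps.each do |heap|
    heap.start.step(heap.start + heap.size - 1, alignment) do |addr|
      slots[addr] = heap
    end
  end
  slots
end

# Find the heap (and so its mark table) for an object address:
# one mask plus one read of the reserved slot.
def heap_for(address, slots, alignment = HEAP_ALIGNMENT)
  slots[heap_base(address, alignment)]
end
```

The cost is one reserved slot per 2 KiB of heap, which matches the remark that the scheme adds little memory but some complexity.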