Archive for July, 2007

Making Ruby’s garbage collector copy-on-write friendly, part 3

I wrote my own memory pool implementation, for allocating entries in the pointer set/marking table. I also modified to garbage collector’s sweep routine to only modify some flags if they need to be modified (so that that part doesn’t mark pages unnecessarily as dirty). The garbage collector will also reset the marking table after a sweep – that is, deallocate all memory used by the marking table. Finally, I modified mongrel_light_cluster to perform a forced garbage collection just before forking. That will prevent some things to be garbage collected after forking (which will mark pages dirty).

My work is finally showing some fruits. :) If I run the following test script in standard Ruby

Ruby
  1. GC.disable
  2. list = []
  3. 500000.times do
  4.         list < < "foo"
  5. end
  6.  
  7. pid = fork do
  8.         pid = fork do
  9.                 puts "forked"
  10.                 STDIN.readline
  11.  
  12.                 GC.enable
  13.                 if ObjectSpace.respond_to?(:garbage_collect!)
  14.                         ObjectSpace.garbage_collect!
  15.                 else
  16.                         ObjectSpace.garbage_collect
  17.                 end
  18.                 puts "garbage collected"
  19.                 STDIN.readline
  20.                 exit!
  21.         end
  22.         Process.waitpid(pid)
  23.         exit!
  24. end
  25. Process.waitpid(pid)

then the writable memory after “garbage collected” is printed, is 14.5 MB. When run in my patched Ruby, the writable memory is 320 KB! :) According to some benchmarks, performance has improved by 15% compared to the last implementation (which used the system’s malloc()/free()).

Download the Ruby 1.8.6 patch here.

Technical details

The memory pool contains multiple memory chunks. Each memory chunk has a fixed size, and cannot grow. If the memory pool notices that a memory chunk is full, it will allocate a new one. If an item is deallocation, empty memory chunks are freed. The memory pool has 4 bytes (1 pointer) overhead for every allocated memory item. Allocation time is O(1) and deallocation time is O(n), where n is the number of chunks (a chunk contains 4096 items by default). This can probably be faster if the garbage collector can tell the marking table to build an index of chunks.

For those who don’t know, there are two ways that the standard C library’s malloc() can allocate memory:

  1. Allocate in some internal memory pool. However, if you free() this memory, then that memory is not released to the operating system. This is the case if malloc() is given the command to allocate objects less than 76 bytes.
  2. Allocate with mmap(). This is the case if malloc() is given the command to allocate objects of 76 bytes or more. If you free() such memory, then that memory *is* released back to the operating system.

(I’m assuming a GNU/Linux system with glibc, on x86. I don’t know how C libraries on other operating systems or architectures behave.)

The pointer set contains many 8-byte entries, so if I use malloc() then all that memory is never released back to the operating system even if I deallocate them. By using a memory pool, which allocates big chunks of memory at once, I force malloc to use mmap(). Therefore, all memory allocated for the pointer set entries can be released to the operating system. The memory used by the bucket array is not affected by this, since the bucket array is usually quite large, so malloc() already uses mmap() to allocate memory for the bucket array.

We're (still) not done yet!

Consider this second test script:

Ruby
  1. GC.disable
  2. require 'rubygems'
  3. gem 'rails'
  4. require 'set'
  5. require 'pathname'
  6. require 'cgi'
  7. require 'initializer'
  8.  
  9. GC.enable
  10. ObjectSpace.garbage_collect
  11. GC.disable
  12. pid = fork do
  13.         pid = fork do
  14.                 puts "forked"
  15.                 STDIN.readline
  16.  
  17.                 GC.enable
  18.                 if ObjectSpace.respond_to?(:garbage_collect!)
  19.                         ObjectSpace.garbage_collect!
  20.                 else
  21.                         ObjectSpace.garbage_collect
  22.                 end
  23.                 puts "garbage collected"
  24.                 STDIN.readline
  25.                 exit!
  26.         end
  27.         Process.waitpid(pid)
  28.         exit!
  29. end
  30. Process.waitpid(pid)

It loads some code, then forks and runs garbage collection. When run in standard Ruby, then the final writable memory is 4.7 MB. When run in my patched Ruby, the final writable memory is 4.1 MB. While this is an improvement, it's not much. Most of the memory occupied by the first test script is in strings. In this second test script, most of the memory is occupied by code (i.e. the AST). The garbage collector seems to write to AST nodes even when it's not necessary.

I also tested mongrel_light_cluster. After mongrel_light_cluster has forked, each child has about 15 MB of writable memory. After loading some pages, their writable memory jump to 17 MB. Compared to the 26 MB used by regular mongrel_cluster, this certainly is an improvement. But I'm still not satisfied. I'm pretty sure this can be optimized further. Somewhere, somehow, pages are being marked as dirty. Finding them (and finding a solution for them) will be hard.

Comments (9)

GoDaddy sucks

Never use GoDaddy.com to purchase SSL certificates. They’re cheap, but their service sucks.

Yesterday I purchased an SSL certificate with my GoDaddy account. A friend agreed to pay the fee for me, so we used his Paypal account, which was under a different name. I can’t find anywhere on the GoDaddy website where they mention that the Paypal account information must match the GoDaddy account information. The purchase seemed to go well, but after 2 hours GoDaddy refunded the fee and deleted my GoDaddy account without notice. I had to email them to ask what was going on. They replied that they need the Paypal account holder to fax them a government identification document, like a passport.

We refuse. They’re not the police. We’ll never buy anything from them ever again.

Comments (2)

Making Ruby’s garbage collector copy-on-write friendly, part 2

The marking table now uses a set instead of a full hash table. This set implementation is based on st.c from the Ruby source distribution. I removed two fields from the hash table entry structure, so my set structure uses 8 bytes per entry instead of 16 bytes per entries (excluding malloc overhead). If you’re really curious, I changed

C
  1. struct st_table_entry {
  2.     unsigned int hash;
  3.     st_data_t key;
  4.     st_data_t record;
  5.     st_table_entry *next;
  6. };

to

C
  1. struct _PointerSetEntry {
  2.         PointerSetElement element;
  3.         PointerSetEntry *next;
  4. };

To my surprise, this didn’t save as much memory as I expected. The memory saving was only 500 KB instead of the expected 3.8 MB. :(

I later found out that a significant portion of memory is used by the bucket array in the set’s structure. So I increased the set’s default load factor from 0.75 to 2.0. This reduced memory usage in my test program from 16.5 MB to 13.5 MB! Despite collisions, the effect on performance is minimal. Increasing the load factor further doesn’t seem to save memory.

For my next attempt, I will be using a memory pool for allocating the set entries. This should reduce memory fragmentation, as well as increasing performance because allocations and frees and be done in constant time. Hopefully it will also have less memory overhead than malloc.

Comments (1)

Making Ruby’s garbage collector copy-on-write friendly

Remember that I said that I give up? Well, I lied. ;) Something is driving me. I guess I loathe the idea of having to pay an extra $20 per month for the extra RAM to run more Rails apps.

Sleep has to be the best thing ever invented. After a night of sleep and some careful thinking, I managed to modify Ruby’s garbage collector to use a marking table instead of marking the objects directly. The patch seems to work: ‘make test’ and ‘make install’ both run fine (both run a non-trivial Ruby script). Rubygems, which is notorious for using lots of memory and CPU, runs fine as well. Ruby on Rails also seems to work fine.

The Ruby 1.8.6 patch is here.

How it works

Every Ruby object is represented by a C struct, and always begins with the following field:

unsigned long flags;

Ruby uses mark and sweep garbage collection. First it goes through all objects and marks every object which is in use. This is done by setting the FL_MARK flag on the aforementioned field. Then it goes through all objects again, and frees every object which is not marked. The problem was that this breaks copy-on-write semantics: it effectively marks nearly all pages as dirty. Using a mark table, which keeps the marking information in a separate memory region, is supposed to solve that problem, as Dough Beaver mentioned in this slide. My patch modifies the garbage collector to do just that.

We’re not done yet

Unfortunately things aren’t that simple (are they ever simple? :( ). I notice that the memory gains aren’t as good as I hoped. So I wrote the following test script:

Ruby
  1. list = []
  2. 500000.times do
  3.         list < < "foo"
  4. end
  5.  
  6. pid = fork do
  7.         puts "forked"
  8.         STDIN.readline
  9.  
  10.         ObjectSpace.garbage_collect
  11.         puts "garbage collected"
  12.         STDIN.readline
  13.         exit!
  14. end
  15. Process.waitpid(pid)

I use the GNOME System Monitor to monitor the processes' memory usage. The 'Writable memory' column is especially interesting, as that column tells me how much memory is really private to that process (instead of being shared with the parent/child process). Immediate after forking, the child process's VM size is 29.1 MB, and the writable memory is 76 KB. After running the garbage collection though, the writable memory jumps to 16.9 MB.

Something is definitely wrong here! I know that nothing will be freed - all strings that I put in the list array should never be freed as they are still reachable.

After intensive debugging, I found out that the marking table is using quite some memory. The marking table uses the hash table implementation as provided by Ruby's st.c: st_table. st_table uses malloc() to allocate entries for the hash table. I noticed that each entry uses 24 bytes (this size is including malloc overhead). I have more than 500,000 strings in the list, so the marking table should contain at least 500,000 entries after the marking phase. 500,000 * 24 = 12,000,000. Uh oh, the marking table uses 12 MB of memory during the marking phase. :(

So the main issue now is to find a way to reduce the marking table's memory usage. I could port Google's sparse_hash to C. sparse_hash is a bit slower than a regular hash table, but has only 2 bits of overhead per entry. This would theoretically reduce memory usage from 12 MB to (32 * 500,000 + 2 * 500,000) / 8 bytes = 2 MB.

Does anyone have any better ideas?

Comments (7)

Saving memory in Ruby on Rails with fork() – failed

In the past few days I’ve been working on mongrel_light_cluster, an extension for mongrel_cluster which automatically uses copy-on-write semantics to save memory in Ruby on Rails applications. The initial measurements were exciting – one can reduce memory usage by as much as 5 times when running 10 Mongrel instances!

Unfortunately I was hit by a wall. Ruby garbage collector marks all memory pages as dirty, causing all memory in forked processes to be copied, thereby destroying copy-on-write semantics. I was unable to reproduce this problem before, so I kept working on mongrel_light_cluster, but now I see how serious the problem really is.

So I give up. The project failed. This cannot be fixed outside the Ruby interpreter. Someone must fix the Ruby garbage collector to use a mark table instead of marking inside the object’s own memory. Dough Beaver has published a slide about this, but I’m not sure whether anyone’s actually working on fixing the garbage collector. Until the garbage collector has been fixed, we’ll have to tolerate Ruby on Rails wasting memory. :(

Comments (3)

Status report

It’s been a long time since I’ve posted on this blog. Some people might be asking whether I’m still alive. You’re reading this right now, so that answers the question. :)

It’s been a while since I worked on my Ruby on Rails-related projects, saving memory by using copy-on-write and prepared statements support in ActiveRecord. They were on hold for a while since I’ve been pretty busy. Right now I’m having holidays, so I think I’ll start working on them in again in the coming few weeks.

I’ve also decided to finally release a program that I wrote, in the hope that they will be useful to someone. This is the System Daemon Manager, a tool for running any program as a daemon (background process without controlling terminal) on a Unix system. I’m using this to manage my Ruby on Rails Lighttpd processes on my server.
I’ve also written a new page for the Ruby Whirlpool library and imported it into a Subversion repository.

Comments (1)

System Daemon Manager

The System Daemon Manager (SDM) is a tool for running any program as a daemon (background process without controlling terminal) on a Unix system.

  • Each daemon is given a name by the user. SDM will watch the daemon, and will allow you to stop the daemon (or check whether the daemon is still running) from any terminal by name.
  • SDM uses lock files to determine whether a daemon is running, so this avoids problems with stale pid files. The disadvantage is that the lock files may not reside on an NFS filesystem (since NFS doesn’t support lock files).
  • Finally, SDM can be run as any user, not just root. Of course, with SDM you can only stop daemons that are started by you. SDM itself doesn’t need root privileges.

Usage

An example will say probably more than words. In the following demonstration, we start gedit as a daemon:

[bash@localhost]$ sdm start my-fancy-gedit-daemon gedit ~/cool-file.txt

The second argument is the name for this daemon, and the other arguments are the program to start. gedit will now run as a daemon, so even if we close the terminal window (thereby sending a SIGHUP to all processes in the terminal), gedit will keep running.

We can check gedit’s status at any time:

[bash@localhost]$ sdm status my-fancy-gedit-daemon
PID: 14235

If we close gedit manually, and we run the ‘status’ command again, then SDM will tell us that that daemon is not running:

[bash@localhost]$ sdm status my-fancy-gedit-daemon
Not running.

Finally, if the gedit daemon is running, then we can stop it with the following command:

[bash@localhost]$ sdm stop my-fancy-gedit-daemon

Download

You can get SDM through its Subversion repository:

svn checkout http://public.railsplugins.net/repos/sdm/trunk

Once you’ve checked out the source, you can install it with this command:

make install

SDM requires Perl 5.8 and the Time::HiRes module.

Comments (1)