How the Ruby heap is implemented

It’s been a while since I’ve worked on Ruby’s garbage collector, and my memory about it is getting dusty. At the time, I noticed that there’s very little documentation about the Ruby interpreter implementation. There are little pieces of comments here and there, and a few slides and emails all over the Internet, but nothing comprehensive. The best thing I could find is the Ruby Hacking Guide, but unfortunately it’s in Japanese only. There’s an effort to translate the book to English, but the part about garbage collection hasn’t been translated yet. So, seeing that lately more people are becoming interested in Ruby’s garbage collector, I thought it might be useful if I document it.

A few notes:

  • Everything I write here is deducted from the source code of Ruby 1.8.6.
  • When I say “Ruby”, I mean the mainstream Ruby interpreter implementation, not the language specification.
  • When I say “heap”, I mean the Ruby heap unless explicitly mentioned otherwise.

In the beginning is the Object

Almost everything in Ruby is an object. Everything – strings, variables, classes and even procedures and the AST. For instance, we have the following structs in Ruby:

  • RFloat – represents a float value in the language.
  • RString – represents a string value in the language.
  • RClass – represents a class in the language.
  • …etc etc…

All those types are actually child type of a single parent type: RBasic.

Since Ruby is written in C, the implementation doesn’t appear to be object oriented at first sight, even though it actually is (though in a very messy way, in my opinion). But let’s take a look how that’s implemented. This is how RClass and RFloat are defined:

C
  1. struct RClass {
  2.     struct RBasic basic;
  3.     struct st_table *iv_tbl;
  4.     struct st_table *m_tbl;
  5.     VALUE super;
  6. };
  7.  
  8. struct RFloat {
  9.     struct RBasic basic;
  10.     double value;
  11. };

Each type has an RBasic field as the first field in the struct. This means that each type can essentially be casted to RBasic. So far so good – this is actually a fairly standard way to implement inheritance in C, and is used by GObject-based applications and libraries (GNOME, GTK).

But things don’t stop there: all those types can be casted to an RVALUE struct, which is defined roughly as follows:

C
  1. typedef struct RVALUE {
  2.     union {
  3.         struct {
  4.             unsigned long flags;
  5.             struct RVALUE *next;
  6.         } free;
  7.         struct RBasic  basic;
  8.         struct RObject object;
  9.         struct RClass  klass;
  10.         struct RFloat  flonum;
  11.         struct RString string;
  12.         …etc etc…
  13.         struct RNode   node;
  14.         struct RMatch  match;
  15.         struct RVarmap varmap;
  16.         struct SCOPE   scope;
  17.     } as;
  18. } RVALUE;

This might look a bit strange when you read it for the first time. I was confused at the time too. But notice that this is actually a big union. An RVALUE can be casted to an RBasic, RClass, RFloat, RString, etc. They’re all the same size. We’ll see later on that the garbage collector makes use of this fact.

I’ll explain later what the free struct is supposed to be.

The big picture: heaps

Ruby has multiple heaps in which it stores objects. Schematically, it looks like this:
Schematic overview of the heaps

The number of heaps is variable. When Ruby tries to allocate a new object, and all heaps are full, then a new heap is created. The heaps list is an array which points to all heaps. This array does not contain the heaps themselves, it only points to the heaps. The list is defined as follows:

C
  1. static struct heaps_slot {
  2.     void *membase;
  3.     RVALUE *slot;
  4.     int limit;
  5. } *heaps;
  6.  
  7. static int heaps_length = 0;
  8. static int heaps_used   = 0;

The list’s size is represented by heaps_length. But only heaps_used elements is actually used. This is because the heaps list will only grow – it will never shrink. That’s is OK because the heaps list itself is small: the bulk of the data is stored in the actual heaps. The actual heap is freed when it’s no longer necessary.

membase points to the memory location of the heap.
slot points to the first object in that heap.
limit is the number of RVALUE objects that fit into this heap.
Note that membase and slot may not be the same value. From what I understand, Ruby tries to make sure that each RVALUE is located in a memory location that’s a multiple of RVALUE’s size (that is, for each RVALUE object r in Ruby, r % sizeof(RVALUE) == 0 is true).

Inside the heap

Now let us take a look at a single heap. A heap is actually just a big array of RVALUE objects:
A single Ruby heap

Some slots in the heap are empty. A slot r is considered empty if r.as.free.flags == 0. And now you know what the ‘free’ struct in RVALUE is.
You might wonder whether it’s OK to treat an RVALUE as a ‘free’ structure without further checks. It is, because RBasic also has a ‘flags’ field.

Furthermore, a ‘free’ struct also points to another ‘free’ struct, which may even be in a different heap. So all the free slots actually form a freelist. The freelist variable points to the beginning of the freelist.

Garbage collection

Ruby uses a simple mark-and-sweep algorithm. It works as follows:

  1. Mark all objects that are accessible from the current point of the Ruby program (mark phase).
  2. Iterate through all objects and free those that are not marked (sweep phase).

There are a few oddities though:

  1. Ruby also scans the system stack for objects. I don’t know why.
  2. The creation of new heaps is actually done during garbage collection. When Ruby tries to create a new object, it will call the garbage collector if it notices that the free list is too small, thereby indirectly creating more heap space.
  3. Ruby marks an object r by casting r to RBasic, and setting FL_MARK on its ‘flags’ field. This is the reason why pages are made dirty during garbage collection.
  4. Source code filenames are also garbage collected. Ruby does that by allocating n+2 bytes for each filename (where n is the filename’s length, excluding NULL), and using the first byte as mark flag, and everything else as the actual string content. It took me a while to understand this, it made the source code very messy.

Final words

Ruby’s memory management and garbage collection code is huge. It’s in a big 2093 lines source file. It took me a while to understand it, and I hope this helped you to understand it. If there are any questions, feel free to ask. :)

13 Comments »

  1. Jeremy Kemper said,

    October 12, 2007 @ 7:13 pm

    What a lovely read. Thank you! And keep them coming :)

  2. Dan42 said,

    October 13, 2007 @ 9:47 am

    Ruby scans the system stack because objects can be created from within C code. Imagine a C function which allocates a RString str1 and then a RString str2, but there is not enough space to allocate str2 so the garbage collector runs. At this point, since the only reference to str1 is in the system stack, the garbage collector would mistakenly free str1 if it didn’t scan the system stack.

  3. she said,

    October 23, 2007 @ 9:17 am

    woot great read
    i add this link to my local ruby-to-c knowledge page :)

  4. Edward Ocampo-Gooding said,

    October 23, 2007 @ 3:17 pm

    Solid article. Thanks!

  5. Michael Tsai - Blog - How the Ruby Heap Is Implemented said,

    October 23, 2007 @ 3:37 pm

    [...] not well documented, so Hongli Lai read the code to figure it [...]

  6. Eric said,

    October 24, 2007 @ 5:24 pm

    Excellent stuff, the graphics were a good call – help propigate to non-C folks.

  7. Saad Savari said,

    October 26, 2007 @ 4:36 pm

    Great article. keep the good work coming

  8. How Ruby Manages Its Memory said,

    October 27, 2007 @ 3:17 pm

    [...] “How the Ruby heap is implemented,” Hongli Lai looks at how Ruby manages its memory and stores your objects. It’s reasonably [...]

  9. Sam Figueroa said,

    October 28, 2007 @ 1:23 pm

    I code Ruby on Rails as a side hobby. I never thought the hard internals of the Ruby heap would interest me. Your article got me hooked after the first few paragraphs. A great read. Your effort was not in vain.

  10. Satish Talim said,

    October 31, 2007 @ 1:01 am

    A very interesting read. Keep it coming.

  11. 赖洪礼的 blog » Memory usage comparison: Rails 1.2.6 vs 2.0.2 said,

    March 19, 2008 @ 11:08 am

    [...] is usually the best measurement – but not for Ruby! In the past I’ve explained how Ruby’s heap is implemented. Ruby allocates several heaps, and the heaps contain equally-sized slots in which Ruby objects are [...]

  12. jonam said,

    February 24, 2009 @ 6:36 am

    This is the first time i am learning about ruby memory management,
    It’s good,
    I got lot many things.

  13. Understanding how Ruby stores objects in memory – the Ruby Heap « The Irish Penguin said,

    October 28, 2009 @ 1:58 pm

    [...] How the Ruby Heap is Implemented Phusion Passenger’s Hong Lai gives a great explanation of the Ruby Heap – the banner [...]

RSS feed for comments on this post · TrackBack URI

Leave a Comment