Archive for October, 2007

The web, as a platform, sucks

The web platform sucks.

Don’t get me wrong. With Ruby on Rails, developing web applications is actually a whole lot more enjoyable than writing desktop client software! The web is a wonderful platform for writing for simple information processing systems that do not require very complex user interaction. Its main strength lies in its universal aspect – everybody can use it without installing annoying client software. But web applications get nasty and hacky very soon.

At the moment, I’m enrolled in a course at my university called “Ontwerpproject” (“Design project”). We’re developing a system for facilitating the storage and the transfer of knowledge inside as well as between student organizations. This is all fine and dandy. Displaying documents is very simple. Storing and displaying organizations, tags, etc. is very easy, and with scriptaculous effects, it looks cool too.

One of the requirements for the system is that it must be possible to enter knowledge on a website (as opposed to, say, uploading Word documents). We chose to use one of those “web based rich text editors” (I’ll call them WBRTE from now on). And this is where things go downhill. There aren’t much options. Probably the most widely-known open source WBRTE is FCKEditor. There’s even a Ruby on Rails plugin for it. There are commercial alternatives as well, but they’re very expensive.
FCKEditor works, but:

  1. It’s slow.
  2. It’s slow.
  3. Only works in Firefox and IE (though 2.5 beta supports Safari and Opera as well)
  4. If I press Reload or the Back button and then Forward again, the text I entered previously is gone.
  5. Spell checking only supports 1 language simultaneously.
  6. Spell checking button interferes with Firefox’s built in spell checking. By default, FCKEditor disables Firefox’s built in spell checking. Spell-check-as-you-type is a good thing and is easier to use than a spell check button. The end users for our system are Dutch, and if they have Firefox installed, then they usually have the English version. So they’ll want to switch to the Firefox Dutch dictionary by right clicking. But if they that, then they will – tadaa – get the FCKEditor popup menu instead! They can bypass that by holding Ctrl while right clicking, but it’s not intuitive. I had to insert a tip at the editor page to notify them about that, but I would prefer if that wasn’t necessary.
  7. Did I mention it’s slow? And by “slow” I mean “it takes a long time to load and generally feels clunky and less responsive compared to client word processing software”

I’m not sure whether 1 (and 2 and 7) can be fixed. If you look at how FCKEditor and other WBRTEs are implemented, you see that they usually do that with an iframe in which they create DOM elements.
3 is a solvable issue. Not sure about 4.
5 cannot be solved without serious changes in the browsers. Right now there is no way to integrate your own rightclick-popups with that of the browser. In the case of FCKEditor spell checking, this sucks – a lot.

My point is, if you look at some things, you’ll see how hacky the web platform actually is. FCKEditor, while working well in many circumstances, is a big hack in my opinion. A very clever hack, I’ll admit. Other WBRTEs are essentially hacks too and have similar problems. The web platform was never designed for these sort of things. And as a user, I can notice that a lot in the form of reduced performance.

A few more gripes I have:

  • Lightboxes with a transparent black background that fades in. The fade in animation is slow, VERY slow. In both Firefox and Internet Explorer. This could be just an implementation issue – I figure things could be a lot faster if browsers use hardware accelerated graphics rendering. But still, this reeks like a “hack” to me.
  • File uploading. Let’s say you’re uploading a 30 MB JPEG file, which has a corrupted header. The server can only tell you that the file is corrupted after you’ve finished uploading the entire 30 MB. Furthermore, batch uploading is not supported. While you can wrap things in a zip archive, a lot of casual people don’t know how to create .zip files. So web developers work it around with Java applets, which load very slowly.
  • No support for server pushing. You usually don’t need this, but it’s useful for some websites, such as web-based multiplayers online games. Clients have to be constantly notified of updates, and polling the server with Ajax is very inefficient. I realize that there’s Comet, but when I look at how it’s implemented, it feels like a big hack.

I’m not criticizing FCKEditor or WBRTEs in general or light boxes; absolutely not, they provide very useful tools. I’m criticizing the web as a platform. It’s been so many years, isn’t it about time that browsers provide some good built in support for rich text editor components, instead of letting people hack one together with iframes? I think WHATWG had a specification about that, but I’ve never heard anything from WHATWG ever since it was established. It’s about time the web becomes less hacky.

Comments (8)

Making Ruby’s garbage collector copy-on-write friendly, part 6 (final?)

I added some minor optimizations and cleaned up the code a little bit. I also fixed many bugs in mongrel_light_cluster. It’s time for another test round.

Test plan

The goal is to measure the memory usage of my Ruby GC patch + mongrel_light_cluster, compared to default Ruby and mongrel_cluster. The test plan is as follows:

  • Measure current memory usage.
  • Start 10 instances of mongrel_cluster.
  • Measure current memory usage.
  • Load the front page of all 10 mongrel_cluster instances 50 times.
  • Measure current memory usage.
  • Shutdown mongrel_cluster, and repeat the above steps with patched Ruby + mongrel_light_cluster.

Software used in this test

Ruby on Rails 1.2.5
Mongrel 1.0.1
mongrel_cluster 1.0.2
Mephisto 0.7.3

Environment: production
config.action_controller.perform_caching = false

Memory usage of 10 mongrel_cluster instances

Before mongrel_cluster is started:

             total       used       free     shared    buffers     cached
Mem:          1011        576        435          0         16        185
-/+ buffers/cache:        373        637
Swap:          996         13        982

After starting 10 instances of mongrel_light_cluster:

             total       used       free     shared    buffers     cached
Mem:          1011        861        149          0         16        185
-/+ buffers/cache:        659        352
Swap:          996         13        982

After downloading the front page 50 times from all instances:

             total       used       free     shared    buffers     cached
Mem:          1011        901        109          0         16        186
-/+ buffers/cache:        698        312
Swap:          996         13        982

Conclusion:
10 instances of mongrel_cluster will use 435-109 = 326 MB memory.
So mongrel_cluster uses 32.6 MB per instance.

Memory usage of 10 mongrel_light_cluster instances with patched Ruby

Before:

             total       used       free     shared    buffers     cached
Mem:          1011        587        424          0         18        193
-/+ buffers/cache:        375        635
Swap:          996         13        982

After starting 10 instances:

             total       used       free     shared    buffers     cached
Mem:          1011        752        259          0         18        193
-/+ buffers/cache:        540        470
Swap:          996         13        982

After downloading the front page 50 times from all instances:

             total       used       free     shared    buffers     cached
Mem:          1011        803        207          0         18        193
-/+ buffers/cache:        591        419
Swap:          996         13        982

Conclusion:
10 instances of mongrel_light_cluster uses 424-207 = 217 MB memory.
So 1 mongrel_light_cluster instance uses 21.7 MB.

Overall conclusion

Everybody likes graphs, so here is one:
copy-on-write-friendly-ruby-memory-usage.png
Memory usage of an instance went down from 32.6 MB to 21.7 MB. A 33% saving! :)
This seems to be the end of the road. I can’t seem to be able to add more optimizations.

You can download the latest Ruby 1.8.6 patch here.
You can install mongrel_light_cluster by typing:

svn checkout http://public.railsplugins.net/repos/mongrel_light_cluster/trunk mongrel_light_cluster
cd mongrel_light_cluster
rake
gem install pkg/mongrel_light_cluster-0.1.gem

Please read the README for further documentation.

Finally, I’d like to ask your help:

  • I want to turn this patch and mongrel_light_cluster into a real RubyForge project. Can anyone help me with the website and documentation?
  • Is there anyone willing to test my patch and mongrel_light_cluster? I’m not confident enough to use these on my business website (which handles payments) but I don’t have any other Rails sites that are large enough.

Please post here if you’re interested.

Comments (13)

Making Ruby’s garbage collector copy-on-write friendly, part 5

It seems that the garbage collector doesn’t make pages dirty after all, and that malloc() seems to be the problem.

I inserted the following test code at the beginning of the garbage_collect() function:

C
  1. if (debugging) {
  2.     printf("Before 1 KB is allocated.");
  3.     getchar();
  4.     calloc(1024, 1);
  5.     printf("After 1 KB is allocated.");
  6.     getchar();
  7. }

My Ruby test script looks like this:

Ruby
  1. load_ruby_on_rails
  2. ObjectSpace.garbage_collect    # garbage collect before forking to make sure stuff isn’t freed after forking
  3. pid = fork do
  4.     ObjectSpace.start_debugging    # sets the ‘debugging’ variable to true
  5.     ObjectSpace.garbage_collect
  6.     exit!
  7. end
  8. Process.waitpid(pid)

Between the messages “Before 1 KB is allocated.” and “After 1 KB is allocated.”, memory usage in the child process jumps from 125 KB to 7 MB!!!

I wrote a test program in C to see whether I can reproduce it outside Ruby:

C
  1. #include <stdio .h>
  2. #include < sys/types.h>
  3. #include < sys/wait.h>
  4. #include <unistd .h>
  5. #include <stdlib .h>
  6.  
  7. int
  8. main() {
  9.         int i;
  10.  
  11.         for (i = 0; i < 20000; i++) {
  12.                 // Allocate some stuff so that the heap isn’t empty.
  13.                 calloc(1024, 1);
  14.         }
  15.  
  16.         pid_t pid = fork();
  17.         if (pid == 0) {
  18.                 getchar();
  19.                 calloc(1024, 1);
  20.                 getchar();
  21.                 _exit(0);
  22.         } else {
  23.                 int status;
  24.                 waitpid(pid, &status, 0);
  25.         }
  26.         return 0;
  27. }

And I couldn’t. In the above test program, memory usage in the child process jumps from 30 KB to 40 KB.

Is there anyone with intimate knowledge about the Linux/glibc malloc() implementation who can tell me what’s going on?

Comments (2)

Making Ruby’s garbage collector copy-on-write friendly, part 4

My previous attempt implements the marking table with a set structure. In part 3, Dan42 said that I could use a bit field instead. The idea did occur to me in the past, but I thought it was infeasible because I thought the bit field will have to cover the entire process’s address space. After some thinking, I realized I was wrong. Thanks Dan42. :D

Each heap_slot struct now has a bit field, which is used as marking table for objects inside that heap:

C
  1. static struct heaps_slot {
  2.     void *membase;
  3.     RVALUE *slot;
  4.     int limit;
  5.     int *marks;        // added this
  6.     int marks_size;    // and this
  7. } *heaps;

The previous marking table implementation had a large overhead: about 8 bytes per marked object. An object (RVALUE) is 20 bytes, so the overhead is 40% in the worst case! And that’s not counting the overhead used by the set structure’s bucket.
The new bit field implementation has a much smaller overhead:
(n / 8 ) / (n * 20) = n / (8 * n * 20) = 1 / (8 * 20) = 1 / 160 = 0.00625
where n is the number of objects on a heap. The bit field only has 0.6% overhead!! :)

According to my benchmarks, garbage collection performance has improved by 60% compared to the previous implementation. It’s still 10%-20% slower than normal Ruby, but I think it’s acceptable.

I think a significant amount of time is spent on locating the heap (and thus the marking table bit field) that an object belongs to. Right now it performs a linear search on all heaps. That’s usually OK though. Each heap is 80% larger than the previous. Ruby on Rails fits in 6 heaps. I think the average application needs no more than 7 heaps. So in the average case, looking up the correct heap should take no more than 7 operations. But it still sucks, so next time I’ll investigate whether I can speed this up by sorting the heaps and using a binary search.

And I’m still not done. Running the garbage collector after a fork still makes a lot of pages dirty, and I still haven’t figured out why.

Anyway…

You can download the Ruby 1.8.6 patch here.

This is brought to you by blood, sweat and tears. I poked myself several times today wondering why my code crashes for mysterious reasons.

Comments (5)

How the Ruby heap is implemented

It’s been a while since I’ve worked on Ruby’s garbage collector, and my memory about it is getting dusty. At the time, I noticed that there’s very little documentation about the Ruby interpreter implementation. There are little pieces of comments here and there, and a few slides and emails all over the Internet, but nothing comprehensive. The best thing I could find is the Ruby Hacking Guide, but unfortunately it’s in Japanese only. There’s an effort to translate the book to English, but the part about garbage collection hasn’t been translated yet. So, seeing that lately more people are becoming interested in Ruby’s garbage collector, I thought it might be useful if I document it.

A few notes:

  • Everything I write here is deducted from the source code of Ruby 1.8.6.
  • When I say “Ruby”, I mean the mainstream Ruby interpreter implementation, not the language specification.
  • When I say “heap”, I mean the Ruby heap unless explicitly mentioned otherwise.

In the beginning is the Object

Almost everything in Ruby is an object. Everything – strings, variables, classes and even procedures and the AST. For instance, we have the following structs in Ruby:

  • RFloat – represents a float value in the language.
  • RString – represents a string value in the language.
  • RClass – represents a class in the language.
  • …etc etc…

All those types are actually child type of a single parent type: RBasic.

Since Ruby is written in C, the implementation doesn’t appear to be object oriented at first sight, even though it actually is (though in a very messy way, in my opinion). But let’s take a look how that’s implemented. This is how RClass and RFloat are defined:

C
  1. struct RClass {
  2.     struct RBasic basic;
  3.     struct st_table *iv_tbl;
  4.     struct st_table *m_tbl;
  5.     VALUE super;
  6. };
  7.  
  8. struct RFloat {
  9.     struct RBasic basic;
  10.     double value;
  11. };

Each type has an RBasic field as the first field in the struct. This means that each type can essentially be casted to RBasic. So far so good – this is actually a fairly standard way to implement inheritance in C, and is used by GObject-based applications and libraries (GNOME, GTK).

But things don’t stop there: all those types can be casted to an RVALUE struct, which is defined roughly as follows:

C
  1. typedef struct RVALUE {
  2.     union {
  3.         struct {
  4.             unsigned long flags;
  5.             struct RVALUE *next;
  6.         } free;
  7.         struct RBasic  basic;
  8.         struct RObject object;
  9.         struct RClass  klass;
  10.         struct RFloat  flonum;
  11.         struct RString string;
  12.         …etc etc…
  13.         struct RNode   node;
  14.         struct RMatch  match;
  15.         struct RVarmap varmap;
  16.         struct SCOPE   scope;
  17.     } as;
  18. } RVALUE;

This might look a bit strange when you read it for the first time. I was confused at the time too. But notice that this is actually a big union. An RVALUE can be casted to an RBasic, RClass, RFloat, RString, etc. They’re all the same size. We’ll see later on that the garbage collector makes use of this fact.

I’ll explain later what the free struct is supposed to be.

The big picture: heaps

Ruby has multiple heaps in which it stores objects. Schematically, it looks like this:
Schematic overview of the heaps

The number of heaps is variable. When Ruby tries to allocate a new object, and all heaps are full, then a new heap is created. The heaps list is an array which points to all heaps. This array does not contain the heaps themselves, it only points to the heaps. The list is defined as follows:

C
  1. static struct heaps_slot {
  2.     void *membase;
  3.     RVALUE *slot;
  4.     int limit;
  5. } *heaps;
  6.  
  7. static int heaps_length = 0;
  8. static int heaps_used   = 0;

The list’s size is represented by heaps_length. But only heaps_used elements is actually used. This is because the heaps list will only grow – it will never shrink. That’s is OK because the heaps list itself is small: the bulk of the data is stored in the actual heaps. The actual heap is freed when it’s no longer necessary.

membase points to the memory location of the heap.
slot points to the first object in that heap.
limit is the number of RVALUE objects that fit into this heap.
Note that membase and slot may not be the same value. From what I understand, Ruby tries to make sure that each RVALUE is located in a memory location that’s a multiple of RVALUE’s size (that is, for each RVALUE object r in Ruby, r % sizeof(RVALUE) == 0 is true).

Inside the heap

Now let us take a look at a single heap. A heap is actually just a big array of RVALUE objects:
A single Ruby heap

Some slots in the heap are empty. A slot r is considered empty if r.as.free.flags == 0. And now you know what the ‘free’ struct in RVALUE is.
You might wonder whether it’s OK to treat an RVALUE as a ‘free’ structure without further checks. It is, because RBasic also has a ‘flags’ field.

Furthermore, a ‘free’ struct also points to another ‘free’ struct, which may even be in a different heap. So all the free slots actually form a freelist. The freelist variable points to the beginning of the freelist.

Garbage collection

Ruby uses a simple mark-and-sweep algorithm. It works as follows:

  1. Mark all objects that are accessible from the current point of the Ruby program (mark phase).
  2. Iterate through all objects and free those that are not marked (sweep phase).

There are a few oddities though:

  1. Ruby also scans the system stack for objects. I don’t know why.
  2. The creation of new heaps is actually done during garbage collection. When Ruby tries to create a new object, it will call the garbage collector if it notices that the free list is too small, thereby indirectly creating more heap space.
  3. Ruby marks an object r by casting r to RBasic, and setting FL_MARK on its ‘flags’ field. This is the reason why pages are made dirty during garbage collection.
  4. Source code filenames are also garbage collected. Ruby does that by allocating n+2 bytes for each filename (where n is the filename’s length, excluding NULL), and using the first byte as mark flag, and everything else as the actual string content. It took me a while to understand this, it made the source code very messy.

Final words

Ruby’s memory management and garbage collection code is huge. It’s in a big 2093 lines source file. It took me a while to understand it, and I hope this helped you to understand it. If there are any questions, feel free to ask. :)

Comments (13)

Mikuru 1.0.0 released


Mikuru – named after Asahina Mikuru from The Melancholy of Haruhi Suzumiya – is an extremely simple image gallery generator. Its key features are:

  • Easy to use and easy to set up.
  • No dependencies on databases.
  • The generated gallery has a simple, to the point, quick and easy to user interface.

I wrote this because I went to Abunai last weekend, and I took 150 pictures. I didn’t feel like setting up a full blown gallery system such as Gallery, so I wrote this simple fire-and-forget image gallery generator.

How does it work?

You put a bunch of pictures in a folder, run Mikuru, and it’ll generate an ‘index.html’ as well as a set of thumbnails. There are no other generated files. Simply upload index.html, the thumbnails and the pictures to a web server, and you’re ready to go.

Here’s an example usage session:

$ ls
DSCN001.JPG   DSCN002.JPG   DSCN003.JPG
DSCN004.JPG   DSCN005.JPG   DSCN006.JPG
$ mikuru
MKDIR      thumbnails
WRITE      thumbnails/DSCN001.JPG
WRITE      thumbnails/DSCN002.JPG
WRITE      thumbnails/DSCN003.JPG
WRITE      thumbnails/DSCN004.JPG
WRITE      thumbnails/DSCN005.JPG
WRITE      thumbnails/DSCN006.JPG
WRITE      index.html
$ scp -r * www.somehost.com:public_html/my_gallery/

How does it look like?

Click here for a demo image gallery.

Download

Version 1.0.0
Requires Ruby and RMagick.

Comments