Archive for Web Development

The web, as a platform, sucks

The web platform sucks.

Don’t get me wrong. With Ruby on Rails, developing web applications is actually a whole lot more enjoyable than writing desktop client software! The web is a wonderful platform for writing simple information processing systems that do not require very complex user interaction. Its main strength lies in its universality: everybody can use it without installing annoying client software. But web applications get nasty and hacky very quickly.

At the moment, I’m enrolled in a course at my university called “Ontwerpproject” (“Design project”). We’re developing a system that facilitates the storage and transfer of knowledge within and between student organizations. This is all fine and dandy. Displaying documents is very simple. Storing and displaying organizations, tags, etc. is very easy, and with script.aculo.us effects, it looks cool too.

One of the requirements for the system is that it must be possible to enter knowledge on a website (as opposed to, say, uploading Word documents). We chose to use one of those “web based rich text editors” (I’ll call them WBRTEs from now on). And this is where things go downhill. There aren’t many options. Probably the most widely known open source WBRTE is FCKEditor; there’s even a Ruby on Rails plugin for it. There are commercial alternatives as well, but they’re very expensive.
FCKEditor works, but:

  1. It’s slow.
  2. It’s slow.
  3. It only works in Firefox and IE (though the 2.5 beta supports Safari and Opera as well).
  4. If I press Reload, or press Back and then Forward again, the text I entered previously is gone.
  5. Spell checking supports only one language at a time.
  6. The spell checking button interferes with Firefox’s built-in spell checking. By default, FCKEditor disables Firefox’s built-in spell checking, yet spell-check-as-you-type is a good thing and is easier to use than a spell check button. The end users of our system are Dutch, and if they have Firefox installed, they usually have the English version. So they’ll want to switch to Firefox’s Dutch dictionary by right-clicking. But if they do that, they will, tadaa, get the FCKEditor popup menu instead! They can bypass it by holding Ctrl while right-clicking, but that’s not intuitive. I had to add a tip to the editor page to tell them about it, and I would prefer that it weren’t necessary.
  7. Did I mention it’s slow? And by “slow” I mean “it takes a long time to load and generally feels clunky and less responsive compared to client word processing software”

I’m not sure whether 1 (and 2 and 7) can be fixed. If you look at how FCKEditor and other WBRTEs are implemented, you’ll see that they usually rely on an iframe in which they create DOM elements.
Item 3 is a solvable issue. I’m not sure about 4.
Item 6 cannot be solved without serious changes to the browsers. Right now there is no way to integrate your own right-click popups with the browser’s own menu. In the case of FCKEditor’s spell checking, this sucks, a lot.

My point is, if you look closely at some things, you’ll see how hacky the web platform actually is. FCKEditor, while working well in many circumstances, is a big hack in my opinion. A very clever hack, I’ll admit. Other WBRTEs are essentially hacks too and have similar problems. The web platform was never designed for this sort of thing, and as a user, I notice that a lot in the form of reduced performance.

A few more gripes I have:

  • Lightboxes with a transparent black background that fades in. The fade-in animation is slow, VERY slow, in both Firefox and Internet Explorer. This could be just an implementation issue (I figure things could be a lot faster if browsers used hardware-accelerated graphics rendering), but it still reeks of a “hack” to me.
  • File uploading. Let’s say you’re uploading a 30 MB JPEG file with a corrupted header. The server can only tell you that the file is corrupted after you’ve finished uploading the entire 30 MB. Furthermore, batch uploading is not supported. While you can wrap the files in a zip archive, many casual users don’t know how to create .zip files. So web developers work around it with Java applets, which load very slowly.
  • No support for server push. You usually don’t need this, but it’s useful for some websites, such as web-based multiplayer online games, whose clients have to be constantly notified of updates. Polling the server with Ajax is very inefficient. I realize that there’s Comet, but when I look at how it’s implemented, it feels like a big hack.

I’m not criticizing FCKEditor, WBRTEs in general, or lightboxes; absolutely not, they provide very useful tools. I’m criticizing the web as a platform. It’s been so many years; isn’t it about time that browsers provided good built-in support for rich text editor components, instead of letting people hack one together with iframes? I think WHATWG had a specification for that, but I haven’t heard anything from WHATWG since it was established. It’s about time the web became less hacky.


Did I preload enough libraries?

In my two previous blog entries, I wrote about using fork() and copy-on-write semantics to reduce memory usage in Ruby on Rails.

I was wondering whether I’ve preloaded enough Ruby libraries. If I haven’t, then each child process will load its own copy of the missing libraries, and that memory will not be shared between the child processes. Unfortunately, Ruby doesn’t seem to have a variable that lists every Ruby file loaded by the interpreter. So I hacked Ruby by inserting the following fprintf() call into the function load_file(fname, script) in ruby.c:

static void
load_file(fname, script)
    const char *fname;
    int script;
{
    extern VALUE rb_stdin;
    VALUE f;
    int line_start = 1;
    ...
    }
    /* Added: report every file the interpreter loads on stderr. */
    fprintf(stderr, "RUBY-LOAD: %s\n", fname);
    if (script) {

This will make the Ruby interpreter print all loaded files to stderr.
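For files loaded with require, a less invasive check may be possible without patching the interpreter: Ruby keeps the list of required features in $LOADED_FEATURES (also available as $"). The sketch below, assuming a modern Ruby, snapshots that list around a block and returns whatever the block added. It does not see files pulled in with load, nor the main script itself, so the ruby.c patch remains the more complete option; newly_required is a hypothetical helper name.

```ruby
# Hypothetical helper: run a block and return the features it caused Ruby
# to require, by diffing $LOADED_FEATURES before and after the block.
# Files loaded with Kernel#load are not recorded in $LOADED_FEATURES.
def newly_required
  before = $LOADED_FEATURES.dup
  yield
  $LOADED_FEATURES - before
end
```

In a prefork script, wrapping the require of the application's config/environment.rb in newly_required would list what the preloading step pulled in, and calling it again inside a child would show what the child still had to require on its own.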

  1. I made a special version of my prefork script, which exits immediately after preloading the Rails libraries. I redirected its stderr to a file, which allowed me to see which files are loaded by the preloading procedure.
  2. Next, I ran the normal version of my prefork script, again with stderr redirected to a file. I terminated the script after the child processes had been fully initialized.

I compared the two files and found that only dispatch.fcgi had to be loaded by the child processes! Great. :)
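The comparison itself is easy to script. A minimal sketch, assuming each log contains lines of the form printed by the patched load_file (RUBY-LOAD: followed by a file name) mixed with other stderr output; the helper names and the log file names in the usage line are hypothetical.

```ruby
LOAD_LOG_PREFIX = "RUBY-LOAD: "

# Extract the loaded file names from one stderr log, ignoring other output.
def loaded_files(log_text)
  log_text.each_line
          .map(&:chomp)
          .select { |line| line.start_with?(LOAD_LOG_PREFIX) }
          .map { |line| line[LOAD_LOG_PREFIX.length..-1] }
          .uniq
end

# Files that appear in the second log but not in the first, i.e. files the
# child processes had to load themselves because the parent never preloaded them.
def loaded_only_in_second(first_log, second_log)
  loaded_files(second_log) - loaded_files(first_log)
end

# Command-line use: ruby compare_loads.rb preload.log full.log
if ARGV.size == 2
  puts loaded_only_in_second(File.read(ARGV[0]), File.read(ARGV[1]))
end
```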

I then launched a web browser and visited a page in my application. I compared the files again. This time I found out that each child process had to individually load the source code for the Rails application itself.

Conclusion

The prefork script doesn’t preload the source files for the Rails application itself. It should - this will save us a few hundred kilobytes to a few megabytes of memory, depending on the size of the application.

Based on this finding, as well as the results of my initial research on memory usage in Rails, I can conclude that most of the memory in a Rails application is spent on storing the Ruby on Rails (and dependencies) code, and that only a small fraction is spent on storing application code.

In other news, I cleaned up and restructured the prefork script. It can now gracefully terminate its child processes whenever it receives a termination signal, and it performs more error checking. I’m going to do more testing; if this technique turns out to be reliable, I’ll publish the final version of the script.
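Graceful termination of the children boils down to forwarding the signal to every child and then reaping them. A minimal sketch of that idea, assuming a Unix Ruby with fork(); spawn_workers, shutdown_workers and run_until_signal are hypothetical names, and this is not the prefork script itself. The trap handler only sets a flag, so the actual shutdown work happens outside signal-handler context.

```ruby
# Fork `count` children; each runs the given block with its index.
def spawn_workers(count)
  Array.new(count) do |i|
    fork do
      yield i       # hypothetical worker body, e.g. serving requests
      exit!(0)      # skip at_exit handlers inherited from the parent
    end
  end
end

# Send `signal` to every child, then wait until all of them have exited.
def shutdown_workers(pids, signal = "TERM")
  pids.each do |pid|
    begin
      Process.kill(signal, pid)
    rescue Errno::ESRCH
      # The child has already exited.
    end
  end
  pids.each do |pid|
    begin
      Process.waitpid(pid)
    rescue Errno::ECHILD
      # The child has already been reaped.
    end
  end
end

# Parent main loop: wait for TERM or INT, then shut the children down.
def run_until_signal(pids, signals = %w[TERM INT])
  stop = false
  signals.each { |sig| trap(sig) { stop = true } }  # keep the handler trivial
  sleep 0.1 until stop
  shutdown_workers(pids)
end
```

A launcher would call spawn_workers once the libraries are preloaded and then hand the returned pids to run_until_signal.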


Potential problems with preforking Ruby on Rails

In my previous blog entry, I wrote about using fork() and copy-on-write semantics to reduce memory usage in Ruby on Rails. Saimon Moore suggested that I contact Zed Shaw, the author of Mongrel, so I asked him for his opinion on potential problems. Unfortunately I don’t have permission to quote him, so I’ll just summarize the issues (with preforking in Rails) and my own findings.

Leaking I/O handles

It is said that Ruby leaks I/O handles when it forks. I really don’t know how in the world that is possible: when the child exits, all of its resources are freed, so there is no way for it to leak anything unless the parent process forgets to clean up something that it created before forking.
I wrote a script to test this:

require 'socket'

serv = TCPServer.new(2202)
puts "*** File descriptors in parent process:"
system("ls --color -l /proc/#{Process.pid}/fd")

pid = fork do
  # The child inherits the parent's server socket and opens one of its own.
  child_serv = TCPServer.new(2203)
  puts "*** File descriptors in child process:"
  system("ls --color -l /proc/#{Process.pid}/fd")
  exit
end

Process.waitpid(pid)
serv.close   # close the parent's server socket before listing again
puts "*** File descriptors in parent process:"
system("ls --color -l /proc/#{Process.pid}/fd")

The script creates a TCP server socket, then lists the process’s file descriptors. It then forks, creates another TCP server socket, and lists the child process’s file descriptors. The parent process waits for the child, closes its own server socket, then lists its file descriptors again. The output is:

*** File descriptors in parent process:
total 4
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 0 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 1 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 2 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 3 -> socket:[2300615]
*** File descriptors in child process:
total 5
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 0 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 1 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 2 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 3 -> socket:[2300615]
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 4 -> socket:[2300628]
*** File descriptors in parent process:
total 3
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 0 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 1 -> /dev/pts/0
lrwx------ 1 hongli hongli 64 2007-04-05 21:48 2 -> /dev/pts/0

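Conclusion: Everything looks perfectly normal to me. The child inherits the parent’s open socket (descriptor 3), which is ordinary fork() behavior, and once the child has exited and the parent has closed its socket, no stray descriptors remain. I have no idea what “leaking IO handles” means.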

Reconnecting to the database

It is said that preforking will cause issues with database reconnections. I gave it a try.

  1. I preforked 2 Rails processes with my script.
  2. I set up lighttpd to proxy only to the first Rails process.
  3. I then visited a page in my Rails app which lists a bunch of records from the database.
  4. I stopped the MySQL server.
  5. I reloaded the page, and it threw an exception, which is to be expected.
  6. I started the MySQL server again and reloaded the page. The page displayed fine.
  7. I set up lighttpd to proxy only to the second Rails process, and reloaded the page. The page still displayed fine.

Conclusion: I have no idea what database reconnection issues people are talking about. I can’t find any.

Sharing issues with pstore and SQLite

I don’t use SQLite and don’t plan on using it any time soon, so I didn’t test it. I use SQLSessionStore for storing session data in MySQL, so pstore issues don’t affect me directly. Pstore is, however, the default session storage in Rails.

Pstore stores session data in files. Imagine two HTTP clients, with the same session ID, accessing two different Rails processes. Both Rails processes write session data to disk. What happens? Will the pstore session file be corrupted? Zed said that even Mongrel (without preforking) has problems with pstore sharing, so it’s possible that Rails doesn’t lock the pstore session file.
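The worry here is the classic lost update: two processes read the same file, both modify it, and the last writer wins. The sketch below is not how Rails’ pstore session store is implemented; it only illustrates, assuming a Unix system with flock(2), how holding an exclusive lock around a read-modify-write keeps concurrent processes from clobbering each other. locked_increment and concurrent_increments are hypothetical helpers.

```ruby
# Increment an integer counter stored in a file. The exclusive lock is held
# for the whole read-modify-write, so concurrent processes cannot interleave.
def locked_increment(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX)   # blocks until no other process holds the lock
    value = f.read.to_i      # an empty (new) file reads as 0
    f.rewind
    f.truncate(0)
    f.write((value + 1).to_s)
    f.flush
  end                        # closing the file releases the lock
end

# Fork `workers` processes that each call locked_increment `times` times,
# wait for all of them, and return the final counter value.
def concurrent_increments(path, workers: 2, times: 100)
  pids = Array.new(workers) do
    fork do
      times.times { locked_increment(path) }
      exit!(0)
    end
  end
  pids.each { |pid| Process.waitpid(pid) }
  File.read(path).to_i
end
```

Without the flock call, the final count can come out lower than workers × times, because two processes may read the same old value before either writes.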

I tested my own Rails application, which uses SQLSessionStore:

  1. I launched 2 Rails processes.
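  2. I added the following actions to a controller:
    def read
      if session[:rand].nil?
        render :text => "No random number set."
      else
        # The stored value is a Float; convert it for render :text.
        render :text => session[:rand].to_s
      end
    end

    def write
      session[:rand] = rand   # store a new random Float in the session
      read                    # then render it via the read action
    end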

    The write method generates a random number and saves it in the session. The read method reads the last saved number.

  3. I set up lighttpd to use only Rails process 1.
  4. I visited the ‘write’ page, then set up lighttpd to use Rails process 2 and visited the ‘read’ page. The number was still correct.
  5. I repeated this a few times, and couldn’t find any problems.

Conclusion: I don’t know whether pstore has problems, but SQLSessionStore seems to work fine. It’s a good idea to use SQLSessionStore anyway, as pstore slows down when you have a lot of sessions, and SQLSessionStore makes it easy to wipe idle session data.

Garbage collection makes pages dirty

According to this page, Ruby’s mark-and-sweep garbage collector makes all memory pages dirty, causing almost all of a child process’s memory to be copied. In my previous blog entry, I ran httperf to test preforked Rails. Rails creates a new ActionController object for every incoming HTTP request, so running httperf will definitely trigger garbage collection. Yet memory usage didn’t increase as much as the page predicted it would.

I have a Perl application which uses about 35 MB of memory. 25 MB of that is spent on storing the parsed Perl optree, and only 10 MB on runtime data. I suspect that Ruby is similar: most of the memory is spent on storing Rails code, not variable data. Code is probably never garbage collected (why would it be? In a dynamic language one cannot predict whether a function will be used in the future), so the garbage collector probably wouldn’t make the pages containing Ruby opcodes dirty. That would explain why memory usage doesn’t go up much after a number of HTTP requests.
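One way to test this hypothesis directly, rather than through free’s totals, is to read a child’s /proc/<pid>/smaps on Linux, which reports for each mapping how many kB are still shared and how many are private to the process. A minimal sketch, assuming Linux; smaps_summary and child_sharing_after_gc are hypothetical helpers, and the numbers will depend heavily on the Ruby version and its garbage collector.

```ruby
SMAPS_FIELD = /\A(Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+) kB/

# Sum shared and private memory (in kB) over all mappings in an smaps dump.
def smaps_summary(text)
  totals = Hash.new(0)
  text.each_line do |line|
    if (m = SMAPS_FIELD.match(line))
      totals[m[1]] += m[2].to_i
    end
  end
  {
    shared_kb: totals["Shared_Clean"] + totals["Shared_Dirty"],
    private_kb: totals["Private_Clean"] + totals["Private_Dirty"]
  }
end

# Linux only: fork a child, measure its sharing with the parent, force a full
# garbage collection, and measure again. Returns [before, after] summaries.
def child_sharing_after_gc
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    before = smaps_summary(File.read("/proc/self/smaps"))
    GC.start
    after = smaps_summary(File.read("/proc/self/smaps"))
    writer.write(Marshal.dump([before, after]))
    writer.close
    exit!(0)
  end
  writer.close
  data = reader.read
  reader.close
  Process.waitpid(pid)
  Marshal.load(data)
end
```

Run in a process that has already loaded Rails, a large drop in shared_kb between the two summaries would indicate that garbage collection is unsharing the pages that hold the framework code.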

Conclusion: I can’t find the problem. Nothing to worry about.

Final conclusion

I couldn’t find any large problems that were relevant to me. In the future I will test this preforking technique on a busy (non-commercial) website to see how well it works.


Saving memory in Ruby on Rails with fork() and copy-on-write

Some of you may have heard of Ruby on Rails. It is a web development framework based on the Ruby language. Last year I developed a Rails web application. I like Ruby on Rails a lot, and I strongly prefer it over PHP for serious web applications. With Rails I was able to develop a robust, stable and maintainable web application in much less time than it would have taken with PHP.

Unfortunately, Rails seems to use more resources than PHP, namely:

  1. Memory usage.
  2. CPU usage.

I’m not really concerned about CPU usage - my server has enough CPU to handle the load, but memory usage is more problematic. My web server has “only” 1 GB of RAM. It runs quite a lot of services so it’s a bit short on RAM.

A little introduction on how Ruby on Rails interfaces with the HTTP client

But first, let us take a look at how Rails interfaces with an HTTP client (your web browser). My usual setup is as follows:

  1. The web browser connects to the web server software (in my case, Lighttpd).
  2. The web server launches one or more Ruby on Rails FastCGI processes. FastCGI is like CGI, but instead of launching a new process for every HTTP request, FastCGI keeps the process in memory so that it can process more than one request. Note that Apache 2’s FastCGI support was broken the last time I checked, so I have to use Lighttpd.
  3. The web server proxies the HTTP request to one of the Rails FastCGI processes.

This setup is illustrated in the following picture:
[Figure: rails-server-architecture.png, Rails server architecture (an SVG version is also available)]

There are, of course, different setups. Mongrel, a web server designed to run Ruby on Rails, seems to be becoming more and more popular. Unlike Lighttpd, which uses FastCGI to spawn several Rails worker processes, Mongrel embeds Rails directly. Rails is not thread safe, so a Mongrel process can only process one Rails request at a time (unlike Lighttpd, which can proxy requests to any of its worker processes). So what people generally do is launch several Mongrel processes and use Apache 2.2 with mod_proxy_balancer to proxy requests to one of them.

I have no idea why it’s better than using Lighttpd with FastCGI. I heard that Lighttpd’s mod_proxy module was unstable, but I don’t know whether that’s still the case. My web server runs Lighttpd 1.4 and uses mod_proxy to proxy requests to some backend web servers, and I’ve never had problems with it.

But what about memory usage?

A process which embeds Ruby on Rails (that is, either a Rails FastCGI process or a Mongrel process) uses between 20 MB and 30 MB. It is usually a good idea to launch more than one FastCGI process (or Mongrel process) so that your web server can process more than one request concurrently. But the memory usage quickly adds up: if you launch 4 Rails processes, you’re already using about 100 MB of memory. Since my web server is short on RAM, I’ve been looking for ways to reduce Rails memory usage.

Luckily, there is a way, and it’s called fork and copy on write.

As you might know, each process’s memory is isolated: one process cannot read or write another process’s memory. On modern Unix operating systems, when a parent process forks a child process, almost all of the memory of the parent and child is shared. So if your bloated 200 MB application forks a child process, the child initially uses only a small amount of extra memory. Only when either the parent or the child writes to a page of memory is that page copied, so that the parent’s memory changes won’t affect the child (and vice versa). This is why it’s called “copy on write”.
We can use this simple fact to save memory. For example, mod_perl uses it to reduce the startup time and memory usage of Perl scripts running in the web server. mod_perl loads all required Perl modules in advance. When the web server forks a new child process to handle HTTP requests, the memory used by the already loaded Perl modules is shared between the parent and the child process. Because most (or all) of the modules are already loaded, Perl will not load them again, which significantly reduces loading time.
This technique has only one disadvantage: it doesn’t work on Windows! :) Windows has no fork() system call.
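The isolation half of copy-on-write is easy to observe from Ruby. A minimal sketch, assuming a Unix system with fork(); child_write_is_private is a hypothetical name. It shows only the semantics (the child’s write never reaches the parent), not how much memory stays shared, which depends on how many pages each process ends up touching.

```ruby
# Fork a child that modifies a string it inherited from the parent, and
# return [parent's view, child's view] of that string afterwards.
def child_write_is_private
  data = +"original"                 # mutable string created before fork
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    data.replace("changed in child") # this write lands in the child's private copy
    writer.write(data)
    writer.close
    exit!(0)
  end
  writer.close
  child_view = reader.read
  reader.close
  Process.waitpid(pid)
  [data, child_view]
end
```

Calling child_write_is_private returns ["original", "changed in child"]: the parent still sees its own copy after the child has modified the same variable.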

We can use the same technique with Ruby on Rails. However, I find it strange that not many Rails users seem to care: they happily spawn multiple Mongrel processes, but I haven’t seen many people asking why Mongrel doesn’t use fork() to save memory. Kirk Haines said that it’s because ActiveRecord doesn’t like fork (the connection to the database seems to break after a fork). But I don’t know why Mongrel doesn’t just fork before any requests are processed.

How much memory does Ruby on Rails use, really?

I decided to measure Rails’s memory usage. My Rails environment was as follows:

config.cache_classes     = true
config.whiny_nils        = true
config.breakpoint_server = false
config.action_controller.consider_all_requests_local = true
config.action_controller.perform_caching             = false
config.action_view.cache_template_extensions         = false
config.action_view.debug_rjs                         = false

This was my memory usage before launching Rails (output of free -m; all values in MB):

             total       used       free     shared    buffers     cached
Mem:          1011        365        645          0          1         97
-/+ buffers/cache:        267        744
Swap:          996          0        996

I then proceeded to launch 10 independent Rails FastCGI processes, which do not share memory with each other (other than the memory used by the Ruby interpreter itself). After the launch, the memory usage was as follows:

             total       used       free     shared    buffers     cached
Mem:          1011        537        474          0          1         97
-/+ buffers/cache:        438        573
Swap:          996          0        996

The 10 processes used 171 MB in total, or 17.1 MB per process. But we’re not there yet: memory usage is likely to increase once HTTP requests are processed. So I used the httperf tool:

httperf --uri /rails/ --port 3501 --num-conns 100 --rate 20

This was the memory usage after httperf was done:

             total       used       free     shared    buffers     cached
Mem:          1011        551        459          0          1         97
-/+ buffers/cache:        452        558
Swap:          996          0        996

Memory usage increased by another 14 MB or so, which is about 1.5 MB of extra memory per process, bringing the total for all 10 processes to roughly 186 MB.

The experiment

I wrote a script which loads all of the Ruby on Rails libraries. It then forks and spawns x Rails FastCGI processes, where x is a configurable number. Each FastCGI process gets its own Unix socket for communication with the web server.
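The overall shape of such a launcher can be sketched in a few lines. This is a hypothetical skeleton, not the script itself: it assumes the Rails libraries have already been required before prefork_unix_servers is called, creates one Unix socket per child named fastcgi.socket-<i>, and hands each child its socket through a block. A real FastCGI worker would still have to pass that socket to the FastCGI dispatcher; since the FastCGI specification passes the listening socket as file descriptor 0, reopening STDIN onto it is one plausible approach, but that detail is an assumption here.

```ruby
require 'socket'

# Hypothetical prefork skeleton. Call it only after the expensive libraries
# have been required, so that every child shares them with the parent.
# Creates one Unix socket per child, "<dir>/fastcgi.socket-<i>", forks a
# child for it, and yields the socket and index inside that child.
# Returns [[pid, socket_path], ...].
def prefork_unix_servers(dir, count)
  Array.new(count) do |i|
    path = File.join(dir, "fastcgi.socket-#{i}")
    File.unlink(path) if File.exist?(path)   # remove a stale socket from a previous run
    server = UNIXServer.new(path)
    pid = fork do
      yield server, i                         # the child serves requests on its socket
      exit!(0)
    end
    server.close                              # only the child accepts on this socket
    [pid, path]
  end
end
```

Creating each socket before forking the matching child means the socket file already exists by the time the web server tries to connect, even if the child is still starting up.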

This was my memory usage before I launched the script:

             total       used       free     shared    buffers     cached
Mem:          1011        334        676          0          2         95
-/+ buffers/cache:        236        775
Swap:          996          0        996

After instructing the script to launch 10 processes, memory usage became:

             total       used       free     shared    buffers     cached
Mem:          1011        364        646          0          3         95
-/+ buffers/cache:        265        745
Swap:          996          0        996

Nice! All 10 processes only used 30 MB in total, which is a far cry from the previously measured 171 MB. Now, let us see what happens after we run httperf:

             total       used       free     shared    buffers     cached
Mem:          1011        381        629          0          3         96
-/+ buffers/cache:        282        729
Swap:          996          0        996

Memory usage has gone up by 17 MB. That’s 1.7 MB extra memory per process.

Conclusion

By using preforking I was able to reduce memory usage for 10 Rails processes from 186 to 47 MB. That’s a memory saving of 75%! The Rails application seems to work fine. So far I haven’t been able to detect any strange behavior.

Without preforking, the memory usage, in MB, follows this formula:

memusage(n) = 18.6 * n

…where n is the number of Rails processes. With preforking, the memory usage is:

memusage(n) = 30 + 1.7 * n

The graph looks like this:
[Figure: Rails memory usage]
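The two formulas are easy to compare numerically. A small sketch, assuming the linear models above hold for every n (the 30 MB constant was measured with 10 processes, so values for small n are extrapolations); the method names are hypothetical. By these models, preforking pays off from two processes upward: the break-even point is 30 / (18.6 - 1.7), about 1.8 processes.

```ruby
# Linear memory models (in MB) fitted to the measurements above;
# n is the number of Rails processes.
def memusage_plain(n)
  18.6 * n
end

def memusage_prefork(n)
  30 + 1.7 * n
end

# Percentage of memory saved by preforking (negative means it costs more).
def saving_percent(n)
  100.0 * (1 - memusage_prefork(n) / memusage_plain(n))
end

BREAK_EVEN = 30 / (18.6 - 1.7)   # n at which both models are equal

[1, 2, 4, 10].each do |n|
  printf("n=%-2d plain=%6.1f MB prefork=%6.1f MB saving=%6.1f%%\n",
         n, memusage_plain(n), memusage_prefork(n), saving_percent(n))
end
```

For n = 10, saving_percent comes out at about 74.7%, which matches the 75% saving measured above.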

Script usage

The script can be downloaded here. Put it in your ‘script’ folder and launch it as follows:

./script/fork.rb NUM

…where NUM is the number of Rails processes you want to launch. Each process will be given its own Unix socket, named ‘log/fastcgi.socket-x’, where x is the process’s sequence number (which starts from 0). So if you launch 3 processes, the following Unix sockets will be created:
log/fastcgi.socket-0
log/fastcgi.socket-1
log/fastcgi.socket-2

You must also set up Lighttpd to communicate with Rails through the sockets. I use this configuration:

fastcgi.server = (
    ".fcgi" => (
        ("socket" => "/path-to-your-rails-root-folder/log/fastcgi.socket-0"),
        ("socket" => "/path-to-your-rails-root-folder/log/fastcgi.socket-1"),
        ("socket" => "/path-to-your-rails-root-folder/log/fastcgi.socket-2")
    )
)
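Writing one entry per socket by hand gets tedious for larger values of NUM. A small hypothetical helper that prints a matching fastcgi.server block, assuming the log/fastcgi.socket-x naming described above and an absolute Rails root path that you supply.

```ruby
# Build a lighttpd fastcgi.server block with one backend per Unix socket
# created by "./script/fork.rb NUM" (sockets log/fastcgi.socket-0 .. NUM-1).
def lighttpd_fastcgi_config(rails_root, num)
  entries = (0...num).map do |i|
    "        (\"socket\" => \"#{rails_root}/log/fastcgi.socket-#{i}\")"
  end
  [
    "fastcgi.server = (",
    '    ".fcgi" => (',
    entries.join(",\n"),
    "    )",
    ")"
  ].join("\n") + "\n"
end

# Command-line use: ruby lighttpd_conf.rb /path-to-your-rails-root-folder 3
if ARGV.size == 2
  print lighttpd_fastcgi_config(ARGV[0], Integer(ARGV[1]))
end
```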
