Re: Strange HTTP header?

Yesterday I challenged people to look at sandbox.phusion.nl’s HTTP headers and check whether they notice anything weird. The HTTP response header of the front page is:

HTTP/1.1 200 OK
Server: nginx/0.6.32

X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.1.0
X-Runtime: 0.00173

Wow, I got a lot more responses than I expected.

Sorry guys, there’s a reason why I didn’t post this on the Phusion blog, but on my personal blog instead. :) Chu Yeow said:

Wow Passenger on Nginx (I think that’s it - doubt you’d run Nginx on top of Apache+Passenger ;)).

Well actually… we are running Nginx on top of Apache+Passenger. :)

The first reaction of many people is probably “WTF, are you out of your mind? Why would you do such a thing?” Let me explain a little bit about our server.

Initial motivation: security

This server is shared by many users, including a few whom we don’t fully trust. It not only runs Rails applications but also a bunch of PHP applications, and, in the not too distant past, some mod_perl applications. In the usual Apache setup, all those PHP/mod_perl applications run under the same user and have the same rights. This means that there is no security between different people’s web applications: Jane’s PHP script can read Joe’s forum database password file. Not so nice.

server_setup1.jpg

Now, how do we solve this? These days, server virtualization is the latest hype: just give Jane and Joe different virtual machines! But virtualization wastes a lot of memory. Joe and Jane’s websites are really low-traffic compared to mine. The server “only” has 1 GB of RAM, and allocating a fixed amount of RAM to each virtual machine (which must be at least 128 MB for a more or less usable server OS) is really wasteful.

Our solution was simple. Each user got his own Apache installation and runs all his web applications under his own user account. Users cannot read from or write to other users’ home folders. Each of these backend Apache installations is firewalled, and a frontend web server proxies requests to these backend Apache installations.

server_setup2.jpg

But the setup is of course not limited to one-Apache-per-real-user. blog.phusion.nl is running on Wordpress, which doesn’t exactly have a good security track record. My personal Wordpress installation had been hacked once: apparently some spam bot changed the file upload folder to /tmp and put a .exe in there. It also disabled Akismet. I wouldn’t be surprised if someone one day finds a remote shell code execution vulnerability. One really wouldn’t want to run Wordpress with the same rights as all the other web applications. So we gave Wordpress its own user account and Apache installation. Wordpress is now completely sandboxed and cannot do any harm to the other websites.

Efficiency

Indeed, what about efficiency? We’ve been using this setup for almost 2 years now, and it’s actually running quite well. Not too long ago, this server hosted a website which got about 30 000 unique visitors per day (about 120 000 requests per day on this server; we load balanced that website over multiple web servers) and it was able to handle the load with ease. We noticed no delay in response times compared to when the website was running on the frontend web server directly. That said, we did go through several stages of optimization:

  1. A long long time ago, the frontend web server was Apache 1.3, which proxied requests via mod_accel. mod_accel is like mod_proxy, but you can specify a list of URI extensions that it won’t proxy. For example, you can tell mod_accel only to proxy requests that don’t end with .css, .jpg, .png, etc.
  2. Unfortunately Apache 1.3 was ancient and not well-supported, so we switched to Apache 2 with mod_proxy instead. mod_proxy provides no way to skip proxying certain URIs, so we had to live with this. Performance was acceptable, though the backend web servers were hit harder than before because static asset requests were now also being proxied.
  3. Apache 2 proved to be too memory- and CPU-hungry for a reverse proxy, so we switched the frontend web server to Lighttpd instead. This reduced our CPU and memory usage dramatically. We configured Lighttpd to serve static assets directly, so that the backend web servers were only there to serve PHP.
  4. Unfortunately Lighttpd leaks memory: after a few days, memory usage would jump to 200 MB. From time to time it would also “go out of control” and consume 100% CPU, although it was still serving requests just fine. 2 days ago I finally got tired of that, and replaced Lighttpd with Nginx.

Finally, we used Apache with the worker MPM and Phusion Passenger development version (from the git repository) for hosting our Rails applications. The worker MPM, which uses a combination of threads and processes, is a lot more memory efficient than the default prefork MPM, which only uses processes. This is our Apache worker MPM setup:

StartServers             1
ThreadsPerChild         10
MaxClients              10
MinSpareThreads          1
MaxSpareThreads          1
MaxRequestsPerChild  50000
ThreadStackSize     500000

This tells Apache to use only one process. That process is multi-threaded and will have 10 threads for serving requests. Furthermore, each thread will have a stack size of 500 KB. The default system stack size is usually something along the lines of 8 MB, so setting such a small stack size reduces Apache’s VM size a lot. 500 KB has proven to be sufficient for Apache.
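
The arithmetic behind these settings can be sketched in a few lines of Python. This is only a rough illustration, not part of our setup: the function name is made up, and it assumes that the worker MPM needs MaxClients / ThreadsPerChild child processes (rounded up) and that every thread reserves its full stack size as virtual memory. The 8 MB figure is the typical system default mentioned above, not a measured value.

```python
import math

def worker_mpm_footprint(max_clients, threads_per_child, thread_stack_bytes):
    """Return (child processes, total thread stack bytes) for a worker MPM setup.

    Assumption: Apache needs ceil(MaxClients / ThreadsPerChild) child processes
    to reach MaxClients, and each thread reserves its whole stack as virtual memory.
    """
    processes = math.ceil(max_clients / threads_per_child)
    stack_bytes = processes * threads_per_child * thread_stack_bytes
    return processes, stack_bytes

# Values from the configuration above: MaxClients 10, ThreadsPerChild 10,
# ThreadStackSize 500000.
processes, small = worker_mpm_footprint(10, 10, 500_000)

# Same thread count, but with an assumed 8 MB default stack per thread.
_, default = worker_mpm_footprint(10, 10, 8 * 1024 * 1024)

print(processes)                # 1
print(small / 1_000_000)        # 5.0  (MB of stack address space)
print(default / (1024 * 1024))  # 80.0 (MiB of stack address space)
```

So under these assumptions the small stack size cuts the address space reserved for thread stacks from roughly 80 MB to roughly 5 MB.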

Now, let’s compare the memory usage between Nginx and our Apache installation:

USER   PID   %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root   11700  0.0  0.2  3452  2012  ??  Is    6:23AM   0:00.00 nginx: master process /usr/local/sbin/nginx
www    11701  0.0  0.3  3452  2880  ??  S     6:23AM   2:28.91 nginx: worker process (nginx)
www    11702  0.0  0.3  3452  2880  ??  S     6:23AM   2:47.10 nginx: worker process (nginx)
app    82548  0.0  0.3  7656  3572  ??  Ss   Tue03PM   0:05.79 /home/app/apache/bin/httpd -k start
app    89467  0.0  0.4 10144  4632  ??  I     5:11AM   0:02.45 /home/app/apache/bin/httpd -k start

The server’s running on FreeBSD, not Linux, so we can’t measure memory usage excluding any copy-on-write savings (i.e. the private dirty RSS). But let’s compare the total Resident Set Sizes (RSS):

  • Nginx: 7772 KB (7.6 MB)
  • Apache: 8204 KB (8.0 MB)

Not a big difference.
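
For anyone who wants to redo the sum, here is a small Python sketch that adds up the RSS column of the ps output above. The helper name is made up for illustration, and it assumes the column layout shown above (RSS is the sixth column, in KB). Keep in mind that simply adding up RSS counts pages shared between processes more than once, so these totals are an upper bound.

```python
PS_OUTPUT = """\
root   11700  0.0  0.2  3452  2012  ??  Is    6:23AM   0:00.00 nginx: master process /usr/local/sbin/nginx
www    11701  0.0  0.3  3452  2880  ??  S     6:23AM   2:28.91 nginx: worker process (nginx)
www    11702  0.0  0.3  3452  2880  ??  S     6:23AM   2:47.10 nginx: worker process (nginx)
app    82548  0.0  0.3  7656  3572  ??  Ss   Tue03PM   0:05.79 /home/app/apache/bin/httpd -k start
app    89467  0.0  0.4 10144  4632  ??  I     5:11AM   0:02.45 /home/app/apache/bin/httpd -k start
"""

def total_rss_kb(ps_output, needle):
    """Sum the RSS column (in KB) of every line whose command contains `needle`."""
    total = 0
    for line in ps_output.splitlines():
        # Ten whitespace-separated columns, then the full command as one field.
        fields = line.split(None, 10)
        if len(fields) == 11 and needle in fields[10]:
            total += int(fields[5])  # RSS is the sixth column
    return total

print(total_rss_kb(PS_OUTPUT, "nginx"))  # 7772
print(total_rss_kb(PS_OUTPUT, "httpd"))  # 8204
```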

Apache’s slowness and Nginx’s performance, both overrated?

People commented:

Pretty fast runtime?

X-Runtime: 0.00171

and

Damn!! It is impossible!
0.00173 per request?!

So the Rails application is running in Apache and behind an Nginx reverse proxy, and it’s still fast.

FastCGI

Lighttpd and Nginx both support PHP via FastCGI, so why didn’t we use that instead? The answer is ease of use. Setting up a PHP-FastCGI process pool for every user is quite a hassle. Plus, the user might be running CGI or mod_perl applications as well. Giving each user his own Apache installation is by far the easiest way. Apache also supports .htaccess, which Lighttpd and Nginx don’t support. Wordpress’s URI rewriting feature writes mod_rewrite rules to .htaccess. Configuring the same rules in Lighttpd was a total pain, and I wouldn’t want to do that again.

Conclusion

I believe that all the fuss about web server performance is usually overrated. As we can see, Apache can be memory-efficient. Running Rails applications on Phusion Passenger behind an Nginx reverse proxy is viable. You just need to know how to tweak and mix-and-match the two.

What we’re doing is not very different from proxying to a Mongrel cluster from Nginx. Instead of proxying to a Mongrel cluster, we proxy to Apache. This still makes Rails deployment a lot easier because Phusion Passenger will take care of managing the Rails processes for me. The only redundant thing that I have to do now is set up 2 virtual host definitions: one in Apache and one in Nginx.

Moral of the story: it’s all HTTP, you can proxy everything in any way you want. Some people on the Phusion Passenger mailing list asked how to horizontally scale Phusion Passenger. The answer is: the same way you’re used to when you were using Mongrel clusters.

This also shows that it is possible to run multiple Apache installations on the same server. It’s only a matter of specifying different configuration files for each installation. It seems that a lot of people aren’t aware of that. In a recent Google talk about Rails scalability, a speaker claimed that there is a limit to the amount of hardware resources that Apache can utilize. He said that if you have 16 cores and 20 GB of RAM, one Apache instance cannot utilize all those resources, and that in order to make full use of your hardware, one must virtualize. But why? It’s easier and more efficient to run multiple Apache instances on the same machine.
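
To make the "different configuration files" point a bit more concrete, here is a Python sketch that renders a minimal configuration for one user's private Apache instance and builds the command to start it. This is not our actual tooling: the function names, the /home/<user>/apache layout, the public_html document root, the example user and the loopback-only Listen address are all assumptions for illustration. Only the directive names and httpd's -f and -k options are real.

```python
def render_user_httpd_conf(user, domain, port):
    """Render a minimal httpd.conf for one user's private Apache instance.

    Assumptions: the directory layout and the loopback-only Listen address
    are illustrative; a real configuration needs more than these directives.
    """
    root = f"/home/{user}/apache"
    lines = [
        f"ServerRoot {root}",
        f"Listen 127.0.0.1:{port}",   # unprivileged port, reachable only by the frontend
        f"ServerName {domain}",
        f"PidFile {root}/logs/httpd.pid",
        f"DocumentRoot /home/{user}/public_html",
    ]
    return "\n".join(lines) + "\n"

def start_command(user):
    """Command line that starts the instance with its own configuration file."""
    root = f"/home/{user}/apache"
    return [f"{root}/bin/httpd", "-f", f"{root}/conf/httpd.conf", "-k", "start"]

# Hypothetical user, domain and port.
print(render_user_httpd_conf("jane", "jane.example.com", 8001))
print(" ".join(start_command("jane")))
```

Because each instance is started by its owner with its own configuration file and port, several of them can run side by side on one machine.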

By the way, we use the following Nginx config snippet for Phusion Passenger-powered hosts:

proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_redirect http://localhost/ $scheme://$http_host/;

server {
    listen 80;
    server_name sandbox.phusion.nl;
    root /u/apps/sandbox/current/public;
    location / {
        proxy_redirect http://localhost:1234/ $scheme://$http_host/;
        if (!-f $request_filename) {
            proxy_pass http://localhost:1234;
            break;
        }
        if ($request_method != GET) {
            proxy_pass http://localhost:1234;
            break;
        }
    }
}

This forwards all non-static-asset requests to Apache. Static assets are served directly by Nginx.
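
The decision those two if-blocks make can be modelled in Python as follows. This is just a sketch of the logic as I read the config above, not something Nginx runs; the function name and return values are made up for illustration.

```python
def handled_by(method, file_exists):
    """Model which server answers a request under the config above.

    Returns "apache" when either if-block triggers proxy_pass,
    otherwise "nginx" (the file is served from `root`).
    """
    if not file_exists:   # if (!-f $request_filename)
        return "apache"
    if method != "GET":   # if ($request_method != GET)
        return "apache"
    return "nginx"

print(handled_by("GET", True))    # nginx
print(handled_by("GET", False))   # apache
print(handled_by("POST", True))   # apache
print(handled_by("HEAD", True))   # apache
```

Note that only GET requests for existing files stay on Nginx; every other method, including HEAD, ends up at Apache even when the file exists.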

17 Comments »

  1. Darcy said,

    August 22, 2008 @ 11:50 am

    As payment for such a horrible crime against Rails developers (giving them false hope of something so awesome), we now command you to make Passenger work on nginx - pretty please?

  2. lix said,

    August 22, 2008 @ 11:55 am

    well done!!!

    i solved this php/rails/blabla issue with different jails on FreeBSD, only 30MB overhead + 1 ip /jail and i cannot see security issue to dedicate a port range to each user and leave the user the possibility to run thin/mongrel/whatevar.

    regarding the wordpress:

    mount the /tmp with noexec, and chflags the php files to immutable can sort out any kind of file write issue(of course sql injection is still a problem).

    all it all, really nice post, salute from Ireland!

    lix

  3. AkitaOnRails said,

    August 22, 2008 @ 1:10 pm

    Well, there are situations where you can do virtualization, but sometimes you can’t and yet you still need to have 2 users not collide with each other, security-wise, and mod_php is insecure by design. Why not employ something like suPHP?

  4. Sikachu! said,

    August 22, 2008 @ 3:50 pm

    Hmm .. you really got me then lol ..

    Your solution sounds interesting, but I’m still confuse about locking a user in their own directory. Probably by using Jailkit?

  5. Thijs Burema said,

    August 22, 2008 @ 5:00 pm

    Nice a mod_rails plugin nginx !
    apache is very slow..

  6. Hongli said,

    August 22, 2008 @ 5:01 pm

    @Sikachu: I don’t lock a user. I just give him his own user account, that’s it.

  7. lix said,

    August 22, 2008 @ 5:39 pm

    Sikachu!

    what do you mean?

    lix@test:home$pwd
    /home
    lix@test:home$ls
    ls: .: Permission denied
    lix@test:home$

  8. Sikachu! said,

    August 22, 2008 @ 6:17 pm

    ok ok my bad then ..
    probably i need to learn more :P

  9. Thijs Burema said,

    August 23, 2008 @ 8:23 am

    @lix

    is /home always by default root : root /home/$_youruser

    @toppic hax !! dude

  10. Matthijs Langenberg said,

    August 24, 2008 @ 9:26 am

    There is a small error in your nginx configuration: a GET to an existing file will directly be served from the filesystem, but a HEAD will still be proxied to Apache.
    Do you guys also have an automated way of setting up another apache installation for a new user which automatically generates the nginx vhost with the correct port number?

  11. Hongli said,

    August 24, 2008 @ 9:49 am

    @Matthijs Langenberg: ah you’re correct. :) Too bad Nginx doesn’t seem to support boolean operators, so I have to put multiple if-statements in the config file. :(

    And yes, we have an automated way of setting up another Apache installation. It’s a script which asks the user what his domain name is, then sets up a few Apache startup scripts and an Apache config file. The installation is automatically assigned a random port number.

  12. Chu Yeow said,

    August 24, 2008 @ 1:48 pm

    Aww now I feel embarrassed - never thought about it in terms of shared hosting because I’ve never had the need to deal with that. And I was so sure and excited too :p. But it really makes sense (now) to run Nginx on top of Apache+Passenger, since that’s one of the main problems Passenger is trying to solve.

  13. lix said,

    August 25, 2008 @ 1:30 pm

    Thijs Burema

    and?

    chmod 777 /home … :)

  14. Eric Wong said,

    August 26, 2008 @ 3:36 am

    Actually, proxying most application servers behind nginx will get you a
    performance /increase/ in real-world usage.

    The key point to understand is:

    concurrency at the network layer != concurrency at the application layer

    Tying those two together is a common mistake I see people making
    (I don’t see a lot of writings on it, either).

    All HTTP clients are always “slow” in relation to nginx $app_server;
    especially those with slow upstreams. Slow clients hog up $app_server
    processes by making them wait on I/O (making HTTP requests, keepalive
    sockets, lingering close…).

    nginx fully buffers a request before sending it to $app_server; this
    means $app_server spends less time (almost none) waiting on network I/O.
    So despite the extra data copy; $app_server resources are better used
    doing application processing.

    This problem actually applies less to mongrel than it does to Apache
    because mongrel does I/O to clients asynchronously and independently of
    the actual application processing. Reducing I/O time for Mongrels is
    still a win because lightweight green threads are still heavier than the
    accounting structures that nginx uses to manage sockets.

    Apache has mpm_event which should fix things and nullify the benefits
    nginx provides, but I’m not sure if it ever matured to the point of
    usability. Converting an existing codebase like Apache to use events +
    non-blocking I/O is a huge task and a single broken module can ruin your
    day (the same way that C extensions like mysql can break Ruby threading).

  15. Nome do Jogo » Blog Archive » Rails Podcast Brasil - Episódio 29 said,

    August 26, 2008 @ 6:38 am

    […] Re: Strange HTTP header? […]

  16. Sikachu! said,

    August 26, 2008 @ 7:58 am

    OK, here is the question that I got from this method.
    If I use separate Apache installation for each user, then I use passenger with Ruby Enterprise Edition.
    So, if I give each user 2 maximum passenger instances, will the memory usage be decreased in the same percentage as using only 1 Apache with 4 maximum instances?

    :)

  17. Hongli said,

    August 26, 2008 @ 10:24 am

    @Sikachu: yes, each Passenger instance will use about 33% less memory. Of course now you’ll have to deal with twice the memory overhead of Apache, but that can be minimized to only several MB if you tweak things properly.

    Another thing to be aware of is that the two Passenger instances won’t be able to share framework code with each other. Suppose that your 4 applications use the following Rails frameworks:
    App 1: Rails 1.2.6 — allocated to Passenger instance A
    App 2: Rails 1.2.6 — allocated to Passenger instance B
    App 3: Rails 2.0.2 — allocated to Passenger instance A
    App 4: Rails 2.1.0 — allocated to Passenger instance B

    Although app 1 and app 2 use the same Rails framework, they won’t be able to share the framework memory with each other because they’re running on different Passenger instances.

    But if your Apache’s only job is to serve Passenger-powered applications, then you don’t need to sandbox it to an unprivileged user. Passenger supports privilege lowering out-of-the-box. In that case I’d just run Apache as root so that Passenger can lower the privilege of a Rails application to its corresponding owner.
