[afnog] Scalable, Performance-Critical Web Application Architecture

Thu Sep 2 11:24:38 EAT 2004

On Thu, Sep 02, 2004 at 12:15:57PM +0000, Begumisa Gerald M wrote:
> A Single CGI program:
>   =-=-=-=-=-=-=-=-=-=
> 
> +----------+  IP  +--------------+  ENV +--------------+ IP  +-------+
> |          |----->|              |----->|              |---->|       |
> | Browsers |  IP  | Apache HTTPd |  ENV | CGI in C/C++ | IP  | MySQL |
> |          |<-----|              |<-----|              |<----|       |
> +----------+      +--------------+      +--------------+     +-------+

CGI is a simple interface, and it works well. For concurrent accesses,
Apache will run multiple processes, each of which will fork/exec a CGI.
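
To show how little there is to the interface: the web server hands the
request to the CGI in environment variables (plus standard input for a POST
body), and the CGI writes headers, a blank line, and the body to standard
output. A minimal sketch in Ruby, where the handle_cgi helper and the
'whoami' parameter are only illustrative names:

```ruby
require 'uri'

# Build a complete CGI response from the variables the web server sets.
# `env` is anything that responds to fetch: ENV itself, or a Hash in tests.
def handle_cgi(env)
  # For a GET request the query string arrives in QUERY_STRING.
  params = URI.decode_www_form(env.fetch('QUERY_STRING', '')).to_h
  name = params.fetch('whoami', 'world')
  # Headers first, then a blank line, then the body.
  "Content-Type: text/plain\r\n\r\nHello, #{name}!\n"
end

print handle_cgi(ENV)
```

The process exits as soon as the response is written, which is why nothing
(such as a database handle) survives from one hit to the next.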

The major disadvantage here is that you cannot hold open a persistent SQL
connection to the MySQL database, so every page hit which uses the database
will require you to open a fresh MySQL connection, authenticate, perform
operations, and close the connection. There is also a startup overhead for
each fork/exec of the CGI program (relatively small for C, but big for an
interpreted language).

Older versions of 'sqwebmail' used to run in this way.

> B Small, single CGI program + daemon:
>   -=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-=

Also works well, but is more difficult to code. Newer versions of
'sqwebmail' work in this way.

If your daemon has any chance of blocking for a long period when handling an
individual request, then it will need to be either forking or multithreaded,
both of which add significant complexity, especially if you want to have a
'pool' of available connections to the MySQL database.
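
To give an idea of the extra machinery, here is a minimal sketch of the
multithreaded case with a fixed pool of database connections. FakeConn and
ConnectionPool are hypothetical stand-ins, not part of any MySQL library; a
real daemon would also have to accept requests from the small CGI over a
socket and cope with connections that drop.

```ruby
# Hypothetical stand-in for a real database connection.
FakeConn = Struct.new(:id) do
  def query(sql)
    "conn #{id}: #{sql}"
  end
end

# A fixed-size pool built on Queue, which is safe to share between threads.
class ConnectionPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << FakeConn.new(i) }
  end

  # Borrow a connection (blocking until one is free) and always return it.
  def with_connection
    conn = @free.pop
    yield conn
  ensure
    @free << conn if conn
  end
end

pool = ConnectionPool.new(2)
results = Queue.new

# Five simulated requests, one thread each, sharing two connections.
threads = 5.times.map do |n|
  Thread.new do
    pool.with_connection { |c| results << c.query("request #{n}") }
  end
end
threads.each(&:join)

puts "handled #{results.size} requests with 2 connections"
```

Even in this toy form you need locking (hidden inside Queue) and cleanup
on error (the ensure clause), which is the complexity referred to above.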

> Personal opinion: B seems to be a better approach but I'd be really happy
> to hear pros / cons or even completely different suggestions of
> architecture or even programming language.

Option C: use 'fastcgi': see http://www.fastcgi.com/, and Apache
mod_fastcgi.

This gives you the best of both worlds. mod_fastcgi starts one or more copies
of your application (the number can be fixed or dynamic). Each CGI request is
passed down a socket to one instance of your application, which handles it
and sends back the response. But unlike CGI, your application is persistent;
it has a central main loop which handles one request, sends one reply, then
goes back to wait for another request. (A CGI would terminate after handling
one request.) So, before the main loop, it can open one database connection,
and use it to handle each request.
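
The shape of such a persistent application can be sketched as follows. This
is only an illustration: FakeDb is a hypothetical stand-in for a MySQL
client that counts how often it connects, and the array of paths stands in
for the requests that mod_fastcgi would pass down the socket.

```ruby
# Hypothetical stand-in for a MySQL client; counts how often we connect.
class FakeDb
  @connects = 0
  class << self
    attr_reader :connects

    def connect
      @connects += 1   # a real client would open a socket and authenticate
      new
    end
  end

  def query(sql)
    "rows for: #{sql}"
  end
end

# Serve every request in `requests` over a single shared connection.
def serve(requests)
  db = FakeDb.connect          # opened once, before the main loop
  requests.map do |path|       # main loop: one request in, one reply out
    db.query("lookup #{path}")
  end
end

replies = serve(%w[/ /about /contact])
puts "#{replies.size} replies over #{FakeDb.connects} connection(s)"
```

A plain CGI would pay for the connect step on every one of those requests.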

The fastcgi libraries have an API which is *very* similar to normal CGI. In
fact, you can write a single binary which runs both as a standalone CGI and
in a fastcgi environment, with no changes!

I have used this approach extremely successfully, and you don't even need to
use C/C++ if you don't want; because it eliminates the repeated startup
overhead, even scripting languages like Perl or Ruby work very well.

Another advantage is that you only need to modify Apache once (by installing
mod_fastcgi), and then you can run FastCGI programs written in C, Perl,
Ruby, or whatever, without having to link in any more code to Apache.

Here's some sample code in Ruby using http://raa.ruby-lang.org/project/fcgi/

#!/usr/local/bin/ruby
require 'fcgi'

FCGI.each_cgi do |cgi|
  # Now we have a cgi object which we handle just like a normal CGI
  name = cgi.params['whoami'][0]
  if name
    puts cgi.header
    puts "<html>Hello, #{CGI.escapeHTML(name)}!</html>"
  else
    puts cgi.header
    puts "<html>Enter your name: <form method='POST'><input type='text' name='whoami'></form></html>"
  end
end

You can run this from the command-line for testing, as a standalone CGI, or
under mod_fastcgi, with no changes.

Regards,

Brian.