[afnog] design redundant mail servers

Phil Regnauld regnauld at nsrc.org
Sun Aug 29 15:40:12 UTC 2010


Brian Candler (B.Candler) writes:
> > 
> >    -          MTA: Postfix + MailScanner (ClamAV, SpamAssassin) +
> >    SquirrelMail + Virtualmin + Webmin + Dovecot
> > 
> >    -          SASL Authd
> 
> That's a good start. Presumably by "redundant" you mean "able to continue to
> work in the event of a loss of a major component".

	Side note: MailScanner has in the past been criticized as somewhat
	unsafe, in particular by Wietse Venema, the author of Postfix, for
	manipulating the Postfix mail queues directly (a bad idea).

	Since then the MailScanner authors have used a different approach, which
	they claim doesn't carry the same risks:

	http://wiki.mailscanner.info/doku.php?id=documentation:configuration:mta:postfix:politics

	Note: I am not a MailScanner user; I personally prefer Amavisd
	(http://www.ijs.si/software/amavisd/), which uses plain SMTP to
	communicate with other MTAs, so I may be biased :)

	Now I can comment on Brian's excellent suggestions!

	Note that the original poster mentioned "redundant mail servers",
	which could mean many things, so we'll assume we're talking about
	redundant routing, scanning, storage and spooling.

	On the routing side, "hot standby" and load-balancer solutions often
	add complexity to the setup with little benefit, since much the same
	can be achieved with MX load balancing and short DNS TTLs (more on
	this later).

	On the storage side, HA (high availability) solutions such as the
	ones Brian outlines below are a very good option, if the business
	demands it.

	Overall, various architectures can be considered, including using
	dedicated servers for inbound and outbound mail routing, splitting
	routing and scanning off from the storage and service hosts, and so on.

	Once I get a better idea of the original poster's intent, I can add
	some ASCII diagrams for suggested architectures.

> Here are some options to consider.
> 
> (1) You can buy an NFS server with multiple head-ends for redundancy, and
> store your mail on that in Maildir format.  Then you can have multiple
> front-end boxes all talking to the same mail directories, and a
> load-balancer in front of them.  This scales well horizontally, as you can
> easily add more frontend boxes for load handling, and more NFS servers by
> putting different customers' mail on different mountpoints (e.g.  /mail/0,
> /mail/1 etc)

	Agreed.  A variation on this approach involves splitting mail storage
	across multiple servers, and using a system like Perdition
	(http://horms.net/projects/perdition/) to dispatch the front-facing
	POP/IMAP connections to the correct backend server.  This mitigates
	failure scenarios in that losing one server (assuming non-redundant
	NFS servers; see below) doesn't mean losing mail service for all users.
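	A minimal sketch of the dispatch idea behind such a proxy: each
	mailbox maps deterministically to one backend store, so the
	front-end can route the connection.  The backend hostnames and the
	hash scheme here are illustrative assumptions, not Perdition's
	actual configuration format (Perdition uses popmap files or
	database lookups).

```python
# Sketch of deterministic user -> backend dispatch, as done by a
# POP/IMAP proxy.  Hostnames and hashing scheme are assumptions for
# illustration only.
import hashlib

BACKENDS = ["mailstore1.example.net", "mailstore2.example.net"]

def backend_for(user):
    """Return a stable backend host for a given username."""
    # Hash the lowercased username so "Alice" and "alice" land on the
    # same store, then pick a backend from the first digest byte.
    digest = hashlib.md5(user.lower().encode("utf-8")).digest()
    return BACKENDS[digest[0] % len(BACKENDS)]
```

	Because the mapping is deterministic, a user always lands on the
	server holding their Maildir, and losing one backend only affects
	the users hashed to it.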

> I have good experience with Network Appliance NFS servers in this kind of
> role (www.netapp.com) - but they are definitely not cheap.

	And their policy of suing companies that threaten their business
	model has been anything but community-friendly:

	http://www.nexenta.com/corp/blog/2010/07/06/coraid-zfs-netapp-and-nexenta/


> (2) You could consider building your own redundant NFS server. This might
> consist of a shelf of SCSI disks, connected to two PCs each with their own
> SCSI cards. Then you'd use some clustering software to decide which of the
> two front-ends mounts the disks and shares them.

	Alternative: use two hosts, each with its own storage (more expensive,
	of course), and use something like Linux and DRBD to keep them in
	sync over the network; very efficient if you are aiming for a high
	availability solution (http://www.drbd.org/).  This also addresses
	the previous approach's lack of complete storage redundancy (Brian
	covers this a bit further on).

	Side note: I've had good luck building ZFS storage servers on FreeBSD
	and replicating hourly ZFS snapshots to keep things in sync.  There
	is a DRBD equivalent in FreeBSD (HAST), but it is not yet in any
	FreeBSD release.
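	That hourly replication amounts to something along these lines
	(pool, dataset and host names are placeholders; snapshot naming and
	scheduling are assumptions), using incremental zfs send/receive
	over SSH:

```shell
# Run hourly from cron on the master -- names are placeholders.
zfs snapshot tank/mail@hour-15
zfs send -i tank/mail@hour-14 tank/mail@hour-15 | \
    ssh backup zfs receive -F tank/mail
```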

	So far, NFS over DRBD/Linux + LVS would be my recommended choice.
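	For reference, a minimal DRBD resource definition looks something
	like this (hostnames, devices and addresses are placeholders; check
	drbd.conf(5) for the exact syntax of your version):

```
# /etc/drbd.conf fragment -- hostnames/devices/IPs are placeholders
resource mail {
    protocol C;                  # fully synchronous replication
    on store1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on store2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```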

> This sort of cluster is hard to get right. You need to arrange that if one
> machine decides to take over the array, it forcibly disconnects or powers
> down the other one ("STONITH").

	Context: Shoot The Other Node In The Head: once the backup server
	has decided to take over service, it must make sure the former
	master doesn't come back up and disrupt service by attempting to
	reclaim the master role.

> If you have two machines mounting the same
> volume, major filesystem corruption is guaranteed (unless you use a
> distributed filesystem like GFS or OCFS2, of which I have no experience). 
> Or you could elect to do the switchover manually in the event of a problem.

	... which is always a sane possibility, provided proper monitoring
	and alerting are in place (Nagios or similar, plus SMS notifications).

> Also, your SCSI shelf is still a single point of failure.
> 
> (3) Another way to build redundant storage is with drbd (www.drbd.org), the
> kernel modules for which are included in CentOS 5 I believe.  This is
> basically RAID1 disk mirroring between two PCs, across a LAN.  One side is
> the master (read-write), and the other slave (read only).  To get live-live
> operation you could set up two drbd volumes, so that PC 1 is master for vol1
> and slave for vol2, and vice versa.
> 
> Again, some sort of clustering or heartbeat solution is needed to switch the
> master across to the second PC if the first one fails, with the same
> difficulty in getting it right.

	Yes.

> (4) A very interesting option, which I know is in use at least one large UK
> ISP, is "Gluster" (www.gluster.org).  This lets you build a huge filesystem
> spread across a pool of PCs.  As long as you have two or more PCs, your
> files are replicated across different machines.  It uses elastic hashing to
> distribute your files transparently between the servers, so it's essentially
> self-managing, and can serve files using its own protocol or NFS or CIFS.
> 
> http://www.gluster.com/community/documentation/index.php/Gluster_Storage_Platform#Volume_Manager
> 
> So if you want a fully open-source solution using only commodity hardware,
> this is definitely worth checking out.  Using a two-node Gluster fileserver
> with two mail server nodes and a pair of load-balancers should give you a
> very good level of redundancy and a good expansion route for growth.

	... and that is the next consideration: scalability.  It should be
	easy to grow the solution by adding more servers to the setup.

	Cheers,
	Phil


