[afnog] Clustering Exim

Phil Regnauld regnauld at nsrc.org
Thu Feb 25 16:06:27 UTC 2010


Mike Barnard (mike.barnardq) writes:
> can provide?
> 
> I'm looking at having an Active-Passive cluster whose message queues are
> replicated across each node in the cluster. The active node would copy its
> message queues over to the passive node or the passive node would copy the
> queues from the active node so that in case of a failure on the active node,
> the passive node can take over processing email.

	I'm not sure that's very easy to implement, or even desirable.

	You want a redundant mail routing setup, on which mails can't
	get "stuck" if you lose power, crash, etc.?

	If you're going to pull something like that off, you'd better start
	looking at Linux + DRBD + LVS, or FreeBSD + GeomGate + CARP.
	
	Anything short of synchronous replication can leave one or more
	mails stuck on the active system when it fails.

	Alternatively, you need to modify the Exim code to encompass the
	entire transaction, meaning that you do not return a 25x SMTP code
	until after the mail is written to the local queue, fsync() has been
	called AND you have gotten confirmation that the mail file has also
	been copied to the remote "backup" queue location on the passive server.
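
	To make the ordering concrete, here is a rough sketch in Python (not
	Exim code, and not how Exim works today).  The names spool_durably,
	accept_message and the replicate callback are made up for illustration;
	replicate stands for whatever copies the file to the passive server and
	returns True once that copy is confirmed.

```python
import os


def spool_durably(path: str, data: bytes) -> None:
    """Write data to path, then fsync the file and its directory."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # file contents reach stable storage
    os.replace(tmp, path)             # atomic rename into the queue
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)              # make the rename itself durable
    finally:
        os.close(dir_fd)


def accept_message(msg_id: str, data: bytes, local_queue: str, replicate) -> str:
    """Return the SMTP reply for one message.

    A 250 is returned only after the local write, the fsync and the
    confirmed remote copy have all succeeded.  `replicate` is a
    hypothetical callable(msg_id, data) -> bool.
    """
    local_path = os.path.join(local_queue, msg_id)
    try:
        spool_durably(local_path, data)
        confirmed = replicate(msg_id, data)
    except OSError:
        confirmed = False
    if not confirmed:
        # Drop the local copy so the sender's retry does not create a duplicate.
        if os.path.exists(local_path):
            os.remove(local_path)
        return "451 4.3.0 queue not replicated, try again later"
    return "250 2.0.0 OK queued as " + msg_id
```

	The temporary 451 pushes the retry back to the sending server, which
	is the whole point: until both copies exist, the sender still owns
	the mail.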

	Then if the master goes down, the slave must detect that (heartbeat)
	and move the queue from the backup location to the primary, and reload
	Exim (or whatever is needed).
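
	A minimal sketch of that takeover step, again in Python.  The
	function names, the 10 second timeout and the flat one-file-per-mail
	directories are assumptions for illustration only; a real Exim spool
	layout is more involved, and the reload is left out.

```python
import os
import shutil


def master_is_down(last_heartbeat: float, now: float, timeout: float = 10.0) -> bool:
    """True if no heartbeat has been seen within `timeout` seconds (assumed value)."""
    return (now - last_heartbeat) > timeout


def promote_backup_queue(backup_dir: str, primary_dir: str) -> list:
    """Move every file from the backup location into the primary queue.

    Returns the names that were moved.  Files already present in the
    primary queue are left alone rather than overwritten.
    """
    os.makedirs(primary_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(backup_dir)):
        src = os.path.join(backup_dir, name)
        dst = os.path.join(primary_dir, name)
        if os.path.exists(dst):
            continue
        shutil.move(src, dst)
        moved.append(name)
    return moved
```

	Note that a missed heartbeat only tells the slave that it cannot
	hear the master, not why.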

	But when the master recovers, you need to find out if it has lost any
	mails in the process (after all, you don't know if the outage was
	a loss of connectivity, or a crash).  Either way, you'll need to set up
	some sort of mechanism to avoid duplicate mails.
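
	One possible shape for such a mechanism, as a sketch: when the master
	comes back, compare what is still in its queue against a record of
	what the slave already delivered.  The function name and the idea of
	keying on a message ID kept in a delivery log are assumptions here,
	not something Exim gives you out of the box.

```python
def still_to_deliver(recovered_queue_ids, delivered_by_slave):
    """Return the IDs from the recovered master's queue that still need delivery.

    IDs the slave already delivered are dropped, as are repeats within
    the recovered queue itself.  Order of first appearance is kept.
    """
    seen = set(delivered_by_slave)
    pending = []
    for msg_id in recovered_queue_ids:
        if msg_id in seen:
            continue
        seen.add(msg_id)
        pending.append(msg_id)
    return pending
```

	For example, still_to_deliver(["a", "b", "c"], ["b"]) leaves "a" and
	"c" to be delivered by the recovered master.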

	My conclusion would be: don't bother, just use multiple identical
	servers, and load balance.  I have yet to lose a mail on setups like
	this (~1,000,000 mails/day for the past 8 years or so).
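
	One common way to spread inbound load over identical servers (an
	assumption on my part, there are others, such as a load balancer in
	front) is to publish several MX records with the same preference;
	sending servers are expected to randomize among equal-preference
	hosts.  The sketch below shows that selection from the sender's side.
	The hostnames are placeholders.

```python
import random


def delivery_order(mx_records, rng=random):
    """Order MX hosts the way a sending server would try them.

    mx_records is a list of (preference, hostname) pairs.  Lower
    preference is tried first; hosts sharing a preference are shuffled
    so the load spreads across them.
    """
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    ordered = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


# Placeholder hostnames: two identical front-line servers and one fallback.
records = [(10, "mx1.example.org"), (10, "mx2.example.org"), (20, "mx3.example.org")]
print(delivery_order(records))
```

	If one of the equal-preference hosts is down, the sender simply moves
	on to the next one in the list and retries later for anything that
	was not accepted.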
