[afnog] Clustering Exim
regnauld at nsrc.org
Thu Feb 25 16:06:27 UTC 2010
Mike Barnard (mike.barnardq) writes:
> can provide?
> I'm looking at having an Active-Passive cluster whose message queues are
> replicated across each node in the cluster. The active node would copy its
> message queues over to the passive node or the passive node would copy the
> queues from the active node so that in case of a failure on the active node,
> the passive node can take over processing email.
I'm not sure that's very easy to implement, or even desirable.
You want a redundant mail routing setup in which mail can't
get "stuck" if you lose power, crash, etc.?
If you're going to pull something like that off, you'd better start
looking at Linux + DRBD + LVS, or FreeBSD + GEOM Gate + CARP.
Any form of asynchronous replication can leave one or more mails
stuck on the active system if it fails.
Alternatively, you need to modify the Exim code to encompass the
entire transaction, meaning that you do not return a 25x SMTP code
until after the mail has been written to the local queue, fsync() has been
called, AND you have received confirmation that the mail file has also been
copied to the remote "backup" queue location on the passive server.
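A minimal sketch of that ordering, in Python rather than Exim's C, assuming each queued message is a single file named by its queue ID: it writes and fsync()s the local copy, then the backup copy, and only then returns a 250 reply. The helpers `durable_write` and `accept_message` are hypothetical, and treating the backup queue as a locally reachable directory is an assumption; a real deployment would have to wait for the passive server to confirm its own fsync() over the network.

```python
import os
import tempfile


def durable_write(directory: str, name: str, data: bytes) -> str:
    """Write data to directory/name so it survives a crash (POSIX assumptions)."""
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # file contents reach stable storage
        final_path = os.path.join(directory, name)
        os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry created by the rename
    finally:
        os.close(dir_fd)
    return final_path


def accept_message(msg_id: str, data: bytes, local_queue: str, backup_queue: str) -> str:
    """Return the SMTP reply: 250 only once BOTH queue copies are durable."""
    local_path = None
    try:
        local_path = durable_write(local_queue, msg_id, data)
        # Assumption: backup_queue stands in for the passive node's queue; a real
        # setup must wait for the remote side to confirm its own fsync().
        durable_write(backup_queue, msg_id, data)
    except OSError:
        if local_path is not None and os.path.exists(local_path):
            os.unlink(local_path)  # avoid a duplicate when the sender retries
        return "451 4.3.0 Temporary failure, try again later"
    return "250 OK"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as backup:
        print(accept_message("msg-0001", b"Subject: test\n\nhello\n", local, backup))
```

If either write fails, the sender gets a temporary 4xx error and keeps the message, which is exactly the property that makes the 250 meaningful.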
Then, if the master goes down, the slave must detect that (e.g. via a
heartbeat), move the queue from the backup location to the primary one,
and reload Exim (or do whatever else is needed).
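The failover step could look roughly like this. It is a hypothetical sketch: the heartbeat timeout, the one-file-per-message queue layout, and the names `peer_is_dead` and `promote_backup_queue` are assumptions, and the actual Exim reload is left to whatever service tooling you already use.

```python
from __future__ import annotations

import os
import shutil
import tempfile
import time


def peer_is_dead(last_heartbeat: float, timeout: float, now: float | None = None) -> bool:
    """True if the active node has been silent for longer than `timeout` seconds."""
    if now is None:
        now = time.monotonic()
    return now - last_heartbeat > timeout


def promote_backup_queue(backup_queue: str, primary_queue: str) -> list[str]:
    """Move every replicated message file from the backup location into the live queue."""
    moved = []
    for name in sorted(os.listdir(backup_queue)):
        src = os.path.join(backup_queue, name)
        if os.path.isfile(src):
            # shutil.move also works when the two paths are on different filesystems.
            shutil.move(src, os.path.join(primary_queue, name))
            moved.append(name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as backup, tempfile.TemporaryDirectory() as primary:
        open(os.path.join(backup, "msg-0001"), "wb").close()
        if peer_is_dead(last_heartbeat=0.0, timeout=10.0, now=30.0):
            print(promote_backup_queue(backup, primary))  # then reload Exim
```

The timeout is the usual trade-off: too short and a brief network hiccup triggers a takeover while the master is still alive, which is precisely the split-brain case discussed next.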
But when the master recovers, you need to find out if it has lost any
mails in the process (after all, you don't know if the outage was
a loss of connectivity or a crash). Either way, you'll need to set up
some sort of mechanism to avoid duplicate mails.
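One possible de-duplication approach, assumed here rather than taken from any existing tool, is to compare the recovered master's queue against the Message-ID headers the passive node has already delivered. This is only a heuristic: not every message carries a Message-ID, so those are kept, preferring a possible duplicate over a possible loss. The function names are hypothetical.

```python
from __future__ import annotations

from email.parser import BytesHeaderParser


def message_id(raw: bytes) -> str | None:
    """Return the Message-ID header of a raw message, or None if it has none."""
    value = BytesHeaderParser().parsebytes(raw).get("Message-ID")
    return str(value).strip() if value is not None else None


def reconcile(recovered: dict[str, bytes], delivered_ids: set[str]) -> dict[str, bytes]:
    """Keep only recovered messages that the takeover node has not already delivered.

    Messages without a Message-ID are kept: a possible duplicate is preferred
    over a possible loss.
    """
    keep: dict[str, bytes] = {}
    for queue_id, raw in recovered.items():
        mid = message_id(raw)
        if mid is not None and mid in delivered_ids:
            continue  # the passive node already delivered this one
        keep[queue_id] = raw
    return keep
```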
My conclusion would be: don't bother, just use multiple identical
servers, and load balance. I have yet to lose a mail on setups like this
(~1,000,000 mails/day for the past 8 years or so).
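Part of why this works is that an SMTP client keeps responsibility for a message until a server answers with a 2xx code, and RFC 5321 asks clients to randomize among equal-preference MX hosts. The sketch below assumes the load balancing is done with equal-preference MX records (an LVS-style balancer would be another option) and shows the sending side's logic; `send` is a hypothetical callable returning an SMTP reply code, not a real library API.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def hand_off(message: bytes, servers: Sequence[str],
             send: Callable[[str, bytes], int],
             rng: random.Random | None = None) -> str | None:
    """Offer a message to equally-preferred servers in random order.

    Returns the host that accepted it, or None if the sender must keep it queued.
    """
    order = list(servers)
    (rng or random.Random()).shuffle(order)  # RFC 5321: randomize equal-preference MX hosts
    for host in order:
        try:
            code = send(host, message)
        except OSError:
            continue  # unreachable or crashed server: try the next one
        if 200 <= code < 300:
            return host  # responsibility for the message now lies with `host`
    return None  # nobody accepted: it stays in the sender's queue for a later retry


def example_send(host: str, message: bytes) -> int:
    """Hypothetical stand-in for an SMTP client: one host is down, one defers."""
    if host == "mx-down":
        raise ConnectionRefusedError(host)
    return 250 if host == "mx-ok" else 451


if __name__ == "__main__":
    print(hand_off(b"Subject: test\n\nhello\n", ["mx-down", "mx-busy", "mx-ok"], example_send))
```

Each message lives in exactly one queue at a time, so a crashed server only delays the mail it had already accepted; nothing in flight is lost, because the sender still holds it.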