[afnog] Troubleshooting Network Broadcast
Chris Wilson
chris+afnog at aptivate.org
Fri Aug 7 17:42:33 UTC 2009
(apologies to anyone who receives this message twice; I posted from the
wrong address the first time and haven't yet received the email from the
listserver with the option to cancel my previous post).
Hi Elly,
On Fri, 7 Aug 2009, Eliufoo C. Mahinda wrote:
> Of recent, we have been experiencing intermittent network broadcasts which
> cripple the overall performance of the network and cause some of the
> applications to stop functioning properly or cease to work completely.
>
> Monitoring the devices using SolarWinds, we can see several switch port
> interfaces with high outbound (Tx) traffic and minimal inbound (Rx).
> However, we can't pinpoint the actual node that is causing the
> broadcast.
>
> Can anyone recommend the best practice in troubleshooting such a scenario?
I'd recommend this procedure:
Get a complete network diagram. Make sure it's up to date and accurate.
Mark clearly the links which have high traffic during the storms (the ones
connected to the ports that you identified).
Mark clearly every single loop (complete circuit) in the network (any way
that you can get from a point A back to the same point A without retracing
your steps). There should be few or none of these. Every single one should
be well known, carefully monitored, and exist for reasons of redundancy or
link aggregation that were carefully weighed against the risk of broadcast
storms.
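If the diagram is too big to check for loops by eye, a short script can do
it. This is only a rough sketch: it assumes you have typed the diagram in as
a list of (device, device) pairs, one per physical link, and the device
names and the function name links_on_loops are made up for illustration. A
link is part of a loop exactly when the network stays connected without it,
so the script looks for links that are not "bridges" in the graph sense.

```python
from collections import defaultdict


def links_on_loops(links):
    """Return the indices of links that lie on at least one loop.

    `links` is a list of (device_a, device_b) pairs, one per physical link.
    A depth-first search finds the bridges (links whose removal splits the
    network); every other link is part of some loop.
    """
    adjacency = defaultdict(list)  # device -> [(neighbour, link index)]
    for index, (a, b) in enumerate(links):
        adjacency[a].append((b, index))
        adjacency[b].append((a, index))

    discovered = {}  # device -> order in which the search reached it
    lowest = {}      # device -> earliest device reachable from its subtree
    bridges = set()
    counter = 0

    def visit(device, arrived_via):
        nonlocal counter
        discovered[device] = lowest[device] = counter
        counter += 1
        for neighbour, index in adjacency[device]:
            if index == arrived_via:
                continue  # do not walk straight back along the same cable
            if neighbour in discovered:
                lowest[device] = min(lowest[device], discovered[neighbour])
            else:
                visit(neighbour, index)
                lowest[device] = min(lowest[device], lowest[neighbour])
                if lowest[neighbour] > discovered[device]:
                    bridges.add(index)  # no other way back: not on a loop

    for device in list(adjacency):
        if device not in discovered:
            visit(device, None)

    return set(range(len(links))) - bridges


# Hypothetical topology: core, dist1 and dist2 form a triangle.
example_links = [
    ("core", "dist1"),
    ("core", "dist2"),
    ("dist1", "dist2"),
    ("dist1", "access1"),
]
for index in sorted(links_on_loops(example_links)):
    print("on a loop:", example_links[index])
```

Parallel cables between the same two switches, and a cable plugged into two
ports of one switch, are both reported as loops because links are tracked by
index rather than by endpoint names. The search is recursive, so a very
large network may need a higher Python recursion limit.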
If any of the high volume links overlap with any of the loops, check that
at least one device in the loop has STP enabled and functioning to detect
and bring down redundant links and avoid broadcast loops. If necessary,
break the loops by removing all redundant connections (leaving only one of
any redundant group intact).
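To decide which connections to unplug, it can help to work out a loop-free
set of links on paper first. The sketch below is one way to do that, again
assuming a hypothetical list of (device, device) pairs; the function name
redundant_links is my own. It keeps a link only if it joins two parts of the
network that are not yet connected, and reports every other link as
redundant. STP does something comparable automatically by blocking ports
until only a spanning tree remains.

```python
def redundant_links(links):
    """Split links into a loop-free set to keep and the redundant rest.

    `links` is a list of (device_a, device_b) pairs. Links are considered
    in order, so list the ones you would prefer to keep first.
    """
    parent = {}  # union-find forest: device -> representative device

    def find(device):
        parent.setdefault(device, device)
        while parent[device] != device:
            parent[device] = parent[parent[device]]  # shorten the chain
            device = parent[device]
        return device

    keep, redundant = [], []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # both ends already connected: a loop
        else:
            parent[root_a] = root_b   # join the two groups
            keep.append((a, b))
    return keep, redundant


# Hypothetical topology: core, dist1 and dist2 form a triangle.
example_links = [
    ("core", "dist1"),
    ("core", "dist2"),
    ("dist1", "dist2"),
    ("dist1", "access1"),
]
keep, redundant = redundant_links(example_links)
print("keep:", keep)
print("unplug or block:", redundant)
```

Which link of a loop ends up marked redundant depends only on the order of
the list, so this is a planning aid and not a judgement about which cable
matters more.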
For each of the high traffic links which is not part of any loop, install
monitoring (e.g. Cacti) on every switch upstream of that link (every
switch that may feed traffic into the Tx line with high utilisation).
Watch what happens when the next storm occurs, and trace the traffic back
upstream from there. If you end up tracing around in a circle, you've found
a loop that wasn't on your network diagram, so go back to the first step and
correct the diagram. Otherwise you've found the source of the problem.
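The tracing can be done by hand from the graphs, but here is a minimal
sketch of the logic in case you want to script it. Everything about the data
layout is an assumption: rx_rate maps (switch, port) to the inbound rate you
recorded during the storm, and neighbour maps (switch, port) to the (switch,
port) at the other end of the cable, with edge ports simply left out. These
are not exported in this form by any monitoring tool I know of, so you would
have to fill them in yourself.

```python
def trace_upstream(start_switch, start_port, rx_rate, neighbour):
    """Follow storm traffic upstream from a port with high Tx.

    Returns ("source", (switch, port)) when the trail ends at an edge port,
    ("loop", [switches]) when it returns to a switch already visited, or
    ("unknown", (switch, port)) when there is no Rx data to follow.
    """
    switch, tx_port = start_switch, start_port
    visited = []
    while True:
        if switch in visited:
            # Tracing in a circle: report the switches that form it.
            return "loop", visited[visited.index(switch):]
        visited.append(switch)

        # Inbound rates on this switch, ignoring the port sending it out.
        candidates = {
            port: rate
            for (sw, port), rate in rx_rate.items()
            if sw == switch and port != tx_port
        }
        if not candidates:
            return "unknown", (switch, tx_port)

        busiest = max(candidates, key=candidates.get)
        if (switch, busiest) not in neighbour:
            return "source", (switch, busiest)  # edge port: a host is here

        # The far end of that cable is transmitting; continue from there.
        switch, tx_port = neighbour[(switch, busiest)]


# Hypothetical readings taken during a storm (bits per second).
rx_rate = {
    ("dist1", 2): 90_000_000,
    ("dist1", 3): 100_000,
    ("access1", 5): 95_000_000,
    ("access1", 6): 20_000,
}
neighbour = {
    ("dist1", 2): ("access1", 24),
    ("access1", 24): ("dist1", 2),
}
print(trace_upstream("dist1", 1, rx_rate, neighbour))
```

The sketch always follows the single busiest inbound port, which is a
simplification: if two ports are feeding the storm at similar rates you
would want to trace both.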
Cheers, Chris.
--
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES
Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.