[afnog] Troubleshooting Network Broadcast

Chris Wilson chris+afnog at aptivate.org
Fri Aug 7 17:42:33 UTC 2009


(apologies to anyone who receives this message twice; I posted from the 
wrong address the first time and haven't yet received the email from the 
listserver with the option to cancel my previous post).

Hi Elly,

On Fri, 7 Aug 2009, Eliufoo C. Mahinda wrote:

> Recently, we have been experiencing intermittent network broadcasts which 
> cripple the overall performance of the network and cause some of the 
> applications to stop functioning properly or cease to work completely.
> 
> Monitoring the devices using SolarWinds, we can see several switch port 
> interfaces with high outbound (Tx) traffic and minimal inbound (Rx) 
> traffic. However, we can't pinpoint the actual node that is causing the 
> broadcast.
> 
> Can anyone recommend the best practice for troubleshooting such a scenario?

I'd recommend this procedure:

Get a complete network diagram. Make sure it's up to date and accurate.

Mark clearly the links which have high traffic during the storms (the ones 
connected to the ports that you identified).

Mark clearly every single loop (complete circuit) in the network (any way 
that you can get from a point A back to the same point A without retracing 
your steps). There should be few or none of these. Every single one should 
be well known, carefully monitored, and exist for reasons of redundancy or 
link aggregation that were carefully weighed against the risk of broadcast 
storms.
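
If the diagram is available as a list of links, a short script can help 
spot loops you might miss by eye. This is only a rough sketch: the switch 
names and the links list are made up for illustration, and the function 
name is my own. It uses a union-find structure, so any link that joins two 
switches which are already connected by some other path is reported as 
closing a loop.

```python
def find_loop_closing_links(links):
    """Return the links that close a loop, given (device_a, device_b) pairs."""
    parent = {}

    def find(node):
        # Return the representative of the group that node belongs to.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    closing = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # a and b were already connected, so this link completes a circuit.
            closing.append((a, b))
        else:
            parent[root_a] = root_b
    return closing


# Hypothetical topology, for illustration only.
links = [
    ("core", "dist1"),
    ("core", "dist2"),
    ("dist1", "access1"),
    ("dist2", "access1"),  # second path to access1
    ("dist1", "access2"),
]
print(find_loop_closing_links(links))
```

Each link reported marks one independent loop. If the list is empty, the 
diagram as drawn is loop-free, which says nothing about cables that are 
not on the diagram.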

If any of the high-volume links overlap with any of the loops, check that 
at least one device in the loop has STP (Spanning Tree Protocol) enabled 
and functioning, so that it detects and brings down redundant links and 
avoids broadcast loops. If necessary, break the loops by removing all 
redundant connections (leaving only one of any redundant group intact).
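
To check whether a given high-volume link overlaps with a loop, one 
approach is to remove that link from the diagram and see whether its two 
ends can still reach each other. Again, a sketch only: the topology and 
the function name are hypothetical.

```python
from collections import deque


def link_is_on_loop(links, link):
    """True if the two ends of `link` are still connected without it."""
    remaining = list(links)
    remaining.remove(link)  # drops one copy only, so a parallel link still counts

    adjacency = {}
    for a, b in remaining:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search from one end of the link towards the other.
    start, goal = link
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# Hypothetical topology, for illustration only.
topology = [
    ("core", "dist1"),
    ("core", "dist2"),
    ("dist1", "access1"),
    ("dist2", "access1"),
    ("dist1", "access2"),
]
print(link_is_on_loop(topology, ("core", "dist1")))
print(link_is_on_loop(topology, ("dist1", "access2")))
```

In this made-up topology the core-dist1 link is part of a loop, while the 
dist1-access2 link is not.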

For each of the high-traffic links that is not part of any loop, install 
monitoring (e.g. Cacti) on every switch upstream of that link (every 
switch that may feed traffic into the Tx line with high utilisation).

Watch what happens when the next storm occurs. Trace it back upstream from 
there. If you end up tracing around in a circle, you've found a loop that 
wasn't on your network diagram, so go back to the first step and correct 
the diagram. Otherwise you've found the source of the problem.
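
The tracing logic could be sketched roughly as follows. I'm assuming 
that, from your monitoring graphs, you have already worked out for each 
switch which device sits on the far end of its busiest inbound (Rx) port 
during the storm. The mapping, the device names and the function name are 
all invented for the example; a device with no entry in the mapping is 
treated as an end host.

```python
def trace_storm_upstream(start, high_rx_neighbour):
    """Follow the busiest inbound ports upstream from `start`.

    Returns (path, found_loop). The trace stops at a device that has no
    entry in `high_rx_neighbour`, or when a device is visited twice.
    """
    path = [start]
    visited = {start}
    current = start
    while current in high_rx_neighbour:
        current = high_rx_neighbour[current]
        if current in visited:
            # We have come back to a device already on the path: a loop.
            return path + [current], True
        visited.add(current)
        path.append(current)
    return path, False


# Hypothetical observations during a storm.
observed = {"access2": "dist1", "dist1": "core", "core": "pc-17"}
print(trace_storm_upstream("access2", observed))

looped = {"sw-a": "sw-b", "sw-b": "sw-c", "sw-c": "sw-a"}
print(trace_storm_upstream("sw-a", looped))
```

In the first case the trace ends at a single device, which is the likely 
source. In the second it comes back round to where it started, which means 
there is a loop to add to the diagram.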

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.


