[afnog] Troubleshooting Network Broadcast

Eliufoo C. Mahinda venomius at yahoo.com
Sat Aug 8 05:41:58 UTC 2009


Hi Team,
 
Thanks for the input and recommendations.
 
I'm running Cisco Catalyst switches configured with SNMP for monitoring in both Cacti and SolarWinds.
 
The following actions have been taken.
 
1. Monitoring all uplink ports with Cacti and SolarWinds.
2. Configured SPAN on the main uplink port and using Wireshark to sniff packets.
3. Re-studying the network topology.
4. Viewing the STP setup.
 
The broadcast storm has ceased for the moment, but all these tools and settings are in place. When it recurs, we'll be able to capture more information.
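
Once a capture from the SPAN port is available, one way to narrow down the source is to count broadcast frames per sender. The following is only a minimal Python sketch. It assumes the packet list has been exported from Wireshark as CSV with "Source" and "Destination" columns, and that broadcast destinations appear as either ff:ff:ff:ff:ff:ff or "Broadcast"; both depend on how the columns and name resolution are configured. The function name top_broadcast_sources is hypothetical.

```python
import csv
import io
from collections import Counter

# Assumption: how a broadcast destination is rendered in the export.
BROADCAST_NAMES = {"ff:ff:ff:ff:ff:ff", "broadcast"}


def top_broadcast_sources(csv_text, limit=5):
    """Count broadcast frames per source in a packet-list CSV export.

    Returns up to `limit` (source, frame_count) pairs, busiest first.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Only frames addressed to the broadcast address are counted.
        if row["Destination"].strip().lower() in BROADCAST_NAMES:
            counts[row["Source"].strip()] += 1
    return counts.most_common(limit)
```

A source that dominates this list during a storm is a good candidate for closer inspection, although in a true switching loop the same frames circulate repeatedly, so the top sender may simply be the host whose broadcasts got caught in the loop.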
 
Thanks,
Elly 


--- On Fri, 8/7/09, Chris Wilson <chris+afnog at aptivate.org> wrote:


From: Chris Wilson <chris+afnog at aptivate.org>
Subject: Re: [afnog] Troubleshooting Network Broadcast
To: "Eliufoo C. Mahinda" <venomius at yahoo.com>
Cc: "Afnog Mailing List" <afnog at afnog.org>
Date: Friday, August 7, 2009, 10:12 PM


(apologies to anyone who receives this message twice; I posted from the wrong address the first time and haven't yet received the email from the listserver with the option to cancel my previous post).

Hi Elly,

On Fri, 7 Aug 2009, Eliufoo C. Mahinda wrote:

> Recently, we have been experiencing intermittent network broadcast storms which cripple the overall performance of the network and cause some of the applications to stop functioning properly or cease to work completely.
> 
> Monitoring the devices using SolarWinds, we can see several switch port interfaces with high outbound (Tx) traffic and minimal inbound (Rx) traffic. However, we can't pinpoint the actual node that is causing the broadcast.
> 
> Can anyone recommend the best practice for troubleshooting such a scenario?

I'd recommend this procedure:

1. Get a complete network diagram. Make sure it's up to date and accurate.

2. Mark clearly the links which have high traffic during the storms (the ones connected to the ports that you identified).

3. Mark clearly every single loop (complete circuit) in the network (any way that you can get from a point A back to the same point A without retracing your steps). There should be few or none of these. Every single one should be well known, carefully monitored, and exist for reasons of redundancy or link aggregation that were carefully weighed against the risk of broadcast storms.

4. If any of the high volume links overlap with any of the loops, check that at least one device in the loop has STP enabled and functioning to detect and bring down redundant links and avoid broadcast loops. If necessary, break the loops by removing all redundant connections (leaving only one of any redundant group intact).

5. For each of the high traffic links which is not part of any loop, install monitoring (e.g. cacti) on every switch upstream of that link (every switch that may feed traffic into the TX line with high utilisation).

6. Watch what happens when the next storm occurs. Trace it back upstream from there. If you end up tracing around in a circle, you've found a loop that wasn't on your network diagram, so go back to step 1. Otherwise you've found the source of the problem.
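
If the network diagram exists as a list of links, the loop-marking step can be checked mechanically. The following is a rough Python sketch, not a definitive tool. It assumes the topology is written down as one (switch_a, switch_b) pair per physical cable, and the switch names used with it are hypothetical. It uses a union-find structure: any link whose two ends are already connected by earlier links must close a loop.

```python
def find_loop_closing_links(links):
    """Return the links that close a loop in an undirected topology.

    `links` is a list of (switch_a, switch_b) pairs, one per physical cable.
    """
    parent = {}

    def find(node):
        # Follow parent pointers to the representative of node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends were already reachable from each other: a loop.
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant
```

Which link of a loop gets reported depends on the order of the list, and the sketch knows nothing about STP or link aggregation, so each reported link still has to be checked by hand against the reasons it exists. An empty result means the diagram, as drawn, has no loops.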

Cheers, Chris.
-- Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.



      