[afnog] Resolver issues

Brian Candler B.Candler at pobox.com
Mon Apr 10 17:13:40 EAT 2006


On Mon, Apr 10, 2006 at 04:58:26PM +0300, Michuki Mwangi wrote:
> from the caching server all seems well;
> 
> $ dig +norec @ns1.swiftkenya.com kenic.or.ke ns
> 
> ; <<>> DiG 9.3.1 <<>> +norec @ns1.swiftkenya.com kenic.or.ke ns
> ; (1 server found)
> ;; global options:  printcmd
...
>  From the server that performs the dig queries.
> 
> www# dig @ns1.swiftkenya.com kenic.or.ke ns
> 
> ; <<>> DiG 9.3.1 <<>> @ns1.swiftkenya.com kenic.or.ke ns
> ; (1 server found)
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
...
> Both servers are on the same network subnet and share the same gateway.

I can think of a few possibilities.

(1) Your machine 'www' isn't able to resolve ns1.swiftkenya.com, which it
has to do before it can send a packet there. To eliminate this possibility,
try

www# dig @80.240.192.7 kenic.or.ke. ns

instead.
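
To check the name-resolution step on its own, something like the following
rough Python sketch could be run on 'www'. The function names needs_lookup
and resolve are my own invention for illustration; the sketch only asks the
system resolver, which is what dig falls back on for the name after '@'.

```python
import ipaddress
import socket

def needs_lookup(server):
    """Return True if 'server' is a hostname rather than an IP literal."""
    try:
        ipaddress.ip_address(server)
        return False
    except ValueError:
        return True

def resolve(server):
    """Return the addresses the system resolver gives for 'server', or [] on failure."""
    try:
        infos = socket.getaddrinfo(server, 53, proto=socket.IPPROTO_UDP)
    except socket.gaierror:
        return []
    # info[4] is the socket address tuple; its first element is the IP string.
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    name = "ns1.swiftkenya.com"
    if needs_lookup(name):
        print(name, "->", resolve(name) or "lookup failed")
```

An empty result for ns1.swiftkenya.com would point at possibility (1)
rather than at the network path to the server.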

(2) Networking. The UDP packet from your 'www' server to ns1.swiftkenya.com
isn't getting there, or the response isn't getting back. I'd use tcpdump to
check this. On the www server, run

   tcpdump -i eth0 -n -s1500 -X host 80.240.192.7

and on 80.240.192.7, run

   tcpdump -i eth0 -n -s1500 -X host x.x.x.x

where x.x.x.x is the IP address of server 'www'.

Maybe some packet filters have been applied somewhere which are blocking UDP
port 53. Talk to your networking people. Also, look for "ICMP
administratively prohibited" packets in the tcpdump. If you see them, the
source IP address will tell you which router is blocking the packets.
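
If you want a test that doesn't depend on dig at all, here is a minimal
Python sketch that hand-builds an NS query and sends it over UDP port 53.
The names build_query and probe, the query ID and the 3-second timeout are
arbitrary choices of mine. How an ICMP error is reported to the program
depends on the operating system, so treat the last branch as a hint only.

```python
import socket
import struct

def build_query(name, qtype=2, query_id=0x1234, recurse=True):
    """Build a minimal DNS query packet (qtype 2 = NS, class IN)."""
    flags = 0x0100 if recurse else 0x0000  # only the RD bit is ever set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def probe(server_ip, name, timeout=3.0):
    """Send one query over UDP and report what came back, if anything."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.connect((server_ip, 53))
        try:
            s.send(build_query(name))
            s.recv(4096)
            return "answered"
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            # On a connected UDP socket an ICMP error may surface here.
            return "error: %s" % exc

if __name__ == "__main__":
    print(probe("80.240.192.7", "kenic.or.ke"))
```

A "timeout" from 'www' while the same script answers on the caching server
would be consistent with filtering or packet loss between the two.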

(3) A problem with your multi-view config on ns1.swiftkenya.com, such that
it accepts queries from itself but not from x.x.x.x.

Looking at log files may help you here. Check whether any config files have
changed recently.

> When i use the dig +trace -x to follow the process this is what i get
> 
> www# dig +trace -x @ns1.swiftkenya.com kenic.or.ke ns
> .                       516104  IN      NS      G.ROOT-SERVERS.NET.
> .                       516104  IN      NS      H.ROOT-SERVERS.NET.
> .                       516104  IN      NS      I.ROOT-SERVERS.NET.
> .                       516104  IN      NS      J.ROOT-SERVERS.NET.
> .                       516104  IN      NS      K.ROOT-SERVERS.NET.
> .                       516104  IN      NS      L.ROOT-SERVERS.NET.
> .                       516104  IN      NS      M.ROOT-SERVERS.NET.
> .                       516104  IN      NS      A.ROOT-SERVERS.NET.
> .                       516104  IN      NS      B.ROOT-SERVERS.NET.
> .                       516104  IN      NS      C.ROOT-SERVERS.NET.
> .                       516104  IN      NS      D.ROOT-SERVERS.NET.
> .                       516104  IN      NS      E.ROOT-SERVERS.NET.
> .                       516104  IN      NS      F.ROOT-SERVERS.NET.
> ;; Received 420 bytes from 198.32.67.19#53(198.32.67.19) in 2 ms
> 
> ;; connection timed out; no servers could be reached

Strange. Does look like DNS has been blocked somehow.

> 16:53:52.905281 www.1679 > ole.domain: [bad udp cksum a373!]  52691+ A? 
> H.ROOT-SERVERS.NET. (36) (ttl 64, id 4094, len 64)

Aha. That's *very* suspicious. UDP packets with bad checksums will be
silently dropped by the receiving host's IP stack, so the query never gets
an answer. This is a very strange and rare occurrence.

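Using tcpdump at both ends of the connection, or on a third machine hanging
off a hub (not switch) in between, you can work out whether the packet left
'www' with a bad checksum or was corrupted in transit. Bear in mind that a
capture taken on the sending host can show a bad checksum if the NIC fills
in checksums in hardware, so the capture at the far end is the one to trust.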
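
For reference, this is roughly how the UDP checksum over IPv4 is worked out:
a ones-complement sum over a pseudo-header (source IP, destination IP,
protocol 17, UDP length) followed by the UDP header and payload. The Python
below is only a sketch for checking bytes copied out of a capture by hand.
The function names are mine, and it assumes you give it the complete UDP
segment, not a truncated one.

```python
import socket
import struct

def ones_complement_sum(data):
    """16-bit ones-complement sum of 'data', zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return total

def pseudo_header(src_ip, dst_ip, udp_length):
    """IPv4 pseudo-header used for the UDP checksum (protocol 17)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))

def udp_checksum(src_ip, dst_ip, segment):
    """Checksum for a UDP segment whose checksum field is currently zero."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    csum = ~ones_complement_sum(data) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF

def checksum_ok(src_ip, dst_ip, segment):
    """Verify a captured UDP segment, checksum field included."""
    if segment[6:8] == b"\x00\x00":
        return True  # sender did not compute a checksum (allowed over IPv4)
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

If a packet fails this check in the capture at the receiving end, the
receiver will discard it, which matches the timeouts you are seeing.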

As for what's corrupting it, I would first suspect the NIC in your host
'www', and the ethernet switch or switch port it is uplinked to. It seems
too consistent to be a cabling problem.

You can plug host 'www' into a different switch port, and swap its NIC.

If you are using a cheap NE2000-clone NIC, then you deserve everything you
get :-)

HTH,

Brian.


