Frans Pop: The case of the self-perpetuating DNS-errors
February 28th, 2009
Ingredients:
- some lame DNS server
- logcheck
- spamassassin
For the last couple of days I've been plagued by DNS errors that kept showing up in the logcheck mails for my home server, which I was busy migrating from one box to another while also upgrading from etch/i386 to lenny/amd64. So there was plenty going on to confuse the issue.
I kept getting the following messages every hour (anonymized):
named: connection refused resolving 'somedomain.org/NS/IN': xxx.yyy.zzz.nnn#53
named: connection refused resolving 'somedomain.org/NS/IN': xxx.yyy.zzz.mmm#53
named: connection refused resolving 'ns1.somedomain.org/AAAA/IN': xxx.yyy.zzz.mmm#53
named: connection refused resolving 'ns2.somedomain.org/AAAA/IN': xxx.yyy.zzz.mmm#53
named: connection refused resolving 'ns1.somedomain.org/AAAA/IN': xxx.yyy.zzz.nnn#53
named: connection refused resolving 'ns2.somedomain.org/AAAA/IN': xxx.yyy.zzz.nnn#53
The times were fairly regular: one batch just before the hour, but most of them 2 minutes after. I fetch mail at around that time, but also at other times, so mail fetching was a possible but unlikely cause. The "2 minutes after" was the first real clue: some cron job, maybe? After I disabled logcheck, the messages no longer appeared in the log. When I enabled it again, they were back.
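One way to make a timing clue like this stand out is to tally the errors by minute of the hour. The sketch below is only an illustration under stated assumptions: it assumes standard syslog timestamps, the sample lines are invented (the excerpt above is anonymized and carries no timestamps), and the helper name `minute_histogram` is hypothetical.

```python
import re

# Invented sample lines (hypothetical): standard syslog prefix
# "Mon DD HH:MM:SS host prog[pid]: message", documentation IP addresses.
SAMPLE_LOG = """\
Feb 27 13:58:41 server named[2345]: connection refused resolving 'somedomain.org/NS/IN': 192.0.2.1#53
Feb 27 14:02:07 server named[2345]: connection refused resolving 'somedomain.org/NS/IN': 192.0.2.1#53
Feb 27 15:02:09 server named[2345]: connection refused resolving 'ns1.somedomain.org/AAAA/IN': 192.0.2.2#53
Feb 27 16:02:05 server named[2345]: connection refused resolving 'somedomain.org/NS/IN': 192.0.2.1#53
"""

# Capture the minute field of lines logged by named with this error text.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:(\d{2}):\d{2}\s+\S+\s+named\[\d+\]: "
    r"connection refused resolving"
)

def minute_histogram(log_text):
    """Count named 'connection refused' errors per minute of the hour."""
    counts = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            minute = int(match.group(1))
            counts[minute] = counts.get(minute, 0) + 1
    return counts

for minute, n in sorted(minute_histogram(SAMPLE_LOG).items()):
    print(f":{minute:02d}  {n}")
```

A spike at the same minute every hour, rather than a spread across the hour, is what points towards a cron job instead of something driven by mail arriving.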
Additional confusion came from the fact that the domain had "debian" in its name, even though it was something obscure. So why would logcheck cause a lookup for that domain? This confused me enough that I wasted some time hunting for some weird (default) configuration problem in some package.
Enter spamassassin. Apparently it was parsing the message body, recognized "somedomain.org" as a host name, and proceeded to do a DNS lookup as a validity check.
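To see why a bare domain in a log line is enough to trigger lookups, here is a deliberately simplified sketch of that kind of extraction: it pulls anything that looks like a host name out of a mail body. The regex and `hostnames_in_body` are hypothetical and only recognize a few example TLDs; SpamAssassin's real URI parsing is far more thorough, and the lookups themselves are presumably done by its URIDNSBL plugin, which can query a domain's NS records (consistent with the NS queries in the log).

```python
import re

# Hypothetical, simplified pattern: one or more dotted labels followed by
# one of a few example TLDs. Real URI detection is much more involved.
HOST_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+(?:org|com|net))\b", re.IGNORECASE)

def hostnames_in_body(body):
    """Return distinct host names found in a mail body, in order of appearance."""
    seen = []
    for match in HOST_RE.finditer(body):
        name = match.group(1).lower()
        if name not in seen:
            seen.append(name)
    return seen

# Two lines of a logcheck report, as in the excerpt above.
logcheck_mail = (
    "named: connection refused resolving 'somedomain.org/NS/IN': xxx.yyy.zzz.nnn#53\n"
    "named: connection refused resolving 'ns1.somedomain.org/AAAA/IN': xxx.yyy.zzz.mmm#53\n"
)
print(hostnames_in_body(logcheck_mail))  # ['somedomain.org', 'ns1.somedomain.org']
```

Each name returned here would, in a real filter, become a DNS query; if the domain's name servers refuse connections, every query turns into a new `named` error in the log.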
So we have the following loop, started off by something causing an initial DNS lookup for the domain, which fails and gets logged:
- logcheck reports the failure during its next check
- spamassassin processes the logcheck mail, spots the domain name and does a new set of lookups, which fail and get logged
- logcheck reports the failures during its next check
- ...
Duh.
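The loop can be reduced to a toy model: each hourly logcheck run reports whatever errors were logged since its previous run, and scanning that report with spamassassin produces fresh errors for the next run. The function `simulate` and its parameters are made up for illustration; the only point is that the errors keep coming as long as the reports are scanned, and stop after a single report once they are not.

```python
def simulate(hours, spamassassin_scans_logcheck_mail):
    """Toy model of the feedback loop (hypothetical names throughout).

    Returns how many hourly logcheck reports contained the DNS errors.
    """
    pending_errors = True      # the initial failed lookup that started it all
    reports_with_errors = 0
    for _ in range(hours):
        if not pending_errors:
            continue           # nothing new in the log, so logcheck stays quiet
        # logcheck mails the errors logged since its last run...
        reports_with_errors += 1
        pending_errors = False
        # ...and if spamassassin scans that mail, it looks the domain up again;
        # those lookups fail and leave fresh errors for the next run.
        if spamassassin_scans_logcheck_mail:
            pending_errors = True
    return reports_with_errors

print(simulate(24, True))   # 24: a report every hour
print(simulate(24, False))  # 1: only the original failure is reported
```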
I remember struggling with what was probably the same problem a couple of years ago, but then it was a lot more severe: masses of repeating DNS errors for obscure domains. At the time I failed to get to the bottom of it and ended up just ignoring the errors by adding the following to my bind9 configuration:
logging {
    category lame-servers { null; };
};
Anyway, now I just no longer pass logcheck mails through spamassassin. (Although filtering out these DNS errors in bind9 can be perfectly valid.)