A simple netcat-based DNS server that returns NXDOMAIN on everything

sudo ncat -i1 -k -c "perl -e 'read(STDIN, \$dns_input, 2); \$dns_id = pack \"a2\", \$dns_input; print \"\$dns_id\x81\x83\x00\x00\x00\x00\x00\x00\x00\x00\";'" -u -vvvvvv -l 127.2.3.4 53
  • A DNS request starts with a two-byte ID (chosen at random by the client) that has to be copied into the first two bytes of the response.
  • The DNS flags for an NXDOMAIN response are 0x81 0x83 (QR=1 for a response, RD=1 and RA=1 for recursion desired/available, RCODE=3 for NXDOMAIN)
  • The remaining eight header bytes can be 0. They are the QDCOUNT, ANCOUNT, NSCOUNT and ARCOUNT fields, so this means that our response has no question, answer, authority or additional sections
  • The example above uses nmap-ncat, as found in Red Hat-based distributions, but it can also be installed on Debian-based distributions (apt-get install ncat)
  • -i1 causes connections to be discarded after 1 second of idle time (optional)
  • -k means that we can accept more than one connection
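  • -c (--sh-exec) means that whatever we get from the other side of the connection gets piped to a perl process started via /bin/sh; -e (--exec) runs the command without a shell, so the quoting in this one-liner may not work with it
  • -u means UDP. Leaving it out makes ncat listen on TCP instead, but DNS over TCP prefixes each message with a two-byte length field, so the perl part would read that length instead of the ID and would need adjusting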
  • -vvvvvv means that we can see what’s happening (optional)
  • -l means that we’re listening rather than sending, on 127.2.3.4, port 53
  • read(STDIN, $dns_input, 2) # read exactly two bytes from STDIN
  • $dns_id = pack "a2", $dns_input # the two bytes of arbitrary random data from $dns_input are put into $dns_id
  • print "$dns_id\x81\x83\x00\x00\x00\x00\x00\x00\x00\x00" # sends $dns_id, the NXDOMAIN flags, and the zeros described above to the other side
  • Note: I didn’t really test this beyond the proof-of-concept stage. If anything’s iffy, feel free to let me know.
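To make the byte layout explicit, here is a small Go sketch that builds the same 12-byte reply as the perl one-liner from a raw query. The function names nxdomainReply and rcode are made up for illustration. Like the one-liner, it sends QDCOUNT=0 and no question section; some stub resolvers compare the reply's question with the query, so this may not satisfy every client.

```go
package main

import (
	"errors"
	"fmt"
)

// nxdomainReply mirrors the perl one-liner: it copies the two-byte ID from
// the query and appends flags 0x81 0x83 followed by eight zero bytes
// (QDCOUNT, ANCOUNT, NSCOUNT and ARCOUNT all set to 0).
func nxdomainReply(query []byte) ([]byte, error) {
	if len(query) < 2 {
		return nil, errors.New("query too short to contain an ID")
	}
	reply := make([]byte, 12) // a DNS header is always 12 bytes
	copy(reply[0:2], query[0:2])
	reply[2] = 0x81 // QR=1 (response), opcode=0, AA=0, TC=0, RD=1
	reply[3] = 0x83 // RA=1, Z=0, RCODE=3 (NXDOMAIN)
	// reply[4:12] stays zero: no question, answer, authority or additional records.
	return reply, nil
}

// rcode extracts the 4-bit response code from the second flags byte.
func rcode(msg []byte) int {
	return int(msg[3] & 0x0f)
}

func main() {
	// The first two bytes (0xab 0xcd) stand in for the random query ID.
	query := []byte{0xab, 0xcd, 0x01, 0x00, 0x00, 0x01}
	reply, err := nxdomainReply(query)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", reply)
	fmt.Println("QR set:", reply[2]&0x80 != 0, "RCODE:", rcode(reply))
}
```

Running it prints `ab cd 81 83 00 00 00 00 00 00 00 00` followed by `QR set: true RCODE: 3`.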

Slow DNS in Docker containers without internet connection

If you’re running a Docker container on a Docker network that should _normally_ have internet access, but doesn’t (for whatever reason, see next paragraph for an example), you might find that DNS lookups in that Docker container will be very, very slow. If the DNS lookup “freezes” your program (prevents your program from serving further requests for a short while, etc.), this can be very inconvenient. (For example, and this is how I noticed the problem: if you’re ssh’ing into a container using dynamic port-forwarding to access other containers, every DNS lookup will freeze your ssh connection.)

In my case, I’m running a (prototype? beta?) test environment that is generally not supposed to connect to the internet to avoid “accidentally” doing silly things in production. However, a few sites have to be whitelisted, and whitelisting has to be done on a DNS basis. If you have similar needs, the solution here might help you. Though it’s hacky.

Diving in

I decided to dive in and see if I could change this behavior at all. First, the normal DNS failure time:

$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error

real 0m0.009s
user 0m0.005s
sys 0m0.000s

In a Docker container:

$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error

real 0m20.599s
user 0m0.010s
sys 0m0.010s

When you don’t have internet access, your host system will in most cases lack a ‘default’ route in the output of ‘ip route’. However, your containers don’t know anything about your host system’s routing table, so your container’s network namespace will still have a default route.
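If you want to check for a default route programmatically rather than by eyeballing ip route, one option (my assumption, not something this article relies on) is to read /proc/net/route, where a default route is an entry whose Destination and Mask columns are both 00000000. Since /proc/net reflects the network namespace of the reading process, running this Go sketch under nsenter --net=... should show the container’s table instead of the host’s. The hasDefaultRoute name is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasDefaultRoute scans the contents of /proc/net/route and reports whether
// any entry has destination 0.0.0.0 with mask 0.0.0.0, i.e. a default route.
func hasDefaultRoute(table string) bool {
	sc := bufio.NewScanner(strings.NewReader(table))
	first := true
	for sc.Scan() {
		if first { // skip the header line
			first = false
			continue
		}
		fields := strings.Fields(sc.Text())
		// Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ...
		if len(fields) >= 8 && fields[1] == "00000000" && fields[7] == "00000000" {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/net/route")
	if err != nil {
		fmt.Println("cannot read routing table:", err)
		return
	}
	fmt.Println("default route present:", hasDefaultRoute(string(data)))
}
```

Only IPv4 routes appear in /proc/net/route; IPv6 routes are listed separately in /proc/net/ipv6_route.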

Note: Docker uses the host’s /etc/resolv.conf to figure out where to forward DNS requests to, and if /etc/resolv.conf doesn’t specify any servers, Docker will use 8.8.4.4 and/or 8.8.8.8 by default. You can override the default in /etc/docker/daemon.json. (I believe you have to restart Docker after changing that setting; sending kill -HUP reloads some settings specified in the file, but not this one AFAICT.) Anyway, the examples below all show 8.8.4.4 or 8.8.8.8.

When you do curl tired.com in a Docker container, it will send this DNS request to Docker’s internal DNS resolver, as specified in the container’s /etc/resolv.conf:

nameserver 127.0.0.11

One easy thing we can do is add the following to the container’s /etc/resolv.conf. This speeds things up quite a bit:

options timeout:1 attempts:1

$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error

real 0m2.541s
user 0m0.010s
sys 0m0.014s

(The following implementation details were gathered from stracing the dockerd process, and might change in the future.) This nameserver runs in the dockerd process, but the dockerd process switches to the container’s network namespace before forwarding the request:

# strace -vvvttf -p $dockerd_pid
2124 03:18:52.686952 openat(AT_FDCWD, "/var/run/docker/netns/dd995925297c", O_RDONLY) = 20
2124 03:18:52.686999 setns(20, CLONE_NEWNET) = 0
2124 03:18:52.687058 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 21
2124 03:18:52.687088 setsockopt(21, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
2124 03:18:52.687119 connect(21, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0

Tangent: here’s one way to ascertain which namespace this is:

# nsenter --net=/var/run/docker/netns/dd995925297c ip a
...
inet 172.18.0.13/16 brd 172.18.255.255 scope global eth1
...

Then you can just issue docker inspect or docker network inspect commands to figure out which container this IP belongs to.

And back: if you do the same UDP connection with the same parameters on the host system, DNS requests fail straight away, because the kernel is sensible enough to notice that there is no route to this host:

08:06:25.289709 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
08:06:25.289858 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
08:06:25.289981 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 ENETUNREACH (Network is unreachable)

So one thing I thought we could do is to force this connect call to fail. I thought one way might be to get Docker to use TCP to connect to DNS servers. This can be accomplished in /etc/docker/daemon.json like this:

{
"dns-opts": ["use-vc"]
}

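Unfortunately, that didn’t help because of the SOCK_NONBLOCK flag: a non-blocking TCP connect() returns EINPROGRESS straight away instead of reporting an error, so any failure only surfaces later. The strace now looks like this: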

1580 18:55:02.447528 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 34
1580 18:55:02.447571 setsockopt(34, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
1580 18:55:02.447612 connect(34, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 EINPROGRESS (Operation now in progress)

(Note: adding attempts:1 timeout:1 to dns-opt didn’t seem to have an effect.)

Furthermore, removing the default route doesn’t help either. curl still blocks for ~2.526 seconds. Docker keeps retrying even if it gets ENETUNREACH:

# nsenter --net=/var/run/docker/netns/dd995925297c ip route del default
19:39:44.614836 connect(21, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 ENETUNREACH (Network is unreachable)

So up to now we have avoided reading Docker’s source code (it’s written in a weird, foreign language), but at this point we have no choice but to take a look. Perhaps my assumptions were wrong and Docker’s DNS actually notices there is a problem, but just retries no matter what. The first thing we do is enable debug logs, in the hope of getting some messages that are output close to the code we have to look at. We just need to set the following in /etc/docker/daemon.json, reload dockerd (kill -HUP), and run journalctl -xeu docker:

{
"debug": true
}

Here are some logs we got after we removed our default route:

20-04-05T19:59:29.285616784+09:00" level=warning msg="[resolver] connect failed: dial udp 8.8.8.8:53: connect: network is unreachable"
20-04-05T19:59:29.285646884+09:00" level=warning msg="[resolver] connect failed: dial udp 8.8.4.4:53: connect: network is unreachable"

And here are some logs we got with the default route still existing:

20-04-05T19:30:14.118994138+09:00" level=debug msg="[resolver] read from DNS server failed, read udp 172.19.0.2:40693->8.8.8.8:53: i/o timeout"
20-04-05T19:30:17.115909962+09:00" level=debug msg="[resolver] read from DNS server failed, read udp 172.19.0.2:55315->8.8.4.4:53: i/o timeout"

Searching the docker code takes us straight to the ServeDNS function in docker-ce/components/engine/vendor/github.com/docker/libnetwork/resolver.go. (I did a fresh clone today; the newest commit in my git log is 92768b32964e3037e520ab8e74fe190c39f4c83d. The code may look different in your version.)

So we’ve got a long for loop on line 432 that has some breaks and some continues and uses a couple of hard-coded constants here and there:

maxExtDNS    = 3 //max number of external servers to try
extIOTimeout = 4 * time.Second
...
for i := 0; i < maxExtDNS; i++ {

One thing we could do is look for a workaround, i.e. anything that will make this function terminate earlier. Unfortunately, I didn’t have any success on that front. So it looks like we currently have no choice but to set up our own DNS server just to prevent Docker from freezing our ssh connections (or whatever software is being frozen in your case).

Setting up our own DNS server

Unfortunately, we can’t just run dnsmasq on the host and make it listen on 127.0.0.1. 127.0.0.1 obviously means something different inside a container than it does on the host. We generally can’t use any of the other IPs the host machine may have either — Docker most likely adds iptables rules that block containers from communicating with those IPs. (Though we could manually delete these iptables rules.)

So one way around this problem is to run a Docker container that runs dnsmasq. Unfortunately, this can get a bit messy — we need to use a static IP on that container, and we need to configure our host to use that IP in its /etc/resolv.conf.

One kind of nice — though very hacky — solution I came up with is to add a network and two containers that pretend to be 8.8.4.4/8.8.8.8 (or one container and two networks).

### fake-dns/Dockerfile
FROM centos
RUN yum -y install dnsmasq
# -k keeps dnsmasq in the foreground so the container doesn't exit right away
ENTRYPOINT ["/usr/sbin/dnsmasq", "-k", "--server=/qiqitori.com/1.1.1.1", "-R"]
$ docker build --tag fake-dns fake-dns
$ docker network create fake-dns-net --subnet 8.8.0.0/16
$ docker run -d --network fake-dns-net -it --ip 8.8.4.4 fake-dns
$ docker run -d --network fake-dns-net -it --ip 8.8.8.8 fake-dns

$ docker network connect fake-dns-net existing-container

Connecting the already-running containers whose DNS you need to fix to fake-dns-net makes their DNS requests get immediate NXDOMAIN responses from our fake 8.8.4.4 and 8.8.8.8 servers, which takes care of any freezing issues, while DNS requests for other container names still work as normal. (In this example, qiqitori.com is whitelisted and will be forwarded to Cloudflare’s 1.1.1.1 DNS server. The -R flag on the dnsmasq command means that dnsmasq will ignore any servers listed in /etc/resolv.conf.)
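Since the hack only works if every fallback address lives inside fake-dns-net, a quick way to double-check (for example, if a future Docker version changes its fallback list) is this Go sketch using net/netip, which needs Go 1.18 or later. The uncovered helper is hypothetical, and each covered address still needs its own container started with --ip.

```go
package main

import (
	"fmt"
	"net/netip"
)

// uncovered returns the fallback servers that the fake network's subnet does
// not contain; those could not be impersonated by a container on that network.
func uncovered(subnet string, servers []string) ([]string, error) {
	prefix, err := netip.ParsePrefix(subnet)
	if err != nil {
		return nil, err
	}
	var missing []string
	for _, s := range servers {
		addr, err := netip.ParseAddr(s)
		if err != nil {
			return nil, err
		}
		if !prefix.Contains(addr) {
			missing = append(missing, s)
		}
	}
	return missing, nil
}

func main() {
	// The fallback list observed in this article; future Docker versions may differ.
	fallbacks := []string{"8.8.4.4", "8.8.8.8"}
	missing, err := uncovered("8.8.0.0/16", fallbacks)
	if err != nil {
		panic(err)
	}
	fmt.Println("not covered by fake-dns-net:", missing)
}
```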

Conclusions

In my opinion, Docker’s automatic fallback to 8.8.4.4/8.8.8.8 isn’t the greatest Docker feature to say the least. Perhaps there should be a way to tell Docker not to fall back, and to send NXDOMAIN when it can’t answer anything by itself.

Unfortunately, this “hack” is likely not that future-proof. Future versions of Docker could, for example, expand or change the list of fallback servers, and this hack would have to be adapted to Docker’s changes. However, explicitly specifying { "dns": ["8.8.4.4", "8.8.8.8"] } in /etc/docker/daemon.json could perhaps take care of this problem. (I haven’t actually tested that.)

Bash: how to put command output/file contents on the command line

In certain situations, you may want to put a command, e.g. one you saved in a file, on your command line so you can edit it before executing it. As I didn’t find an answer straight away, here’s a quick-and-dirty way to do it that executes a command when you press a keyboard shortcut:

bind -x '"\C-g":"READLINE_LINE=$(cat dnsmasq_command)"'

This will put the contents of the file named ‘dnsmasq_command’ on your command line when you press Control-G.

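And while we’re at it, though it has nothing to do with this article, here’s a binding that puts a time stamp (like 20200408022000, no, not a palindrome) on your command line at the cursor position. (It reuses Control-G, so it replaces the binding above; pick a different key if you want both.)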

bind -x '"\C-g":"READLINE_LINE=${READLINE_LINE:0:$READLINE_POINT}$(date +%Y%m%d%H%M00)${READLINE_LINE:$READLINE_POINT}; READLINE_POINT=$((READLINE_POINT+14))"'