If you’re running a Docker container on a Docker network that should _normally_ have internet access, but doesn’t (for whatever reason; see the next paragraph for an example), you might find that DNS lookups in that container are very, very slow. If a DNS lookup “freezes” your program (e.g. prevents it from serving further requests for a while), this can be very inconvenient. (For example, and this is how I noticed the problem: if you’re ssh’ing into a container using dynamic port-forwarding to access other containers, every DNS lookup will freeze your ssh connection.)
In my case, I’m running a (prototype? beta?) test environment that is generally not supposed to connect to the internet to avoid “accidentally” doing silly things in production. However, a few sites have to be whitelisted, and whitelisting has to be done on a DNS basis. If you have similar needs, the solution here might help you. Though it’s hacky.
Diving in
I decided to dive in and see if I can change this behavior at all. Normal DNS failure time:
$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error
real 0m0.009s
user 0m0.005s
sys 0m0.000s
In a Docker container:
$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error
real 0m20.599s
user 0m0.010s
sys 0m0.010s
When you don’t have internet access, your host system will in most cases lack a ‘default’ route in the output of ‘ip route’. However, your containers don’t know anything about your host system’s routing tables, and your container’s network namespace will still have its default route.
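A quick way to see the difference is to compare the two routing tables directly. A minimal sketch (the netns path is the one that also shows up in the strace and nsenter examples further down; yours will differ):
$ ip route | grep '^default' || echo "no default route on the host"
# nsenter --net=/var/run/docker/netns/dd995925297c ip route | grep '^default'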
Note: Docker uses the host’s /etc/resolv.conf to figure out where to forward DNS requests to, and if /etc/resolv.conf doesn’t specify any servers, Docker will use 8.8.4.4 and/or 8.8.8.8 by default. You can override this default in /etc/docker/daemon.json. (I believe you have to restart Docker after changing the file; sending a kill -HUP will reload some of the settings specified in the file, but not this one, AFAICT.) Anyway, the examples below will all show 8.8.4.4 or 8.8.8.8.
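(To override the fallback, something along these lines in /etc/docker/daemon.json should do; 192.0.2.53 is just a placeholder for your own resolver:)
{
"dns": ["192.0.2.53"]
}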
When you do curl tired.com in a Docker container, it will send this DNS request to Docker’s internal DNS resolver, as specified in the container’s /etc/resolv.conf:
nameserver 127.0.0.11
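You can check this from the host without attaching to the container (the container name is just an example):
$ docker exec some-container cat /etc/resolv.conf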
One easy thing we can do is add the following to the container’s /etc/resolv.conf. This speeds things up quite a bit:
options timeout:1 attempts:1
$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error
real 0m2.541s
user 0m0.010s
sys 0m0.014s
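If you’d rather not edit /etc/resolv.conf inside every running container, the same resolver options can be set when a container is created; a sketch using a throwaway container:
$ docker run --rm -it --dns-opt timeout:1 --dns-opt attempts:1 centos bash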
(The following implementation details were gathered from stracing the dockerd process and might change in the future.) This nameserver runs inside the dockerd process, but dockerd switches to the container’s network namespace before forwarding the request:
# strace -vvvttf -p $dockerd_pid
2124 03:18:52.686952 openat(AT_FDCWD, "/var/run/docker/netns/dd995925297c", O_RDONLY) = 20
2124 03:18:52.686999 setns(20, CLONE_NEWNET) = 0
2124 03:18:52.687058 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 21
2124 03:18:52.687088 setsockopt(21, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
2124 03:18:52.687119 connect(21, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
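(If you want to reproduce a trace like this yourself, something along these lines should work; the syscall filter is optional and just cuts down the noise:)
# dockerd_pid=$(pidof dockerd)
# strace -vvvttf -e trace=openat,setns,socket,setsockopt,connect -p "$dockerd_pid"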
Tangent: here’s one way to ascertain which namespace this is:
# nsenter --net=/var/run/docker/netns/dd995925297c ip a
...
inet 172.18.0.13/16 brd 172.18.255.255 scope global eth1
...
Then you can just issue docker inspect or docker network inspect commands to figure out which container this IP belongs to.
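For example, something like this prints every running container’s name next to its IPs (worth double-checking the format string against your Docker version):
$ docker ps -q | xargs docker inspect --format '{{.Name}}: {{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}'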
And back: if you make the same UDP connect call with the same parameters on the host system, DNS requests fail straight away, because the kernel is sensible enough to notice that there is no route to this host:
08:06:25.289709 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
08:06:25.289858 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
08:06:25.289981 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 ENETUNREACH (Network is unreachable)
So one thing I thought we could do is force this connect call to fail, for example by getting Docker to use TCP to connect to the DNS servers. This can be accomplished in /etc/docker/daemon.json like this:
{
"dns-opt": "use-vc"
}
Unfortunately, that didn’t help, because the socket is opened with SOCK_NONBLOCK and the connect simply returns EINPROGRESS instead of failing. The strace now looks like this:
1580 18:55:02.447528 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 34
1580 18:55:02.447571 setsockopt(34, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
1580 18:55:02.447612 connect(34, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 EINPROGRESS (Operation now in progress)
(Note: adding attempts:1 timeout:1 to dns-opt didn’t seem to have an effect.)
Furthermore, removing the default route doesn’t help either. curl still blocks for ~2.526 seconds. Docker keeps retrying even if it gets ENETUNREACH:
# nsenter --net=/var/run/docker/netns/dd995925297c ip route del default
19:39:44.614836 connect(21, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 ENETUNREACH (Network is unreachable)
So up to now we have avoided reading Docker’s source code (it’s written in a weird, foreign language), but at this point we have no choice but to take a look. Perhaps my assumptions were wrong and Docker’s DNS resolver does notice there is a problem, but just retries no matter what. So the first thing we do is enable debug logging, in the hope of getting some messages that are emitted close to the code we need to look at. We just need to set the following in /etc/docker/daemon.json, reload dockerd (kill -HUP), and run journalctl -xeu docker:
{
"debug": true
}
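In other words, something like this (assuming dockerd runs under systemd and the unit is called docker):
# kill -HUP "$(pidof dockerd)"
# journalctl -xeu docker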
Here are some logs we got after we removed our default route:
time="2020-04-05T19:59:29.285616784+09:00" level=warning msg="[resolver] connect failed: dial udp 8.8.8.8:53: connect: network is unreachable"
time="2020-04-05T19:59:29.285646884+09:00" level=warning msg="[resolver] connect failed: dial udp 8.8.4.4:53: connect: network is unreachable"
And here are some logs we got with the default route still existing:
time="2020-04-05T19:30:14.118994138+09:00" level=debug msg="[resolver] read from DNS server failed, read udp 172.19.0.2:40693->8.8.8.8:53: i/o timeout"
time="2020-04-05T19:30:17.115909962+09:00" level=debug msg="[resolver] read from DNS server failed, read udp 172.19.0.2:55315->8.8.4.4:53: i/o timeout"
Searching the docker code takes us straight to the ServeDNS function in docker-ce/components/engine/vendor/github.com/docker/libnetwork/resolver.go. (I did a fresh clone today; the newest commit in my git log is 92768b32964e3037e520ab8e74fe190c39f4c83d. The code may look different in your version.)
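If you want to follow along, a rough sketch of how to get there (the repository layout may well differ in other versions):
$ git clone https://github.com/docker/docker-ce
$ grep -rn ServeDNS docker-ce/components/engine/vendor/github.com/docker/libnetwork/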
So we’ve got a long for loop on line 432 that has some breaks and some continues, and uses a couple of hard-coded constants here and there:
maxExtDNS = 3 //max number of external servers to try
extIOTimeout = 4 * time.Second
...
for i := 0; i < maxExtDNS; i++ {
One thing we could do is look for a workaround, i.e. anything that will make this function terminate earlier. Unfortunately, I didn’t have any success on that front. So it looks like we currently have no choice but to set up our own DNS server just to prevent Docker from freezing our ssh connections (or whatever software is being frozen in your case).
Setting up our own DNS server
Unfortunately, we can’t just run dnsmasq on the host and make it listen on 127.0.0.1. 127.0.0.1 obviously means something different inside a container than it does on the host. We generally can’t use any of the other IPs the host machine may have either — Docker most likely adds iptables rules that block containers from communicating with those IPs. (Though we could manually delete these iptables rules.)
So one way around this problem is to run a Docker container that runs dnsmasq. Unfortunately, this can get a bit messy — we need to use a static IP on that container, and we need to configure our host to use that IP in its /etc/resolv.conf.
One kind of nice — though very hacky — solution I came up with is to add a network and two containers that pretend to be 8.8.4.4/8.8.8.8 (or one container and two networks).
### fake-dns/Dockerfile
FROM centos
RUN yum -y install dnsmasq
# keep dnsmasq in the foreground so the container doesn't exit right away
ENTRYPOINT /usr/sbin/dnsmasq --keep-in-foreground --server=/qiqitori.com/1.1.1.1 -R
$ docker build --tag fake-dns fake-dns
$ docker network create fake-dns-net --subnet 8.8.0.0/16
$ docker run -d --network fake-dns-net -it --ip 8.8.4.4 fake-dns
$ docker run -d --network fake-dns-net -it --ip 8.8.8.8 fake-dns
$ docker network connect fake-dns-net existing-container
Connecting the already-running containers whose DNS you need to fix to fake-dns-net makes their DNS requests get immediate NXDOMAIN responses from our fake 8.8.4.4 and 8.8.8.8 servers, which takes care of any freezing issues, while lookups of other container names still work as normal. (In this example, qiqitori.com is whitelisted and will be forwarded to Cloudflare’s 1.1.1.1 DNS server. The -R flag means that dnsmasq will ignore any servers listed in /etc/resolv.conf.)
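A quick sanity check (assuming curl is installed inside the container; the timing includes a little docker exec overhead, but the difference between ~20 seconds and near-instant is still obvious):
$ time docker exec existing-container curl tired.com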
Conclusions
In my opinion, Docker’s automatic fallback to 8.8.4.4/8.8.8.8 isn’t the greatest Docker feature, to say the least. Perhaps there should be a way to tell Docker not to fall back at all, and to send NXDOMAIN when it can’t answer a query by itself.
Unfortunately, this “hack” is likely not that future-proof. Future versions of Docker could, for example, expand or change the list of fallback servers, and the hack would have to be adapted accordingly. However, explicitly pinning the fallback servers via the "dns" key in /etc/docker/daemon.json could perhaps take care of this problem. (I haven’t actually tested that.)
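A sketch of what that untested daemon.json entry might look like:
{
"dns": ["8.8.4.4", "8.8.8.8"]
}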