Useful developer tool console one-liners/bookmarklets

Many websites lack useful features, many websites go to great lengths to prevent you from downloading images, etc.

When things get too annoying I sometimes open the developer tools and write a short piece of JavaScript to help me out. Okay, but isn’t it annoying to open developer tools every time? Yes! So right-click your bookmarks toolbar, press the button to add a bookmark, give it an appropriate title, and in the URL, put “javascript:” followed by the JavaScript code. (Most of the following examples already have javascript: prepended to the one-liner.)

Note that you have to encapsulate most of these snippets in an anonymous function, i.e. (function() { … })(). Otherwise your browser might open a page containing whatever value your code snippet returns.

(It is my belief that all of the following code snippets aren’t copyrightable with a clear conscience, as they may constitute the most obvious way to do something. The code snippets can therefore be regarded as public domain, or alternatively, at your option, as published under the WTFPL.)

Note that these snippets are only tested on Firefox. (They should work in Chrome too, though.)

Hacker News: scroll to next top-level comment

Q: How likely is this to stop working if the site gets re-designed?
A: Would probably stop working.

This is a one-liner to jump to the first/second/third/… top-level comment. (Because often the first few threads get very monotonous after a while?) If you don’t see anything happening, maybe there is no other top-level comment on the current page.

document.querySelectorAll("img[src='s.gif'][width='0']")[0].scrollIntoView(true)
document.querySelectorAll("img[src='s.gif'][width='0']")[1].scrollIntoView(true)
document.querySelectorAll("img[src='s.gif'][width='0']")[2].scrollIntoView(true)

javascript:document.querySelectorAll("img[src='s.gif'][width='0']")[1].scrollIntoView(true) # Bookmarklet version. Basically just javascript: added at the beginning.

The “true” parameter in scrollIntoView(true) means that the comment will appear at the top (if possible). Giving false will cause the comment to be scrolled to to appear at the bottom of your screen. Not that the scrolling isn’t perfect; the comment will be half-visible.

It would be useful to be able to remember where we last jumped to, and then jump to that + 1. We can add a global variable for that, and to be reasonably sure it doesn’t clash with an existing variable we’ll give it a name like ‘pkqcbcnll’. If the variable isn’t defined yet, we’ll get an error, so to avoid errors, we’ll use typeof to determine if we need to define the variable or not.

javascript:if (typeof pkqcbcnll == 'undefined') { pkqcbcnll = 0 }; document.querySelectorAll("img[src='s.gif'][width='0']")[pkqcbcnll++].scrollIntoView(true);

Wikipedia (and any other site): remove images on page

Q: How likely is this to stop working if the site gets re-designed?
A: Unlikely to stop working, ever.

Sometimes Wikipedia images aren’t so nice to look at, here’s a quick one-liner to get rid of them:

document.querySelectorAll("img").forEach(function(i) { i.remove() });

Instagram (and e.g. Amazon and other sites): open main image in new tab

Q: How likely is this to stop working if the site gets re-designed?
A: Not too likely to stop working.

When you search for images or look at an author’s images in the grid layout, right-click one of the images and press “open in new tab”, then use the following script to open just the image in a new tab. I.e., it works on pages like this: https://www.instagram.com/p/CSSx_E3pmId/ (random cat picture, no endorsement intended.)

Or on Amazon product pages, you’ll often get a large image overlaid on the page when you hover your cursor over a thumbnail. When you execute this one-liner in that state, you’ll get that image in a new tab for easy saving/copying/sharing. E.g., on this page: https://www.amazon.co.jp/dp/B084H7ZYTT you will get this image in a new tab if you hover over that thumbnail. (Random product on Amazon, no endorsement intended.)

This script works by going through all image tags on the page and finding the source URL of the largest image. As of this writing, I believe this is the highest resolution you can get without guessing keys or reverse-engineering.

javascript:var imgs=document.getElementsByTagName("img");var height;var max_height=0;var i_for_max_height=0;for(var i=0;i<imgs.length;i++){height=parseInt(getComputedStyle(imgs[i]).getPropertyValue("height"));
if(height>max_height){max_height=height;i_for_max_height=i;}}open(imgs[i_for_max_height].getAttribute("src"));

xkcd: Add title text underneath image (useful for mouse-less browsing)

Q: How likely is this to stop working if the site gets re-designed?
A: May or may not survive site designs.

Useful for mouseless comic reading. There’s just one <img> with title text, so we’ll take the super-simple approach. Alternatively we could for example use #comic > img or we could select the largest image on the page as above.

javascript:document.querySelectorAll("img[title]").forEach(function(img) { document.getElementById("comic").innerHTML += "<br />
" + img.getAttribute("title"); })

Instagram: Remove login screen that appears after a while in the search results

To do this, we have to get rid of something layered on top of the page, and then remove the “overflow: hidden;” style attribute on the body tag.

A lot of pages put “overflow: hidden” on the body tags when displaying nag screens, so maybe it’s useful to have this as a separate bookmarklet. Anyway, in the following example we do both at once.

javascript:document.querySelector("div[role=presentation]").remove() ; document.body.removeAttribute("style");

Sites that block copy/paste on input elements

This snippet will present a modal prompt without any restrictions, and put that into the text input element. This example doesn’t work in iframes, and doesn’t check that we’re actually on an input element:

javascript:(function(){ document.activeElement.value = prompt("Please enter value")})()

The following snippet also handles iframes (e.g. on https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_input_test) and should even handle nested iframes (untested). The problem is that the <iframe> element becomes the activeElement when an input element is in an <iframe>. So we’ll loop until we find an activeElement that doesn’t have a contentDocument object. And then blindly assume that we’re on an input element:

javascript:(function(){var el = document.activeElement; while (el.contentDocument) { el = el.contentDocument; } el.activeElement.value = prompt("Please enter value")})()

Removing <iframe>s to get rid of a lot of ads

Many ads on the internet use <iframe> tags. Getting rid of all of these at once may clean up pages quite a bit — but some pages actually use <iframe>s for legitimate purposes.

javascript:(function(){document.querySelectorAll("iframe").forEach(function(i) { i.remove() });})();

Amazon Music: Get list of songs in your library

I.e., songs for which you have clicked the “+” icon. Maybe you want to get a list of your songs so you can move to a different service, or maybe you just want the list. The code presented here isn’t too likely to survive a redesign, but could probably be adjusted if necessary. Some JavaScript knowledge might come in handy if you want to get this script to work.

Amazon Music makes this process rather difficult because the UI unloads and reloads elements dynamically when it thinks you have too many on the page at once.

We are talking about this page (using the default, alphabetically ordered setting), BTW:

Ideally, you would just press Ctrl+A and paste the result into an editor, or select all table cells using, e.g., Ctrl+click.

However, you’ll only get around 15 items in that case. (The previous design let you copy everything at once and was superior in other respects too, IMO.)

Anyway, we’ll use the MutationObserver to get the list. Using the MutationObserver, we’ll get notified when elements are added to the page. Then we just need to scroll all the way down and output the collected list. We may get duplicates, depending on how the page is implemented, but we’ll ignore those for now — we may have duplicates anyway if we have the same song added more than once. So I recommend you get rid of the duplicates yourself, by using e.g. sort/uniq or by loading the list into Excel or LibreOffice Calc or Google Sheets, or whatever you want.

On Amazon Music’s page, the <music-image-rows> elements that are dynamically added to the page contain four <div> elements classed col1, col2, col3, and col4. (This could of course change any time.) These <div>s contain the song name, artist name, album name, and song length, respectively, which is all we want (well, all I want). We’ll just use querySelectorAll on the newly added element to select .col1, .col2, .col3, and .col4 and output the textContent to the console. Occasionally, a parent element pops up that contains all the previous .col* <div> elements. We’ll ignore that by only evaluating selections that have exactly four elements.

  1. Scroll to top of page
  2. Execute code (e.g., by opening console and pasting in the code)
  3. Slowly scroll to the end of the page
  4. Execute observer.disconnect() in the console (otherwise text will keep popping up if you scroll)
  5. Select and copy all text in the console (or use right-click → Export Visible Messages To), paste in an editor or (e.g.) Excel. There are some Find&Replace regular expressions below that you could use to post-process the output.
  6. Note that the first few entries in the list (I think sometimes it’s just the first one, sometimes it’s the first four) are never going to be newly added to the document, so you will have to copy them into your text file yourself.

The code’s output is rather spreadsheet-friendly, tab-deliminated and one line per entry. You can just paste that into your favorite spreadsheet software.

The code cannot really be called a one-liner at this point, but feel free to re-format it and package it as a bookmarklet, if you want.

observer = new MutationObserver(function(mutations) {
mutations.forEach(function(mutation) {
if (mutation.type === "childList") {
if (mutation.target && mutation.addedNodes.length) {
var string = "";
selector = mutation.target.querySelectorAll(".col1, .col2, .col3, .col4");
if (selector.length == 4) {
selector.forEach(function(e) { if (e.textContent) string += e.textContent + "\t" });
string += "----------\n"
console.log(string);
}
}
}
});
});
observer.observe(document, { childList: true, subtree: true });

Post-processing regular expressions (the “debugger eval code” one may be Firefox-specific):

Find: [0-9 ]*debugger eval code.*\n
Replace: (leave empty)

Find: \n----------\n
Replace: \n
(Do this until there are no more instances of the "Find" expression)

Find: \t----------
Replace: (leave empty)

There are a lot of duplicates. You can get rid of them either by writing your list to a file and then executing, e.g.: sort -n amazon_music.txt | uniq > amazon_music_uniq.txt
Or you can get Excel to do the work for you.
Make sure that you have the correct number of lines. (The Amazon Music UI enumerates all entries in the list. You would want the same number of lines in your text file.)

A simple netcat-based DNS server that returns NXDOMAIN on everything

sudo ncat -i1 -k -c "perl -e 'read(STDIN, \$dns_input, 2); \$dns_id = pack \"a2\", \$dns_input; print \"\$dns_id\x81\x83\x00\x00\x00\x00\x00\x00\x00\x00\";'" -u -vvvvvv -l 127.2.3.4 53
  • A DNS request contains two random bytes at the beginning that have to appear in the first two bytes in the response.
  • The DNS flags for an NXDOMAIN response are 0x81 0x83
  • The rest of the bytes can be 0, which mostly means that we have zero other sections in our response
  • The below example uses nmap-ncat, as found in Red Hat-based distributions, but can also be installed on Debian-based distributions (apt-get install ncat)
  • -i1 causes connections to be discarded after 1 second of idle time (optional)
  • -k means that we can accept more than one connection
  • -c means that whatever we get from the other side of the connection gets piped to a perl process running in a shell process (maybe -e is the same in this case)
  • -u means UDP (leaving this away should work if you do DNS over TCP)
  • -vvvvvv means that we can see what’s happening (optional)
  • -l means that we’re listening rather than sending, on 127.2.3.4, port 53
  • read(STDIN, $dns_input, 2) # read exactly two bytes from STDIN
  • $dns_id = pack “a2”, $dns_input # two bytes of arbitrary random data from $dns_input will be put into $dns_id
  • print “$dns_id\x81\x83\x00\x00\x00\x00\x00\x00\x00\x00” # sends $dns_id, NXDOMAIN, and zeros as described above to the other side
  • Note: I didn’t really test this beyond the proof-of-concept stage. If anything’s iffy, feel free to let me know.

Slow DNS in Docker containers without internet connection

If you’re running a Docker container on a Docker network that should _normally_ have internet access, but doesn’t (for whatever reason, see next paragraph for an example), you might find that DNS lookups in that Docker container will be very, very slow. If the DNS lookup “freezes” your program (prevents your program from serving further requests for a short while, etc.), this can be very inconvenient. (For example, and this is how I noticed the problem: if you’re ssh’ing into a container using dynamic port-forwarding to access other containers, every DNS lookup will freeze your ssh connection.)

In my case, I’m running a (prototype? beta?) test environment that is generally not supposed to connect to the internet to avoid “accidentally” doing silly things in production. However, a few sites have to be whitelisted, and whitelisting has to be done on a DNS basis. If you have similar needs, the solution here might help you. Though it’s hacky.

Diving in

I decided to dive in and see if I can change this behavior at all. Normal DNS failure time:

$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error

real 0m0.009s
user 0m0.005s
sys 0m0.000s

In a Docker container:

$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error

real 0m20.599s
user 0m0.010s
sys 0m0.010s

When you don’t have internet access, your host system will in most cases lack a ‘default’ root in the output of ‘ip route’. However, your containers don’t know anything about your host system’s routing tables and your container’s namespace will still have the default route.

Note: Docker uses the host’s /etc/resolv.conf to figure out where to forward DNS requests to, and if /etc/resolv.conf doesn’t specify any servers, Docker will use 8.8.4.4 and/or 8.8.8.8 by default. You can override the default in /etc/docker/daemon.json. (I believe you have to restart Docker after changing the file, sending a kill -HUP will reload some settings specified in the file, but not this one AFAICT). Anyway, the below examples will all show 8.8.4.4 or 8.8.8.8.

When you do curl tired.com in a Docker container, it will send this DNS request to Docker’s internal DNS resolver, as specified in the container‘s /etc/resolv.conf:

nameserver 127.0.0.11

One easy thing we can do is add the following to the container’s /etc/resolv.conf. This speeds things up quite a bit:

options timeout:1 attempts:1
$ time curl tired.com
curl: (6) Could not resolve host: tired.com; Unknown error

real 0m2.541s
user 0m0.010s
sys 0m0.014s

(The following implementation details were gathered from stracing the dockerd process, and might change in the future.) This nameserver runs in the dockerd process, but the dockerd process switches to the container’s network name space before forwarding the request:

# strace -vvvttf -p $dockerd_pid
2124 03:18:52.686952 openat(AT_FDCWD, "/var/run/docker/netns/dd995925297c", O_RDONLY) = 20
2124 03:18:52.686999 setns(20, CLONE_NEWNET) = 0
2124 03:18:52.687058 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 21
2124 03:18:52.687088 setsockopt(21, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
2124 03:18:52.687119 connect(21, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0

Tangent: here’s one way to ascertain which namespace this is:

# nsenter --net=/var/run/docker/netns/dd995925297c ip a
...
inet 172.18.0.13/16 brd 172.18.255.255 scope global eth1
...

Then you can just issue docker inspect or docker network inspect commands to figure out which container this IP belongs to.

And back: if you do the same UDP connection with the same parameters on the host system, DNS requests fail straight away, because the kernel is sensible enough to notice that there is no route to this host:

08:06:25.289709 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
08:06:25.289858 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
08:06:25.289981 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 ENETUNREACH (Network is unreachable)

So one thing I thought we could do is to force this connect call to fail. I thought one way might be to get Docker to use TCP to connect to DNS servers. This can be accomplished in /etc/docker/daemon.json like this:

{
"dns-opt": "use-vc"
}

Unfortunately that didn’t help because of the SOCK_NONBLOCK flag. The strace now looks like this:

1580  18:55:02.447528 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 34
1580 18:55:02.447571 setsockopt(34, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
1580 18:55:02.447612 connect(34, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 EINPROGRESS (Operation now in progress)

(Note: adding attempts:1 timeout:1 to dns-opt didn’t seem to have an effect.)

Furthermore, removing the default route doesn’t help either. curl still blocks for ~2.526 seconds. Docker keeps retrying even if it gets ENETUNREACH:

# nsenter --net=/var/run/docker/netns/dd995925297c ip route del default
19:39:44.614836 connect(21, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = -1 ENETUNREACH (Network is unreachable)

So up to now we have avoided reading Docker’s source code (it’s written in a weird, foreign language), but at this point we have no choice but to take a look. Perhaps my assumptions were wrong and Docker’s DNS is actually noticing there is a problem, but just retries no matter what. So the first thing we do is enable debug logs just so we can hopefully get some messages that are output close to the code we have to look at it. We just need to set /etc/docker/daemon.json and reload (kill -HUP) dockerd and run journalctl -xeu docker

{
"debug": true
}

Here are some logs we got after we removed our default route:

20-04-05T19:59:29.285616784+09:00" level=warning msg="[resolver] connect failed: dial udp 8.8.8.8:53: connect: network is unreachable"
20-04-05T19:59:29.285646884+09:00" level=warning msg="[resolver] connect failed: dial udp 8.8.4.4:53: connect: network is unreachable"

And here are some logs we got with the default route still existing:

20-04-05T19:30:14.118994138+09:00" level=debug msg="[resolver] read from DNS server failed, read udp 172.19.0.2:40693->8.8.8.8:53: i/o timeout"20-04-05T19:30:17.115909962+09:00" level=debug msg="[resolver] read from DNS server failed, read udp 172.19.0.2:55315->8.8.4.4:53: i/o timeout"

Searching the docker code takes us straight to the ServeDNS function in docker-ce/components/engine/vendor/github.com/docker/libnetwork/resolver.go. (I did a fresh clone today; the newest commit in my git log is 92768b32964e3037e520ab8e74fe190c39f4c83d. The code may look different in your version.)

So we’ve got a long for loop on line 432 that has some breaks and some continues and uses a couple hard-coded constants here and there.

maxExtDNS       = 3 //max number of external servers to try
extIOTimeout = 4 * time.Second
...
for i := 0; i < maxExtDNS; i++ {

One thing we could do is look for a workaround, i.e. anything that will make this function terminate earlier. Unfortunately, I didn’t have any success on that front. So it looks like we currently have no choice but to set up our own DNS server just to prevent Docker from freezing our ssh connections (or whatever software is being frozen in your case).

Setting up our own DNS server

Unfortunately, we can’t just run dnsmasq on the host and make it listen on 127.0.0.1. 127.0.0.1 obviously means something different inside a container than it does on the host. We generally can’t use any of the other IPs the host machine may have either — Docker most likely adds iptables rules that block containers from communicating with those IPs. (Though we could manually delete these iptables rules.)

So one way around this problem is to run a Docker container that runs dnsmasq. Unfortunately, this can get a bit messy — we need to use a static IP on that container, and we need to configure our host to use that IP in its /etc/resolv.conf.

One kind of nice — though very hacky — solution I came up with is to add a network and two containers that pretend to be 8.8.4.4/8.8.8.8 (or one container and two networks).

### fake-dns/Dockerfile
FROM centos
RUN yum -y install dnsmasq
ENTRYPOINT /usr/sbin/dnsmasq --server=/qiqitori.com/1.1.1.1 -R
$ docker build --tag fake-dns fake-dns
$ docker network create fake-dns-net --subnet 8.8.0.0/16
$ docker run -d --network fake-dns-net -it --ip 8.8.4.4 fake-dns
$ docker run -d --network fake-dns-net -it --ip 8.8.8.8 fake-dns

$ docker connect fake-dns-net existing-container

Connecting your already running containers that you need to fix DNS on to fake-dns-net will make DNS requests immediately get NXDOMAIN responses from our fake 8.8.4.4 and 8.8.8.8 servers, and will take care of any freezing issues you may have, all while DNS requests for other container names will still work as normal. (In this example, qiqitori.com is whitelisted and will be forwarded to Cloudflare’s 1.1.1.1 DNS servers. The -R flag on the dnsmasq command means that dnsmasq will ignore any servers listed in /etc/resolv.conf)

Conclusions

In my opinion, Docker’s automatic fallback to 8.8.4.4/8.8.8.8 isn’t the greatest Docker feature to say the least. Perhaps there should be a way to tell Docker not to fall back, and to send NXDOMAIN when it can’t answer anything by itself.

Unfortunately, this “hack” is likely not that future-proof. Future versions of Docker could for example expand or change the list of fallback servers, and this hack would have to be adapted to Docker’s changes. However, explicitly specifying { dns: [“8.8.4.4″, ” 8.8.8.8″] } in /etc/docker/daemon.json could perhaps take care of this problem. (I haven’t actually tested that.)

Bash: how to put command output/file contents on the command line

In certain situations, you may want to have a command that you e.g. saved in a file on your command line before executing it. As I didn’t find an answer straight away, here’s a quick-and-dirty way to do it that executes a command when you press a keyboard shortcut:

bind -x '"\C-g":"READLINE_LINE=$(cat dnsmasq_command)"'

This will put the contents of the file named ‘dnsmasq_command’ on your command line when you press Control-G.

And while we’re at it, while it’s got nothing to do with this article, here’s a binding that puts a time stamp (like 20200408022000, no, not a palindrome) on your command line:

bind -x '"\C-g":"READLINE_LINE=${READLINE_LINE:0:$READLINE_POINT}$(date +%Y%m%d%H%M00)${READLINE_LINE:$READLINE_POINT}; READLINE_POINT=$((($READLINE_POINT+14)))"'

irssi/perl memory leak

Some irssi users have reported that their irssi memory usage reaches 1-2 GB after a few months of usage. This time, the cause was a memory leak in Perl: https://rt.perl.org/Public/Bug/Display.html?id=130254

This bug affects Perl 5.24 (which is “current” in e.g. Debian Stable (Stretch)), and if you use certain regexes you will probably see memory leakage.

If you are suspecting a memory leak in irssi, here’s one way to find out more about the nature of your memory leak: dump irssi’s core using gcore. (irssi will be stopped during the dumping process but will carry on where it left off as soon as the dump is completed. If it takes a long time to dump the core, you may time out from some or all servers.) To do this, find irssi’s PID (by e.g. doing ps aux | grep irssi) and then execute:

gcore PID

You’ll get a core file that is as large as irssi’s memory usage at the time of the dump. If you have a memory leak due to the above Perl bug, you will have a lot of strings that start with “Assuming NOT a POSIX class” in your core file, which you can check using the following command:

strings core | grep "Assuming NOT a POSIX class" | wc

If this command outputs large numbers, your memory leak is most likely due to the above-mentioned Perl 5.24 bug. If you don’t get any output, you might still have a memory leak that can be found using the strings command. Either go through the output of the strings command manually and see if you can find any repeated messages, or maybe make use of the following command:

strings core | sort | uniq -c | sort -n -r | less

Let me or the people on #irssi on Freenode know if you find any other leaks.

Decoding Docker’s local-kv.db

Network problems in Docker can often be “fixed” by deleting /var/lib/docker/network/files/local-kv.db. However, in some cases it might be possible to just edit the bits that are wrong. This file is a BoltDB database.

Here’s some code to extract the keys contained in the file (libkv_example.go, heavily inspired by the example code in libkv/docs/examples.md)

package main

import (
    "time"
    "log"

    "github.com/docker/libkv"
    "github.com/docker/libkv/store"
    "github.com/docker/libkv/store/boltdb"
)

func init() {
    // Register boltdb store to libkv
    boltdb.Register()
}

func main() {
    client := "./local-kv.db" // ./ appears to be necessary

    // Initialize a new store
    kv, err := libkv.NewStore(
        store.BOLTDB, // or "boltdb"
        []string{client},
        &store.Config{
            Bucket: "libnetwork",
            ConnectionTimeout: 10*time.Second,
        },
    )
    if err != nil {
        log.Fatalf("Cannot create store: %v", err)
    }

    pair, err := kv.List("docker/network")
    for _, p := range pair {
        println("key:", string(p.Key))
        println("value:", string(p.Value))
    }
}

Make sure to work on a copy of your local-kv.db file, and that you have write permissions to your copy. Also note that this script is anything but thoroughly tested.

If you’re new to go like me, here are the commands to install Go, the required libraries and run the program (if you’re on Debian):

sudo apt-get install golang-1.8-go
PATH=/usr/lib/go-1.8/bin/:$PATH
# feel free to try go build libkv_example.go without the go get commands
# you'll most likely get an error like this:
# libkv_example.go:7:5: cannot find package "github.com/docker/libkv" in any of:
# ...
go get github.com/docker/libkv
go get go.etcd.io/bbolt
go build libkv_example.go
./libkv_example

Forwarding DNS requests using netcat, without dnsmasq/bind/other DNS software

I’ve sometimes found that it would be useful to be able to forward DNS requests from one network into another.

In this article, the examples are for forwarding Docker’s internal DNS. My (potential) use case is to (hopefully) work around Softether VPN’s internal DNS server not being able to resolve the names of other Docker containers on the Docker network (when running Softether VPN in a Docker container).

I tried the following on a CentOS 7.5 machine, but this only worked for the first request.

mkfifo0
mkfifo1
nc -l -u 172.19.0.3 53 < fifo1 > fifo0 & nc -u 127.0.0.11 53 < fifo0 > fifo1

Checking netstat -lnp after sending the first request, we see that nc is no longer listening. The problem is that the -k option is missing, but adding the -k option gets us this message:

Ncat: UDP mode does not support the -k or --keep-open options, except with --exec or --sh-exec. QUITTING.

Wait, “–exec”? “–sh-exec”? What, we don’t have to do this whole mkfifo stuff at all?!

/root/nc.sh:

#!/bin/sh
nc -u 127.0.0.11 53

Command:

nc -k -l -u 172.19.0.3 53 -e /root/nc.sh

Note, -e is short for –exec. This appears to work just fine. (Note: the corresponding –sh-exec (-c) option wouldn’t work immediately and I didn’t feel like spending too much time on this.)

Here’s a dnsmasq command to do something similar:

dnsmasq -u root -i eth0 --no-dhcp-interface=eth0 --port=5353

This will also allow you to resolve things using /etc/hosts on the container running dnsmasq, while disabling dnsmasq’s internal DHCP. (If you change the listening interface given in -i, you’ll also have to change the interface given in –no-dhcp-interface.)

Spreadsheets vs. Command Line Utilities vs. SQL (for Pivot Tables)

When processing text files on Linux, you have a lot of choice. Sed, Awk, Perl, or just coreutils, or perhaps a spreadsheet application? I’m a reasonably educated spreadsheet application user, but I’m also a reasonably educated command line user and a reasonably educated SQL user. This article pits the three against each other.

In this article, I’ll show different ways to process a large CSV file: one solution using a spreadsheet application, one solution using standard CLI utilities (GNU coreutils GNU datamash), and one solution using q (http://harelba.github.io/q – Run SQL directly on CSV files) (and one solution using sqlite3, which is almost the same).

Conclusions

Yes, I’m putting my conclusions first. If you need to create a Pivot Table from CSV files, I believe SQL is the best solution. The q utility makes using SQL very comfortable.

The data

The dataset used in this article describes export statistics, i.e., trade from Japan to other countries. We would like to do a simple Pivot Table-like task that would be really easy in Excel: find the total export volume (in JPY) from Japan to a specific country for every HS “section”. Here are some examples of HS sections and their corresponding HS chapters:

Chapters 01-05: LIVE ANIMALS; ANIMAL PRODUCTS
Chapters 06-14: VEGETABLE PRODUCTS
Chapter 15: ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES

The following links are for import, but that doesn’t matter in our case I think. Here’s the whole table: http://www.customs.go.jp/english/tariff/2018_4/index.htm This table contains links to tables further describing the HS codes in each HS chapter. For example, here’s the table for section I, “LIVE ANIMALS; ANIMAL PRODUCTS”, chapter 01: “Live animals”: http://www.customs.go.jp/english/tariff/2018_4/data/e_01.htm.

The HS codes in our dataset look like this: ‘010121000’; the first two digits correspond to the HS chapter, which is all we are going to look at for now. We have to group by these two digits.

The files

I downloaded all the CSV files on this page: https://www.e-stat.go.jp/stat-search/files?page=1&layout=datalist&toukei=00350300&tstat=000001013141&cycle=1&year=20170&month=24101212&tclass1=000001013180&tclass2=000001013181&result_back=1 (English) and merged them into a single file, data.csv like this:

head -n 1 ik-100h2017e001.csv > header
tail -q -n +2 ik-100h2017e0*csv >> header

The HS chapters/sections are described here: http://www.customs.go.jp/english/tariff/2018_4/index.htm (English. A Japanese page is available too, of course.)

The country codes are listed here: http://www.customs.go.jp/toukei/sankou/code/country_e.htm (English. Japanese is available.)

data.csv.gz
countries.csv
hs_sections.csv
hs_chapters_to_sections.csv
hs_sections_no_to_descriptions.csv

The spreadsheet solution

I won’t go into much detail here. First of all, we add worksheets for all of the above files (or reference external files). Then we add a column to compute the first two digits in the HS codes, using a function like MID(C2,2,2). We use VLOOKUP() to look up the HS section. (Perhaps we use another VLOOKUP() for the country codes.) Then we create a pivot table. (It would be more efficient to VLOOKUP() from the pivot table, but while I believe that to be possible in Excel, I’m not sure it’s possible in OpenOffice/LibreOffice.)

Anyway, using spreadsheets is rather user-friendly, but large files take quite a while to process. Adding extra columns to the original data is very inconvenient too. (Using calculated fields in Excel may help with this.)

The CLI/GNU(?) solution

We are going to make use of GNU datamash here. GNU datamash is capable of grouping and summing, which is already halfway there. For the lookups, we use the join command(!), which is part of coreutils.

We need to do some minor pre-processing, as we do not want the header rows in this solution:

tail -n +1 data.csv > data_nh.csv
tail -n +1 countries.csv > countries_nh.csv
tail -n +1 hs_sections.csv > hs_sections_nh.csv

The other files do not have any headers. So far so good, but using common CLI tools gets a bit awkward in the next step, cutting off characters in the middle of the HS code field. Let’s isolate that field:

$ cut -d, -f3 data_nh.csv | head -n3
'010121000'
'010121000'
'010121000'

Then we cut off the unneeded characters:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | head -n3
01
01
01

Then we need to re-add the other columns. This is one of the slightly awkward steps when doing this using CLI tools. Let’s isolate the other relevant columns first though:

$ cut -d, -f4,9 data_nh.csv | head -n3
103,2100
105,1800
205,84220

To paste these two columns back onto the first isolated columns, we use the aptly(?) named paste command. The -d option allows use to combine fields using the comma operator. (Default is tab.) We’ll pass the HS section as standard input, and the other two relevant columns using bash’s <().

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | head -n3
01,103,2100
01,105,1800
01,205,84220

What we have now is a trimmed CSV that goes “HS section”,”Country Code”,”Amount”.

The “VLOOKUP” part is slightly tricky. We are going to use the little-known join command, which is included in coreutils. Some HS sections and some country names have commas in them, which are a bit inconvenient, but not a huge problem as the result of the “VLOOKUP” is attached to the right of the entire original data.

Here’s a quick demonstration of the join command. (Note: countries_nh.csv is pre-sorted. Everything passed to join must be sorted.)

$ echo 222 | join -t, - countries_nh.csv
222,"Finland"

In Excel, we are able to safely group by cells that may contain commas, but not so in datamash. I left out something above: We’ve got the HS chapter code above, but from this chapter code, we wanted to look up the HS section, and group by that section. So let’s go back one step and use join to get us the HS section number from the HS chapter number. Note that all join input must be sorted, so before we add a pipe to sort on the newly created fourth field:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | head -n3
00,103,276850736,0
00,105,721488020,0
00,106,258320777,0

Getting the sort command to sort correctly by a single field isn’t very easy. If it weren’t for the –debug option that is! In this case we want to sort by the first field, so the command becomes ‘sort -n -t, -k1,1’. (Start field == end field == 1, so -k1,1.) Debug output looks like this:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 --debug | head
sort: using ‘en_US.UTF-8’ sorting rules
00,103,276850736
__
________________
00,105,721488020
__
________________
00,106,258320777
__
________________

The field that has been sorted gets underlined. Great! Now let’s do the pivoting part using datamash:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,1,4 sum 3
103,00,276850736
103,01,418085
103,03,14476769

The -s option sorts, which is required when using groupby. The -t option selects ‘,’ as the delimiter. This command groups by column 2 (country), and then by column 4 (our HS section number), and computes a sum of column 3 for this grouping. So this is it! If we know the country codes and HS sections by heart, that is.

Well, our above output from datamash starts with the country code, and things are nice and sorted, so we just have to join again:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,4 sum 3 | join -t, - countries_nh.csv | head -n3
103,0,276850736,"Republic of Korea"
103,1,15325799,"Republic of Korea"
103,10,50079044,"Republic of Korea"

Next we would like to look up the HS section number to get the HS section description. In the above commands, we joined on the first field, but fortunately join supports joining on different fields. We need to sort on the second field and then tell join to join on the second field, which can be accomplished by using the -1 option and specifying 2 . (So -1 2 or simply -12, though that may look confusing.)

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,4 sum 3 | join -t, - countries_nh.csv | sort -n -t, -k2,2 | join -12 -t, - hs_sections_no_to_descriptions.csv | head -n3
00,103,276850736,"Republic of Korea","Unknown"
00,105,721488020,"People's Republic of China","Unknown"
00,106,258320777,"Taiwan","Unknown"

That’s it! To get e.g. Finland, we’ll make it easy for ourselves and just grep for Finland:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,4 sum 3 | join -t, - countries_nh.csv | sort -n -t, -k2,2 | join -12 -t, - hs_sections_no_to_descriptions.csv | grep Finland
0,222,1758315,"Finland","Unknown"
2,222,10613,"Finland","VEGETABLE PRODUCTS"
3,222,654,"Finland","ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES"
4,222,45021,"Finland","PREPARED FOODSTUFFS; BEVERAGES, SPIRITS AND VINEGAR; TOBACCO AND MANUFACTURED TOBACCO SUBSTITUTES"
5,222,33611,"Finland","MINERAL PRODUCTS"
6,222,2624353,"Finland","PRODUCTS OF THE CHEMICAL OR ALLIED INDUSTRIES"
7,222,4880410,"Finland","PLASTICS AND ARTICLES THEREOF; RUBBER AND ARTICLES THEREOF"
8,222,12557,"Finland","RAW HIDES AND SKINS, LEATHER, FURSKINS AND ARTICLES THEREOF; ADDLERY AND HARNESS; TRAVEL GOODS, HANDBAGS AND SIMILAR CONTAINERS; ARTICLES OF ANIMAL GUT (OTHER THAN SILK-WORM GUT)"
9,222,3766,"Finland","WOOD AND ARTICLES OF WOOD; WOOD CHARCOAL; CORK AND ARTICLES OF CORK; MANUFACTURES OF STRAW, OF ESPARTO OR OF OTHER PLAITING MATERIALS; BASKETWARE AND WICKERWORK"
10,222,38476,"Finland","PULP OF WOOD OR OF OTHER FIBROUS CELLULOSIC MATERIAL; RECOVERED (WASTE AND SCRAP) PAPER OR PAPERBOARD; PAPER AND PAPERBOARD AND ARTICLES THEREOF"
11,222,527084,"Finland","TEXTILES AND TEXTILE ARTICLES"
12,222,1541,"Finland","FOOTWEAR, HEADGEAR, UMBRELLAS, SUN UMBRELLAS, WALKING-STICKS, SEAT-STICKS, WHIPS, RIDING-CROPS AND PARTS THEREOF; PREPARED FEATHERS AND ARTICLES MADE THEREWITH; ARTIFICIAL FLOWERS; ARTICLES OF HUMAN HAIR"
13,222,991508,"Finland","ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE"
14,222,5757,"Finland","NATURAL OR CULTURED PEARLS, PRECIOUS OR SEMI-PRECIOUS STONES, PRECIOUS METALS, METALS CLAD WITH PRECIOUS METAL AND ARTICLES THEREOF; IMITATION JEWELLERY; COIN"
15,222,971561,"Finland","BASE METALS AND ARTICLES OF BASE METAL"
16,222,14614308,"Finland","MACHINERY AND MECHANICAL APPLIANCES; ELECTRICAL EQUIPMENT; PARTS THEREOF; SOUND RECORDERS AND REPRODUCERS, TELEVISION IMAGE AND SOUND RECORDERS AND REPRODUCERS, AND PARTS AND ACCESSORIES OF SUCH ARTICLES"
17,222,13427653,"Finland","VEHICLES, AIRCRAFT, VESSELS AND ASSOCIATED TRANSPORT EQUIPMENT"
18,222,4062385,"Finland","OPTICAL, PHOTOGRAPHIC, CINEMATOGRAPHIC, MEASURING, CHECKING, PRECISION, MEDICAL OR SURGICAL INSTRUMENTS AND APPARATUS; CLOCKS AND WATCHES; MUSICAL INSTRUMENTS; PARTS AND ACCESSORIES THEREOF"
19,222,4550,"Finland","ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF"
20,222,399367,"Finland","MISCELLANEOUS MANUFACTURED ARTICLES"
21,222,20539,"Finland","WORKS OF ART, COLLECTORS' PIECES AND ANTIQUES"

If you need to match column names exactly, you could replace the above grep by something like this:

... | grep -P '^.*?,.*?,.*?,"Finland"

Though personally I’d maybe use awk for that. On my machine, executing this takes 0.675 seconds. That’s pretty fast!

The q solution

q is a tool that allows you to perform SQL queries on CSV files from the comfort of the command line. If you know SQL, that should be pretty cool!

We’ve got some issues with digits and hyphens in the column names, so we first pre-process to get rid of those:

$ head -n 1 data.csv | tr '1-9' 'A-I' | sed 's/-//g'; tail -n +2 data.csv

This uses tr to replace the digits 1-9 with corresponding letters and sed to get rid of hyphens. We pipe this into q. The q command itself looks like this:

$ q -d, -H 'select Description, sum(ValueYear) from - JOIN hs_sections.csv ON substr(HS,2,2)=Number JOIN countries.csv ON Country=CountryID where CountryName="Finland" group by Description'

-d specifies the delimiter, -H specifies the presence of a header row (and we can use these header names in the query!) and the rest is just SQL.

$ time (head -n 1 data.csv | tr '1-9' 'A-I' | sed 's/-//g'; tail -n +2 data.csv) | q -d, -H 'select Description, sum(ValueYear) from - JOIN hs_sections.csv ON substr(HS,2,2)=Number JOIN countries.csv ON Country=CountryID where CountryName="Finland" group by Description'
ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES,654
ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF,4550
"ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE",991508
...
real    0m12.590s
user    0m12.364s
sys     0m0.236s

Wow, this was pretty pleasant, but it took my machine 12.59 seconds to get here. q uses sqlite, and we can use the -S option to save the resulting sqlite database to a file. Here’s an sqlite3 command that executes the same query on a saved database:

time sqlite3 data.sqlite 'select Description, sum(QuantityAYear) from `-` JOIN `hs_sections.csv` ON substr(HS,2,2)=Number JOIN `countries.csv` ON Country=CountryID where CountryName="Finland" group by Description;' 
ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES|0
ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF|0
ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE|7436
...
real    0m0.218s
user    0m0.192s
sys     0m0.024s

As you can see, that is pretty fast. So it spends quite a bit of time importing the CSV file. Well, as it turns out, sqlite3 supports importing from CSV files as well. Here’s a command that creates a database in test.sqlite, imports the three required CSV files, and runs the query:

$ time (sqlite3 -csv test.sqlite '.import data.csv data'; sqlite3 -csv test.sqlite '.import hs_sections.csv hs_sections'; sqlite3 -csv test.sqlite '.import countries.csv countries'; sqlite3 -csv test.sqlite 'select Description, sum(`Quantity1-Year`) from data JOIN hs_sections ON substr(HS,2,2)=Number JOIN countries ON Country=CountryID where CountryName="Finland" group by Description; ')
"ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES",0
"ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF",0
"ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE",267696
...
real    0m2.581s
user    0m2.372s
sys     0m0.120s

This is much faster than using q, but may have various limitations. Note that if you feed sqlite3 commands through standard input, you can do all this in a single sqlite3 session, and the database can be entirely in-memory.

Can’t connect to (e.g.) GitLab, failing with “no hostkey alg”

If you get the following error message when connecting to a server (in my case, it was a GitLab instance running on Docker using something at least inspired by the official Docker image), you may be using an older SSH client, such as the one in RHEL/CentOS 6.

no hostkey alg

Some cursory web searching didn’t give me a satisfactory solution, so here goes: It seems likely that you’re using an older ssh client (for example, the one in CentOS 6.x). This client unfortunately doesn’t support the -Q option to list supported host keys, but we can figure out that information by doing the following:

ssh -vvvv 127.0.0.1

On a more modern system, you might get something like this:

debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa

(Which is the same as ssh -Q key, but harder to read.)

On older systems, you won’t get the helpful “host key algorithms:” label, but you’ll still get the information. So perhaps look out for a line that contains “ssh-rsa”.

Then, try the same ssh -vvvv 12.34.56.78 (replace 12.34.56.78 with the target server’s name or address), and look at the equivalent line. (Or if you have access, log into the server and try ssh -Q key.)

In my case, the client only had ssh-rsa and ssh-dsa, and the target server only listed ecdsa-sha2-nistp256. In my case, this could be solved by entirely on the client side. All we have to do is add an option to the command line and create a key if it doesn’t exist yet:

ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key
ssh -o HostKeyAlgorithms=ecdsa-sha2-nistp256,ssh-rsa 12.34.56.78

To avoid adding this option every time, you can add the following into your ~/.ssh/config:

Host 12.34.56.78
        HostKeyAlgorithms ecdsa-sha2-nistp256,ssh-rsa

(Or if you want this on all hosts: Host *)

Hope this helps.

Factors of the first 1,000,000 integer numbers

A quick Google search only brought up lists up to e.g. 1,000, so here’s a CSV with the factors for the first 1,000,000 integers. (Note that from a programming perspective, you’re most likely better off calculating the factors yourself rather than parsing a CSV file. So it’s not quite clear if this list is actually useful.) Feel free to do with it whatever you want.

1000000_factors.gz (23 MB)

Here’s an extract:

1,1
2,1
3,1
4,1,2
5,1
6,1,2,3
7,1
8,1,2,4
9,1,3
10,1,2,5
11,1
12,1,2,3,4,6
13,1
14,1,2,7
15,1,3,5
16,1,2,4,8
17,1
18,1,2,3,6,9
19,1
20,1,2,4,5,10
21,1,3,7
22,1,2,11
23,1
24,1,2,3,4,6,8,12
25,1,5
26,1,2,13
27,1,3,9
28,1,2,4,7,14
29,1
30,1,2,3,5,6,10,15
31,1
32,1,2,4,8,16
33,1,3,11
34,1,2,17
35,1,5,7
36,1,2,3,4,6,9,12,18
37,1
38,1,2,19
39,1,3,13
40,1,2,4,5,8,10,20
41,1
42,1,2,3,6,7,14,21
43,1
44,1,2,4,11,22
45,1,3,5,9,15
46,1,2,23
47,1
48,1,2,3,4,6,8,12,16,24
49,1,7
50,1,2,5,10,25