Bash: how to put command output/file contents on the command line

In certain situations, you may want to have a command that you e.g. saved in a file on your command line before executing it. As I didn’t find an answer straight away, here’s a quick-and-dirty way to do it that executes a command when you press a keyboard shortcut:

bind -x '"\C-g":"READLINE_LINE=$(cat dnsmasq_command)"'

This will put the contents of the file named ‘dnsmasq_command’ on your command line when you press Control-G.

And while we’re at it, while it’s got nothing to do with this article, here’s a binding that puts a time stamp (like 20200408022000, no, not a palindrome) on your command line:

bind -x '"\C-g":"READLINE_LINE=${READLINE_LINE:0:$READLINE_POINT}$(date +%Y%m%d%H%M00)${READLINE_LINE:$READLINE_POINT}; READLINE_POINT=$((($READLINE_POINT+14)))"'

Skype for Business fortunes

Skype for Business:
    Accept any substitute.
    If it's broke, don't fix it.
    If it ain't broke, fix it.
    Form follows malfunction.
    The Cutting Edge of Obsolescence.
    The trailing edge of software technology.
    Armageddon never looked so good.
    Japan's secret weapon.
    You'll envy the dead.
    Making the world safe for competing communication tools.
    Let it get in YOUR way.
    The problem for your problem.
    If it starts working, we'll fix it.  Pronto.
    It could be worse, but it'll take time.
    Simplicity made complex.
    The greatest productivity aid since typhoid.
    Flakey and built to stay that way.
 
One thousand monkeys.  One thousand Windows 8 machines.  One thousand years.
    Skype for Business.
%
Skype for Business:
    It's not how slow you make it.  It's how you make it slow.
    The communication tool preferred by masochists 3 to 1.
    Built to take on the world... and lose!
    Don't try it 'til you've knocked it.
    Power tools for Power Fools.
    Putting new limits on productivity.
    The closer you look, the cruftier we look.
    Design by counterexample.
    A new level of software disintegration.
    No hardware is safe.
    Do your time.
    Rationalization, not realization.
    Old-world software cruftsmanship at its finest.
    Gratuitous incompatibility.
    Your mother.
    THE user interference management system.
    You can't argue with failure.
    You haven't died 'til you've used it.
 
The environment of today... tomorrow!
    Skype for Business.
%
Skype for Business:
    Something you can be ashamed of.
    30% more entropy than the leading communication tool.
    The first fully modular software disaster.
    Rome was destroyed in a day.
    Warn your friends about it.
    Climbing to new depths.  Sinking to new heights.
    An accident that couldn't wait to happen.
    Don't wait for the movie.
    Never use it after a big meal.
    Need we say less?
    Plumbing the depths of human incompetence.
    It'll make your day.
    Don't get frustrated without it.
    Power tools for power losers.
    A software disaster of Biblical proportions.
    Never had it.  Never will.
    The software with no visible means of support.
    More than just a generation behind.
 
Hindenburg.  Titanic.  Edsel.
    Skype for Business.
%
Skype for Business:
    The ultimate bottleneck.
    Flawed beyond belief.
    The only thing you have to fear.
    Somewhere between chaos and insanity.
    On autopilot to oblivion.
    The joke that kills.
    A disgrace you can be proud of.
    A mistake carried out to perfection.
    Belongs more to the problem set than the solution set.
    To err is Skype for Business.
    Ignorance is our most important resource.
    Complex nonsolutions to simple nonproblems.
    Built to fall apart.
    Nullifying centuries of progress.
    Falling to new depths of inefficiency.
    The last thing you need.
    The defacto substandard.
 
Elevating brain damage to an art form.
    Skype for Business.
%
Skype for Business:
    We will dump no core before its time.
    One good crash deserves another.
    A bad idea whose time has come.  And gone.
    We make excuses.
    It didn't even look good on paper.
    You laugh now, but you'll be laughing harder later!
    A new concept in abuser interfaces.
    How can something get so bad, so quickly?
    It could happen to you.
    The art of incompetence.
    You have nothing to lose but your lunch.
    When uselessness just isn't enough.
    More than a mere hindrance.  It's a whole new barrier!
    When you can't afford to be right.
    And you thought we couldn't make it worse.
 
If it works, it isn't Skype for Business.
%
Skype for Business:
    You'd better sit down.
    Don't laugh.  It could be YOUR thesis project.
    Why do it right when you can do it wrong?
    Live the nightmare.
    Our bugs run faster.
    When it absolutely, positively HAS to crash overnight.
    There ARE no rules.
    You'll wish we were kidding.
    Everything you never wanted in a communication tool.  And more.
    Dissatisfaction guaranteed.
    There's got to be a better way.
    The next best thing to keypunching.
    Leave the thrashing to us.
    We wrote the book on core dumps.
    Even your dog won't like it.
    More than enough rope.
    Garbage at your fingertips.
 
Incompatibility.  Shoddiness.  Uselessness.
    Skype for Business.

Adapted from https://lwn.net/Articles/26612/.

Standing on the shoulder of giants

Making new discoveries usually requires knowledge from previous discoveries. Discoveries are published as text in research papers. Scientists find these research papers using keywords they plug into a search engine or when other research papers cite them. Being able to search by keywords has obviously made scientific research much easier than before.

However, the research papers themselves are still written in natural languages, and unless the authors avoid using grammatical features like the words “it” or “them” to avoid repeating nouns over and over again, it’s difficult to extract meaning from them.

So perhaps people shouldn’t be writing research papers in (e.g.) English and instead use something that is easier for computers to parse. The research paper would contain claims, some numeric indicator for how certain these claims are (or from where they came), how these claims came about (e.g. experiment (with instructions for how and experiment was set up and why), logical reasoning (with the actual chain of reasoning), etc.). Then you could, for example, search for claims, instead of just keywords.

Instead of just using the keywords “ocean plastic waste”, you could search for “plastic waste in oceans negatively impacts ocean fauna”. Instead of typing in this query, you would choose every noun, verb and other qualifiers from a searchable menu. You would be able to display this menu in any language. The words in the menu would all mean a specific thing, and anybody would be able to look at the definitions and properties of every word. Eventually, the system could be programmed to (e.g.) look at two different claims and judge whether these claims might be related, based on common subjects or objects, and generate potential new claims all by itself. For example, claim 1: “Thunderstorms may under certain circumstances cause blackouts” and claim 2: “Certain atmospheric conditions cause thunderstorms” would logically mean that “certain atmospheric conditions cause blackouts”, which is true. The system probably shouldn’t let users use the word “certain” though.

How hard would it be to make such a system? Well, it’s best to start in a single niche and slowly expand the system. Here’s a very simple (completely static) example that is a bit different in that it just generates natural English or Japanese text based on what values are selected in the menus: https://blog.qiqitori.com/cve_generator/

Saving the selections would be a first step in the right direction. Currently, only one “allows attackers to:” is allowed, but you would probably want to 1) allow an arbitrary-length chain 2) something indicating the author’s degree of certainty (“may allow”), 3) add other verbs, other nouns, and 4) the reasoning behind a set of these claims (it gets complex here, but in the end you just have to select subjects, verbs, and objects, ifs, etc.).

Perhaps selecting things from a menu would get old quick (unless you just do it for the main claims). So the computer could try its best to analyze a sentence and ask the user to confirm that its interpretation is correct. The preceding sentence could be analyzed like this, but you’ll see that this could be tricky and just selecting something from a menu might be the better option after all: So [potential solution for previous follows:] the computer [the software] could try its best [could potentially (but probably wouldn’t manage to do this perfectly)] to analyze a sentence [analyze a given input sentence] and [afterwards] ask the user to [force the user] confirm that its [the software’s] interpretation [of the given input sentence] is correct [confirm … or force the user to edit the interpretation using a menu]. As you can see, there is a lot of context involved, which makes it hard to derive (e.g.) claims from natural text.

As you can see in the CVE generator, having the facts makes it somewhat easy for the system to generate text in any language for humans to read and think about

irssi/perl memory leak

Some irssi users have reported that their irssi memory usage reaches 1-2 GB after a few months of usage. This time, the cause was a memory leak in Perl: https://rt.perl.org/Public/Bug/Display.html?id=130254

This bug affects Perl 5.24 (which is “current” in e.g. Debian Stable (Stretch)), and if you use certain regexes you will probably see memory leakage.

If you are suspecting a memory leak in irssi, here’s one way to find out more about the nature of your memory leak: dump irssi’s core using gcore. (irssi will be stopped during the dumping process but will carry on where it left off as soon as the dump is completed. If it takes a long time to dump the core, you may time out from some or all servers.) To do this, find irssi’s PID (by e.g. doing ps aux | grep irssi) and then execute:

gcore PID

You’ll get a core file that is as large as irssi’s memory usage at the time of the dump. If you have a memory leak due to the above Perl bug, you will have a lot of strings that start with “Assuming NOT a POSIX class” in your core file, which you can check using the following command:

strings core | grep "Assuming NOT a POSIX class" | wc

If this command outputs large numbers, your memory leak is most likely due to the above-mentioned Perl 5.24 bug. If you don’t get any output, you might still have a memory leak that can be found using the strings command. Either go through the output of the strings command manually and see if you can find any repeated messages, or maybe make use of the following command:

strings core | sort | uniq -c | sort -n -r | less

Let me or the people on #irssi on Freenode know if you find any other leaks.

Decoding Docker’s local-kv.db

Network problems in Docker can often be “fixed” by deleting /var/lib/docker/network/files/local-kv.db. However, in some cases it might be possible to just edit the bits that are wrong. This file is a BoltDB database.

Here’s some code to extract the keys contained in the file (libkv_example.go, heavily inspired by the example code in libkv/docs/examples.md)

package main

import (
    "time"
    "log"

    "github.com/docker/libkv"
    "github.com/docker/libkv/store"
    "github.com/docker/libkv/store/boltdb"
)

func init() {
    // Register boltdb store to libkv
    boltdb.Register()
}

func main() {
    client := "./local-kv.db" // ./ appears to be necessary

    // Initialize a new store
    kv, err := libkv.NewStore(
        store.BOLTDB, // or "boltdb"
        []string{client},
        &store.Config{
            Bucket: "libnetwork",
            ConnectionTimeout: 10*time.Second,
        },
    )
    if err != nil {
        log.Fatalf("Cannot create store: %v", err)
    }

    pair, err := kv.List("docker/network")
    for _, p := range pair {
        println("key:", string(p.Key))
        println("value:", string(p.Value))
    }
}

Make sure to work on a copy of your local-kv.db file, and that you have write permissions to your copy. Also note that this script is anything but thoroughly tested.

If you’re new to go like me, here are the commands to install Go, the required libraries and run the program (if you’re on Debian):

sudo apt-get install golang-1.8-go
PATH=/usr/lib/go-1.8/bin/:$PATH
# feel free to try go build libkv_example.go without the go get commands
# you'll most likely get an error like this:
# libkv_example.go:7:5: cannot find package "github.com/docker/libkv" in any of:
# ...
go get github.com/docker/libkv
go get go.etcd.io/bbolt
go build libkv_example.go
./libkv_example

Forwarding DNS requests using netcat, without dnsmasq/bind/other DNS software

I’ve sometimes found that it would be useful to be able to forward DNS requests from one network into another.

In this article, the examples are for forwarding Docker’s internal DNS. My (potential) use case is to (hopefully) work around Softether VPN’s internal DNS server not being able to resolve the names of other Docker containers on the Docker network (when running Softether VPN in a Docker container).

I tried the following on a CentOS 7.5 machine, but this only worked for the first request.

mkfifo0
mkfifo1
nc -l -u 172.19.0.3 53 < fifo1 > fifo0 & nc -u 127.0.0.11 53 < fifo0 > fifo1

Checking netstat -lnp after sending the first request, we see that nc is no longer listening. The problem is that the -k option is missing, but adding the -k option gets us this message:

Ncat: UDP mode does not support the -k or --keep-open options, except with --exec or --sh-exec. QUITTING.

Wait, “–exec”? “–sh-exec”? What, we don’t have to do this whole mkfifo stuff at all?!

/root/nc.sh:

#!/bin/sh
nc -u 127.0.0.11 53

Command:

nc -k -l -u 172.19.0.3 53 -e /root/nc.sh

Note, -e is short for –exec. This appears to work just fine. (Note: the corresponding –sh-exec (-c) option wouldn’t work immediately and I didn’t feel like spending too much time on this.)

Here’s a dnsmasq command to do something similar:

dnsmasq -u root -i eth0 --no-dhcp-interface=eth0 --port=5353

This will also allow you to resolve things using /etc/hosts on the container running dnsmasq, while disabling dnsmasq’s internal DHCP. (If you change the listening interface given in -i, you’ll also have to change the interface given in –no-dhcp-interface.)

Spreadsheets vs. Command Line Utilities vs. SQL (for Pivot Tables)

When processing text files on Linux, you have a lot of choice. Sed, Awk, Perl, or just coreutils, or perhaps a spreadsheet application? I’m a reasonably educated spreadsheet application user, but I’m also a reasonably educated command line user and a reasonably educated SQL user. This article pits the three against each other.

In this article, I’ll show different ways to process a large CSV file: one solution using a spreadsheet application, one solution using standard CLI utilities (GNU coreutils GNU datamash), and one solution using q (http://harelba.github.io/q – Run SQL directly on CSV files) (and one solution using sqlite3, which is almost the same).

Conclusions

Yes, I’m putting my conclusions first. If you need to create a Pivot Table from CSV files, I believe SQL is the best solution. The q utility makes using SQL very comfortable.

The data

The dataset used in this article describes export statistics, i.e., trade from Japan to other countries. We would like to do a simple Pivot Table-like task that would be really easy in Excel: find the total export volume (in JPY) from Japan to a specific country for every HS “section”. Here are some examples of HS sections and their corresponding HS chapters:

Chapters 01-05: LIVE ANIMALS; ANIMAL PRODUCTS
Chapters 06-14: VEGETABLE PRODUCTS
Chapter 15: ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES

The following links are for import, but that doesn’t matter in our case I think. Here’s the whole table: http://www.customs.go.jp/english/tariff/2018_4/index.htm This table contains links to tables further describing the HS codes in each HS chapter. For example, here’s the table for section I, “LIVE ANIMALS; ANIMAL PRODUCTS”, chapter 01: “Live animals”: http://www.customs.go.jp/english/tariff/2018_4/data/e_01.htm.

The HS codes in our dataset look like this: ‘010121000’; the first two digits correspond to the HS chapter, which is all we are going to look at for now. We have to group by these two digits.

The files

I downloaded all the CSV files on this page: https://www.e-stat.go.jp/stat-search/files?page=1&layout=datalist&toukei=00350300&tstat=000001013141&cycle=1&year=20170&month=24101212&tclass1=000001013180&tclass2=000001013181&result_back=1 (English) and merged them into a single file, data.csv like this:

head -n 1 ik-100h2017e001.csv > header
tail -q -n +2 ik-100h2017e0*csv >> header

The HS chapters/sections are described here: http://www.customs.go.jp/english/tariff/2018_4/index.htm (English. A Japanese page is available too, of course.)

The country codes are listed here: http://www.customs.go.jp/toukei/sankou/code/country_e.htm (English. Japanese is available.)

data.csv.gz
countries.csv
hs_sections.csv
hs_chapters_to_sections.csv
hs_sections_no_to_descriptions.csv

The spreadsheet solution

I won’t go into much detail here. First of all, we add worksheets for all of the above files (or reference external files). Then we add a column to compute the first two digits in the HS codes, using a function like MID(C2,2,2). We use VLOOKUP() to look up the HS section. (Perhaps we use another VLOOKUP() for the country codes.) Then we create a pivot table. (It would be more efficient to VLOOKUP() from the pivot table, but while I believe that to be possible in Excel, I’m not sure it’s possible in OpenOffice/LibreOffice.)

Anyway, using spreadsheets is rather user-friendly, but large files take quite a while to process. Adding extra columns to the original data is very inconvenient too. (Using calculated fields in Excel may help with this.)

The CLI/GNU(?) solution

We are going to make use of GNU datamash here. GNU datamash is capable of grouping and summing, which is already halfway there. For the lookups, we use the join command(!), which is part of coreutils.

We need to do some minor pre-processing, as we do not want the header rows in this solution:

tail -n +1 data.csv > data_nh.csv
tail -n +1 countries.csv > countries_nh.csv
tail -n +1 hs_sections.csv > hs_sections_nh.csv

The other files do not have any headers. So far so good, but using common CLI tools gets a bit awkward in the next step, cutting off characters in the middle of the HS code field. Let’s isolate that field:

$ cut -d, -f3 data_nh.csv | head -n3
'010121000'
'010121000'
'010121000'

Then we cut off the unneeded characters:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | head -n3
01
01
01

Then we need to re-add the other columns. This is one of the slightly awkward steps when doing this using CLI tools. Let’s isolate the other relevant columns first though:

$ cut -d, -f4,9 data_nh.csv | head -n3
103,2100
105,1800
205,84220

To paste these two columns back onto the first isolated columns, we use the aptly(?) named paste command. The -d option allows use to combine fields using the comma operator. (Default is tab.) We’ll pass the HS section as standard input, and the other two relevant columns using bash’s <().

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | head -n3
01,103,2100
01,105,1800
01,205,84220

What we have now is a trimmed CSV that goes “HS section”,”Country Code”,”Amount”.

The “VLOOKUP” part is slightly tricky. We are going to use the little-known join command, which is included in coreutils. Some HS sections and some country names have commas in them, which are a bit inconvenient, but not a huge problem as the result of the “VLOOKUP” is attached to the right of the entire original data.

Here’s a quick demonstration of the join command. (Note: countries_nh.csv is pre-sorted. Everything passed to join must be sorted.)

$ echo 222 | join -t, - countries_nh.csv
222,"Finland"

In Excel, we are able to safely group by cells that may contain commas, but not so in datamash. I left out something above: We’ve got the HS chapter code above, but from this chapter code, we wanted to look up the HS section, and group by that section. So let’s go back one step and use join to get us the HS section number from the HS chapter number. Note that all join input must be sorted, so before we add a pipe to sort on the newly created fourth field:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | head -n3
00,103,276850736,0
00,105,721488020,0
00,106,258320777,0

Getting the sort command to sort correctly by a single field isn’t very easy. If it weren’t for the –debug option that is! In this case we want to sort by the first field, so the command becomes ‘sort -n -t, -k1,1’. (Start field == end field == 1, so -k1,1.) Debug output looks like this:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 --debug | head
sort: using ‘en_US.UTF-8’ sorting rules
00,103,276850736
__
________________
00,105,721488020
__
________________
00,106,258320777
__
________________

The field that has been sorted gets underlined. Great! Now let’s do the pivoting part using datamash:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,1,4 sum 3
103,00,276850736
103,01,418085
103,03,14476769

The -s option sorts, which is required when using groupby. The -t option selects ‘,’ as the delimiter. This command groups by column 2 (country), and then by column 4 (our HS section number), and computes a sum of column 3 for this grouping. So this is it! If we know the country codes and HS sections by heart, that is.

Well, our above output from datamash starts with the country code, and things are nice and sorted, so we just have to join again:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,4 sum 3 | join -t, - countries_nh.csv | head -n3
103,0,276850736,"Republic of Korea"
103,1,15325799,"Republic of Korea"
103,10,50079044,"Republic of Korea"

Next we would like to look up the HS section number to get the HS section description. In the above commands, we joined on the first field, but fortunately join supports joining on different fields. We need to sort on the second field and then tell join to join on the second field, which can be accomplished by using the -1 option and specifying 2 . (So -1 2 or simply -12, though that may look confusing.)

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,4 sum 3 | join -t, - countries_nh.csv | sort -n -t, -k2,2 | join -12 -t, - hs_sections_no_to_descriptions.csv | head -n3
00,103,276850736,"Republic of Korea","Unknown"
00,105,721488020,"People's Republic of China","Unknown"
00,106,258320777,"Taiwan","Unknown"

That’s it! To get e.g. Finland, we’ll make it easy for ourselves and just grep for Finland:

$ cut -d, -f3 data_nh.csv | cut -c 2-3 | paste -d, - <(cut -d, -f4,9 data_nh.csv) | sort -n -t, -k1,1 | join -t, - hs_chapters_to_sections.csv | datamash -s -t, groupby 2,4 sum 3 | join -t, - countries_nh.csv | sort -n -t, -k2,2 | join -12 -t, - hs_sections_no_to_descriptions.csv | grep Finland
0,222,1758315,"Finland","Unknown"
2,222,10613,"Finland","VEGETABLE PRODUCTS"
3,222,654,"Finland","ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES"
4,222,45021,"Finland","PREPARED FOODSTUFFS; BEVERAGES, SPIRITS AND VINEGAR; TOBACCO AND MANUFACTURED TOBACCO SUBSTITUTES"
5,222,33611,"Finland","MINERAL PRODUCTS"
6,222,2624353,"Finland","PRODUCTS OF THE CHEMICAL OR ALLIED INDUSTRIES"
7,222,4880410,"Finland","PLASTICS AND ARTICLES THEREOF; RUBBER AND ARTICLES THEREOF"
8,222,12557,"Finland","RAW HIDES AND SKINS, LEATHER, FURSKINS AND ARTICLES THEREOF; ADDLERY AND HARNESS; TRAVEL GOODS, HANDBAGS AND SIMILAR CONTAINERS; ARTICLES OF ANIMAL GUT (OTHER THAN SILK-WORM GUT)"
9,222,3766,"Finland","WOOD AND ARTICLES OF WOOD; WOOD CHARCOAL; CORK AND ARTICLES OF CORK; MANUFACTURES OF STRAW, OF ESPARTO OR OF OTHER PLAITING MATERIALS; BASKETWARE AND WICKERWORK"
10,222,38476,"Finland","PULP OF WOOD OR OF OTHER FIBROUS CELLULOSIC MATERIAL; RECOVERED (WASTE AND SCRAP) PAPER OR PAPERBOARD; PAPER AND PAPERBOARD AND ARTICLES THEREOF"
11,222,527084,"Finland","TEXTILES AND TEXTILE ARTICLES"
12,222,1541,"Finland","FOOTWEAR, HEADGEAR, UMBRELLAS, SUN UMBRELLAS, WALKING-STICKS, SEAT-STICKS, WHIPS, RIDING-CROPS AND PARTS THEREOF; PREPARED FEATHERS AND ARTICLES MADE THEREWITH; ARTIFICIAL FLOWERS; ARTICLES OF HUMAN HAIR"
13,222,991508,"Finland","ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE"
14,222,5757,"Finland","NATURAL OR CULTURED PEARLS, PRECIOUS OR SEMI-PRECIOUS STONES, PRECIOUS METALS, METALS CLAD WITH PRECIOUS METAL AND ARTICLES THEREOF; IMITATION JEWELLERY; COIN"
15,222,971561,"Finland","BASE METALS AND ARTICLES OF BASE METAL"
16,222,14614308,"Finland","MACHINERY AND MECHANICAL APPLIANCES; ELECTRICAL EQUIPMENT; PARTS THEREOF; SOUND RECORDERS AND REPRODUCERS, TELEVISION IMAGE AND SOUND RECORDERS AND REPRODUCERS, AND PARTS AND ACCESSORIES OF SUCH ARTICLES"
17,222,13427653,"Finland","VEHICLES, AIRCRAFT, VESSELS AND ASSOCIATED TRANSPORT EQUIPMENT"
18,222,4062385,"Finland","OPTICAL, PHOTOGRAPHIC, CINEMATOGRAPHIC, MEASURING, CHECKING, PRECISION, MEDICAL OR SURGICAL INSTRUMENTS AND APPARATUS; CLOCKS AND WATCHES; MUSICAL INSTRUMENTS; PARTS AND ACCESSORIES THEREOF"
19,222,4550,"Finland","ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF"
20,222,399367,"Finland","MISCELLANEOUS MANUFACTURED ARTICLES"
21,222,20539,"Finland","WORKS OF ART, COLLECTORS' PIECES AND ANTIQUES"

If you need to match column names exactly, you could replace the above grep by something like this:

... | grep -P '^.*?,.*?,.*?,"Finland"

Though personally I’d maybe use awk for that. On my machine, executing this takes 0.675 seconds. That’s pretty fast!

The q solution

q is a tool that allows you to perform SQL queries on CSV files from the comfort of the command line. If you know SQL, that should be pretty cool!

We’ve got some issues with digits and hyphens in the column names, so we first pre-process to get rid of those:

$ head -n 1 data.csv | tr '1-9' 'A-I' | sed 's/-//g'; tail -n +2 data.csv

This uses tr to replace the digits 1-9 with corresponding letters and sed to get rid of hyphens. We pipe this into q. The q command itself looks like this:

$ q -d, -H 'select Description, sum(ValueYear) from - JOIN hs_sections.csv ON substr(HS,2,2)=Number JOIN countries.csv ON Country=CountryID where CountryName="Finland" group by Description'

-d specifies the delimiter, -H specifies the presence of a header row (and we can use these header names in the query!) and the rest is just SQL.

$ time (head -n 1 data.csv | tr '1-9' 'A-I' | sed 's/-//g'; tail -n +2 data.csv) | q -d, -H 'select Description, sum(ValueYear) from - JOIN hs_sections.csv ON substr(HS,2,2)=Number JOIN countries.csv ON Country=CountryID where CountryName="Finland" group by Description'
ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES,654
ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF,4550
"ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE",991508
...
real    0m12.590s
user    0m12.364s
sys     0m0.236s

Wow, this was pretty pleasant, but it took my machine 12.59 seconds to get here. q uses sqlite, and we can use the -S option to save the resulting sqlite database to a file. Here’s an sqlite3 command that executes the same query on a saved database:

time sqlite3 data.sqlite 'select Description, sum(QuantityAYear) from `-` JOIN `hs_sections.csv` ON substr(HS,2,2)=Number JOIN `countries.csv` ON Country=CountryID where CountryName="Finland" group by Description;' 
ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES|0
ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF|0
ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE|7436
...
real    0m0.218s
user    0m0.192s
sys     0m0.024s

As you can see, that is pretty fast. So it spends quite a bit of time importing the CSV file. Well, as it turns out, sqlite3 supports importing from CSV files as well. Here’s a command that creates a database in test.sqlite, imports the three required CSV files, and runs the query:

$ time (sqlite3 -csv test.sqlite '.import data.csv data'; sqlite3 -csv test.sqlite '.import hs_sections.csv hs_sections'; sqlite3 -csv test.sqlite '.import countries.csv countries'; sqlite3 -csv test.sqlite 'select Description, sum(`Quantity1-Year`) from data JOIN hs_sections ON substr(HS,2,2)=Number JOIN countries ON Country=CountryID where CountryName="Finland" group by Description; ')
"ANIMAL OR VEGETABLE FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES",0
"ARMS AND AMMUNITION; PARTS AND ACCESSORIES THEREOF",0
"ARTICLES OF STONE, PLASTER, CEMENT, ASBESTOS, MICA OR SIMILAR MATERIALS; CERAMIC PRODUCTS; GLASS AND GLASSWARE",267696
...
real    0m2.581s
user    0m2.372s
sys     0m0.120s

This is much faster than using q, but may have various limitations. Note that if you feed sqlite3 commands through standard input, you can do all this in a single sqlite3 session, and the database can be entirely in-memory.

海外発行のクレジットカードで Amazon.co.jp で音楽を購入する方法

日本のクレジットカードを入手するのはなかなか困難なので、海外のクレジットカードをずっとそのまま使い続ける方、結構いらっしゃるではないでしょうか?少なくとも私はそうです。国際送金での返済など、不便は多いですね。

今回は、Amazon.co.jp のデジタルミュージックストアにて西野カナ氏の「abracadabra」という曲を購入しようとしたら、「 現在登録されているお支払い情報では、お客様のご購入を処理することができませんでした。日本国内で発行されたクレジットカードをご使用ください。」というエラーメッセージが表示され、購入できませんでした。(ちなみに、普段聞いている音楽は結構違うのですが、この曲のノリとかバックグラウンドのピアノとか、なかなか良かったです。)

でも実は、ギフトカード払いもできるのです。なお、ギフトカードを買うためにわざわざコンビニまで足を運ぶ必要もありません。なぜなら、Amazon のギフトカードは直接 Amazon で購入し、ギフトコードをメールで受け取ることができるからです: Amazonギフトカード(Eメールタイプ)。自分のメールアドレスを入力し、海外のクレジットカードで支払います。

デジタルミュージックストアに戻り再度購入しようとすると!あれ、同じエラーメッセージが表示されています。話を長くしても面白くないですよね。続いて、Amazon の支払い設定で、クレジットカード情報を消してみたら、見事、先程購入したギフトカードでの支払が可能になりました!

(実は、西野カナの「ダーリン」という曲も好きです。)

Can’t connect to (e.g.) GitLab, failing with “no hostkey alg”

If you get the following error message when connecting to a server (in my case, it was a GitLab instance running on Docker using something at least inspired by the official Docker image), you may be using an older SSH client, such as the one in RHEL/CentOS 6.

no hostkey alg

Some cursory web searching didn’t give me a satisfactory solution, so here goes: It seems likely that you’re using an older ssh client (for example, the one in CentOS 6.x). This client unfortunately doesn’t support the -Q option to list supported host keys, but we can figure out that information by doing the following:

ssh -vvvv 127.0.0.1

On a more modern system, you might get something like this:

debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa

(Which is the same as ssh -Q key, but harder to read.)

On older systems, you won’t get the helpful “host key algorithms:” label, but you’ll still get the information. So perhaps look out for a line that contains “ssh-rsa”.

Then, try the same ssh -vvvv 12.34.56.78 (replace 12.34.56.78 with the target server’s name or address), and look at the equivalent line. (Or if you have access, log into the server and try ssh -Q key.)

In my case, the client only had ssh-rsa and ssh-dsa, and the target server only listed ecdsa-sha2-nistp256. In my case, this could be solved by entirely on the client side. All we have to do is add an option to the command line and create a key if it doesn’t exist yet:

ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key
ssh -o HostKeyAlgorithms=ecdsa-sha2-nistp256,ssh-rsa 12.34.56.78

To avoid adding this option every time, you can add the following into your ~/.ssh/config:

Host 12.34.56.78
        HostKeyAlgorithms ecdsa-sha2-nistp256,ssh-rsa

(Or if you want this on all hosts: Host *)

Hope this helps.

Factors of the first 1,000,000 integer numbers

A quick Google search only brought up lists up to e.g. 1,000, so here’s a CSV with the factors for the first 1,000,000 integers. (Note that from a programming perspective, you’re most likely better off calculating the factors yourself rather than parsing a CSV file. So it’s not quite clear if this list is actually useful.) Feel free to do with it whatever you want.

1000000_factors.gz (23 MB)

Here’s an extract:

1,1
2,1
3,1
4,1,2
5,1
6,1,2,3
7,1
8,1,2,4
9,1,3
10,1,2,5
11,1
12,1,2,3,4,6
13,1
14,1,2,7
15,1,3,5
16,1,2,4,8
17,1
18,1,2,3,6,9
19,1
20,1,2,4,5,10
21,1,3,7
22,1,2,11
23,1
24,1,2,3,4,6,8,12
25,1,5
26,1,2,13
27,1,3,9
28,1,2,4,7,14
29,1
30,1,2,3,5,6,10,15
31,1
32,1,2,4,8,16
33,1,3,11
34,1,2,17
35,1,5,7
36,1,2,3,4,6,9,12,18
37,1
38,1,2,19
39,1,3,13
40,1,2,4,5,8,10,20
41,1
42,1,2,3,6,7,14,21
43,1
44,1,2,4,11,22
45,1,3,5,9,15
46,1,2,23
47,1
48,1,2,3,4,6,8,12,16,24
49,1,7
50,1,2,5,10,25