“Embarrassingly parallel” algorithms can often make use of SIMD instructions like those that came with the SSE and AVX extensions. In the Python world, numpy is a very popular package to work with arrays. One of the first things I wondered when I started using numpy was, “How optimized is numpy?” Some quick investigation shows that it’s multi-threaded, and some googling shows that it uses SIMD instructions: https://stackoverflow.com/questions/17109410/how-can-i-check-if-my-installed-numpy-is-compiled-with-sse-sse2-instruction-set
Now, it’s a bit tedious to grep for strings like VADDPD in the disassembly, so this post develops a nicer method.
For the impatient, here’s an unorthodox dirty one-liner (it creates a temporary file) that does this for you. It requires jq and internet access to download a database.
tempfile=`mktemp`; curl https://raw.githubusercontent.com/asmjit/asmdb/488b6d986964627f0b130b5265722dde8d93f11d/x86data.js | cpp | sed -n '/^{/,/^}/ { p }' | jq '[ .instructions | .[] | { (.[0]): .[4] } ] | add' > $tempfile; objdump --no-show-raw-insn -M intel -d /usr/lib/python2.7/dist-packages/numpy/core/*.so | awk '{print $2}' | grep -v : | sort | uniq | while read line; do echo -n "$line "; output=$(jq "with_entries(select(.key | match(\"(^$line\\/|\\/$line\$|$line\\/|^$line\$)\"))) | to_entries | .[] | .value" $tempfile); if [ -z "$output" ]; then echo; else echo $output; fi; done > output_test; rm $tempfile
Note that it is not able to distinguish between e.g. AVX and AVX512. It always prints out the most advanced extension possible, so it will print out AVX512 if any AVX is used. If you want something better, check out the Node.js version at the bottom of this post.
And around this point we start the explanation for the less impatient readers: first of all, we need a database of CPU instructions, and a simple Google query brings up this: https://github.com/asmjit/asmdb (The following discussion is based on commit 488b6d986964627f0b130b5265722dde8d93f11d.)
This project is in JavaScript, and the data file isn’t quite in JSON, so let’s do some minor preprocessing first to make our database easier to use:
cpp x86data.js | sed -n '/^{/,/^}/ { p }' > json
cpp is the C preprocessor to remove comments (there are comments and even multi-line comments in the actual data). The sed bit looks for a line starting with a { and after that a line starting with a }, all the while printing out this whole block.
Next, we need to get a disassembly. Here’s an example for numpy’s .so files:
objdump --no-show-raw-insn -M intel -d /usr/lib/python2.7/dist-packages/numpy/core/*.so | grep -P "^ +[0-9a-z]+:" | awk '{print $2}' | sort | uniq > numpy_instructions
This will get us all instruction mnemonics used. We get a file like this:
adc add addpd addps addsd addss and andnpd andnps
Let’s go back to our data. Today, we’ll use jq as our main tool to get the job done (though it will be many times slower than if we wrote a simple script that loads the hash once and re-uses it for every input instruction). If we just want the instructions block, we could do this:
jq '.instructions' json > instructions
However, this tool is a real Swiss army knife. We can use the familiar concept of piping, and we can wrap things in arrays or hashes just by enclosing expressions in [] or {}. Here’s an entire command to get an array of hashes containing only the instruction and the corresponding extension from the json file:
jq '[ .instructions | .[] | {instruction: .[0], extension: .[4] } ]' json
.[] iterates over the array inside the instructions key. Every item in the array is piped to a bit of jq code that creates a hash with an instruction and an extension key, which correspond to array element 0 and 4 in the input data. So we get output like this:
[ { "instruction": "aaa", "extension": "X86 Deprecated OF=U SF=U ZF=U AF=W PF=U CF=W" }, { "instruction": "aas", "extension": "X86 Deprecated OF=U SF=U ZF=U AF=W PF=U CF=W" }, . . . ]
Now we’re going to do something slightly naughty. The extension field isn’t the same for all instructions with the same mnemonic, as different opcodes with the same mnemonics have been added to the instruction set over time. However, we don’t need to be that precise IMO, so we’re just going to merge everything into an object like {“mnemonic”: “extension info”}. First, let’s get an array of hashes:
jq '[ .instructions | .[] | { (.[0]): .[4] } ]' json | head [ { "aaa": "X86 Deprecated OF=U SF=U ZF=U AF=W PF=U CF=W" }, { "aas": "X86 Deprecated OF=U SF=U ZF=U AF=W PF=U CF=W" }, { "aad": "X86 Deprecated OF=U SF=W ZF=W AF=U PF=W CF=U" }, . . . ]
Now we just need to pipe this into the add filter to merge this array of hashes/objects into a single hash/object:
jq '[ .instructions | .[] | { (.[0]): .[4] } ] | add' json > mnem2ext.json
And the result is:
{ "aaa": "X86 Deprecated OF=U SF=U ZF=U AF=W PF=U CF=W", "aas": "X86 Deprecated OF=U SF=U ZF=U AF=W PF=U CF=W", "aad": "X86 Deprecated OF=U SF=W ZF=W AF=U PF=W CF=U", "aam": "X86 Deprecated OF=U SF=W ZF=W AF=U PF=W CF=U", "adc": "X64 OF=W SF=W ZF=W AF=W PF=W CF=X", "add": "X64 OF=W SF=W ZF=W AF=W PF=W CF=W", "and": "X64 OF=0 SF=W ZF=W AF=U PF=W CF=0", "arpl": "X86 ZF=W", "bndcl": "MPX X64", ... }
Wee! But how do we access the information in this file? Well, with jq of course (not efficient though):
while read line; do echo -n "$line "; jq ".$line" min.json; done < numpy_instructions
Here’s an extract from the output:
cvttpd2dq "SSE2" cvttps2dq "SSE2" cvttsd2si "SSE2 X64" cvttss2si "SSE X64" cwde "ANY" div "X64 OF=U SF=U ZF=U AF=U PF=U CF=U" divpd "SSE2" divps "SSE" divsd "SSE2" divss "SSE" fabs "FPU C0=U C1=0 C2=U C3=U" fadd "FPU C0=U C1=W C2=U C3=U"
Such a nice mix of instructions. <3 We have a few problems though. Here are some instructions that couldn’t resolved:
cmova null cmpneqss null ja null rep null seta null vcmplepd
A closer look at our database reveals that some instructions have slashes in them, like “cmova/cmovnbe”. These are aliases, so we should be able to detect these as well. jq sort of allows to search for keys using regex, though the syntax isn’t easy, and the bash escaping makes things a bit worse:
while read line; do echo -n "$line "; jq "with_entries(select(.key | match(\"(^$line\\/|\\/$line\$|$line\\/|^$line\$)\")))" min.json; done < numpy_instructions > output
Things have gotten a bit slower again, and the rest of our output looks a bit different too:
xor { "xor": "X64 OF=0 SF=W ZF=W AF=U PF=W CF=0" } xorpd { "xorpd": "SSE2" } xorps { "xorps": "SSE" }
We can’t get rid of the echo, otherwise we’ll have no way to tell if jq is finding the mnemonic or not. So we’ll use jq to fix the format. Here’s an easy example:
echo '{ "b": "c" }' | jq 'to_entries[]' [ { "key": "b", "value": "c" } ]
echo '{ "b": "c" }' | jq 'to_entries | .[] | .value' "c"
Here, we’re just converting the hash into an array (as we did above with with_entries), and only select the .values. We can just pipe this within jq:
while read line; do echo -n "$line "; jq "with_entries(select(.key | match(\"(^$line\\/|\\/$line\$|$line\\/|^$line\$)\"))) | to_entries | .[] | .value" min.json; done < numpy_instructions > output
However, we don’t get a newline when we didn’t find an instruction, so we work around this in bash:
while read line; do echo -n "$line "; output=$(jq "with_entries(select(.key | match(\"(^$line\\/|\\/$line\$|$line\\/|^$line\$)\"))) | to_entries | .[] | .value" min.json); if [ -z "$output" ]; then echo; else echo $output; fi; done < numpy_instructions > output
That leaves mostly pseudo-instructions. The following pseudo-instructions are not included in this database but would indicate SSE2: CMPEQPD, CMPLTPD, CMPLEPD, CMPUNORDPD, CMPNEQPD, CMPNLTPD, CMPNLEPD, CMPORDPD. These all belong to the CMPPD instruction introduced in SSE2, as far as I can tell. (https://www.felixcloutier.com/x86/CMPPD.html#tbl-3-2) It would make sense to have them in the database in this case, but I think I’ll leave well enough alone for now though.
Anyway, doing something like awk ‘{print $2}’ output | sed s/\”//g | sort | uniq shows that my numpy version may use instructions from the following sets:
ANY AVX AVX2 AVX512_BW AVX512_DQ AVX512_F CMOV FPU FPU_POP FPU_PUSH I486 MMX2 SSE SSE2 SSE4_1 X64
Well, that’s great. Let’s package this up into a shell script so it’s a bit easier to use. Just stick it in a directory that has cpu_extensions.min.json in it and it’ll work.
#!/bin/bash json_file=$(dirname $0)/cpu_extensions.min.json objdump --no-show-raw-insn -M intel -d $* | grep -P "^ [0-9a-z]+:" | awk '{print $2}' | sort | uniq | while read line; do echo -n "$line " output=$(jq "with_entries(select(.key | match(\"(^$line\\/|\\/$line\$|$line\\/|^$line\$)\"))) | to_entries | .[] | .value" $json_file); if [ -z "$output" ]; then echo; else echo $output | sed -e 's/"//g' -e 's/ .*//g' fi done
Also, here’s a more efficient (O(n)) implementation in Node.js. It gets away with much less pre-processing, all you have to do is:
sed -n '/^{/,/^}/ { p }' x86data.js > cpu_extensions.json
However, it doesn’t execute objdump for you, so you have to call it like this:
show_cpu_extensions.js <(objdump --no-show-raw-insn -M intel -d /usr/lib/python2.7/dist-packages/numpy/core/*.so | grep -P "^ +[0-9a-z]+:" | awk '{print $2}' | sort | uniq)
I’ve also made it display all possible extensions.
#!/usr/bin/nodejs var database_file; var disassembly_file; if (process.argv.length == 3) { // Use default database database_file = __dirname + "/cpu_extensions.json"; disassembly_file = process.argv[2]; } else if (process.argv.length == 4) { database_file = process.argv[2]; disassembly_file = process.argv[3]; } else { console.log("Usage: " + process.argv[1] + " [database] disassembly"); console.log(process.argv); process.exit(1); } var fs = require("fs"); var readline = require("readline"); var mnem2ext = {}; var obj = JSON.parse(fs.readFileSync(database_file, "utf8")); obj["instructions"].map(function(v, i) { var ext = v[4].replace(/ +[A-Z]+=.*/, "").replace(/ +.*/, ""); if (v[0].match(/\//)) { v[0].split("/").forEach(function(v, i) { if (!mnem2ext[v]) { mnem2ext[v] = {}; } mnem2ext[v][ext] = true; }); } else { if (!mnem2ext[v[0]]) { mnem2ext[v[0]] = {}; } mnem2ext[v[0]][ext] = true; } }); var lineReader = require("readline").createInterface({input: fs.createReadStream(disassembly_file)}); lineReader.on("line", function(line) { console.log(line + ": " + (mnem2ext[line] ? Object.keys(mnem2ext[line]).join(", ") : undefined)); });