Excel character counter / エクセル文字カウンター

At work, somebody sometimes needs to count the length of selected text in Excel cells. Doing it in one’s head is understandably unpleasant, so why not see if there’s software out there that can count the number of characters in selected text? Well, I got pretty close at http://www.donationcoder.com/Software/Skrommel/index.html#DragKing. Since this is written in AutoHotkey, I’d just have to modify it a bit to count while the mouse cursor is being dragged. (Watch the video below to see it in action.) Unfortunately, pressing Ctrl+V while selecting text doesn’t work, so we instead send the WM_COPY message, like this:

SendMessage 0x301,0,0,%Control%, %WinTitle%  ; WM_COPY

Also it’s not cool if the script gets rid of your clipboard contents, so we restore that too. Get the script here: moji_counter_v3.ahk
Note that it’s partially localized to Japanese.



SendMessage 0x301,0,0,%Control%, %WinTitle%  ; WM_COPY



I recently started a job at a Japanese app development company that is mostly making fortune telling apps and the occasional personality test. I’m getting to work on some interesting stuff, including app development, web development, one-off utilities, and server administration. My first assignment was a personality test, and since it’s released now, I think I’m just going to link to it below.


ブラック心理テスト@Play Store (Android)
ブラック心理テスト@App Store (iPhone)


Male and female kanji (And how to process ENAMDICT’s XML in Perl)

The other day, someone told me that the kanji 漢 (the 漢 in 漢字) usually indicates a male name. This made me wonder what other kanji there are that might indicate a female or male name. So I downloaded JMnedict.xml and processed it a little bit.

Side note: Both XML::LibXML and MySQL’s import from XML thingy turned out to be uselessly slow, so I did this using XML::Twig. Here’s some lazy code to extract all the <keb> elements and their associated name_type:

#!/usr/bin/perl -w

use XML::Twig;
use strict;

binmode(STDOUT, ":utf8");

my $t = XML::Twig->new(twig_handlers => { entry => \&entry });


sub entry {
    my ($t, $entry) = @_;
    my ($keb, $name_type);

    eval {
        $keb = $entry->first_child("k_ele")->first_child("keb")->text;
        $name_type = $entry->first_child("trans")->first_child("name_type")->text;
    print "$keb\t$name_type\n" if (!$@);

Now we have a file that looks like this:

ゝ泉    given name or forename, gender not specified
〆      female given name or forename

Great names, eh? Then we just do:

grep female all_names.csv > female_names.csv
grep "[^e]male" all_names.csv > male_names.csv
perl -C -n -e 'while ($_ =~ s/(\p{Block=CJK_Unified_Ideographs})//) { print "$1\n" }' female_names.csv | sort -n | uniq -c | sort -n > female_kanji.csv # Perl's -C option enables unicode everywhere. Unfortunately, this option doesn't work on the #! line.
perl -C -n -e 'while ($_ =~ s/(\p{Block=CJK_Unified_Ideographs})//) { print "$1\n" }' male_names.csv | sort -n | uniq -c | sort -n > male_kanji.csv
sed -i "s/^\s*//" female_kanji.csv # fix leading whitespace chars from uniq -c
sed -i "s/^\s*//" male_kanji.csv

Then we put the two files into two different sheets of the same file in a spreadsheet program. (I used LibreOffice, but Excel is better. Seriously.) Call one sheet “Female”, the other “Male”, and on a third sheet, concatenate the two lists of kanji and filter out the duplicates (in column A), and use the following formulas for columns B to E, respectively:

=IF(ISERROR(VLOOKUP(A2, Female.$A$1:$B$1601, 2, 0)), 0, VLOOKUP(A2, Female.$A$1:$B$1601, 2, 0))
=IF(ISERROR(VLOOKUP(A2, Male.$A$1:$B$1601, 2, 0)), 0, VLOOKUP(A2, Male.$A$1:$B$1601, 2, 0))
=IF(ISERROR(B2/C2), 1000000, B2/C2)

(In LibreOffice Calc, you use a period instead of an exclamation mark to reference cells on a different sheet in formulas. LibreOffice Calc doesn’t support the IFERROR() function. I know that 1,000,000 is not the answer to n/0, but we’d like a high number for sorting purposes.) Copy the formulas down and perhaps add the following headers in the top row: Kanji, Female, Male, Ratio, Count. Copy the whole thing without formulas to a new sheet, sort by ratio, and then by count. Perhaps filter out all the kanji that have a count < 10. Here’s a link to my files: kanji_usage_in_names.ods (OpenDocument) and kanji_usage_in_names.xlsx (Excel 2007+).

So it turns out that there are hundreds of kanji that are strong indicators for the gender of a name. By the way, JMnedict.xml’s data isn’t very good: for example, even names like 大介 aren’t gender-classified yet. We’ve got only 1,719 unique kanji for all the gender-classified names, 1,601 unique kanji for female names, and 873 unique kanji for male names. Pretty low and weird numbers. So don’t expect too much accuracy.

Automatically resizing images with Word VBA

For not-too-interested readers: Use the third code snippet.

It’s time for a new post on VBA! Somebody at work is creating a whole lot of training material in Word. Great. Fast forward a couple months, and all of a sudden it’s my job to fix these up. They have all kinds of formatting inconsistencies, and many of the screenshots were resized without preserving their aspect ratios.

I thought, great, I’ll just write some VBA that fixes the screenshots. I quickly came up with something like the following:

Sub resize_all_images_to_page_width()
    For Each inline_shape In ThisDocument.InlineShapes
        inline_shape.LockAspectRatio = msoFalse
        inline_shape.ScaleWidth = 100
        inline_shape.ScaleHeight = 100
        percent = doc.PageSetup.TextColumns.Width / inline_shape.Width
        inline_shape.ScaleWidth = percent * 100
        inline_shape.ScaleHeight = percent * 100
End Sub

This would in theory scale every image to fit to the page’s width.

And it worked. For almost all of the images. For some, it did something incomprehensible (Word 2010) and turned images into a very long and extremely thin strip of pixels. I tried various things (including .Reset) and did some googling, but couldn’t find a better way to fix the images. What I really wanted was to find out an image’s original width and height, just the way it’s displayed in the Format Picture dialog in the Size tab, but there’s no way to access those values.
So to get an image’s original size, I decided to create a new temporary document, copy and paste the image there, set the .ScaleWidth and .ScaleHeight options to 100, and then look at its .Width and .Height properties. Yay, there we go!
I also decided I don’t want to scale images beyond 100% and added some extra logic to handle that. Here’s the result:

Sub resize_all_images_up_to_page_width()
    Set current_doc = ThisDocument
    Set new_doc = Documents.Add(DocumentType:=wdNewBlankDocument)

    For Each ishape In current_doc.InlineShapes
        ' ishape.Copy ' doesn't work
        ishape.Select ' <work-around>
        Selection.Copy ' </work-around>
        new_doc.Content.PasteAndFormat (wdPasteDefault)
        Set new_ishape = new_doc.InlineShapes(1)
        new_ishape.LockAspectRatio = msoFalse
        new_ishape.ScaleWidth = 100
        new_ishape.ScaleHeight = 100
        ishape.LockAspectRatio = msoFalse
        If (new_ishape.Width > current_doc.PageSetup.TextColumns.Width) Then
            ishape.Width = current_doc.PageSetup.TextColumns.Width
            ishape.Height = (current_doc.PageSetup.TextColumns.Width / new_ishape.Width) * new_ishape.Height
            ishape.Width = new_ishape.Width
            ishape.Height = new_ishape.Height
        End If
        ishape.LockAspectRatio = msoTrue
End Sub

And finally, we’d like to make sure images don’t get longer than the length of a page:

Sub resize_all_images_up_to_page_width()
    Set current_doc = ThisDocument
    Set new_doc = Documents.Add(DocumentType:=wdNewBlankDocument)
    page_width = current_doc.PageSetup.TextColumns.Width
    page_height = current_doc.PageSetup.PageHeight - current_doc.PageSetup.TopMargin - current_doc.PageSetup.BottomMargin

    For Each ishape In current_doc.InlineShapes
        ' ishape.Copy ' doesn't work
        ishape.Select ' <work-around>
        Selection.Copy ' </work-around>
        new_doc.Content.PasteAndFormat (wdPasteDefault)
        Set new_ishape = new_doc.InlineShapes(1)
        new_ishape.LockAspectRatio = msoFalse
        new_ishape.ScaleWidth = 100
        new_ishape.ScaleHeight = 100
        ishape.LockAspectRatio = msoFalse
        If (new_ishape.Width > page_width) Then
            If ((page_width / new_ishape.Width) * new_ishape.Height > page_height) Then
                ishape.Width = page_height / new_ishape.Height * page_width
                ishape.Height = page_height
                ishape.Width = page_width
                ishape.Height = (page_width / new_ishape.Width) * new_ishape.Height
            End If
        ElseIf (new_ishape.Height > page_height) Then ' going to be shrinking both height and width, and width is okay already, so it'll be even okayer
            ishape.Width = page_height / new_ishape.Height * new_ishape.Width
            ishape.Height = page_height
            ishape.Width = new_ishape.Width
            ishape.Height = new_ishape.Height
        End If
        ishape.LockAspectRatio = msoTrue
End Sub

Hope this helps! If you have any  questions, feel free to leave a comment.

How to Create Word Documents from Excel using VBA and Fit Strings into Single Lines

On my day job, I was tasked to add a button to an Excel spreadsheet that converts that spreadsheet into a Word file, for improved formatting and aesthetics. This is a pretty straightforward task:

  1. Add a reference to the Microsoft Word Object Library (Tools → References) to get access to the wd-constants
  2. It’s easiest to access things in table cells, so create a Word template that uses tables wherever you want your macro to copy something over from the Excel file.
  3. Write your code around the following skeleton code:
Set objWord = CreateObject("Word.Application") ' Create Word object
Set objDoc = objWord.Documents.Add(Template:=ActiveWorkbook.Path & "\template.dotx") ' Open new document from given template
Set objTable = objDoc.Tables(1) ' Get reference to first table in template
objTable.Cell(1, 1).Range.Text = Cells(1, 1).Text ' Copy cell at row 1, column 1 in current Excel sheet to row 1, column 1 in first table in Word file
' Save file (we don't care about the .docx and just want a .pdf)
FileName = Application.GetSaveAsFilename(fileFilter:="PDF files (*.pdf), *.pdf") ' Note that this is Excel's dialog, but that doesn't matter, as we only need the user to enter a file name/select a file
objDoc.SaveAs2 save_name, wdFormatPDF
objWord.Quit wdDoNotSaveChanges
ActiveWorkbook.FollowHyperlink save_name

In case you need backwards compatibility with older versions of Office, add checks like the following before specifying a template or saving as a PDF file:

If (Application.Version > 12) Then
    ' ...
    ' ...
End If

For older versions of Office, you will need a PDF printer to create PDFs. You can display the print dialog box by doing:


If, unlike me, you do want to display the Word document, get rid of the objWord.Quit line (and perhaps the lines related to saving to PDF) and do the following:

objWord.Visible = True

That’s that, and now we come to the actually interesting part. The part I’ve alluded to in the title of this post. Say you have a table in your Word template that gets populated with a person’s name and email address. You’ve carefully set your column widths to sane values, and don’t want them to change, so you’ve got the “Automatically resize to fit contents” option (Table Properties → Table → Options) turned off. What happens when you get a long name or email address? The name or email address gets split somewhere, and now you’ve got two lines. Ugly! You’ve tried the “Fit text” option (Table Properties → Cell → Options) and felt sorely disappointed. What do we do? Perhaps you’ve tried to access the row’s Height property, and quickly realized that it’s not very useful, as you either get a value of 9999,  or 12 if you’ve changed the height policy, along with a table row that doesn’t look like before. I am sure there are other ways to solve this problem, but this is what I did:

n_lines_before = objDoc.ComputeStatistics(wdStatisticLines) ' Get total number of lines in document
.Cell(3, 4).Range.Text = Cells(26, "AJ").Text
n_lines_after = objDoc.ComputeStatistics(wdStatisticLines) ' Get total number of lines in document, again
Do While (n_lines_after > n_lines_before) ' If the data didn't fit in our cell, the total number of lines will have increased
    .Cell(3, 4).Range.Font.Size = .Cell(3, 4).Range.Font.Size - 1 ' Make the font size a little smaller until the total number of lines returns to being the same as before again
    n_lines_after = objDoc.ComputeStatistics(wdStatisticLines)

This post is not exactly related to Qiqitori, but I hope it’s of some use to someone, somewhere! It certainly serves as documentation for the code I’ve written, if anything. If you have any questions or better ways of doing things, feel free to leave a comment.

How to Calculate Distance Tables Using OpenStreetMaps

For an assignment on my job, I had to compute distances from a couple dozen locations in Japan (which I will call “source locations” in this article) to all postal codes that exist in Japan. Japan has approximately 123,401 postal codes, and we don’t want the straight line distance, but actual routes. So what do we do now? There’s no clear answer, but you may find an idea or two after reading this article.


“Geocoding” refers to the conversion of addresses to latitude/longitude pairs. In our case, (disregarding the couple dozen source locations,) addresses consist of only a postal code (and the country, which is the same for all addresses). Unfortunately, there is no publicly accessible database that matches postal codes with latitude/longitude pairs. (For Japan, at least. Maybe you’ll find that such a database exists for your country.) You could, however, cross-reference data from this database from the Ministry of Land, Infrastructure, Transport and Tourism and this one from the Japan Post website, but realize that this would be very hard because of differences in the spellings of names (e.g., 旭ヶ丘 vs. 旭ケ丘 and オシツクシ vs. 白糠町). It’s best to buy a database, especially if you’re doing this for a company. GeoPostcodes is one company that sells such databases, and at the time of this writing, the database for Japan costs €69.95.

You could also scrape the Google Maps API, but that would violate Google’s terms of service in most cases. You’d also be limited to (currently) 3,000 queries per IP per day. (You could also get access to the enterprise version for $10,000 per year. You’d still be limited to 100,000 requests per day. Hmm. I wonder if you’re allowed to resell your access.) At the time of this writing, you probably won’t find a geocoding API that will let you do this both comfortably and without violating its terms of service.

Calculating routes

With its 3,000 queries per IP per day limit, you won’t be able to use the Google Maps API or any other API (e.g., Bing Maps’ or Nokia’s) to calculate a couple million distances. (Probably not even mapquest open, which is free and merely requests that you ask for permission before firing off thousands of queries.) This is where OpenStreetMap comes in. You will be calculating routes on your own computer. You will find that there is a lot of open-source software to calculate routes using OpenStreetMap data. I’ve tested the following programs (in the following order):

  • Gosmore: Gave up after waiting over 24 hours for the conversion process from OpenStreetMap data (which is XML) to Gosmore’s data format to complete
  • Routino: Slow for long distances (think 60 seconds or more for routes that are longer than 1000 km), interface is extremely easy to use programmatically
  • Navit: Slightly buggy, reasonably fast, interface is either graphical or dbus-based and hard to use programmatically
  • OSRM: Extremely fast, interface is reasonably easy to use programmatically

OSRM is the clear winner here. OSRM manages to calculate even long routes within a few milliseconds. However, you’ll need some dead-serious hardware to convert OpenStreetMap data to a format OSRM can use. The conversion tool ended up using about 30 GB of memory (if I remember correctly) to convert OpenStreetMap data for Japan.

OSRM will start a multithreaded web server (on any port you wish). Within a perl script, you could perhaps perform queries like this:

$response = `curl -s \"$lat1,$long1&loc=$lat2,$long2\"`;

OSRM by default returns JSON, and will by default return alternate routes in addition to the route it deems fastest. By the way, calculating all routes took less than half a day (using ~24 not-too-modern Opteron cores).

Occasionally, OSRM will fail to find a route between two points. In our case, this happened 5,078 times, and it happens for the following reasons:

  • There is no road nearby. (A single postal code can cover a large area.)
  • Isolated road network
    Isolated road network

    The road that is closest to the specified coordinates is not connected to the wider road network. You might be on an island somewhere. (E.g., Hokkaido! Note that routes in Hokkaido were calculated completely separately from the other routes, and the numbers and statistics in this article may disregard Hokkaido.) Or there might be an error in the OpenStreetMap data. (By the way, it would be great if you could correct some of these errors! Anybody can edit OpenStreetMap data.)

We can fix a large number of these broken routes by calculating routes to points in the vicinity of the latitude/longitude pair in question. I chose to check the points 200 m to the north, south, west, and east, and if that failed, incremented the distance to 400, 600, 800, and finally 1,000 m. This reduced the number of broken routes to 1,411.

So what do we do now? We get rid of all postal codes that point to an island. That will eliminate a couple hundred postal codes, and it’s fun, because you’ll get to click around a lot, just like in a game! We’re going to add map markers on a Google Map for all broken postal codes and then click away and make a list of the ones we don’t need. How do you add markers on a Google Map? The answer is only one Google search away. We take the code from the JSFiddle demo and modify it a little bit to make markers go away when we click them and add the relevant latitude/longitude pair to a text box. That leaves only 336 postal codes. Here’s a link to the modified code: display_lat_long_pairs_on_map_bad_zips.html

So what do you do with the remaining 336 postal codes? That’s up to you to decide. I suggest trying mapquest open.

Other facts about postal codes

  • There are buildings in Japan that have multiple postal codes, e.g., one for each floor.
  • The term “ZIP code” only applies to postal codes in the US. “ZIP” is short for “Zone Improvement Plan”. You can read all about it in the Wikipedia article.
  • The number of Japanese postal codes changes (very slightly) every month. (You can find a CSV file that matches postal codes with addresses on the Japan Post website. There you’ll also find small files that contain the updates to this database.) Codes get added, deleted, and re-assigned.

If you have any questions, feel that I’ve left out something important, know of any reasonable alternatives to using OSRM for this, or just found this article helpful, please leave a comment! And if you need distance tables and don’t feel like calculating them yourself, feel free to ask me. I’ll probably manage to get them calculated for you very quickly (for a very modest fee, which mostly depends on if I need to buy external databases).