Mixiの(モバイルでの)署名付きリクエスト

この度、ガラケー向けのMixiアプリを作ることになりました。こういうアプリのリクエストは、実機 → Mixiが運営しているサーバー → アプリ用サーバー、という流れで送られるようです。(もちろんサーバーから実機への回答は逆の流れです。)Mixiのサーバーは、各リクエストに、ユーザーのMixi IDみたいな文字列を付けてくれて、そして本当にMixiのサーバーから来たリクエストかどうかを検証できるように、署名も付けてくれます。

Mixiの公式ドキュメンテーションはhttp://developer.mixi.co.jp/appli/spec/mob/validate-oauth-signature/にありますが、残念ながら、現在のところ、多少足りない部分がありまして…

まずは、下記のGETリクエストを例として上げているのだが、

http://example.com/foo/?opensocial_app_id=123&opensocial_owner_id=xxxxxxxx

「Ky/6LlDHpHX1EZMRi5mfUl9vxqY=」という署名になるには、「opensocial_owner_id」を「xxxxxxxx」ではなく、「456」に設定しないといけません。
なんで知ってるの!?
archive.orgのおかげです: https://web.archive.org/web/20100912034001/http://developer.mixi.co.jp/appli/spec/mob/validate-oauth-signature

続いて、「特殊な文字(漢字など)を含んでいるパラメーターはどうすればいいですか?」という質問に対する明らかな答えもなく、困っていました。
いろいろ試したところ、二重エンコーディングで対応できました。
つまり、「example.com/foo?test=テスト」みたいなリクエストは、ユーザー側のブラウザーから「example.com/foo?test=%E3%83%86%E3%82%B9%E3%83%88」のように送られます。それを「%25E3%2583%2586%25E3%2582%25B9%25E3%2583%2588」にすると検証が通ります。

test[foo]=barみたいな、角括弧が入っているパラメーターも同じように二重エンコーディングします。

Ruby on Railsだと、下記のようなコードで検証できます。

def mixi_signed_request_mobile?
    require 'cgi'
    oauth_header = request.headers["HTTP_AUTHORIZATION"].split(/\s*,\s*/)
    oauth_header_hash = Hash[oauth_header.map { |keqv| keqv.gsub('"', '').split('=') }]

    base_string_array = [
      "oauth_consumer_key=" + @client_id,
      "oauth_nonce=" + oauth_header_hash["oauth_nonce"],
      "oauth_signature_method=" + "HMAC-SHA1",
      "oauth_timestamp=" + oauth_header_hash["oauth_timestamp"],
      "oauth_version=" + "1.0"
    ]
    if request.get?
      uri = URI.parse(request.url)
      query_array = URI.decode_www_form(uri.query)
      base_string_array += query_array.map{|param| "#{CGI.escape(param[0])}=#{CGI.escape(param[1])}"} # takes care of opensocial_app_id and opensocial_owner_id as well
    else
      # don't need the actual parameters for post requests (mixi-specific awkwardness)
      base_string_array += ["opensocial_app_id=#{params["opensocial_app_id"]}", "opensocial_owner_id=#{params["opensocial_owner_id"]}"]
    end
    base_string_array.sort!
    url = request.original_url.gsub(/\?.*/, '') # can't use request.base_url + request.path because the trailing slash is sometimes(?) cut off
    baseString = request.method + "&" + CGI.escape(url) + "&" + CGI.escape(base_string_array.join("&")) # [] and utf-8 are doubly percent-encoded!! e.g. "テスト" becomes "%25E3%2583%2586%25E3%2582%25B9%25E3%2583%2588"!!!!!

    sha1 = OpenSSL::HMAC::digest(OpenSSL::Digest::SHA1.new, @client_secret + "&", baseString)
    base64 = Base64.strict_encode64(sha1)

    return CGI.escape(base64) == oauth_header_hash["oauth_signature"]
  end

(The above explanation and code show how to verify Mixi’s OAuth 1.0 signatures, as used with Japanese feature phone apps.)

Excel character counter / エクセル文字カウンター

At work, somebody sometimes needs to count the length of selected text in Excel cells. Doing it in one’s head is understandably unpleasant, so why not see if there’s software out there that can count the number of characters in selected text? Well, I got pretty close at http://www.donationcoder.com/Software/Skrommel/index.html#DragKing. Since this is written in AutoHotkey, I’d just have to modify it a bit to count while the mouse cursor is being dragged. (Watch the video below to see it in action.) Unfortunately, pressing Ctrl+V while selecting text doesn’t work, so we instead send the WM_COPY message, like this:

SendMessage 0x301,0,0,%Control%, %WinTitle%  ; WM_COPY

Also it’s not cool if the script gets rid of your clipboard contents, so we restore that too. Get the script here: moji_counter_v3.ahk
Note that it’s partially localized to Japanese.

会社に、エクセルセル内の文字を数えたり区切ったりする作業をしている人がいるのですが、頭の中で数えるのはあまり楽しくないので、この度、どうにかならないかという相談を受けました。早速ネットで調べてみると、http://www.donationcoder.com/Software/Skrommel/index.html#DragKingでほぼ同仕様のソフトがありました!しかもAutoHotkeyなので、すぐにうちの用途に合せられます。

DragKingではマウスボタンを外すとテキストをコピーして文字列の長さをカウントしているのですが、文字列を選択している間にカウントできたらもっと便利なんじゃないかなと思って…しかし、Ctrl+Vを押しても文字がコピーされないので、

SendMessage 0x301,0,0,%Control%, %WinTitle%  ; WM_COPY

変わりましてWM_COPYというメッセージを送ってコピーさせます。後、DragKingの仕様と違って、コピー・カウントの後、クリップボードの内容を復活させます。
AutoHotkeyスクリプトを下記のリンクでダウンロードできます。
moji_counter_v3.ahk

ブラック心理テスト

I recently started a job at a Japanese app development company that is mostly making fortune telling apps and the occasional personality test. I’m getting to work on some interesting stuff, including app development, web development, one-off utilities, and server administration. My first assignment was a personality test, and since it’s released now, I think I’m just going to link to it below.

最近転職し、占いアプリなどを取り扱っている日本の会社で活躍することになりました。「ブラック心理テスト」というアプリの開発を頼まれたのですが、ようやくリリースできましたのでリンクを貼らせていただきます〜

ブラック心理テスト@Play Store (Android)
ブラック心理テスト@App Store (iPhone)

まぁ、とても簡単なアプリですが、友だちが「楽しいかも!」と言ってくれました^^

Male and female kanji (And how to process ENAMDICT’s XML in Perl)

The other day, someone told me that the kanji 漢 (the 漢 in 漢字) usually indicates a male name. This made me wonder what other kanji there are that might indicate a female or male name. So I downloaded JMnedict.xml and processed it a little bit.

Side note: Both XML::LibXML and MySQL’s import from XML thingy turned out to be uselessly slow, so I did this using XML::Twig. Here’s some lazy code to extract all the <keb> elements and their associated name_type:

#!/usr/bin/perl -w

use XML::Twig;
use strict;

binmode(STDOUT, ":utf8");

my $t = XML::Twig->new(twig_handlers => { entry => \&entry });

$t->parsefile($ARGV[0]);

sub entry {
    my ($t, $entry) = @_;
    my ($keb, $name_type);

    eval {
        $keb = $entry->first_child("k_ele")->first_child("keb")->text;
        $name_type = $entry->first_child("trans")->first_child("name_type")->text;
        $entry->purge;
    };
    print "$keb\t$name_type\n" if (!$@);
}

Now we have a file that looks like this:

ゝ泉    given name or forename, gender not specified
〆      female given name or forename
...

Great names, eh? Then we just do:

grep female all_names.csv > female_names.csv
grep "[^e]male" all_names.csv > male_names.csv
perl -C -n -e 'while ($_ =~ s/(\p{Block=CJK_Unified_Ideographs})//) { print "$1\n" }' female_names.csv | sort -n | uniq -c | sort -n > female_kanji.csv # Perl's -C option enables unicode everywhere. Unfortunately, this option doesn't work on the #! line.
perl -C -n -e 'while ($_ =~ s/(\p{Block=CJK_Unified_Ideographs})//) { print "$1\n" }' male_names.csv | sort -n | uniq -c | sort -n > male_kanji.csv
sed -i "s/^\s*//" female_kanji.csv # fix leading whitespace chars from uniq -c
sed -i "s/^\s*//" male_kanji.csv

Then we put the two files into two different sheets of the same file in a spreadsheet program. (I used LibreOffice, but Excel is better. Seriously.) Call one sheet “Female”, the other “Male”, and on a third sheet, concatenate the two lists of kanji and filter out the duplicates (in column A), and use the following formulas for columns B to E, respectively:

=IF(ISERROR(VLOOKUP(A2, Female.$A$1:$B$1601, 2, 0)), 0, VLOOKUP(A2, Female.$A$1:$B$1601, 2, 0))
=IF(ISERROR(VLOOKUP(A2, Male.$A$1:$B$1601, 2, 0)), 0, VLOOKUP(A2, Male.$A$1:$B$1601, 2, 0))
=IF(ISERROR(B2/C2), 1000000, B2/C2)
=B2+C2

(In LibreOffice Calc, you use a period instead of an exclamation mark to reference cells on a different sheet in formulas. LibreOffice Calc doesn’t support the IFERROR() function. I know that 1,000,000 is not the answer to n/0, but we’d like a high number for sorting purposes.) Copy the formulas down and perhaps add the following headers in the top row: Kanji, Female, Male, Ratio, Count. Copy the whole thing without formulas to a new sheet, sort by ratio, and then by count. Perhaps filter out all the kanji that have a count < 10. Here’s a link to my files: kanji_usage_in_names.ods (OpenDocument) and kanji_usage_in_names.xlsx (Excel 2007+).

So it turns out that there are hundreds of kanji that are strong indicators for the gender of a name. By the way, JMnedict.xml’s data isn’t very good: for example, even names like 大介 aren’t gender-classified yet. We’ve got only 1,719 unique kanji for all the gender-classified names, 1,601 unique kanji for female names, and 873 unique kanji for male names. Pretty low and weird numbers. So don’t expect too much accuracy.

Automatically resizing images with Word VBA

For not-too-interested readers: Use the third code snippet.

It’s time for a new post on VBA! Somebody at work is creating a whole lot of training material in Word. Great. Fast forward a couple months, and all of a sudden it’s my job to fix these up. They have all kinds of formatting inconsistencies, and many of the screenshots were resized without preserving their aspect ratios.

I thought, great, I’ll just write some VBA that fixes the screenshots. I quickly came up with something like the following:

Sub resize_all_images_to_page_width()
    For Each inline_shape In ThisDocument.InlineShapes
        inline_shape.LockAspectRatio = msoFalse
        inline_shape.ScaleWidth = 100
        inline_shape.ScaleHeight = 100
        percent = doc.PageSetup.TextColumns.Width / inline_shape.Width
        inline_shape.ScaleWidth = percent * 100
        inline_shape.ScaleHeight = percent * 100
    Next
End Sub

This would in theory scale every image to fit to the page’s width.

And it worked. For almost all of the images. For some, it did something incomprehensible (Word 2010) and turned images into a very long and extremely thin strip of pixels. I tried various things (including .Reset) and did some googling, but couldn’t find a better way to fix the images. What I really wanted was to find out an image’s original width and height, just the way it’s displayed in the Format Picture dialog in the Size tab, but there’s no way to access those values.
So to get an image’s original size, I decided to create a new temporary document, copy and paste the image there, set the .ScaleWidth and .ScaleHeight options to 100, and then look at its .Width and .Height properties. Yay, there we go!
I also decided I don’t want to scale images beyond 100% and added some extra logic to handle that. Here’s the result:

Sub resize_all_images_up_to_page_width()
    Set current_doc = ThisDocument
    Set new_doc = Documents.Add(DocumentType:=wdNewBlankDocument)

    current_doc.Activate
    For Each ishape In current_doc.InlineShapes
        ' ishape.Copy ' doesn't work
        ishape.Select ' <work-around>
        Selection.Copy ' </work-around>
        new_doc.Content.PasteAndFormat (wdPasteDefault)
        Set new_ishape = new_doc.InlineShapes(1)
        new_ishape.LockAspectRatio = msoFalse
        new_ishape.ScaleWidth = 100
        new_ishape.ScaleHeight = 100
        ishape.LockAspectRatio = msoFalse
        If (new_ishape.Width > current_doc.PageSetup.TextColumns.Width) Then
            ishape.Width = current_doc.PageSetup.TextColumns.Width
            ishape.Height = (current_doc.PageSetup.TextColumns.Width / new_ishape.Width) * new_ishape.Height
        Else
            ishape.Width = new_ishape.Width
            ishape.Height = new_ishape.Height
        End If
        new_ishape.Delete
        ishape.LockAspectRatio = msoTrue
    Next
End Sub

And finally, we’d like to make sure images don’t get longer than the length of a page:

Sub resize_all_images_up_to_page_width()
    Set current_doc = ThisDocument
    Set new_doc = Documents.Add(DocumentType:=wdNewBlankDocument)
    page_width = current_doc.PageSetup.TextColumns.Width
    page_height = current_doc.PageSetup.PageHeight - current_doc.PageSetup.TopMargin - current_doc.PageSetup.BottomMargin

    current_doc.Activate
    For Each ishape In current_doc.InlineShapes
        ' ishape.Copy ' doesn't work
        ishape.Select ' <work-around>
        Selection.Copy ' </work-around>
        new_doc.Content.PasteAndFormat (wdPasteDefault)
        Set new_ishape = new_doc.InlineShapes(1)
        new_ishape.LockAspectRatio = msoFalse
        new_ishape.ScaleWidth = 100
        new_ishape.ScaleHeight = 100
        ishape.LockAspectRatio = msoFalse
        If (new_ishape.Width > page_width) Then
            If ((page_width / new_ishape.Width) * new_ishape.Height > page_height) Then
                ishape.Width = page_height / new_ishape.Height * page_width
                ishape.Height = page_height
            Else
                ishape.Width = page_width
                ishape.Height = (page_width / new_ishape.Width) * new_ishape.Height
            End If
        ElseIf (new_ishape.Height > page_height) Then ' going to be shrinking both height and width, and width is okay already, so it'll be even okayer
            ishape.Width = page_height / new_ishape.Height * new_ishape.Width
            ishape.Height = page_height
        Else
            ishape.Width = new_ishape.Width
            ishape.Height = new_ishape.Height
        End If
        new_ishape.Delete
        ishape.LockAspectRatio = msoTrue
    Next
End Sub

Hope this helps! If you have any  questions, feel free to leave a comment.

Twitter and Google+

We’re now on Twitter and Google+! Hope we’ll come up with some good tweets. Got to be careful not to turn it into a microblog for Visio haters. ;D

Wir sind jetzt auch auf Twitter und Google+! Hoffe, dass uns bald ein paar gute Tweets einfallen werden, und dass die Angelegenheit nicht zu einem Anti-Visio-Microblog verkommt. ;D

Twitter (ツイッター)Google+を始めました!ビジオに関するツイートを控えるようにします!約束です!(・∀・)

How to Create Word Documents from Excel using VBA and Fit Strings into Single Lines

On my day job, I was tasked to add a button to an Excel spreadsheet that converts that spreadsheet into a Word file, for improved formatting and aesthetics. This is a pretty straightforward task:

  1. Add a reference to the Microsoft Word Object Library (Tools → References) to get access to the wd-constants
  2. It’s easiest to access things in table cells, so create a Word template that uses tables wherever you want your macro to copy something over from the Excel file.
  3. Write your code around the following skeleton code:
Set objWord = CreateObject("Word.Application") ' Create Word object
Set objDoc = objWord.Documents.Add(Template:=ActiveWorkbook.Path & "\template.dotx") ' Open new document from given template
Set objTable = objDoc.Tables(1) ' Get reference to first table in template
objTable.Cell(1, 1).Range.Text = Cells(1, 1).Text ' Copy cell at row 1, column 1 in current Excel sheet to row 1, column 1 in first table in Word file
' Save file (we don't care about the .docx and just want a .pdf)
FileName = Application.GetSaveAsFilename(fileFilter:="PDF files (*.pdf), *.pdf") ' Note that this is Excel's dialog, but that doesn't matter, as we only need the user to enter a file name/select a file
objDoc.SaveAs2 save_name, wdFormatPDF
objWord.Quit wdDoNotSaveChanges
ActiveWorkbook.FollowHyperlink save_name

In case you need backwards compatibility with older versions of Office, add checks like the following before specifying a template or saving as a PDF file:

If (Application.Version > 12) Then
    ' ...
Else
    ' ...
End If

For older versions of Office, you will need a PDF printer to create PDFs. You can display the print dialog box by doing:

objWord.Dialogs(wdDialogFilePrint).Show

If, unlike me, you do want to display the Word document, get rid of the objWord.Quit line (and perhaps the lines related to saving to PDF) and do the following:

objWord.Visible = True

That’s that, and now we come to the actually interesting part. The part I’ve alluded to in the title of this post. Say you have a table in your Word template that gets populated with a person’s name and email address. You’ve carefully set your column widths to sane values, and don’t want them to change, so you’ve got the “Automatically resize to fit contents” option (Table Properties → Table → Options) turned off. What happens when you get a long name or email address? The name or email address gets split somewhere, and now you’ve got two lines. Ugly! You’ve tried the “Fit text” option (Table Properties → Cell → Options) and felt sorely disappointed. What do we do? Perhaps you’ve tried to access the row’s Height property, and quickly realized that it’s not very useful, as you either get a value of 9999,  or 12 if you’ve changed the height policy, along with a table row that doesn’t look like before. I am sure there are other ways to solve this problem, but this is what I did:

n_lines_before = objDoc.ComputeStatistics(wdStatisticLines) ' Get total number of lines in document
.Cell(3, 4).Range.Text = Cells(26, "AJ").Text
n_lines_after = objDoc.ComputeStatistics(wdStatisticLines) ' Get total number of lines in document, again
Do While (n_lines_after > n_lines_before) ' If the data didn't fit in our cell, the total number of lines will have increased
    .Cell(3, 4).Range.Font.Size = .Cell(3, 4).Range.Font.Size - 1 ' Make the font size a little smaller until the total number of lines returns to being the same as before again
    n_lines_after = objDoc.ComputeStatistics(wdStatisticLines)
Loop

This post is not exactly related to Qiqitori, but I hope it’s of some use to someone, somewhere! It certainly serves as documentation for the code I’ve written, if anything. If you have any questions or better ways of doing things, feel free to leave a comment.

Free eBook/Kostenloses eBook: 小松左京『お召し』Komatsu Sakyô: “Omeshi”/„Der Abruf“ (German)

In this post, we provide a download link to a free (German) eBook for your reading pleasure (if you can read German, that is): Komatsu Sakyô’s “Omeshi”. This is a German translation of a Japanese short story, so it doesn’t make much sense to write a post about this in English, soeverything below this paragraph will be in German. It might still be a good idea to convey a tiny bit of information in English, however: Komatsu Sakyô (1931-2011) (where “Komatsu” is the family name) was a Japanese science fiction author. “Omeshi” was first published in Japan’s S-Fマガジン (S-F Magazine) in 1964. As far as we know, this short story has never been translated into a western language before. We do not have the distribution rights for the Japanese version, and we do not have the rights to sell our translation for money. Nonetheless, donations are appreciated! If you don’t want to spend any money, it would be just as awesome (or even awesomer!) if you could tell possibly interested friends or acquaintances about Qiqitori, this blog, and/or about this eBook. Please note that we do not have the rights to pass on distribution rights of this eBook to you. This means that you shouldn’t just pass on the files you have downloaded here, but rather link to this blog post. We would license this eBook under a Creative Commons license if we could, but as far as we know, we can’t. :(

In diesem Blog-Eintrag stellen wir ein kostenloses eBook zur Verfügung: „Der Abruf“ von Komatsu Sakyô (1931-2011) (wobei Komatsu der Familienname ist), einem japanischen Science-Fiction-Autor. „Omeshi“ ist eine Kurzgeschichte und wurde das erste Mal im japanischen S-Fマガジン (S-F Magazine) im Jahre 1964 veröffentlicht. Nach unserem Wissen hat es bisher noch keine Übersetzungen in eine westliche Sprache gegeben. Wir sind leider nicht im Besitz der notwendigen Rechte, um die japanische Version zum kostenlosen Herunterladen zur Verfügung zu stellen, und die deutsche Version dürfen wir auch nur kostenlos anbieten, aber falls Sie merken, dass Ihnen das Lesen dieser Kurzgeschichte Spaß gemacht hat, würden wir uns um eine kleine Spende freuen. Noch besser wäre es jedoch, wenn Sie möglicherweise interessierte Freunde und Bekannte auf Qiqitori, dieses Blog und/oder dieses eBook aufmerksam machen könnten! Bitte beachten Sie, dass wir nicht im Besitz der Rechte sind, das Distributionsrecht dieser Übersetzung an Sie weiterzugeben. Das heißt, dass Sie möglicherweise Interessierten nicht die Datei selber, sondern einen Link zu diesem Blog-Eintrag geben sollten. Wir würden dieses eBook unter einer Creative-Commons-Lizenz veröffentlichen, wenn wir könnten, aber soweit wir wissen, dürfen wir das nicht. :(

Damit Sie dieses eBook auf möglichst angenehme Art und Weise lesen können, stellen wir es in verschiedenen Formaten zur Verfügung. Drücken Sie auf einen der folgenden Links, um das eBook herunterzuladen.

Aktuelle Versionsnummer: 1.0 für alle Formate.

Komatsu Sakyo – Der Abruf (PDF, A5-Format)

Komatsu Sakyo – Der Abruf (Mobipocket (.mobi), unter anderem für den Amazon Kindle)

Komatsu Sakyo – Der Abruf (EPUB, unter anderem für den Barnes & Noble Nook, Kobo eReader, Sony Reader und viele eBook-Apps auf Smartphones)

How to Calculate Distance Tables Using OpenStreetMaps

For an assignment on my job, I had to compute distances from a couple dozen locations in Japan (which I will call “source locations” in this article) to all postal codes that exist in Japan. Japan has approximately 123,401 postal codes, and we don’t want the straight line distance, but actual routes. So what do we do now? There’s no clear answer, but you may find an idea or two after reading this article.

Geocoding

“Geocoding” refers to the conversion of addresses to latitude/longitude pairs. In our case, (disregarding the couple dozen source locations,) addresses consist of only a postal code (and the country, which is the same for all addresses). Unfortunately, there is no publicly accessible database that matches postal codes with latitude/longitude pairs. (For Japan, at least. Maybe you’ll find that such a database exists for your country.) You could, however, cross-reference data from this database from the Ministry of Land, Infrastructure, Transport and Tourism and this one from the Japan Post website, but realize that this would be very hard because of differences in the spellings of names (e.g., 旭ヶ丘 vs. 旭ケ丘 and オシツクシ vs. 白糠町). It’s best to buy a database, especially if you’re doing this for a company. GeoPostcodes is one company that sells such databases, and at the time of this writing, the database for Japan costs €69.95.

You could also scrape the Google Maps API, but that would violate Google’s terms of service in most cases. You’d also be limited to (currently) 3,000 queries per IP per day. (You could also get access to the enterprise version for $10,000 per year. You’d still be limited to 100,000 requests per day. Hmm. I wonder if you’re allowed to resell your access.) At the time of this writing, you probably won’t find a geocoding API that will let you do this both comfortably and without violating its terms of service.

Calculating routes

With its 3,000 queries per IP per day limit, you won’t be able to use the Google Maps API or any other API (e.g., Bing Maps’ or Nokia’s) to calculate a couple million distances. (Probably not even mapquest open, which is free and merely requests that you ask for permission before firing off thousands of queries.) This is where OpenStreetMap comes in. You will be calculating routes on your own computer. You will find that there is a lot of open-source software to calculate routes using OpenStreetMap data. I’ve tested the following programs (in the following order):

  • Gosmore: Gave up after waiting over 24 hours for the conversion process from OpenStreetMap data (which is XML) to Gosmore’s data format to complete
  • Routino: Slow for long distances (think 60 seconds or more for routes that are longer than 1000 km), interface is extremely easy to use programmatically
  • Navit: Slightly buggy, reasonably fast, interface is either graphical or dbus-based and hard to use programmatically
  • OSRM: Extremely fast, interface is reasonably easy to use programmatically

OSRM is the clear winner here. OSRM manages to calculate even long routes within a few milliseconds. However, you’ll need some dead-serious hardware to convert OpenStreetMap data to a format OSRM can use. The conversion tool ended up using about 30 GB of memory (if I remember correctly) to convert OpenStreetMap data for Japan.

OSRM will start a multithreaded web server (on any port you wish). Within a perl script, you could perhaps perform queries like this:

$response = `curl -s \"http://127.0.0.1:5000/viaroute?loc=$lat1,$long1&loc=$lat2,$long2\"`;

OSRM by default returns JSON, and will by default return alternate routes in addition to the route it deems fastest. By the way, calculating all routes took less than half a day (using ~24 not-too-modern Opteron cores).

Occasionally, OSRM will fail to find a route between two points. In our case, this happened 5,078 times, and it happens for the following reasons:

  • There is no road nearby. (A single postal code can cover a large area.)
  • Isolated road network
    Isolated road network

    The road that is closest to the specified coordinates is not connected to the wider road network. You might be on an island somewhere. (E.g., Hokkaido! Note that routes in Hokkaido were calculated completely separately from the other routes, and the numbers and statistics in this article may disregard Hokkaido.) Or there might be an error in the OpenStreetMap data. (By the way, it would be great if you could correct some of these errors! Anybody can edit OpenStreetMap data.)

We can fix a large number of these broken routes by calculating routes to points in the vicinity of the latitude/longitude pair in question. I chose to check the points 200 m to the north, south, west, and east, and if that failed, incremented the distance to 400, 600, 800, and finally 1,000 m. This reduced the number of broken routes to 1,411.

So what do we do now? We get rid of all postal codes that point to an island. That will eliminate a couple hundred postal codes, and it’s fun, because you’ll get to click around a lot, just like in a game! We’re going to add map markers on a Google Map for all broken postal codes and then click away and make a list of the ones we don’t need. How do you add markers on a Google Map? The answer is only one Google search away. We take the code from the JSFiddle demo and modify it a little bit to make markers go away when we click them and add the relevant latitude/longitude pair to a text box. That leaves only 336 postal codes. Here’s a link to the modified code: display_lat_long_pairs_on_map_bad_zips.html

So what do you do with the remaining 336 postal codes? That’s up to you to decide. I suggest trying mapquest open.

Other facts about postal codes

  • There are buildings in Japan that have multiple postal codes, e.g., one for each floor.
  • The term “ZIP code” only applies to postal codes in the US. “ZIP” is short for “Zone Improvement Plan”. You can read all about it in the Wikipedia article.
  • The number of Japanese postal codes changes (very slightly) every month. (You can find a CSV file that matches postal codes with addresses on the Japan Post website. There you’ll also find small files that contain the updates to this database.) Codes get added, deleted, and re-assigned.

If you have any questions, feel that I’ve left out something important, know of any reasonable alternatives to using OSRM for this, or just found this article helpful, please leave a comment! And if you need distance tables and don’t feel like calculating them yourself, feel free to ask me. I’ll probably manage to get them calculated for you very quickly (for a very modest fee, which mostly depends on if I need to buy external databases).

On Buying a 定期券 (teikiken, Commuter Pass) and Long Commutes in Japan

Wow, my first post about Japan! I like it here, but unfortunately, I don’t have anything particularly wonderful to relate to you today. In fact, this might very well come across as a complaint!

I live near Tokyo, and the other day I bought a teikiken (commuter pass that is valid for at least one month) that is (among other routes) valid between Shinjuku and Shinagawa (using the Yamanote line). At ¥5,670 for one month, it wasn’t too expensive, in my opinion. This part of the commute would normally cost ¥190 without a commuter pass and the average number of workdays per month is 21.741 (this figure doesn’t include public holidays). So that is ¥190*21.741, which equals ¥4,130.79. Oops, normally you don’t just want to get back from work, it is often expected that you actually go to work, too! So we need to multiply this by two, and then we get ¥8,261.58. So we save about ¥2,591, or 31%. Not bad. (Note that as pointed out in the comments, the amount you save varies depending on the route, and may just barely reach break-even in some cases!)

As you might have guessed, there’s something about this recent experience that I haven’t told you yet. You can apparently buy commuter passes at ticket vending machines, but I opted to buy mine at a JR ticket counter. Mostly because I didn’t want to make any mistakes. And who knows what might have happened, because I have a special Suica card that you can only get when you buy a Narita Express+Suica bundle. (It looks much better than a regular Suica card.) Anyway, the woman at the ticket counter told me that I have to buy a regular one; special ones can’t be turned into commuter passes. There’s a ¥500 fee, but you get those ¥500 back if you decide you don’t want your card anymore. So no problem there. One more thing: I need my commuter pass for four months, and you can only buy commuter passes that are valid for 1, 3, or 6 months. So I had the choice between buying a one-month pass first and a three-month pass later, buying both at the same time, or buying a three-month pass first and a one-month pass later. If you ever find yourself in a situation like this, definitely buy the one that’s valid for the shorter period first. That way you can maybe switch to a different route later, or maybe you’ll realize that you’ve been scammed somehow. Who knows, right?

The night I bought the ticket, I met up with a friend in Shinbashi, which is on the Yamanote line, but not between Shinjuku and Shinagawa. (It’s three more stops from Shinagawa.) I sort of had a feeling that I was going to be disappointed with the way your fare is handled when you want to go a couple stations farther than your commuter pass is valid for, and my hunch was right: you pay the full fare from the station where your commuter pass ends to the station you get off, which means paying ¥150 for a trip from, e.g., Shinjuku to Tamachi, which is one stop farther than Shinagawa, even though the normal fare from Shinjuku to Tamachi is ¥190. So you’ve already got a ticket that covers 82.8% of the total distance of your journey, but you have to pay an extra 78.9% for the remaining 17.2%.

Careful readers might have noticed that the fare from Shinjuku to Tamachi is the same as from Shinjuku to Shinagawa. When I got home the night I bought the ticket, I decided to check what a commuter pass from Shinjuku to Tamachi would cost. And amazingly enough, the price is exactly the same. In fact, the price is the same or less, almost no matter where you want to go on the Yamanote line. (However, since the Yamanote line is a loop line, you won’t be able to get off for free at any stations on the opposite semicircle.) I would have expected someone who sells commuter passes to tell customers about this, and feel that it’s borderline fraud not to do so. Expensive fares* and packed trains had me slightly miffed at Japanese railway companies before this, but this time I’m fairly annoyed.

* What in the world do they do with all that fare money? By the way, according to this article, approximately one passenger per car is enough to cover the electricity bill for trains.

It turns out that at least one person has written an online tool that tells you how to buy your commuter pass: http://teiki.mooo.com/ (Japanese) This site also includes a tool that searches for regular, non-teikiken routes for Tokyo and Tokyo’s suburbs. For one-way trips, it’ll normally produce the same output as Yahoo!, (with the 表示順序 (hyōji-junjo, display order) set to 料金が安い順 (ryōkin ga yasui jun, order by fare, cheap to expensive),  except that Yahoo!’s route calculator sometimes won’t give you the cheapest route if it decides that the next time this route is available is too many minutes in the future… However, especially for round-trips, it’s sometimes cheaper to get a day pass, and teiki.mooo.com will tell you when that is the case.

My commute is two hours every day. Once I’ve reached Shinjuku, it feels like I’ve practically almost reached my workplace. How does one survive a two-hour commute? By sleeping, of course! Unfortunately, there usually aren’t any seats available where I get on the train. So what do you do in this case? You stand next to somebody who’s wearing a high school uniform. These people don’t have long commutes, and you’ll be able to get their seat after a couple stops! By the way, I used to use a different, slightly faster route that pretty much guarantees that you’ll get a seat. Unfortunately, the company I work for only pays for the cheapest route available. Another by the way: the one-way tickets are only an extra ¥20 for this route, but the commuter pass is an extra ¥8,000. Similarly, there is a route that is ¥30 cheaper than the route via Shinjuku, but the commuter pass is an extra ¥7,000… (Its trains are also extremely packed.) By the way, if your company says they’ll pay for a commuter pass, they’ll probably also pay for one-way tickets before you manage to get your commuter pass. Use that time to find a decent route! If they don’t, figure out how much you save if you had a commuter pass, and if using one-way tickets for a while isn’t that much more expensive, consider using these for a while.

Now I’d like to share one more mind-boggling tidbit: let’s pretend we’re taking a train from Odawara to Shinagawa. Let’s not take a look at the cheapest route, which uses the Odakyū line, but rather JR’s Tōkaidō line via Yokohama: ¥1,280. Now let’s see how much we pay if we change to the Keikyū line in Yokohama: ¥1,240. Next we’ll look at the Tōkaidō line’s fare from Yokohama to Shinagawa: ¥280. Now you’d naturally expect the Keikyū line’s fare from Yokohama to Shinagawa to be less than that, wouldn’t you? Sorry to disappoint you, but it’s ¥290! And here’s the explanation (Japanese).

There’s one more thing to add about the route from Odawara to Shinagawa. If you want to buy a teikiken, it’s cheaper to get a 分割定期券 (bunkatsu teikiken, split commuter pass). This basically divides your route into a part from Odawara to Yokohama, and a part from Yokohama to Shinagawa. You can put this on a single Suica card, though the person at the counter might tell you that is not possible, if he or she is slightly inexperienced. Well, okay, it used to be impossible. Anyway, you don’t have to get off and on again at Yokohama, you just stay on your train. However, if you want a commuter pass from Odawara to Shinagawa, you might want to decide against a split one, because you can’t combine these with 特急料金回数券 (tokkyū ryōkin kaisūken, kaisūken: usually 11 tickets for the price of 10, or 10 tickets for a cheaper price; this kaisūken is an add-on to your normal teikiken, and covers the price difference for the shinkansen (tokkyū ryōkin)). And if you do commute from Odawara to Shinagawa, I am sure you will want to use the Shinkansen occasionally. (The kaisūken is ¥9,500 for 10 tickets.) I like this. The JR are doing something right for once! By the way, the tokkyū ryōkin for this route is ¥1,680, so this kaisūken is way better value (about 43% off) than when you buy 11 tickets for the price of 10 (about 9% off).

Okay, one more annoyance that I’ve accidentally discovered: if you enter Akihabara station, realize that you actually need to get on the train from Iwamotochō, and decide to use the exit closer to Iwamotochō station on your way out of Akihabara station, you’ll get money stolen from your Suica card! Fortunately, the ticket gate will beep and not let you through automatically (if I remember correctly). You’ll have to go the person watching the ticket gates. They’ll tell you that you have to either go back to the exit you came from, or pay (approximately) ¥120 to use this exit. Lovely!

And that’s it for now. If you’ve had any annoying experiences regarding Japanese trains, please leave a comment. Don’t let it eat you up! :P