Automatically resizing images with Word VBA

For not-too-interested readers: Use the third code snippet.

It’s time for a new post on VBA! Somebody at work is creating a whole lot of training material in Word. Great. Fast forward a couple months, and all of a sudden it’s my job to fix these up. They have all kinds of formatting inconsistencies, and many of the screenshots were resized without preserving their aspect ratios.

I thought, great, I’ll just write some VBA that fixes the screenshots. I quickly came up with something like the following:

Sub resize_all_images_to_page_width()
    For Each inline_shape In ThisDocument.InlineShapes
        inline_shape.LockAspectRatio = msoFalse
        inline_shape.ScaleWidth = 100
        inline_shape.ScaleHeight = 100
        percent = doc.PageSetup.TextColumns.Width / inline_shape.Width
        inline_shape.ScaleWidth = percent * 100
        inline_shape.ScaleHeight = percent * 100
    Next
End Sub

This would in theory scale every image to fit to the page’s width.

And it worked. For almost all of the images. For some, it did something incomprehensible (Word 2010) and turned images into a very long and extremely thin strip of pixels. I tried various things (including .Reset) and did some googling, but couldn’t find a better way to fix the images. What I really wanted was to find out an image’s original width and height, just the way it’s displayed in the Format Picture dialog in the Size tab, but there’s no way to access those values.
So to get an image’s original size, I decided to create a new temporary document, copy and paste the image there, set the .ScaleWidth and .ScaleHeight options to 100, and then look at its .Width and .Height properties. Yay, there we go!
I also decided I don’t want to scale images beyond 100% and added some extra logic to handle that. Here’s the result:

Sub resize_all_images_up_to_page_width()
    Set current_doc = ThisDocument
    Set new_doc = Documents.Add(DocumentType:=wdNewBlankDocument)

    current_doc.Activate
    For Each ishape In current_doc.InlineShapes
        ' ishape.Copy ' doesn't work
        ishape.Select ' <work-around>
        Selection.Copy ' </work-around>
        new_doc.Content.PasteAndFormat (wdPasteDefault)
        Set new_ishape = new_doc.InlineShapes(1)
        new_ishape.LockAspectRatio = msoFalse
        new_ishape.ScaleWidth = 100
        new_ishape.ScaleHeight = 100
        ishape.LockAspectRatio = msoFalse
        If (new_ishape.Width > current_doc.PageSetup.TextColumns.Width) Then
            ishape.Width = current_doc.PageSetup.TextColumns.Width
            ishape.Height = (current_doc.PageSetup.TextColumns.Width / new_ishape.Width) * new_ishape.Height
        Else
            ishape.Width = new_ishape.Width
            ishape.Height = new_ishape.Height
        End If
        new_ishape.Delete
        ishape.LockAspectRatio = msoTrue
    Next
End Sub

And finally, we’d like to make sure images don’t get longer than the length of a page:

Sub resize_all_images_up_to_page_width()
    Set current_doc = ThisDocument
    Set new_doc = Documents.Add(DocumentType:=wdNewBlankDocument)
    page_width = current_doc.PageSetup.TextColumns.Width
    page_height = current_doc.PageSetup.PageHeight - current_doc.PageSetup.TopMargin - current_doc.PageSetup.BottomMargin

    current_doc.Activate
    For Each ishape In current_doc.InlineShapes
        ' ishape.Copy ' doesn't work
        ishape.Select ' <work-around>
        Selection.Copy ' </work-around>
        new_doc.Content.PasteAndFormat (wdPasteDefault)
        Set new_ishape = new_doc.InlineShapes(1)
        new_ishape.LockAspectRatio = msoFalse
        new_ishape.ScaleWidth = 100
        new_ishape.ScaleHeight = 100
        ishape.LockAspectRatio = msoFalse
        If (new_ishape.Width > page_width) Then
            If ((page_width / new_ishape.Width) * new_ishape.Height > page_height) Then
                ishape.Width = page_height / new_ishape.Height * page_width
                ishape.Height = page_height
            Else
                ishape.Width = page_width
                ishape.Height = (page_width / new_ishape.Width) * new_ishape.Height
            End If
        ElseIf (new_ishape.Height > page_height) Then ' going to be shrinking both height and width, and width is okay already, so it'll be even okayer
            ishape.Width = page_height / new_ishape.Height * new_ishape.Width
            ishape.Height = page_height
        Else
            ishape.Width = new_ishape.Width
            ishape.Height = new_ishape.Height
        End If
        new_ishape.Delete
        ishape.LockAspectRatio = msoTrue
    Next
End Sub

Hope this helps! If you have any  questions, feel free to leave a comment.

How to Create Word Documents from Excel using VBA and Fit Strings into Single Lines

On my day job, I was tasked to add a button to an Excel spreadsheet that converts that spreadsheet into a Word file, for improved formatting and aesthetics. This is a pretty straightforward task:

  1. Add a reference to the Microsoft Word Object Library (Tools → References) to get access to the wd-constants
  2. It’s easiest to access things in table cells, so create a Word template that uses tables wherever you want your macro to copy something over from the Excel file.
  3. Write your code around the following skeleton code:
Set objWord = CreateObject("Word.Application") ' Create Word object
Set objDoc = objWord.Documents.Add(Template:=ActiveWorkbook.Path & "\template.dotx") ' Open new document from given template
Set objTable = objDoc.Tables(1) ' Get reference to first table in template
objTable.Cell(1, 1).Range.Text = Cells(1, 1).Text ' Copy cell at row 1, column 1 in current Excel sheet to row 1, column 1 in first table in Word file
' Save file (we don't care about the .docx and just want a .pdf)
FileName = Application.GetSaveAsFilename(fileFilter:="PDF files (*.pdf), *.pdf") ' Note that this is Excel's dialog, but that doesn't matter, as we only need the user to enter a file name/select a file
objDoc.SaveAs2 save_name, wdFormatPDF
objWord.Quit wdDoNotSaveChanges
ActiveWorkbook.FollowHyperlink save_name

In case you need backwards compatibility with older versions of Office, add checks like the following before specifying a template or saving as a PDF file:

If (Application.Version > 12) Then
    ' ...
Else
    ' ...
End If

For older versions of Office, you will need a PDF printer to create PDFs. You can display the print dialog box by doing:

objWord.Dialogs(wdDialogFilePrint).Show

If, unlike me, you do want to display the Word document, get rid of the objWord.Quit line (and perhaps the lines related to saving to PDF) and do the following:

objWord.Visible = True

That’s that, and now we come to the actually interesting part. The part I’ve alluded to in the title of this post. Say you have a table in your Word template that gets populated with a person’s name and email address. You’ve carefully set your column widths to sane values, and don’t want them to change, so you’ve got the “Automatically resize to fit contents” option (Table Properties → Table → Options) turned off. What happens when you get a long name or email address? The name or email address gets split somewhere, and now you’ve got two lines. Ugly! You’ve tried the “Fit text” option (Table Properties → Cell → Options) and felt sorely disappointed. What do we do? Perhaps you’ve tried to access the row’s Height property, and quickly realized that it’s not very useful, as you either get a value of 9999,  or 12 if you’ve changed the height policy, along with a table row that doesn’t look like before. I am sure there are other ways to solve this problem, but this is what I did:

n_lines_before = objDoc.ComputeStatistics(wdStatisticLines) ' Get total number of lines in document
.Cell(3, 4).Range.Text = Cells(26, "AJ").Text
n_lines_after = objDoc.ComputeStatistics(wdStatisticLines) ' Get total number of lines in document, again
Do While (n_lines_after > n_lines_before) ' If the data didn't fit in our cell, the total number of lines will have increased
    .Cell(3, 4).Range.Font.Size = .Cell(3, 4).Range.Font.Size - 1 ' Make the font size a little smaller until the total number of lines returns to being the same as before again
    n_lines_after = objDoc.ComputeStatistics(wdStatisticLines)
Loop

This post is not exactly related to Qiqitori, but I hope it’s of some use to someone, somewhere! It certainly serves as documentation for the code I’ve written, if anything. If you have any questions or better ways of doing things, feel free to leave a comment.

How to Calculate Distance Tables Using OpenStreetMaps

For an assignment on my job, I had to compute distances from a couple dozen locations in Japan (which I will call “source locations” in this article) to all postal codes that exist in Japan. Japan has approximately 123,401 postal codes, and we don’t want the straight line distance, but actual routes. So what do we do now? There’s no clear answer, but you may find an idea or two after reading this article.

Geocoding

“Geocoding” refers to the conversion of addresses to latitude/longitude pairs. In our case, (disregarding the couple dozen source locations,) addresses consist of only a postal code (and the country, which is the same for all addresses). Unfortunately, there is no publicly accessible database that matches postal codes with latitude/longitude pairs. (For Japan, at least. Maybe you’ll find that such a database exists for your country.) You could, however, cross-reference data from this database from the Ministry of Land, Infrastructure, Transport and Tourism and this one from the Japan Post website, but realize that this would be very hard because of differences in the spellings of names (e.g., 旭ヶ丘 vs. 旭ケ丘 and オシツクシ vs. 白糠町). It’s best to buy a database, especially if you’re doing this for a company. GeoPostcodes is one company that sells such databases, and at the time of this writing, the database for Japan costs €69.95.

You could also scrape the Google Maps API, but that would violate Google’s terms of service in most cases. You’d also be limited to (currently) 3,000 queries per IP per day. (You could also get access to the enterprise version for $10,000 per year. You’d still be limited to 100,000 requests per day. Hmm. I wonder if you’re allowed to resell your access.) At the time of this writing, you probably won’t find a geocoding API that will let you do this both comfortably and without violating its terms of service.

Calculating routes

With its 3,000 queries per IP per day limit, you won’t be able to use the Google Maps API or any other API (e.g., Bing Maps’ or Nokia’s) to calculate a couple million distances. (Probably not even mapquest open, which is free and merely requests that you ask for permission before firing off thousands of queries.) This is where OpenStreetMap comes in. You will be calculating routes on your own computer. You will find that there is a lot of open-source software to calculate routes using OpenStreetMap data. I’ve tested the following programs (in the following order):

  • Gosmore: Gave up after waiting over 24 hours for the conversion process from OpenStreetMap data (which is XML) to Gosmore’s data format to complete
  • Routino: Slow for long distances (think 60 seconds or more for routes that are longer than 1000 km), interface is extremely easy to use programmatically
  • Navit: Slightly buggy, reasonably fast, interface is either graphical or dbus-based and hard to use programmatically
  • OSRM: Extremely fast, interface is reasonably easy to use programmatically

OSRM is the clear winner here. OSRM manages to calculate even long routes within a few milliseconds. However, you’ll need some dead-serious hardware to convert OpenStreetMap data to a format OSRM can use. The conversion tool ended up using about 30 GB of memory (if I remember correctly) to convert OpenStreetMap data for Japan.

OSRM will start a multithreaded web server (on any port you wish). Within a perl script, you could perhaps perform queries like this:

$response = `curl -s \"http://127.0.0.1:5000/viaroute?loc=$lat1,$long1&loc=$lat2,$long2\"`;

OSRM by default returns JSON, and will by default return alternate routes in addition to the route it deems fastest. By the way, calculating all routes took less than half a day (using ~24 not-too-modern Opteron cores).

Occasionally, OSRM will fail to find a route between two points. In our case, this happened 5,078 times, and it happens for the following reasons:

  • There is no road nearby. (A single postal code can cover a large area.)
  • Isolated road network
    Isolated road network

    The road that is closest to the specified coordinates is not connected to the wider road network. You might be on an island somewhere. (E.g., Hokkaido! Note that routes in Hokkaido were calculated completely separately from the other routes, and the numbers and statistics in this article may disregard Hokkaido.) Or there might be an error in the OpenStreetMap data. (By the way, it would be great if you could correct some of these errors! Anybody can edit OpenStreetMap data.)

We can fix a large number of these broken routes by calculating routes to points in the vicinity of the latitude/longitude pair in question. I chose to check the points 200 m to the north, south, west, and east, and if that failed, incremented the distance to 400, 600, 800, and finally 1,000 m. This reduced the number of broken routes to 1,411.

So what do we do now? We get rid of all postal codes that point to an island. That will eliminate a couple hundred postal codes, and it’s fun, because you’ll get to click around a lot, just like in a game! We’re going to add map markers on a Google Map for all broken postal codes and then click away and make a list of the ones we don’t need. How do you add markers on a Google Map? The answer is only one Google search away. We take the code from the JSFiddle demo and modify it a little bit to make markers go away when we click them and add the relevant latitude/longitude pair to a text box. That leaves only 336 postal codes. Here’s a link to the modified code: display_lat_long_pairs_on_map_bad_zips.html

So what do you do with the remaining 336 postal codes? That’s up to you to decide. I suggest trying mapquest open.

Other facts about postal codes

  • There are buildings in Japan that have multiple postal codes, e.g., one for each floor.
  • The term “ZIP code” only applies to postal codes in the US. “ZIP” is short for “Zone Improvement Plan”. You can read all about it in the Wikipedia article.
  • The number of Japanese postal codes changes (very slightly) every month. (You can find a CSV file that matches postal codes with addresses on the Japan Post website. There you’ll also find small files that contain the updates to this database.) Codes get added, deleted, and re-assigned.

If you have any questions, feel that I’ve left out something important, know of any reasonable alternatives to using OSRM for this, or just found this article helpful, please leave a comment! And if you need distance tables and don’t feel like calculating them yourself, feel free to ask me. I’ll probably manage to get them calculated for you very quickly (for a very modest fee, which mostly depends on if I need to buy external databases).