Building a new ZX80 / Making PCBs in Inkscape

A while ago, I watched a series of videos by DrMattRegan on the ZX80 and was very impressed with the uber-hacks the designers put in there. I’m not going to go into much detail here, but here is a set of facts that may pique your interest:

  • The CPU’s A6 pin is permanently shorted to its \INT pin
  • The ZX80 has only 1 KB of RAM
  • The RAM is SRAM but the CPU’s internal DRAM refresh counter is not wasted

Additionally, the ZX81 was one of the first retro computers I ever repaired (article 1, article 2), which made the prospects of getting a ZX80 of my own even more attractive to me.

You have a lot of options if you want to build a ZX80 yourself. First of all, all the ICs (except the 2114 SRAM) can be bought new. There are multiple PCB designs that you can download for free, some electrically equivalent, some using chips that are a little more common than the ones on the original PCB. (For example, apart from the 2114 RAM, the 74LS93 is pretty rare because it is essentially useless nowadays. It was probably cheaper than the other 74-series counter ICs about 50 years ago, because a lot of “maybe nice to have” features were removed. It can easily be replaced using a different 74-series counter IC, and these days there really is no price difference.)

One thing that I really liked about the ZX81 that I worked on was the curved traces and the absence of solder mask on the traces. On this page by Grant Searle, you can find instructions to re-create the original PCB. There is a PDF that contains high-res images that look like this:

These can be printed out and turned into PCBs. I’m not much of a chemist, and even if the etching and tinning processes went well, I’d still need to do a bunch of drilling and then plate the newly drilled vias. Nowadays, there are companies such as PCBWay and JLCPCB that can do all this for you, and in an automated fashion, and for very attractive prices! (I do not intend to steer you away from etching PCBs yourself, in fact it always impresses me when people make PCBs themselves!)

However, these companies (currently?) do not accept bitmaps (as far as I know), they want Gerber files. So I traced the (600 dpi) bitmaps in Grant Searle’s PDF file and sent them out. I chose JLCPCB after some research because I found a pic of a board manufactured by JLCPCB that didn’t have any solder mask applied, and it looked pretty much the way I wanted it! However, I’m sure that PCBWay could do the same.

In order to get a board without solder mask, you have to get rid of the solder mask layer in the SVG or Gerber file, and you probably should also add a comment stating that this is intentional. I added this in my order, no guarantees the Chinese is correct: “The Gerber files don’t have a solder mask. This is intentional. Please do not apply a solder mask. Gerber文件没有焊膜。这是有意的。请不要应用焊膜。 我们有会说中文的人,所以如果你有什么问题,可以说中文。” I ordered 5 boards (which is the minimum order), lead-free HASL, white silkscreen (“ink-jet/screen printing silkscreen”), no gold fingers (not present on the original PCB either), regular PCB thickness (1.6 mm), regular outer copper weight (1 oz), regular via covering (tented), no castellated holes. JLCPCB’s customer service recommended I switch to ENIG, but it worked out fine with regular lead-free HASL. The price was $18.02 plus $11.70 shipping, minus the first time discount ($5).

When exporting the Gerber files in KiCad, you should follow the PCB manufacturer’s recommended settings. JLCPCB and probably most other places have a support page for this, this is JLCPCB’s.

A glimpse of the outcome

Tracing bitmapped PCB foils

I used Inkscape to trace the PCBs in the PDF. Tracing is easy, you just load a bitmap and then go to “Path” -> “Trace Bitmap…”. But how do you convert that into Gerber files? Well, there is an extension that converts SVG files to KiCad files, svg2shenzhen.

It still took me many, many hours though. Why?

Converting traced holes to actual drill holes

svg2shenzhen expects “circle” shapes in a layer called “Drill”. If that layer doesn’t exist and/or it doesn’t contain circles, there won’t be any drill holes in your Gerber file. I wrote a very simple Inkscape extension to do this: Converting paths to circles in Inkscape

But first, we need to separate out the holes into a single layer. To do this, we first break apart the path containing all the traces (“Path” -> “Break Apart”). Then everything that isn’t connected together, and everything overlapping other things will exist as separate paths. Holes overlap other things and thus will be separate paths. And they will be obscured by the outer path. Now you just click on the surrounding path and remove it, perhaps like this:

One more consideration is hole size. Though the details are a little hazy now, selecting multiple circles and changing their size all at once didn’t work very well for me in Inkscape, so I just edited the SVG file directly.

The traced circles are paths that more or less look like circles, sure. But they’re often slightly oval or otherwise unshapely and not all the same size. So when you (after using the extension) crack open the SVG in a text editor, you may see that the radii are slightly different everywhere:

holes_circles.svg:    <circle
holes_circles.svg-       cx="50.392669494501675"
holes_circles.svg-       cy="193.60723742169256"
holes_circles.svg-       r="0.3154674216925315"
holes_circles.svg:       id="circle12107" />
holes_circles.svg:    <circle
holes_circles.svg-       cx="64.40136232421558"
holes_circles.svg-       cy="193.60780999999997"
holes_circles.svg-       r="0.31595000000000084"
holes_circles.svg:       id="circle12109" />
holes_circles.svg:    <circle
holes_circles.svg-       cx="78.4180010571009"
holes_circles.svg-       cy="193.60854169447143"
holes_circles.svg-       r="0.3165816944714095"
holes_circles.svg:       id="circle12111" />

I just did a find and replace operation here and used holes sizes that seemed good to me (after JLCPCB support telling me that my holes were rather small). And damn, by sheer dumb luck, I chose the best hole sizes ever (0.4 mm) and my jumper wires fit right into them without using clips; they make almost perfect contact. Maybe 0.375 mm or so would have been even better? (The other hole sizes I chose were 0.8 mm and 1 mm.) (Note: PCB manufacturers don’t have every drill bit in the universe, so 0.4 mm and 0.375 mm may just end up being the same drill bit.)

Excellent hole size.

Update 2024-04-19: the holes for the headphone/microphone/power connectors are too tight though.
:(

Adhering to minimum spacing constraints

PCB manufacturers require that certain spacing constraints are observed. For example, traces may need to be 0.125 mm apart. The original bitmap you have may or may not adhere to the spacing constraints. I don’t know whether the bitmap in the PDF adheres to them! But I certainly do know that my order was rejected because the traces in my Gerber file weren’t quite up to snuff.

How would I even find out what my minimum spacing is? Well, I found this page by yet another company that makes PCBs: https://instantdfm.bayareacircuits.com/. This page analyzes your Gerber file and sends you a snazzy PDF with close-ups of your horrible transgressions. This is a screenshot from an early version of my Gerber files:

2.97 mil… that’s 0.075438 mm. Less than 0.125 mm!

So I manually fixed this location and re-submitted, and sure enough, there were plenty of other similar narrow gaps. So I decided there must be a way to get Inkscape to find these critical regions, and this is what I came up with:

I just gave every object a 0.125/2 mm = 0.0625 mm border, and then zoomed in a bunch to look for objects that were touching each other. (Actually I probably added some to that value, but a couple months have passed and I don’t remember.)

Border color is set to red here. Lots of… intimate traces!

Mistakes

As mentioned earlier, in Inkscape, when you break apart a path that contains “holes”, you’ll end up with a large mass obscuring the holes. Here’s a video of exactly that:

Look at the large black regions with holes, especially near the top left. The holes will “disappear” (they will be obscured) after breaking apart the path.

Well, I failed to notice that a number of holes within these black regions had been obscured. And thus they came out like this on the PCB:

IC3 and IC14 have their left pins all sewn together. There were a couple similar spots elsewhere, but you get the idea.

While annoying, I was able to fix this using a utility knife. Boy, that utility knife’s blades went dull quick. (Luckily, my utility knife is one where you can just snap off consumed blade chunks. Having done this for the first time in my life, it was quite scary, to be honest. I did it behind a glass window.)

Very beautiful, I know. However, once you have soldered the IC sockets, all this is mostly hidden underneath the sockets.

Procurement and assembly

After taking care of these mishaps, it was time to do a bunch of other stuff, such as celebrating Christmas and the New Year, and at some point bite the bullet and solder sockets and resistors and capacitors. I soldered everything using lead-free solder, and most or maybe even all of the components I put on the PCB are lead-free too, including the Z80! So maybe this is the first RoHS-compatible ZX81?! (Except the 2114 SRAM chips most likely aren’t RoHS-compatible. We’ll talk about those in a later section, BTW.)

I had half of the 74-series logic chips in stock. (From when I got Ben Eater’s DIY 8-bit computer, which I never assembled. I used the breadboards and some of the other components for a host of other things though!) The other half came from a small electronics shop right next to my office in Machida, サトー電気. I wanted a RoHS Z80 (print on chip ends in -PEG rather than -PEC), and bought it off Amazon.

Update 2024-04-19: Zilog/Littelfuse are reportedly discontinuing the original Z80 CPUs after almost 50 years of production! Maybe get them while they’re still available. Only source so far: https://www.mouser.com/PCN/Littelfuse_PCN_Z84C00.pdf (zilog.com still seems to list everything as “active” at the time of this writing)

For the ROM I’m just using my EEPROM that I’ve been using in other projects. Since it’s huge and has a slightly different pinout, I’m currently using a breadboard as an adapter.

Putting jumper wires in IC sockets permanently damages the sockets. Here I am plugging the jumper wires into a sacrificial socket that I plug into the soldered socket. This way the soldered socket doesn’t get damaged.

The only thing I couldn’t get anywhere, including Akihabara, is the 6.5 MHz oscillator. Many people use a 6.5536 MHz crystal in their ZX80 builds (hmm, I’ve seen a number like that before!) because the 6.5 MHz ones are pretty rare. But even those didn’t seem to exact in my neck of the woods. I ordered a 6.5 MHz oscillator off Digikey (through Marutsu) and will probably have it in a few days. (By the way, the original ZX80 used a 6.5 MHz ceramic resonator, but these seem to be just as unavailable. But who knows, maybe I could have gotten a 6 MHz one and filed it down a bit to get it to 6.5 MHz? But crystals work even better, and are probably a little better for the environment because ceramic resonators are made of lead zirconate titanate. Note that they are exempted from RoHS regulations and can be labeled RoHS3-compliant as long as their leads do not contain lead, I guess.) So I’m using a Raspberry Pi Pico to generate the clock signal for now.

I had some problems with the clock though:

  • I forgot to solder on R20. This resistor is needed in the clock circuit. Fixed using jumper wires.
  • I changed the Pico’s output pin from GPIO0 to GPIO2 and failed to re-wire. I then suspected the Z80 was bad and tested it on a breadboard with the data pins tied to ground to emulate NOPs. Eventually figured out the problem (dur). It was difficult to figure out at first because the circuitry on the PCB picked up the output from GPIO2 as noise, and amplified it to produce something like a clock signal. Except that this was a very dirty clock signal (see image below)! It is interesting that the Z80 seems to detect that the clock is a little cuckoo, and while it works for a split second, it quickly calls it quits and the address bus freezes in a random state. That seems like a useful thing to be aware of! (Could of course be that this is not an intentional feature, or that it’s just seeing a HALT instruction or something.)
  • While testing the Z80 on the breadboard, I set the Pico’s clock output to 3.25 MHz. I then forgot to change it back to 6.5 MHz.
Temporary R20 fix
Dirty clock that quickly made the Z80 seize up.

RAM problem

The 2114 RAM chips originally came out of the Commodore PET that I fixed two years ago. I removed them from the PET because they were a little faulty, but kept them because they still mostly worked. Lucky! I had three, and hoped two of them would work well enough to at least get the ZX80 booted. Well, I picked a lucky one and an unlucky one. I didn’t see anything on the composite output. (Which at the time of writing is just a hole in the PCB. The ZX80 and early versions of the ZX81 don’t produce a back porch, but that problem is somehow fixed or alleviated or otherwise rendered irrelevant inside the modulator. The problem just manifests itself when you decide that the wire leading into the modulator is now going to be composite output. Back when I was looking at the ZX81 at the computer museum in Oume, I used a 555-based circuit that I got from here to fix the problem.)

Not knowing whether it’s a RAM problem or a serious problem with the PCB, I broke out my Pico-based logic analyzer. This time I didn’t bother adding resistor dividers because all Picos I have (I think I have four) turned out to be a least a little 5V-tolerant, and I wouldn’t be using the logic analyzer for a long time.

I needed to make some minor modifications to the logic analyzer code: the reset circuit in the ZX80 uses a 220K pull-up resistor (it’s very high value because it is also a 1 second (or so) RC delay circuit), and the Pico by default has pull downs active on its input GPIOs. These pull downs are much lower in value than 220K, effectively keeping reset asserted forever. (They are surprisingly low!) So the init code now looks like this:

    stdio_init_all();
    gpio_init_mask(ALL_REGULAR_GPIO_PINS);
    gpio_set_dir_masked(ALL_REGULAR_GPIO_PINS, GPIO_IN);
    gpio_disable_pulls(TRIGGER_PIN); // default is pull down; this pull down is much higher in value the the zx80's reset circuit's pull-up and therefore holds the cpu in reset

Of course, it would probably be even better if we disabled all pulls on all GPIOs. Anyway, running the logic analyzer, I quickly found a RAM problem with bit 4.

Here is the logic analyzer output with the problematic signals:

    115       6 0283 40
    116      16 0283 2a
    117       1 0f10 2a
    118      21 0f50 00
    119       6 0284 00
    120      27 0284 0a
    121       6 0285 0a
    122      27 0285 40
    123       8 000a 40
    124      25 000a 21
    125       8 000b 21
    126      25 000b 40

The first number is just the line number, the second number is the number of times the third and fourth numbers are repeated in the logic analyzer output (which is not in sync with the clock, but much faster). You can ignore everything that is less than around 10. In the above output, we are fetching and then executing the instruction at 0x0283~0x0285. Here is the relevant assembly source:

        ld iy,04000h            ;0273   fd 21 00 40     . ! . @ 
        ld hl,04028h            ;0277   21 28 40        ! ( @ 
        ld (04008h),hl          ;027a   22 08 40        " . @ 
        ld (hl),080h            ;027d   36 80   6 . 
        inc hl                  ;027f   23      # 
        ld (0400ah),hl          ;0280   22 0a 40        " . @ 
        ld hl,(0400ah)          ;0283   2a 0a 40        * . @

So we can see that we are putting 0x4028 into hl, and then incrementing hl, which means that hl should now be 0x4029. The instruction at 0x280~0x282 puts hl into 0x400a, and the instruction at 0x0283~0x0285 reads it back from the same address. So we should be putting 0x4029 in there (though not shown above, we are) and should be reading 0x4029 back. But the relevant parts of the logic analyzer output are like this:

    124      25 000a 21
    126      25 000b 40

Don’t worry about this showing 000a and 000b rather than 400a and 400b. I just don’t have the higher address lines connected to the logic analyzer. We’re reading back 0x4021! That’s missing the fourth bit. So I put in the other RAM chip and bam! I see a lovely 15.something KHz signal on the through hole that would normally be occupied by one of the modulator’s leads! (Well, actually I saw a 7.65 KHz signal because I forgot to change the Pico’s clock output back to 6.5 MHz, but while that was very puzzling for a bit, it was very easy to fix.)

The computer resting on top of that old laptop is the newer one!
It’s working! Never mind the Hitachi MB-H2 in the reflection.
I’m currently using two of these smartphone stylus…es to operate the keyboard on the PCB (one for shift and the other for every other key).

There’s a problem though: I’m not able to press keys like ” and many other shifted keys. I haven’t properly looked into that problem yet. However, I did do a quick internet search and found someone with the same problem:

For example, all of 1 to 0 keys work but shifted, only 1 and 0 work. Some other shifted keys don’t work but there doesn’t seem to be a consistent pattern on each row. The display still flickers when say shift 2 or shift 9 (or other combinations) is pressed, just nothing else happens.

https://sinclairzxworld.com/viewtopic.php?p=32893&sid=b108eefd0c71a1bedd148b3049db2673#p32893

But their problem was apparently caused by using super long wires on their keyboard. I’m not doing that as far as I know?!

As my EEPROM is currently mounted on a breadboard and my EEPROM is 512 KB while the ZX80’s ROM is just 4 KB, I have plenty of space left. So I put the ZX81 ROM image at address 0x10000, which means I just need to change the wire on the EEPROM’s A16 pin from 0V to 5V to change to that ROM. And it works too! Plus, the keyboard layout is different, which means I’m able to access the ” key and print out a proper message rather than just numbers:

To do

  • Solder crystal oscillator (when I get it) and R20, and maybe headphone/microphone jacks, etc.
  • Backporch generator
  • Make a proper ROM adapter?
  • Get all shifted keys to work
  • RAM expansion
  • Try to load some software
  • Make the keyboard more ergonomic
  • Maybe get some kind of case

KiCad files

Here are the SVG and KiCad files, with the above mentioned shorting issues most likely fixed. Some important notes:

  • I do not have permission from Grant Searle nor from the copyright holders of the original ZX80 PCB to post anything like this, and if either of these parties asks me to take my files down, I intend to comply swiftly. (I’d just need to be sure that you are who you claim to be. I apologize in advance for any grievance caused and will apologize again if grievance is actually caused.)
  • I used Grant Searle’s replica foils as a base and traced them in Inkscape. I performed some manual and some automated tweaks. This version of the board is likely a less faithful replica of the original board than Grant Searle’s foils. (I don’t think anything’s shifted more than even 1 mm though. Though maybe the holes are quite different. I didn’t encounter any problems with the drill holes while soldering though.)
  • I haven’t tested (i.e. manufactured) this version of the KiCad files, I have only ordered one set of the ZX80 boards, with the shorted pins. This issue should be fixed now, and I hope I didn’t add any new issues.

Commodore SR-37 calculator “repair” and review

Before Commodore made computers, they made typewriters, and later calculators. I scored one such calculator, one that was listed as “non-functional”. The repair itself was very quick, as I had expected.

Repair

Honestly, it was just the power adapter. And some gunk in the keys. I’ll spare you the details on the gunk for today. Let’s see what’s wrong with the plug. Here’s the picture from the listing:

昭和の頃 赤色表示電卓 コモドール Commodore SR-37 難有品(^00WH10A_画像1
The pic from the listing.

Why would you bother to take a picture with the AC adapter cord connected to the calculator but the AC adapter not plugged in? Well, while I didn’t really think too much of it when I saw the listing, I quickly found the answer after it arrived here: it’s damn impossible to get out of there!

Until I got out some pliers and turned it left and right while pulling a little bit for a while. The cable was really sticky, and I believe (though this is half a year ago already) pretty much glued the connector to the device.

So, did they use a bit of an unusual plug shape? Yes, they did! It looks a bit like a mono headphone connector, maybe a bit thicker? (That reminds me, the ZX81’s power connector probably uses something similar.) But anyway, my trusty 28 in 1 “28 in 3” plug set (https://www.amazon.co.jp/gp/product/B01NCN3P3B/) contained something that fit beautifully, and applying power at about 9V, the calculator sprang to life.

Original connector (top) and replacement connector
Glorious LED display
Device’s innards. I didn’t take it apart any further than this, but cleaned up the gunk. Apparently didn’t take an “after” picture though, so you’ll just have to trust me that it looked as clean as a whistle afterwards. Maybe.

Review

So you are thinking of buying a calculator, and hey, you’ve always wanted to show off how cool retro tech can be. Someone nearby is selling a retro calculator with an LED display. It looks fantastic! But will it be useful?

The Commodore SR-37 does pack a lot of functions, maybe not quite as many as a modern scientific calculator, but it’s not far away. (For example, you can’t easily convert degrees to radians.)

So let’s see how useful this thing is. I’ve thought of some expectations the modern calculator user may have that aren’t quite fulfilled by this device:

ExpectationTrue:
False:
Doesn’t use power when turned off using power switch on device.
Doesn’t use a lot of power. So for example not 1.4 W even when you make it display 8888888888888888888888.8.
Doesn’t turn off the display to conserve power after a short while.
Doesn’t take a long time to compute, e.g., exponentials or roots. Definitely not like 2 seconds!
Comes with a boring LCD rather than a beautiful LED display.
Table 1.1: table of broken expectations

The main problem in my opinion is that it uses a lot of power. Even if you turn it off, it’ll use some power (100 mA or so). At least my model didn’t come with a place to put in batteries so it’s external power only. The fact that this device is basically slightly on all the time (unless you unplug the cord or have a switch nearby), I’m a bit concerned for its longevity. So I don’t think I’ll use it much. :(

Besides, what I really need is a calculator that is really good at converting between bases, like kcalc. I’m not sure such a device even exists!

Displaying any image on an MSX, loading from a ROM

This article is basically just a note that I can come back to in case I forget some details.

The tool on this page (Japanese) takes an image file and adds dithering and stuff: https://nazo.main.jp/prog/retropc/gcmsx.html

The output is something that can be BLOAD’ed straight to VRAM memory using BLOAD’s S parameter. Didn’t even know that option existed!

This means we can easily convert this to a ROM by adding a loader that pokes everything into VRAM. We just need to get rid of the BSAVE header at the start (or ignore it in the loader), which looks like this according to http://www.faq.msxnet.org/suffix.html#BIN:

byte 0  : ID byte #FE
byte 1+2: start-address
byte 3+4: end-address
byte 5+6: execution-address

So we just cut off 8 bytes at the beginning, e.g. by doing:

tail -c +8 msx_20231111232339933.SC2 > foo.bin

And the loader in assembly (z80asm-flavor) could look like this:

SetVdpWrite: macro high low ; from http://map.grauw.nl/articles/vdp_tut.php
	ld a,low
	out (0x99),a
	ld a,high+0x40
	out (0x99),a
endm

vpoke: macro value
; 	ld a,value ; not needed in this implementation
	out (0x98),a
; 	nop ; nops not needed in this implementation
; 	nop
; 	nop
endm

	org 0x4000
	db "AB" ; magic number
	dw entry_point
	db 00,00,00,00,00,00,00,00,00,00,00,00 ; ignored

entry_point:
	ld a,2
	call 0x005f
	SetVdpWrite 00 00
	ld hl,data
loop:
	ld a,(hl)
	vpoke a
	inc hl
	ld a,h
	cp end>>8
	jr nz,loop
	ld a,l
	cp end&0xff
	jr nz,loop
inf:
	jr inf
data:
	incbin "foo.bin" ; read data from file foo.bin
end:
	ds 0x8000-$ ; fill remainder of 16 KB chunk with 0s

Of course, this approach is quite wasteful; we need almost 16 KB of memory to display any image, even if it’s mostly empty.

Using a USB-HID game controller on the MSX, using a Raspberry Pi Pico for signal conversion

There is very little code that I wrote myself in this project, but it’s the first time I’m looking at USB-HID on the protocol level and the end result is something mildly useful. So possibly worth a post? (Note: code is at the bottom of this post.)

All we’ll be doing here is modify an example from https://github.com/sekigon-gonnoc/Pico-PIO-USB, namely, the capture_hid_report example. You’ll need a good way to connect a USB-A port to your Raspberry Pi Pico. I’m using an old piece of hardware that was meant for use in desktop PCs to add USB ports on the back, internally connected to motherboard headers. (In the demo we’re using, USB D+ and D- are GPIO pins 0 and 1, respectively.)

Once you have your Pico flashed, open a serial terminal, and then connect a USB-HID gamepad. I’m using a fake PS3 controller. It looks very similar to a Sony PS3 controller but bears a “P3” mark instead of the PlayStation logo on the middle button. (I don’t know if it works with a real PS3.)
(I’m leaving out the wiring details here, but it’s all exactly as explained in the README.md in the repo.) I get the following messages when I plug in my controller. The “EP 0x81” messages are sent continuously.

Device 0 Connected
control in[complete]
Enumerating 054c:0268, class:0, address:1
control out[complete]
control in[complete]
control in[complete]
Manufacture:GASIA CORP.
control in[complete]
control in[complete]
Product:PLAYSTATION(R)3 Controller
control in[complete]
control in[complete]
control out[complete]
inum:0, altsetting:0, numep:2, iclass:3, isubclass:0, iprotcol:0, iface:0
        bcdHID:1.11, country:0, desc num:1, desc_type:34, desc_size:148
control out[error]
control in[complete]
                Report descriptor:05 01 09 04 a1 01 a1 02 85 01 75 08 95 01 15 00 26 ff 00 81 03 75 01 95 13 15 00 25 01 35 00 45 01 05 09 19 01 29 13 81 02 75 01 95 0d 06 00 ff 81 03 15 00 26 ff 00 05 01 09 01 a1 00 75 08  
                        epaddr:0x02, attr:3, size:64, interval:1
                        epaddr:0x81, attr:3, size:64, interval:1
...
054c:0268 EP 0x81:      01 00 00 00 00 00 80 80 80 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ef 10 00 00 00 00 23 6d 77 01 80 02 00 02 00 01 80 02 00 
054c:0268 EP 0x81:      01 00 00 00 00 00 80 80 80 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ef 10 00 00 00 00 23 6d 77 01 80 01 ff 01 ff 01 7f 01 ff 
054c:0268 EP 0x81:      01 00 00 00 00 00 80 80 80 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ef 10 00 00 00 00 23 6d 77 01 80 01 fe 01 fe 01 7e 01 fe 
054c:0268 EP 0x81:      01 00 00 00 00 00 80 80 80 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ef 10 00 00 00 00 23 6d 77 01 80 01 ff 01 ff 01 7f 01 ff 
054c:0268 EP 0x81:      01 00 00 00 00 00 80 80 80 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ef 10 00 00 00 00 23 6d 77 01 80 02 00 02 00 01 80 02 00
...

You can also use Linux tools such as usbhid-dump to get reports, but for some reason they look quite different from what I get here. Don’t know if the driver is doing something (odd, or not so odd) or if the Pico is doing something odd. (It’s quite likely that there’s a driver in Linux that configures the controller automatically. On the Pico you have to press the P3 button every time after plugging in, on Linux it lights up the first player LED immediately.)

Decoding the USB HID report descriptor

First of all, we don’t actually need to do all this. Pressing buttons and looking at the output makes it quite obvious what we have to do. But for let’s edify outselves anyway. Here’s a great tutorial on USB HID report descriptors: https://eleccelerator.com/tutorial-about-usb-hid-report-descriptors/. The same guy has a tool on their website that allows us to quickly decode the descriptor: https://eleccelerator.com/usbdescreqparser/.

When we paste in our descriptor, we get the following output:

0x05, 0x01,        // Usage Page (Generic Desktop Ctrls)
0x09, 0x04,        // Usage (Joystick)
0xA1, 0x01,        // Collection (Application)
0xA1, 0x02,        //   Collection (Logical)
0x85, 0x01,        //     Report ID (1)
0x75, 0x08,        //     Report Size (8)
0x95, 0x01,        //     Report Count (1)
0x15, 0x00,        //     Logical Minimum (0)
0x26, 0xFF, 0x00,  //     Logical Maximum (255)
0x81, 0x03,        //     Input (Const,Var,Abs,No Wrap,Linear,Preferred State,No Null Position)
0x75, 0x01,        //     Report Size (1)
0x95, 0x13,        //     Report Count (19)
0x15, 0x00,        //     Logical Minimum (0)
0x25, 0x01,        //     Logical Maximum (1)
0x35, 0x00,        //     Physical Minimum (0)
0x45, 0x01,        //     Physical Maximum (1)
0x05, 0x09,        //     Usage Page (Button)
0x19, 0x01,        //     Usage Minimum (0x01)
0x29, 0x13,        //     Usage Maximum (0x13)
0x81, 0x02,        //     Input (Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position)
0x75, 0x01,        //     Report Size (1)
0x95, 0x0D,        //     Report Count (13)
0x06, 0x00, 0xFF,  //     Usage Page (Vendor Defined 0xFF00)
0x81, 0x03,        //     Input (Const,Var,Abs,No Wrap,Linear,Preferred State,No Null Position)
0x15, 0x00,        //     Logical Minimum (0)
0x26, 0xFF, 0x00,  //     Logical Maximum (255)
0x05, 0x01,        //     Usage Page (Generic Desktop Ctrls)
0x09, 0x01,        //     Usage (Pointer)
0xA1, 0x00,        //     Collection (Physical)
0x75, 0x08,        //       Report Size (8)

// 63 bytes

This doesn’t look quite exactly the same as the examples in the tutorial. Below is my understanding, which may not be 100% correct, but makes sense to me at least. Let’s look at some output lines from the capture_hid_report example. The first line was:

054c:0268 EP 0x81:      01 00 00 00 00 00 80 80 80 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ef 10 00 00 00 00 23 6d 77 01 80 02 00 02 00 01 80 02 00

First of all, all lines start with “054c:0268 EP 0x81”. That’s just added by the example code. Let’s go through the remaining numbers in the line (actually, just the bolded ones) and see how they relate to the descriptor we saw earlier.

01: report ID, same as the report ID defined in the descriptor.

00: this is the entire report that was described in the following extract from the descriptor, exactly 1 byte (8 bits):

0x75, 0x08,        //     Report Size (8)
0x95, 0x01,        //     Report Count (1)
0x15, 0x00,        //     Logical Minimum (0)
0x26, 0xFF, 0x00,  //     Logical Maximum (255)
0x81, 0x03,        //     Input (Const,Var,Abs,No Wrap,Linear,Preferred State,No Null Position)

00 00 01 00: the (two) reports referenced in the following parts of the descriptor:

0x75, 0x01,        //     Report Size (1)
0x95, 0x13, // Report Count (19)
0x15, 0x00, // Logical Minimum (0)
0x25, 0x01, // Logical Maximum (1)
0x35, 0x00, // Physical Minimum (0)
0x45, 0x01, // Physical Maximum (1)
0x05, 0x09, // Usage Page (Button)
0x19, 0x01, // Usage Minimum (0x01)
0x29, 0x13, // Usage Maximum (0x13)
0x81, 0x02, // Input (Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position)
0x75, 0x01, // Report Size (1)
0x95, 0x0D, // Report Count (13)
0x06, 0x00, 0xFF, // Usage Page (Vendor Defined 0xFF00)
0x81, 0x03, // Input (Const,Var,Abs,No Wrap,Linear,Preferred State,No Null Position)

This is actually two reports. One is 19 bits, the other is 13 bits. Together that’s exactly 32 bits. (This is similar to the three mouse buttons example in the tutorial, where we only have three buttons and therefore want to pad this to 8 bits.) Here, we have 19 buttons (though the controller only sports 16 physical buttons, unless I’m much mistaken), each of which is 1 bit, i.e., either pressed, or not. The remaining 13 bits are just to pad the report to make it easier to handle on the software side, or perhaps the padding is required somehow. (Looks like it is on Windows at least: https://stackoverflow.com/questions/65846159/usb-hid-report-descriptor-multiple-reports.)

And indeed, these four bytes (well, the first 19, er, 16 bits) do change when buttons are pressed. This is basically all we need to figure out how to use this controller with an MSX. We just need to figure out which button is responsible for which bit in the third and fourth bytes. (Unless we also wanted to translate analog stick input.)

What about the rest of the bytes? I don’t know how they relate to our descriptor above, but pressing buttons and moving the analog sticks makes it pretty obvious what they mean. We can see that the analog sticks are two bytes each, and there’s another byte for each button, and the value that comes up depends on how hard the button was pressed. I never knew that the △□○☓ buttons were pressure-sensitive on the PS3! Anyway, perhaps it’s possible that our descriptor is incomplete, as the frame they are transmitted in is limited to 64 (or maybe 63?) bytes?

So without going into any hardware details yet, here’s how we could modify the example code to translate our button presses to electric pulses on a GPIO pin. Here’s the part of the code that we’ll replace:

if (len > 0) {
  printf("%04x:%04x EP 0x%02x:\t", device->vid, device->pid,
        ep->ep_num);
  for (int i = 0; i < len; i++) {
    printf("%02x ", temp[i]);
  }
  printf("\n");
}

And here’s some of the code we could maybe replace the above block with. Note that this doesn’t actually consider the hardware details of the actual MSX interface yet, it’s just an example, so don’t copy this code. The actual code is in the last section of this blog post.

if (len > 0) {
  if (temp[0] == GAMEPAD_REPORT_ID) { // GAMEPAD_REPORT_ID is 1 in our case
    uint16_t button_state = (temp[2] << 8) | temp[3];
    if (button_state & BUTTON_UP) {
      gpio_put(BUTTON_UP_PIN, 1);
    } else {
      gpio_put(BUTTON_UP_PIN, 0);
    }
    if (button_state & BUTTON_DOWN) {
...
  }
}

My controller’s button mappings are as follows:

#define BUTTON_UP 0x1000
#define BUTTON_DOWN 0x4000
#define BUTTON_LEFT 0x8000
#define BUTTON_RIGHT 0x2000
#define BUTTON_A 0x40 // X or O, can't remember
#define BUTTON_B 0x20 // X or O, can't remember

MSX interface

Good news: the MSX joystick ports come with 5V and GND pins. According to https://www.msx.org/wiki/General_Purpose_port, we can draw up to 50 mA from these pins. Is that enough for the Pico and a controller? It is for mine! According to my measurements, the Pico + fake PS3 controller together draw about 40 mA. (Instrument’s display precision is 10 mA, so it could be up to 45 mA, or more if the instrument is inaccurate.) Note that many modern controllers (including real PS3 controllers) contain a battery, and will attempt to charge it. That will most likely take the current way above 50 mA.
(Note: I don’t think that drawing slightly more than 50 mA would be a huge problem at least on my MSX, which I’ve disassembled and reassembled a couple times.)

Slightly bad news (1): pressing buttons on MSX joysticks connects pins to pin 8, which is not GND. Pin 8 is “strobe” and can be GND or 5V. (Button pins are normally pulled high.)

Slightly bad news (2): on the MSX side, the joystick ports are actually GPIO ports and can be configured for both input (normal) and output! We don’t want the Pico to output when the MSX’s PSG is outputting too!

1: addressing this in software might be possible, albeit with a (very) short time lag. I don’t really know anything about this feature and have no easy way to test a software-based solution with MSX software that actually uses this feature.
2: I don’t think regular MSX joysticks worry about this. Some MSX systems may have safeguards in place.

Some measurements and wiring details

Using my multimeter’s rarely used ammeter mode, with the probes between STROBE and any button pin, I see a current of ~0.7 mA. (Two button pins, and the ammeter shows 1.5 mA. It probably goes up linearly the more buttons you press.) That’s quite a lot by modern standards!

On my MSX, there are two 74LS157 chips that implement joystick selection. The 74LS157’s outputs are directly connected to the PSG’s inputs. (Only one joystick’s state is visible to software at a time; a PSG register change is required to switch to the other one.) If we change the PSG’s I/O port’s direction, I think we’ll be pitting the 74LS157’s outputs against the PSG’s outputs. Doesn’t sound so great, eh.
Except, the 74LS157 has an \ENABLE pin! When the \ENABLE pin on a 74LS157 is disabled, the outputs will be high-Z, potentially allowing a signal coming from the PSG to make its way somewhere. Is there such functionality on the MB-H2? Answer after some probing: no. \ENABLE on these two chips is tied to GND.

While most button pins have 10K pull-up resistors (some seem to have 3.3K pull-up resistors), the STROBE pins are connected directly to PSG pins 8 and 9.

Deciding how to hook up the Pico to the joystick port

In real joysticks, when you press a button, you short the STROBE pin to the button pin, and STROBE can be HIGH or LOW. (When STROBE is HIGH, we short HIGH to HIGH, and nothing happens. Since on my MSX, as discussed above, the button pins are connected to 74LS157 inputs, they should always be pulled high, and never go low during device operation.) In real joysticks, when nothing is pressed, there is no connection, period. So that’s more like a tri-state affair, so instead of producing a HIGH level when nothing is pressed, we should set our Pico’s GPIO to high-Z (which we can do by setting it to INPUT).

Armed with the measurements above, I think we can hook up the Pico directly (let’s ignore 5V vs 3.3V for now), producing a LOW level when a button is pressed. We’ll have the Pico’s GPIO pin sink a little bit of current, but not too much. Even if all 6 buttons are pressed, the Pico shouldn’t sink more than around 4.5 mA across multiple pins. That’s okay, really. Even if we decided to make the Pico interface with two controllers, that’s still comfortably under our limit.

Now we could start thinking about reducing the 5V on the button pins to 3.3V on the Pico’s GPIO pins. But according to many accounts, 5V is okay if it’s sufficiently current-limited, which it is, in our case. So let’s ignore this for now.

Other MSX machines

(I will probably expand this section at some point.)

YIS-503

The YIS-503 (schematics, look at the circuits near the JOY1 and JOY2 connectors on the left) has 22K pull-up resistors and an MSX Engine. I doubt that MSX Engine chips (which have the PSG integrated) can be configured to destruct themselves.

Kuninet’s homebrew MSX

10k pull-up resistors. The rest of the wiring looks exactly like on the Hitachi MB-H2 as far as I can tell.

So does it actually work?

Yes. Here’s a pic of my trusty Hitachi MB-H2 MSX running its built-in sketch program, controlled through the fake PS3 controller. (Sorry, the computer’s still open from the probing I did earlier.)

The USB port is a PC part that I found at my local Hard Off. The RS232 cable is also from Hard Off. In fact, the fake PS3 controller is also from Hard Off! The USB cable is from my own cable collection.

Totally off-topic, but there’s a spot in this pic that has been cleaned up using my Pixel phone’s magic eraser. Can you see where it is? (I didn’t touch it up in an image editor afterwards.)

Pacman <3

Problems

Plugging our contraption into the joystick port while the MSX is running crashes the machine! The screen goes very slightly dark for a split second too. Probably in-rush current. I’m pretty sure I blew my multimeter’s (200 mA) fuse trying to measure the current. Laff. (Having the controller already plugged in before powering on the computer works fine.)

Source code

Replace your Pico-PIO-USB/examples/capture_hid_report.c with the following code, (re-)make, flash, and you’re set. (I’m basing my modifications on git commit d00a10a8c425d0d40f81b87169102944b01f3bb3.)

#include <stdio.h>
#include <string.h>

#include "pico/stdlib.h"
#include "pico/multicore.h"
#include "pico/bootrom.h"

#include "pio_usb.h"

#define GAMEPAD_REPORT_ID 1

// 0x1: L2
// 0x2: R2
// 0x4: L1
// 0x8: R1
// 0x10: Triangle
// 0x20: Circle
// 0x40: X
// 0x80: Square?
// 0x100: ?
// 0x200: ?
// 0x400: R3
// 0x800: Start
// 0x1000: Up
// 0x2000: Right
// 0x4000: Down?
// 0x8000: Left?

#define BUTTON_UP 0x1000
#define BUTTON_DOWN 0x4000
#define BUTTON_LEFT 0x8000
#define BUTTON_RIGHT 0x2000
#define BUTTON_A 0x20
#define BUTTON_B 0x40
#define BUTTON_A_ALT 0x10
#define BUTTON_B_ALT 0x80

#define BUTTON_UP_PIN 16
#define BUTTON_DOWN_PIN 17
#define BUTTON_LEFT_PIN 18
#define BUTTON_RIGHT_PIN 19
#define BUTTON_A_PIN 20
#define BUTTON_B_PIN 21

static usb_device_t *usb_device = NULL;

void core1_main() {
  sleep_ms(10);

  // To run USB SOF interrupt in core1, create alarm pool in core1.
  static pio_usb_configuration_t config = PIO_USB_DEFAULT_CONFIG;
  config.alarm_pool = (void*)alarm_pool_create(2, 1);
  usb_device = pio_usb_host_init(&config);

  //// Call pio_usb_host_add_port to use multi port
  // const uint8_t pin_dp2 = 8;
  // pio_usb_host_add_port(pin_dp2);

  while (true) {
    pio_usb_host_task();
  }
}

static void gpio_setup()
{
  gpio_init(BUTTON_UP_PIN);
  gpio_init(BUTTON_DOWN_PIN);
  gpio_init(BUTTON_LEFT_PIN);
  gpio_init(BUTTON_RIGHT_PIN);
  gpio_init(BUTTON_A_PIN);
  gpio_init(BUTTON_B_PIN);

  gpio_set_dir(BUTTON_UP_PIN, GPIO_IN);
  gpio_set_dir(BUTTON_DOWN_PIN, GPIO_IN);
  gpio_set_dir(BUTTON_LEFT_PIN, GPIO_IN);
  gpio_set_dir(BUTTON_RIGHT_PIN, GPIO_IN);
  gpio_set_dir(BUTTON_A_PIN, GPIO_IN);
  gpio_set_dir(BUTTON_B_PIN, GPIO_IN);

  gpio_put(BUTTON_UP_PIN, 0);
  gpio_put(BUTTON_DOWN_PIN, 0);
  gpio_put(BUTTON_LEFT_PIN, 0);
  gpio_put(BUTTON_RIGHT_PIN, 0);
  gpio_put(BUTTON_A_PIN, 0);
  gpio_put(BUTTON_B_PIN, 0);
}

int main() {
  // default 125MHz is not appropreate. Sysclock should be multiple of 12MHz.
  set_sys_clock_khz(120000, true);

  stdio_init_all();
  printf("hello!");

  sleep_ms(10);

  multicore_reset_core1();
  // all USB task run in core1
  multicore_launch_core1(core1_main);

  gpio_setup();

  while (true) {
    if (usb_device != NULL) {
      for (int dev_idx = 0; dev_idx < PIO_USB_DEVICE_CNT; dev_idx++) {
        usb_device_t *device = &usb_device[dev_idx];
        if (!device->connected) {
          continue;
        }

        // Print received packet to EPs
        for (int ep_idx = 0; ep_idx < PIO_USB_DEV_EP_CNT; ep_idx++) {
          endpoint_t *ep = pio_usb_get_endpoint(device, ep_idx);

          if (ep == NULL) {
            break;
          }

          uint8_t temp[64];
          int len = pio_usb_get_in_data(ep, temp, sizeof(temp));

          if (len > 0) {
            if (temp[0] == GAMEPAD_REPORT_ID) {
              uint16_t button_state = temp[2] << 8 | temp[3];
              if (button_state & BUTTON_UP) {
                gpio_put(BUTTON_UP_PIN, 0);
                gpio_set_dir(BUTTON_UP_PIN, GPIO_OUT);
                printf("BUTTON_UP_PIN\n");
              } else {
                gpio_set_dir(BUTTON_UP_PIN, GPIO_IN);
              }

              if (button_state & BUTTON_DOWN) {
                gpio_put(BUTTON_DOWN_PIN, 0);
                gpio_set_dir(BUTTON_DOWN_PIN, GPIO_OUT);
                printf("BUTTON_DOWN_PIN\n");
              } else {
                gpio_set_dir(BUTTON_DOWN_PIN, GPIO_IN);
              }

              if (button_state & BUTTON_LEFT) {
                gpio_put(BUTTON_LEFT_PIN, 0);
                gpio_set_dir(BUTTON_LEFT_PIN, GPIO_OUT);
                printf("BUTTON_LEFT_PIN\n");
              } else {
                gpio_set_dir(BUTTON_LEFT_PIN, GPIO_IN);
              }

              if (button_state & BUTTON_RIGHT) {
                gpio_put(BUTTON_RIGHT_PIN, 0);
                gpio_set_dir(BUTTON_RIGHT_PIN, GPIO_OUT);
                printf("BUTTON_RIGHT_PIN\n");
              } else {
                gpio_set_dir(BUTTON_RIGHT_PIN, GPIO_IN);
              }

              if (button_state & BUTTON_A || button_state & BUTTON_A_ALT) {
                gpio_put(BUTTON_A_PIN, 0);
                gpio_set_dir(BUTTON_A_PIN, GPIO_OUT);
                printf("BUTTON_A_PIN\n");
              } else {
                gpio_set_dir(BUTTON_A_PIN, GPIO_IN);
              }

              if (button_state & BUTTON_B || button_state & BUTTON_B_ALT) {
                gpio_put(BUTTON_B_PIN, 0);
                gpio_set_dir(BUTTON_B_PIN, GPIO_OUT);
                printf("BUTTON_B_PIN\n");
              } else {
                gpio_set_dir(BUTTON_B_PIN, GPIO_IN);
              }
            }
          }
        }
      }
    }
    stdio_flush();
    sleep_us(10);
  }
}

Another Hitachi MB-H2 MSX repair

Introduction and conclusion

I bought another Hitachi H2 MSX last year, mostly because I wanted the manual, which I’ve scanned. Unfortunately for my free time but fortunately for my, um, education in retro computing, this computer had issues with its video RAM. Often, the computer would boot up with a garbled screen. Resetting after a couple minutes would usually fix the issue. The video RAM is made by Toshiba, and is called TMM416P-2 (also marked 4116-2). If you have this memory, I’d recommend you look out for issues, because all eight ICs had the same issue, namely: crazy-ass noise on the -5V line. (How much noise is “crazy-ass” noise? In this case, it’s +-3V.) The noise sort of comes and goes, or at least gets stronger and weaker, randomly, which made it too hard for me to find a combination of capacitors to tame it. (Though it’s more likely to be present after turning the computer on after a long while.) I ended up socketing them all, replacing one that unfortunately died during the very professional desoldering process, and added 103 ceramic capacitors to (almost) every one, between the -5 and GND pins, which seems to have a slight positive effect. (The bottom part of the case has a hook that requires some clearance and prevents two of the chips from getting their capacitor.) I also replaced the zener diode with a 7905, which fit perfectly after bending the legs a little bit.

Details

The -5V rail for the 4116 VRAM chips is generated using a zener diode. Replacing this, or the capacitor on the rail, unfortunately didn’t have any effect. Hmm, odd!

Next, I decided to desolder the -5V pin on the first 4116 IC, and drive it using my own known good -5V supply (using a standard 7905 regulator). Result: noise both on the first chip and all the others. Hmm, odd!

Next, I did this for the rest of the 4116 ICs, and was able to see that each and every one generates noise.

Next, I decided to desolder all of them and individually test them on my 4116 tester. (They still produced the noise while in the tester.) I decided to desolder all of them because the H2 seemed to support 4416 ICs for the video RAM, and I happened to have some of those that were waiting to be put to use. I.e., there are holes of the right size, right next to the VDP, and the silkscreen on those holes says “TMS4416”. ;)

Well, today’s lesson is, do not necessarily trust the silkscreen. The TMS9928A doesn’t even support 4416 VRAM! The holes where the data pins go didn’t even have any traces on them.
The TMS9928A can be made to support 4416 RAM using a custom circuit, though. Maybe I should have implemented this circuit. I even bought the two required parts! But then decided against it for complexity management reasons.

Unfortunately, expecting to be able to use the 4416 slots, I had desoldered the original VRAM ICs in a rather brutish manner, losing vias and traces in the process, which meant that I needed to add a bunch of bodge wires to get them to work again. At least the bodge wires aren’t too complex to figure out, if at some point one of them decides to become loose again. I ended up keeping the original, noisy, RAM chips. But since they’re now all socketed, it shouldn’t be too hard to replace them at some point, if necessary.

Pictures

Garbled screen
Hitachi MB-H2 logic board from above. The misleading silkscreen is in the top left, above the TMS9928ANL VDP.
Example with a lot of noise
And an example with a lot less. (This picture is from six months ago. It’s entirely possible that I had extra capacitors for this shot.)
TMM416P-2 noise closeup. Intensity varies. Here it’s about 3V peak-to-peak.
TMM416P-2 noise closeup, one more example.

After

Noise after “completion” of this repair (note: using AC coupling here). Note that noise intensity has always been a bit random, so I this can’t be taken as proof that adding 103 capacitors to each chip is going to help in any case, and I am not too interested in performing rigorous testing. Anecdotally, I haven’t seen any garbled screens yet after the “repair”!
Noise closeup (note: using AC coupling here)
Check out this 7905’s limbo dance moves

Before looking at the picture of the bodge wires below, please keep in mind that it is rude to stare.

Ahem

Testing 4164/4116 DRAM ICs in-circuit/live with a Raspberry Pi Pico, without removing them (WIP)

Hi! My sabbatical ended and I’ve been working again since two months ago. Boo. However there’s this thing I just wanted to get off my chest, so I spent a few hours that I did not really have and wrote some code and this blog post about it!

Last year, I made a 4164/4116 DRAM tester for the Raspberry Pi Pico, which works just as you would expect, you program the Pico, place it on a breadboard, add some wires and something to drop the 5V to 3.3V for the Q output, place the 4164 or 4116 chip you’d like to test on the breadboard, connect a USB cable from the Pico to a computer, and look at the terminal output (or just the on-board LED). This is useful if you have already extracted a 4164 chip that you have determined to be bad. (I have written previously how you could determine whether a 4164 chip is bad, here and here.)

I previously made a “live” chip tester for the Arduino, capable of checking whether simple logic chips are doing what they’re supposed to be doing in a powered on system. I used the Arduino rather than the Pico because the Arduino is 5V tolerant. In this post, we’re not doing simple logic chips, but DRAM chips. And since we need to be super-fast, we’re using a Pico.

Executive summary

We “allocate” 64 KB of RAM on the Pico. We need to read in two 8-bit addresses, and combine them to a 16-bit address. If a write is being attempted, we write the same bit value into the appropriate address of Pico’s RAM. If a read is being attempted, we check if the DRAM’s output is the same as what we have in the Pico’s RAM. (We could also use 64 Kb of RAM on the Pico at a minimum, but as we’ll see in the next section we do not really have a lot of time for such shenanigans.)

To use this software, having an IC test clip would probably be very beneficial. I use these: https://www.kandh.com.tw/ic-test-clips-ic-test-clips.html. These are available in Akihabara from Akizuki: https://akizukidenshi.com/catalog/g/gC-04753/. If they aren’t available in your market, perhaps these somewhat expensive test clips from 3M might be more available: https://www.digikey.jp/ja/products/detail/3m/923700/3852.

Note: to use this software, you need to at least mostly know what you’re doing.

Example usage
Test clip close-up
Wiring closeup

One note on hooking up the Pico directly to 5V components, as seen in the above pictures

Not guaranteed to not fry your Pico (note the double negation), do this at your own risk. Your Pico will possibly also draw more current than a normal TTL chip when driven above 3.6V or so, which could easily damage your precious hardware! (The current on the address pins will likely be supplied by a pair of 74LS157 chips, on the Q pins by the RAM chip, and the current on the RAM’s data in pin by the CPU or any other. \RAS and \CAS probably by custom logic chips.) Use a 74HCT245 between the Pico and the device you’d like to test.

Caveat 1

The Pico is very fast when compared to an 8-bit computer from the 1980s, but 4164 transition times are extremely fast too. If you look at a timing diagram for the 4164 (which you will find in any 4164 datasheet), you will notice that all transition times listed are on the order of <ten, tens, or low hundreds of nanoseconds. Most 4164 chips have a -20, -15, -12, or -10 suffix in their part number. This indicates the minimum allowable number of nanoseconds × 10 for the sum of all transitions. (If the DRAM is driven faster, it probably won’t work correctly. However, if it’s driven slower, most things will generally work out, though if your system e.g. reads the DRAM’s output too slowly it might be too late and not work out.)

The stock Pico runs at 125 MHz, which means that one CPU cycle is 8 ns. From hearsay, you can probably overclock any Pico to 200 MHz (clock cycles are 5 ns), and many people report that their Pico runs fine even at 400 MHz (clock cycles are 2.5 ns). For -15 DRAMs, you have 150/8 = 18.75 CPU cycles per transition if the DRAM is driven at its max speed. (Note: it isn’t on MSX machines, at least.) 18 CPU cycles isn’t a lot. Remember, we need to convert two 8-bit addresses to a 16-bit address, and then check if the newly read value matches our previously recorded value. Is that doable in 18 CPU cycles? I don’t think so, but I’m not an ARM assembly expert.

So, did I get it to work? Well… sort of but not quite.

Caveat 2

Note that there are many failure modes for DRAM chips. For example, if the chip gets super-hot within a few seconds, it’s probably shorted. I’d expect there to be a very low resistance between ground and another pin. Before hooking up the “live” tester I’m going to explain on this page, check for that kind of stuff. I would not recommend using the live tester on a chip that gets super-hot within seconds. You could risk melting your connectors, and if the short is not between VCC and GND, potentially also risk your Pico due to excessive current on a pin driven by the Pico.

Caveat 3

Untested with 4116 chips, only tested with my MSX’s main RAM.

Current status

The live tester successfully verifies that a TMS4164-15 DRAM chip under test in my stand-alone 4164 RAM tester is outputting the correct values. (There is no reason why it shouldn’t work with a 4116 chip. The tester certainly does! You just need to re-wire slightly. Also, 5V on the Pico, yeah, it doesn’t seem to “explode immediately.” But -5V or 12V? You’d better leave those pins unconnected!)

On a real system (my trusty Hitachi MB-H2 MSX) with TMS4164-15NL DRAM chips, the live tester manages just fine from power up, up until the first ~11000 comparisons (which is a split second), but at some point reads a 0 when it should have been a 1, and prints an error. Printing errors takes a long time at 115200 bps, so we go completely off the rails once we’ve encountered the first error. (That’s slightly configurable though, see “Lnobs” section for details.)

However, in the live tester’s “DEBUG” mode, it just collects a lot of samples into memory, and prints them out when the sample memory is full. Using a simple script (the Perl script included in the repo), I can then verify that all the samples check out. Note that the DEBUG code also prints out who many times it had to wait until it got data from the PIO. The answer is 0 times every time, which means that we’re too slow or almost too slow. (Sometimes there is a handful of mismatches, I’ll look into those at some point. Could be that we were just too slow, or the 5V is messing with the system ;D)

There’s a lot that could be improved, hence the “WIP” attribute in title. The first obvious improvement would be to try a little harder in the non-debug mode. The Pico has two CPU cores, and we’re only using one. We could attain more throughput by running according to the following scheme:

Core 1:
Wait for sample 1
Tell core 2 to wait for sample 2
Process sample 1
Wait for sample 3
Tell core 2 to wait for sample 4
Process sample 3

Core 2:
Wait for instructions from CPU 1
Wait for sample 2
Process sample 2
Wait for instructions from CPU 1
Wait for sample 4
Process sample 2

Another potential optimization would be to write the processing code in ARM assembly. (My experience with ARM assembly is mostly read-only, so not sure how much better I can get without spending way too much effort.)

Also I haven’t tried overclocking yet. Probably should!

Some more technical details

We use two PIO state machines. One waits for RAS high→low (“RAS SM”, and the other one waits for CAS high→low (“CAS SM”, which comes after the RAS transition.

Not all RAS transitions are followed by CAS transitions. For example, refresh is mostly RAS-only. In addition, though perhaps not used on the MSX(?), RAS transitions may be followed by multiple CAS transitions.

In the C code, we wait for events on the CAS SM, and then read from both the RAS SM’s FIFO and the CAS SM’s FIFO. In the PIO code, the CAS SM tells the RAS SM whether to push its address or not. (We could alternatively (maybe) always push and have the CPU make sure the FIFO never gets full, but my experiments in that regard didn’t go that well.)

There are a lot of defines that change the way the system works.

Knobs

  • Setting PRINT_ERROR_THRESHOLD to something above 0 only starts printing errors after encountering that many errors.
  • CORRECT_ERRORS causes the Pico’s memory to be updated when we encounter a mismatch
  • VERBOSE_STATUS_LEDS causes the Pico to perform GPIO writes at GPIO16+ (or so, I recommend you check the source to find the exact GPIO pin number) to indicate whether we’re reading or writing. This isn’t very beneficial performance-wise.
  • SWAP_RAS_AND_CAS_ADDRESSES: my Hitachi MB-H2 MSX applies the CPU’s A0-A7 to the RAM pins at RAS time, and A8-A15 at CAS time. When thinking “rows” and “columns”, most people would probably assume that “rows” use the more significant bits, but that is not necessarily the case, and it doesn’t matter. When operating in DEBUG mode, you’ll see accesses that are mostly linear if this define is set correctly. Otherwise each access will be 256 apart.

The code is at https://github.com/qiqitori/pico_live_chip_tester.

So… is this likely to find a fault?

I haven’t seen any DRAM chip failures (except on YouTube) where only some addresses were broken and the chip appeared to work otherwise. (SRAM chips are different story. I’m sort of planning on doing an SRAM tester too, but probably not too soon.) Most DRAM chips I’ve seen are an all-or-nothing affair. For all-or-nothing affairs, this chip tester is very likely to find the problem immediately, especially if you compare all 8 chips and only one is weird. For hypothetical chips with just a single address problem (or perhaps, a single broken row), either it’ll be difficult with the code not 100% working right now, or it might take several attempts and statistics.

Playing .psg tunes (or even .asc/.pt3/… tunes) on the MSX (part 2)

(I do not recommend watching this demo on a smartphone. It’s quite flashy and exacerbated my headache. Also, the WebMSX code will ask you to go fullscreen, but I don’t think you can start the demo without going fullscreen and then back again.)

In part 1, we constructed a 48 KB ROM to play a tune called “Popsa 2”. We didn’t apply any real compression algorithms but implemented a set of scripts to find repeated sections in a binary file and added an instruction to the .psg file format to “call” repeated sections. Using real compression algorithms we could achieve much better compression, and each tune would just occupy a couple KBs. Using our method, we _just_ manage to fit the tune into a single cartridge. Popsa 2 fit into 48 KB, and the tune we’re going to do today is going to require a 64 KB ROM. If the previous track didn’t quite do it for you, I think it might be worth giving this one a chance. It’s a very complex piece of wonder-inducing music in my opinion. (Press the power button and in the menu that pops up, choose “Power” to boot the ROM.)

64 KB ROMs still require a header at 0x4000 or 0x8000. This means that we need to add a header and some entrypoint code to set up the slots right in the middle of our data. That’s inconvenient, but I didn’t feel like changing the structure of the program, so I just added one check each at the beginning and end of the main loop to see if the HL register has gone above a certain value. If yes: before the main loop, it adds an offset; after the main loop, it subtracts the same offset again. This way, we don’t have to do anything too complex when jumping to a previous section of the track.

Mic – Dreamless is #2 on the top 300 of ZX Spectrum music. There’s a bunch of other nice stuff on that list!

In part 1, I mentioned a problem in WebMSX that prevented the 48 KB ROM from working. 64 KB ROMs are not affected by this problem. The WebMSX player at the top of this article page plays the ROM linked to above. The ROM also works on real hardware (the Hitachi MB-H2 MSX1 I repaired a while ago).

Aside: disabling WebMSX’ auto-scroll

In the unlikely event that you have read this blog’s front page sometime in the last few months, you might have noticed that it scrolled automatically to this WebMSX player, even though this post is now very much not the newest post on this blog! I only noticed this a short while ago and decided to fix it, because it’s quite annoying. The below code snippets are taken from the WebMSX commit with the tag “v6.0.4”. Older or newer versions may look different.

All you need to do is remove the “this.focus()” line in the powerOn function in CanvasDisplay.js:

    this.powerOn = function() {
        this.setDefaults();
        updateLogo();
        document.documentElement.classList.add("wmsx-started");
        setPageVisibilityHandling();
        this.focus(); // <-- this is the line you need to remove or comment out
        if (WMSXFullScreenSetup.shouldStartInFullScreen()) {
            setFullscreenState(true);
            if (FULLSCREEN_MODE !== 2 & isMobileDevice) setEnterFullscreenByAPIOnFirstTouch();       // Not if mode = 2 (Windowed)
        }
    };

If you prefer to just edit the minified version, search for the call to setPageVisibilityHandling() and then edit out the “this.focus(),” bit.

Before:

documentElement.classList.add("wmsx-started"),setPageVisibilityHandling(),this.focus(),WMSXFullScreenSetup.shouldStar

After:

documentElement.classList.add("wmsx-started"),setPageVisibilityHandling(),WMSXFullScreenSetup.shouldStartInFullScreen

Playing .psg tunes (or even .asc/.pt3/… tunes) on the MSX (part 1)

This article is somewhat technical. If you just want to listen to a chip tune on WebMSX, maybe go for part 2 instead.

In previous articles I explored the YM2151 and the VGM file format. In this article, we’ll go back a generation and listen to some tunes written for the DSG (doorbell sound generator) PSG (programmable sound generator, i.e., the General Instruments AY-3-8910, or compatibly, Yamaha’s YM2149). PSG files (and particularly ASC files) are mainly used for ZX Spectrum chip tunes (I think), but the MSX has the same sound chip so why not play some chip tunes on the MSX?

Well, before we spend time working on something just slightly above PC beeper music… are there even any decent PSG tunes? Well, I’ve found at least one that like, “Popsa 2”, as is included in the below mix (scroll down a bit) on YouTube for example, and some of the commenters on this video seem to like “Illusion”.

Unfortunately, it doesn’t work in WebMSX (after 20 seconds or so). But it works in all three (NTSC) openMSX machines I bothered to test with, and it also works on my real MSX1 (Hitachi MB-H2). To get it to run in WebMSX, you have to “Set ROM format” -> “KonamiSCC”, but even then it’ll crash after a few minutes (vs. 20 seconds for e.g. ASCII8). For some reason it doesn’t let me choose “Normal”. I’m quite sure it would work with that setting if it were available. :p I’ll look into the matter at some point, probably. Looks like WebMSX will require a patch to work. Patch is submitted and will probably make it into the next version.

This machine produces NTSC color artifacts like there is no tomorrow.

Caution: writing to certain PSG registers is unsafe on certain MSX machines. I don’t think my code writes to these registers, but I didn’t make 100% sure. (However, openMSX gives you a warning when it notices unsafe writes, and I didn’t get a warning.)

“Popsa 2” was made in a program called ASC Sound Master. The “.asc” file can be downloaded here: https://zxart.ee/eng/authors/d/dreamer/popsa-2/. These .asc files are pretty small. They can be converted to PSG using ZXTune (https://bitbucket.org/zxtune/zxtune/) (and from PSG they can easily be converted to e.g. VGM, see bottom of this post), but the resulting files are too large to fit on a regular MSX1 cartridge.

ZXTune compilation and conversion:

git clone https://bitbucket.org/zxtune/zxtune.git
cd zxtune
make platform=linux system.zlib=1 -C apps/zxtune123/ -j4

bin/linux/release/zxtune123 --convert mode=psg,filename=foo.psg -- Dreamer\ -\ POPSA-2\ \(1994\).asc

The original .asc file is 3720 bytes. The resulting .psg is 129028 bytes. If you convert that to VGM, the resulting size is 187342 bytes.

The PSG file format

The PSG file format is very similar in concept to the VGM file format, except that only one chip is supported, the PSG. It seems it’s primarily used for ZX Spectrum chip tunes. As only one chip is supported, you don’t need the “command byte” that indicates what chip is to be written to. So you only have pairs of “register address” and “register value to write”.

There’s also a header in the first 16 bytes. The first three bytes are “PSG”, dunno about the rest.

The PSG only has 16 (IIRC) registers, and some of those aren’t even relevant for sound. In other words, the registers 0x10 to 0xff don’t exist and the designers of this file format used that opportunity to fit in a “wait” command at 0xff (one raster scan, so 1/50s or 1/60s depending on whether the system is PAL or NTSC). There’s also a command that waits multiple raster intervals, 0xfe, and a command that ends the tune, 0xfd. Ignoring the header, here are the first few bytes of the Popsa 2 PSG file:

ff
00 41
01 05
02 0b
03 01
04 e0
05 00
06 1f
07 31
08 0f
09 0f
0a 0a
ff
02 e0
03 00
07 38
09 0d
0a 0c
ff
...

All this means: wait 1 raster interval, then write to registers 00 through 0a with values 41, 05, 0b, 01, …, respectively, wait 1 raster interval, write to registers 02, 03, 07, 09, 0a, with values e0, 00, 38, 0d, 0c, respectively, wait 1 raster interval. (As you can see the 0xff command doesn’t take any parameters.)

Now that we know mostly how this file format works, it’s time to think about how to fit roughly 126 KB of data into my 48 KB cartridge. We could easily use an off-the-shelf compression library, but where’s the fun in that? That’s like… modern programming, ew.

We’ll invent another command for PSG, 0xfc, which takes a two-byte parameter that tells it to jump back somewhere (for a while, and then returns to its original location). We also need to write a program that identifies repetitive sections in the music (of which there are plenty). The former is pretty easy, so let’s talk about the latter program first.

  1. Compute MD5 sums of a 100-byte window for every byte in the file. So we end up with 129028-100=128928 MD5 sums. Easy and fast on modern hardware. See code snippet below.
  2. Check if we even have repeated chunks, e.g. by executing: md5sum chunks/* | awk ‘{print $1}’ | sort -n | uniq -c
  3. We may want to check a couple other window sizes to see if we can get better results. A lower window size means we’ll find more repetition, but we need 3 bytes to encode a jump in our PSG file.
  4. Re-assemble PSG file using a quick-and-dirty and probably somewhat buggy script. (See below.)

The resulting data length is 42158 bytes for the Popsa 2 song.

For task (1) we first convert the PSG file into hex, and later into tokens:

xxd -p Dreamer\ -\ POPSA-2\ \(1994\).psg | sed -r -e 's/(..)/\1 /g' | tr -d '\n' > Dreamer\ -\ POPSA-2\ \(1994\).psg.hex
# Then remove 16-byte header using a standard text editor
cat Dreamer\ -\ POPSA-2\ \(1994\).psg.hex | perl -e '$_ = <STDIN>; while (s/(..) //) { $cmd = $1; if ($cmd eq "ff" or $cmd eq "fd") { print("$cmd\n") } else { s/(..) //; $param = $1; print("$cmd $param\n") } }' > tokens

Then divide the tokens into chunks using the below script, divide_tokens_into_chunks.sh:

#!/bin/bash

N=100 # sliding window length
mkdir -p chunks_N$N
line_count=$(cat tokens | wc -l)
for ((i=0; i<$((line_count-N)); i++)); do
    tail -n +$i tokens | head -n $N > chunks_N$N/chunk_$i
done
rm chunks_N$N/chunk_0 # same as chunk_1

You know, looking back at this code for the first time in a while, I see there’s a nice off-by-1 error and a nice rm command to fix half of the problem. But the great thing about this being a hobby is that I don’t need to care. :)

Next, we have a Perl script that creates our PSG file. It needs some help though, so we do this first:

find chunks_N100/ | xargs md5sum > chunks_N100_md5sums

(We can’t do md5sum chunks_N100/* because that expands to a tad too many arguments in our case. xargs automatically cuts down the number of arguments to a more reasonable value.) This is the main program. Usage: ./compress_aggressive_but_convert_to_psg.pl < chunks_N100_md5sums > foo.psg

#!/usr/bin/perl

# dependencies:
# chunks_N$N/ (directory)
# chunks_N$10_md5sums (file) # example generation: find chunks_N10/ | xargs md5sum > chunks_N10_md5sums

use strict;
use warnings;
use feature "switch";

my $N = 100;

my $md5s = {};
my @chunks;

my $md5;
my $file;

my $debug_logged = 0;
my $lines = [];
my $current_output_byte_number = 0;
for (my $chunk_number = 0; <>; $chunk_number++) {
    /([a-z0-9]+)\s+([a-zA-Z0-9_\/]+)/;
    $md5 = $1;
    $file = $2;
    if (exists $md5s->{$md5}) {
        # can't call chunks that already contain a call because that call would take us beyond the N token window that we can see from where we are
        # that means it's likely we'd generate wrong code
        # so we'll just move on and maybe we'll find a nicer block

        my $target_chunk_number = $md5s->{$md5}->{chunk_number};
        my $concatted_chunks = join('', @chunks[max(0, $target_chunk_number-$N)..min($#chunks, $target_chunk_number+$N)]);
        if (($concatted_chunks =~ /; call/) or # NOTE "call wait_for_raster" is allowed
            ($chunk_number - $target_chunk_number < $N)) {
            # 1) can't convert due to existing call; nothing to be done here, or
            # 2) we can't call something right behind us
            # DANGER let's head back to the non-exists path
            goto NON_EXIST_PATH;
        } else {
            if (!$md5s->{$md5}->{converted_to_call}) {
                convert_to_callable_sub($target_chunk_number);
                $md5s->{$md5}->{converted_to_call} = 1;
            }
            my $output_byte_number_high = int($md5s->{$md5}->{output_byte_number} / 256);
            my $output_byte_number_low = $md5s->{$md5}->{output_byte_number} % 256;
            $chunks[$chunk_number] = sprintf("fc %02x %02x ; call " . $md5s->{$md5}->{output_byte_number} . " ($md5)\n", $output_byte_number_high, $output_byte_number_low);
            $current_output_byte_number += 3;

            # skip next N-1 rows
            for (0..$N-1) {
                my $foo = <>;
                $chunk_number++;
                $chunks[$chunk_number] = "";
            }
        }
    } else {
        $md5s->{$md5} = {};
        $md5s->{$md5}->{chunk_number} = $chunk_number;
        $md5s->{$md5}->{converted_to_call} = 0;
NON_EXIST_PATH:
        open my $fh, '<', $file or die "Can't open \"$file\": $!";
        my $token = <$fh>;
        close $fh;
        my $asm = convert_to_asm($token);
        $md5s->{$md5}->{output_byte_number} = $current_output_byte_number;
        $current_output_byte_number += (scalar(split(" ", $asm)));
        $chunks[$chunk_number] = $asm;
    }
}

print foreach @chunks;

print "infloop:
    jr infloop\n";

# no changes needed
sub convert_to_callable_sub($) {
    my $block_number = shift;
}

# don't actually do anything here
sub convert_to_asm($) {
    my $string = shift;
    return "$string";
}

sub min($$) {
    my ($a, $b) = @_;
    return $a if ($a < $b);
    return $b;
}
sub max($$) {
    my ($a, $b) = @_;
    return $a if ($a > $b);
    return $b;
}

The output of this program is in hex. Now we just need some assembly code to read the data and put it into the PSG registers. Here’s the core part:

ld hl,psg_begin
main_loop:
    ld a,(hl)
    cp 0xff
    jr z,wait
    cp 0xfe
    jr z,wait_n_times
    cp 0xfd
    jr z,end
    cp 0xfc
    jr z,jump
    jr register_write
inc_loop:
    inc hl
    jr loop

wait:
    call wait_for_raster
    jr inc_loop

register_write:
    ld a,(hl)
    out (0xa0),a
    inc hl
    ld a,(hl)
    out (0xa1),a
    jr inc_loop

wait_for_raster:
    in a,(0x99)
    and 128
    cp 128
    jr nz,wait_for_raster
    ret

psg_begin:
    include "foo.psg"
ds 010000h-$ ; fill rest with 0s

Understanding the above should help understanding the full implementation. (The above doesn’t include the code for the 0xfe, 0xfd, and 0xfc commands.) Note that we can’t use the above wait_for_raster on NTSC machines because the tune assumes 50 Hz. So we’ll instead emulate the 50 Hz interval using a busy loop.

For 0xfd (end of song), we just enter an infinite loop. For 0xfe, we just call wait_for_raster multiple times. For 0xfc, we need to store where we left off, then set hl to the address in the parameter, then execute exactly 100 main loop runs, then set hl back to its previous address and continue as normal.

Here’s the code, which also includes some VRAM writes to visualize the music a little bit. Does it look good? Eh, I dunno. It was an experiment. I changed the registers to be displayed because some registers don’t see updates very often. The overall visuals are a bit noisy, but there is one section that looks good in my opinion, and it’s also the section that I like best in the tune, right at the end. You can clearly see one of the registers changing right in sync with the doorbell sound. (It looks even more in sync in openMSX.)

N: equ 100

org 4000H
db "AB"
dw entry_point
db 00,00,00,00,00,00,00,00,00,00,00,00

SetVdpWrite: macro high low ; from http://map.grauw.nl/articles/vdp_tut.php
    ld a,low
    out (0x99),a
    ld a,high
    add 0x40
    out (0x99),a
endm

vpoke: macro value
    ld a,value
    out (0x98),a
endm

entry_point:
    ; copy cart rom (c000-f000) to ram
    in a,(0a8h)
    and 11000000b ; we want to know which slot is RAM, and AFAIK RAM should be mapped in at 0xc000-0xffff.
    ld c,a ; save value for later
    in a,(0a8h)
    and 00001100b ; we are executing from cartridge ROM at 0x4000~0x7fff, so the 2-bit value for this region is known correct. we just have to make the slots above this one the same value.
    ld b,a ; save a
    rla ; << 1 (now have 000xx000b)
    rla ; << 1 (now have 00xx0000b)
    or b ; | saved b (now have 00xxxx00b)
    rla ; << 1 (now have 0xxxx000b)
    rla ; << 1 (now have xxxx0000b)
    or b ; | saved b (now have xxxxxx00b)
;     ld a,01010100b ; set pages 0: rom 1: rom 2: cart 3: cart
    out (0a8h),a
copy_c000_f000:
    ld hl,0c000h ; start at c000
copy_c000_f000_loop:
    ld a,(hl) ; read from ROM address (hl)
    ld d,a
    in a,(0a8h)
    ld b,a ; store original value
    and 00111111b ; only keep settings for lower three slots
    or c ; add in setting for top slot (saved earlier)
;     ld a,011010100b
    out (0a8h),a ; set port
    ld (hl),d ; store value read from ROM address (hl) to RAM address (also hl of course)
    ld a,b ; load a with original value
    out (0a8h),a ; set port back
    inc hl
    ld a,h
    cp 0f0h
    jp z,other_init ; done with this copy
    jp copy_c000_f000_loop

; entry_point:
;     ld a,0xd4
;     out (0xa8),a ; set slots
other_init:
    ; set ports to bios:cart:cart:ram
    in a,(0a8h)
    and 00111111b ; only keep settings for lower three slots
    or c ; add in setting for top slot (saved earlier)
    out (0a8h),a ; set port
    ; set colors
    ld a,011110000b ; set data to be written into register (white on black)
    out (099h),a
    ld a,010000111b ; set register number (7)
    out (099h),a
    SetVdpWrite 0x20 0x05
    vpoke 0x0f ; set white on black for some part of the screen
    vpoke 0x0f ; set white on black for some other part of the screen
video_init:
    ; put chars /0123456789 into 0x1800-0x1AFF
    SetVdpWrite 0x18 0x00
    ld b,64 ; 64 chars
video_loop_1:
    vpoke 0x2f
    djnz video_loop_1
    ld b,64 ; 64 chars
video_loop_2:
    vpoke 0x2f
    djnz video_loop_2
    ld b,64 ; 64 chars
video_loop_3:
    vpoke 0x33
    djnz video_loop_3
    ld b,64 ; 64 chars
video_loop_4:
    vpoke 0x33
    djnz video_loop_4
    ld b,64 ; 64 chars
video_loop_5:
    vpoke 0x36
    djnz video_loop_5
    ld b,64 ; 64 chars
video_loop_6:
    vpoke 0x36
    djnz video_loop_6
    ld b,64 ; 64 chars
video_loop_7:
    vpoke 0x31
    djnz video_loop_7
    ld b,64 ; 64 chars
video_loop_8:
    vpoke 0x31
    djnz video_loop_8
    ld b,64 ; 64 chars
video_loop_9:
    vpoke 0x35
    djnz video_loop_9
    ld b,64 ; 64 chars
video_loop_10:
    vpoke 0x35
    djnz video_loop_10
    ld b,64 ; 64 chars
video_loop_11:
    vpoke 0x37
    djnz video_loop_11
    ld b,64 ; 64 chars
video_loop_12:
    vpoke 0x37
    djnz video_loop_12
    ld b,64 ; 64 chars
    
    ld b,0 ; flag to indicate whether we are jumping around at the moment (0 means we aren't) (NOTE: nested jumping isn't supported)
    ld c,0xa0 ; first PSG port
    ld hl,psg_begin
    jr main_loop

loop:
    ld a,b
    cp 0
    jr z,main_loop ; b isn't set so just head back to the loop
    pop af
    dec a
    cp -1
    jr z,restore_hl
    push af ; don't need this on the stack if we go to restore_hl, so place it after the jump
    jr main_loop
restore_hl:
    ld b,0 ; unset flag
    pop hl
    inc hl
    ; and continue executing into loop
main_loop:
    ld a,(hl)
    cp 0xff
    jr z,wait
    cp 0xfe
    jr z,wait_n_times
    cp 0xfd
    jr z,end
    cp 0xfc
    jr z,jump
    jr register_write
inc_loop:
    inc hl
    jr loop

wait:
    call wait_for_raster_50hz_emu
    jr inc_loop

wait_n_times: ; safe to assume that param isn't 0
    push bc
    inc hl
    ld b,(hl)
wait_n_times_loop:
    call wait_for_raster_50hz_emu
    djnz wait_n_times_loop
    pop bc
    jr inc_loop

end:
    jr end ; infinite loop

jump:
    inc hl
    ld d,(hl)
    inc hl
    ld e,(hl)
    push hl
    ld b,1 ; signal that we're calling a previous segment
    ld a,N ; we want to execute N instructions before going back to where we left off
    push af
    ld hl,psg_begin
    add hl,de
    jr loop

register_write:
    ld a,(hl)
    out (c),a

    ; really we only need ld a,(hl) and out (0xa1),a, but let's poke around in the VRAM to make this program slightly less boring
    ; we'll modify the tile definitions of characters /, 0, ..., 9 (8 bytes each starting at 0x178) and just put in the same value we're writing to the PSG register
    or a ; clear carry flag to make rla behave
    ; a = a*8 for vram write address
    rla ; *2
    rla ; *2 (*2*2 == *4)
    rla ; *2 (*2*2*2 == *8)
    ld d,a ; vram write address
    inc hl
    ld a,(hl)
    ld e,a ; vram write value
    out (0xa1),a
    ld a,0x78
    add a,d ; vram address low byte is 0x78 + (psg register)*8

    ; color change code currently commented out because it's not very pleasant to look at
;     ; let's also change some colors when register 5 is written to, which doesn't appear to happen very often
;     ; for register 5 a is 5*8 + 0x78 = 0xa0
;     cp 0xa0
;     jr nz,skip_color_change
;     ld d,a
;     ld a,e
;     out (099h),a
;     ld a,010000111b ; set register number (7)
;     out (099h),a
;     ld a,d
skip_color_change:
    SetVdpWrite 1 a ; vram address high byte is 1 (full address: 0x178)
    vpoke e
    vpoke e
    vpoke e
    vpoke e
    vpoke e
    vpoke e
    vpoke e
    vpoke e

    jr inc_loop

wait_for_raster:
    in a,(0x99)
    and 128
    cp 128
    jr nz,wait_for_raster
    ret

wait_for_raster_50hz_emu:
    ; CPU clock is 3579545 Hz
    ; decrement and loop routine takes 36 instructions per loop run (wait_for_raster_50hz_emu_loop up to (not including) low_0)
    ; (https://www.overtakenbyevents.com/tstates/)
    ; want routine to finish in 1/50 or a second, so:
    ; 3579545/50/36=1988.636111111111, let's very scientifically, er, let's throw out that whole calculation and say 1650 because we have overhead and I have experimentally determined that to sound close enough to the original :p
    ; our overhead varies depending on code path. some rhythm problems are audible, but not _too_ terrible
    ld de,1650
wait_for_raster_50hz_emu_loop:
    dec de
    ld a,e
    cp 0
    jr z,low_0
    jr wait_for_raster_50hz_emu_loop
low_0:
    ld a,d
    cp 0
    jr z,high_low_0
    jr wait_for_raster_50hz_emu_loop
high_low_0:
    ret

psg_begin:
include "foo.psg"

ds 010000h-$

Compiles with z80asm. Other assemblers might need some tweaks.

Bonus: converting PSG files to VGM

This is implemented in straight-forward C.
Compilation: cc -o psg2vgm psg2vgm.c
Execution: ./psg2vgm Dreamer\ -\ POPSA-2\ \(1994\).psg | xxd -r -p > foo.vgm

#include <stdio.h>
#include <stdlib.h>

int try_fgetc(FILE *file)
{
    int param;
    if ((param = fgetc(file)) == EOF) {
        fprintf(stderr, "Unexpected EOF");
        exit(1);
    }
    return param;
}

int main(int argc, char **argv)
{
    FILE *file = fopen(argv[1], "r");
    int c, param;
    int i;

    // header with 3579545/2 = 1789772 Hz clock rate for AY8910, fake value for length (should be long enough for anyone)
    printf("56 67 6d 20 f0 f0 f0 f0 50 01 00 00 00 00 00 00 00 00 00 00 36 09 01 00 00 dd 00 00 24 00 00 00 00 dd 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4c 4f 1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00");
    
    for (i = 0; i < 16; i++) { // bulldoze over header
        try_fgetc(file);
    }

    while ((c = fgetc(file)) != EOF) {
        switch (c) {
            case 0xff:
                printf("%02x ", 0x63); // wait 882 samples (50th of a second), a shortcut for 0x61 0x72 0x03
                break;
            case 0xfe:
                param = try_fgetc(file);
                for (i = 0; i < param; i++) {
                    printf("%02x ", 0x63); // wait 882 samples (50th of a second), a shortcut for 0x61 0x72 0x03
                }
                break;
            case 0xfd:
                printf("%02x ", 0x66); // end of sound data
                break;
            default:
                param = try_fgetc(file);
                printf("%02x %02x %02x ", 0xa0, c, param);
        }
    }

    printf("\n");
}

Lifting instrument definitions from the MSX SFG-01 audio module (i.e., patches for the YM2151)

Motivation: test a probably broken YM2151 on a breadboard. We need to play certain instruments in a certain way to reproduce the problem. I won’t bother you with the details in this blog post, but I’d say it worked well.

Introduction

The MSX SFG-01 extension module contains some software on a 16 KB ROM, some MIDI hardware, and an FM audio generator chip. The software can be started by typing “CALL MUSIC” in the BASIC prompt. The software looks like this:

In this state, you can press left and right to select a different voice. There’s a lot of them!

If you have an external (piano) keyboard connected to the SFG-01 module (not via MIDI, but via a proprietary connector), you can now hit keys and hear them played using the settings displayed above. The above software makes it look like you can only select between a couple different instruments (currently, BRASS 1 is selected), but the YM2151 is a full-fledged synthesizer chip; you can define any instrument you want just by poking into a couple registers. And you have 8 independent voices!

Programming the YM2151

The Commander X16 project’s documentation provides a very good overview of the registers and some programming information: https://github.com/commanderx16/x16-docs/blob/master/X16%20Reference%20-%2009%20-%20Sound%20Programming.md

Quick summary: first you poke into a set of registers to define what your instrument is supposed to sound like, then you poke into a different set of registers to make the YM2151 play notes using that instrument definition. The above link has BASIC code for both. Here’s the code for the instrument definition (from https://github.com/commanderx16/x16-docs/blob/master/X16%20Reference%20-%2009%20-%20Sound%20Programming.md#loading-a-patch), this produces marimba-like sounds:

5 YA=$9F40 : YD=$9F41 : V=0
10 REM: MARIMBA PATCH FOR YM VOICE 0 (SET V=0..7 FOR OTHER VOICES)
20 DATA $DC,$00,$1B,$67,$61,$31,$21,$17,$1F,$0A,$DF,$5F,$DE
30 DATA $DE,$0E,$10,$09,$07,$00,$05,$07,$04,$FF,$A0,$16,$17
40 READ D
50 POKE YA,$20+V : POKE YD,D
60 FOR A=$38 TO $F8 STEP 8
70 READ D : POKE YA,A+V : POKE YD,D
80 NEXT A

What’s going on over here? First, we poke $DC into register $20. Then $00 into $38. Then $1B into $40, $67 into $48, $61 into $50, etc. (We increase the address register by 8 on every loop execution because we only want to load the marimba sound into the registers for voice 0. For voice 1, the registers would be $39, $41, $49, $51, $59, $61, $69, etc. For voice 2, $3A, $42, $4A, etc.)

Getting the patches off the SFG-01’s firmware using openMSX and a TCL script

So what are these magic values? Well, I’m not a synthesizer expert, but they control stuff like ADSR (attack, decay, sustain, release), among other things. See https://en.wikipedia.org/wiki/Envelope_(music) for more information.

Well, if you’re like me you probably won’t immediately come up with the correct values that imitate a specific sound. So let’s extract these values! That’s the main part of this blog post. Unfortunately they aren’t in an easy-to-guess data structure on the software ROM. So instead we’ll be using openMSX’ VGM recording script! This script is in TCL and I’m in the process of getting some additions merged to make it work with the SFG-01’s YM2151. In the meantime, you can grab it from here: https://github.com/qiqitori/openMSX/blob/vgm_rec_ym2151/share/scripts/_vgmrecorder.tcl. Replace your openMSX’ installation’s version of this script (probably /usr/share/openmsx/scripts/_vgmrecorder.tcl) and restart openMSX. Then add the SFG-01 extension (Menu -> Hardware -> Extensions -> Add… -> Yamaha SFG-01 FM Sound Synthesizer Unit) and restart your emulated MSX. Then type “CALL MUSIC” and you should get something like the above screenshot.

Next, press F10 to open the console and type “vgm_rec start SFG-01”. Close the console by pressing F10 again. Then select a different instrument using the cursor keys. Then open the console again and type “vgm_rec stop”. If everything went well, you should now have a playable VGM file containing the instrument patch. (If you play it with vgmplay, nothing will happen because no note is being pressed.) Let’s look at the generated VGM file in a hex editor:

We have a header and a bunch of byte triplets, each starting with “54”, which indicates that a YM2151 command follows. We also have “61”s, which indicate that the YM2151 is to play a certain number of samples with the current settings. (These two numbers are defined in the VGM specification.) The next byte is the register address, the next byte the data to put into that register. We see a bunch of writes to registers $12 and $14. Remembering the BASIC script, we should actually only be interested in writes to $38 to $FF. We can easily extract these like this:

$ xxd -p music0005.vgm | perl -pe 's/(..)/"$1 "/eg; chomp' | grep -o -P '54 [3-9a-f][0-9a-f] .. ..'
54 e1 4f 61
54 e2 4f 61
54 e3 4f 61
54 e4 4f 61
54 e5 4f 61
54 e6 4f 61
54 e7 4f 61
54 e9 1f 61
54 ea 1f 61
54 eb 1f 61
54 ec 1f 61
54 ed 1f 61
54 ee 1f 61
54 ef 1f 61
54 f1 0f 61
54 f2 0f 61
54 f3 0f 61
54 f4 0f 61
54 f5 0f 61
54 f6 0f 61
54 f7 0f 61
54 f9 0f 61
54 fa 0f 61
54 fb 0f 61
54 fc 0f 61
54 fd 0f 61
54 fe 0f 61
54 ff 0f 61
54 31 00 61
54 32 00 61
54 33 00 61
54 34 00 61
54 35 00 61
54 36 00 61
54 37 00 61
54 39 00 61
54 3a 00 61
54 3b 00 61
54 3c 00 61
54 3d 00 61
54 3e 00 61
54 3f 00 61
54 41 14 61
54 42 14 61
54 43 14 61
54 44 14 61
54 45 14 61
54 46 14 61
54 47 14 61
54 49 0c 61
54 4a 0c 61
54 4b 0c 61
54 4c 0c 61
54 4d 0c 61
54 4e 0c 61
54 4f 0c 61
54 51 12 61
54 52 12 61
54 53 12 61
54 54 12 61
54 55 12 61
54 56 12 61
54 57 12 61
54 59 06 61
54 5a 06 61
54 5b 06 61
54 5c 06 61
54 5d 06 61
54 5e 06 61
54 5f 06 61
54 81 92 61
54 82 92 61
54 83 92 61
54 84 92 61
54 85 92 61
54 86 92 61
54 87 92 61
54 89 5e 61
54 8a 5e 61
54 8b 5e 61
54 8c 5e 61
54 8d 5e 61
54 8e 5e 61
54 8f 5e 61
54 91 12 61
54 92 12 61
54 93 12 61
54 94 12 61
54 95 12 61
54 96 12 61
54 97 12 61
54 99 59 61
54 9a 59 61
54 9b 59 61
54 9c 59 61
54 9d 59 61
54 9e 59 61
54 9f 59 61
54 a1 0a 61
54 a2 0a 61
54 a3 0a 61
54 a4 0a 61
54 a5 0a 61
54 a6 0a 61
54 a7 0a 61
54 a9 06 61
54 aa 06 61
54 ab 06 61
54 ac 06 61
54 ad 06 61
54 ae 06 61
54 af 06 61
54 b1 84 61
54 b2 84 61
54 b3 84 61
54 b4 84 61
54 b5 84 61
54 b6 84 61
54 b7 84 61
54 b9 85 61
54 ba 85 61
54 bb 85 61
54 bc 85 61
54 bd 85 61
54 be 85 61
54 bf 85 61
54 c1 02 61
54 c2 02 61
54 c3 02 61
54 c4 02 61
54 c5 02 61
54 c6 02 61
54 c7 02 61
54 c9 02 61
54 ca 02 61
54 cb 02 61
54 cc 02 61
54 cd 02 61
54 ce 02 61
54 cf 02 61
54 d1 00 61
54 d2 00 61
54 d3 00 61
54 d4 00 61
54 d5 00 61
54 d6 00 61
54 d7 00 61
54 d9 00 61
54 da 00 61
54 db 00 61
54 dc 00 61
54 dd 00 61
54 de 00 61
54 df 00 61
...

The firmware starts writing the patch at register address $E0. Note that this program doesn’t appear to be setting channel 0, just channels 1-7. Also note that we don’t have anything for registers $60-$7F. We don’t need those; they just control the loudness.

The X16 page also had a BASIC program that hits some keys so we can actually figure out if the patch sounds right. This is the code:

10 YA=$9F40      : REM YM_ADDRESS
20 YD=$9F41      : REM YM_DATA
30 POKE YA,$29   : REM CHANNEL 1 NOTE SELECT
40 POKE YD,$4A   : REM SET NOTE = CONCERT A
50 POKE YA,$08   : REM SELECT THE KEY ON/OFF REGISTER
60 POKE YD,$00+1 : REM RELEASE ANY NOTE ALREADY PLAYING ON CHANNEL 1
70 POKE YD,$78+1 : REM KEY-ON VOICE 1 TO PLAY THE NOTE
80 FOR I=1 TO 100 : NEXT I : REM DELAY WHILE NOTE PLAYS
90 POKE YD,$00+1 : REM RELEASE THE NOTE

Note that this program is operating on voice 1, not voice 0. What’s going on here? We write $4A into register $29. This selects the “A” note in octave 4. It’s just a coincidence that hex A is note A. ($0 is C#, $1 is D, $2 is D#, …, $A is A.) Next we write 1 into register $08. That ends any note already playing on voice 1. (We could have done that first.) Then we put $79 into the same register to play our select note. (“A” in octave 4). Then we wait a bit, and release the note by putting 1 into the same register again.

Playing notes with the extracted patches

Now let’s see if these patches produce the expected sound. To do that, we could manually add a couple note commands to the generated VGM files using our hex editor. It would just be a couple byte triplets. However, I decided to quickly hack together a short Perl program that generates a hex dump that can be converted back to a binary file that then happens to be a VGM file. So you would put this in, say, brass1_patch_with_body_polyphonic.pl and concatenate its output to a header (which I’ll show below). The result can be played with vgmplay. (Note: there is a cleaned up version of this code at the bottom of this post.)

#!/usr/bin/perl

@data = (0xDC, 0x21, 0x01, 0x01, 0x23, 0x11, 0x21, 0x17, 0x1F, 0x0A, 0x8d, 0x15, 0x4f, 0x52, 0x06, 0x0e, 0x08, 0x83, 0x02, 0x00, 0x00, 0x00, 0x18, 0x28, 0x18, 0x28); # brass 1 from sfg-01
#        ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  ----  DC1   DC1   DC1   DC1   DC2   DC2   DC2   DC2   ----  ----  ----  ----
# DC1: decay rate 1, DC2: decay rate 2

# init voice 0
$v = 0;
$i = 0;
printf("%02x %02x %02x ", 0x54, 0x20+$v, $data[$i]);
$i++;
for ($a=0x38; $a <= 0xf8; $a += 8, $i++) {
    printf("%02x %02x %02x ", 0x54, $a+$v, $data[$i])
}
# init voice 1
$v = 1;
$i = 0;
printf("%02x %02x %02x ", 0x54, 0x20+$v, $data[$i]);
$i++;
for ($a=0x38; $a <= 0xf8; $a += 8, $i++) {
    printf("%02x %02x %02x ", 0x54, $a+$v, $data[$i])
}
# init voice 2
$v = 2;
$i = 0;
printf("%02x %02x %02x ", 0x54, 0x20+$v, $data[$i]);
$i++;
for ($a=0x38; $a <= 0xf8; $a += 8, $i++) {
    printf("%02x %02x %02x ", 0x54, $a+$v, $data[$i])
}

# body
for (1..10) {
    $v = 0;
    printf("%02x %02x %02x ", 0x54, 0x28+$v, 0x4a); # 4 means octave 4, a means A (coincidence)
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    printf("%02x %02x %02x ", 0x54, 0x08, 0x78+$v); # key down for voice $v
    for ($i = 0; $i < 256; $i++) { # wait
        printf("%02x ", 0x63);
    }
    $v = 1;
    printf("%02x %02x %02x ", 0x54, 0x28+$v, 0x50); # 5 means octave 5, 0 means C#
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    printf("%02x %02x %02x ", 0x54, 0x08, 0x78+$v); # key down for voice $v
    for ($i = 0; $i < 256; $i++) { # wait
        printf("%02x ", 0x63);
    }
    $v = 2;
    printf("%02x %02x %02x ", 0x54, 0x28+$v, 0x54); # 5 means octave 5, 4 means E
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    printf("%02x %02x %02x ", 0x54, 0x08, 0x78+$v); # key down for voice $v
    for ($i = 0; $i < 256; $i++) { # wait
        printf("%02x ", 0x63);
    }
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    $v = 1;
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    $v = 0;
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    for ($i = 0; $i < 128; $i++) { # wait
        printf("%02x ", 0x63);
    }
}
printf("\n");

The header is as follows. Note that you should normally specify an end offset in the header, but vgmplay is nice and doesn’t care if that offset is set correctly or not. Name it simple_vgm_header.hex or something.

56676d20f0f0f0f05001000000000000000000003609010000dd00002400000000dd000000000000000000000000000000093d000c0000000000000000000000

Then you execute:

$ perl brass1_patch_with_body_polyphonic.pl > body.hex
$ cat simple_vgm_header.hex body.hex | xxd -r -p > simple_vgm.vgm
$ vgmplay simple_vgm.vgm
VGM Player
----------

File Name:      foo.vgm
Warning! Invalid EOF Offset 0xF0F0F0F4! (should be: 0x2592)

Track Title:
Game Name:
System:
Composer:
Release:
Version:        1.50      Gain: 1.00    Loop: Yes (00:01.28)
VGM by:
Notes:

Used chips:     YM2151  

Playing 8.92%   00:16.06 / 00:01.28 seconds
Playing finished.

You can replace the contents of the @data array with the extracted patch. Remember that the @data array starts with $DC followed by the value to be put into $38, the value to be put into $40, $48, etc., while the extracted patch starts with register address $E0. ($E1 actually because it doesn’t set voice 0, just voices 1-7.) In addition, the extracted patch contains register writes to set all voices (except 0) to the selected instrument. So let’s have another look at portions of the extracted patch:

54 39 00 61
54 3a 00 61
54 3b 00 61
...
54 41 14 61
54 42 14 61
54 43 14 61
...
54 49 0c 61
54 4a 0c 61
54 4b 0c 61
...
54 51 12 61
54 52 12 61
54 53 12 61
...
54 59 06 61
54 5a 06 61
54 5b 06 61
54 5c 06 61
54 5d 06 61
54 5e 06 61
54 5f 06 61
54 81 92 61
54 82 92 61
54 83 92 61
...

(Oh, never mind the 61 at the end, we didn’t actually want that.) This writes $00 to $39-$3F, $14 to $41-$47, $0C to $49-$4F, $12 to $51-$57, $06 to $59-$5F, $92 to $81-$87, etc.

So we modify the @data array like this:

@data = (0xDC, 0x00, 0x14, 0x0C, 0x12, 0x06, # above numbers
0x21, 0x17, 0x1F, 0x0A, # volume, can stay as-is
0x92, 0x5E, 0x12, 0x59, 0x0A, 0x06, 0x84, 0x85, 0x02, 0x02, 0x00, 0x00, 0x47, 0x19, 0x08, 0x09); # unfortunately I'm a bit silly and don't quite remember which patch this was, but it might have been PORGAN2 :p

Let’s also extract KOTO, just out of interest. We can make our command line a bit more intelligent so we can read off the values for @data a little easier:

$ xxd -p music0006.vgm | perl -pe 's/(..)/"$1 "/eg; chomp' | \grep -o -P '54 [3-9a-f][0-9a-f] ..' | perl -ne 'if (!/^(54 .)[2-7a-f](.*)/) { print }'
54 e1 2f
54 e9 3f
54 f1 1f
54 f9 1f
54 31 00
54 39 01
54 41 33
54 49 31
54 51 34
54 59 31
54 81 da
54 89 dc
54 91 dd
54 99 df
54 a1 08
54 a9 04
54 b1 05
54 b9 8a
54 c1 05
54 c9 02
54 d1 04
54 d9 03
54 e1 27
54 e9 36
54 f1 14
54 f9 15
54 e1 27
54 e9 36
54 f1 14
54 f9 15

Thus, @data becomes:

@data = (0xDC, 0x01, 0x33, 0x31, 0x34, 0x31, # above numbers
0x21, 0x17, 0x1F, 0x0A, # volume, can stay as-is
0xda, 0xdc, 0xdd, 0xdf, 0x08, 0x04, 0x05, 0x8a, 0x05, 0x02, 0x04, 0x03, 0x27, 0x36, 0x14, 0x15); # koto?

As the koto has fast decay, we may want to reduce the amount of time to wait between notes by changing the “256” in “for ($i = 0; $i < 256; $i++) { # wait” to something much smaller, such as 8. Here’s what it sounds like:

As promised, here’s the cleaned up version of the VGM generator code:

#!/usr/bin/perl

my @data = (0xDC, 0x00, 0x1B, 0x67, 0x61, 0x31, 0x21, 0x17, 0x1F, 0x0A, 0xDF, 0x5F, 0xDE, 0xDE, 0x0E, 0x10, 0x09, 0x07, 0x00, 0x05, 0x07, 0x04, 0xFF, 0xA0, 0x16, 0x17); # marimba
# my @data = (0xDC, 0x21, 0x01, 0x01, 0x23, 0x11, 0x21, 0x17, 0x1F, 0x0A, 0x8d, 0x15, 0x4f, 0x52, 0x06, 0x0e, 0x08, 0x83, 0x02, 0x00, 0x00, 0x00, 0x18, 0x28, 0x18, 0x28); # brass 1
# my @data = (0xDC, 0x00, 0x14, 0x0C, 0x12, 0x06, 0x21, 0x17, 0x1F, 0x0A, 0x92, 0x5E, 0x12, 0x59, 0x0A, 0x06, 0x84, 0x85, 0x02, 0x02, 0x00, 0x00, 0x47, 0x19, 0x08, 0x09); # porgan2?
# my @data = (0xDC, 0x01, 0x33, 0x31, 0x34, 0x31, 0x21, 0x17, 0x1F, 0x0A, 0xda, 0xdc, 0xdd, 0xdf, 0x08, 0x04, 0x05, 0x8a, 0x05, 0x02, 0x04, 0x03, 0x27, 0x36, 0x14, 0x15); # koto

sub voice_init {
    my ($v) = @_;
    my $i = 0;
    printf("%02x %02x %02x ", 0x54, 0x20+$v, $data[$i]);
    $i++;
    for ($a=0x38; $a <= 0xf8; $a += 8, $i++) {
        printf("%02x %02x %02x ", 0x54, $a+$v, $data[$i])
    }
}

sub play_note {
    my ($v, $note, $length) = @_;
    printf("%02x %02x %02x ", 0x54, 0x28+$v, $note);
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    printf("%02x %02x %02x ", 0x54, 0x08, 0x78+$v); # key down for voice $v
    for ($i = 0; $i < $length; $i++) { # wait
        printf("%02x ", 0x63);
    }
}

sub release_voice {
    my ($v) = @_;
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
}

voice_init(0);
voice_init(1);
voice_init(2);

for (1..3) {
    play_note(0, 0x4a, 8); # 4 means octave 4, A means A (concidence)
    play_note(1, 0x50, 8); # 5 means octave 5, 0 means C#
    play_note(2, 0x54, 32); # 5 means octave 5, 4 means E (But it's just three semitones from C# to E? Shouldn't it be 0x53? No, for some reason, 3, 7, B, and F are skipped)
    release_voice(0);
    release_voice(1);
    release_voice(2);
}
printf("\n");

Now that we have a koto, let’s play a tune that sounds good on the koto!

#!/usr/bin/perl

my @data = (0xDC, 0x01, 0x33, 0x31, 0x34, 0x31, 0x21, 0x17, 0x1F, 0x0A, 0xda, 0xdc, 0xdd, 0xdf, 0x08, 0x04, 0x05, 0x8a, 0x05, 0x02, 0x04, 0x03, 0x27, 0x36, 0x14, 0x15); # koto

sub voice_init {
    my ($v) = @_;
    my $i = 0;
    printf("%02x %02x %02x ", 0x54, 0x20+$v, $data[$i]);
    $i++;
    for ($a=0x38; $a <= 0xf8; $a += 8, $i++) {
        printf("%02x %02x %02x ", 0x54, $a+$v, $data[$i])
    }
}

sub play_note {
    my ($v, $note, $length) = @_;
    printf("%02x %02x %02x ", 0x54, 0x28+$v, $note);
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
    printf("%02x %02x %02x ", 0x54, 0x08, 0x78+$v); # key down for voice $v
    for ($i = 0; $i < $length; $i++) { # wait
        printf("%02x ", 0x63);
    }
}

sub release_voice {
    my ($v) = @_;
    printf("%02x %02x %02x ", 0x54, 0x08, 0x0+$v); # release voice $v
}

voice_init(0);

$QUA_NOTE_LEN = 32;
for (1..2) {
    play_note(0, 0x4a, $QUA_NOTE_LEN);
    play_note(0, 0x4a, $QUA_NOTE_LEN);
    play_note(0, 0x4d, $QUA_NOTE_LEN*2);
}

play_note(0, 0x4a, $QUA_NOTE_LEN);
play_note(0, 0x4d, $QUA_NOTE_LEN);
play_note(0, 0x4e, $QUA_NOTE_LEN);
play_note(0, 0x4d, $QUA_NOTE_LEN);

play_note(0, 0x4a, $QUA_NOTE_LEN);
play_note(0, 0x4d, $QUA_NOTE_LEN/2);
play_note(0, 0x4a, $QUA_NOTE_LEN/2);
play_note(0, 0x45, $QUA_NOTE_LEN*2);

play_note(0, 0x44, $QUA_NOTE_LEN);
play_note(0, 0x3e, $QUA_NOTE_LEN);
play_note(0, 0x44, $QUA_NOTE_LEN);
play_note(0, 0x45, $QUA_NOTE_LEN);

play_note(0, 0x44, $QUA_NOTE_LEN);
play_note(0, 0x44, $QUA_NOTE_LEN/2);
play_note(0, 0x3e, $QUA_NOTE_LEN/2);
play_note(0, 0x3d, $QUA_NOTE_LEN*2);

for (1..2) {
    play_note(0, 0x4a, $QUA_NOTE_LEN);
    play_note(0, 0x4a, $QUA_NOTE_LEN);
    play_note(0, 0x4d, $QUA_NOTE_LEN*2);
}

play_note(0, 0x44, $QUA_NOTE_LEN);
play_note(0, 0x45, $QUA_NOTE_LEN);
play_note(0, 0x4d, $QUA_NOTE_LEN/2);
play_note(0, 0x4a, $QUA_NOTE_LEN/2);
play_note(0, 0x45, $QUA_NOTE_LEN);

play_note(0, 0x44, $QUA_NOTE_LEN*3);

release_voice(0);

printf("\n");
sakura.vgm

Adapting vgmplay-msx to work from a MSX1 cartridge without MSX-DOS

Summary

I have a Yamaha MSX1 (YS-503) with 64 32 KB of RAM and an SFG-01, which has a YM2151 on it. I do not have a floppy drive, but I have a way to easily “make cartridges” that are up to 48 KB in size. This blog post explores the source code of vgmplay-msx and ports portions of the program to work off a cartridge. Here’s how the result looks in openMSX:

Here are ROM files that work in OpenMSX, one with the SFG-01 inserted into slot 2, and the other with the SFG-01 inserted into slot 3, both playing the first ~20 seconds of track 2 on https://vgmrips.net/packs/pack/fantasy-zone-ii-dx-sega-system-16c, “10 Years After ~ Cama-Ternya [Demo]”.

vgmplay-msx deep dive

VGM files have a 128 or 256-byte header followed by the actual song data. The song data entirely consists of 1-byte commands possibly followed by a couple bytes of arguments to the command. The only commands we are interested in are “YM2151 register write” and the “wait” commands, of which there are a few. (And maybe the end of song/loop commands.) Everything else is irrelevant for our setup and what we want to do.

We only have 48 KB of ROM space, which means that it’s a bit of a tight fit for the program and the song data. The stock vgmplay.com file is about 32 KB, but it includes code (src/drivers/) for a lot of chips. We only need src/drivers/SFG.asm. There are also vast regions of 0s. We also don’t need any code to make song data fit into more than 64 KB of RAM (src/MappedReader.asm). We don’t need support for compressed .vgm files. And we don’t need any MSX-DOS-specific code, nor do we need code to handle reading from the floppy drive. Song data tends to be relatively large too: the song I used in Raspberry Pi Pico implementation of the YM3012 DAC (mono) was around 1 minute and is 68 KB in size. We’ll have to either truncate it, or find something shorter or simpler.

vgmplay-msx is written in a rather unusual assembly dialect. The assembler supports scoping, and there appears to be a bit of a “class” hierarchy. For example, MappedReader (src/MappedReader.asm) extends Reader (lib/neonlib/src/Reader.asm).

After putting the MSXDOS22.ROM into .openMSX/share/systemroms and booting from MSXDOS22.dsk, and adding the Yamaha SFG-01 extension (Hardware -> Extensions), and executing ‘make’ in the vgmplay-msx source directory, I was able to execute ‘vgmplay foo.vgm’ in MSX-DOS and hear the VGM file being played back in openMSX. After reading the code for a little bit, I opened and connected the debugger. In System -> Symbol manager, we can read the symbols generated by the assembler, vgmplay.sym, which are quite convenient.

Note: openMSX debugger fails to show the correct disassembly when there is a label in the middle of an instruction. Below, 427F 26 db #26 and 4280 79 ld a,c are actually a single instruction, which you can manually decode using something like this:

$ echo -n -e '\x26\x79' > foobar
$ z80dasm foobar
; z80dasm 1.1.5
; command line: z80dasm foobar
        org     00100h
        ld h,079h
Here, in Reader_Read_IY we read a byte from the VGM music data. We then create an address by reading one byte from 79xx (where xx is the read byte) and one byte from 80xx (again, xx is the previously read byte) and jump to it. This is the main jump table.

The jump table is defined in src/Player.asm, and for efficiency reasons is separated into two in Player_InitCommandsJumpTable in the same file.

; Shuffles the commands jump table so that the LSB and MSB are separated.
; This allows faster table value lookups.
Player_InitCommandsJumpTable: PROC
...

The byte we read was a 0x54, which indicates that we are going to write to the YM2151. This is where we have jumped:

We are going to read a word (two bytes). The first byte holds the address of the YM2151 register to write to, the second byte the data. It looks like the next instruction has been butchered by the openMSX debugger again. C3F069 is actually “JP 69F0”.
Reader_ReadWord_IY simply calls Reader_Read_IY twice. (The first screenshot also used Reader_Read_IY to fetch the command byte.) The address goes into the E register, the data into the D register.

We have now jumped to 69F0. The source file is src/drivers/SFG.asm.

There are a few things to unpack here.

First of all, we look into the address value and may jump to MaskIRQEN or MaskCT if the address is exactly 0x14 or 0x1b, respectively. Our E is set to E8, so that doesn’t apply here, so we fall right through into SFG_instance.WriteRegister. I am going to guess that the MaskIRQEN and MaskCT sections modify some bits in the address register to perhaps turn off a feature in the YM2151 that would trigger output on the interrupt or one of the CT pins, but I don’t know for sure. Here’s a pinout of the YM2151 BTW:

There are IRQ and CT pins, and IIRC they are output pins.

Next, let’s edit the symbol file to work around the debugger’s inability to disassemble instructions that have labels in the middle… Search for ‘6a02’ and ‘6a07’ in the symbol file, remove the symbol file from the debugger, and add it back in again. Then our WriteRegister function becomes a little clearer:

We read from and write to the A8 I/O port. This screenshot is from after executing the OUT instruction, which modifies the slot selection; the modifed slots are highlighted in red in the upper-right corner.

The SFG’s YM2151 registers are memory-mapped(!) at the following addresses:

SFG_YM2151_STATUS: equ 3FF1H
SFG_YM2151_ADDRESS: equ 3FF0H
SFG_YM2151_DATA: equ 3FF1H
SFG_ID_ADDRESS: equ 80H
SFG_CLOCK: equ 3579545

If you know quite a bit about how the MSX works, you may know that the MSX in general doesn’t use memory-mapped I/O, and you may also know that 0000-3FFF is where the system ROM is usually located (mapped; it can be unmapped and something else can be mapped instead). In the screenshot above, you can see that there’s an “in b,(c)” instruction at 69FF, where C holds #A8. This is the I/O register that allows you to remap stuff. See this link if you want to know more about how this register works: http://map.grauw.nl/resources/msx_io_ports.php#ppi. (BTW, this page is probably authored by the same person who wrote vgmplay-msx.) So “in b,(c)” saves the contents of the #A8 into B.

In order to perform memory-mapped I/O, we have to unmap any ROM or RAM currently mapped in. And when we’re done, we obviously have to map it back in. (Oh, good that we saved the #A8 contents into B.) ROM and RAM mappings vary between MSX models, which means the OUT part of the code is probably generated dynamically somewhere in the init code. (Hence the labels in the middle of our instructions.) The next instructions (6A05 and 6A06) save the contents of the subslot register (FFFF) into E (so we can change it back later) and set the subslot register to 0. (Note that at this point our register address has already been moved into register A, while data is still in register D.)

After the OUT is done, we just write our address to SFG_YM2151_ADDRESS and our data to SFG_YM2151_STATUS (which is an alias of SFG_YM2151_DATA, the address is 100% the same). The “cp (hl)” instruction in the middle is just to wait a short moment according to the comment in the source code: “; R800 wait: ~4 bus cycles”. When we’re done, we set the slot and subslot registers back to what they were.

So that’s how we perform a register write. We also need to know how to wait a specific number of cycles. VGM files are full of wait commands, and if the amount of waiting we do is too imprecise, that will definitely be audible. The wait commands in VGM files assume an output sample rate of 44100 Hz, which is different from the actual sample rate on real hardware. The number specifies the number of samples to just leave the YM2151 alone to do its thing. In reality, the YM2151 in the SFG-01 outputs at 3579545/2/32 = 55930.390… Hz. The Z80 runs at 3579545 Hz. So we get 64 Z80 cycles per sample, but because the VGM file wait cycles assume a different sample rate, just adding NOPs would end up being rather imprecise. What’s more, some VGMs are for machines where the YM2151 is clocked at 4000000 Hz, which results in an output rate of 4000000/2/32 = 62500 Hz.

So let’s… jump back to our jump table to see what happens when a wait instruction is encountered!

	dw Player_WaitNSamples        ; 61H
	dw Player_Wait735Samples      ; 62H
	dw Player_Wait882Samples      ; 63H
...
	dw Player_Wait1Samples        ; 70H
	dw Player_Wait2Samples        ; 71H
	dw Player_Wait3Samples        ; 72H
	dw Player_Wait4Samples        ; 73H
	dw Player_Wait5Samples        ; 74H
	dw Player_Wait6Samples        ; 75H
	dw Player_Wait7Samples        ; 76H
	dw Player_Wait8Samples        ; 77H
	dw Player_Wait9Samples        ; 78H
	dw Player_Wait10Samples       ; 79H
	dw Player_Wait11Samples       ; 7AH
	dw Player_Wait12Samples       ; 7BH
	dw Player_Wait13Samples       ; 7CH
	dw Player_Wait14Samples       ; 7DH
	dw Player_Wait15Samples       ; 7EH
	dw Player_Wait16Samples       ; 7FH
	dw Player_Skip1               ; 80H
	dw Player_Wait1Samples        ; 81H
	dw Player_Wait2Samples        ; 82H
	dw Player_Wait3Samples        ; 83H
	dw Player_Wait4Samples        ; 84H
	dw Player_Wait5Samples        ; 85H
	dw Player_Wait6Samples        ; 86H
	dw Player_Wait7Samples        ; 87H
	dw Player_Wait8Samples        ; 88H
	dw Player_Wait9Samples        ; 89H
	dw Player_Wait10Samples       ; 8AH
	dw Player_Wait11Samples       ; 8BH
	dw Player_Wait12Samples       ; 8CH
	dw Player_Wait13Samples       ; 8DH
	dw Player_Wait14Samples       ; 8EH
	dw Player_Wait15Samples       ; 8FH

As we can see, there are a lot of commands that perform waits. For Player_Wait1Samples, we jump to 492F:

The “exx” instruction switches between the directly usable registers and the shadow registers. (“EXX exchanges BC, DE, and HL with shadow registers BC’, DE’, and HL’.”) Wow, the Z80 has so many registers. All we do here is add 1 to hl’. In a special case, we pop AF from the stack, but we have no choice but to ignore that for now. And basically, Player_Wait2Samples, Player_Wait3Samples, …, Player_Wait16Samples, Player_Wait735Samples, Player_Wait882Samples, all work the same. Player_WaitNSamples grabs its argument, and apart from that also works the same as the others. Here’s a screenshot of the stack, and it’s always the same for all Player_Wait* sections:

That is, we are going to jump back to 4279, and we have already seen the code at 4279. Scroll up to see it again. It’s our main loop body, where we grab a command and use the jump table to jump somewhere. (What I hadn’t noticed or mentioned above was that it begins with a “push ix” command, which seemingly puts 4279 back on the top of the stack each time.)

Well, this is a good time to think about that “ret nc pop af ret” bit again, right? If the carry flag is set, we do not return. Instead, we grab 4279 off the stack and shove it into the AF register. Then we return, and this time we should return to 4313, according to the above stack screenshot. The carry flag is set if shadow HL overflows. Currently, it’s FF71. Hmm, just a few F9 presses maybe.

Intermission, sort of

But let’s take a step back and think about what we have seen so far. Perhaps the MSX is just way too slow to play VGMs in real time with perfect timing, and it just makes sense to skip all wait commands and just sync whenever the carry flag is set?

There’s a lot of timing code, and it’s all a bit complicated because the code seems very un-assembly-like. (But as this piece of software supports many different configurations, the somewhat object-oriented patterns may maybe make sense.) Looking back at the projects homepage, it appears that on the MSX2, the timing is 300 Hz, so perhaps that means the waits are ignored as they are encountered, but everything is put in sync (up to?) 300 times a second. It looks like on the MSX1 the timing is either 50 or 60 Hz.

The timing resolution is 50 or 60 Hz on MSX1 machines with a TMS9918 VDP, 300 Hz on machines with a V9938 or V9958 VDP, 1130 Hz if a MoonSound or OPL3 is present, and 4000 Hz on MSX turboR.

https://www.grauw.nl/projects/vgmplay-msx/

While the site says that vgmplay-msx works on MSX1 machines, I’m not entirely sure what kind of hardware configuration in e.g. openMSX would allow us to do that, because vgmplay-msx needs 128 KB of RAM, and MSX-DOS2. As far as I know you also can’t give an existing MSX2 machine an MSX1-class TMS9928A VDP, because the MSX2 logo requires the V9938. (Maybe you could try to give it an MSX1 BIOS too, but I think I’m outta here.)

(End of intermission)

So what we’re going to do is: recompile without LineTimer support by commenting out in src/timers/TimerFactory.asm:

TimerFactory_Create:
; 	call TimerFactory_CreateTurboRTimer ; this line and
; 	call nc,TimerFactory_CreateOPLTimer ; this line and
; 	call nc,TimerFactory_CreateLineTimer ; this line.
	call nc,TimerFactory_CreateVBlankTimer ; this line is left as-is
	ret

So anyway, we’re now running using the VBlankTimer and added breaks like this:

And after the ret we end up here:

Note that Application_instance.player.time is a variable, not something disassemblablablable.

So what we do here: we save our shadow HL (which has gone past FFFF; it’s currently 0AF9) to a variable called Application_instance.player.time. And after the next ret we’re here:

At some point, we get to “Application_instance.player.timerFactory.vBlankTimer.Update”. Wait, what language is this again?

Over here we loop between 42A1 and 42A5 until the JIFFY value contains something different.

Here’s the code with more symbols:

Update: PROC
	ld b,(ix + VBlankTimer.lastJiffy)
Wait:
	ld a,(JIFFY)
	cp b
	jr z,Wait

Wait, what’s JIFFY? It’s actually a system variable:

Address: FC9Eh
Name: JIFFY
Length: 2

Contains value of the software clock, each interrupt of the VDP it is increased by 1. The contents can be read or changed by the function ‘TIME’ or instruction ‘TIME’

https://www.msx.org/wiki/System_variables_and_work_area

After the tight loop is finished, we update lastJiffy with the new JIFFY. Then we set a value for our shadow HL. We either initialize DE with 0x2DF or 0x372 depending on whether we’re on 60 Hz or 50 Hz. (Update: I don’t think this routine works on the MSX1!) Then, we jump to a callback that was set way back when our Timer was first created (i.e., during program initialization), in src/Player.asm:

Player_Construct:
	ld (ix + Player.vgm),e
	ld (ix + Player.vgm + 1),d
	call Player_SetLoops
	ld hl,Player_commandsJumpTable
	call Scanner_Construct
	push ix
	ld e,ixl
	ld d,ixh
	ld hl,DEBUG ? Player.UpdateDebug : Player.Update
	add hl,de
	call Player_GetTimerFactory
	call TimerFactory_Create
	call nc,System_ThrowException
	ld e,ixl
	ld d,ixh
	pop ix
	ld (ix + Player.timer),e
	ld (ix + Player.timer + 1),d
	jr Player_ConnectPCMWrite

Actually executing the code we see that we end up in Application_instance.player.Update:

Application_instance.player.time causes the disassembly to look wonky. 4305-4307 loads Application_instance.player.time, currently set to 0292, into the HL register. The next valid instruction is 4308. Also, Application_instance.vgmFactory._size is an alias of Scanner.Process (which is the code we have seen a number of times, where we grab a byte and use the jump table)

We are about to do “sbc hl,de”. The HL register has 0292, DE contains 02DF or 0372, depending on whether the system is 60 Hz or 50 Hz. Note: I don’t think the routine to figure out whether the system is 50 or 60 Hz works on MSX1s.

Address: FFE8h
Name: RG09SAV
Length: 1

System saves the byte written to the register R#09 here, Used by VDP(10). (MSX2~)

https://www.msx.org/wiki/System_variables_and_work_area#VDP_Registers

Anyway, I don’t really know where the carry flag that SBC is supposed to take into account comes into play (it’s not set in the code visible in the above screenshot). But anyway, 0292 – 02DF = FFB3. And this is the value we will add numbers to again in the Wait* procedures. Or let’s use our brains just one more time:

  • There are 0x02DF samples per 60 Hz VSYNC (or 0x0372 samples per 50 Hz VSYNC)
  • We already “overshot” our previous target by 0x0292 samples in the previous run
  • We have 0x02DF-0x0292=0x4D samples left until we should wait for VSYNC again
    • Note: 0x10000-0xFFB3=0x4D

We have now seen enough to take just the parts we need.

What parts do we need?

  • Jump table
    • We’ll edit it to remove support for anything but the YM2151 though
    • We’ll also need the remaining jump locations of course (loop, end of song, wait, etc.)
    • (Also code to make jump table more efficient)
  • SFG register writing code
    • (If possible, also init code to figure out correct slot selection register values.)
  • Timing code
    • VSYNC-based timer only
  • The song data will be directly on the cartridge, so

Let’s do it

For convenience/compatibility with the existing code we will be using the same assembler, Glass, though I don’t think we’ll be using any of its unique features. We won’t be using constructors or a heap, even, but we will use the stack in the same way the existing code is using it.

Cartridge contents are mapped to 4000-7FFF, or if no cartridge was detected at 4000-, then 8000-BFFF. (The MSX BIOS maps in the candidate addresses (starting with 4000-7FFF) and checks for the presence of a header at the beginning of this address space to see check if a cartridge is inserted.) Thus, programs must start like this (I added some useless code in the entry_point that makes it easier to test that this thing is working):

org 4000h ; hex number syntax may differ from assembler to assembler
db "AB" ; all cartridges have this
dw entry_point ; 16-bit absolute pointer
db 00,00,00,00,00,00 ; can be anything probably

entry_point:
    nop
    jp entry_point

Compilation example if saved as foo.asm:

$ z80asm -o foo.rom foo.asm
$ dd if=/dev/zero of=foo.rom bs=16384 seek=1 count=0 # actually creates a sparse file but that's fine for all intents and purposes
$ hexdump -C foo.rom 
00000000  41 42 aa 0f 00 00 00 00  00 00 00 18 fd 00 00 00  |AB..............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000
The disassembler is (rightly) confused again at 3FFF, but we can see our code and we are indeed executing a NOP and then jumping right back to that NOP, ad infinitum.

However, we only got our cartridge mapped up to 7FFF, but if we want its whole 48 KB mapped, we need to set port #A8. #A8 of course holds an 8-bit number, but you should interpret it as four 2-bit numbers. “00 00 00 00” (0x00) would mean that everything is on slot 0. “01 01 01 01” (0x55) would mean that everything is on slot 1. You can choose any combination your hardware likes.

All MSXs have Main-ROM in primary slot 0 or in secondary slot 0-0 (see variable EXPTBL below for more details). Cartridge slot that is on top of computer is typically slot 1. If there is another expansion port then this is often slot 2. Although the internal RAM should preferably be in slot 3, this is often not the case for MSX1s.

https://www.msx.org/wiki/Slots

The cartridge slot is “typically slot 1”. I don’t know if there are any computers that have a different number, but it’s easy to determine the number in software: we’re running off 4000-7FFF, so the slot is already set correctly here. We just need to set the page we want to the same number.

Now, if we want to set 4000-FFFF to the cartridge, we won’t have any RAM. And therefore, no stack. vgmplay uses the stack (as seen earlier). vgmplay also uses the BIOS (mapped into 0000-3FFF) because we need the VSYNC interrupt handler, and this interrupt handler writes to a system variable called JIFFY, which is located at FC9E, as mentioned above. We could decide to leave the last slot for RAM, limiting the amount of song data we can play to less than 32 KB. But the amount of RAM we need isn’t exactly very much. In addition, as we have seen, some parts of the vgmplay code are self-modifying.

So what we’ll do instead is: copy the entire cartridge to RAM. Then we’ll be able to use self-modifying code, and we’ll be able to have a stack too. Sounds easy? Well, let’s say we have our copying routine at ROM address 4100. What happens if we try to copy the ROM at 4000-7FFF to RAM? We’ll execute instructions at 4100, and these instructions say to switch 4000-7FFF to RAM, which we do, and then what? We’ll have pulled the carpet from under our feet! One way to avoid this problem is by having two (or more) copies of the copying code.

So first we could copy 8000-BFFF to RAM, which doesn’t require any precautions. For the page starting at C000, we probably shouldn’t copy past F000, which is where the stack and some system variables appear to be located. (Not that the stack holds anything.) But otherwise no precautions are required. Then, we jump to (e.g.) 8100, where we have another copy of our copy routine, and copy 4000-7FFF. (We’ll choose a different location for the second copy because we’d expect the song data to start before 8100. Perhaps F200 or so.) If we copy byte-by-byte (or word-by-word), switching between ROM and RAM every time, we can use a single register to hold the read value and won’t need any buffer memory (which would complicate things a bit). (It finishes in less than 1 second per 16 KB, so no issues here as this is init code.)

One more annoyance is that since we do not have a stack, it doesn’t make a lot of sense to ‘call’ the copy code. Jumping to the code isn’t exactly fun either because we’d have to know where to jump back, and the register dance becomes a little annoying. So we’ll just copy and paste the code. Here’s the first-draft code to copy 4000-F000 from ROM to RAM. This code assumes that the BIOS ROM is in slot 0, the cartridge ROM in slot 1, all RAM in slot 3, and that there are no subslots. This assumption doesn’t hold on many systems. This code can be compiled using z80asm using the following command line:

z80asm -o foo.rom foo.asm
org 4000h
db "AB"
dw entry_point
db 00,00,00,00,00,00

entry_point:
    ld a,01010100b ; set rom - cart - cart - cart
    out (0a8h),a

    ; copy 8000-bfff
    copy_8000_bfff:
    ld hl,08000h ; start at 8000
copy_8000_bfff_loop:
    ld a,(hl) ; read from ROM address (hl)
    ld d,a
    in a,(0a8h)
    ld b,a ; store original value
    or 000110000b ; set bits 5-4 to 11 (RAM) (actual value depends on machine)
    out (0a8h),a ; set port
    ld (hl),d ; store value read from ROM address (hl) to RAM address (also hl of course)
    ld a,b ; load a with original value
    out (0a8h),a ; set port back
    inc hl
    ld a,h
    cp 0c0h
    jp z,copy_c000_f000 ; done with this copy
    jp copy_8000_bfff_loop
    
copy_c000_f000:
    ld hl,0c000h ; start at c000
copy_c000_f000_loop:
    ld a,(hl) ; read from ROM address (hl)
    ld d,a
    in a,(0a8h)
    ld b,a ; store original value
    or 011000000b ; set bits 5-4 to 11 (RAM) (actual value depends on machine)
    out (0a8h),a ; set port
    ld (hl),d ; store value read from ROM address (hl) to RAM address (also hl of course)
    ld a,b ; load a with original value
    out (0a8h),a ; set port back
    inc hl
    ld a,h
    cp 0f0h
    jp z,copy_4000_7fff ; done with this copy
    jp copy_c000_f000_loop

done_copying:
    ld a,011111100b ; switch to 4000-ffff to RAM
    out (0a8h),a
nop:
    nop
    jr nop ; infinite loop

seek 0b000h ; b000+4000 = f000
org 0f000h

copy_4000_7fff:
    ld hl,04000h ; start at c000
copy_4000_7fff_loop:
    ld a,(hl) ; read from ROM address (hl)
    ld d,a
    in a,(0a8h)
    ld b,a ; store original value
    or 000001100b ; set bits 5-4 to 11 (RAM) (actual value depends on machine)
    out (0a8h),a ; set port
    ld (hl),d ; store value read from ROM address (hl) to RAM address (also hl of course)
    ld a,b ; load a with original value
    out (0a8h),a ; set port back
    inc hl
    ld a,h
    cp 080h
    jp z,done_copying ; done with this copy
    jp copy_4000_7fff_loop

This is pretty almost all that we need to code ourselves, everything else will be copy and pasted from vgmplay-msx! Note: the above doesn’t actually assemble in Glass because Glass requires that reserved words like “org” are indented, and “seek” isn’t supported. The final code will therefore look a bit different.

Putting all the necessary bits together (and throwing out everything else)

I got it to work on an emulated version of my Hitachi H2. (My apologies to the original author of vgmplay. I completely butchered their code.) And right when I start looking at the slot map of the computer I actually intended to run this one (Yamaha YIS-503), I noticed that the silly machine only has 32 KB of RAM, har har har. And (this was expected though not quite to this extent) the slot map is different, so the #a8 port settings will have to adjusted. With 64 KB we could load the whole 48 KB ROM into RAM, but with 32 KB, arranged the way it is, we’ll boot from 8000 and ignore 4000-7FFF. (We could rewrite the self-modifying code to refer to variables in RAM space instead, but unfortunately I’m running out of steam on this project. I’d planned two days and it’s about three days already! :p)

It ended up working on this machine too, of course. Except, I think that openMSX might be putting the SFG-01 into a different slot (2), rather than slot 3 as sort of indicated in the above screenshot. Due to the limited amount of RAM, our music gets truncated even earlier than before. Feel free @ anyone wanting to fix this. TBH, I just want to be able to hear if the music sounds okay or not. (Edit 2023/04/08: I checked on a real YIS-503 and the SFG-01 was indeed in slot 3, so the above screenshot is correct.)

Code is at https://github.com/qiqitori/vgmplay-msx-cartridge.