Development – Page 3 – The Qiqitori Blogs

MSX / MSX2 bank switching, and short simple RAM test in BASIC

In this article, we’ll create a short memory test for use on MSX/MSX2 machines to check the lower 32 KB of RAM. Why only the lower 32 KB of RAM? Because you can check the higher 32 KB using pure BASIC PEEKs and POKEs, and generally software won’t run if the higher 32 KB has defects. (Many games may still run even with defects in the lower 32 KB.)

It’s useful to have a test that can be run from BASIC and that can be typed into the machine in a couple minutes.

MSX bank switching

The MSX has a CPU that can only address 65536 addresses, but can have 64 KB of RAM and at least 16 KB of ROM. In a previous article, I mentioned that that is probably handled by copying ROM into RAM, but that is wrong. Instead, there is a chip that has a couple registers and enables/disables ROM/RAM chips based on the value of one of those registers.

On some MSX machines (but not the ones I tried) you may be able to read out that register from BASIC:

print inb(&ha8)

Summary: ROM/RAM can be switched in/out in 16 KB chunks, 0x0000-0x3fff, 0x4000-0x7fff, 0x8000-0xbfff, 0xc000-0xffff. There are four choices possible for each chunk. The register is 8 bits: 2 bits for the first chunk, 2 bits for the second chunk, etc.

There is another register (or rather another register for each slot) though, and it’s often used on MSX2 machines. This register is accessed in memory space, not I/O space. (Memory address 65535, requires that the correct slot be selected in the 0xc000-0xffff chunk.) This register gives us another four choices to select a different ROM/RAM for each choice already made. In other words, we have 4 pages (chunks), 4 slots (ROM/RAM choices), and for each slot, another 4 “subslots” (ROM/RAM choices). If you didn’t 100% understand that, don’t worry, I don’t actually think it’s comprehensible the way I wrote it. But you may still be able to follow the discussion below.

When we start BASIC on a 64 KB machine, we’ll probably have the lower 32 KB mapped to some kind of ROM, and the upper 32 KB mapped to RAM. BASIC then lets us use about 28 KB of that RAM and reserves about 4 KB for its own purposes, or so I assume. (The firmware selects the right slots and subslots to make this work for the machine in question, and the required numbers are different depending on the machine’s configuration. Also remember that there are RAM extension cartridges. During the boot phase, the firmware actively probes the 16 KB chunks to see if something is RAM or not. BTW, if the RAM is sufficiently bad, it won’t detect it as RAM at all and you’ll never see the boot screen.)

BASIC runs from ROM. If we disable that ROM (the “slot” containing the ROM) and instead enable RAM (the “slot” containing the RAM) on that “page” (chunk), we’ll be pulling the carpet from under BASIC’s feet. So if we run a command like this in BASIC:

out &ha8,0

The system will freeze immediately. So instead, we’ll be writing our memory test in assembler and poke it into memory, and then execute it.

We’ll be using a total of 9 instructions in our memory test program. Even if you have never seen Z80 assembly code, the following is almost all you need to know: “di” disables interrupts, just in case. “ei” re-enables interrupts. “ret” returns from our code, i.e., we’ll go back to BASIC. “ld” loads some 8-bit value to destination, from source. Numbers in (parentheses) are like dereference in C. (Except for in/out, you always uses parentheses there.) “cp” is compare argument with register “a”. (Some instructions require the use of register “a”.) “jp” is jump. “jp z,…” is “jump if equal”. “z” meaning, “if zero flag is set”.

Let’s first take a look at a short program that switches slots, undos the switch, and returns. It looks like this:

org 0c000h

di ; disable interrupts
ld a,0ffh ; put "255" in register "a" (the correct value depends on your machine!)
out (0a8h),a ; enable IO write, and put "0xa8" on the address bus and contents of register "a" on the data bus
;subslot
ld hl,0ffffh ; put "65535" in register "hl"
ld (hl),0ffh ; write "255" into *hl, i.e. into address 65535 (the correct value to write depends on your machine!)
;/subslot


; now undo everything:

ld a,0f0h ; put "240" in register "a" (the correct value depends on your machine!)
out (0a8h),a ; enable IO write, and put "0xa8" on the address bus and contents of register "a" (i.e., 240) on the data bus
;subslot
ld hl,0ffffh ; put "65535" in register "hl"
ld (hl),0f0h ; write "0" into *hl, i.e. into address 65535 (the correct value to write depends on your machine!)
;/subslot
ei ; enable interrupts
ret ; return to caller (BASIC)

This code can be assembled using “z80asm”, which is available in Debian’s repositories at least. z80asm outputs a file called “a.bin”. We can convert that file to unsigned 8-bit integers for use on BASIC data lines using od -t u1 a.bin.

$ z80asm all_ram_and_back.asm 
$ od -t u1 a.bin 
0000000 243  62 255 211 168  33 255 255  54 255  62 240 211 168  33 255
0000020 255  54  15 251 201
0000025

Anyway, now we just need to add some code between the two snippets above. We want code that writes to memory addresses and then compares what was written. Here’s the annotated assembly code to do that:

ld hl,00h ; put 0 in register "hl"
write: ; this is a label that we can use in "jp" (jump) instructions
ld (hl),0ffh ; put 255 in *hl, i.e. address 0
inc hl ; increment hl register
ld a,h ; put high byte of "hl" register into "a" register
cp 080h ; check if the "a" register contains 0x80
jp z,done_writing ; if yes, that means we have incremented the hl register a bunch of times and it's time to go check if what we wrote earlier is still there (i.e. we've written 255 into 0x0000 to 0x7fff)
jp write ; if we reach this instruction, that means that the previous instruction didn't perform its conditional jump. this instruction jumps back to the instruction at the "write" label (ld (hl),0ffh). i.e., we haven't reached 0x8000 yet.

done_writing: ; this is a label
ld hl,00h ; put 0 in register "hl"
compare: ; this is a label
ld a,0ffh ; put 255 in register "a"
cp (hl) ; compare contents of register "a" with contents of *hl, i.e. memory address 0
jp nz,bad ; we put 255 in there earlier but this address contains something else now, that means we have bad memory
inc hl ; if we reach this instruction, that means that the previous instruction didn't perform its conditional jump. increment hl register
ld a,h ; put high byte of "hl" register into "a" register
cp 0c0h ; check if the "a" register contains 0xc0
jp z,done ; if yes, that means we have incremented the hl register a bunch of times and it's time to go home
jp compare ; if we reach this instruction, that means that the previous instruction didn't perform its conditional jump. this instruction jumps back to the instruction at the "compare" label (ld a,0ffh)

bad:
...

done:
...

Finding the correct values for I/O port A8 and memory port 65535

To make the assembled code work on your MSX, you will most likely have to change the values to be written into the A8 I/O register and the values to be written into address 65535 to select the correct sub-slot.

Let’s take a look at the “Slot Map” section on msx.org’s wiki page for the Sony HB-T7, which is the machine I’m trying to debug.

We want to access RAM, and it’s at slot 3, subslot 3. In BASIC, I get the following output:

?peek(65535)
15

This output is inverted. 15 is 0b00001111 in binary, but we should read this as 0b11110000. The lowest two bits specify the subslot for the 0x0000-0x3fff block. The subslot is set to 0b00, i.e. 0 here. Same for 0x4000-0x7fff. On page 0x8000-0xbfff, we see that subslot 0b11, i.e., 3 is selected. Same for 0xc000-0xffff. This is as expected — the above table from the MSX wiki page specifies that all RAM is at subslot 3 (of slot 3).

If we want the lower pages to point to RAM, we not only have to set the A8 register to 3 (because all the RAM is at slot 3), we also have to select subslot 3.

I.e., to set the subslots for all four 16 KB chunks (“pages”) to use RAM, which is at slot 3 subslot 3, we have to write 0b11111111. To revert this, we have to write 0b11110000 (our peek showed 15 (0b00001111) at memory address 65535, but this is inverted, hence 0b11110000).

Customizing the memory test

In the above code, we write 0xff to all addresses, and then later check if 0xff is still there. However, one very common type of memory fault is “stuck bits”, where memory bits are always stuck at 1, even if we’d written 0. In order to test that, I recommend that you change “ld (hl),0ffh” and “ld a,0ffh” under the “write” and “compare” labels to different values.

We also have to think about what to do if we have encountered bad memory. One easy thing we can do is generate an audible click. (May be somewhat faint.) Here’s the code to do that:

bad:
ld a,15
out (0abh),a
ld a,14
out (0abh),a
ld a,15

Writing 15 and then 14 to 0xAB produces a click. You can do this in BASIC too:

out &hab,15:out &hab,14

Alternatively, we could write the memory address that contained something unexpected into some memory location, for example like this:

ld de,0c100h ; memory address to write to
ld a,h
ld (de),a
inc de
ld a,l
ld (de),a

This would write the significant byte of the failed address to 0xc100 and the insignificant byte to 0xc101.

Putting it all together

Here’s the assembly code for the whole test:

org 0c000h

di
ld a,0ffh
out (0a8h),a
;subslot
ld hl,0ffffh
ld (hl),0ffh
;/subslot
ld hl,00h
write:
ld (hl),0
; ld (hl),l ; alternative code, fills RAM with 0x00-0xff, 0x00-0xff, ...
; nop
inc hl
ld a,h
cp 080h
jp z,done_writing
jp write

done_writing:
ld hl,00h
compare:
ld a,0
; ld a,l ; alternative code, see above
; nop
cp (hl)
jp nz,bad
inc hl
ld a,h
cp 080h
jp z,done
jp compare

bad: ; audible click version
ld a,15
out (0abh),a
ld a,14
out (0abh),a
ld a,15

done:
ld a,0f0h
out (0a8h),a
;subslot
ld hl,0ffffh
ld (hl),0f0h
;/subslot
ei
ret

And here’s the short BASIC loader:

10 i=49152
20 read j
30 if j=-1 then goto 80
40 poke i,j
50 i=i+1
60 goto 20
70 data 243,62,255,211,168,33,255,255,54,255,33,0,0,54,255,35,124,254,128,202,25,192,195,13,192,33,0,0,62,255,190,194,44,192,35,124,254,128,202,54,192,195,28,192,62,16,211,171,62,14,211,171,62,15,62,240,211,168,33,255,255,54,240,251,201,-1
80 def usr1=49152

run
Ok.
x=usr1(0)
Ok.

If you get “Ok.” after “x=usr1(0)”, the machine hasn’t crashed (which means your slot and subslot selections were correct). If you heard a click, your memory is probably defective.

The click may be hard to hear, so maybe try running “x=usr1(0)” in a loop:

for i=0 to 100:x=usr1(0):next

Poor man’s unit tests for C (and maybe C++)

Developing software is mostly about tradeoffs.

Make the software easy to use by getting rid of advanced features? Make the software featureful but harder to use?

Make the software comfortable to develop for but invest a lot of time setting up frameworks and maintaining them? Or make the software a bit less comfortable to work on, but avoid spending a bunch of learning/maintaining the frameworks?

Well, it’s all up to you, and I have a strong belief that people shouldn’t have strong beliefs about this! Er, okay.

Let’s say you want to add a couple tests (b.c) for individual static functions hidden in a .c file somewhere (a.c). Well, you can’t access those static functions from other .c files. Put the tests in the .c file? Some people would call it ugly. You could also write a script that concatenates the source file and test file and compiles that instead! Bit messy. You can’t have multiple main() functions, etc. But have you ever considered making the “static” keyword disappear using the C preprocessor? It works! And it can be messy too because it’ll make your static variable non-static. Great. But there are cases where that doesn’t matter, especially when we’re just trying to run some unit tests. Here’s a minimum example:

a.c

#include <stdio.h>

static void a()
{
    printf("Hi :,\n");
}

b.c

void a(); // prototype

int main(int argc, char **argv)
{
    a();
}

$ cc -o foo a.c b.c
/usr/bin/ld: /tmp/ccpspoxC.o: in function `main':
b.c:(.text+0x15): undefined reference to `a'
collect2: error: ld returned 1 exit status
$

It doesn’t work, duh. Because a() is static.

And now we’re going to make static disappear and it’ll work, so you can put all your tests in what we called b.c:

$ cc -Dstatic= -o foo a.c b.c
$ # no errors

If you don’t have control over the code base you are working on but still want some quick tests, this hack may be useful.

Raspberry Pi Pico 15.6 KHz analog RGB to VGA upscaler (part 1? POC? WIP?)

My Hitachi MB-H2 MSX machine has an analog RGB port that produces a 15.6 KHz CSYNC (combined horizontal and vertical) signal and analog voltages indicating how red, green or yellow things are.

I recently unearthed my old LCD from 2006 or so and decided to see if I could get it to sync if I just massaged the CSYNC signal a bit to bring it to TTL levels and connected a VGA cable.

(Technical details: when you connect a VGA cable to a monitor that is powered on, you will often first of all see a message like “Cable not connected”. To get past that problem, you first have to ground a certain pin on the VGA connector. I found that female Dupont connectors fit reasonably well on male VGA connectors so I just used a cable with female Dupont connectors on both ends to connect the two relevant pins. I’m not sure if it’s the same pin on all monitors. You can find the pin by looking for a pin that should be GND according to the VGA pinout but actually has some voltage on it. Don’t blame me if you break your Dupont connectors by following this advice.)

Unfortunately, that didn’t work. I got “Input not supported”, and I am reasonably sure that is because my monitor doesn’t support 15 KHz signals. Aw, why’d I even bother taking it out of storage?

So what do we do… Well there is this library (PicoVGA) that produces VGA signals using the Raspberry Pi Pico’s PIOs. Raspberry Pi Picos are extremely cheap, just about 600 yen per piece where I am.

Damn, I’ve seen this in videos, but seeing this in real life, a tiny, puny microcontroller generating fricking VGA signals! Amazing. Just last year I was playing around with monochrome composite output on an Arduino Nano, and even that was super impressive to me! (Cue people reading this 20 years in the future and laughing at the silly dude with the retro microcontroller from year-of-the-pandemic 2020. I’m sure microcontrollers in the 2040s will have 32 cores and dozens of pins with built-in 1 GHz DACs and ADCs and mains voltage tolerance, and will be able to generate a couple streams of 4K video ;D)

Some boring technical notes I took before embarking on the project, feel free to skip this section

Is the Pico’s VGA library magic? Yes, definitely. Can we add our own magic to simultaneously capture video and output it via the VGA library?
It sure looks like it! Why?

The Pico has two CPU cores, and the VGA library uses just one of them, the second core
- Dual-core microcontroller, that’s craziness
We may be able to use the second core a little bit anyway (“If the second core is not very busy (e.g. when displaying 8-bit graphics that are simply transferred using DMA transfer), it can also be used for the main program work.”)
- We will indeed be working with 8-bit graphics simply transferred using DMA
The Pico has two PIO controllers, and the VGA library uses just one (“The display of the image by the PicoVGA library is performed by the PIO processor controller. PIO0 is used. The other controller, PIO1, is unused and can be used for other purposes.”)

However:

We possibly won’t be able to use DMA all that much (“Care must also be taken when using DMA transfer. DMA is used to transfer data to the PIO. Although the transfer uses a FIFO cache, using a different DMA channel may cause the render DMA channel to be delayed and thus cause the video to drop out. A DMA overload can occur, for example, when a large block of data in RAM is transferred quickly. However, the biggest load is the DMA transfer of data from flash memory. In this case, the DMA channel waits for data to be read from flash via QSPI and thus blocks the DMA render channel.”)
- If we use PIO and DMA for capturing video-in, we might run into trouble there
- However, using DMA to capture and another DMA transfer to transfer the data to VGA out sounds somewhat inefficient; maybe it’s possible to directly transfer from capture PIO to VGA PIO? Would require modifications to the VGA library, which doesn’t sound so great right now (we didn’t do this)

That said, it’s likely that capturing without the use of PIO would be fast enough, generally speaking.
The “pixel clock” for a 320×200 @ 60 Hz signal is between 4.944 and 6 MHz according to https://tomverbeure.github.io/video_timings_calculator (select 320×200 / 60 in the drop-down menu), depending on some kind of mode that I don’t know anything about.
According to our oscilloscope capture of a single pixel on one of the color channels (DS1Z_QuickPrint22.png), we get about 5.102 MHz. Let’s take that value. We’ll hopefully be able to calculate the exact value at some point. (Yeah, the TMS59918A/TMS59928A/TMS59929A datasheet actually (almost) mentions the exact value! “The VDP is designed to operate with a 10.738635 (± 0.005) MHz crystal”, “This master clock is divided by two to generate the pixel clock (5.3 MHz)”. So it’s 5.3693175 MHz, thank you very much.)

This means that we have to be able to capture at exactly that frequency. From our previous experimental logic analyzer (which doesn’t use PIO) we were more than capable of capturing everything going on with our Z80 CPU — we had multiple samples of every single state the CPU happened to be in, and the CPU ran at 3.58 MHz. (However, if the VGA library chooses to set the CPU to use a lower clock frequency, we may run into problems. It’s possible to prevent the library from adjusting the clock frequency, but maybe that will impact image quality.) The main part of the code looked like this:

for (i = 0; i < LOGIC_BUFFER_LEN; i++) {
    logic_buffer[i] = gpio_get_all() & ALL_REGULAR_GPIO_PINS;
}

To capture video, we’d like to post-process our capture just a little bit, to convert it to 3-3-2 RGB. Or we could post-process our capture during VSYNC, but that would be a rather tight fit, with only 1.2 ms to work with. (Actually, our signal’s VSYNC pulse is even shorter than that, but there’s nothing on the RGB pins for a while before and after that.)

So our loop might look like this. (Note, the code I ended up writing looks reasonably similar to this, which is why I’m including this here.)

for (x = 0; x < 320; x++) {
    pixel = gpio_get_all();
    red = msb_table_inverted[((pixel & R_MASK) >> R_SHIFT) << R_SHIFT];
    green = msb_table_inverted[((pixel & G_MASK) >> G_SHIFT) << G_SHIFT];
    blue = msb_table_inverted[((pixel & B_MASK) >> B_SHIFT) << B_SHIFT];
    capture[y][x] = red | (green << 3) | (blue << 6);
}

Where msb_table_inverted is a lookup table to convert our raw GPIO input to the proper R/G/B values. This depends on how we do the analog to digital conversion, so the loop might look slightly different in the end.

Well, how likely is it that this will produce a perfectly synced capture? About 0% in my opinion. If we’re too fast, we’ll get a horizontally compressed image. If we’re too slow, the image will be wider than it should be, and more importantly, cut off on the right side.
In the first case, we may be able to improve the situation by adding the right amount of NOPs.
In the second case, we could reduce the amount of on-the-fly post-processing, and do stuff during HBLANK or VBLANK instead.
In addition, we might miss a few pixels on the left side if we can’t begin capturing immediately when we get our HSYNC interrupt. How likely is this to succeed? It might work, I think.

The PIOs can also be used without DMA. (Instead of using DMA, we’d use functions like pio_sm_get_blocking().) With PIO, we can get perfect timing, which would be really great to have. We can’t off-load any arithmetic or bit twiddling operations, the PIOs don’t have that. So let’s dig in and run some experiments.

Implementation

The pico_examples repository has a couple of PIO examples. The PicoVGA library has a hello world example. I thought the logic_analyser example in pico_examples looked like a good start. It’s really quite amazing.

You can specify the number of samples you’d like to read (const uint CAPTURE_N_SAMPLES = 96)
You can specify the number of pins you’d like to sample from (const uint CAPTURE_PIN_COUNT = 2)
You can specify the frequency you’d like to read at (logic_analyser_init(pio, sm, CAPTURE_PIN_BASE, CAPTURE_PIN_COUNT, 1.f), where “1.f” is a divider of the system clock. I.e., this will capture at system clock speed. We can specify a float number here.)
The PIO input is (mostly?) independent from what else you have going on on that pin, so the code of course proceeds to configure a PWM signal on a pin, and to capture from that same pin. Bonkers!

Well, let’s cut to the chase, shall we? I took parts of the logic_analyser code to capture the input from RGB, then wrote some code to massage the captured data a little bit, and then output everything using PicoVGA at a higher resolution. After some troubleshooting, I got a readable signal!

However, my capture has wobbly scanlines. Which is why there might be a part 2. And since it’s wobbly, I spent even less effort on the analog to digital conversion than I’d originally planned, which was already rather “poor man” (more on that later, because the code assumes that circuit exists).

I’m triggering the capture by looking for a positive to negative transition. (That’s already two out of the three instructions my PIO program consists of, one to wait for positive, one to wait for negative.) I currently don’t really know why my scanlines are wobbly. I had a few looks with the oscilloscope to see if there’s anything wrong in my circuit that converts CSYNC to TTL levels — for example, slow response from the transistor. But I didn’t find anything so far. :3 It’s of course entirely possible that the source signal is wonky. I’ve never had a chance to connect my MSX to a monitor that supports 15 KHz signals. (Now that’s a major TODO right there.) Of course there are other ways to check if the signal is okay.

We could also (hopefully) get rid of the wobbling by only paying attention to the VSYNC and timing scanlines ourselves, for example by generating them using the Pico’s PWM. As seen in the original logic_analyser.c code! But that’s something for part 2 I guess.

BTW, it’s unlikely that the wobbliness is being caused by a problem with the code or resource contention. I tested this by switching the capture to an off-screen buffer after a few seconds. The screen displayed the last frame captured into the real framebuffer, and was entirely static. I.e., I added code like this into the main loop (which you will see below):

+        if (j > 600) {
+            rgb_buf = fake_rgb_buf;
+            gpio_put(PICO_DEFAULT_LED_PIN, true);
+        } else {
+            j++;
+        }

Poor-man’s ADC

What I actually planned to do: the program I wrote expects four different levels of red, green, and blue. There are three pins per color, and if all pins of a color are 0, that means that color is 0, if only one is 1, that’s still quite dark, if two are 1, that’s somewhat bright, and if all three are 1, then that’s bright. The program then converts that into two bits (0, 1, 2, 3); PicoVGA works with 8-bit colors, 3 bits for red, 3 bits for green, 2 bits for blue. That means that we can capture all the blue we need, and for red and green we could scale the numbers a bit. However, I shelved that plan for now, because I don’t even have enough potentiometers at the moment, and if the signal is as wobbly as it is, that’s just putting lipstick on a pig. Instead, I just took a single color (blue, just because that was less likely to short my MacGyver wiring), and feed that into all colors’ “bright” pin.

As my MSX’s RGB signal voltages are a bit funky (-0.7 to 0.1 IIRC), I converted that to something the Pico can understand using a simple class A-kinda amplifier. The signal gets inverted by this circuit, but that’s fine for a POC. Completely blue will be black, and vice versa.

So here’s the code:

#include "include.h"

#include <stdio.h>
#include <stdlib.h>

#include "pico/stdlib.h"
#include "hardware/pio.h"
#include "hardware/dma.h"
#include "hardware/structs/bus_ctrl.h"

// Some logic to analyse:
#include "hardware/structs/pwm.h"

const uint CAPTURE_PIN_BASE = 9;
const uint CAPTURE_PIN_COUNT = 10; // CSYNC, 3*R, 3*G, 3*B

const float PIXEL_CLOCK = 5369.3175f; // datasheet (TMS9918A_TMS9928A_TMS9929A_Video_Display_Processors_Data_Manual_Nov82.pdf) page 3-8 / section 3.6.1 says 5.3693175 MHz (10.73865/2)
// from same page on datasheet
// HORIZONTAL                   PATTERN OR MULTICOLOR   TEXT
// HORIZONTAL ACTIVE DISPLAY    256                     240
// RIGHT BORDER                 15                      25
// RIGHT BLANKING               8                       8
// HORIZONTAL SYNC              26                      26
// LEFT BLANKING                2                       2
// COLOR BURST                  14                      14
// LEFT BLANKING                8                       8
// LEFT BORDER                  13                      19
// TOTAL                        342                     342

const uint INPUT_VIDEO_WIDTH = 308; // left blanking + color burst + left blanking + left border + active + right border

// VERTICAL                     LINE
// VERTICAL ACTIVE DISPLAY      192
// BOTTOM BORDER                24
// BOTTOM BLANKING              3
// VERTICAL SYNC                3
// TOP BLANKING                 13
// TOP BORDER                   27
// TOTAL                        262

const uint INPUT_VIDEO_HEIGHT = 240; // top blanking + top border + active + 1/3 of bottom border
const uint INPUT_VIDEO_HEIGHT_OFFSET_Y = 40; // ignore top 40 (top blanking + top border) scanlines
// we're capturing everything there is to see on the horizontal axis, but throwing out most of the border on the vertical axis
// NOTE: other machines probably have different blanking/border periods

const uint CAPTURE_N_SAMPLES = INPUT_VIDEO_WIDTH;

const uint OUTPUT_VIDEO_WIDTH = 320;
const uint OUTPUT_VIDEO_HEIGHT = 200;

static_assert(OUTPUT_VIDEO_WIDTH >= INPUT_VIDEO_WIDTH);
static_assert(OUTPUT_VIDEO_HEIGHT >= INPUT_VIDEO_HEIGHT-INPUT_VIDEO_HEIGHT_OFFSET_Y);

uint offset; // Lazy global variable; this holds the offset of our PIO program

// Framebuffer
ALIGNED u8 rgb_buf[OUTPUT_VIDEO_WIDTH*OUTPUT_VIDEO_HEIGHT];

static inline uint bits_packed_per_word(uint pin_count) {
    // If the number of pins to be sampled divides the shift register size, we
    // can use the full SR and FIFO width, and push when the input shift count
    // exactly reaches 32. If not, we have to push earlier, so we use the FIFO
    // a little less efficiently.
    const uint SHIFT_REG_WIDTH = 32;
    return SHIFT_REG_WIDTH - (SHIFT_REG_WIDTH % pin_count);
}

void logic_analyser_init(PIO pio, uint sm, uint pin_base, uint pin_count, float div) {
    // Load a program to capture n pins. This is just a single `in pins, n`
    // instruction with a wrap.
    uint16_t capture_prog_instr[3];
    capture_prog_instr[0] = pio_encode_wait_gpio(false, pin_base);
    capture_prog_instr[1] = pio_encode_wait_gpio(true, pin_base);
    capture_prog_instr[2] = pio_encode_in(pio_pins, pin_count);
    struct pio_program capture_prog = {
            .instructions = capture_prog_instr,
            .length = 3,
            .origin = -1
    };
    offset = pio_add_program(pio, &capture_prog);

    // Configure state machine to loop over this `in` instruction forever,
    // with autopush enabled.
    pio_sm_config c = pio_get_default_sm_config();
    sm_config_set_in_pins(&c, pin_base);
    sm_config_set_wrap(&c, offset+2, offset+2); // do not repeat pio_encode_wait_gpio instructions
    sm_config_set_clkdiv(&c, div);
    // Note that we may push at a < 32 bit threshold if pin_count does not
    // divide 32. We are using shift-to-right, so the sample data ends up
    // left-justified in the FIFO in this case, with some zeroes at the LSBs.
    sm_config_set_in_shift(&c, true, true, bits_packed_per_word(pin_count)); // push when we have reached 32 - (32 % pin_count) bits (27 if pin_count==9, 30 if pin_count==10)
    sm_config_set_fifo_join(&c, PIO_FIFO_JOIN_RX); // TX not used, so we can use everything for RX
    pio_sm_init(pio, sm, offset, &c);
}

void logic_analyser_arm(PIO pio, uint sm, uint dma_chan, uint32_t *capture_buf, size_t capture_size_words,
                        uint trigger_pin, bool trigger_level) {
    pio_sm_set_enabled(pio, sm, false);
    // Need to clear _input shift counter_, as well as FIFO, because there may be
    // partial ISR contents left over from a previous run. sm_restart does this.
    pio_sm_clear_fifos(pio, sm);
    pio_sm_restart(pio, sm);

    dma_channel_config c = dma_channel_get_default_config(dma_chan);
    channel_config_set_read_increment(&c, false);
    channel_config_set_write_increment(&c, true);
    channel_config_set_dreq(&c, pio_get_dreq(pio, sm, false)); // pio_get_dreq returns something the DMA controller can use to know when to transfer something

    dma_channel_configure(dma_chan, &c,
        capture_buf,        // Destination pointer
        &pio->rxf[sm],      // Source pointer
        capture_size_words, // Number of transfers
        true                // Start immediately
    );

    pio_sm_exec(pio, sm, pio_encode_jmp(offset)); // just restarting doesn't jump back to the initial_pc AFAICT
    pio_sm_set_enabled(pio, sm, true);
}

void blink(uint32_t ms=500)
{
    gpio_put(PICO_DEFAULT_LED_PIN, true);
    sleep_ms(ms);
    gpio_put(PICO_DEFAULT_LED_PIN, false);
    sleep_ms(ms);
}

// uint8_t msb_table_inverted[8] = { 3, 3, 3, 3, 2, 2, 1, 0 };
uint8_t msb_table_inverted[8] = { 0, 1, 2, 2, 3, 3, 3, 3 };

void post_process(uint8_t *rgb_bufy, uint32_t *capture_buf, uint buf_size_words)
{
    uint16_t i, j, k;
    uint32_t temp;
    for (i = 8, j = 0; i < buf_size_words; i++, j += 3) { // start copying at pixel 24 (8*3) (i.e., ignore left blank and color burst, exactly 24 pixels).
        temp = capture_buf[i] >> (2+1); // 2: we're only shifting in 30 bits out of 32, 1: ignore csync
        rgb_bufy[j] = msb_table_inverted[temp & 0b111]; // red
        rgb_bufy[j] |= (msb_table_inverted[(temp & 0b111000) >> 3] << 3); // green
        rgb_bufy[j] |= (msb_table_inverted[(temp & 0b111000000) >> 6] << 6); // blue
        temp >>= 10; // go to next sample, ignoring csync
        rgb_bufy[j+1] = msb_table_inverted[temp & 0b111]; // red
        rgb_bufy[j+1] |= (msb_table_inverted[(temp & 0b111000) >> 3] << 3); // green
        rgb_bufy[j+1] |= (msb_table_inverted[(temp & 0b111000000) >> 6] << 6); // blue
        temp >>= 10; // go to next sample, ignoring csync
        rgb_bufy[j+2] = msb_table_inverted[temp & 0b111]; // red
        rgb_bufy[j+2] |= (msb_table_inverted[(temp & 0b111000) >> 3] << 3); // green
        rgb_bufy[j+2] |= (msb_table_inverted[(temp & 0b111000000) >> 6] << 6); // blue
    }
}

int main()
{
    uint16_t i, y;

    gpio_init(PICO_DEFAULT_LED_PIN);
    gpio_init(CAPTURE_PIN_BASE);
    gpio_set_dir(PICO_DEFAULT_LED_PIN, GPIO_OUT);
    gpio_set_dir(CAPTURE_PIN_BASE, GPIO_IN);

    blink();

    // initialize videomode
    Video(DEV_VGA, RES_CGA, FORM_8BIT, rgb_buf);

    blink();

    // We're going to capture into a u32 buffer, for best DMA efficiency. Need
    // to be careful of rounding in case the number of pins being sampled
    // isn't a power of 2.
    uint total_sample_bits = CAPTURE_N_SAMPLES * CAPTURE_PIN_COUNT;
    total_sample_bits += bits_packed_per_word(CAPTURE_PIN_COUNT) - 1;
    uint buf_size_words = total_sample_bits / bits_packed_per_word(CAPTURE_PIN_COUNT);
    uint32_t *capture_buf0 = (uint32_t*)malloc(buf_size_words * sizeof(uint32_t));
    hard_assert(capture_buf0);
    uint32_t *capture_buf1 = (uint32_t*)malloc(buf_size_words * sizeof(uint32_t));
    hard_assert(capture_buf1);

    blink();

    // Grant high bus priority to the DMA, so it can shove the processors out
    // of the way. This should only be needed if you are pushing things up to
    // >16bits/clk here, i.e. if you need to saturate the bus completely.
    // (Didn't try this)
//     bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_DMA_W_BITS | BUSCTRL_BUS_PRIORITY_DMA_R_BITS;

    PIO pio = pio1;
    uint sm = 0;
    uint dma_chan = 8; // 0-7 may be used by VGA library (depending on resolution)

    logic_analyser_init(pio, sm, CAPTURE_PIN_BASE, CAPTURE_PIN_COUNT, (float)Vmode.freq/PIXEL_CLOCK);

    blink();

    // 1) DMA in 1st scan line, wait for completion
    // 2) DMA in 2nd scan line, post-process previous scan line, wait for completion
    // 3) DMA in 3rd scan line, post-process previous scan line, wait for completion
    // ...
    // n) Post-process last scanline

    // I'm reasonably sure we have enough processing power to post-process scanlines in real time, we should have about 80 us.
    // At 126 MHz each clock cycle is about 8 ns, so we have 10000 instructions to process about 320 bytes, or 31.25 instructions per byte.
    while (true) {
        // "Software-render" vsync detection... I.e., wait for low on csync, usleep for hsync_pulse_time+something, check if we're still low
        // If we are, that's a vsync pulse!
        // This works well enough AFAICT
        while (true) {
            while(gpio_get(CAPTURE_PIN_BASE)); // wait for negative pulse on csync
            sleep_us(10); // hsync negative pulse is about 4.92 us according to oscilloscope, so let's wait a little longer than 4.92 us
            if (!gpio_get(CAPTURE_PIN_BASE)) // we're still low! this must be a vsync pulse
                break;
        }
        for (y = 0; y <= INPUT_VIDEO_HEIGHT_OFFSET_Y; y ++) { // capture and throw away first 40 scanlines, capture without throwing away 41st scanline
            logic_analyser_arm(pio, sm, dma_chan, capture_buf0, buf_size_words, CAPTURE_PIN_BASE, true);
            dma_channel_wait_for_finish_blocking(dma_chan);
        }
        for (y = 1; y < (INPUT_VIDEO_HEIGHT-INPUT_VIDEO_HEIGHT_OFFSET_Y)-1; y += 2) {
            logic_analyser_arm(pio, sm, dma_chan, capture_buf1, buf_size_words, CAPTURE_PIN_BASE, true);
            post_process(rgb_buf + (y-1)*OUTPUT_VIDEO_WIDTH, capture_buf0, buf_size_words);
            dma_channel_wait_for_finish_blocking(dma_chan);

            logic_analyser_arm(pio, sm, dma_chan, capture_buf0, buf_size_words, CAPTURE_PIN_BASE, true);
            post_process(rgb_buf + y*OUTPUT_VIDEO_WIDTH, capture_buf1, buf_size_words);
            dma_channel_wait_for_finish_blocking(dma_chan);
        }
        post_process(rgb_buf + (y-2)*OUTPUT_VIDEO_WIDTH, capture_buf0, buf_size_words);
    }
}

Replace vga_hello/src/main.cpp with the above file and recompile (make program.uf2). Maybe this post will help if you are on something that isn’t Windows and can’t get this to compile.

Explanation

The PIO program is generated in the logic_analyser_init function. Here it is again:

    capture_prog_instr[0] = pio_encode_wait_gpio(false, pin_base);
    capture_prog_instr[1] = pio_encode_wait_gpio(true, pin_base);
    capture_prog_instr[2] = pio_encode_in(pio_pins, pin_count);
    struct pio_program capture_prog = {
            .instructions = capture_prog_instr,
            .length = 3,
            .origin = -1
    };

First we wait for a “false” (low) signal. Then a “true” (high) signal. Then we read. Okay… but that doesn’t make any sense, does it?
No, it doesn’t, but maybe with the following bit of code:

    sm_config_set_wrap(&c, offset+2, offset+2); // do not repeat pio_encode_wait_gpio instructions

sm_config_set_wrap is used to tell the PIOs how to loop the PIO program. And in this case, we loop after we have executed the instruction at offset+2, and we jump to offset+2. The instruction at offset+2 is the “in” instruction. That is, we just keep executing the “in” instruction, except the first time. The first time, we wait for low on CSYNC, then wait for high on CSYNC, and then (as this state means that the CSYNC pulse is over) we keep reading as fast as we can (at the programmed PIO speed).

Results

Let’s take a look at the results. Remember, we’re converting to monochrome, and only looking at the blue channel. Remember that our super lazy “analog frontend” is super lazy, and the potentiometer has to be fine-tuned to get to a sweet spot that allows everything on the screen to be displayed.

The composite signal. Black looking very… let’s call it RGB, is one of the things that motivated me to check if I can get monitor output to work. The other thing is the jailbars. The jailbars are more prominent when showing a dark color.

This is before tuning the capture parameters to ignore HBLANK and VBLANK, so we’re slightly cut off at the bottom and on the right. We’re only feeding into the pin for green here. Everything where blue is at zero intensity is green (top VBLANK and left HBLANK and black characters), and everything where blue is at full intensity, is black. I was running off a slightly wrong pixel clock here. You can see that the boundary between HBLANK green and black is fuzzy. On some scanlines we start a pixel (or fraction of a pixel) early, on others a pixel (or fraction thereof) late. On the next frame, this moves a little. It’s like there’s a somewhat low-frequency wave overlaid over the sync signal. Maybe just our old friend, interference? My CSYNC wire _is_ rather janky. Let’s just say, nothing’s shielded, I’m using a paper clip to get the signal out of the RGB jack, I’m connecting mutiple jumper wires to get to the right length, and the ground wire is crazy long.

And this is what it looks like with the HBLANK and VBLANK front porches ignored, and the pixel clock corrected. (Wait, I still see the horizontal front porch? Must be some qaulity code there.) TBH I have a feeling that the wobbliness increased with the correct pixel clock ;D Um, I’ll get to the bottom of this at some point. (It also looks like we’re ignoring too many scanlines at the top, but that’s okay for now.) Note: the noise you see on the screen isn’t part of the signal, that’s just my camera. This also shows that “m”s don’t look too good. (To my defense, they don’t look too clever on composite either.)
Actually the HBLANK front porches are gone now after I fixed a typo in the code. But it’s still quite wobbly. Maybe not quite as wobbly as in the above video?

Top breadboard converts CSYNC signal to TTL (and there’s some other stuff on there that isn’t used right now). Bottom double breadboard would be large enough for everything, but this sort of grew organically. The “USB POWER” thing is this: https://www.amazon.co.jp/dp/B07XM5FWDW. Super useful tiny power supply that runs off USB! I think I got it cheaper than the current price though. Not shown on this pic, but I run this setup off a small USB power bank, and use the power supply to convert the 5V from USB to 3.3V.
What’s the pen and the eraser doing here? TBH my eyes just tend to filter out junk after a while. So stuff just sort of becomes part of the scenery.

Minor update

Fixing a typo in the code (already fixed above as it made no sense to leave it there) fixed up the signal quite a bit. I also added buttons to fine-tune the pixel clock. This stabilizes the signal significantly. However, hopefully mostly due to the fact that our analog frontend is a bit lame, we get a somewhat fuzzy image, where some pixels change between black and white. I am somewhat tempted to build out the analog frontend properly but before that I think I’ll try my hand at digital RGB, more on that in a later post.

Anyway, here’s the updated code for analog input, with support for two buttons to fine-tune the pixel clock:

#include "include.h"

#include <stdio.h>
#include <stdlib.h>

#include "pico/stdlib.h"
#include "hardware/pio.h"
#include "hardware/dma.h"
#include "hardware/structs/bus_ctrl.h"

const uint CAPTURE_PIN_BASE = 9;
const uint CAPTURE_PIN_COUNT = 10; // CSYNC, 3*R, 3*G, 3*B
const uint INCREASE_BUTTON_PIN = 20;
const uint DECREASE_BUTTON_PIN = 21;

const PIO pio = pio1;
const uint sm = 0;
const uint dma_chan = 8; // 0-7 may be used by VGA library (depending on resolution)

const float PIXEL_CLOCK = 5369.3175f; // datasheet (TMS9918A_TMS9928A_TMS9929A_Video_Display_Processors_Data_Manual_Nov82.pdf) page 3-8 / section 3.6.1 says 5.3693175 MHz (10.73865/2)
// the pixel clock has a tolerance of +-0.005 (i.e. +- 5 KHz), let's add a facility to adjust our hard-coded pixel clock:
const float PIXEL_CLOCK_ADJUSTER = 0.1; // KHz

// from same page on datasheet
// HORIZONTAL                   PATTERN OR MULTICOLOR   TEXT
// HORIZONTAL ACTIVE DISPLAY    256                     240
// RIGHT BORDER                 15                      25
// RIGHT BLANKING               8                       8
// HORIZONTAL SYNC              26                      26
// LEFT BLANKING                2                       2
// COLOR BURST                  14                      14
// LEFT BLANKING                8                       8
// LEFT BORDER                  13                      19
// TOTAL                        342                     342

const uint INPUT_VIDEO_WIDTH = 308; // left blanking + color burst + left blanking + left border + active + right border

// VERTICAL                     LINE
// VERTICAL ACTIVE DISPLAY      192
// BOTTOM BORDER                24
// BOTTOM BLANKING              3
// VERTICAL SYNC                3
// TOP BLANKING                 13
// TOP BORDER                   27
// TOTAL                        262

const uint INPUT_VIDEO_HEIGHT = 240; // top blanking + top border + active + 1/3 of bottom border
const uint INPUT_VIDEO_HEIGHT_OFFSET_Y = 40; // ignore top 40 (top blanking + top border) scanlines
// we're capturing everything there is to see on the horizontal axis, but throwing out most of the border on the vertical axis
// NOTE: other machines probably have different blanking/border periods

const uint CAPTURE_N_SAMPLES = INPUT_VIDEO_WIDTH;

const uint OUTPUT_VIDEO_WIDTH = 320;
const uint OUTPUT_VIDEO_HEIGHT = 200;

static_assert(OUTPUT_VIDEO_WIDTH >= INPUT_VIDEO_WIDTH);
static_assert(OUTPUT_VIDEO_HEIGHT >= INPUT_VIDEO_HEIGHT-INPUT_VIDEO_HEIGHT_OFFSET_Y);

uint offset; // Lazy global variable; this holds the offset of our PIO program

// Draw box
ALIGNED u8 rgb_buf[OUTPUT_VIDEO_WIDTH*OUTPUT_VIDEO_HEIGHT];

static inline uint bits_packed_per_word(uint pin_count) {
    // If the number of pins to be sampled divides the shift register size, we
    // can use the full SR and FIFO width, and push when the input shift count
    // exactly reaches 32. If not, we have to push earlier, so we use the FIFO
    // a little less efficiently.
    const uint SHIFT_REG_WIDTH = 32;
    return SHIFT_REG_WIDTH - (SHIFT_REG_WIDTH % pin_count);
}

void logic_analyser_init(PIO pio, uint sm, uint pin_base, uint pin_count, float div) {
    // Load a program to capture n pins. This is just a single `in pins, n`
    // instruction with a wrap.
    static bool already_initialized_once = false;
    uint16_t capture_prog_instr[3];
    capture_prog_instr[0] = pio_encode_wait_gpio(false, pin_base);
    capture_prog_instr[1] = pio_encode_wait_gpio(true, pin_base);
    capture_prog_instr[2] = pio_encode_in(pio_pins, pin_count);
    struct pio_program capture_prog = {
            .instructions = capture_prog_instr,
            .length = 3,
            .origin = -1
    };
    if (already_initialized_once) {
        pio_remove_program(pio, &capture_prog, offset);
    }
    offset = pio_add_program(pio, &capture_prog);
    already_initialized_once = true;

    // Configure state machine to loop over this `in` instruction forever,
    // with autopush enabled.
    pio_sm_config c = pio_get_default_sm_config();
    sm_config_set_in_pins(&c, pin_base);
    sm_config_set_wrap(&c, offset+2, offset+2); // do not repeat pio_encode_wait_gpio instructions
    sm_config_set_clkdiv(&c, div);
    // Note that we may push at a < 32 bit threshold if pin_count does not
    // divide 32. We are using shift-to-right, so the sample data ends up
    // left-justified in the FIFO in this case, with some zeroes at the LSBs.
    sm_config_set_in_shift(&c, true, true, bits_packed_per_word(pin_count)); // push when we have reached 32 - (32 % pin_count) bits (27 if pin_count==9, 30 if pin_count==10)
    sm_config_set_fifo_join(&c, PIO_FIFO_JOIN_RX); // TX not used, so we can use everything for RX
    pio_sm_init(pio, sm, offset, &c);
}

void logic_analyser_arm(PIO pio, uint sm, uint dma_chan, uint32_t *capture_buf, size_t capture_size_words,
                        uint trigger_pin, bool trigger_level) {
    // TODO: disable interrupts
    pio_sm_set_enabled(pio, sm, false);
    // Need to clear _input shift counter_, as well as FIFO, because there may be
    // partial ISR contents left over from a previous run. sm_restart does this.
    pio_sm_clear_fifos(pio, sm);
    pio_sm_restart(pio, sm);

    dma_channel_config c = dma_channel_get_default_config(dma_chan);
    channel_config_set_read_increment(&c, false);
    channel_config_set_write_increment(&c, true);
    channel_config_set_dreq(&c, pio_get_dreq(pio, sm, false)); // pio_get_dreq returns something the DMA controller can use to know when to transfer something

    dma_channel_configure(dma_chan, &c,
        capture_buf,        // Destination pointer
        &pio->rxf[sm],      // Source pointer
        capture_size_words, // Number of transfers
        true                // Start immediately
    );

    pio_sm_exec(pio, sm, pio_encode_jmp(offset)); // just restarting doesn't jump back to the initial_pc AFAICT
    pio_sm_set_enabled(pio, sm, true);
}

void blink(uint32_t ms=500)
{
    gpio_put(PICO_DEFAULT_LED_PIN, true);
    sleep_ms(ms);
    gpio_put(PICO_DEFAULT_LED_PIN, false);
    sleep_ms(ms);
}

// uint8_t msb_table_inverted[8] = { 3, 3, 3, 3, 2, 2, 1, 0 };
uint8_t msb_table_inverted[8] = { 0, 1, 2, 2, 3, 3, 3, 3 };

void post_process(uint8_t *rgb_bufy, uint32_t *capture_buf, uint buf_size_words)
{
    uint16_t i, j, k;
    uint32_t temp;
    for (i = 8, j = 0; i < buf_size_words; i++, j += 3) { // start copying at pixel 24 (8*3) (i.e., ignore left blank and color burst, exactly 24 pixels).
        temp = capture_buf[i] >> (2+1); // 2: we're only shifting in 30 bits out of 32, 1: ignore csync
        rgb_bufy[j] = msb_table_inverted[temp & 0b111]; // red
        rgb_bufy[j] |= (msb_table_inverted[(temp & 0b111000) >> 3] << 3); // green
        rgb_bufy[j] |= (msb_table_inverted[(temp & 0b111000000) >> 6] << 6); // blue
        temp >>= 10; // go to next sample, ignoring csync
        rgb_bufy[j+1] = msb_table_inverted[temp & 0b111]; // red
        rgb_bufy[j+1] |= (msb_table_inverted[(temp & 0b111000) >> 3] << 3); // green
        rgb_bufy[j+1] |= (msb_table_inverted[(temp & 0b111000000) >> 6] << 6); // blue
        temp >>= 10; // go to next sample, ignoring csync
        rgb_bufy[j+2] = msb_table_inverted[temp & 0b111]; // red
        rgb_bufy[j+2] |= (msb_table_inverted[(temp & 0b111000) >> 3] << 3); // green
        rgb_bufy[j+2] |= (msb_table_inverted[(temp & 0b111000000) >> 6] << 6); // blue
    }
}

void adjust_pixel_clock(float adjustment) {
    static absolute_time_t last_adjustment = { 0 };
    static float pixel_clock_adjustment = 0.0f;
    absolute_time_t toc = get_absolute_time();
    if (absolute_time_diff_us(last_adjustment, toc) > 250000) {
        pio_sm_set_enabled(pio, sm, false);
        pixel_clock_adjustment += adjustment;
        last_adjustment = toc;
        logic_analyser_init(pio, sm, CAPTURE_PIN_BASE, CAPTURE_PIN_COUNT, ((float)Vmode.freq)/(PIXEL_CLOCK+pixel_clock_adjustment));
    }
}

int main()
{
    uint16_t i, y;

    gpio_init(PICO_DEFAULT_LED_PIN);
    gpio_init(CAPTURE_PIN_BASE);
    gpio_set_dir(PICO_DEFAULT_LED_PIN, GPIO_OUT);
    gpio_set_dir(CAPTURE_PIN_BASE, GPIO_IN);

    blink();

    // initialize videomode
    Video(DEV_VGA, RES_CGA, FORM_8BIT, rgb_buf);

    blink();

    // We're going to capture into a u32 buffer, for best DMA efficiency. Need
    // to be careful of rounding in case the number of pins being sampled
    // isn't a power of 2.
    uint total_sample_bits = CAPTURE_N_SAMPLES * CAPTURE_PIN_COUNT;
    total_sample_bits += bits_packed_per_word(CAPTURE_PIN_COUNT) - 1;
    uint buf_size_words = total_sample_bits / bits_packed_per_word(CAPTURE_PIN_COUNT);
    uint32_t *capture_buf0 = (uint32_t*)malloc(buf_size_words * sizeof(uint32_t));
    hard_assert(capture_buf0);
    uint32_t *capture_buf1 = (uint32_t*)malloc(buf_size_words * sizeof(uint32_t));
    hard_assert(capture_buf1);

    blink();

    // Grant high bus priority to the DMA, so it can shove the processors out
    // of the way. This should only be needed if you are pushing things up to
    // >16bits/clk here, i.e. if you need to saturate the bus completely.
    // (Didn't try this)
//     bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_DMA_W_BITS | BUSCTRL_BUS_PRIORITY_DMA_R_BITS;

    logic_analyser_init(pio, sm, CAPTURE_PIN_BASE, CAPTURE_PIN_COUNT, (float)Vmode.freq/PIXEL_CLOCK);

    blink();

    // 1) DMA in 1st scan line, wait for completion
    // 2) DMA in 2nd scan line, post-process previous scan line, wait for completion
    // 3) DMA in 3rd scan line, post-process previous scan line, wait for completion
    // ...
    // n) Post-process last scanline

    // I'm reasonably sure we have enough processing power to post-process scanlines in real time, we should have about 80 us.
    // At 126 MHz each clock cycle is about 8 ns, so we have 10000 instructions to process about 320 bytes, or 31.25 instructions per byte.
    while (true) {
        // "Software-render" vsync detection... I.e., wait for low on csync, usleep for hsync_pulse_time+something, check if we're still low
        // If we are, that's a vsync pulse!
        // This works well enough AFAICT
        while (true) {
            while(gpio_get(CAPTURE_PIN_BASE)); // wait for negative pulse on csync
            sleep_us(10); // hsync negative pulse is about 4.92 us according to oscilloscope, so let's wait a little longer than 4.92 us
            if (!gpio_get(CAPTURE_PIN_BASE)) // we're still low! this must be a vsync pulse
                break;
        }
        for (y = 0; y <= INPUT_VIDEO_HEIGHT_OFFSET_Y; y ++) { // capture and throw away first 40 scanlines, capture without throwing away 41st scanline
            logic_analyser_arm(pio, sm, dma_chan, capture_buf0, buf_size_words, CAPTURE_PIN_BASE, true);
            dma_channel_wait_for_finish_blocking(dma_chan);
        }
        for (y = 1; y < (INPUT_VIDEO_HEIGHT-INPUT_VIDEO_HEIGHT_OFFSET_Y)-1; y += 2) {
            logic_analyser_arm(pio, sm, dma_chan, capture_buf1, buf_size_words, CAPTURE_PIN_BASE, true);
            post_process(rgb_buf + (y-1)*OUTPUT_VIDEO_WIDTH, capture_buf0, buf_size_words);
            dma_channel_wait_for_finish_blocking(dma_chan);

            logic_analyser_arm(pio, sm, dma_chan, capture_buf0, buf_size_words, CAPTURE_PIN_BASE, true);
            post_process(rgb_buf + y*OUTPUT_VIDEO_WIDTH, capture_buf1, buf_size_words);
            dma_channel_wait_for_finish_blocking(dma_chan);
        }
        post_process(rgb_buf + (y-2)*OUTPUT_VIDEO_WIDTH, capture_buf0, buf_size_words);

        if (gpio_get(INCREASE_BUTTON_PIN)) {
            adjust_pixel_clock(PIXEL_CLOCK_ADJUSTER); // + some Hz
        } else if (gpio_get(DECREASE_BUTTON_PIN)) {
            adjust_pixel_clock(-PIXEL_CLOCK_ADJUSTER); // - some Hz
        }
    }
}

I think my camera doesn’t like this type of scene. It doesn’t look perfect in real life, but not this bad. :p I swear! The noise isn’t there, for example. You can see that the AOC logo is kind of wobbly too, and obviously it isn’t in real life. (I’ll try with a different camera, or without image stabilization next time.)

Compiling PicoVGA on Linux

git clone https://github.com/Panda381/PicoVGA
cd PicoVGA/vga_matrixrain

program.uf2 already exists in this directory, you can copy that to your Pico and it will work.
Let’s try to recompile it though:

.../PicoVGA/vga_matrixrain$ make

Nothing happens but program.uf2 gets deleted. Great.

Let’s try this instead:

.../PicoVGA/vga_matrixrain$ make program.uf2

Output:

ASM ../_boot2/boot2_w25q080_bin.S
Assembler messages:
Fatal error: can't create build/boot2_w25q080_bin.o: No such file or directory
make: *** [../Makefile.inc:469: build/boot2_w25q080_bin.o] Error 1

Let’s create the ‘build’ subdirectory and try again.

.../PicoVGA/vga_matrixrain$ mkdir build

ASM ../_boot2/boot2_w25q080_bin.S
ASM ../_sdk/bit_ops_aeabi.S
ASM ../_sdk/crt0.S
ASM ../_sdk/divider.S
ASM ../_sdk/divider0.S
ASM ../_sdk/double_aeabi.S
ASM ../_sdk/double_v1_rom_shim.S
ASM ../_sdk/float_aeabi.S
ASM ../_sdk/float_v1_rom_shim.S
ASM ../_sdk/irq_handler_chain.S
ASM ../_sdk/mem_ops_aeabi.S
ASM ../_sdk/pico_int64_ops_aeabi.S
ASM ../_picovga/render/vga_atext.S
ASM ../_picovga/render/vga_attrib8.S
ASM ../_picovga/render/vga_color.S
ASM ../_picovga/render/vga_ctext.S
ASM ../_picovga/render/vga_dtext.S
ASM ../_picovga/render/vga_fastsprite.S
ASM ../_picovga/render/vga_ftext.S
ASM ../_picovga/render/vga_graph1.S
ASM ../_picovga/render/vga_graph2.S
ASM ../_picovga/render/vga_graph4.S
ASM ../_picovga/render/vga_graph8.S
ASM ../_picovga/render/vga_graph8mat.S
ASM ../_picovga/render/vga_graph8persp.S
ASM ../_picovga/render/vga_gtext.S
ASM ../_picovga/render/vga_level.S
ASM ../_picovga/render/vga_levelgrad.S
ASM ../_picovga/render/vga_mtext.S
ASM ../_picovga/render/vga_oscil.S
ASM ../_picovga/render/vga_oscline.S
ASM ../_picovga/render/vga_persp.S
ASM ../_picovga/render/vga_persp2.S
ASM ../_picovga/render/vga_plane2.S
ASM ../_picovga/render/vga_progress.S
ASM ../_picovga/render/vga_sprite.S
ASM ../_picovga/render/vga_tile.S
ASM ../_picovga/render/vga_tile2.S
ASM ../_picovga/render/vga_tilepersp.S
ASM ../_picovga/render/vga_tilepersp15.S
ASM ../_picovga/render/vga_tilepersp2.S
ASM ../_picovga/render/vga_tilepersp3.S
ASM ../_picovga/render/vga_tilepersp4.S
ASM ../_picovga/vga_blitkey.S
ASM ../_picovga/vga_render.S
CC ../_sdk/adc.c
CC ../_sdk/binary_info.c
CC ../_sdk/bootrom.c
CC ../_sdk/claim.c
CC ../_sdk/clocks.c
CC ../_sdk/critical_section.c
CC ../_sdk/datetime.c
CC ../_sdk/dma.c
CC ../_sdk/double_init_rom.c
CC ../_sdk/double_math.c
CC ../_sdk/flash.c
CC ../_sdk/float_init_rom.c
CC ../_sdk/float_math.c
CC ../_sdk/gpio.c
CC ../_sdk/i2c.c
CC ../_sdk/interp.c
CC ../_sdk/irq.c
CC ../_sdk/lock_core.c
CC ../_sdk/mem_ops.c
CC ../_sdk/multicore.c
CC ../_sdk/mutex.c
CC ../_sdk/pheap.c
CC ../_sdk/pico_malloc.c
CC ../_sdk/pio.c
CC ../_sdk/platform.c
CC ../_sdk/pll.c
CC ../_sdk/printf.c
CC ../_sdk/queue.c
CC ../_sdk/rp2040_usb_device_enumeration.c
CC ../_sdk/rtc.c
CC ../_sdk/runtime.c
CC ../_sdk/sem.c
CC ../_sdk/spi.c
CC ../_sdk/stdio.c
CC ../_sdk/stdio_semihosting.c
CC ../_sdk/stdio_uart.c
CC ../_sdk/stdio_usb.c
CC ../_sdk/stdio_usb_descriptors.c
CC ../_sdk/stdlib.c
CC ../_sdk/sync.c
CC ../_sdk/time.c
CC ../_sdk/timeout_helper.c
CC ../_sdk/timer.c
CC ../_sdk/uart.c
CC ../_sdk/unique_id.c
CC ../_sdk/vreg.c
CC ../_sdk/watchdog.c
CC ../_sdk/xosc.c
CC ../_tinyusb/bsp/raspberry_pi_pico/board_raspberry_pi_pico.c
CC ../_tinyusb/class/audio/audio_device.c
CC ../_tinyusb/class/bth/bth_device.c
CC ../_tinyusb/class/cdc/cdc_device.c
CC ../_tinyusb/class/cdc/cdc_host.c
CC ../_tinyusb/class/cdc/cdc_rndis_host.c
CC ../_tinyusb/class/dfu/dfu_rt_device.c
CC ../_tinyusb/class/hid/hid_device.c
CC ../_tinyusb/class/hid/hid_host.c
CC ../_tinyusb/class/midi/midi_device.c
CC ../_tinyusb/class/msc/msc_device.c
CC ../_tinyusb/class/msc/msc_host.c
CC ../_tinyusb/class/net/net_device.c
CC ../_tinyusb/class/usbtmc/usbtmc_device.c
CC ../_tinyusb/class/vendor/vendor_device.c
CC ../_tinyusb/class/vendor/vendor_host.c
CC ../_tinyusb/common/tusb_fifo.c
CC ../_tinyusb/device/usbd.c
CC ../_tinyusb/device/usbd_control.c
CC ../_tinyusb/host/ehci/ehci.c
CC ../_tinyusb/host/ohci/ohci.c
CC ../_tinyusb/host/hub.c
CC ../_tinyusb/host/usbh.c
CC ../_tinyusb/host/usbh_control.c
CC ../_tinyusb/portable/raspberrypi/rp2040/dcd_rp2040.c
CC ../_tinyusb/portable/raspberrypi/rp2040/hcd_rp2040.c
CC ../_tinyusb/portable/raspberrypi/rp2040/rp2040_usb.c
CC ../_tinyusb/tusb.c
C++ src/main.cpp
In file included from src/main.cpp:8:0:
src/include.h:13:10: fatal error: ../vga.pio.h: No such file or directory
 #include "../vga.pio.h"  // VGA PIO compilation
          ^~~~~~~~~~~~~~
compilation terminated.
make: *** [../Makefile.inc:458: build/main.o] Error 1

Where do we get vga.pio.h? It’s nowhere in the directory.
Let’s take a look at vga_matrixrain/c.bat:

...
..\_exe\pioasm.exe -o c-sdk ..\_picovga\vga.pio vga.pio.h
...

Hey! pioasm. The repository contains an exe file for this. Is it part of the Pico SDK?
Try:

.../PicoVGA/vga_matrixrain$ locate pioasm

I found it in one of my SDK-related directories. Cool. Let’s try it.

.../picoprobe/build/pioasm/pioasm -o c-sdk ../_picovga/vga.pio vga.pio.h
../_picovga/vga.pio:1.1: invalid character: 
    1 | 
      | ^
../_picovga/vga.pio:13.1: invalid character: 
   13 | 
      | ^
../_picovga/vga.pio:14.13: invalid character: 
   14 | .program vga
      |             ^
../_picovga/vga.pio:17.1: invalid character: 
   17 | 
      | ^
../_picovga/vga.pio:19.1: invalid character: 
   19 | 
      | ^
../_picovga/vga.pio:20.13: invalid character: 
   20 | public sync:
      |             ^
../_picovga/vga.pio:22.11: invalid character: 
   22 | sync_loop:
      |           ^

too many errors; aborting.

One look at hexdump -C ../_picovga/vga.pio | less, we see CRLF line endings. Let’s get rid of them and try again:

tr -d '\015' < ../_picovga/vga.pio > ../_picovga/vga.pio.unix
mv ../_picovga/vga.pio > ../_picovga/vga.pio.windows
mv ../_picovga/vga.pio.unix ../_picovga/vga.pio

.../picoprobe/build/pioasm/pioasm -o c-sdk ../_picovga/vga.pio vga.pio.h # no errors!

Success! Let’s try compiling a little more.

.../PicoVGA/vga_matrixrain$ make program.uf2
C++ src/main.cpp
C++ ../_picovga/vga.cpp
C++ ../_picovga/vga_layer.cpp
C++ ../_picovga/vga_pal.cpp
C++ ../_picovga/vga_screen.cpp
C++ ../_picovga/vga_util.cpp
C++ ../_picovga/vga_vmode.cpp
C++ ../_picovga/util/canvas.cpp
C++ ../_picovga/util/mat2d.cpp
C++ ../_picovga/util/overclock.cpp
C++ ../_picovga/util/print.cpp
C++ ../_picovga/util/rand.cpp
C++ ../_picovga/util/pwmsnd.cpp
C++ ../_picovga/font/font_bold_8x8.cpp
C++ ../_picovga/font/font_bold_8x14.cpp
C++ ../_picovga/font/font_bold_8x16.cpp
C++ ../_picovga/font/font_boldB_8x14.cpp
C++ ../_picovga/font/font_boldB_8x16.cpp
C++ ../_picovga/font/font_game_8x8.cpp
C++ ../_picovga/font/font_ibm_8x8.cpp
C++ ../_picovga/font/font_ibm_8x14.cpp
C++ ../_picovga/font/font_ibm_8x16.cpp
C++ ../_picovga/font/font_ibmtiny_8x8.cpp
C++ ../_picovga/font/font_italic_8x8.cpp
C++ ../_picovga/font/font_thin_8x8.cpp
C++ ../_sdk/new_delete.cpp
ld build/program.elf
uf2 program.uf2
make: execvp: ../_exe/elf2uf2.exe: Permission denied
make: *** [../Makefile.inc:435: program.uf2] Error 127

elf2uf2, I’ve seen that before. Let’s check if that’s in the SDK.

.../PicoVGA/vga_matrixrain$ locate elf2uf2

Found it.

.../picoprobe/build/elf2uf2/elf2uf2

Let’s see what exactly needs to be executed here:

make --trace program.uf2
../Makefile.inc:434: update target 'program.uf2' due to: build/program.elf
echo     uf2             program.uf2
uf2 program.uf2
../_exe/elf2uf2.exe build/program.elf program.uf2
make: execvp: ../_exe/elf2uf2.exe: Permission denied
make: *** [../Makefile.inc:435: program.uf2] Error 127

All right, so we just have to do:

.../PicoVGA/vga_matrixrain$ .../picoprobe/build/elf2uf2/elf2uf2 build/program.elf program.uf2

Let’s put that on the Pico and see what happens.

Success!

Edit Makefile.inc like this to get the build system find elf2uf in the correct location:

diff --git a/Makefile.inc b/Makefile.inc
index 3130ab5..03cf706 100644
--- a/Makefile.inc
+++ b/Makefile.inc
@@ -349,7 +349,7 @@ NM = ${COMP}nm
 SZ = ${COMP}size
 
 # uf2
-UF = ../_exe/elf2uf2.exe
+UF = /path/to/picoprobe/build/elf2uf2/elf2uf2
 
 ##############################################################################
 # File list

Sony HB-10 MSX repair using a Raspberry Pi Pico-powered breadboard logic analyzer

This Sony HB-10 was kept inside its original shrink-wrapped box for around 35 years. (Which is longer than I’ve lived.) It’s incredibly clean. There are two clips on the joystick side that make it hard to open. You can imagine how nervous I was about fumbling about with a practically pristine red box, not knowing where the clips are. Fortunately, I found a YouTube video that showed where they are. They are right underneath where the green tape is in many of the images below. I was using the green tape to block the clips so I could half-close the computer while I wasn’t working on it, without again having to spend ages fighting those silly clips when getting back to the computer.

First of all, some board pics in case anyone needs them:

None of the chips are socketed. So let’s spy through the oscilloscope and see some worrying things:

Anyway, these signals aren’t completely out of spec. (And indeed, at least the fuzzy IO turned out to be normal. I.e., this problem didn’t go away after fixing the computer. I didn’t check for the steppy address lines again after getting the computer to work, but I’d hazard a guess that they’re still there. (Update 2022/09/27: I also fixed an HB-11 a while after that, and it had the same fuzzy IO signal on the pin. Most likely nothing to worry about!)

Anyway, what we’ll do today is… build a 26-channel logic analyzer using a Raspberry Pi Pico! And a large handful of resistors to reduce the 5V signals to 3.3V. We connect the logic analyzer to the ROM chip. The ROM chip’s address and data lines are directly connected to the CPU’s address and data lines, and the RAM data lines. Except there’s no A15, but that’s probably all right for now. Using this, we may be able to figure out what’s going on. (Foreshadowing)

Resistors used for the resistor dividers: 10k, 20k on one side (gets us 3.333V) and 4.7k, 6.8k on the other (gets us 2.957V). Ran out of the higher valued ones. Higher values are better, as you’ll draw less current from the CPU (i.e., be less of a burden). I think you can go pretty high, but I’m sticking with what I’ve used before here.

What is that awesome connector? It’s this: https://akizukidenshi.com/catalog/g/gC-04756/.

And this is the program we’ll run on the Pico:

#include <stdio.h>
#include "pico/stdlib.h"

#define ALL_REGULAR_GPIO_PINS 0b00011100011111111111111111111111
#define LOGIC_BUFFER_LEN 62660

#define TRIGGER_PIN 28

uint32_t logic_buffer[LOGIC_BUFFER_LEN] = { 0 };

int main() {
    int i = 0;
    stdio_init_all();
    gpio_init_mask(ALL_REGULAR_GPIO_PINS);
    gpio_init(PICO_DEFAULT_LED_PIN);
    gpio_set_dir_masked(ALL_REGULAR_GPIO_PINS, GPIO_IN);
    gpio_set_dir(PICO_DEFAULT_LED_PIN, GPIO_OUT);

    // wait until /dev/ttyACM0 device is ready on host
    for (i = 0; i < 10; i++) {
        gpio_put(PICO_DEFAULT_LED_PIN, i%2==0);
        sleep_ms(500);
    }
    gpio_put(PICO_DEFAULT_LED_PIN, 1);
    printf("Logic analyzer ready, waiting for trigger\n");
    while (gpio_get(TRIGGER_PIN) == 0);
    for (i = 0; i < LOGIC_BUFFER_LEN; i++) {
        logic_buffer[i] = gpio_get_all() & ALL_REGULAR_GPIO_PINS;
    }
    printf("Done recording");
    for (i = 0; i < LOGIC_BUFFER_LEN; i++) {
        printf("%04x %04x\n", i, logic_buffer[i]);
    }
    printf("Done printing\n");
}

The TRIGGER_PIN is connected to the RESET line of the Z80. The while(gpio_get(TRIGGER_PIN) == 0) waits for this line to go high. (It’s active-low.) Then we just have a for loop that fills the logic_buffer array with the contents of the GPIO pins that we are using. (I.e., all 26 “normal” GPIO pins.)

Then there’s another for loop, which prints out the contents of the buffer.

Let’s avoid spaghetti wiring, and instead prioritize connection convenience. Which unfortunately means that the GPIO pin numbers and address/data line numbers will be pretty much shuffled now. Which means that we need something to decode the output of the logic analyzer to tell us the contents of the address bus and the data bus. And this is a quick and dirty Perl program to do that. Input is on standard input. The bold lines mean that A14 is on GPIO5, A13 on GPIO4, A12 on GPIO11, etc. D7 is on GPIO22, D6 is on GPIO21, etc.

In the unlikely event that you are reading this, and in the unlikelier event that you are thinking of building this thing, I strongly recommend you connect everything in a way that is convenient for you, and fix the values in these bold lines.

#!/usr/bin/perl

# logic_analyzer_raw2address.pl

@address_positions_from_a14 = (5, 4, 11, 1, 27, 2, 3, 12, 13, 14, 15, 10, 6, 7, 8);
@data_positions_from_d7 = (22, 21, 20, 19, 18, 17, 16, 9);

while (<>) {
    $address = 0;
    $data = 0;
    $num = hex($_);
    $current_address_pin = 14;
    foreach (@address_positions_from_a14) {
        if ($num & (1 << ($_))) {
            $address |= (1 << $current_address_pin);
        }
        $current_address_pin--;
    }
    $current_data_pin = 7;
    foreach (@data_positions_from_d7) {
        if ($num & (1 << ($_))) {
            $data |= (1 << $current_data_pin);
        }
        $current_data_pin--;
    }
    printf ("%04x %02x\n", $address, $data);
}

And the other way round, address&data to logic analyzer value, which will come in handy later. Note that you need to set the input values in the source code, $address_input and $data_input. (They are set to 0x7c86 and 0x21 respectively in the below example.)

#!/usr/bin/perl

# address2logic_analyzer_raw.pl

@address_positions_from_a14 = (5, 4, 11, 1, 27, 2, 3, 12, 13, 14, 15, 10, 6, 7, 8);
@data_positions_from_d7 = (22, 21, 20, 19, 18, 17, 16, 9);

$address_input = 0x7c86;
$data_input = 0x21;

$current_position = 14;
$current_position2 = 0;
foreach (@address_positions_from_a14) {
    if ($address_input & (1<<$current_position)) {
        $mask |= (1 << $address_positions_from_a14[$current_position2]);
    }
    $current_position--;
    $current_position2++;
}
$current_position = 7;
$current_position2 = 0;
foreach (@data_positions_from_d7) {
    if ($data_input & (1<<$current_position)) {
        $mask |= (1 << $data_positions_from_d7[$current_position2]);
    }
    $current_position--;
    $current_position2++;
}
printf("%04x\n", $mask);

So, to run this, you’d do the following:

Connect everything up, don’t forget to connect GND between the Pico and the device under test (the HB-10 in this case)
minicom -C logic_analyzer_output -D /dev/ttyACM0
Wait until you get the “ready” message
Turn on the device under test
Wait until the Pico is done printing (takes maybe two seconds)
Turn off the device under test
Exit minicom
awk ‘{print $2}’ logic_analyzer_output | perl logic_analyzer_raw2address.pl > logic_analyzer_output_decoded
Run openmsx and openmsx-debugger and display logic_analyzer_output_decoded side-by-side

Looking at 7c8c both in the trace and in the emulator

So what do you do if you have reached the end of your trace and would like to see what happens next? In my case I saw that we spent a lot of time in a tight loop initializing memory. That takes up the entire logic buffer. So I’d like to continue reading at a certain address (which can be determined easily by following along in openmsx-debugger), right after the memory is initialized.

That’s where the other Perl script comes in. You think of an address bus value and data bus value where you’d like to continue tracing, and convert that into a value that would be seen by the logic analyzer. Then you modify the logic analyzer program like this, for example:

#include <stdio.h>
#include "pico/stdlib.h"

#define ALL_REGULAR_GPIO_PINS 0b00011100011111111111111111111111
#define LOGIC_BUFFER_LEN 62660

#define ADDRESS_PINS 0b00001000000000001111110111111110
#define DATA_PINS    0b00000000011111110000001000000000
#define ADDRESS_DATA_PINS (ADDRESS_PINS | DATA_PINS)

#define AFTER_MEMCPY 0x27d3cc
#define AFTER_MEMCPY2 0x8101af2

#define TRIGGER_PIN 28

uint32_t logic_buffer[LOGIC_BUFFER_LEN] = { 0 };

int main() {
    int i = 0;
    stdio_init_all();
    gpio_init_mask(ALL_REGULAR_GPIO_PINS);
    gpio_init(PICO_DEFAULT_LED_PIN);
    gpio_set_dir_masked(ALL_REGULAR_GPIO_PINS, GPIO_IN);
    gpio_set_dir(PICO_DEFAULT_LED_PIN, GPIO_OUT);

    // wait until /dev/ttyACM0 device is ready on host
    for (i = 0; i < 10; i++) {
        gpio_put(PICO_DEFAULT_LED_PIN, i%2==0);
        sleep_ms(500);
    }
    gpio_put(PICO_DEFAULT_LED_PIN, 1);
    printf("Logic analyzer ready, waiting for trigger\n");
    while (gpio_get(TRIGGER_PIN) == 0);
    while ((gpio_get_all() & ADDRESS_DATA_PINS) != AFTER_MEMCPY2);
    for (i = 0; i < LOGIC_BUFFER_LEN; i++) {
        logic_buffer[i] = gpio_get_all() & ALL_REGULAR_GPIO_PINS;
    }
    printf("Done recording");
    for (i = 0; i < LOGIC_BUFFER_LEN; i++) {
        printf("%04x %04x\n", i, logic_buffer[i]);
    }
    printf("Done printing\n");
}

And then it’ll start tracing as soon as it sees that the relevant GPIO pins are equal to AFTER_MEMCPY2, which is just a name I came up with.

Logic analyzer output and analysis

Here are the raw traces I produced. You’d need to use the awk command above to convert them.

logic_analyzer_output2 Download

logic_analyzer_output2_retake Download

logic_analyzer_output3 Download

logic_analyzer_output4 Download

And here are the three relevant post-processed files:

0000-03b6_retake Download

03b7-7c85 Download

7c89-Download

You can see that we have a very detailed trace of the Z80’s execution. We can easily see what address is being set by the CPU, and what’s being read at or written to that address. You may also notice that we have a couple gaps in the data, which is why we needed a retake for logic_analyzer_output2. You may also be able to tell that things apparently start at 2 here.

We can easily see that the Z80 is executing code correctly
We can easily see that the ROM is giving us the correct code (the code is identical to what we see in the emulator)
We see that the code is trying to switch banks (out #a8) and identify RAM, by overwriting an address and reading back the same address
In the emulator, it finds the RAM on first try, because it’s connected on “slot 0”, same as the ROM. (Which is possible because this machine only has 16 KB of RAM and 32 KB of ROM, which is less than the 64 KB addressable by the Z80.)
In our logic trace, it gets back a slightly different value from what it had written, which indicates that the RAM is most likely bad!
- Let’s take a look at 000-03b6_retake.txt around line 6430+, address 0365 to 036e.
- In Z80 asm, we have here:
  ld hl,#fe00
  ld a,(hl)
  cpl
  ld (hl),a
  cp (hl)
  cpl
  ld (hl),a
  jr nz,#0379
- This means that we load from #fe00, invert, write this inversion back to #fe00, compare contents of #fe00 with our inverted value, (restore original value,) and if the comparison didn’t quite work out, we jump to #0379.
- This code is run a number of times, and it shouldn’t jump to #0379 the first time. (It doesn’t in the emulator. It ought to work the first time because ROM and RAM are both in bank 0. But if the RAM is defective, the comparison will fail!)
- We can also see our loads and stores to memory in the logic analyzer:
  - Line 6510: 7e00 09 (Read 09 from fe00. A15 is missing so fe00 turns into 7e00.)
  - Line 6575: 7e00 f6 (Wrote f6 to fe00. That’s the inversion of 09.)
  - Line 6620: 7e00 f7 (Read f7 from fe00. Last time I checked f6 and f7 weren’t equal.)
In our third logic trace, we reach a point where a function is called (7c8c, lines 80-165 in trace) and that function attempts to return. When a function returns, it checks the stack to figure out the correct address to return to (lines 600- in trace). And again, that address doesn’t quite match the address we had written when we executed the CALL instruction! In the trace we can clearly see that it’s reading 7d8f, when it should have been 7c8f. 7d is 01111101, 7c is 01111100. So it would appear that we have a stuck bit in D0.
So we now jump to a rather random location, which means we start to execute nonsense code.
At some point, the nonsense code jumps to f380 (which is uninitialized RAM). (Note that the trace doesn’t have A15, so it looks like 7380.) And while we’re now completely off the rails and firmly in nonsense territory, the see that everything here appears to have D0 set!

So before we take out the RAM chip, let’s see if we can rule out any other possible malfunctions that could lead to this behavior.

The RAM’s address pins are not connected directly to the CPU’s address bus, instead they are most likely connected via the nearby 74LS157 chips (I didn’t check TBH). Could these be the cause of this failure?
- They would have to magically produce addresses that always have D0 set; that’s very unlikely.
- When writing the return address to the stack, we should get back the correct value because the same address should be generated when reading and writing. But we’re not reading the correct value back, so it’s very unlikely that the 74LS157 is translating our addresses incorrectly.
Some other chip is interfering with the RAM’s output
- Unlikely, as it’s just a single bit that is erroneous
- Nothing is interfering with the ROM’s output or IO outputs
- We could probably see this on the oscilloscope

Checking RAM chips with just a multimeter?

Before taking out the chip (which is quite a chore without a desoldering iron), I put my multimeter in diode mode and checked if there’s anything unusual about the chip. And there was! Putting my positive lead on ground and the negative lead on each of the data pins, I noticed that I got a different voltage drop on the pin for the suspected defective bit, 515 mV. On all others I got 462 mV. (Disclaimer: note that this is an in-circuit test and the RAM chip isn’t the only path from ground to the data pin. I also forgot to check again after removing the chip, so take this with a heap of salt.)

So let’s see what happens when we replace that RAM chip and boot!

Silly metal bar got squashed at first and the silly author of this blog post didn’t notice at first. Now it looks like this. Also guess who didn’t have any replacement chips that day.

Replacement chips arrived. Yay, it works!

Sokoban! I cleared this level. Will take a look at the next level soon. Yeah, maybe tomorrow.

Did you guys know that the word “Sokoban” is Japanese? I only recently realized that when I saw the game for sale somewhere. 倉庫番!

Also, the Raspberry Pi Pico is fast. 3.3V is inconvenient, but not the end of the world.

“Almost Pong” for the Commodore 64

Original version: https://www.lessmilk.com/almost-pong/

I recently came across a funky version of Pong called “Almost Pong”. I really liked it and thought it would be fun to re-create it for the Commodore 64. I went about this entirely in BASIC, without writing any assembler routines — by using a fancy compiler that I recently came across: https://egonolsen71.github.io/basicv2/. (Before you get too excited, the physics in my game are quite different — I don’t think doing the same physics calculations would work in realtime on the C64, but using lookup tables it might be possible to produce very similar physics.)

So is this a usable game? I don’t know, when I play the above-mentioned Almost Pong it kind of feels more fun, maybe it’s the physics, maybe it’s because my game is even more bare-bones, or maybe I’m just biased against my own game? Anyway, without further ado:

Works in VICE and on real hardware
Press fire button on joystick #2 to jump
The source code has lines that are longer than 80 lines and will therefore produce syntax errors if you type it into a C64 verbatim
It’ll also be way too slow to play if you don’t compile it using Basicv2

Here’s the source code:

0 in=0
1 poke 53280,1:poke 53281,0:poke 646,1
2 v=53248:lv=1:co=1:c=0:fr=0:x=160:y=141:rem center
3 if in=0 then print chr$(147):print " press fire to play":wait 56320,16,16:wait 56320,16:in=1
4 for t=12288 to 12415 step 1:poke t,0:next
5 for t=12289 to 12350 step 3:poke t,255:next:for t=12373 to 12397 step 3:poke t,255:next
6 poke v+21,7:poke v+39,1:poke v+40,1:pokev+41,co
7 poke v,72:poke v+16,1:poke v+2,16:poke v+4,x:poke 53271,3
8 poke v+1,y:poke v+3,y:poke v+5,y
9 poke 2040,192:poke 2041,192:poke 2042,193
10 s=54272:w=17:poke s+5,97: poke s+6,200: poke s+4,w:poke s+24,15:rem sound
15 ox=4:oy=1:od=1:jm=4
20 sx=ox:sy=oy:dy=od
21 gosub 1500
30 for i=1 to 2 step 0:rem infinite loop
35 sc=peek(53278):rem sprite collision lag workaround
40 x=x+sx
50 y=y+0
60 if x>255 then poke v+16,5:poke v+4,x-256:goto 70
65 if x<=255 then poke v+16,1:poke v+4,x
70 y=y+sy:sy=sy+dy
73 if y>234 then y=234:poke v+5,y:gosub 1000:rem todo could add poke v+5,y to the subroutine
74 if y<43 then y=43:poke v+5,y:gosub 1010
75 poke v+5,y
80 rem joystick
90 f1=peek(56320) and 16
92 if f1<>0 then fr=0
93 poke 162,0:wait 162,2:rem wait 1/80s
99 rem only fire if fire wasn't pressed during last poll
100 if fr=0 and f1=0 then y=y-sy*jm:sy=oy:dy=od:fr=1:gosub 1700

110 rem bounce
120 if sx < 0 or x<=328 then goto 150
125 rem x=328:pokev+4,x-256
126 poke v+5,y
127 sc=peek(53278)
130 if sc <> 0 then goto 140:rem bounce if we collided
131 if x+sx < 344 then goto 180
135 gosub 1020:rem game over if no collision
140 gosub 1600:rem bounce
145 poke v+1,int(rnd(0)*158)+50

150 if sx > 0 or x>=32 then goto 180
155 rem x=32:pokev+4,x
156 poke v+5,y
159 sc=peek(53278)
160 if sc <> 0 then goto 170:rem bounce if we collided
161 if x+sx > 16 then goto 180
165 gosub 1030:rem game over if no collision
170 gosub 1600:rem bounce
175 poke v+3,int(rnd(0)*158)+50
180 gosub 1800:next:rem sound off
190 end

1000 print " you lose (don't fall into the abyss)":gosub 1100
1010 print " you lose (don't jump into the sky)":gosub 1100
1020 print " you lose (you need to bounce off the":print " right paddle)":gosub 1100
1030 print " you lose (you need to bounce off the":print " left paddle)":gosub 1100
1100 poke s,120:poke s+1,6:poke 162,0:wait 162,16
1110 wm=0
1120 gosub 1800
1130 print " press fire to play"
1140 wait 56320,16,16:wait 56320,16
1150 goto 1
1160 return:rem not reached

1500 rem print game status
1510 print chr$(147):print " level:", lv, "points:", c
1520 return

1600 rem bounce
1610 sx=-sx
1620 c=c+1
1630 if c/5 < lv then goto 1650
1640 lv=lv+1:co=co+1:poke v+41,co:ox=ox+1:if co=15 then co=1
1650 gosub 1500:rem print sc:poke 162,0:wait 162,8
1660 return

1700 rem fire button sound on
1710 poke s,133:poke s+1,11
1730 wm=3:rem num of sound off gosubs to wait before muting
1740 return

1800 rem sound off
1810 if wm>0 then wm=wm-1:goto 1830
1820 poke s,0:poke s+1,0
1830 return

Here’s the compiled .prg:

almost_pong_v6 Download

Here’s me playing the game.

Stupid game

How to load software from the internets on a real C64 using the Datasette drive

Now here’s a (zipped) .wav file that you can put on a tape (or much easier, stream through a cassette adapter into your datasette drive) and load on a real computer. I will describe how to generate .wav files from .prg/.tap/.d64 files in the next section.

almost_pong_v6.wav Download

Notes on using cassette adapters to load C64 software

When using a cassette adapter, you should make sure your playback device’s volume is neither too loud nor too quiet. On my computer, 50% volume appears to be the sweet spot. You may have to experiment a bit to find your own. When using a cassette adapter, you will have to pause streaming manually (after the C64 prints out “FOUND NAMEOFPROGRAM”, as whatever device you’re using to stream is completely unaware of whether the datasette drive’s motor is on or off and just continues streaming regardless. (I didn’t add a long enough pause between the header(?) and the actual data. You could maybe add a multiple-second pause yourself to automate this part.) It’s probably generally best to stream using Audacity (or similar software) and watch the datasette drive to see whether the motor is on or off. When it’s off, you press stop and when the motor starts running again, reposition the cursor into the nearest section of silence and press play again. I was able to load various kinds of software using this approach. Commercial software may have multiple bits of silence where the motor stops running for a bit.

In this Audacity screenshot, there’s a short bit of silence at the 11 second mark. This is where the C64 will stop the motor and print “FOUND NAMEOFPROGRAM”. (Note that in the above .wav file, the name of the program is blank, so the C64 will only print “FOUND”.) When the motor stops, press the stop button. When the motor starts running again, reposition the cursor somewhere in the middle of the silent section (11.139s in this example) and press play.

Here’s an Audacity screenshot for Great Giana Sisters (the tape version):

The cursor is positioned in the first short bit of silence, right after the “FOUND NAMEOFPROGRAM” but there are multiple points where the motor stops spinning and you will have to manually press stop/play each time.

Great Giana Sisters! I think you had to press space to continue loading beyond this screen, which means you’d again need to manually press stop/play in Audacity

Now that we have talked about how to load .wav files into the C64, here’s how to actually generate .wav files:

Generating .wav files from .tap/.prg/.d64 files

I spent a few hours evaluating multiple solutions, and have found that wav-prg (https://wav-prg.sourceforge.io/) did the best job.

Here’s how to build this program on Linux:

git clone https://git.code.sf.net/p/wav-prg/libtap
cd wav-prg-libtap
make libtapdecoder.so
make libtapencoder.so
cd ..
git clone https://git.code.sf.net/p/wav-prg/libaudiotap
cd wav-prg-libaudiotap
make clean # probably not needed but happened to be in my notes, possibly for a reason
make -j4 DEBUG=y LINUX64BIT=y libaudiotap.so
cd ..
git clone https://git.code.sf.net/p/wav-prg/code
cd wav-prg-code
make clean # probably not needed but happened to be in my notes, possibly for a reason
make -j4 cmdline/wav2prg DEBUG=y AUDIOTAP_HDR=../wav-prg-libaudiotap/ AUDIOTAP_LIB=../wav-prg-libaudiotap/
make -j4 cmdline/prg2wav DEBUG=y AUDIOTAP_HDR=../wav-prg-libaudiotap/ AUDIOTAP_LIB=../wav-prg-libaudiotap/

Here are some example invocations for an NTSC C64. You can also generate .wav files for use with the PET (full explanation here) or the VIC-20.

# (pwd is .../wav-prg-code)
LD_LIBRARY_PATH=../wav-prg-libaudiotap/:../wav-prg-libtap/ cmdline/prg2wav -m c64ntsc -t filename.tap /path/to/prg # convert prg to tap image for use in vice etc.
LD_LIBRARY_PATH=../wav-prg-libaudiotap/:../wav-prg-libtap/ cmdline/prg2wav -m c64ntsc -w filename.wav /path/to/prg # convert prg directly to wav file

Here’s how to extract a .prg file from a .d64 floppy image, which you can then convert to a .wav file. All you need is VICE, which comes with a tool to extract data from .d64 files:

./c1541 -attach ~/retro/c64/software/BLOCKNB1.D64 -list
0 "ass presents:   "      
66    "block'n'bubble"    prg 
6     "block'n'bub. dox"  prg 
1     "doc-maker.code"    prg 
1     "scores"            prg 
590 blocks free.

“ass”? :p Anyway, judging by the number of blocks, “block’n’bubble” seems like the program we’re most interested in. (You can try extracting the others and running them in an emulator.) To extract and to convert, run the following commands:

# (pwd is .../vice-3.5/src)
./c1541 -attach ~/retro/c64/software/BLOCKNB1.D64 -read "block'n'bubble" bnb.prg
reading file `block'n'bubble' from unit 8

# (change directory to .../wav-prg-code)
LD_LIBRARY_PATH=../wav-prg-libaudiotap/:../wav-prg-libtap/ cmdline/prg2wav -m c64ntsc -w bnb.wav bnb.prg # adjust input and output file paths

I don’t remember how to play, but I remember the intro screen music. Felt good to hear it played from a real Commodore 64 for the first time in 15-20 years! Here’s a clip. Sorry for the copyright violation:

Block’n’Bubble title music (looped)

The game’s playable as-is, but at least on my hardware the first time I started a game I got some weird colors. (The second time round the colors were normal.) Could be a PAL-vs.-NTSC issue, or could have something to do with the above conversion (for example, a program could assume that the C64’s tape buffer is available for temporary usage, but when loading from tape… it isn’t).

Note that large commercial software is often divided into multiple files, e.g., consisting of a loader and a main program, and perhaps a high score file. In that case, you will not be able to easily convert the software. (You would have to re-write the parts where the loader starts loading from the floppy, etc.)

For example, here’s what’s in the Sokoban .d64:

./c1541 -attach ~/retro/c64/SOKOBAN0.D64 -list
0 "ass presents:   "      
1     "soko-ban"          prg 
40    "soko-ban docs"     prg 
56    "sokoban"           prg 
41    "title"             prg 
2     "maze01"            prg 
2     "maze55"            prg 
9     "start"             prg 
3     "flip"              prg 
1     "test"              prg 
509 blocks free.

It would probably require some hacking to convert this to tape.

However, not all commercial software is this complex. For example, using this method, I was able to convert the main .prg file in “JupiterLander_1982_Commodore.d64” to .tap and load this in VICE (didn’t try on real hardware but should work) and play as normal.

~/src/vice-3.5/src/c1541 -attach JupiterLander_1982_Commodore.d64 -list
0 "1982 commodore  " -136-
30    "j.lander +2  /n0"  prg 
5     "j.lander docs/n0"  prg 
629 blocks free.

~/src/vice-3.5/src/c1541 -attach JupiterLander_1982_Commodore.d64 -read "j.lander +2  /n0.prg" jupiterlander.prg

LD_LIBRARY_PATH=~/src/wav-prg-libaudiotap/:~/src/wav-prg-libtap/ ~/src/wav-prg-code/cmdline/prg2wav -t jupiterlander.tap jupiterlander.prg

4164 tester (including refresh testing) on the Raspberry Pi Pico and mini-review (updated)

I recently lost access to my Arduino for a couple days and decided to finally start playing around with my Raspberry Pi Pico. In case you don’t know, the Raspberry Pi Pico is even cheaper than most Arduinos. I think I paid 550 Japanese yen at a brick-and-mortar Marutsu for mine, and packs a lot of pins and quite some performance. (Amazon is way more expensive.)

So before we get to the 4164 tester, here’s my mini-review of the Raspberry Pi Pico from the perspective of someone who has never used a Raspberry Pi before and is used to his Arduino Nano:

The header pins weren’t attached so I had to solder them myself. I guess it’s possible to buy Picos with header pins soldered on, and I guess it’s also possible to buy Arduino Nanos without header pins. (Confirmed on amazon.co.jp). Soldering wasn’t too hard and I didn’t break anything.

Getting the (C/C++) development environment set up (on Debian Linux) wasn’t too hard for me personally, but I do this kind of thing a lot and would expect this to be much more difficult for someone who doesn’t have as much experience, especially if they are on a different distro. What I did was look at this script: https://raw.githubusercontent.com/raspberrypi/pico-setup/master/pico_setup.sh and instead of just running it, did the same things the script would have done — i.e., installed the prerequisite packages, cloned the relevant repositories, compiled and installed. If you can read bash scripts reasonably well you should be able to do this — otherwise you can just try and run the script verbatim and hope for the best. Note that the script is made to be run on a (non-Pico) Raspberry Pi.

The Arduino has an LED that is always on as long as there is power. The Pico has an onboard LED but it’s completely software-controlled. That’s great for non-beginners and perhaps not so great for beginners. (Is this thing even working?) To flash a program, you hold down a button and then connect USB. The Pico will identify as a sort of flash drive and you can copy over a .uf2 file. Once your OS is done copying (i.e. almost instantly) the Pico will immediately reboot and immediately start running your code. To the OS it will look like the flash drive suddenly disappeared. (This took me a little while to figure out.)

On the Arduino, you can be connected via serial at all times, though you’ll get a grey screen while flashing. On the Pico, USB serial feels much more “software-defined”, and you can’t be connected while re-flashing AFAICT. If you write a program that outputs something to serial right after starting, you probably won’t ever be able to see that output because your computer will take a while to notice there is something on USB. (For this reason, I added a 25 second pause at the beginning in my RAM tester program.)

So some parts of the development process are slightly more annoying than on the Arduino Nano, but the features may make up for it I think — more pins (but fewer analog I/O pins), much more performance, and step-by-step debugging via GDB (haven’t tried this yet).

One more very important thing to be aware of is that the Pico’s GPIO pins’ high logic level is 3.3V, not 5V. This doesn’t matter (IME) when driving the 4164’s input pins (address/RAS/CAS/WRITE/data in), but it’s likely to matter for the single GPIO pin connected to the 4164’s Q output pin. So you will need a resistor divider to bring the 4164’s 5V output down to 3.3V.

Update 2023/08/07: the code is also available on GitHub: https://github.com/qiqitori/pico_4164_tester

The code consists of three files, a very standard CMakeLists.txt (pretty much a straight amalgam of https://github.com/raspberrypi/pico-examples/blob/master/CMakeLists.txt and https://github.com/raspberrypi/pico-examples/blob/master/hello_world/usb/CMakeLists.txt), pico_sdk_import.cmake (straight from https://github.com/raspberrypi/pico-examples/blob/master/pico_sdk_import.cmake), and 4164_test.c. Some of the code in 4164_test.c has been adapted from https://ezcontents.org/4164-dynamic-ram-arduino.

CMakeLists.txt

cmake_minimum_required(VERSION 3.12)

# Pull in SDK (must be before project)
include(pico_sdk_import.cmake)

project(pico_examples C CXX ASM)
set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)

if (PICO_SDK_VERSION_STRING VERSION_LESS "1.3.0")
    message(FATAL_ERROR "Raspberry Pi Pico SDK version 1.3.0 (or later) required. Your version is ${PICO_SDK_VERSION_STRING}")
endif()

set(PICO_EXAMPLES_PATH ${PROJECT_SOURCE_DIR})

# Initialize the SDK
pico_sdk_init()

include(example_auto_set_url.cmake)

add_compile_options(-Wall -Wextra
        -Wno-format          # int != int32_t as far as the compiler is concerned because gcc has int32_t as long int
        -Wno-unused-function # we have some for the docs that aren't called
        -Wno-maybe-uninitialized
        )

add_executable(4164_test
        4164_test.c
        )

# pull in common dependencies
target_link_libraries(4164_test pico_stdlib)

# enable usb output, disable uart output
pico_enable_stdio_usb(4164_test 1)
pico_enable_stdio_uart(4164_test 0)

# create map/bin/hex file etc.
pico_add_extra_outputs(4164_test)

# add url via pico_set_program_url
example_auto_set_url(4164_test)

4164_test.c

#include <stdio.h>
#include "pico/stdlib.h"

#define D           0
#define WRITE       1
#define RAS         2
#define A0          3
#define A2          4
#define A1          5

#define A7          16
#define A5          17
#define A4          18
#define A3          19
#define A6          20
#define Q           21
#define CAS         22

#define HIGH        1
#define LOW         0

#ifndef PICO_DEFAULT_LED_PIN
#error blink requires a board with a regular LED
#endif

#define STATUS_LED PICO_DEFAULT_LED_PIN
// #define STATUS_LED 15

#define BUS_SIZE 8
#define MAX_ERRORS 20
#define REFRESH_EVERY_N_WRITES 4
#define REFRESH_EVERY_N_READS 256 // 256 is the maximum for this implementation
#define N_REFRESHES 256

const unsigned int a_bus[BUS_SIZE] = {
  A0, A1, A2, A3, A4, A5, A6, A7
};

// #define debug_print printf
#define debug_print(...) ;

void nop(void) {
    __asm__ __volatile__( "nop\t\n");
}

void set_bus(unsigned int a) {
    int i;
    /* Write lowest bit into lowest address line first, then next-lowest bit, etc. */
    for (i = 0; i < BUS_SIZE; i++) {
        gpio_put(a_bus[i], a & 1);
        a >>= 1;
    }
}

void refresh() {
    int row;
    for (row = 0; row < 256; row++) {
        set_bus(row);
        gpio_put(RAS, LOW);
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        nop();
        gpio_put(RAS, HIGH);
    }
}

void write_address(int row, int col, bool val) {
    // Pull RAS and CAS HIGH
    gpio_put(RAS, HIGH);
    gpio_put(CAS, HIGH);

    // Set row address
    set_bus(row);

    // Pull RAS LOW
    gpio_put(RAS, LOW);
    nop(); // need to wait 15 ns before setting column address
    nop(); // need to wait 15 ns before setting column address
    nop(); // need to wait 15 ns before setting column address
    nop(); // need to wait 15 ns before setting column address

    // Set column address
    set_bus(col);

    // Pull CAS LOW
    gpio_put(CAS, LOW);

    // Set Data in pin to HIGH (write a one)
    gpio_put(D, val);

    // Pull Write LOW (Enables write)
    gpio_put(WRITE, LOW);

    sleep_us(1);
    gpio_put(WRITE, HIGH);
    gpio_put(CAS, HIGH);
    gpio_put(RAS, HIGH);
}

void read_address(int row, int col, bool *val) {
    // Pull RAS and CAS and WRITE HIGH
    gpio_put(RAS, HIGH);
    gpio_put(CAS, HIGH);
    gpio_put(WRITE, HIGH);

    // Set row address
    set_bus(row);

    // Pull RAS LOW
    gpio_put(RAS, LOW);
    nop(); // need to wait 15 ns before setting column address
    nop(); // need to wait 15 ns before setting column address
    nop(); // need to wait 15 ns before setting column address
    nop(); // need to wait 15 ns before setting column address

    // Set column address
    set_bus(col);

    // Pull CAS LOW
    gpio_put(CAS, LOW);

    sleep_us(1);

    *val = gpio_get(Q);

    gpio_put(WRITE, HIGH);
    gpio_put(CAS, HIGH);
    gpio_put(RAS, HIGH);
}

void setup() {
    bool dummy = false;
    int i;

    gpio_init(D);
    gpio_init(WRITE);
    gpio_init(RAS);
    gpio_init(A0);
    gpio_init(A2);
    gpio_init(A1);
    
    gpio_init(A7);
    gpio_init(A5);
    gpio_init(A4);
    gpio_init(A3);
    gpio_init(A6);
    gpio_init(Q);
    gpio_init(CAS);

    gpio_init(STATUS_LED);

    gpio_set_dir(D, GPIO_OUT);
    gpio_set_dir(WRITE, GPIO_OUT);
    gpio_set_dir(RAS, GPIO_OUT);
    gpio_set_dir(A0, GPIO_OUT);
    gpio_set_dir(A2, GPIO_OUT);
    gpio_set_dir(A1, GPIO_OUT);

    gpio_set_dir(A7, GPIO_OUT);
    gpio_set_dir(A5, GPIO_OUT);
    gpio_set_dir(A4, GPIO_OUT);
    gpio_set_dir(A3, GPIO_OUT);
    gpio_set_dir(A6, GPIO_OUT);
    gpio_set_dir(Q, GPIO_IN);
    gpio_set_dir(CAS, GPIO_OUT);

    gpio_set_dir(STATUS_LED, GPIO_OUT);

    gpio_put(RAS, HIGH);
    gpio_put(CAS, HIGH);
    gpio_put(WRITE, HIGH);
    gpio_put(D, LOW);
    gpio_put(A0, LOW);
    gpio_put(A1, LOW);
    gpio_put(A2, LOW);
    gpio_put(A3, LOW);
    gpio_put(A4, LOW);
    gpio_put(A5, LOW);
    gpio_put(A6, LOW);
    gpio_put(A7, LOW);
    
    sleep_us(1110);
    for (i = 0; i < 8; i++) {
        read_address(0, 0, &dummy);
        sleep_us(1);
    }
}

void test(bool start_val) {
    int row, col, refresh_count;
    int errors = 0;
    bool read_val = false;
    bool val = start_val;

    for (row = 0; row < 256; row++) {
        debug_print("Testing row: %d\n", row);
        for (col = 0; col < 256; col++) {
            gpio_put(STATUS_LED, val);
            write_address(row, col, val);
            read_address(row, col, &read_val);
            if (val != read_val) {
                printf("ERROR: row %d col %d read %d but expected %d\n", row, col, read_val, val);
                if (++errors > MAX_ERRORS) {
                    while (true) {
                        gpio_put(STATUS_LED, HIGH);
                        sleep_ms(50);
                        gpio_put(STATUS_LED, LOW);
                        sleep_ms(50);
                    }
                }
            }
            val = !val;
            gpio_put(STATUS_LED, val);
            if (col % REFRESH_EVERY_N_WRITES == 0) {
                refresh();
            }
        }
    }
    for (refresh_count = 0; refresh_count < N_REFRESHES; refresh_count++) {
        printf("Refresh test %d\n", refresh_count);
        val = start_val; // start from start_val (which determines whether we're testing 10101010... or 01010101...)
        for (row = 0; row < 256; row++) {
            for (col = 0; col < 256; col++) {
                gpio_put(STATUS_LED, val);
                read_address(row, col, &read_val);
                if (val != read_val) {
                    printf("ERROR: row %d col %d read %d but expected %d\n", row, col, read_val, val);
                    if (++errors > MAX_ERRORS) {
                        while (true) {
                            gpio_put(STATUS_LED, HIGH);
                            sleep_ms(50);
                            gpio_put(STATUS_LED, LOW);
                            sleep_ms(50);
                        }
                    }
                }
                val = !val;
                gpio_put(STATUS_LED, val);
                if (col % REFRESH_EVERY_N_READS == 0) {
                    refresh();
                }
            }
        }
    }
}

int main() {
    stdio_init_all();
    setup();

    sleep_ms(10000); // wait until /dev/ttyACM0 device is ready on host
    printf("Starting 10101010... test\n");
    test(true);
    printf("Starting 01010101... test\n");
    test(false);
    printf("Test done. All OK!\n");
    while (true) {
        gpio_put(STATUS_LED, HIGH);
        sleep_ms(1000);
        gpio_put(STATUS_LED, LOW);
        sleep_ms(1000);
    }
}

The Raspberry Pi Pico has a lot more pins than the Arduino Nano, which means that we can connect everything in a very straight and orderly manner. The pins on the left of the IC are connected to pins on the left of the Pico, and the pins on the right of the IC are connected to pins on the right of the Pico, and there are no cross connections. That’s a huge plus, and (as in our case) if you’ve got just one pin that requires level shifting, I’m not sure I’d choose the Arduino Nano if I already knew and owned both. Adjust the #defines at the top if you want to connect things differently.

Also, the Pico could theoretically allow much faster testing than the Nano, but my test is pretty slow, especially when using USB serial output. I also added a couple of delays to be super sure we don’t go out of spec. As the library only has millisecond and microsecond delays, I added a NOP-based delay (which probably delays things much more than required).

Note that this code won’t do anything the first ~10 seconds, this delay gives the host computer time to identify the USB serial device (should appear as /dev/ttyACM0). If you then execute, e.g., ‘minicom -D /dev/ttyACM0’ you should be able to see output as the test progresses. It’s also possible to see whether the test passed by looking at the onboard LED — fast blinking means error, slow blinking means success. If you prefer running tests using the LED, I suggest you do the following:

Comment out the sleep_ms(10000); call
Remove ‘#define println printf’, add ‘#define println(…) ;’ instead
Set MAX_ERRORS to 0 (or remove the if (++error > MAX_ERRORS) logic entirely)

(The MAX_ERRORS logic causes the program to keep running after the first 20 errors, if you don’t want that, feel free to remove.)

There are three constants that control how refreshing works, REFRESH_EVERY_N_WRITES, REFRESH_EVERY_N_READS, and N_REFRESHES. If you want to test if your memory is refreshed correctly for a longer time, adjust N_REFRESHES.

To compile, copy the above files into a new directory, create a ‘build’ subdirectory and cd to it, run cmake and make:

mkdir build
cd build
cmake ../ -DCMAKE_BUILD_TYPE=Debug
make -j4
# hold BOOTSEL button while plugging in Pico USB
cp 4164_test.uf2 /media/.../RPI-RP2/

Testing a ZX81 RAM pack with an Arduino (and repair)

For a quick overview of what I did to the ZX81 before arriving at this point, see this post: ZX81 repair (no video, some keys not working, and bad RAM pack)

I recently got hold of a Spectrum ZX81 RAM pack that when plugged in, produced a garbled screen on boot. I decided to check what’s wrong before ordering any chips. To do that, I first looked at the schematics and made sure there were no bad connections. This was a laborious process, but fortunately all RAM chips share all pins except the data pins, so you should have continuity between all pins except two on all RAM chips.

I finally thought I found something broken — but it turned out that there’s just a slight difference between my board and the schematics: only three NAND gates are used on the quad NAND IC, and logically and electrically it doesn’t matter which gates you use and which one you leave unsoldered. Well, for some reason my board used different pins (i.e., left a different gate unsoldered) than the ones in the schematics.

Below you will find my annotated schematics of the RAM pack.

Annotated schematics of the ZX81 RAM expansion pack

Here is an actual picture (with the bad RAM chip replaced) that shows which chips are where:

The ZX81 RAM pack is made of two circuit boards. These circuit boards are sandwiched together. The pins connecting the two boards are very flexible, so you can just apply a small amount of force and bend the two boards apart. One board has logic chips (the aforementioned NAND chip, an OR chip, four data selector chips (74LS157) and a dual 4-bit counter chip (74LS393)). The other has the DRAM chips and some circuitry to generate -5V and 12V from 5V and 9V input. My voltages were all good and I didn’t see anything unusual there, so I didn’t really look into it too much. If you need to debug the power circuitry, you may need to know how to generate negative voltage (https://www.allaboutcircuits.com/projects/build-your-own-negative-voltage-generator/). I also created a rough simulation of the power circuitry on https://www.falstad.com/circuit/. If you are interested, go to File -> Import from Text and paste the following code, but I don’t think I’m using the correct transformer and there may be other issues:

$ 1 0.000005 24.46919322642204 50 5 43 5e-11
169 112 112 192 112 0 4 9 -1.3552527156068805e-20 0.05437017461335131 0.022386130031495474 0.99
R 192 112 128 64 0 0 40 9 0 0 0.5
w 192 144 272 144 0
w 272 144 272 256 0
t 240 272 272 272 0 -1 17.389821061615624 -0.6849346284276479 100 default
w 272 288 272 336 0
w 192 224 288 224 0
d 336 224 288 224 2 default
34 zener-12 1 1.7143528192810002e-7 0 2.0000000000000084 12 1
z 336 224 400 224 2 zener-12
d 336 224 336 256 2 default
w 336 256 336 336 0
r 192 224 192 272 0 100
w 192 272 240 272 0
r 192 272 192 336 0 2200
g 192 336 144 336 0 0
w 192 336 272 336 0
w 304 336 336 336 0
w 192 176 224 176 0
w 224 176 224 384 0
w 272 336 304 336 0
d 304 384 304 336 2 default
d 352 384 304 384 2 default
r 352 384 416 384 0 2200
34 zener-5.1 1 1.7143528192810002e-7 0 2.0000000000000084 5.1 1
z 416 384 416 336 2 zener-5.1
209 352 336 352 384 0 0.000001 5.679241726295006 1 1
w 224 176 352 176 0
d 352 176 400 176 2 default
w 400 176 400 224 0
d 400 112 400 176 2 default
w 192 112 400 112 0
w 400 112 464 112 0
w 416 384 448 384 0
w 416 336 464 336 0
w 464 336 464 256 0
209 464 208 464 256 0 0.000022000000000000003 8.999999999994335 1 1
w 464 112 464 160 0
c 400 224 400 336 0 0.00009999999999999999 9.397384509781268 0.001
w 352 336 400 336 0
w 336 336 352 336 0
w 400 336 416 336 0
O 384 416 432 416 1 0
O 400 224 448 224 1 0
c 192 176 192 224 0 0.000022 -18.406729087662764 0.001
c 224 384 304 384 0 0.000001 -0.8460828370149334 0.001
r 464 160 464 208 0 1000
r 448 384 512 384 0 1000000
g 512 384 560 384 0 0
x 9 10 431 32 4 16 ZX81\sRAM\spack\spower\ssupply\scircuit\s(9V\s->\s-5,\s12V).\\nChanged\ssome\scapacitors\sto\snon-polarized
x 489 171 629 212 4 16 Added\s1k\sresistor\\nto\sprevent\sshort\\ncircuit

The four 74LS157 selector chips on the non-DRAM board work as two separate entities, that is, the “selector” inputs are tied together for the lower two chips and tied together for the higher two chips. When you look for 74LS157 pinouts on the internet, you’ll often find an OCR’d and slightly wrong pinout. The pin labelled I1d on the bottom side should be labelled I1a instead:

Wrong 74ls157 pinout; lower I1d should be I1a — Two I1d pins? Yeah right! The lower one is actually I1a.

The 74LS393 is used by the ZX81 to refresh the DRAM. According to the datasheet, the DRAM has to be refreshed at least every 2 ms. I am guessing that the CPU or ULA periodically generates the RFSH signal, but we don’t have to worry about that in the context of this repair. Each time RFSH goes low (low because there is a NOT gate built from one of the NAND gates between RFSH and pin 1 (“clock”) of the 74LS393 counter chip), the counter chip adds +1 to its internal state. Additionally, the RFSH signal also goes into the first pair of selector chips, which causes the output of the counter to be selected as the output. Otherwise, address lines A0-A6 are used as the output.

The second pair of selector chips has address lines A7-A13 as one set of inputs, and the output of the previous selector chip as the other set of inputs. The circuit that goes into the selector pin is somewhat complicated, as it uses four different inputs to decide which set of inputs to select. I decided to make a truth table to better understand it. If you need to understand this circuit, the truth table or OpenDocument / Excel files below may help a little bit:

Write operation:
WR	MREQ	RD	temp1 nand(wr,rd)	A14	temp2 nand(temp1,A14)	S or(mreq,temp2)
0	0	0	1	0	1	1	1
0	0	0	1	1	0	0	2
0	0	1	1	0	1	1	3
0	0	1	1	1	0	0	4
0	1	0	1	0	1	1	5
0	1	0	1	1	0	1	6
0	1	1	1	0	1	1	7
0	1	1	1	1	0	1	8

Read operation:
WR	MREQ	RD	temp1 nand(wr,rd)	A14	temp2 nand(temp1,A14)	S or(mreq,temp2)
1	0	0	1	0	1	1	9
1	0	0	1	1	0	0	10
1	0	1	0	0	1	1	11
1	0	1	0	1	1	1	12
1	1	0	1	0	1	1	13
1	1	0	1	1	0	1	14
1	1	1	0	0	1	1	15
1	1	1	0	1	1	1	16

OpenDocument (contains formulas rather than just hard-coded 0 or 1s): truth_table.ods Download

Excel (contains formulas rather than just hard-coded 0 or 1s): truth_table.xlsx Download

The circuit contains a number of RC delay circuits to make the timing work, but as the delay is on the order of 10-20 ns, I don’t have to worry about those when driving this circuit using an Arduino — I’m using digitalRead() and digitalWrite(), and these functions take a couple of microseconds to complete. Looking at the timing diagram in the DRAM IC’s datasheet however, it is relatively obvious that these delays are needed.

As stated above, the DRAMs are all connected in parallel on all pins except the data pins. And while the DRAM chips have separate pins for input and output, the RAM pack ties these together as they are of course not used at the same time — you either read or write.

Some more notes on the timing — programming the Arduino like this will drive the chips very slowly, but according to the datasheet, we don’t really have to worry about being too slow in most cases. Some parameters have “max” values on the order of 10s or 100s of ns, but the notes alleviate most concerns in that area. The maximum RAS/CAS pulse width of 32000/10000 ns should be okay with just digitalRead()/digitalWrite() (I didn’t measure too much though, to be honest). Here is the code doing the write and CAS pulses, and what we know about digitalWrite(), this should be just under 10000 ns:

void writeAddress(...) {
...
  /* write */
  digitalWrite(WR, LOW);

  /* tRCD max is 50 ns, but footnote 10 states:
   * "If tRCD is greater than the maximum recommended value shown in this table, tRAC will increase by the amount that tRCD exceeds the value shown."
   * Therefore this is not a hard maximum and we don't have to worry too much about being too slow */
  digitalWrite(XA14, HIGH); /* pulls CAS low after 10-20ns */

  digitalWrite(WR, HIGH);
  digitalWrite(XA14, LOW);

Here’s an oscilloscope screenshot for just the WR pulse (which should have the same timing), which is approximately… 10 microseconds!

There is code out there to test 4116 RAM ICs. However, the chips in my RAM pack weren’t socketed so I couldn’t take them out very easily. And it’s not certain if we can just attach the Arduino directly to the DRAM chips’ pins — if we apply power to the board we will power up the rest of the circuitry and that could interfere with our testing — the selector chips might produce 1s when we want 0s, or vice versa. I took this code and modified it to work with the rest of the circuitry. I originally planned on testing two bits at once (i.e., two DRAM chips at once), but I ran out of cables. I’ve left in the code however, commented.

Since we don’t have a lot of pins on the Arduino (or connectors that we can use to connect the Arduino with the RAM pack), I decided to enlist the binary counter chip’s help to generate the addresses. Check out the advanceRow() function to see how easy this is — we just need to manipulate RFSH. (Note that “row” means the same thing as it does in the datasheet — the DRAM chip is organized into 128 “rows” and 128 “columns”, 128×128 = 16384 bits.)

I also decided to write two different values in two successive addresses before reading back from these addresses. This is important because otherwise the Arduino may just read whatever it just put on the wire itself. I.e., if you take an Arduino that isn’t connected to anything at all and do something like the following, your digitalRead may return whatever you wrote using digitalWrite!

digitalWrite(13, HIGH);
val = digitalRead(13); // val may be 1 now!

Which is why we instead do something like this (c is column, v is value, row is set elsewhere):

        writeAddress(c, v, v);
        writeAddress(c+1, !v, !v);
        readAddress(c, &read_v0_0, &read_v1_0);
        readAddress(c+1, &read_v0_1, &read_v1_1);

I also changed the error() and ok() functions. ok() will make a (preferably green) LED blink slowly, error() will made a (preferably red) LED and the other LED blink alternatingly.

ZX81 RAM pack memory test using an Arduino — Diagnostic surgery in progress.

Here is the code:

/* Modified by sneep to test the Sinclair ZX81 RAM pack.
 * Original code is at http://labs.frostbox.net/2020/03/24/4116-d-ram-tester-with-schematics-and-code/
 * The Arduino doesn't have enough pins to check all outputs at
 * the same time so we'll test one (out of eight) at a time;
 * rewiring is required between tests.
 *
 * Unlike the previous version of this source code, we go through
 * the onboard logic (a couple of ORs, ANDs, multiplexers, and a
 * counter for refresh) rather than talking to the 4116 RAM ICs
 * directly.
 * It's probably not possible to check the 4116 chips in-circuit
 * using the original source code, as we would apply power to
 * everything and would then cause our address signals to fight
 * against the multiplexer's outputs.
 *
 * NOTE: As we are using digitalWrite, this is a very slow test.
 * We go beyond the 'max' value recommended in the datasheet for
 * one thing, and go way beyond the 'min' values -- borderline
 * chips could pass our tests but fail when driven by the ZX81.
 *
 * NOTE: At least the init refresh cycles may stop working if we
 * replace digitalWrite by something faster (init refresh).
 */

//This is for an arduino nano to test 4116 ram ic. Please see video https://youtu.be/MVZYB54VD2g and blogpost
//Cerated in november 2017. Code commented and posted march 2020. 
//Most of the code and design is from http://forum.defence-force.org/viewtopic.php?p=15035&sid=17bf402b9c2fd97c8779668b8dde2044
//by forum member "iss"" and modified to work with 4116 D ram by me Uffe Lund-Hansen, Frostbox Labs. 
//This is version 2 of the code. Version 1 had a very seroisl bug at approx. line 43 which meant it only checked ram address 0 

//#include <SoftwareSerial.h>

#define XD0         A1
#define MREQ        5
#define WR          6
#define RFSH        10

#define XA7         4
#define XA8         2
#define XA9         3
#define XA10        A3 // orange
#define XA11        A4 // yellow
#define XA12        A5 // green
#define XA13        A2
#define XA14        A0

#define R_LED       13    // Arduino Nano on-board LED
#define G_LED       8

//Use the reset button to start the test on solder an external momentary button between RST pin and GND pin on arduino.

#define BUS_SIZE     7

#define NO_DEBUG 0
#define VERBOSE_1 1
#define VERBOSE_2 2
#define VERBOSE_3 3
#define VERBOSE_4 4
#define VERBOSE_MAX 5

#define DEBUG NO_DEBUG // VERBOSE_3
#define DEBUG_LED_DELAY 0 /* Set to 0 for normal operation. Adds a delay inbetween when double-toggling fast signals, e.g. RFSH */

int g_row = 0;

const unsigned int a_bus[BUS_SIZE] = {
  XA7, XA8, XA9, XA10, XA11, XA12, XA13
};

void setBus(unsigned int a) {
  int i;
  /* Write lowest bit into lowest address line first, then next-lowest bit, etc. */
  for (i = 0; i < BUS_SIZE; i++) {
    digitalWrite(a_bus[i], a & 1);
    a /= 2;
  }
}

void advanceRow() {
    /* Keep track of which row we're on so we can put that in our debug output */
    g_row = (g_row + 1) % (1<<BUS_SIZE);
    /* Counter chip should be fast enough.
     * NOTE there is a NOT gate between arduino pin and counter chip */
    digitalWrite(RFSH, LOW);
    if (DEBUG_LED_DELAY) {
      interrupts();
      delay(DEBUG_LED_DELAY);
      noInterrupts();
    }
    digitalWrite(RFSH, HIGH);
}

void writeAddress(unsigned int c, int v0, int v1) {
  /* Set column address in advance (arduino may be too slow to set this later) (won't appear on the RAM chip pins yet) */
  setBus(c);

  if (DEBUG >= VERBOSE_MAX) {
    interrupts();
    Serial.print("Writing v0 ");
    Serial.println(v0);
//    Serial.print("Writing v1 ");
//    Serial.println(v1);
    noInterrupts();
  }
  /* Set val in advance (arduino may be too slow to set this later) (chip doesn't care what's on this pin except when it's looking) */
  pinMode(XD0, OUTPUT);
//  pinMode(XD1, OUTPUT);
  digitalWrite(XD0, (v0 & 1)? HIGH : LOW);
//  digitalWrite(XD1, (v1 & 1)? HIGH : LOW);

  digitalWrite(MREQ, LOW); /* pulls RAS low */

  /* write */
  digitalWrite(WR, LOW);

  /* tRCD max is 50 ns, but footnote 10 states:
   * "If tRCD is greater than the maximum recommended value shown in this table, tRAC will increase by the amount that tRCD exceeds the value shown."
   * Therefore this is not a hard maximum and we don't have to worry too much about being too slow */
  digitalWrite(XA14, HIGH); /* pulls CAS low after 10-20ns */

  digitalWrite(WR, HIGH);
  digitalWrite(XA14, LOW);
  digitalWrite(MREQ, HIGH);

  pinMode(XD0, INPUT);
//  pinMode(XD1, INPUT);
}

void readAddress(unsigned int c, int *ret0, int *ret1) {
  /* set column address (won't appear on the RAM chip pins yet) */
  setBus(c);
  digitalWrite(MREQ, LOW); /* pulls RAS low, row address will be read in after tRAH (20-25 ns) */

  /* Need to wait tRCD (RAS to CAS delay time), min. 20ns max. 50 ns, but a footnote implies that we can go over the max */
  digitalWrite(XA14, HIGH); /* sets S to high and pulls CAS low after 10-20ns (it's correct to have the column address on the bus before pulling CAS low) */

  /* Need to wait tCAC (time CAS-low to data-valid), but Arduino is slow enough for our purposes */

  /* get current value
   * datasheet "DATA OUTPUT CONTROL", p. 8:
   * "Once having gone active, the output will remain valid until CAS is taken to the precharge (logic 1) state, whether or not RAS goes into precharge."
   */
  *ret0 = digitalRead(XD0);
//  *ret1 = digitalRead(XD1);

  digitalWrite(XA14, LOW);
  digitalWrite(MREQ, HIGH);
}

void error(int c, int v, int read_v0_0, int read_v1_0, int read_v0_1, int read_v1_1)
{
  unsigned long a = ((unsigned long)c << BUS_SIZE) + g_row;
  interrupts();
  Serial.print(" FAILED $");
  Serial.println(a, HEX);
  Serial.print("Wrote v/!v: ");
  Serial.println(v);
  Serial.println(!v);
  Serial.print("Read v0_0: ");
  Serial.println(read_v0_0);
//  Serial.print("Read v1_0: ");
//  Serial.println(read_v1_0);
  Serial.print("Read v0_1: ");
  Serial.println(read_v0_1);
//  Serial.print("Read v1_1: ");
//  Serial.println(read_v1_1);
  Serial.flush();
  while (1) {
    blink_abekobe(100);
  }
}

void ok(void)
{
  digitalWrite(R_LED, LOW);
  digitalWrite(G_LED, LOW);
  interrupts();
  Serial.println(" OK!");
  Serial.flush();
  while (1) {
    blink_green(500);
  }
}

void blink_abekobe(int interval)
{
  digitalWrite(R_LED, LOW);
  digitalWrite(G_LED, HIGH);
  delay(interval);
  digitalWrite(R_LED, HIGH);
  digitalWrite(G_LED, LOW);
  delay(interval);
}

void blink_green(int interval)
{
  digitalWrite(G_LED, HIGH);
  delay(interval);
  digitalWrite(G_LED, LOW);
  delay(interval);  
}

void blink_redgreen(int interval)
{
  digitalWrite(R_LED, HIGH);
  digitalWrite(G_LED, HIGH);
  delay(interval);
  digitalWrite(R_LED, LOW);
  digitalWrite(G_LED, LOW);
  delay(interval);  
}

void green(int v) {
  digitalWrite(G_LED, v);
}

void fill(int v) {
  int i, r, c, g = 0;
  int read_v0_0, read_v1_0;
  int read_v0_1, read_v1_1;

  if (DEBUG >= VERBOSE_1) {
    Serial.print("Writing v: ");
    Serial.println(v);
  }
  for (r = 0; r < (1<<BUS_SIZE); r++) {
    if (DEBUG >= VERBOSE_1) {
      interrupts();
      Serial.print("Writing to row ");
      Serial.println(g_row);
      noInterrupts();
    }

    for (c = 0; c < (1<<BUS_SIZE); c++) {
        if (DEBUG >= VERBOSE_4) {
          interrupts();
          Serial.print("Writing to column ");
          Serial.println(c);
          noInterrupts();
        }
        green(g ? HIGH : LOW);
        /* The same two data pins are used for both read and write,
         * so when nothing is connected we would just read the value we just wrote.
         * So let's write 0 and 1 (or 1 and 0) to two addresses and read them back.
         * We should get 0 and 1, but if there's nothing connected we'd get 1 and 0,
         * which 
         */
        writeAddress(c, v, v);
        writeAddress(c+1, !v, !v);
        readAddress(c, &read_v0_0, &read_v1_0);
        readAddress(c+1, &read_v0_1, &read_v1_1);
        if (DEBUG >= VERBOSE_3) {
          interrupts();
          Serial.print("Read v0_0: ");
          Serial.println(read_v0_0);
//          Serial.print("Read v1_0: ");
//          Serial.println(read_v1_0);
          Serial.print("Read v0_1: ");
          Serial.println(read_v0_1);
//          Serial.print("Read v1_1: ");
//          Serial.println(read_v1_1);
          noInterrupts();
        }
        if ((read_v0_0 != v) || // (read_v1_0 != v) ||
            (read_v0_1 != !v)) { //|| (read_v1_1 != v)) {
          error(c, v,
                read_v0_0,
                read_v1_0,
                read_v0_1,
                read_v1_1);
        }
        g ^= 1;
    }

    advanceRow();
  }

  for (i = 0; i < 50; i++) {
    blink_redgreen(100);
  }
}

void setup() {
  int i;

  Serial.begin(115200);
  while (!Serial)
    ; /* wait */

  Serial.println();
  Serial.print("ZX81 RAM PACK TESTER");

  for (i = 0; i < BUS_SIZE; i++)
    pinMode(a_bus[i], OUTPUT);

  pinMode(XA14, OUTPUT);
  pinMode(MREQ, OUTPUT);
  pinMode(WR, OUTPUT);

  pinMode(R_LED, OUTPUT);
  pinMode(G_LED, OUTPUT);

  /* Input and output is tied together on RAM pack.
   * We'll leave the pinMode on INPUT for most of the time and only set to OUTPUT when writing.
   */
  pinMode(XD0, INPUT);
//  pinMode(XD1, INPUT);

  digitalWrite(WR, HIGH);
  digitalWrite(MREQ, HIGH);
  digitalWrite(XA14, HIGH);

  Serial.flush();

  digitalWrite(R_LED, LOW);
  digitalWrite(G_LED, LOW);

  noInterrupts();

  /* Datasheet says: "Several cycles are required after power-up before proper device operation is achieved. Any 8 cycles which perform refresh are adequate for this purpose."
   * We'll just perform a refresh on all rows. */
  for (i = 0; i < (1<<BUS_SIZE); i++) {
    /* Should work fine timing-wise with standard Arduino digitalWrite() (tRC min: 375 ns, no max apparently) */
    interrupts();
    Serial.print("init: refreshing row ");
    Serial.println(g_row);
    Serial.flush();
    noInterrupts();
    advanceRow();
    digitalWrite(MREQ, LOW);
    digitalWrite(MREQ, HIGH);
  }
}

void loop() {
  interrupts(); Serial.print("."); Serial.flush(); noInterrupts(); fill(0);
  interrupts(); Serial.print("."); Serial.flush(); noInterrupts(); fill(1);
  ok();
}

In my case, all DRAM chips passed the test except the one controlling D5. Even the very first read wouldn’t work out. I therefore replaced that one and hooray, things worked again! Here’s a pic of a 3d maze game running with the repaired RAM.

Some random notes on how to do the actual replacement

Before replacing the defective RAM chip I also tried piggybacking, but that didn’t make the test pass. I was planning on using my oscilloscope to get an idea of what’s going wrong when piggybacking, but things were just too finicky and I abandoned that plan. If you try yourself, make sure to put your multimeter in continuity mode and check that your piggybacked RAM chip is actually making contact.

I cut off the legs of the chip I 99% knew was bad and then desoldered the legs. Applying heat using a soldering iron from above and using a desoldering pump from below (or the other way round) worked reasonably well.

It should be okay to use a socket on most chips. Here’s a photo of the boards sandwiched up again after the replacement. You can see that there’s quite some clearance left:

Let me know if you have any questions about this repair.

Useful developer tool console one-liners/bookmarklets

Many websites lack useful features, many websites go to great lengths to prevent you from downloading images, etc.

When things get too annoying I sometimes open the developer tools and write a short piece of JavaScript to help me out. Okay, but isn’t it annoying to open developer tools every time? Yes! So right-click your bookmarks toolbar, press the button to add a bookmark, give it an appropriate title, and in the URL, put “javascript:” followed by the JavaScript code. (Most of the following examples already have javascript: prepended to the one-liner.)

Note that you have to encapsulate most of these snippets in an anonymous function, i.e. (function() { … })(). Otherwise your browser might open a page containing whatever value your code snippet returns.

(It is my belief that all of the following code snippets aren’t copyrightable with a clear conscience, as they may constitute the most obvious way to do something. The code snippets can therefore be regarded as public domain, or alternatively, at your option, as published under the WTFPL.)

Note that these snippets are only tested on Firefox. (They should work in Chrome too, though.)

Hacker News: scroll to next top-level comment

Q: How likely is this to stop working if the site gets re-designed?
A: Would probably stop working.

This is a one-liner to jump to the first/second/third/… top-level comment. (Because often the first few threads get very monotonous after a while?) If you don’t see anything happening, maybe there is no other top-level comment on the current page.

document.querySelectorAll("img[src='s.gif'][width='0']")[0].scrollIntoView(true)
document.querySelectorAll("img[src='s.gif'][width='0']")[1].scrollIntoView(true)
document.querySelectorAll("img[src='s.gif'][width='0']")[2].scrollIntoView(true)

javascript:document.querySelectorAll("img[src='s.gif'][width='0']")[1].scrollIntoView(true) # Bookmarklet version. Basically just javascript: added at the beginning.

The “true” parameter in scrollIntoView(true) means that the comment will appear at the top (if possible). Giving false will cause the comment to be scrolled to to appear at the bottom of your screen. Not that the scrolling isn’t perfect; the comment will be half-visible.

It would be useful to be able to remember where we last jumped to, and then jump to that + 1. We can add a global variable for that, and to be reasonably sure it doesn’t clash with an existing variable we’ll give it a name like ‘pkqcbcnll’. If the variable isn’t defined yet, we’ll get an error, so to avoid errors, we’ll use typeof to determine if we need to define the variable or not.

javascript:if (typeof pkqcbcnll == 'undefined') { pkqcbcnll = 0 }; document.querySelectorAll("img[src='s.gif'][width='0']")[pkqcbcnll++].scrollIntoView(true);

Wikipedia (and any other site): remove images on page

Q: How likely is this to stop working if the site gets re-designed?
A: Unlikely to stop working, ever.

Sometimes Wikipedia images aren’t so nice to look at, here’s a quick one-liner to get rid of them:

document.querySelectorAll("img").forEach(function(i) { i.remove() });

Instagram (and e.g. Amazon and other sites): open main image in new tab

Q: How likely is this to stop working if the site gets re-designed?
A: Not too likely to stop working.

When you search for images or look at an author’s images in the grid layout, right-click one of the images and press “open in new tab”, then use the following script to open just the image in a new tab. I.e., it works on pages like this: https://www.instagram.com/p/CSSx_E3pmId/ (random cat picture, no endorsement intended.)

Or on Amazon product pages, you’ll often get a large image overlaid on the page when you hover your cursor over a thumbnail. When you execute this one-liner in that state, you’ll get that image in a new tab for easy saving/copying/sharing. E.g., on this page: https://www.amazon.co.jp/dp/B084H7ZYTT you will get this image in a new tab if you hover over that thumbnail. (Random product on Amazon, no endorsement intended.)

This script works by going through all image tags on the page and finding the source URL of the largest image. As of this writing, I believe this is the highest resolution you can get without guessing keys or reverse-engineering.

javascript:var imgs=document.getElementsByTagName("img");var height;var max_height=0;var i_for_max_height=0;for(var i=0;i<imgs.length;i++){height=parseInt(getComputedStyle(imgs[i]).getPropertyValue("height"));
if(height>max_height){max_height=height;i_for_max_height=i;}}open(imgs[i_for_max_height].getAttribute("src"));

xkcd: Add title text underneath image (useful for mouse-less browsing)

Q: How likely is this to stop working if the site gets re-designed?
A: May or may not survive site designs.

Useful for mouseless comic reading. There’s just one <img> with title text, so we’ll take the super-simple approach. Alternatively we could for example use #comic > img or we could select the largest image on the page as above.

javascript:document.querySelectorAll("img[title]").forEach(function(img) { document.getElementById("comic").innerHTML += "<br />
" + img.getAttribute("title"); })

Instagram: Remove login screen that appears after a while in the search results

To do this, we have to get rid of something layered on top of the page, and then remove the “overflow: hidden;” style attribute on the body tag.

A lot of pages put “overflow: hidden” on the body tags when displaying nag screens, so maybe it’s useful to have this as a separate bookmarklet. Anyway, in the following example we do both at once.

javascript:document.querySelector("div[role=presentation]").remove() ; document.body.removeAttribute("style");

Sites that block copy/paste on input elements

This snippet will present a modal prompt without any restrictions, and put that into the text input element. This example doesn’t work in iframes, and doesn’t check that we’re actually on an input element:

javascript:(function(){ document.activeElement.value = prompt("Please enter value")})()

The following snippet also handles iframes (e.g. on https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_input_test) and should even handle nested iframes (untested). The problem is that the <iframe> element becomes the activeElement when an input element is in an <iframe>. So we’ll loop until we find an activeElement that doesn’t have a contentDocument object. And then blindly assume that we’re on an input element:

javascript:(function(){var el = document.activeElement; while (el.contentDocument) { el = el.contentDocument; } el.activeElement.value = prompt("Please enter value")})()

Removing <iframe>s to get rid of a lot of ads

Many ads on the internet use <iframe> tags. Getting rid of all of these at once may clean up pages quite a bit — but some pages actually use <iframe>s for legitimate purposes.

javascript:(function(){document.querySelectorAll("iframe").forEach(function(i) { i.remove() });})();

Amazon Music: Get list of songs in your library

I.e., songs for which you have clicked the “+” icon. Maybe you want to get a list of your songs so you can move to a different service, or maybe you just want the list. The code presented here isn’t too likely to survive a redesign, but could probably be adjusted if necessary. Some JavaScript knowledge might come in handy if you want to get this script to work.

Amazon Music makes this process rather difficult because the UI unloads and reloads elements dynamically when it thinks you have too many on the page at once.

We are talking about this page (using the default, alphabetically ordered setting), BTW:

Ideally, you would just press Ctrl+A and paste the result into an editor, or select all table cells using, e.g., Ctrl+click.

However, you’ll only get around 15 items in that case. (The previous design let you copy everything at once and was superior in other respects too, IMO.)

Anyway, we’ll use the MutationObserver to get the list. Using the MutationObserver, we’ll get notified when elements are added to the page. Then we just need to scroll all the way down and output the collected list. We may get duplicates, depending on how the page is implemented, but we’ll ignore those for now — we may have duplicates anyway if we have the same song added more than once. So I recommend you get rid of the duplicates yourself, by using e.g. sort/uniq or by loading the list into Excel or LibreOffice Calc or Google Sheets, or whatever you want.

On Amazon Music’s page, the <music-image-rows> elements that are dynamically added to the page contain four <div> elements classed col1, col2, col3, and col4. (This could of course change any time.) These <div>s contain the song name, artist name, album name, and song length, respectively, which is all we want (well, all I want). We’ll just use querySelectorAll on the newly added element to select .col1, .col2, .col3, and .col4 and output the textContent to the console. Occasionally, a parent element pops up that contains all the previous .col* <div> elements. We’ll ignore that by only evaluating selections that have exactly four elements.

Scroll to top of page
Execute code (e.g., by opening console and pasting in the code)
Slowly scroll to the end of the page
Execute observer.disconnect() in the console (otherwise text will keep popping up if you scroll)
Select and copy all text in the console (or use right-click → Export Visible Messages To), paste in an editor or (e.g.) Excel. There are some Find&Replace regular expressions below that you could use to post-process the output.
Note that the first few entries in the list (I think sometimes it’s just the first one, sometimes it’s the first four) are never going to be newly added to the document, so you will have to copy them into your text file yourself.

The code’s output is rather spreadsheet-friendly, tab-deliminated and one line per entry. You can just paste that into your favorite spreadsheet software.

The code cannot really be called a one-liner at this point, but feel free to re-format it and package it as a bookmarklet, if you want.

observer = new MutationObserver(function(mutations) {
     mutations.forEach(function(mutation) {
         if (mutation.type === "childList") {
             if (mutation.target && mutation.addedNodes.length) {
                 var string = "";
                 selector = mutation.target.querySelectorAll(".col1, .col2, .col3, .col4");
                 if (selector.length == 4) {
                     selector.forEach(function(e) { if (e.textContent) string += e.textContent + "\t" });
                     string += "----------\n"
                     console.log(string);
                 }
             }
         }
     });
 });
 observer.observe(document, { childList: true, subtree: true });

Post-processing regular expressions (the “debugger eval code” one may be Firefox-specific):

Find: [0-9 ]*debugger eval code.*\n
Replace: (leave empty)

Find: \n----------\n
Replace: \n
(Do this until there are no more instances of the "Find" expression)

Find: \t----------
Replace: (leave empty)

There are a lot of duplicates. You can get rid of them either by writing your list to a file and then executing, e.g.: sort -n amazon_music.txt | uniq > amazon_music_uniq.txt
Or you can get Excel to do the work for you.
Make sure that you have the correct number of lines. (The Amazon Music UI enumerates all entries in the list. You would want the same number of lines in your text file.)

Outputting QR codes on the terminal

In these dark ages, a lot of software (mostly chat apps) only work on smartphones. While it’s easy to connect a USB-OTG hub to most smartphones (even my dirt-cheap Android smartphone supports this (now three years old)), having two keyboards on your desk can be kind of annoying.

While there are a bunch of possible solutions to this problem, many of these solutions do not fix the problem when you’re not on your home setup. Which is why I often just use QR codes to send URLs to my phone, and there are a lot of QR code generator sites out there.

QR code generator sites are useful because they work everywhere, but many are slow and clunky. Perhaps acceptable in a pinch, but… what if you could just generate QR codes on the terminal?

Well, some cursory googling revealed this library: https://github.com/qpliu/qrencode-go, which doesn’t have any external (non-standard library) dependencies, is short enough to skim over for malicious code, and comes with an easily adapted example. (I am reasonably confident that there is no malicious code at ad8353b4581fa11fc01a50ebf56db3833462fc13.)

Note: I very rarely use Go. Here is what I did to compile this:

$ git clone https://github.com/qpliu/qrencode-go
$ mkdir src
$ mv qrencode/ src/
$ cat > qrcodegenerator.go
package main
 import (
     "bytes"
     "os"
     "qrencode"
 )
 func main() {
     var buf bytes.Buffer
     for i, arg := range os.Args {
         if i > 1 {
             if err := buf.WriteByte(' '); err != nil {
                 panic(err)
             }
         }
         if i > 0 {
             if _, err := buf.WriteString(arg); err != nil {
                 panic(err)
             }
         }
     }
     grid, err := qrencode.Encode(buf.String(), qrencode.ECLevelQ)
     if err != nil {
         panic(err)
     }
     grid.TerminalOutput(os.Stdout)
 }
$ GOPATH=$PWD go build qrcodegenerator.go
$ ./qrcodegenerator test
// QR CODE IS OUTPUT HERE

Note: the above code is adapted from example code in the README.md file and is therefore LGPL3.

Since Go binaries are static (that’s what I’ve heard at least), you can then move the executable anywhere you like (e.g. ~/bin) and generate QR codes anywhere. Note that they’re pretty huge, i.e. for ‘https://blog.qiqitori.com’ (26 bytes) the QR code’s width will be 62 characters. For e.g. ‘https://blog.qiqitori.com/2020/10/outputting-qr-codes-on-the-terminal/’ (this post) the width is 86 characters.