Does it run DOOM?

I was trying to find a good use case for my LILYGO TTGO T-Watch. It’s a programmable smart watch featuring the amazing ESP32 chip and a 240x240 color LCD screen.

I keep hearing about Doom running on this and that, sometimes directly and sometimes using the device as an exotic external screen. My project falls into the latter category, but it was a lot of fun to implement!

As this was mostly a learning experience, I took several wrong turns and mention them in this writeup.

As with most tiny projects, I got the basic version up and running in two afternoons and then fiddled with the code for a few more days to get a more presentable result.


It works!

Building steps

Getting the video signal across

There are a few ways to transmit data from a computer to the ESP32-based watch: Wi-Fi, Bluetooth and a serial port.

A serial port seemed promising. I initially targeted a 120x120 picture, which can be represented by 14400 bits or 1800 bytes.

The default serial configuration is 8 data bits, no parity and one stop bit. Together with the start bit that precedes every byte, that makes 10 bits on the wire per byte, or 18000 bits per frame. At 115200 baud I could therefore achieve around 6.4 fps (115200 / 18000).
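The same arithmetic can be wrapped in a small helper for trying out other resolutions and baud rates. This is only a sketch: the function names are made up for illustration, and `bits_per_byte` is left as a parameter so the framing overhead stays explicit.

```c
#include <stdint.h>

/* Bytes needed for one 1-bit-per-pixel frame
   (width * height is assumed to be a multiple of 8). */
static uint32_t frame_bytes(uint32_t width, uint32_t height)
{
    return (width * height) / 8u;
}

/* Upper bound on frames per second for a given line rate.
   bits_per_byte is the on-wire cost of one byte: with 8N1 framing that is
   1 start bit + 8 data bits + 1 stop bit = 10. */
static double max_fps(uint32_t baud, uint32_t bits_per_byte,
                      uint32_t width, uint32_t height)
{
    return (double)baud / ((double)frame_bytes(width, height) * bits_per_byte);
}
```

With 10 bits per byte, `max_fps(115200, 10, 120, 120)` comes out at 6.4, and `max_fps(921600, 10, 240, 150)` at roughly 20.5. These are theoretical ceilings; the receiver also needs time to decode and draw.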

With the goal of displaying DOOM on the smart watch in mind, I decided to divide the work into three phases:

  • Scale the image down to the resolution of the watch
  • Implement a proof of concept serial display on the T-Watch
  • Optimize until it works better

Scaling Doom

I started with finding a suitable Doom source port. I discovered Chocolate Doom, which accurately reproduces the game as it was played in the 1990s.

Building it on Ubuntu is straightforward, so I dived straight into the code and started poking around.

Wrong turn #1 - modifying Doom internal resolution

The first thing I did was try to find how a resolution gets set.

At first I tried to change the internal resolution of the renderer - there was a #define for SCREEN_WIDTH 320 and SCREEN_HEIGHT 200 in i_video.h.

Changing this to 120x75 made the game crash. I attached a debugger to see exactly where, and found that the game was attempting to render some things at locations beyond 120x75. Scaling all coordinates by 0.5x helped get the menu to display, but it crashed again as soon as I started the game.

Scaling and dithering

After studying the Chocolate Doom source port some more I realized it has a series of buffers and textures that represent stages of the rendering pipeline.

The engine itself draws assets with a palette to make better use of its 256 colors. The stages are:

  • the game is drawn into an 8-bit paletted 320x200 screen buffer
  • this is blitted into a 32-bit ARGB 320x200 buffer
  • which is rendered into an upscaled texture using nearest-neighbor scaling (e.g. 640x400)
  • which is finally rendered to the screen (e.g. 800x600) using linear scaling
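Conceptually, the step from the paletted buffer to the ARGB buffer is a palette lookup per pixel. The loop below is a minimal sketch of that idea, not the actual Chocolate Doom code; the function name and the packed `0xAARRGGBB` palette layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand 8-bit palette indices into 32-bit ARGB pixels.
   palette holds 256 entries, assumed to be packed as 0xAARRGGBB. */
static void expand_paletted(const uint8_t *src, uint32_t *dst, size_t count,
                            const uint32_t palette[256])
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = palette[src[i]];
}
```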

It was clear the dithering and output to the serial port should happen somewhere within this pipeline.

Wrong turn #2

Looking at screenshots from the various buffers, I realized I had been dithering the wrong one: I dithered the 320x200 display buffer and then let SDL downscale it to 240x150, which caused artifacts and didn’t look as good.

The correct way was to scale the display buffer to 240x150, then dither, then send this over the wire.
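The order matters because scaling a 1-bit image blends or drops individual dither pixels. To illustrate the "scale first" half, here is a nearest-neighbor downscale on a plain grayscale buffer. It is a sketch under assumptions: the function name and the flat 8-bit buffer layout are hypothetical, and it is not the scaling code used in the fork.

```c
#include <stdint.h>

/* Nearest-neighbor scale of an 8-bit grayscale image
   from sw x sh pixels to dw x dh pixels. */
static void scale_nearest(const uint8_t *src, int sw, int sh,
                          uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; ++y) {
        int sy = y * sh / dh;              /* source row for this output row */
        for (int x = 0; x < dw; ++x) {
            int sx = x * sw / dw;          /* source column for this output column */
            dst[y * dw + x] = src[sy * sw + sx];
        }
    }
}
```

The dithering pass then runs on the already scaled buffer, so every dither pixel maps to exactly one pixel sent over the wire.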

Dithering

After a little research on 1-bit graphics I realized there are two commonly used dithering algorithms - I implemented both:

Ordered dithering

Ordered dithering is a simple algorithm that produces a characteristic crosshatch pattern.

It works by tiling a small threshold map across the image and comparing each pixel's intensity against the map entry at its position: the pixel turns white if it is brighter than the threshold and black otherwise.

ordered dithering

Ordered dithering patterns
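As a rough sketch of the idea, assuming a 4x4 Bayer matrix as the threshold map (the map size and the function name are my choice for illustration, not necessarily what the fork uses):

```c
#include <stdint.h>

/* 4x4 Bayer threshold map, values 0..15. */
static const uint8_t BAYER4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Returns 1 (white) or 0 (black) for a grayscale value 0..255 at (x, y). */
static int ordered_dither_pixel(uint8_t gray, int x, int y)
{
    /* Scale the map entry to the 0..255 range, centered within its step. */
    int threshold = BAYER4[y & 3][x & 3] * 16 + 8;
    return gray > threshold;
}
```

Because the threshold depends only on the pixel position, the same input always yields the same output, which is what keeps the picture stable from frame to frame.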

Floyd-Steinberg dithering

Floyd-Steinberg dithering operates using error diffusion and is characterized by its grainy or speckled appearance.

Because Floyd-Steinberg works by pushing the quantization error from a pixel to its neighboring pixels, a slight change in the scene can propagate over the entire screen. I found that aesthetically less pleasing than the more predictable ordered dithering, which was simply less jumpy.

doom dithered

Floyd-Steinberg dithering above, ordered dithering below
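For reference, a compact sketch of Floyd-Steinberg error diffusion with the standard 7/16, 3/16, 5/16, 1/16 weights. The function name and the `int16_t` working buffer are assumptions for illustration and do not come from the fork.

```c
#include <stdint.h>

/* In-place Floyd-Steinberg dithering of a w x h grayscale image.
   img holds intensities 0..255 as int16_t so that diffused error can
   temporarily push values outside that range. Output pixels are 0 or 255. */
static void floyd_steinberg(int16_t *img, int w, int h)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int old_val = img[y * w + x];
            int new_val = old_val < 128 ? 0 : 255;
            int err = old_val - new_val;
            img[y * w + x] = (int16_t)new_val;

            /* Push the quantization error to the unvisited neighbors. */
            if (x + 1 < w)
                img[y * w + x + 1] += (int16_t)(err * 7 / 16);
            if (y + 1 < h) {
                if (x > 0)
                    img[(y + 1) * w + x - 1] += (int16_t)(err * 3 / 16);
                img[(y + 1) * w + x] += (int16_t)(err * 5 / 16);
                if (x + 1 < w)
                    img[(y + 1) * w + x + 1] += (int16_t)(err * 1 / 16);
            }
        }
    }
}
```

Every decision feeds into the pixels to the right and below, which is why a small change near the top-left corner can ripple through the rest of the frame.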

Grayscale and gamma

In both dithering algorithms we convert the color to grayscale with the following algorithm:

// Get the r, g, b color values of the pixel at (x, y)
uint8_t r, g, b;
uint32_t pix = getpixel(s, x, y);
SDL_GetRGB(pix, s->format, &r, &g, &b);

// Convert the pixel value to grayscale / intensity (weighted sum of the channels)
uint8_t grayscale = (uint8_t)(.299f * r + .587f * g + .114f * b);

Doom is quite dark, so it’s hard to see anything at the default gamma setting. Fortunately the engine also features gamma correction, which can be changed with the F11 key in game.

gamma

Gamma settings 1, 3, 5 going from unusable in monochrome to pretty bright

Watch as an external display

To make data transfer a bit less intensive I decided to transmit in 120x120 resolution and then double the pixels to 240x240 on the device.

Initially I programmed my watch to be a simple single-core serial display. It reads one row of pixels from the serial port as 120 bits that represent black or white pixels, then expands each bit into a 16-bit pixel value.

Transfer rate

When transmitting 120x120 pixels at around 16 FPS we produce around 15 (bytes per row) * 10 (start + data + stop bits) * 120 (rows) * 16 (fps) = 288 kbit of data per second.

The algorithm for receiving and drawing one row:

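char rxBuffer[RECEIVE_LINE_BYTES];
uint32_t lineBuffer[DISPLAY_LINE_LENGTH];
...
// read one row of packed 1-bit pixels
Serial.readBytes(rxBuffer, RECEIVE_LINE_BYTES);
// target a window two rows tall, so the row can be drawn twice
tft->setAddrWindow(0, y, DISPLAY_LINE_LENGTH, 2);
// expand the bits in rxBuffer into pixels in lineBuffer
convertPixelsBetweenBuffers();
tft->startWrite();
// push the same row twice (vertical pixel doubling)
tft->pushPixels(lineBuffer, DISPLAY_LINE_LENGTH);
tft->pushPixels(lineBuffer, DISPLAY_LINE_LENGTH);
tft->endWrite();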

The pixel conversion does the horizontal pixel doubling: it converts the input (a 120-bit field) to the display line (240 16-bit pixels, stored in an array of 120 32-bit values), iterating bit by bit. We can display data on the TTGO T-Watch using the TFT_eSPI library. It includes a pushPixels function that expects a buffer of 16-bit pixel colors.
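A minimal sketch of such a conversion, assuming the bits arrive most significant bit first. The function name, the constants and the foreground/background parameters are hypothetical and only loosely mirror the receiver code.

```c
#include <stdint.h>

#define INPUT_PIXELS 120                 /* bits per received row */
#define INPUT_BYTES  (INPUT_PIXELS / 8)  /* 15 bytes per received row */

/* Expand one packed 1-bit row into horizontally doubled 16-bit pixels.
   rx:   INPUT_BYTES bytes, most significant bit first
   line: INPUT_PIXELS entries; each uint32_t carries the same 16-bit color
         twice, i.e. two adjacent display pixels. */
static void expand_row(const uint8_t *rx, uint32_t *line,
                       uint16_t fg, uint16_t bg)
{
    const uint32_t fg2 = ((uint32_t)fg << 16) | fg;
    const uint32_t bg2 = ((uint32_t)bg << 16) | bg;

    for (int i = 0; i < INPUT_PIXELS; ++i) {
        int set = (rx[i >> 3] >> (7 - (i & 7))) & 1;
        line[i] = set ? fg2 : bg2;
    }
}
```

Writing one 32-bit value per input bit gives the horizontal doubling for free, since both halves of the word hold the same color.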

I wrote a supporting tool in Python to feed the watch some pixels. The bits are sent in a loop using the pyserial library.

Connecting the Watch and Doom

Python is not much help inside the Chocolate Doom port, so I had to write the serial frame transmitter in C. I adapted the first code snippet I found on Stack Overflow; credit goes to sawdust.

Still working in the 120x120 pixel resolution, this is how the intermediate result looked:

first attempt

It’s incredibly low-res. Also, because I was lazy, the last row of pixels of the status bar is leaking all the way down the screen as I just repeated rows 75 up to 120 in the output stream :-)

I eventually bumped the resolution to 240x150.

There’s nothing special about the serial output module. It has a function that takes an SDL_Surface, assumes its dimensions are 240x150, loops over the RGB values and, for every row of 240 pixels (bits), spits out 30 bytes.

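uint8_t r, g, b;
uint8_t bit = 7;
for(y = 0; y < SERIAL_BUFFER_HEIGHT; ++y) {
    // start each row with all bits cleared (black)
    memset(buf, 0, SERIAL_BUFFER_BYTES);
    for(x = 0; x < SERIAL_BUFFER_WIDTH; ++x) {
        pix = getpixel(s, x, y);
        SDL_GetRGB(pix, s->format, &r, &g, &b);
        // assumes the surface is already dithered to pure black and white
        if(g == 255)
            buf[x >> 3] |= (1 << bit);

        // bits are packed most significant first
        if(bit-- == 0)
            bit = 7;
    }
    // ... buf now holds one row and is written to the serial port
}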

Getting it faster

Simply increasing the baud rate to 500,000 caused some screen tearing; it seemed that the serial receiver code on the watch was having a hard time keeping up. After I optimized the bit conversion loop it could handle a stable 500,000 baud, giving a framerate high enough to consider increasing the resolution to 240x240.

Wrong turn #3 - going multi-core on the ESP32

I thought I could leverage the second core on the ESP32 and have the watch run two tasks - one that processes the serial data and another that decodes it and sends to the display.

I implemented a proof of concept serial display that uses the FreeRTOS task notification API to communicate between the tasks.

It used two buffers: one received the serial data, which then got copied to the buffer of the display task. The notifications were supposed to let the tasks know when they could touch the shared buffer.

The source lives in the multithreaded branch - but it didn’t really work faster, which led me to the next attempt:

Using DMA for the speed

What helped in the end was a Direct Memory Access (DMA) transfer using `tft->pushPixelsDMA(lineBuffer, DISPLAY_LINE_LENGTH);`. It’s basically a “fire and forget” data transfer - the DMA controller moves the data from RAM to the display, which leaves the microcontroller free to execute other code instead of operating the SPI bus.

Now I could increase the baudrate to 921600 and practically double the framerate.

Vertical Synchronization?

If we just dump the data to the screen without some kind of synchronization or alignment, the device doesn’t know where the boundary between frames lies.

It also means we would need to be lucky to start the transmission in sync with the watch displaying the first row of the frame data.

To fix this, I added a simple VSYNC message that the watch sends to the PC over the serial port when it starts drawing the first row. Upon receiving VSYNC, the PC should abandon the current frame and start sending another frame from the beginning. I’ve added a handler for this to the Python support tools, but decided not to do the same for Doom, as it was easier to just reset the line currently being drawn if no data has come across the serial port for a while.
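The timeout-based reset can be sketched as a tiny state machine. Everything here is an assumption made for illustration: the struct, the function name, the 150-row frame and the 100 ms idle threshold are hypothetical, and the caller supplies the current time in milliseconds.

```c
#include <stdint.h>

#define FRAME_ROWS      150   /* rows per frame, as in the 240x150 setup */
#define RESYNC_IDLE_MS  100   /* assumed idle gap that marks a frame boundary */

typedef struct {
    int      row;          /* next row to draw */
    uint32_t last_rx_ms;   /* timestamp of the last received row */
} rx_state;

/* Call when a complete row has arrived at time now_ms.
   Returns the display row it should be drawn on. */
static int next_row(rx_state *st, uint32_t now_ms)
{
    /* A long silence means the sender stopped mid-frame or restarted,
       so start again from the top of the screen. */
    if (now_ms - st->last_rx_ms > RESYNC_IDLE_MS)
        st->row = 0;

    st->last_rx_ms = now_ms;

    int row = st->row;
    st->row = (st->row + 1) % FRAME_ROWS;
    return row;
}
```

This only resynchronizes when the sender actually pauses between frames or restarts; a stream that never pauses would still need something like the VSYNC message.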

Finishing touches

A series of color schemes livens up the 1-bit color depth - just black and white is kind of boring. The implementation on the T-Watch side is straightforward: it reacts to a touch on the touchscreen, detected with digitalRead(...).

colors

Various color schemes, photos of the watch display

Some gifs of Doom in action

gif-ordered

Ordered dithering, black & white, Doom 1

gif-fs

Floyd-Steinberg dithering, Doom 2

We also get Heretic and Hexen!

hexen

Hexen

Appendix: Potential improvements

Frame compression

As an experiment, I’ve added zlib compression to the Doom engine to compress the frames. An uncompressed 240x150 frame fits in 4500 bytes; with the fastest zlib compression setting it usually shrank to 2800 bytes, which is a saving of around 37%. If I traded some of the CPU time on the ESP32 (currently spent on receiving data from the serial port) for decompression, I could potentially increase the frame rate or send some other data along with the video.

Sound

There’s also a possibility of transmitting sound data, in theory. Chocolate Doom supports PC speaker output, which works by playing back tones (see source or doom wiki page). To implement this over serial, one would need to mix the tones in with the frame data and implement a player thread that plays back the tones over the I2S interface to the watch speaker.

The simplest way to make this work would probably be something inspired by the Windows port behavior that either produces a beep for a specified duration or stays silent.

Running Doom on the watch directly

There’s also a port of DOOM by unlimitedbacon that actually runs on the watch itself: https://github.com/unlimitedbacon/TTGO-DOOM

Source and build instructions

https://github.com/jborza/chocolate-doom -> My Chocolate Doom fork with the dithering and serial output. For serial port configuration see src/i_serial.h, for video tweaks see the definitions on top of src/i_video.c.

Build instructions for Chocolate Doom on Debian/Ubuntu.

https://github.com/jborza/watch-doom-receiver -> The serial display tool for the watch. Required libraries: ESP32 support, TTGO T-Watch library

Once Doom is built, the watch software is up and running, and the PC and the watch are connected with a USB cable, run

chocolate-doom -iwad doom2.wad -width 960 -height 600, keep looking at the watch and play!