16 Oct 2020

Adventures in hardware, part 6 - 128x64 display

Displaying things on a 128x64 display from an FPGA

One of my FPGA boards came with a connector for a parallel-interface 12864 display, so I ordered one online, soldered 20 pins, used sample code provided by the vendor and it lit up.

However, I was not really happy with using the display just for 4 lines of text, as I like pictures, and end up with something like this.

Text mode

The text mode on the 128x64 display was easy, as I already had sample Verilog code to work with.

How does it work?

We display text by

turning on the display
setting display position
writing ASCII bytes into data display ram

A naive approach

This can be done in a simple way - describing a state machine with a state for every character and iterating through the states until it’s all written out.

    ...
    set6:   begin  rs<=0; dat<=8'h98; next<=dat23; 

    dat23:   begin  rs<=1; dat<="F"; next<=dat24; end 
    dat24:   begin  rs<=1; dat<="P"; next<=dat25; end 
    dat25:   begin  rs<=1; dat<="G"; next<=dat26; end 
    dat26:   begin  rs<=1; dat<="A"; next<=nul;   end

See the full module: https://gist.github.com/jborza/b6ed20ad809ddac717295da006ad68ba

Generating a ROM

It would be much easier to just load the data we want to display from a file into some kind of memory structure.

Quartus can load a memory from a text file with the following Verilog construct:

 reg [7:0] mem [0:63];
 ...
 initial begin;
	$readmemb("rom.txt", mem);
 end

The expected format contains literal ones and zeroes to be loaded into the memory (8-bits as we load it into an “array” of 8-bit values):

Then we can define a Verilog task such as this to prepare the data output for the display and advance the x offset.

 reg [7:0] mem [0:63];
 reg [3:0] mem_index;
 reg [3:0] x;

 task write_characters_row;
	input [2:0] y;
	input [5:0] next_state;

	begin
		rs <= 1;
		dat <= mem[y*16 + x];
		x <= x + 1;
		if(x == 15) begin
			next <= next_state;
		end
	end
 endtask

And then use it in in a simplified state machine (command task sends a specific command to the display controller)

...
always @(posedge clk_display)
...
set3: begin command(SET_8BIT_BASIC_INSTR, row0); x <= 0; end
row0: begin write_characters_row(0, move_to_row1); end
move_to_row1: begin command(line1, row1); x <= 0; end 
row1: begin	write_characters_row(1, move_to_row2); end
...

Which is a bit nicer than explicit

Source code for this ROM-backed text mode module: https://github.com/jborza/altera-12864-demo/blob/main/LCD12864.v

Graphic mode

Let’s do graphic mode now!

As the control chip on my display is ST7920, so we need to take a look at its datasheet to see how do we initialize the graphic mode and send pixel data to the screen.

Initializing the display

The datasheet specifies an order of instructions, which I got wrong on the first try, as some of the instructions can only change one setting bit at a time. ( In same instruction cannot alter DL, RE and G at once. Make sure that change DL or G first and then RE. (sic)).

It also helped to look at some C code in forums online - I can still read C faster than VHDL or Verilog.

As the ST7920 supports basic and extended instruction modes, it turned out one I had to execute commands in both modes.

Verilog tasks to the rescue

Defining Verilog tasks helped to make the state machine more readable as well.

 task command;
	input [7:0] data;
	input [5:0] next_state;
	
	begin
		rs <= 0;
		dat <= data;
		next <= next_state;
	end
 endtask
 ..
  set0: begin command(SET_MODE_8BIT, set1); mem_index <= 0; y <= 0; x<=0; end 
  set1: begin command(DISPLAY_ON_CURSOR_OFF_BLINK_OFF, set2); end // display on       
  set2: begin command(SET_MODE_GRAPHIC, set3); end // extended instruction set
  set3: begin command(SET_MODE_GRAPHIC, data0); end //graphic mode on

Drawing pixels

The datasheet describes the algorithm for displaying graphics on the display as:

set the vertical address
set the horizontal address
write two bytes into the GDRAM (with two consecutive writes)

Note that the above which takes 4 cycles for every 16 pixels. However, as the data sheet also says that writing into the RAM advances the address counter horizontally, we only need to update the address at the end every row to move the counter down vertically.

That means we need to convert the 128x64 monochrome picture into bits, load these bits on into some kind of ROM in the design and iterate over the ROM.

I wrote a ROM generator in Python as well, but the first version was incorrect because I didn’t read the display addressing part of the datasheet correctly.

Figuring out the display addressing mode

I realized after a couple of hours that the address actually works as if it were a 256x32 display, sliced into two 128x32 slices, which are placed on top of each other to form a 128x64 screen with a weird addressing. That means the ROM generation code just needed to be reworked.

Generating picture data ROM

Converting the picture to the RAM with a Python code

from PIL import Image
# assuming 1bpp 64x128 png
im = Image.open('lenna.png')

def print_row(y):
    #get 8 bits at a time
    for x in range(0,128,8):
        byte = 0
        for x_offset in range(0,8):
            if(x_offset > 0):
                byte = byte << 1
            px = im.getpixel((x+x_offset,y))
            byte = byte | px
        print(f'{byte:08b}')

for y in range(0,32):
    #interlace rows Y and Y+32
    print_row(y)
    print_row(y+32)

All of it together

In the spirit of computer science camaraderie, I used the standard Lenna test image . By cropping it to 128x64 pixels, applying dithering it was good to get encoded to a ROM:

original, dithered ROM source, photo of the display

And a GIF of the rendering on the board (apologies for the noise, the display memory starts up uninitialized):

Making the display faster

As you can see from the gif, it’s really slow - takes around 6 seconds for the entire screen.

The datasheet specifies how long an operation is supposed to take:

Instruction	Exec time
Display Clear	1.6 ms
Function Set	72 µs
Display control	72 µs
Write to (GD)RAM	72 µs
Set GDRAM address	72 µs

A back of the envelope calculation says that the filling the entire display should take:

  64 (rows) * ((set vertical and horizontal address) + 128 px / 8 px/write * (write to RAM)
= 64 * (2 * 72 us + 16 * 72 us)
= 64 * 1296 us
= 82.944 ms

As seen on the gif, it actually took much longer - it made more sense once I saw a real timing from an oscilloscope.

The “en” pin is triggered with a frequency 392 Hz (cycle of 2.55 ms), which made sense after better looking at the code I reused that used an overflow of a 16-bit counter, so 50 MHz / 65536 ~= 762 Hz, which divided by two is 381 Hz - roughly similar to the measurement.

As the datasheet specified execution time of 72 us, I could go much faster and ended up with 11-bit counter, yielding 12.2 KHz or 81.9 µs per operation. With 1152 ticks per screen refresh we can redraw the screen at around 10 frames per second, which is nice.

What did I learn in Verilog

Tasks are useful for repeating code, they are analogous to void functions in programming. They can interact with “global” wires and registers and accept parameters as well.

Parameters can be used similarly to constants in programming languages instead of magic numbers. (localparam in Verilog 2001)

localparam SET_MODE_8BIT = 8'b00110000;

Verilog automatically extends signal width, so one of the bugs down the way was missing a zero. VHDL is strict with types and doesn’t let you do this.

parameter y_initial = 8'b1000000;
//was meant to be     8'b10000000, 
//got converted into  8'b01000000;

We can load the initial state of memory from a file in Quartus:

initial begin;
	$readmemb("rom.txt", mem);
end

What’s next

Of course this is not a very useful real world example - you would typically want to display changing data on the display, so it would need to be backed by memory

I’d like to modify this into a display controller component that displays could be reused in a project

The code

Hosted with ❤️ on GitHub: https://github.com/jborza/lcd-12864-graphic