Juraj's Blog

30 May 2021

Breaking and fixing the RISC-V Store-Conditional instructions

Trying to debug the Linux boot process on the emuriscv RISC-V emulator has lead me to discover another fun bug, one that appears when you don’t read the specification as thoroughly as you should.

The bug manifested itself by an endless loop that began when the code was trying to return from the very first user-mode system call.

Whenever a program started by Linux kernel invokes a system call, the execution context (registers) is saved (see _save_context ), so it can be returned back to the state after the system call is finished.

When resuming from this system call, the kernel (in version 5.10), does some cleanup in a symbol called resume_userspace , that saves the kernel state (doing the system call in kernel mode), then proceeds to restore the registers from the stack, so they have the same state as was previously saved when the syscall was called - this is a subroutine called restore_all (below).

resume_userspace:
	/* Interrupts must be disabled here so flags are checked atomically */
	lw	s0,0(tp) // REG_L s0, TASK_TI_FLAGS(tp) 
	andi s1, s0, _TIF_WORK_MASK
	bnez s1, work_pending

	/* Save unwound kernel stack pointer in thread_info */
	addi	s0,sp,144 //addi s0, sp, PT_SIZE_ON_STACK
	sw	s0,8(tp)	//REG_S s0, TASK_TI_KERNEL_SP(tp)

	/* Save TP into the scratch register , so we can find the kernel data */
	csrw CSR_SCRATCH, tp

restore_all:
	lw	a0,128(sp)	//REG_L a0, PT_STATUS(sp)
	/* ... here we clear any existing reservation  */
	lw	a2,0(sp)
	sc.w	zero,a2,(sp)

	csrw sstatus, a0
	csrw sepc, a2

	lw	ra,4(sp)	//REG_L x1,  PT_RA(sp)
	lw	gp,12(sp)	//REG_L x3,  PT_GP(sp) 
	lw	tp,16(sp)	//REG_L x4,  PT_TP(sp) 
	lw	t0,20(sp)	//REG_L x5,  PT_T0(sp) 
	...
	lw	t5,120(sp)  //REG_L x30, PT_T5(sp)
	lw	t6,124(sp)  //REG_L x31, PT_T6(sp)

	lw	sp,8(sp)    //REG_L x2,  PT_SP(sp)

	sret

What happened (incorrectly)

As we encountered the sc instruction, the condition was not applied, the a2 register was written to against the spec.

sc.w	zero,a2,(sp) //rd, rs2, rs1
csrw sstatus, a0
csrw sepc, a2
...
sret

The subsequent csrw sepc, a2 then stored this invalid value in sepc. As we also know from the ecalls/syscalls post , kernel returns from the system call via the sret instruction to the address stored in the sepc register.

What should have happened

If the code was to execute correctly, sepc would have pointed to the address just after the original ecall instruction as set here in handle_syscall .

A comment in the kernel says something relevant:

The current load reservation is effectively part of the processor’s state, in the sense that load reservations cannot be shared between different hart contexts. We can’t actually save and restore a load reservation, so instead here we clear any existing reservation – it’s always legal for implementations to clear load reservations at any point (as long as the forward progress guarantee is kept, but we’ll ignore that here).

Load-reserved / Store-conditional

The load-reserved/store-conditional (LR/SC) instructions are used in multithreading to achieve synchronization. They are used to implement an atomic read-modify-write operation.

LR loads a word from an address in the rs1, and registers a reservation on that memory address. SC writes a word in rs2 to the address in rs1, if a valid reservation still exists on that address, and stores a success code to rd.

What does the spec say?

The specification has a chapter 7.2 Load-Reserved/Store-Conditional Instructions, that deal with complex atomic operations on a single memory word.

Note that the spec also says:

We restricted the length of LR/SC sequences to fit within 64 contiguous instruction bytes in the base ISA to avoid undue restrictions on instruction cache and TLB size and associativity.

Some don’t seem to implement this constraint, for example in QEMU (trans_rva.c.inc ) doesn’t clear load_res, but Spike (riscv-isa-sim) does so at some intervals in sim.cc .

Fixing the implementation

My initial version didn’t include any of the conditional logic, and I was left with a pretty nondescript //TODO LR/SC reservation comment.

It was fixed by introducing a load_reservation state variable that is set by lr and checked (and cleared) by sc in this emuriscv commit .

There’s also a LRSC test case in the riscv-tests repository that goes through the individual test cases that’s also worth checking out.