Mini-kernel tutorial

This page contains several small exercises to help you take your first steps with a MIPS or RISC-V kernel running in MSIM.

We provide both instructions for running MSIM and the source code of a small kernel that you can play with.

Toolchain setup

We expect that you have a cross-compiler toolchain already installed so that you can try the examples yourself.

Please, refer to our instructions if you need help with installing a toolchain.

Once you have your toolchain ready, we can dive into kernel code for real :-).

First compilation

If you have never compiled an operating system kernel (or if you are new to C, GCC, or make), you may wish to start with compiling a smaller kernel first.

Please, clone the MSIM repository.

This tutorial contains examples for both MIPS and RISC-V mini-kernels (both in 32-bit variants). Because the architectures are rather similar (after all, the RISC-V designers admit they were inspired by MIPS), the text contains the following markers only where the behavior (or source code) differs significantly between the two architectures.

MIPS specific notes

MIPS examples are in the contrib/kernel-tutorial-mips32/ subdirectory.

RISC-V specific notes

RISC-V examples are in the contrib/kernel-tutorial-riscv32/ subdirectory.

We will start inside the first subdirectory. Please choose either the MIPS or the RISC-V architecture for this exercise (of course, intrepid developers can inspect and experiment with both at the same time).

Before we discuss the contents of the directory, we will build the kernel. All the examples use make as the build tool so simply type make to build it.

The make command launches the make tool, which reads dependency rules from a file named Makefile and uses them to figure out how to compile C sources into a binary executable.

In this case, make should run a sequence of commands to build the loader.bin executable from the loader.S source, and the kernel.bin executable from the head.S and main.c sources.

make will produce the following output (there might be some differences in the paths but otherwise the output should look the same on your machine).

  • MIPS
make -C kernel
make[1]: Entering directory './kernel'
/usr/bin/mipsel-unknown-linux-gnu-gcc -march=r4000 -mabi=32 -mgp32 -msoft-float -mlong32 -G 0 -mno-abicalls -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -c -o boot/loader.o boot/loader.S
/usr/bin/mipsel-unknown-linux-gnu-ld -G 0 -static -g -T kernel.lds -Map loader.map -o loader.raw boot/loader.o
/usr/bin/mipsel-unknown-linux-gnu-objcopy -O binary loader.raw loader.bin
/usr/bin/mipsel-unknown-linux-gnu-objdump -d loader.raw > loader.disasm
/usr/bin/mipsel-unknown-linux-gnu-gcc -O2 -march=r4000 -mabi=32 -mgp32 -msoft-float -mlong32 -G 0 -mno-abicalls -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11  -c -o src/main.o src/main.c
/usr/bin/mipsel-unknown-linux-gnu-gcc -march=r4000 -mabi=32 -mgp32 -msoft-float -mlong32 -G 0 -mno-abicalls -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -c -o src/head.o src/head.S
/usr/bin/mipsel-unknown-linux-gnu-ld -G 0 -static -g -T kernel.lds -Map kernel.map -o kernel.raw src/main.o src/head.o
/usr/bin/mipsel-unknown-linux-gnu-objcopy -O binary kernel.raw kernel.bin
/usr/bin/mipsel-unknown-linux-gnu-objdump -d kernel.raw > kernel.disasm
make[1]: Leaving directory './kernel'
  • RISC-V
make -C kernel
make[1]: Entering directory './kernel'
/usr/bin/riscv32-unknown-elf-gcc -msmall-data-limit=0 -mstrict-align -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -mno-riscv-attribute -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -march=rv32g -c -o boot/loader.o boot/loader.S
/usr/bin/riscv32-unknown-elf-ld -G 0 -static -g -T loader.lds -Map loader.map -o loader.raw boot/loader.o
/usr/bin/riscv32-unknown-elf-ld: warning: loader.raw has a LOAD segment with RWX permissions
/usr/bin/riscv32-unknown-elf-objcopy -O binary loader.raw loader.bin
/usr/bin/riscv32-unknown-elf-objdump -d loader.raw > loader.disasm
/usr/bin/riscv32-unknown-elf-gcc -O2 -msmall-data-limit=0 -mstrict-align -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -mno-riscv-attribute -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -march=rv32g  -c -o src/main.o src/main.c
/usr/bin/riscv32-unknown-elf-gcc -msmall-data-limit=0 -mstrict-align -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -mno-riscv-attribute -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -march=rv32g -c -o src/head.o src/head.S
/usr/bin/riscv32-unknown-elf-ld -G 0 -static -g -T kernel.lds -Map kernel.map -o kernel.raw src/main.o src/head.o
/usr/bin/riscv32-unknown-elf-ld: warning: kernel.raw has a LOAD segment with RWX permissions
/usr/bin/riscv32-unknown-elf-objcopy -O binary kernel.raw kernel.bin
/usr/bin/riscv32-unknown-elf-objdump -d kernel.raw > kernel.disasm
make[1]: Leaving directory './kernel'

Extra information (using make)

The advantage of using make over a shell script is that make rebuilds only the targets whose dependencies have changed since the last compilation, which saves build time, especially on larger projects (you can try that by running make again now).

In this example, the rules in the top-level Makefile just tell make to run make again, but this time using the Makefile in the kernel subdirectory; more details of the compilation will come later on.

Note that there is an msim.conf file in our directory. It contains directives for the MSIM simulator that configure a simple computer equipped with one processor, two blocks of memory, and a console-like device for textual output (we will dissect the configuration in the next exercise).

To run the compiled kernel code, run msim without any arguments. MSIM will load the binary images (loader.bin and kernel.bin) into the two memory blocks and reset the simulated CPU so that it starts executing code at factory-defined addresses. You should see the following output:

  • MIPS
Hello, World.
<msim> Alert: XHLT: Machine halt

Cycles: 41
  • RISC-V
Hello, World.
<msim> Alert: EHALT: Machine halt

Cycles: 42

The “Hello, World.” message was printed from C code compiled into machine code running on the processor of your choosing. Getting the target processor to execute your (compiled) C code is usually one of the major technical obstacles when starting OS development from scratch, which is why we have taken care of this step for now.

The last line (as well as the line prefixed with <msim>) is output from the simulator, telling us how many virtual cycles the CPU has executed, which equals the number of executed instructions. We can safely ignore those lines for now.

Important

If the compilation failed for you, or if the execution printed something completely different, feel free to contact us: please open an issue here and describe what you have tried and what failed, and do not forget to describe your environment.

If you are an NSWI200 student, please prefer the standard means of communicating with your teachers over GitHub issues. Thank you.

Configuring the virtual machine

We will now take a closer look at the msim.conf file, which contains the configuration of the simulated computer that runs your kernel.

Using a simulated computer instead of a real one makes it much easier to develop a small kernel (for one thing, installation does not require sacrificing your own computer, also, the simulation is completely deterministic and therefore bugs that appear once keep appearing until you fix them). However, rest assured the simulated environment is close enough to the real thing.

Reading msim.conf from top to bottom and ignoring the comment lines starting with the # character, the first configuration line tells MSIM to add one processor and name it cpu0:

  • MIPS
add dr4kcpu cpu0
  • RISC-V
add drvcpu cpu0

MIPS specific notes

The MIPS R4000 processor device is named dr4kcpu.

RISC-V specific notes

The RISC-V RV32IMA processor device is named drvcpu.

The next two groups of directives add two blocks of physical memory, one for the bootloader and one for the main memory, both initialized from files on disk.

The main memory block (called mainmem) is a read-write memory with a size of 1 MiB. The memory block is initialized with the contents of the kernel/kernel.bin file before the simulated computer starts running:

  • MIPS
add rwm mainmem 0
mainmem generic 1M
mainmem load "kernel/kernel.bin"
  • RISC-V
add rwm mainmem 0x80000000
mainmem generic 1M
mainmem load "kernel/kernel.bin"

MIPS specific notes

The mainmem memory segment starts at physical address 0. The processor then maps it to a virtual address 0x80000000 (so printing a pointer address in your code will print addresses with the highest bit set).

RISC-V specific notes

The mainmem memory segment starts at physical address 0x80000000. The processor uses identity mapping when booting, hence we do not need to explicitly distinguish virtual and physical addresses (at least, for now).

The bootloader memory block (called loadermem) is a read-only memory initialized with the contents of the kernel/loader.bin file:

  • MIPS
add rom loadermem 0x1FC00000
loadermem generic 4K
loadermem load "kernel/loader.bin"
  • RISC-V
add rom loadermem 0xF0000000
loadermem generic 8K
loadermem load "kernel/loader.bin"

MIPS specific notes

The loadermem memory segment starts at physical address 0x1FC00000 and has a size of 4 KiB.

RISC-V specific notes

The loadermem memory segment starts at physical address 0xF0000000 and has a size of 8 KiB.

Finally, we add a simple output device (called printer), which will allow the code running in the simulator to display text on the host computer console. This is similar to a serial console found on real hardware, except the printer device is much simpler:

  • MIPS
add dprinter printer 0x10000000
  • RISC-V
add dprinter printer 0x90000000

MIPS specific notes

This device resides at physical address 0x10000000.

RISC-V specific notes

This device resides at physical address 0x90000000.

This is actually enough for a simple machine and more than enough for our purposes :-).
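To make the layout easier to picture, the MIPS variant of this configuration can be summarized as a small physical memory map. The following C sketch is only an illustration that runs on a host computer; it is not part of the tutorial kernel. The names device_t and device_at are made up for this example, and the printer is treated as a single address because the size of its register is not stated here.

```c
#include <stdint.h>

/* Hypothetical names used only in this sketch. */
typedef enum { DEV_NONE, DEV_MAINMEM, DEV_LOADERMEM, DEV_PRINTER } device_t;

#define MAINMEM_BASE   0x00000000u
#define MAINMEM_SIZE   (1024u * 1024u)  /* mainmem generic 1M */
#define LOADERMEM_BASE 0x1FC00000u
#define LOADERMEM_SIZE (4u * 1024u)     /* loadermem generic 4K */
#define PRINTER_BASE   0x10000000u      /* assumption: one register */

/* Return the device that the MIPS msim.conf places at a physical address. */
static device_t device_at(uint32_t phys)
{
    if (phys < MAINMEM_BASE + MAINMEM_SIZE) {
        return DEV_MAINMEM;
    }
    if (phys >= LOADERMEM_BASE && phys - LOADERMEM_BASE < LOADERMEM_SIZE) {
        return DEV_LOADERMEM;
    }
    if (phys == PRINTER_BASE) {
        return DEV_PRINTER;
    }
    return DEV_NONE;
}
```

For the RISC-V variant, only the constants would change (mainmem at 0x80000000, loadermem at 0xF0000000 with 8 KiB, printer at 0x90000000).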

Disassembling the kernel

With the simulator configured to provide us with a simple computer, it is now time to look at the files in the kernel directory. Again, there is a Makefile which controls the compilation, and a linker script which controls the layout of the binary image produced by the linker.

Extra information (linker scripts)

We will not dissect the linker script further, because we will not need to modify it in this tutorial.

As a matter of fact, linker scripts are rarely modified and under normal circumstances come with your linker. Because we have a non-standard kernel and a simplified emulator, we supply our own.

The boot subdirectory contains loader.S, an assembly source file which contains the computer bootloader code. On a real computer, the bootloader is (ultimately) responsible for loading the operating system into memory. In our case, the MSIM simulator does this for us (see the directives telling MSIM to load kernel/kernel.bin into mainmem in msim.conf), so we just need a few instructions to make the processor jump into the kernel code after reset.

The loader code needs to be present at a specific address (it is hard-wired into the CPU, see msim.conf) which the CPU starts executing instructions from after a power up/reset. Other than that, the loader code does not really do anything – it just jumps to another fixed address, where our main code will reside.

MIPS specific notes

The loader jumps to address 0x80000400.

The reason why we keep the rest of the kernel code separate from the loader is quite simple – the entry point of the loader is quite far from the entry points of the exception handlers, which are also hardwired, and which the kernel must implement. We simply want to keep the rest of the kernel code in one piece, and that means next to the exception handlers.

RISC-V specific notes

The loader jumps to address 0x80001000.

The loader.S file is compiled and linked into loader.bin. This file contains only machine instructions (no symbol information, no debugging information, no relocation information): it is code in its rawest form, a form that the CPU actually sees.

Look into loader.bin and loader.disasm. The second one is a disassembly of the binary format back to assembler.

cat loader.disasm
hexdump -C loader.bin

Since loader.bin and loader.disasm are produced from loader.S, they should contain the same instructions as in the original loader.S. Do take a look.

Self-test quiz

A question for you: why are the instructions in loader.disasm different from loader.S?

Hint

Think about the limited instruction repertoire of the CPU.

Solution MIPS

The difference in code concerns the loading of the 32-bit constant (jump target address). The CPU does not have an instruction that can load an entire 32-bit constant in one go (because the instruction itself must fit into 32 bits), hence two instructions are used. The assembly code uses a shorthand notation so that the programmer does not have to perform this trivial conversion.

Solution RISC-V

The difference in code concerns the loading of the 32-bit constant (jump target address). The CPU does not have an instruction that can load an entire 32-bit constant in one go (because the instruction itself must fit into 32 bits), hence two instructions are needed in general. (For example, li t0, 0x80000001 would be transformed into lui t0, 0x80000 and addi t0, t0, 1 - try it yourself!) Our code manages with only one, because the lowest 12 bits (3 hex digits) of our target address are all 0. The lui t0, 0x80001 instruction loads the constant 0x80001 into the highest 20 bits of t0, meaning it sets it to 0x80001000, which is exactly our desired address. The assembly code uses a shorthand notation so that the programmer does not have to perform this trivial conversion.
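The conversion that the assembler performs can be sketched in C. This is only an illustration that runs on a host computer: the type imm_pair_t and the function names are hypothetical, and for MIPS we assume the pair lui followed by ori, which is a common expansion but not necessarily the one you will find in your loader.disasm.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch. */
typedef struct {
    uint32_t upper;  /* immediate for lui */
    int32_t lower;   /* immediate for the second instruction */
} imm_pair_t;

/* MIPS (assumed lui + ori): lui sets bits 31..16,
   ori fills in the zero-extended low 16 bits. */
static imm_pair_t split_mips(uint32_t value)
{
    imm_pair_t p;
    p.upper = value >> 16;
    p.lower = (int32_t)(value & 0xFFFFu);
    return p;
}

/* RISC-V (lui + addi): lui sets bits 31..12, addi adds a sign-extended
   12-bit value, so the upper part is rounded up when the low part
   would be negative. */
static imm_pair_t split_riscv(uint32_t value)
{
    imm_pair_t p;
    p.upper = ((value + 0x800u) >> 12) & 0xFFFFFu;
    p.lower = (int32_t)(value & 0xFFFu);
    if (p.lower >= 0x800) {
        p.lower -= 0x1000;
    }
    return p;
}
```

When lower comes out as 0, as it does for 0x80001000 on RISC-V, the second instruction can be omitted, which matches what we observed in the loader.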

From boot to C code

We will now look into the src directory, where the foundations of our kernel reside.

The head.S file contains a lot of assembly code, but do not be afraid ;-).

MIPS specific notes

Find the line containing start: (around line 120). Above this, we can see a special directive .org 0x400 that says that the following code will be placed at address 0x400 bytes away from the start of the code segment. The linker specifies that the code segment starts at 0x80000000, together this yields 0x80000400 - exactly the address our boot loader jumps to! Hence, after the boot loader is done, the execution will continue here.

We start by setting up a few registers (such as the stack pointer) and execute jal kernel_main. This will pass control from the assembly code to the kernel_main function, which is a standard C function that you can see if you open src/main.c.

RISC-V specific notes

Find the line containing start: (around line 90). Above this, we can see a special directive .org 0x1000 that says that the following code will be placed at address 0x1000 bytes away from the start of the code segment. The linker specifies that the code segment starts at 0x80000000, together this yields 0x80001000 - exactly the address our boot loader jumps to! Hence, after the boot loader is done, the execution will continue here.

We start by setting up a few registers (such as the stack pointer and the mepc CSR) and execute mret. This will pass control from the assembly code to the kernel_main function, which is a standard C function that you can see if you open src/main.c.

These few lines of assembler (loader.S and head.S) constitute the only assembly code needed to boot the processor and get into C.

Extra information (assembler and booting)

One cannot boot a CPU without at least a bit of assembler that jumps into C code. But the assembly code is usually straightforward and only sets up basic registers and the stack.

Feel free to return to this code later; understanding it completely is not required to continue with the tutorial. As long as you understand that we need special instructions to jump to C code, you will be fine.

kernel_main is where the fun starts

The last file we have not commented much on is src/main.c.

It contains the kernel_main() function, which is called shortly after boot. This is the function where the kernel would initialize itself or launch the first userspace process (e.g. init on Linux).

Right now it contains only a very short greeting.

Printing from the simulator is trivial, since we told MSIM that there should be a console printer device available at a particular address. MSIM monitors this address, and any write to it causes the written character to appear on the console.
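A minimal sketch of such printing code follows. It is not a copy of src/main.c, which may look different: the helper print_string and the macro PRINTER_ADDRESS are hypothetical names, and the device pointer is passed as a parameter only so that the sketch can also be tried on a host computer with an ordinary variable standing in for the device.

```c
/* Assumption: address of the printer as seen by the kernel code
   (the tutorial sources use 0x90000000). */
#define PRINTER_ADDRESS 0x90000000u

/* Writing a byte to the device register prints it. The volatile
   qualifier keeps the compiler from dropping the store. */
static void print_char(volatile char *printer, char c)
{
    *printer = c;
}

/* Print a zero-terminated string one character at a time. */
static void print_string(volatile char *printer, const char *text)
{
    while (*text != '\0') {
        print_char(printer, *text);
        text++;
    }
}
```

Inside the kernel, the call would look like print_string((volatile char *)PRINTER_ADDRESS, "Hello, World.\n"). On a host computer, that address is not mapped, so pass a pointer to a local volatile char instead.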

MIPS specific notes

A question for you: if you look up the console printer device address in the source code, you will see it is 0x90000000, but msim.conf says 0x10000000. Why?

Hint

Think about virtual and physical addresses.

Solution

The code uses virtual addresses, but the simulator configuration uses physical addresses (exactly what real hardware would see). In the kernel segment, virtual addresses are mapped to physical addresses simply by masking the highest bit - virtual address 0x80000000 therefore corresponds to physical address 0, and so on. The mapping is intentionally simple because the kernel must run even before more complex mapping structures, such as page tables, can be set up.
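The masking described above can be written down in a few lines of C. This is a host-side sketch with hypothetical function names; it assumes the address lies in the directly mapped kernel segment discussed here and does not cover any other segment.

```c
#include <stdint.h>

/* Translate a MIPS kernel-segment virtual address to a physical one
   by clearing bit 31. Assumption: vaddr is in the directly mapped
   kernel segment. */
static uint32_t kernel_virt_to_phys(uint32_t vaddr)
{
    return vaddr & 0x7FFFFFFFu;
}

/* The reverse direction sets bit 31 again. */
static uint32_t phys_to_kernel_virt(uint32_t paddr)
{
    return paddr | 0x80000000u;
}
```

With this, the printer address 0x90000000 used in the code translates to 0x10000000, the value found in msim.conf.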

An important note: you probably noticed that we print the characters one by one instead of using printf or puts. That is because we are in our own kernel and we do not have any of these functions. As a matter of fact, we will have only the functions that we implement ourselves.

Thus, there is no printf, no malloc and definitely no fopen (unless you implement them yourself).

The first modification of the kernel

Modify the kernel so that it prints the greeting with an exclamation mark instead of a plain period. After all, we can be proud of it ;-).

Before running msim again do not forget to recompile with make.

Solution

Just replace '.' with '!' in main.c :-).

Note that make should recompile only main.c into main.o and re-link the kernel.* files. Files related to the bootloader should remain without change.

Tracing the execution

Let’s see which instructions were actually executed by MSIM. This may come in handy in later debugging tasks.

We will run msim -t. This turns on a trace mode where MSIM prints every instruction as it is executed. (Unfortunately, there is just one console, so the MSIM output is interleaved with your OS output.)

Self-test quiz

Compare the trace with your *.disasm files. What is the difference?

Solution

The answer is obvious: *.disasm contains the code in its static form while the trace represents the true execution - jumps are taken, loop bodies are executed repeatedly etc.

Stepping through the execution

To run the kernel instruction by instruction interactively, launch MSIM with msim -i. This time, MSIM will wait for further commands, as indicated by the [msim] prompt.

Simply typing continue will resume standard execution, which will run our OS and eventually terminate MSIM.

Run MSIM again but instead of typing continue, we will just hit Enter.

An empty command in MSIM is equivalent to typing step and executes a single instruction. We should see how the greeting starts to appear next to the prompt as we continue pressing Enter.

We can also do step 10 to execute ten instructions at once.

Entering the debugger

Stepping through our kernel from the very first instruction is not so useful for debugging when the code we are interested in is executed long after boot. In that case, we can also enter the interactive mode programmatically, by asking for it from inside our (kernel) code.

That is something that is super-easy when running in a simulator such as MSIM but somewhat more difficult on real hardware. That is why simulators are so useful :-).

To enter the interactive mode, we will use a special assembly language instruction, which the real CPU does not recognize but MSIM does.

We will insert the following fragment at a location (in the C code) where we want to interrupt the execution.

  • MIPS
__asm__ volatile(".word 0x29\n");
  • RISC-V
__asm__ volatile("ebreak\n");

Let us try it: insert the break after printing Hello. If we execute msim, it will print Hello and enter interactive mode. We can again step through the execution or continue.
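If we want to stop only in certain situations, we can wrap the fragment in a small helper. The sketch below is only an illustration: the names enter_debugger and break_if are hypothetical, only the two instructions come from the text above, and the empty fallback branch exists solely so that the code also compiles on a host computer.

```c
/* Hypothetical helper wrapping the fragments shown above. */
static inline void enter_debugger(void)
{
#if defined(__mips__)
    __asm__ volatile(".word 0x29\n");
#elif defined(__riscv)
    __asm__ volatile("ebreak\n");
#else
    /* Not a simulated target: do nothing so the sketch compiles on a host. */
#endif
}

/* Enter the debugger only when the condition holds, for example
   after a given number of loop iterations. Returns the condition. */
static inline int break_if(int condition)
{
    if (condition) {
        enter_debugger();
    }
    return condition;
}
```

A call such as break_if(i == 5) inside a loop would then interrupt the execution only in the sixth iteration.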

Inspecting the registers

Let us start MSIM in interactive mode again and type set trace as the first command.

Then we will hit Enter several times. Each press executes one instruction, and MSIM prints the instruction being executed.

We can also inspect all registers at once. We will use the cpu0 rd command for a register dump of the cpu0 processor (the only processor that we added to our computer in MSIM).

This is an extremely useful command, as it lets us inspect the current state of the processor and the code it executes.

Self-test quiz

Which register would tell you what code is executed?

Solution

The pc register is the program counter telling the (virtual) address where the CPU decodes the next instruction.

Matching instructions back to source code

Start MSIM again in the interactive mode and step until it starts printing the greeting. Look at the register dump.

You will see something like this (note that we have dropped the 64-bit extension to make the dump a bit shorter):

  • MIPS
 0 00000000   at 00000000   v0 90000000   v1 00000000   a0 00000000
a1 00000048   a2 00000000   a3 00000000   t0 00000000   t1 00000000
t2 00000000   t3 00000000   t4 00000000   t5 00000000   t6 00000000
t7 00000000   s0 00000000   s1 00000000   s2 00000000   s3 00000000
s4 00000000   s5 00000000   s6 00000000   s7 00000000   t8 00000000
t9 00000000   k0 0000FF01   k1 00000000   gp 80000000   sp 80000400
fp 00000000   ra 80000420   pc 8000043C   lo 00000000   hi 00000000
  • RISC-V
   zero:      0    ra: 80001060    sp: 80001000    gp:        0
   tp:        0    t0:      800    t1:        0    t2:        0
s0/fp:        0    s1:        0    a0:        0    a1:        0
   a2:        0    a3:        0    a4:       48    a5: 90000000
   a6:        0    a7:        0    s2:        0    s3:        0
   s4:        0    s5:        0    s6:        0    s7:        0
   s8:        0    s9:        0   s10:        0   s11:        0
   t3:        0    t4:        0    t5:        0    t6:        0
   pc: 8000106c                               Privilege mode: S

MIPS specific notes

In our dump, pc contains 8000043C. If we open kernel.disasm and find this address there, we will see it is a few lines below 80000430 <kernel_main>, which indicates that it is an instruction inside kernel_main().

RISC-V specific notes

In our dump, pc contains 8000106c. If we open kernel.disasm and find this address there, we will see it is a few lines below 80001060 <kernel_main>, which indicates that it is an instruction inside kernel_main().

This is extremely important information because it tells us which function our OS was executing when it was interrupted.
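The lookup we just did by hand in kernel.disasm can be sketched in C as a search in a table of function start addresses. The type symbol_t and the function find_symbol are hypothetical names for this illustration; the two table entries use the MIPS addresses mentioned in this tutorial (start at 0x80000400 and kernel_main at 0x80000430), and your own build may place the functions elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry: start address and name of a function. */
typedef struct {
    uint32_t start;
    const char *name;
} symbol_t;

/* MIPS addresses taken from the examples in this tutorial,
   sorted by address. */
static const symbol_t mips_symbols[] = {
    { 0x80000400u, "start" },
    { 0x80000430u, "kernel_main" },
};

/* Return the index of the last symbol that starts at or below pc,
   or -1 when pc lies below the first symbol. */
static int find_symbol(const symbol_t *table, size_t count, uint32_t pc)
{
    int found = -1;
    for (size_t i = 0; i < count; i++) {
        if (table[i].start <= pc) {
            found = (int)i;
        }
    }
    return found;
}
```

For pc equal to 0x8000043C, the search ends at the kernel_main entry, which agrees with what we read from the disassembly.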

We can interrupt code in MSIM by hitting Ctrl-C. That is useful if our code enters an unexpected loop and we want to investigate in which function it got stuck.

Instruction and memory dumps

MSIM allows us to inspect not only registers but also memory.

Let us see the string directory. It contains almost the same code as the previous example, but uses iteration over a string (const char *) to print the greeting.

Self-test quiz

Compile the code, run MSIM interactively and step until it starts printing characters.

What is the value of the program counter?

Let’s inspect the code of the loop. We can look at kernel.disasm or inspect it directly from MSIM.

MIPS specific notes

To inspect things in MSIM, we need to work with physical addresses. Recall that pc contains a virtual address. As long as our code runs in the kernel segment, the mapping between the virtual and physical addresses is hardwired into the processor as a simple shift by 2GB. For example, virtual address 0x8000042C maps to physical address 0x42C.

It is quite important to remember that if we see an address above 0x80000000 in MSIM, it points into the kernel segment, but if we see a numerically lower address, it is either an untranslated physical address (such as those in msim.conf) or an address in the user segment, which at this time most likely indicates a bug in our code.

Now, we will take the virtual address 0x8000042C, translate it to a physical address (simply by removing the leading 8), and disassemble in MSIM.

RISC-V specific notes

We can use the address 0x8000106c directly, as we are using the BARE virtual address translation mode, which keeps the addresses unchanged.

To disassemble instructions in MSIM:

  • MIPS
[msim] dumpins r4k 0x42c 10
  • RISC-V
[msim] dumpins rv 0x80001060 10

This will dump 10 instructions starting at the specified address.

MIPS specific notes

We should notice that we are (in overly simplified terms) reading the string via registers v0 and v1 and writing it to the console via a0.

Let’s look at the register content:

v0 80000460   v1 00000048   a0 90000000

v0 looks like a virtual address of our kernel, v1 looks like an ASCII value (actually, it is the capital H) and a0 is the address of our console (recall code in src/main.c).

So we can guess that v0 would contain the address of the string.

RISC-V specific notes

We should notice that we are (in overly simplified terms) reading the string via registers a4 and a5 and writing it to the console via a3.

Let’s look at the register content:

a3: 90000000    a4:       48    a5: 8000108a

a5 looks like a virtual address of our kernel, a4 looks like an ASCII value (actually, it is the uppercase H) and a3 is the address of our console (recall code in src/main.c).

So we can guess that a5 would contain the address of the string.

Let’s look at that address. Now we do not want to see it as an instruction dump but rather as plain memory dump, hence:

  • MIPS
[msim] dumpmem 0x460 4
  0x00000460    6c6c6548 57202c6f 646c726f 00000a21
  • RISC-V
[msim] dumpmem 0x8000108a 4
  0x080001088   6c6c6548 57202c6f 646c726f 00000a21

6c6c is actually ll from our Hello greeting and if we translate the rest of the numbers, it is really our greeting.

Self-test quiz

Why is the string ordered backwards?

If you run hexdump -C kernel.bin, you will see these characters there as well.

Solution

While we read strings character by character, MSIM dumps memory in 4-byte words. Both MIPS and RISC-V are little-endian, so the bytes at lower addresses occupy the less significant bits of the word, making them appear further to the right when written down.
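We can check this by decoding the dumped words by hand or with a few lines of C. The sketch below runs on a host computer; the function name words_to_bytes is made up for this example, and the input words in the usage note are the ones from the memory dump above.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack 32-bit words into bytes the way a little-endian CPU stores
   them: the least significant byte goes to the lowest address.
   The out buffer must hold at least 4 * count bytes. */
static void words_to_bytes(const uint32_t *words, size_t count, char *out)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t b = 0; b < 4; b++) {
            out[4 * i + b] = (char)((words[i] >> (8 * b)) & 0xFFu);
        }
    }
}
```

Feeding it the four words 0x6c6c6548, 0x57202c6f, 0x646c726f and 0x00000a21 yields the bytes of the greeting in reading order, starting with H, e, l, l and ending with an exclamation mark, a newline and the terminating zero.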

Exception handling

Let’s now see how MSIM (and our kernel) behaves when things go wrong.

We will use the unaligned directory. Let us compile it and open main.c.

It contains simple code: we build an array of individual bytes and later typecast it to a 32-bit integer. This is something our program might do, for example, to inspect memory; however, it is also an operation that may be illegal on some CPUs, including ours, as we will shortly see.

(The code uses volatile variables to prevent the compiler from optimizing the code too much.)

If we run the code, MSIM will switch to the interactive mode and show a dump of registers. This is because the access to a 32-bit integer that is not aligned (the address we access is not a multiple of the size of an integer) is illegal. The CPU reacts by generating an exception. Our kernel is currently written so that it reacts to an exception by switching MSIM to the interactive mode (which is a sane default for debugging).
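The alignment rule, and one way to read a 32-bit value from an arbitrary address without triggering the exception, can be sketched as follows. This is not the code from the unaligned directory; the function names are hypothetical, and the sketch is meant to be tried on a host computer.

```c
#include <stddef.h>
#include <stdint.h>

/* An address is aligned for an access of `size` bytes (a power of two)
   when it is a multiple of that size. */
static int is_aligned(uintptr_t address, size_t size)
{
    return (address & (size - 1)) == 0;
}

/* Assemble a little-endian 32-bit value from four single-byte loads,
   which are legal at any address. */
static uint32_t read_u32_bytewise(const volatile uint8_t *bytes)
{
    return (uint32_t)bytes[0]
        | ((uint32_t)bytes[1] << 8)
        | ((uint32_t)bytes[2] << 16)
        | ((uint32_t)bytes[3] << 24);
}
```

The byte-wise version is slower than a single word load, which is why compilers do not generate it unless they are told the access may be unaligned.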

We can return to this example and run (once MSIM switches to the interactive mode) the following commands to find which address caused the problem and what the exception code (type) is.

  • MIPS
cpu0 cp0d 0x0d
cpu0 cp0d 0x08
cpu0 cp0d 0x0e
  • RISC-V
cpu0 csrd mepc
cpu0 csrd mcause
cpu0 csrd mtval

The volatile modifier

Let us go back to our first kernel again.

You perhaps noticed that our console printer uses a special modifier volatile. If you are new to C, you may want to read for example this article about volatile first.

Self-test quiz

Let’s compile the code and open kernel.disasm again. We will see that most code of kernel_main() is a mix of constant loads (li) and stores to memory (sb). These instructions represent the call to print_char that writes the character to a special part of memory that represents the console (recall that MSIM is printing any value written here on your console).

Now let us remove the volatile modifier and recompile the code. Let us run MSIM again.

Nothing (except the newline) was printed!

We will look at the disassembly again: the code is much shorter! Why?

Hint

Imagine what the code looks like when print_char is actually inlined into kernel_main.

Solution

Without volatile, and with print_char inlined, the source is effectively this:

char *printer = (char*)(0x90000000);
*printer = 'H';
*printer = 'e';
...
*printer = '.';
*printer = '\n';

Any decent compiler will recognize that we are overwriting the same variable without reading the values. When optimizing code, the compiler is only required to preserve externally visible behavior, and a write that nobody reads is not externally visible - hence all writes but the last are removed by the compiler. This means only *printer = '\n' remains.

Using volatile informs the compiler that someone else (here it is the console device of the simulator, but it can also be another thread) can read or write the variable and therefore accesses to it must not be optimized away.

Surviving without sources

The directory endless contains only an image of a simple kernel, without sources.

The kernel image contains an endless loop. Run MSIM and, after a while, break the execution with Ctrl-C to get into the interactive mode.

Inspect the state of the machine and decide in which function the endless loop is (function names are in the kernel.disasm file).

Hint

Dump the registers.

Solution MIPS

The PC register will contain values around 0x80000460, hence it is function endless_two.

Solution RISC-V

The PC register will contain values around 0x80001090, hence it is function endless_two.

The complex one

The printers directory again contains only a binary kernel image. This time the kernel is a bit bigger, and msim.conf actually contains several printers (consoles).

The task is simple: determine what console device is actually used. This changes with every boot so do not try editing msim.conf, that would be cheating ;-) …

Note that with newer versions of MSIM, you need to run it with -n, because the hardware is configured with a time device that adds non-determinism to the simulator.

To find the right answer, inspect the code loaded into MSIM and check the contents of the registers. To make the task easier, the kernel prints dots in an infinite loop.

Solution

The printer number is the last but one digit in the Run id.

Tracing the instructions would be enough: somewhere in the registers we would see the address of the printer.

Another option is to look into the disassembly, where we would see that print_char was not inlined. Hence we can watch until the program reaches this point and then inspect the target address of the sb instruction.

MIPS specific notes

Watch until the program counter reaches address 0x80000430 and look into the content of the v0 register.

RISC-V specific notes

Watch until the program counter reaches address 0x80001068 and look into the content of the a5 register.