Mini-kernel tutorial
====================

This page contains several small exercises that should help you with your first steps with a MIPS or RISC-V kernel running in MSIM.

We will provide you with both instructions on how to run MSIM and the source code of a small kernel that you can play with.

.. contents:: Here is an overview of the exercises.
   :local:

Toolchain setup
---------------

We expect that you have a cross-compiler toolchain already installed so that you can try the examples yourself. Please refer to our :doc:`instructions ` if you need help with installing a toolchain.

Once you have your toolchain ready, we can dive into kernel code for real :-).

First compilation
-----------------

If you have never compiled an operating system kernel (or if you are new to C, GCC, or make), you may wish to start with compiling a smaller kernel first.

Please clone the `MSIM repository `__.

This tutorial contains examples for both MIPS and RISC-V mini-kernels (both in 32-bit variants). Because the architectures are rather similar (after all, RISC-V designers admit they were inspired by MIPS), the text contains the following markers only where the behavior (or source code) differs significantly between the two architectures.

.. archbox:: MIPS

   MIPS examples are in the ``contrib/kernel-tutorial-mips32/`` subdirectory.

.. archbox:: RISC-V

   RISC-V examples are in the ``contrib/kernel-tutorial-riscv32/`` subdirectory.

We will start inside the ``first`` subdirectory. Please choose either the MIPS or the RISC-V architecture for this exercise (of course, intrepid developers can choose to inspect and experiment with both at the same time).

Before we discuss the contents of the directory, we will build the kernel. All the examples use `make `__ as the build tool, so simply type ``make`` to build it.

The ``make`` command launches the make tool, which reads dependency rules from a file named ``Makefile`` and uses them to figure out how to compile C sources into a binary executable.
In this case, make should run a sequence of commands to build the ``loader.bin`` executable from the ``loader.S`` source, and the ``kernel.bin`` executable from the ``head.S`` and ``main.c`` sources.

``make`` will produce the following output (the paths may differ, but otherwise the output should look the same on your machine).

.. tabs:: arch

   .. code-tab:: bash
      :caption: MIPS

      make -C kernel
      make[1]: Entering directory './kernel'
      /usr/bin/mipsel-unknown-linux-gnu-gcc -march=r4000 -mabi=32 -mgp32 -msoft-float -mlong32 -G 0 -mno-abicalls -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -c -o boot/loader.o boot/loader.S
      /usr/bin/mipsel-unknown-linux-gnu-ld -G 0 -static -g -T kernel.lds -Map loader.map -o loader.raw boot/loader.o
      /usr/bin/mipsel-unknown-linux-gnu-objcopy -O binary loader.raw loader.bin
      /usr/bin/mipsel-unknown-linux-gnu-objdump -d loader.raw > loader.disasm
      /usr/bin/mipsel-unknown-linux-gnu-gcc -O2 -march=r4000 -mabi=32 -mgp32 -msoft-float -mlong32 -G 0 -mno-abicalls -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -c -o src/main.o src/main.c
      /usr/bin/mipsel-unknown-linux-gnu-gcc -march=r4000 -mabi=32 -mgp32 -msoft-float -mlong32 -G 0 -mno-abicalls -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -c -o src/head.o src/head.S
      /usr/bin/mipsel-unknown-linux-gnu-ld -G 0 -static -g -T kernel.lds -Map kernel.map -o kernel.raw src/main.o src/head.o
      /usr/bin/mipsel-unknown-linux-gnu-objcopy -O binary kernel.raw kernel.bin
      /usr/bin/mipsel-unknown-linux-gnu-objdump -d kernel.raw > kernel.disasm
      make[1]: Leaving directory './kernel'

   .. code-tab:: bash
      :caption: RISC-V

      make -C kernel
      make[1]: Entering directory './kernel'
      /usr/bin/riscv32-unknown-elf-gcc -msmall-data-limit=0 -mstrict-align -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -mno-riscv-attribute -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -march=rv32g -c -o boot/loader.o boot/loader.S
      /usr/bin/riscv32-unknown-elf-ld -G 0 -static -g -T loader.lds -Map loader.map -o loader.raw boot/loader.o
      /usr/bin/riscv32-unknown-elf-ld: warning: loader.raw has a LOAD segment with RWX permissions
      /usr/bin/riscv32-unknown-elf-objcopy -O binary loader.raw loader.bin
      /usr/bin/riscv32-unknown-elf-objdump -d loader.raw > loader.disasm
      /usr/bin/riscv32-unknown-elf-gcc -O2 -msmall-data-limit=0 -mstrict-align -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -mno-riscv-attribute -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -march=rv32g -c -o src/main.o src/main.c
      /usr/bin/riscv32-unknown-elf-gcc -msmall-data-limit=0 -mstrict-align -fno-pic -fno-builtin -ffreestanding -nostdlib -nostdinc -mno-riscv-attribute -pipe -Wall -Wextra -Werror -Wno-unused-parameter -Wmissing-prototypes -g3 -std=c11 -I. -D__ASM__ -march=rv32g -c -o src/head.o src/head.S
      /usr/bin/riscv32-unknown-elf-ld -G 0 -static -g -T kernel.lds -Map kernel.map -o kernel.raw src/main.o src/head.o
      /usr/bin/riscv32-unknown-elf-ld: warning: kernel.raw has a LOAD segment with RWX permissions
      /usr/bin/riscv32-unknown-elf-objcopy -O binary kernel.raw kernel.bin
      /usr/bin/riscv32-unknown-elf-objdump -d kernel.raw > kernel.disasm
      make[1]: Leaving directory './kernel'

.. extras:: using ``make``

   The advantage of using make over a shell script is that make only rebuilds files (along dependency chains) that have changed since the last compilation, which saves build time, especially on larger projects (you can try that by running ``make`` again now).

   In this example, the rules in the top-level ``Makefile`` just tell make to run ``make`` again, but this time using the ``Makefile`` in the ``kernel`` subdirectory; more details of the compilation will come later on.

Note that there is an ``msim.conf`` file in our directory. It contains directives for the MSIM simulator, configuring it to provide a simple computer equipped with one processor, two blocks of memory, and a console-like device for textual output (we will dissect the configuration in the next exercise).

To run the compiled kernel code, run ``msim`` without any arguments. MSIM will load the binary images (``loader.bin`` and ``kernel.bin``) into the two memory blocks and reset the simulated CPU so that it starts executing code at factory-defined addresses. You should see the following output:

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      Hello, World.
      Alert: XHLT: Machine halt
      Cycles: 41

   .. code-tab:: msim
      :caption: RISC-V

      Hello, World.
      Alert: EHALT: Machine halt
      Cycles: 42

The “Hello, World.” message was printed from C code compiled into machine code running on the processor of your choosing. Getting the target processor to execute your (compiled) C code is usually one of the major technical obstacles when starting OS development from scratch, which is why we have taken care of this step for now.

The last line (as well as the ``Alert:`` line before it) is the output of the simulator, telling us how many virtual cycles the CPU has executed. This is exactly the number of executed instructions. We can safely ignore those lines for now.

.. important::

   If the compilation failed for you, or if the execution printed something completely different, feel free to contact us: please `open an issue here `__ and describe what you have tried and what failed, and do not forget to describe your environment.

   If you are an NSWI200 student, please prefer the standard means of communicating with your teachers over GitHub issues. Thank you.
Configuring the virtual machine
-------------------------------

We will now take a closer look at the ``msim.conf`` file, which contains the configuration of the simulated computer that runs your kernel.

Using a simulated computer instead of a real one makes it much easier to develop a small kernel. For one thing, installation does not require sacrificing your own computer. For another, the simulation is completely deterministic, so bugs that appear once keep appearing until you fix them. Rest assured, however, that the simulated environment is close enough to the real thing.

Reading ``msim.conf`` from top to bottom and ignoring the comment lines starting with the ``#`` character, the first configuration line tells MSIM to add one processor and name it ``cpu0``:

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      add dr4kcpu cpu0

   .. code-tab:: msim
      :caption: RISC-V

      add drvcpu cpu0

.. archbox:: MIPS

   The MIPS R4000 processor device is named ``dr4kcpu``.

.. archbox:: RISC-V

   The RISC-V RV32IMA processor device is named ``drvcpu``.

The next two groups of directives add two blocks of physical memory, one for the bootloader and one for the main memory, both initialized from files on disk.

The main memory block (called ``mainmem``) is a read-write memory with a size of ``1 MiB``. The memory block is initialized with the contents of the ``kernel/kernel.bin`` file before the simulated computer starts running:

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      add rwm mainmem 0
      mainmem generic 1M
      mainmem load "kernel/kernel.bin"

   .. code-tab:: msim
      :caption: RISC-V

      add rwm mainmem 0x80000000
      mainmem generic 1M
      mainmem load "kernel/kernel.bin"

.. archbox:: MIPS

   The ``mainmem`` memory segment starts at physical address ``0``. The processor then maps it to virtual address ``0x80000000`` (so printing a pointer in your code will print addresses with the highest bit set).

.. archbox:: RISC-V

   The ``mainmem`` memory segment starts at physical address ``0x80000000``. The processor uses identity mapping when booting, hence we do not need to explicitly distinguish virtual and physical addresses (at least, for now).

The bootloader memory block (called ``loadermem``) is a read-only memory initialized with the contents of the ``kernel/loader.bin`` file:

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      add rom loadermem 0x1FC00000
      loadermem generic 4K
      loadermem load "kernel/loader.bin"

   .. code-tab:: msim
      :caption: RISC-V

      add rom loadermem 0xF0000000
      loadermem generic 8K
      loadermem load "kernel/loader.bin"

.. archbox:: MIPS

   The ``loadermem`` memory segment starts at physical address ``0x1FC00000`` and has a size of ``4 KiB``.

.. archbox:: RISC-V

   The ``loadermem`` memory segment starts at physical address ``0xF0000000`` and has a size of ``8 KiB``.

Finally, we add a simple output device (called ``printer``), which allows the code running in the simulator to display text on the host computer console. This is similar to a serial console found on real hardware, except the printer device is much simpler:

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      add dprinter printer 0x10000000

   .. code-tab:: msim
      :caption: RISC-V

      add dprinter printer 0x90000000

.. archbox:: MIPS

   This device resides at physical address ``0x10000000``.

.. archbox:: RISC-V

   This device resides at physical address ``0x90000000``.

This is actually enough for a simple machine and more than enough for our purposes :-).

Disassembling the kernel
------------------------

With the simulator configured to provide us with a simple computer, it is now time to look at the files in the ``kernel`` directory.

Again, there is a ``Makefile`` which controls the compilation, and a linker script which controls the layout of the binary image produced by the linker.

.. extras:: linker scripts

   We will not dissect the linker script further, because we will not need to modify it in this tutorial.

   As a matter of fact, linker scripts are rarely modified and in normal circumstances come with your linker. Because we have a non-standard kernel and a simplified simulated machine, we use our own.

The ``boot`` subdirectory contains ``loader.S``, an assembly source file which contains the computer bootloader code. On a real computer, the bootloader is (ultimately) responsible for loading the operating system into memory. In our case, the MSIM simulator does this for us (see the directives telling MSIM to load ``kernel/kernel.bin`` into ``mainmem`` in ``msim.conf``), so we just need a few instructions to make the processor jump into the kernel code after reset.

The loader code needs to be present at a specific address (it is hard-wired into the CPU, see ``msim.conf``) from which the CPU starts executing instructions after a power up or reset. Other than that, the loader code does not really do anything: it just jumps to another fixed address, where our main code will reside.

.. archbox:: MIPS

   The loader jumps to address ``0x80000400``.

   The reason why we keep the rest of the kernel code separate from the loader is quite simple: the entry point of the loader is quite far from the entry points of the exception handlers, which are also hardwired, and which the kernel must implement. We simply want to keep the rest of the kernel code in one piece, and that means next to the exception handlers.

.. archbox:: RISC-V

   The loader jumps to address ``0x80001000``.

The ``loader.S`` file is compiled and linked into ``loader.bin``. This file contains only machine instructions (no symbol information, no debugging information, no relocation information): it is code in its rawest form, the form that the CPU actually sees.

Look into ``loader.bin`` and ``loader.disasm``. The second one is a disassembly of the binary format back to assembler.

::

    cat loader.disasm
    hexdump -C loader.bin

Since ``loader.bin`` and ``loader.disasm`` are produced from ``loader.S``, they should contain the same instructions as the original ``loader.S``. Do take a look.

.. quiz::

   A question for you: why are the instructions in ``loader.disasm`` different from ``loader.S``?

   .. collapse:: Hint

      Think about the limited instruction repertoire of the CPU.

   .. collapse:: Solution MIPS

      The difference in code concerns the loading of the 32-bit constant (the jump target address). The CPU does not have an instruction that can load an entire 32-bit constant in one go (because the instruction itself must fit into 32 bits), hence two instructions are used.

      The assembly code uses a shorthand notation so that the programmer does not have to perform this trivial conversion.

   .. collapse:: Solution RISC-V

      The difference in code concerns the loading of the 32-bit constant (the jump target address). The CPU does not have an instruction that can load an entire 32-bit constant in one go (because the instruction itself must fit into 32 bits), hence two instructions are needed in general. (For example, ``li t0, 0x80000001`` would be transformed into ``lui t0, 0x80000`` and ``addi t0, t0, 1``. Try it yourself!)

      Our code manages with only one instruction, because the lowest 12 bits (3 hex digits) of our target address are all 0. The ``lui t0, 0x80001`` instruction loads the constant ``0x80001`` into the highest 20 bits of ``t0``, setting it to ``0x80001000``, which is exactly our desired address.

      The assembly code uses a shorthand notation so that the programmer does not have to perform this trivial conversion.

From boot to C code
-------------------

We will now look into the ``src`` directory, where the foundations of our kernel reside.

The ``head.S`` file contains a lot of assembly code, but do not be afraid ;-).

.. archbox:: MIPS

   Find the line containing ``start:`` (around line 120). Above this, we can see a special directive ``.org 0x400`` that says that the following code will be placed ``0x400`` bytes from the start of the code segment. The linker script specifies that the code segment starts at ``0x80000000``; together this yields ``0x80000400``, exactly the address our boot loader jumps to! Hence, after the boot loader is done, the execution will continue here.

   We start by setting up a few registers (such as the stack pointer) and execute ``jal kernel_main``. This passes control from the assembly code to the ``kernel_main`` function, which is a standard C function that you can see if you open ``src/main.c``.

.. archbox:: RISC-V

   Find the line containing ``start:`` (around line 90). Above this, we can see a special directive ``.org 0x1000`` that says that the following code will be placed ``0x1000`` bytes from the start of the code segment. The linker script specifies that the code segment starts at ``0x80000000``; together this yields ``0x80001000``, exactly the address our boot loader jumps to! Hence, after the boot loader is done, the execution will continue here.

   We start by setting up a few registers (such as the stack pointer and the ``mepc`` CSR) and execute ``mret``. This passes control from the assembly code to the ``kernel_main`` function, which is a standard C function that you can see if you open ``src/main.c``.

These few lines of assembler (``loader.S`` and ``head.S``) constitute the only assembly code needed to boot the processor and get into C.

.. extras:: assembler and booting

   One cannot boot a CPU without at least a bit of assembler that jumps into C code. But the assembly code is usually straightforward and only sets up basic registers and the stack.

   Feel free to return to this code later; understanding it completely is not required to continue with the tutorial. As long as you understand that we need special instructions to jump to C code, you will be fine.
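The address arithmetic from the last two sections (how ``lui`` builds ``0x80001000``, how ``li`` splits a 32-bit constant, and how ``.org`` combines with the segment base) can be checked with a few lines of ordinary C compiled on the host. This is only an illustrative sketch: the helper names ``lui_value``, ``li_split`` and ``org_address`` are hypothetical and do not appear in the tutorial sources.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical helper: value produced by "lui rd, imm20" on RV32.
    * The 20-bit immediate lands in the upper 20 bits, the low 12 bits are 0. */
   static inline uint32_t lui_value(uint32_t imm20)
   {
       return (imm20 & 0xFFFFFu) << 12;
   }

   /* Hypothetical helper: the two immediates of a lui/addi pair for "li". */
   struct li_parts {
       uint32_t hi20; /* immediate for lui */
       int32_t lo12;  /* immediate for addi, range -2048..2047 */
   };

   static inline struct li_parts li_split(uint32_t value)
   {
       struct li_parts parts;
       /* addi sign-extends its immediate, so round the upper part up
        * whenever the low 12 bits would be read as a negative number. */
       parts.hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
       parts.lo12 = (int32_t)(value & 0xFFFu);
       if (parts.lo12 >= 0x800) {
           parts.lo12 -= 0x1000;
       }
       return parts;
   }

   /* Hypothetical helper: address of code placed by ".org offset"
    * in a segment that the linker script places at segment_base. */
   static inline uint32_t org_address(uint32_t segment_base, uint32_t offset)
   {
       return segment_base + offset;
   }

With these helpers, ``li_split(0x80000001)`` yields the ``lui`` immediate ``0x80000`` and the ``addi`` immediate ``1``, and ``org_address(0x80000000, 0x400)`` gives the MIPS entry point ``0x80000400``.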
``kernel_main`` is where the fun starts
---------------------------------------

The last file we have not commented much on is ``src/main.c``. It contains the ``kernel_main()`` function, which is called shortly after boot.

This is the function where the kernel would initialize itself or launch the first userspace process (e.g. ``init`` on Linux).

Right now it contains only a very short greeting.

Printing from the simulator is trivial because we told MSIM that there should be a console printer device available at a particular address. MSIM monitors this address, and any write to it causes the written character to appear on the console.

.. archbox:: MIPS

   A question for you: if you look up the console printer device address in the source code, you will see it is ``0x90000000``, but ``msim.conf`` says ``0x10000000``. Why?

   .. collapse:: Hint

      Think about virtual and physical addresses.

   .. collapse:: Solution

      The code uses virtual addresses, but the simulator configuration uses physical addresses (exactly what real hardware would see). In the kernel segment, virtual addresses are mapped to physical addresses simply by masking the highest bit: virtual address ``0x80000000`` therefore corresponds to physical address ``0``, and so on. The mapping is intentionally simple because the kernel must run even before more complex mapping structures, such as page tables, can be set up.

An important note: you probably noticed that we print the characters one by one instead of using ``printf`` or ``puts``. That is because we are in our own kernel and we do not have any of these functions. As a matter of fact, **we will have only functions that we implement ourselves**. Thus, there is no ``printf``, no ``malloc`` and definitely no ``fopen`` (unless you implement them yourself).

The first modification of the kernel
------------------------------------

Modify the kernel so that it prints the greeting with an exclamation mark instead of a plain period. After all, we can be proud of it ;-).
Before running ``msim`` again, do not forget to recompile with ``make``.

.. collapse:: Solution

   Just replace ``'.'`` with ``'!'`` in ``main.c`` :-).

   Note that ``make`` should recompile only ``main.c`` into ``main.o`` and re-link the ``kernel.*`` files. Files related to the bootloader should remain unchanged.

Tracing the execution
---------------------

Let's see which instructions were actually executed by MSIM. This may come in handy in later debugging tasks.

We will run ``msim -t``. This turns on a trace mode where MSIM prints every instruction as it is executed. (Unfortunately, there is just one console, so the MSIM output is interleaved with your OS output.)

.. quiz::

   Compare the trace with your ``*.disasm`` files. What is the difference?

   .. collapse:: Solution

      The ``*.disasm`` files contain the code in its static form, while the trace represents the true execution: jumps are taken, loop bodies are executed repeatedly, etc.

Stepping through the execution
------------------------------

To run the kernel instruction by instruction interactively, launch MSIM with ``msim -i``.

This time, MSIM will wait for further commands, as indicated by the ``[msim]`` prompt.

Simply typing ``continue`` will resume standard execution, which will run our OS and eventually terminate MSIM.

Run MSIM again, but instead of typing ``continue``, just hit Enter. An empty command in MSIM is equivalent to typing ``step`` and executes a single instruction. We should see the greeting start to appear next to the prompt as we continue pressing Enter. We can also use ``step 10`` to execute ten instructions at once.

.. _entering-the-debugger:

Entering the debugger
---------------------

Stepping through our kernel from the very first instruction is not so useful for debugging when the code we are interested in is executed long after boot.

In that case, we can also enter the interactive mode programmatically, by asking for it from inside our (kernel) code.
That is super easy when running in a simulator such as MSIM but somewhat more difficult on real hardware. That is why simulators are so useful :-).

To enter the interactive mode, we will use a special assembly language instruction, which a real CPU does not recognize but MSIM does. We will insert the following fragment at the location (in the C code) where we want to interrupt the execution.

.. tabs:: arch

   .. code-tab:: c
      :caption: MIPS

      __asm__ volatile(".word 0x29\n");

   .. code-tab:: c
      :caption: RISC-V

      __asm__ volatile("ebreak\n");

Let us try it: insert the break after printing ``Hello``. If we execute ``msim``, it will print ``Hello`` and enter the interactive mode. We can again step through the execution or ``continue``.

Inspecting the registers
------------------------

Let us start MSIM in the interactive mode again and type ``set trace`` as the first command. Then we will hit Enter several times. We have executed several instructions, and MSIM prints each instruction as it is executed.

We can also inspect all registers at once. We will use the ``cpu0 rd`` command for a **r**\ egister **d**\ ump of the ``cpu0`` processor (that is the only processor that we added to our computer in MSIM).

This is an extremely useful command, as it allows us to inspect the current state of the processor and what code it executes.

.. quiz::

   Which register would tell you what code is executed?

   .. collapse:: Solution

      The ``pc`` register is the program counter, holding the (virtual) address from which the CPU decodes the next instruction.

Matching instructions back to source code
-----------------------------------------

Start MSIM again in the interactive mode and step until it starts printing the greeting. Look at the register dump. You will see something like this (note that we have dropped the 64-bit extension to make the dump a bit shorter):

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

       0 00000000   at 00000000   v0 90000000   v1 00000000   a0 00000000   a1 00000048
      a2 00000000   a3 00000000   t0 00000000   t1 00000000   t2 00000000   t3 00000000
      t4 00000000   t5 00000000   t6 00000000   t7 00000000   s0 00000000   s1 00000000
      s2 00000000   s3 00000000   s4 00000000   s5 00000000   s6 00000000   s7 00000000
      t8 00000000   t9 00000000   k0 0000FF01   k1 00000000   gp 80000000   sp 80000400
      fp 00000000   ra 80000420   pc 8000043C   lo 00000000   hi 00000000

   .. code-tab:: msim
      :caption: RISC-V

      zero: 0   ra: 80001060   sp: 80001000   gp: 0
      tp: 0   t0: 800   t1: 0   t2: 0
      s0/fp: 0   s1: 0   a0: 0   a1: 0
      a2: 0   a3: 0   a4: 48   a5: 90000000
      a6: 0   a7: 0   s2: 0   s3: 0
      s4: 0   s5: 0   s6: 0   s7: 0
      s8: 0   s9: 0   s10: 0   s11: 0
      t3: 0   t4: 0   t5: 0   t6: 0
      pc: 8000106c
      Privilege mode: S

.. archbox:: MIPS

   In our dump, ``pc`` contains ``8000043C``. If we open ``kernel.disasm`` and find this address there, we will see it is a few lines below ``80000430``, which indicates that it is an instruction inside ``kernel_main()``.

.. archbox:: RISC-V

   In our dump, ``pc`` contains ``8000106c``. If we open ``kernel.disasm`` and find this address there, we will see it is a few lines below ``80001060``, which indicates that it is an instruction inside ``kernel_main()``.

This is extremely important information, because it tells us in which function our OS was executing when it was interrupted.

We can interrupt code in MSIM by hitting ``Ctrl-C``. That is useful if our code enters an unexpected loop and we want to investigate in which function it got stuck.

Instruction and memory dumps
----------------------------

MSIM allows us to inspect not only registers but also memory.

Let us look at the ``string`` directory. It contains almost the same code as the previous example, but iterates over a string (``const char *``) to print the greeting.

.. quiz::

   Compile the code, run MSIM interactively and step until it starts printing characters. What is the value of the program counter?

Let's inspect the code of the loop.
We can look at ``kernel.disasm`` or inspect it directly from MSIM.

.. archbox:: MIPS

   To inspect things in MSIM, we need to work with physical addresses. Recall that ``pc`` contains a virtual address. As long as our code runs in the kernel segment, the mapping between the virtual and physical addresses is hardwired into the processor as a simple shift by 2 GiB. For example, virtual address ``0x8000042C`` maps to physical address ``0x42C``.

   It is quite important to remember that if we see an address above ``0x80000000`` in MSIM, it points into the kernel segment. If we see a numerically lower address, it is either an untranslated physical address (such as those in ``msim.conf``) or an address in the user segment, which at this time most likely indicates a bug in our code.

   Now, we will take the virtual address ``0x8000042C``, translate it to a physical address (simply by removing the leading ``8``), and disassemble in MSIM.

.. archbox:: RISC-V

   We can use the address ``0x8000106c`` directly, as we are using the BARE virtual address translation mode, which keeps the addresses unchanged.

To disassemble instructions in MSIM:

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      [msim] dumpins r4k 0x42c 10

   .. code-tab:: msim
      :caption: RISC-V

      [msim] dumpins rv 0x80001060 10

This will dump 10 instructions starting at the specified address.

.. archbox:: MIPS

   We should notice that we are (in overly simplified terms) reading the string via registers ``v0`` and ``v1`` and writing it to the console via ``a0``. Let's look at the register content:

   ::

       v0 80000460   v1 00000048   a0 90000000

   ``v0`` looks like a virtual address in our kernel, ``v1`` looks like an ASCII value (actually, it is the capital ``H``) and ``a0`` is the address of our console (recall the code in ``src/main.c``). So we can guess that ``v0`` contains the address of the string.

.. archbox:: RISC-V

   We should notice that we are (in overly simplified terms) reading the string via registers ``a4`` and ``a5`` and writing it to the console via ``a3``. Let's look at the register content:

   ::

       a3: 90000000   a4: 48   a5: 8000108a

   ``a5`` looks like a virtual address in our kernel, ``a4`` looks like an ASCII value (actually, it is the uppercase ``H``) and ``a3`` is the address of our console (recall the code in ``src/main.c``). So we can guess that ``a5`` contains the address of the string.

Let's look at that address. This time we do not want to see it as an instruction dump but rather as a plain **m**\ emory **d**\ ump, hence:

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      [msim] dumpmem 0x460 4
      0x00000460 6c6c6548 57202c6f 646c726f 00000a21

   .. code-tab:: msim
      :caption: RISC-V

      [msim] dumpmem 0x8000108a 4
      0x080001088 6c6c6548 57202c6f 646c726f 00000a21

``6c6c`` is actually ``ll`` from our ``Hello`` greeting, and if we translate the rest of the numbers, it really is our greeting.

.. quiz::

   Why is the string ordered backwards? If we run ``hexdump -C kernel.bin``, we will see these characters there as well.

   .. collapse:: Solution

      While we read strings character by character, MSIM dumps memory in 4-byte words. Both MIPS and RISC-V are little endian, so the bytes at lower addresses occupy the less significant bits of the word, which makes them appear further to the right when written down.

Exception handling
------------------

Let's now see how MSIM (and our kernel) behaves when things go wrong. We will use the ``unaligned`` directory.

Compile it and open ``main.c``. The code is simple: we build an array of individual bytes and later typecast it to a 32-bit integer. This is something our program might do, for example, to inspect memory. However, it is also an operation that may be illegal on some CPUs, including ours, as we will shortly see. (The code uses ``volatile`` variables to prevent the compiler from optimizing the code too much.)
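To make the alignment rule concrete, here is a small host-side sketch in C. It is not taken from the ``unaligned`` example: the helper names ``is_aligned`` and ``read_u32_le`` are hypothetical, and reading byte by byte is just one common way to avoid an unaligned 32-bit access, assuming little-endian byte order as on our two CPUs.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical helper: an access of the given size is aligned
    * when the address is a multiple of that size. */
   static inline bool is_aligned(uint32_t address, uint32_t size)
   {
       return (address % size) == 0;
   }

   /* Hypothetical helper: assemble a little-endian 32-bit value from four
    * individual bytes, so that no 32-bit load from an odd address is needed. */
   static inline uint32_t read_u32_le(const volatile uint8_t *bytes)
   {
       return (uint32_t)bytes[0]
           | ((uint32_t)bytes[1] << 8)
           | ((uint32_t)bytes[2] << 16)
           | ((uint32_t)bytes[3] << 24);
   }

For example, ``is_aligned(0x8000108a, 4)`` is false, while ``is_aligned(0x80001088, 4)`` is true. Feeding the bytes ``H``, ``e``, ``l``, ``l`` to ``read_u32_le`` produces ``0x6c6c6548``, the same word we saw in the memory dump of the greeting.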
If we run the code, MSIM will switch to the interactive mode and show a dump of registers.

This is because an access to a 32-bit integer that is not aligned (the address we access is not a multiple of the size of an integer) is illegal. The CPU reacts by generating an exception. Our kernel is currently written so that it reacts to an exception by switching MSIM to the interactive mode (which is a sane default for debugging).

Once MSIM switches to the interactive mode, we can run the following commands to find what address caused the problem and what the exception code (type) is.

.. tabs:: arch

   .. code-tab:: msim
      :caption: MIPS

      cpu0 cp0d 0x0d
      cpu0 cp0d 0x08
      cpu0 cp0d 0x0e

   .. code-tab:: msim
      :caption: RISC-V

      cpu0 csrd mepc
      cpu0 csrd mcause
      cpu0 csrd mtval

The ``volatile`` modifier
-------------------------

Let us go back to our first kernel again. You perhaps noticed that our console printer uses a special modifier, ``volatile``. If you are new to C, you may want to read for example `this article `__ about ``volatile`` first.

.. quiz::

   Let's compile the code and open ``kernel.disasm`` again.

   We will see that most of the code of ``kernel_main()`` is a mix of constant loads (``li``) and stores to memory (``sb``). These instructions represent the calls to ``print_char``, which writes the character to a special part of memory that represents the console (recall that MSIM prints any value written there on your console).

   Now let us remove the ``volatile`` modifier and recompile the code. Let us run MSIM again. Nothing (except the newline) was printed!

   We will look at the disassembly again: the code is much shorter! Why?

   .. collapse:: Hint

      Imagine what the code looks like when ``print_char`` is actually inlined into ``kernel_main``.

   .. collapse:: Solution

      Without ``volatile``, the source is effectively this:

      .. code-block:: c

         char *printer = (char*)(0x90000000);
         *printer = 'H';
         *printer = 'e';
         ...
         *printer = '.';
         *printer = '\n';

      Any decent compiler will recognize that we are overwriting the same variable without reading the values. When optimizing code, the compiler is only required to preserve externally visible behavior, and a write that nobody reads is not externally visible. Hence all writes but the last are removed by the compiler. This means only ``*printer = '\n'`` remains.

      Using ``volatile`` informs the compiler that someone else (here it is the console device of the simulator, but it can also be another thread) can read or write the variable and therefore accesses to it must not be optimized away.

Surviving without sources
-------------------------

The directory ``endless`` contains only an image of a simple kernel, without sources. The kernel image contains an endless loop.

Run MSIM and, after a while, break the execution with ``Ctrl-C`` to get into the interactive mode. Inspect the state of the machine and decide in which function the endless loop is (function names are in the ``kernel.disasm`` file).

.. collapse:: Hint

   Dump the registers.

.. collapse:: Solution MIPS

   The ``PC`` register will contain values around ``0x80000460``, hence it is function ``endless_two``.

.. collapse:: Solution RISC-V

   The ``PC`` register will contain values around ``0x80001090``, hence it is function ``endless_two``.

The complex one
---------------

The ``printers`` directory again contains only a binary kernel image. This time the kernel is a bit bigger, and ``msim.conf`` actually contains several printers (consoles).

The task is simple: determine which console device is actually used. This changes with every boot, so do not try editing ``msim.conf``; that would be cheating ;-)

Note that with newer versions of MSIM, you need to run it with ``-n``, as the hardware is configured with a time device that adds non-determinism to the simulator.

To find the right answer, inspect the code loaded into MSIM and check the contents of the registers.
To make the task easier, the kernel prints dots in an infinite loop.

.. collapse:: Solution

   The printer number is the last but one digit in the *Run id*.

   Tracing the instructions would be enough: somewhere in the registers we would see the address of the printer.

   Another option is to look into the disassembly, where we would see that ``print_char`` was not inlined. Hence we can watch until the program reaches this point and then inspect the target address of the ``sb`` instruction.

   .. archbox:: MIPS

      Watch until the program counter reaches address ``0x80000430`` and look at the content of the ``v0`` register.

   .. archbox:: RISC-V

      Watch until the program counter reaches address ``0x80001068`` and look at the content of the ``a5`` register.