I want to boot my Kernel on a real machine

Testing and running the kernel under QEMU is convenient and makes development faster, but it is not the end goal. We want the kernel to run on real hardware.

#mtools
echo "drive c: file=\"$(pwd)/bootable.img\" partition=1" > ~/.mtoolsrc

#create an image (88 cylinders * 16 heads * 63 sectors = 88704 sectors of 512 bytes)
dd if=/dev/zero of=bootable.img count=88704 bs=512
mpartition -I c:
mpartition -c -t 88 -h 16 -s 63 c:
mformat c:
mmd c:/boot
mmd c:/boot/grub

#copy grub bootloader
mcopy grub/grub-0.94-i386-pc/boot/grub/stage1 c:/boot/grub
mcopy grub/grub-0.94-i386-pc/boot/grub/stage2 c:/boot/grub
mcopy grub/grub-0.94-i386-pc/boot/grub/fat_stage1_5 c:/boot/grub

#grub
echo "(hd0) bootable.img" > bmap
printf "geometry (hd0) 88 16 63 \n root (hd0,0) \n setup (hd0)\n" | /usr/sbin/grub --device-map=bmap --batch

#copy menu.lst to bootable.img
mcopy menu.lst c:/boot/grub/

#copy the kernel to bootable.img
mcopy kernel.bin c:/boot/grub/

We can test this image using QEMU like this:

qemu -hda bootable.img

Then the image can be written to a USB pen drive using ‘dd’ or ‘usb-imagewriter’:

  1. sudo apt-get install usb-imagewriter (to install the GUI tool)
  2. sudo dd if=bootable.img of=/dev/sdX (where sdX is the device node of the pen drive under /dev; double-check it, because dd overwrites the target without asking)
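
The count given to dd is not arbitrary: it should match the geometry passed to mpartition (88 cylinders, 16 heads, 63 sectors per track). As a quick sanity check, here is a minimal C sketch of that arithmetic. The function names are made up for this example, and the asserts only encode the classic BIOS CHS limits as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per sector, matching bs=512 in the dd command. */
#define SECTOR_SIZE 512u

/* Total number of sectors described by a cylinders/heads/sectors geometry. */
uint32_t chs_total_sectors(uint32_t cylinders, uint32_t heads,
                           uint32_t sectors_per_track)
{
    /* Assumed limits of the classic CHS scheme (not taken from the post). */
    assert(heads >= 1 && heads <= 255);
    assert(sectors_per_track >= 1 && sectors_per_track <= 63);
    return cylinders * heads * sectors_per_track;
}

/* Size in bytes of an image that holds that many sectors. */
uint64_t image_size_bytes(uint32_t total_sectors)
{
    return (uint64_t)total_sectors * SECTOR_SIZE;
}
```

With the values used above, chs_total_sectors(88, 16, 63) gives 88704, and 88704 sectors of 512 bytes come to 45416448 bytes, a little over 43 MiB.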

Monolithic or Micro Kernels

Small differences

The kernel is the small but very important part of the operating system that acts as a bridge between applications and the low-level resources of the computer. When your computer boots, the kernel is started right after the boot loader has run. I/O and memory are the most important abstractions the kernel provides, and if the kernel can run on several processors, the CPU abstraction matters as well.
There are two main types of kernel: “monolithic kernels” and “microkernels”. A monolithic kernel embeds all the OS services inside the kernel itself. It is probably the easier kind to implement, but a bug in any part of the code is likely to crash the whole kernel. So why is implementing a monolithic kernel not a bad idea? Because this type of kernel is usually faster, since services call each other directly instead of passing messages, and with everything in one place it may end up with fewer bugs and a smaller compiled image. It is also the approach used in traditional Unix-like systems.
Microkernels take a different approach. They implement only minimal OS services, such as memory management, multitasking and inter-process communication, and leave other services, like networking, outside the kernel, to be implemented in user space. This kind of kernel is easier to maintain, and a service can be loaded and unloaded without restarting the kernel. On the other hand, the system as a whole generally uses more memory, and bugs that involve several cooperating services are harder to track down.

The Kernel as an abstraction layer between hardware and software

Starting from Scratch… (It's a nice day for writing a Kernel)

Like many of us when we were kids, I wanted to have my own OS. In those days, when I was nearly 12, it was not easy to find somebody who could teach me how to write an operating system, and it was very difficult to get access to the Internet or even a BBS. Now things are a little easier, and maybe I can make my dream come true.
The first thing I will point out is: “I will not create a boot loader”. This is a crucial decision. GRUB is good enough to prepare things so that my kernel can run properly.
Creating a kernel, whether monolithic or micro, involves dealing with assembly code at some point. But as modern computers have plenty of resources such as RAM and disk, it is better to keep the amount of assembly code small. That is why the only thing I will write in assembly right now is my start-up code. Why? Because a few things have to be stated there, like “excuse me, this code is 32-bit” and “please, CPU, stop running”. So I will create an assembly file that defines a global entry point which calls the kernel’s main.
File: xkernel_boot.asm
%include "grub.inc" ; GRUB comes to the rescue!

[BITS 32]
[global start]
[extern _k_main] ; the kernel's main is a function defined in a cpp file

start:
    call _k_main ; starts the kernel
    cli          ; disables the CPU's interrupts
    hlt          ; stops the CPU

File: grub.inc
;; this is for building a multiboot header
;; a multiboot header must exist for GRUB
;; to load the kernel


MULTIBOOT_PAGE_ALIGN equ 1<<0
MULTIBOOT_MEMORY_INFO equ 1<<1
MULTIBOOT_AOUT_KLUDGE equ 1<<16
MULTIBOOT_HEADER_MAGIC equ 0x1BADB002
MULTIBOOT_HEADER_FLAGS equ MULTIBOOT_PAGE_ALIGN | MULTIBOOT_MEMORY_INFO | MULTIBOOT_AOUT_KLUDGE
MULTIBOOT_CHECKSUM equ -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)
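
The one rule behind MULTIBOOT_CHECKSUM is that the magic, the flags and the checksum must add up to zero as 32-bit unsigned values. A small host-side C sketch of that rule, meant to be compiled on the development machine rather than into the kernel, could look like this. The function names are hypothetical; the constants are the ones from grub.inc.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN   (1u << 0)
#define MULTIBOOT_MEMORY_INFO  (1u << 1)
#define MULTIBOOT_AOUT_KLUDGE  (1u << 16)

/* Returns 1 when magic + flags + checksum wraps around to zero in 32 bits. */
int multiboot_header_is_valid(uint32_t magic, uint32_t flags, uint32_t checksum)
{
    return magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(magic + flags + checksum) == 0u;
}

/* Computes the checksum the same way grub.inc does: -(magic + flags). */
uint32_t multiboot_checksum(uint32_t flags)
{
    uint32_t checksum = (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC + flags));
    assert(multiboot_header_is_valid(MULTIBOOT_HEADER_MAGIC, flags, checksum));
    return checksum;
}
```

For the flags in grub.inc (page align, memory info and the a.out kludge, that is 0x00010003) the checksum comes out as 0xE4514FFB.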

When the boot loader hands control to this piece of code, it calls the k_main defined in xkernel.cpp. Writing a kernel in C++ is a bit harder than in C, because the fact that it is written in C++ does not mean you can use all of the language’s features in a kernel. For example, there is no “new” or “delete”, and no try/catch or dynamic casting either. If you want these operations in the kernel, you have to implement the runtime support they rely on yourself. (TIP: these features can be switched off when compiling and linking.)
So what’s the advantage of writing a kernel in C++? Making it object oriented!
File: xkernel.cpp
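/* XKernel :: by Ezequiel L. Aceto */
#include <iostream> // needed for std::cout; a hosted header, so a freestanding link will complain

// main does not take any arguments in the kernel;
// extern "C" keeps the name unmangled so the assembly stub can find it
extern "C" int k_main()
{
    std::cout << "XKernel" << std::endl;
    // TODO: implement kernel...
    return 0;
}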

Let’s compile!

g++ -c *.cpp -ffreestanding -nostdlib -fno-builtin -fno-rtti -fno-exceptions

-ffreestanding    Compile for a freestanding environment, where the standard library may not exist and program startup is not necessarily at main.
-nostdlib         Do not use the standard system startup files or libraries when linking (it has no effect while only compiling with -c).
-fno-builtin      Don’t recognize built-in functions that do not begin with ‘__builtin_’ as prefix.
-fno-rtti         Disable generation of information about every class with virtual functions for use by the C++ runtime type identification features (‘dynamic_cast’ and ‘typeid’).
-fno-exceptions   Compile with no exceptions support.

From this point on, it is useful to create some functions that drive the video card, so that we can write characters and clear the screen. Let’s name the video driver XKernel_SVC (Simple Video Controller), with its cpp and h files.
Then we compile all the files with the same command we used for xkernel.cpp. As you may have noticed, we also have an assembly file that needs to be assembled. For this you will need ‘nasm’ (the Netwide Assembler), a general-purpose x86 assembler.

nasm -f aout xkernel_boot.asm -o xkernel_boot.o

Then the linking process starts. A linker script is used to simplify it.

OUTPUT_FORMAT("binary")
ENTRY(start)
SECTIONS
{
    .text 0x100000 :
    {
        code = .; _code = .; __code = .;
        *(.text)
        . = ALIGN(4096);
    }

    .data :
    {
        data = .; _data = .; __data = .;
        *(.data)
        . = ALIGN(4096);
    }

    .bss :
    {
        bss = .; _bss = .; __bss = .;
        *(.bss)
        . = ALIGN(4096);
    }
}
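
Each section in the script ends with ALIGN(4096), which moves the location counter up to the next 4 KB boundary so that the following section starts on a fresh page. The same rounding can be written in C. This is only a sketch to show the arithmetic, and align_up is a name chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
 * alignment must be a power of two, as 4096 is in the linker script. */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

An address that is already aligned stays where it is: align_up(0x100000, 4096) is 0x100000, while align_up(0x100001, 4096) is 0x101000.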

Then we link:

ld -T linker.ld -o xkernel.bin xkernel_boot.o xkernel.o xkernel_svc.o

This will cause a few errors, like “undefined reference to ‘std::ios_base::Init::Init()’” or “undefined reference to ‘__dso_handle’”. These errors are caused by C++ runtime features that we have not implemented. To solve them, let’s make a few changes.
File: icxxabi.h

#ifndef _ICXXABI_H
#define _ICXXABI_H

#define ATEXIT_MAX_FUNCS 128
#ifdef __cplusplus
extern "C" {
#endif

typedef unsigned uarch_t;
struct atexit_func_entry_t
{
/*
* Each member is at least 4 bytes large. Such that each entry is 12bytes.
* 128 * 12 = 1.5KB exact.
**/
void (*destructor_func)(void *);
void *obj_ptr;
void *dso_handle;
};
int __cxa_atexit(void (*f)(void *), void *objptr, void *dso);
void __cxa_finalize(void *f);
#ifdef __cplusplus
};
#endif

#endif

File: icxxabi.cpp

#include "./icxxabi.h"

#ifdef __cplusplus
extern "C" {
#endif

atexit_func_entry_t __atexit_funcs[ATEXIT_MAX_FUNCS];
uarch_t __atexit_func_count = 0;

void *__dso_handle = 0; //Attention! Optimally, you should remove the '= 0' part and define this in your asm script.
int __cxa_atexit(void (*f)(void *), void *objptr, void *dso)
{
if (__atexit_func_count >= ATEXIT_MAX_FUNCS) {return -1;};
__atexit_funcs[__atexit_func_count].destructor_func = f;
__atexit_funcs[__atexit_func_count].obj_ptr = objptr;
__atexit_funcs[__atexit_func_count].dso_handle = dso;
__atexit_func_count++;
return 0; /*I would prefer if functions returned 1 on success, but the ABI says...*/
};
void __cxa_finalize(void *f)
{
uarch_t i = __atexit_func_count;
if (!f)
{
/*
* According to the Itanium C++ ABI, if __cxa_finalize is called without a
* function ptr, then it means that we should destroy EVERYTHING MUAHAHAHA!!
*
* TODO:
* Note well, however, that deleting a function from here that contains a __dso_handle
* means that one link to a shared object file has been terminated. In other words,
* We should monitor this list (optional, of course), since it tells us how many links to
* an object file exist at runtime in a particular application. This can be used to tell
* when a shared object is no longer in use. It is one of many methods, however.
**/
//You may insert a printf() here to tell you whether or not the function gets called. Testing
//is CRITICAL!
while (i--) // post-decrement: visits entries count-1 down to 0, and does nothing when the table is empty
{
if (__atexit_funcs[i].destructor_func)
{
/* ^^^ That if statement is a safeguard...
* To make sure we don't call any entries that have already been called and unset at runtime.
* Those will contain a value of 0, and calling a function with value 0
* will cause undefined behaviour. Remember that linear address 0,
* in a non-virtual address space (physical) contains the IVT and BDA.
*
* In a virtual environment, the kernel will receive a page fault, and then probably
* map in some trash, or a blank page, or something stupid like that.
* This will result in the processor executing trash, and...we don't want that.
**/
(*__atexit_funcs[i].destructor_func)(__atexit_funcs[i].obj_ptr);
};
};
return;
};
while (i--) // walk the table from the newest entry down to entry 0
{
/*
* The ABI states that multiple calls to the __cxa_finalize(destructor_func_ptr) function
* should not destroy objects multiple times. Only one call is needed to eliminate multiple
* entries with the same address.
*
* FIXME:
* This presents the obvious problem: all destructors must be stored in the order they
* were placed in the list. I.e: the last initialized object's destructor must be first
* in the list of destructors to be called. But removing a destructor from the list at runtime
* creates holes in the table with unfilled entries.
* Remember that the insertion algorithm in __cxa_atexit simply inserts the next destructor
* at the end of the table. So, we have holes with our current algorithm
* This function should be modified to move all the destructors above the one currently
* being called and removed one place down in the list, so as to cover up the hole.
* Otherwise, whenever a destructor is called and removed, an entire space in the table is wasted.
**/
if (__atexit_funcs[i].destructor_func == f)
{
/*
* Note that in the next line, not every destructor function is a class destructor.
* It is perfectly legal to register a non class destructor function as a simple cleanup
* function to be called on program termination, in which case, it would not NEED an
* object This pointer. A smart programmer may even take advantage of this and register
* a C function in the table with the address of some structure containing data about
* what to clean up on exit.
* In the case of a function that takes no arguments, it will simply be ignored within the
* function itself. No worries.
**/
(*__atexit_funcs[i].destructor_func)(__atexit_funcs[i].obj_ptr);
__atexit_funcs[i].destructor_func = 0;
/*
* Notice that we didn't decrement __atexit_func_count: this is because this algorithm
* requires patching to deal with the FIXME outlined above.
**/
};
};
};

#ifdef __cplusplus
};
#endif

icxxabi solves the errors related to __dso_handle and __cxa_atexit, but not the one about Init(), nor static initialization and destruction. To solve those, it is necessary to implement support for local static variables and, as an extra feature, the ‘new’ and ‘delete’ operators.
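
Going back to the destructor table for a moment: the behaviour expected from __cxa_atexit and __cxa_finalize(0) is “register in order, run newest first, run each entry once”. The following host-side C sketch reproduces only that behaviour so it can be tried outside the kernel. It is not the icxxabi code itself, and every name starting with demo_ is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FUNCS 8

struct demo_entry {
    void (*destructor)(void *);
    void *object;
};

static struct demo_entry demo_table[DEMO_MAX_FUNCS];
static size_t demo_count = 0;

/* Register a destructor; returns 0 on success and -1 when the table is full. */
int demo_atexit(void (*destructor)(void *), void *object)
{
    if (demo_count >= DEMO_MAX_FUNCS)
        return -1;
    demo_table[demo_count].destructor = destructor;
    demo_table[demo_count].object = object;
    demo_count++;
    return 0;
}

/* Run every registered destructor once, newest first. */
void demo_finalize_all(void)
{
    size_t i = demo_count;
    while (i--) { /* visits demo_count-1 down to 0; does nothing when empty */
        if (demo_table[i].destructor) {
            demo_table[i].destructor(demo_table[i].object);
            demo_table[i].destructor = NULL; /* never call it a second time */
        }
    }
}

/* Log used only by the demonstration: one character per destructor call. */
static char demo_log[DEMO_MAX_FUNCS + 1];
static size_t demo_log_len = 0;

const char *demo_log_text(void) { return demo_log; }

/* Sample destructor: appends the character that tag points to. */
void demo_record(void *tag)
{
    assert(demo_log_len < DEMO_MAX_FUNCS);
    demo_log[demo_log_len++] = *(const char *)tag;
    demo_log[demo_log_len] = '\0';
}
```

Registering demo_record for an object tagged 'A' and then for one tagged 'B' and calling demo_finalize_all() leaves "BA" in the log, and a second call to demo_finalize_all() adds nothing.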

Makefile for the Simple C Kernel

Just to do things the right way…

CFLAGS := -fno-stack-protector -fno-builtin -nostdinc -O -g -Wall -I.
CC := g++
AC := as
LD := ld

all: kernel.bin
kernel.bin: kernel_loader.o kernel.o kernel_video.o
	$(LD) -T kernel_linker.ld -o kernel.bin kernel_loader.o kernel.o kernel_video.o
	@echo Done!
kernel_loader.o: kernel_loader.s
	$(AC) -o kernel_loader.o kernel_loader.s
kernel.o: kernel.c
	$(CC) $(CFLAGS) -c -o kernel.o kernel.c
kernel_video.o: kernel_video.c
	$(CC) $(CFLAGS) -c -o kernel_video.o kernel_video.c
clean:
	rm -f *.o *.bin

Simple Kernel in C

Sometimes it’s better to take one step back in order to take two steps forward. On the long journey towards a simple kernel in C++, I decided to start with a simple kernel in C in order to understand how things work. Getting a C++ kernel running, even the simplest one, can be difficult, as out of the box you have no STL in the kernel, no new or delete operators, no virtual methods and no exceptions!
All the kernels (at least the ones I know) have a few files in common:

  • a loader written in assembly
  • a kernel (which may be written in C, C++, Basic, Pascal)
  • a linker file

The loader is the entry point of the kernel. It prepares things so that it can call a function defined in a C file; it is the glue between the boot loader and your kernel.
kernel_loader.s

.global loader # making entry point visible to linker

# setting up the Multiboot header - see GRUB docs for details
.set ALIGN, 1<<0 # align loaded modules on page boundaries
.set MEMINFO, 1<<1 # provide memory map
.set FLAGS, ALIGN | MEMINFO # this is the Multiboot 'flag' field
.set MAGIC, 0x1BADB002 # 'magic number' lets bootloader find the header
.set CHECKSUM, -(MAGIC + FLAGS) # checksum required

.align 4
.long MAGIC
.long FLAGS
.long CHECKSUM

# reserve initial kernel stack space
.set STACKSIZE, 0x4000 # that is, 16k.
.comm stack, STACKSIZE, 32 # reserve 16k stack on a 32-byte boundary


loader:
    mov $(stack + STACKSIZE), %esp # set up the stack
    push %eax # Multiboot magic number
    push %ebx # Multiboot data structure

    mov $start_ctors, %ebx # call the constructors
    jmp 2f
1:
    call *(%ebx)
    add $4, %ebx
2:
    cmp $end_ctors, %ebx
    jb 1b

    call kmain # call kernel proper

    mov $end_dtors, %ebx # call the destructors
    jmp 4f
3:
    sub $4, %ebx
    call *(%ebx)
4:
    cmp $start_dtors, %ebx
    ja 3b # keep going while %ebx is still above the start of the table

    cli
hang:
    hlt # halt cpu
    jmp hang
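
The two loops around the call to kmain walk a table of function pointers: forwards from start_ctors to end_ctors before the kernel runs, and backwards from end_dtors to start_dtors after it returns. In C the same idea could be sketched as below. This is a host-side illustration with invented names (hook_fn, run_forward, run_backward and the trace helpers), not code that goes into the kernel.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_fn)(void);

/* Call every function in [start, end) from first to last (constructor order). */
void run_forward(const hook_fn *start, const hook_fn *end)
{
    assert(start <= end);
    for (const hook_fn *p = start; p < end; ++p)
        (*p)();
}

/* Call every function in [start, end) from last to first (destructor order). */
void run_backward(const hook_fn *start, const hook_fn *end)
{
    assert(start <= end);
    for (const hook_fn *p = end; p > start; )
        (*--p)();
}

/* Tiny trace buffer so the call order can be inspected afterwards. */
static char trace[16];
static size_t trace_len;

void trace_reset(void) { trace_len = 0; trace[0] = '\0'; }
const char *trace_text(void) { return trace; }

static void trace_push(char tag)
{
    assert(trace_len + 1 < sizeof trace);
    trace[trace_len++] = tag;
    trace[trace_len] = '\0';
}

/* Two stand-ins for global constructors or destructors. */
void hook_a(void) { trace_push('a'); }
void hook_b(void) { trace_push('b'); }
```

Running run_forward over a table holding hook_a and hook_b leaves "ab" in the trace, while run_backward over the same table leaves "ba". An empty table, where start equals end, calls nothing.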

 
The kernel itself lives in kernel.c. It checks the magic number and, if everything is OK, prints some messages on the screen by writing directly to video memory.

#define KERNEL_MAGIC_NUMBER 0x2BADB002

#include "kernel_video.h"
extern "C" void kmain( void* mbd, unsigned int magic )
{
    if ( magic != KERNEL_MAGIC_NUMBER )
    {
        kvideo_write_char('\n');
        kvideo_write_str((char*)"Kernel Panic. Invalid Magic Number");
        return;
    }

    // Locked and Loaded
    kvideo_write_char('\n');
    kvideo_write_str((char*)"loading Kernel");
    kvideo_write_char('\n');
    kvideo_write_str((char*)"Work In Progress.");
}

 
The video functions live in the kernel_video files. Basically, they let us write strings and handle some non-printable characters.
kernel_video.h

#ifndef _KVIDEO_H_
#define _KVIDEO_H_

#define _KVIDEO_HORIZONTAL_CHARACTERS 80
#define _KVIDEO_VERTICAL_LINES 25
#define _KVIDEO_RAM_SIZE 2000

void kvideo_clrscr();
void kvideo_handle_non_printable_char(char c);
void kvideo_write_char(char c);
void kvideo_write_str(char* str);


#endif

kernel_video.c

#include "kernel_video.h"

unsigned short* _kvideo_ram = (unsigned short*)0xB8000; // where the video ram starts
unsigned int _kvideo_block_offset = 0; // column within the current line
unsigned int _kvideo_block = 0; // index of the first cell of the current line

void kvideo_clrscr()
{
    unsigned int i = 0;
    for (; i < _KVIDEO_RAM_SIZE; i++)
    {
        _kvideo_ram[i] = (unsigned char)' ' | 0x0700; // space, light grey on black
    }
    _kvideo_block_offset = 0;
    _kvideo_block = 0;
}

void kvideo_handle_non_printable_char(char c)
{
    if (c == '\n')
    {
        // new line: move to the start of the next line
        _kvideo_block += _KVIDEO_HORIZONTAL_CHARACTERS;
        _kvideo_block_offset = 0;
    }
    else if (c == '\r')
    {
        // carriage return: back to the start of the current line
        _kvideo_block_offset = 0;
    }
}

void kvideo_write_char(char c)
{
    if (c < 32) // control characters come before the space (32)
    {
        kvideo_handle_non_printable_char(c);
        return;
    }
    if (_kvideo_block_offset >= _KVIDEO_HORIZONTAL_CHARACTERS)
    {
        // wrap to the next line
        _kvideo_block_offset = 0;
        _kvideo_block += _KVIDEO_HORIZONTAL_CHARACTERS;
    }

    if (_kvideo_block >= _KVIDEO_RAM_SIZE)
    {
        // past the last line: start again from a clean screen
        kvideo_clrscr();
    }
    _kvideo_ram[_kvideo_block + _kvideo_block_offset] = (unsigned char)c | 0x0700;
    _kvideo_block_offset++;
}

void kvideo_write_str(char* str)
{
    while (*str)
    {
        kvideo_write_char(*str);
        str++;
    }
}
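
Two details of this driver are easy to miss: every screen cell is a 16-bit value with the character in the low byte and the colour attribute in the high byte (0x07 is light grey on black), and _KVIDEO_RAM_SIZE is simply 80 columns times 25 lines. The C sketch below spells both out so they can be checked on the development machine. The names vga_cell, vga_attribute and vga_index are made up for this example and are not part of kernel_video.

```c
#include <assert.h>
#include <stdint.h>

#define VGA_COLUMNS 80u
#define VGA_ROWS    25u

/* Pack a character and an attribute byte into one 16-bit text-mode cell. */
uint16_t vga_cell(char c, uint8_t attribute)
{
    return (uint16_t)((uint16_t)(unsigned char)c | ((uint16_t)attribute << 8));
}

/* Attribute byte: low nibble is the foreground colour, high nibble the
 * background. Depending on the video mode settings, the top bit of the
 * background may be shown as blinking instead of as a bright colour. */
uint8_t vga_attribute(uint8_t foreground, uint8_t background)
{
    assert(foreground < 16 && background < 16);
    return (uint8_t)((background << 4) | foreground);
}

/* Index of the cell at (row, column) in a row-major 80x25 buffer. */
unsigned vga_index(unsigned row, unsigned column)
{
    assert(row < VGA_ROWS && column < VGA_COLUMNS);
    return row * VGA_COLUMNS + column;
}
```

For instance, vga_cell('A', vga_attribute(7, 0)) is 0x0741, and the last cell of the screen, vga_index(24, 79), is 1999, one below _KVIDEO_RAM_SIZE.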

 
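With all of these files in place, you just need to assemble, compile and link. (On a 64-bit host you will probably also have to ask each tool for 32-bit output, for example with ‘--32’ for as, ‘-m32’ for g++ and ‘-m elf_i386’ for ld.)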

as -o kernel_loader.o kernel_loader.s
g++ -o kernel_video.o -c kernel_video.c -nostdlib -fno-builtin -nostartfiles -nodefaultlibs -fno-exceptions -fno-rtti -fno-stack-protector
g++ -o kernel.o -c kernel.c -nostdlib -fno-builtin -nostartfiles -nodefaultlibs -fno-exceptions -fno-rtti -fno-stack-protector
ld -T kernel_linker.ld -o kernel.bin kernel_loader.o kernel_video.o kernel.o

That generates a binary file containing the kernel, the video functions and the loader, which can be run on your PC or in a virtual machine like QEMU. To run this binary in QEMU, simply type

qemu -no-kvm -net none -kernel kernel.bin

And you will see something like this: