I want to boot my Kernel on a real machine

Testing and running the kernel in QEMU is nice, and it is faster for development, but it is not the end goal. We want to run on real hardware. The script below builds a bootable disk image with mtools and GRUB.

echo "drive c: file=\"`pwd`/bootable.img\" partition=1" > ~/.mtoolsrc

#create an image (88 cylinders * 16 heads * 63 sectors = 88704 sectors of 512 bytes)
dd if=/dev/zero of=bootable.img count=88704 bs=512
mpartition -I c:
mpartition -c -t 88 -h 16 -s 63 c:
mformat c:
mmd c:/boot
mmd c:/boot/grub

#copy grub bootloader
mcopy grub/grub-0.94-i386-pc/boot/grub/stage1 c:/boot/grub
mcopy grub/grub-0.94-i386-pc/boot/grub/stage2 c:/boot/grub
mcopy grub/grub-0.94-i386-pc/boot/grub/fat_stage1_5 c:/boot/grub

echo "(hd0) bootable.img" > bmap
printf "geometry (hd0) 88 16 63 \n root (hd0,0) \n setup (hd0)\n" | /usr/sbin/grub --device-map=bmap --batch

#copy menu.lst to bootable.img
mcopy menu.lst c:/boot/grub/

mcopy kernel.bin c:/boot/grub/
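
The numbers in the script are tied together: the dd count has to match the cylinder/head/sector geometry given to mpartition and to GRUB. A small sketch of that arithmetic in C++ (the constant names are made up for illustration):

```cpp
#include <cstdint>

// Geometry used by mpartition (-t 88 -h 16 -s 63) and by GRUB's "geometry" command.
constexpr std::uint32_t kCylinders = 88;
constexpr std::uint32_t kHeads = 16;
constexpr std::uint32_t kSectorsPerTrack = 63;
constexpr std::uint32_t kBytesPerSector = 512; // dd bs=512

// Total number of 512-byte sectors: this is the "count" passed to dd.
constexpr std::uint32_t kTotalSectors = kCylinders * kHeads * kSectorsPerTrack;
constexpr std::uint64_t kImageBytes = std::uint64_t{kTotalSectors} * kBytesPerSector;

static_assert(kTotalSectors == 88704, "dd count must match the CHS geometry");
static_assert(kImageBytes == 45416448, "the image is a bit over 43 MiB");
```

If the geometry is changed in one place, the dd count and the GRUB geometry line have to change with it.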
We can test this image using QEMU like this:
qemu -hda bootable.img
Then the image can be written to a pendrive using ‘dd’ or ‘usb-imagewriter’:

  1. sudo apt-get install usb-imagewriter (installs the GUI tool)
  2. sudo dd if=bootable.img of=/dev/sdX (where sdX is the pendrive's device node under /dev)

Starting from Scratch… (It's a nice day for writing a Kernel)

Like most of us when we were kids, I wanted to have my own OS. In those days, when I was about 12, it was not easy to find somebody who could teach me how to write an operating system, and it was very difficult to get access to the Internet or even a BBS. Things are a little easier now, and maybe I can make my dream come true.
The first thing I will point out is: “I will not create a boot loader”. This is a crucial decision. GRUB is good enough to get everything set up so that my kernel can run properly.
Creating a kernel, whether monolithic or a microkernel, involves dealing with assembly code at some point. But since modern computers have plenty of resources such as RAM and disk, it is better to keep the amount of assembly code small. That is why the only thing I will write in assembly for now is my startup code. Why? Because some things need to be stated there, like “excuse me, this code is 32-bit” and “please, CPU, stop running”. So I will create an assembly file that defines a global entry point which calls the kernel’s main.
File: xkernel_boot.asm
%include "grub.inc" ; GRUB comes to rescue!

[BITS 32]
[global start]
[extern _k_main] ; the kernel's main is a function defined in a cpp file.

start: ; entry point exported by [global start]
call _k_main ; starts the kernel
cli ; disables CPU's interrupts
hlt ; stops the CPU

File: grub.inc
;; this is for building a multiboot header
;; a multiboot header must exist for GRUB
;; to load the kernel


When the boot loader executes this piece of code, it will call the k_main defined in xkernel.cpp. Writing a kernel in C++ is a bit harder than in C, because writing it in C++ does not mean you can use all of the language's features in a kernel. For example, there is no “new” or “delete”, and no try/catch or dynamic casting either. If you want these operations in the kernel, you have to implement the parts of the standard C++ library they depend on. (TIP: these features can be switched off when compiling and linking.)
So what’s the advantage of writing a kernel in C++? Making it object oriented!
File: xkernel.cpp
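/* XKernel :: by Ezequiel L. Aceto */
#include <iostream> // hosted header: its static initializer needs runtime support the kernel lacks

// main does not take any arguments in the kernel
int k_main()
{
    std::cout << "XKernel" << std::endl;
    //TODO: implement kernel...
    return 0;
}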

Let’s compile!

g++ -c *.cpp -ffreestanding -nostdlib -fno-builtin -fno-rtti -fno-exceptions

-ffreestanding     Compile for a freestanding environment: the standard library may be missing and program startup is not necessarily at main
-nostdlib          Do not use the standard system startup files or libraries when linking
-fno-builtin       Don’t recognize built-in functions that do not begin with `__builtin_’ as prefix
-fno-rtti          Disable generation of information about every class with virtual functions for use by the C++ runtime type identification features (`dynamic_cast’ and `typeid’)
-fno-exceptions    Compile with no exception support
From this point on, it is useful to create some functions that handle the video card, so we can write characters and clear the screen. Let’s name the video driver XKernel_SVC (Simple Video Controller), split into a cpp and an h file.
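
As a rough idea of what such a driver could look like, here is a minimal sketch assuming the standard VGA colour text mode: an 80x25 grid of 16-bit cells (character in the low byte, attribute in the high byte) that the hardware maps at physical address 0xB8000. The class and method names are hypothetical, not the actual XKernel_SVC interface, and the buffer pointer is passed in so the same code can be exercised on an ordinary array.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical text-mode console; not the real XKernel_SVC API.
class SimpleVideo {
public:
    static constexpr std::size_t kCols = 80;
    static constexpr std::size_t kRows = 25;

    // In a kernel: SimpleVideo video(reinterpret_cast<volatile std::uint16_t*>(0xB8000));
    explicit SimpleVideo(volatile std::uint16_t* buffer, std::uint8_t attribute = 0x07)
        : buffer_(buffer), attribute_(attribute), cursor_(0) {}

    // Fills the whole screen with spaces and moves the cursor to the top-left corner.
    void clear() {
        for (std::size_t i = 0; i < kCols * kRows; ++i) buffer_[i] = cell(' ');
        cursor_ = 0;
    }

    // Writes one character at the cursor; '\n' jumps to the start of the next row.
    // When the end of the screen is reached the cursor wraps to the top (no scrolling).
    void put_char(char c) {
        if (c == '\n') {
            cursor_ = (cursor_ / kCols + 1) * kCols;
        } else {
            buffer_[cursor_++] = cell(c);
        }
        if (cursor_ >= kCols * kRows) cursor_ = 0;
    }

    // Writes a NUL-terminated string.
    void write(const char* s) {
        while (*s) put_char(*s++);
    }

private:
    // Low byte: character code; high byte: colour attribute (0x07 = light grey on black).
    std::uint16_t cell(char c) const {
        return static_cast<std::uint16_t>((attribute_ << 8) | static_cast<std::uint8_t>(c));
    }

    volatile std::uint16_t* buffer_;
    std::uint8_t attribute_;
    std::size_t cursor_;
};
```

Keeping the buffer address out of the class makes it possible to test the cursor logic on a normal machine before it ever touches video memory.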
Then we compile all the files using the same command we used for xkernel.cpp. As you may have noticed, we also have an assembly file that has to be assembled. For this you will need ‘nasm’ (the Netwide Assembler), a general-purpose x86 assembler.

nasm -f aout xkernel_boot.asm -o xkernel_boot.o

Then the linking process starts. A linker script (linker.ld, passed to ld with -T) is used to simplify the process.

SECTIONS
{
  .text 0x100000 : {
    code = .; _code = .; __code = .;
    *(.text)
  }
  .data : {
    data = .; _data = .; __data = .;
    *(.data)
  }
  .bss : {
    bss = .; _bss = .; __bss = .;
    *(.bss)
  }
}

Then we link:

ld -T linker.ld -o xkernel.bin xkernel_boot.o xkernel.o xkernel_svc.o

This will cause a few errors, like “undefined reference to ‘std::ios_base::Init::Init()’” or “undefined reference to ‘__dso_handle’”. These errors are caused by C++ runtime features that are not implemented yet. To solve this, let’s make a few changes.
File: icxxabi.h

#ifndef _ICXXABI_H
#define _ICXXABI_H

#define ATEXIT_MAX_FUNCS 128

#ifdef __cplusplus
extern "C" {
#endif

typedef unsigned uarch_t;

struct atexit_func_entry_t
{
    /*
     * Each member is at least 4 bytes large, so each entry is 12 bytes.
     * 128 * 12 = 1.5KB exact.
     */
    void (*destructor_func)(void *);
    void *obj_ptr;
    void *dso_handle;
};

int __cxa_atexit(void (*f)(void *), void *objptr, void *dso);
void __cxa_finalize(void *f);

#ifdef __cplusplus
}
#endif

#endif /* _ICXXABI_H */


File: icxxabi.cpp

#include "./icxxabi.h"

#ifdef __cplusplus
extern "C" {
#endif

atexit_func_entry_t __atexit_funcs[ATEXIT_MAX_FUNCS];
uarch_t __atexit_func_count = 0;

void *__dso_handle = 0; //Attention! Optimally, you should remove the '= 0' part and define this in your asm script.

int __cxa_atexit(void (*f)(void *), void *objptr, void *dso)
{
    if (__atexit_func_count >= ATEXIT_MAX_FUNCS) {return -1;}
    __atexit_funcs[__atexit_func_count].destructor_func = f;
    __atexit_funcs[__atexit_func_count].obj_ptr = objptr;
    __atexit_funcs[__atexit_func_count].dso_handle = dso;
    __atexit_func_count++; // move on to the next free slot
    return 0; /*I would prefer if functions returned 1 on success, but the ABI says...*/
}

void __cxa_finalize(void *f)
{
    uarch_t i = __atexit_func_count;
    if (!f)
    {
        /*
         * According to the Itanium C++ ABI, if __cxa_finalize is called without a
         * function ptr, then it means that we should destroy EVERYTHING MUAHAHAHA!!
         *
         * Note well, however, that deleting a function from here that contains a __dso_handle
         * means that one link to a shared object file has been terminated. In other words,
         * we should monitor this list (optional, of course), since it tells us how many links to
         * an object file exist at runtime in a particular application. This can be used to tell
         * when a shared object is no longer in use. It is one of many methods, however.
         */
        //You may insert a printf() here to tell you whether or not the function gets called during testing.
        while (i--) // i is unsigned: "i--" visits count-1 down to 0 and then stops
        {
            if (__atexit_funcs[i].destructor_func)
            {
                /* ^^^ That if statement is a safeguard...
                 * To make sure we don't call any entries that have already been called and unset at runtime.
                 * Those will contain a value of 0, and calling a function with value 0
                 * will cause undefined behaviour. Remember that linear address 0,
                 * in a non-virtual address space (physical) contains the IVT and BDA.
                 *
                 * In a virtual environment, the kernel will receive a page fault, and then probably
                 * map in some trash, or a blank page, or something stupid like that.
                 * This will result in the processor executing trash, and...we don't want that.
                 */
                (*__atexit_funcs[i].destructor_func)(__atexit_funcs[i].obj_ptr);
            }
        }
        return;
    }

    while (i--)
    {
        /*
         * The ABI states that multiple calls to the __cxa_finalize(destructor_func_ptr) function
         * should not destroy objects multiple times. Only one call is needed to eliminate multiple
         * entries with the same address.
         *
         * FIXME:
         * This presents the obvious problem: all destructors must be stored in the order they
         * were placed in the list. I.e: the last initialized object's destructor must be first
         * in the list of destructors to be called. But removing a destructor from the list at runtime
         * creates holes in the table with unfilled entries.
         * Remember that the insertion algorithm in __cxa_atexit simply inserts the next destructor
         * at the end of the table. So, we have holes with our current algorithm.
         * This function should be modified to move all the destructors above the one currently
         * being called and removed one place down in the list, so as to cover up the hole.
         * Otherwise, whenever a destructor is called and removed, an entire space in the table is wasted.
         */
        if (__atexit_funcs[i].destructor_func == f)
        {
            /*
             * Note that in the next line, not every destructor function is a class destructor.
             * It is perfectly legal to register a non class destructor function as a simple cleanup
             * function to be called on program termination, in which case, it would not NEED an
             * object This pointer. A smart programmer may even take advantage of this and register
             * a C function in the table with the address of some structure containing data about
             * what to clean up on exit.
             * In the case of a function that takes no arguments, it will simply be ignored within the
             * function itself. No worries.
             */
            (*__atexit_funcs[i].destructor_func)(__atexit_funcs[i].obj_ptr);
            __atexit_funcs[i].destructor_func = 0;
            /*
             * Notice that we didn't decrement __atexit_func_count: this is because this algorithm
             * requires patching to deal with the FIXME outlined above.
             */
        }
    }
}

#ifdef __cplusplus
}
#endif
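
To see the table logic in isolation, here is a small host-side sketch of the same idea with hypothetical names (register_exit, run_all_exits) instead of the reserved __cxa_ ones, so it can be compiled and run as part of an ordinary program. It shows why the table has to be walked from the last entry down to index 0: handlers must run in the reverse order of registration. Unlike the listing above, this sketch also clears each slot before calling it, which is a design choice of the example and not something the ABI text quoted here requires.

```cpp
#include <cstddef>

constexpr std::size_t kMaxExitFuncs = 128;

struct ExitEntry {
    void (*func)(void*);
    void* arg;
};

static ExitEntry g_exit_funcs[kMaxExitFuncs];
static std::size_t g_exit_count = 0;

// Mirrors __cxa_atexit: 0 on success, -1 when the table is full.
int register_exit(void (*func)(void*), void* arg) {
    if (g_exit_count >= kMaxExitFuncs) return -1;
    g_exit_funcs[g_exit_count].func = func;
    g_exit_funcs[g_exit_count].arg = arg;
    ++g_exit_count;
    return 0;
}

// Mirrors __cxa_finalize(0): runs every remaining handler, newest first.
void run_all_exits() {
    std::size_t i = g_exit_count;
    while (i--) {                           // visits count-1 ... 0, safe when count is 0
        if (g_exit_funcs[i].func) {
            void (*func)(void*) = g_exit_funcs[i].func;
            g_exit_funcs[i].func = nullptr; // never run the same handler twice
            func(g_exit_funcs[i].arg);
        }
    }
}

// Example handler: appends the character it receives to a small log.
static char g_log[8];
static std::size_t g_log_len = 0;
void log_char(void* arg) {
    if (g_log_len < sizeof(g_log) - 1) g_log[g_log_len++] = *static_cast<char*>(arg);
}
```

Registering log_char three times with 'a', 'b' and 'c' and then calling run_all_exits() leaves the log as "cba".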

icxxabi will solve the errors related to __dso_handle and __cxa_atexit, but not the ones related to Init() and static initialization and destruction. To solve those, it is necessary to implement support for local static variables and, as an extra feature, the ‘new’ and ‘delete’ operators.
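
For the ‘new’ and ‘delete’ part, one simple starting point is a bump allocator over a fixed arena. The sketch below is only an assumption about how this could be done, with hypothetical names (arena_alloc, KernelObject, Console). It uses class-level operators so it does not replace the global ones, and it never gives memory back.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size arena; a real kernel would get this memory from its memory manager.
alignas(16) static unsigned char g_arena[4096];
static std::size_t g_arena_used = 0;

// Returns `size` bytes aligned to 16, or nullptr when the arena is exhausted.
void* arena_alloc(std::size_t size) {
    const std::size_t aligned = (size + 15u) & ~static_cast<std::size_t>(15u);
    if (aligned < size || aligned > sizeof(g_arena) - g_arena_used) return nullptr;
    void* p = g_arena + g_arena_used;
    g_arena_used += aligned;
    return p;
}

// Classes deriving from KernelObject are allocated from the arena by `new`.
struct KernelObject {
    // noexcept: on failure `new` yields nullptr instead of throwing (no exceptions in the kernel).
    static void* operator new(std::size_t size) noexcept { return arena_alloc(size); }
    // A bump allocator cannot free individual blocks, so delete is a no-op.
    static void operator delete(void*) noexcept {}
};

// Example type, made up for this sketch.
struct Console : KernelObject {
    int row = 0;
    int col = 0;
};
```

With this in place, `Console* c = new Console;` works without any standard library behind it, as long as the caller checks the result for nullptr.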