Starting from Scratch… (It's a nice day for writing a Kernel)

As most of us wished when we were kids, I also wanted to have my own OS. In those days, when I was nearly 12, it was not easy to find somebody that teaches me how to write an operating system, also, it was very difficult to find access to Internet or even a BBS. Now things a little bit easier, and may be I can my dream come true.
The first thing I will point out is… “I will not create a boot loader”. And this is a crucial decision. GRUB is good enough in order to let things prepared so my kernel can run properly.
Creating a kernel, either a monolithic or a microkernel, involves dealing in some point with assembly code. But as modern computers are plenty of resources like RAM and disk, it’s better to reduce the amount of assembly code. That’s why the only thing I will write in assembly right now, it’s my start up code. Why? Because there are some things that needs to be set like, “excuse me, this code is 32-bit”, and “please CPU, stop running”. That is why I will create an assembly file that defines a global function that calls the kernel’s main.
File: xkernel_boot.asm
%include "" ; GRUB comes to rescue!

[BITS 32]
[global start]
[extern _k_main] ;the kernel's main is a function defined in a cpp file.

call _k_main ;starts the kernel
cli ; disables CPU's interrupts
hlt ; stops the CPU

;; this is for building a multiboot header
;; a multiboot header must exist for GRUB
;; to load the kernel


When the boot loader executes this piece of code, it will call the k_main defined in kernel.cpp. Writing a kernel in C++ is a bit harder than in C. Because the fact that it’s done in C++ doesn’t mean that you can use all the languages improvement in a kernel. For example, there is no “new” or “delete”, not even try/catch or dynamic casting. If you want this operations in the kernel, you have to implement the standard C++ library. (TIP: this features can be switched off when compiling and linking)
So what’s the advantage of making a Kernel in C++? Making it object oriented!
File: xkernel.cpp
/* XKernel :: by Ezequiel L. Aceto */
// main does not take any argument in the kernel

int k_main()
std::cout << "XKernel" << std::endl;
//TODO: implement kernel...
return 0;

Let’s compile!

g++ -c *.cpp -ffreestanding -nostdlib -fno-builtin -fno-rtti -fno-exceptions

-ffreestanding        A freestanding environment is one that may not have stdlib, and may not have a main
-nostdlib               No stdlib
-fno-builtin           Don’t recognize built-in functions that do not begin with `__builtin_’ as prefix.
-fno-rtti                Disable generation of information about every class with virtual functions for use by the C++ runtime type identification features (`dynamic_cast’ and `typeid’).
-fno-exceptions     Compile with no exceptions support
From this point on, it is always useful to create some functions that handles the video card, in order to write chars, clean the screen. Let’s name the video driver XKernel_SVC (Simple Video Controller) cpp and h.
Then we compile all the files using the same command we used for compiling the xkernel.cpp. As you may noticed, we have an assembly file which should also be compiled. For this purpose you will need ‘nasm’ (Netwide Assembler’) which is a general purpose x86 assembler.

nasm -f aout xkernel_boot.asm -o xkernel_boot.o

Then, the Linking process starts. But a Linker script will be used in order to simplify the process.

.text 0x100000:
code =.;_code =.;__code=.;

data =.;_data =.;__data=.;

bss =.;_bss =.;__bss=.;


Then we link:

ld -T linker.ld -o xkernel.bin xkernel_boot.o xkernel.o xkernel_svc.o

This will cause a few errors, like “undefined reference to ‘std::ios_base::Init::Init()'” or “undefined reference to ‘__dso_handle’. This errors are caused by unimplemented c++ features. In order to solve this, let’s make a few changes.
File: icxxabi.h

#ifndef _ICXXABI_H
#define _ICXXABI_H

#define ATEXIT_MAX_FUNCS 128
#ifdef __cplusplus
extern "C" {

typedef unsigned uarch_t;
struct atexit_func_entry_t
* Each member is at least 4 bytes large. Such that each entry is 12bytes.
* 128 * 12 = 1.5KB exact.
void (*destructor_func)(void *);
void *obj_ptr;
void *dso_handle;
int __cxa_atexit(void (*f)(void *), void *objptr, void *dso);
void __cxa_finalize(void *f);
#ifdef __cplusplus


File: icxxabi.cpp

#include "./icxxabi.h"

#ifdef __cplusplus
extern "C" {

atexit_func_entry_t __atexit_funcs[ATEXIT_MAX_FUNCS];
uarch_t __atexit_func_count = 0;

void *__dso_handle = 0; //Attention! Optimally, you should remove the '= 0' part and define this in your asm script.
int __cxa_atexit(void (*f)(void *), void *objptr, void *dso)
if (__atexit_func_count >= ATEXIT_MAX_FUNCS) {return -1;};
__atexit_funcs[__atexit_func_count].destructor_func = f;
__atexit_funcs[__atexit_func_count].obj_ptr = objptr;
__atexit_funcs[__atexit_func_count].dso_handle = dso;
return 0; /*I would prefer if functions returned 1 on success, but the ABI says...*/
void __cxa_finalize(void *f)
uarch_t i = __atexit_func_count;
if (!f)
* According to the Itanium C++ ABI, if __cxa_finalize is called without a
* function ptr, then it means that we should destroy EVERYTHING MUAHAHAHA!!
* Note well, however, that deleting a function from here that contains a __dso_handle
* means that one link to a shared object file has been terminated. In other words,
* We should monitor this list (optional, of course), since it tells us how many links to
* an object file exist at runtime in a particular application. This can be used to tell
* when a shared object is no longer in use. It is one of many methods, however.
//You may insert a prinf() here to tell you whether or not the function gets called. Testing
while (--i)
if (__atexit_funcs[i].destructor_func)
/* ^^^ That if statement is a safeguard...
* To make sure we don't call any entries that have already been called and unset at runtime.
* Those will contain a value of 0, and calling a function with value 0
* will cause undefined behaviour. Remember that linear address 0,
* in a non-virtual address space (physical) contains the IVT and BDA.
* In a virtual environment, the kernel will receive a page fault, and then probably
* map in some trash, or a blank page, or something stupid like that.
* This will result in the processor executing trash, and...we don't want that.
for ( ; i >= 0; )
* The ABI states that multiple calls to the __cxa_finalize(destructor_func_ptr) function
* should not destroy objects multiple times. Only one call is needed to eliminate multiple
* entries with the same address.
* This presents the obvious problem: all destructors must be stored in the order they
* were placed in the list. I.e: the last initialized object's destructor must be first
* in the list of destructors to be called. But removing a destructor from the list at runtime
* creates holes in the table with unfilled entries.
* Remember that the insertion algorithm in __cxa_atexit simply inserts the next destructor
* at the end of the table. So, we have holes with our current algorithm
* This function should be modified to move all the destructors above the one currently
* being called and removed one place down in the list, so as to cover up the hole.
* Otherwise, whenever a destructor is called and removed, an entire space in the table is wasted.
if (__atexit_funcs[i].destructor_func == f)
* Note that in the next line, not every destructor function is a class destructor.
* It is perfectly legal to register a non class destructor function as a simple cleanup
* function to be called on program termination, in which case, it would not NEED an
* object This pointer. A smart programmer may even take advantage of this and register
* a C function in the table with the address of some structure containing data about
* what to clean up on exit.
* In the case of a function that takes no arguments, it will simply be ignore within the
* function itself. No worries.
__atexit_funcs[i].destructor_func = 0;
* Notice that we didn't decrement __atexit_func_count: this is because this algorithm
* requires patching to deal with the FIXME outlined above.

#ifdef __cplusplus

Icxxabi will solve the errors related with __dso_handle and __cxa_atexit, but not with Init() and static initialization and destruction. For solving this error, it is necessary to implement local static variables, and as an extra feature, the ‘new’ and ‘delete’ operator.

Makefile for the Simple C Kernel

Just to do things in the right way…

CFLAGS := -fno-stack-protector -fno-builtin -nostdinc -O -g -Wall -I.
CC := g++
AC := as
LD := ld

all: kernel.bin
kernel.bin: kernel_loader.o kernel.o kernel_video.o
$(LD) -T kernel_linker.ld -o kernel.bin kernel_loader.o kernel.o kernel_video.o
@echo Done!
kernel_loader.o: kernel_loader.s
$(AC) -o kernel_loader.o kernel_loader.s
main.o: kernel.c
$(CC) $(CFLAGS) -c -o kernel.o kernel.c
kernel_video.o: kernel_video.c
$(CC) $(CFLAGS) -c -o kernel_video.o kernel_video.c
rm -f *.o *.bin


Simple Kernel in C

Sometimes it’s better to take one step back in order to take two step forwards. In the long journey until a get a simple Kernel in C++, I decided to start with a simple Kernel in C in order to understand how things works. Getting a C++ Kernel, even the simplest one, can be difficult as you will have no STL in the kernel, no new or delete operators, no virtual methods, no exceptions!
All the kernels (at least the one I know) have a few files in common:

  • a boot loader written in assembly
  • a kernel (may be written in C,C++,Basic, Pascal)
  • a linker file

The boot loader it is the entry point for the kernel. It prepares things in order to call a function defined in a C file. It glue things between your kernel and the boot loader.

.global loader # making entry point visible to linker

# setting up the Multiboot header - see GRUB docs for details
.set ALIGN, 1<<0 # align loaded modules on page boundaries
.set MEMINFO, 1<<1 # provide memory map
.set FLAGS, ALIGN | MEMINFO # this is the Multiboot 'flag' field
.set MAGIC, 0x1BADB002 # 'magic number' lets bootloader find the header
.set CHECKSUM, -(MAGIC + FLAGS) # checksum required

.align 4
.long MAGIC
.long FLAGS

# reserve initial kernel stack space
.set STACKSIZE, 0x4000 # that is, 16k.
.comm stack, STACKSIZE, 32 # reserve 16k stack on a quadword boundary

mov $(stack + STACKSIZE), %esp # set up the stack
push %eax # Multiboot magic number
push %ebx # Multiboot data structure

mov $start_ctors, %ebx # call the constructors
jmp 2f
call *(%ebx)
add $4, %ebx
cmp $end_ctors, %ebx
jb 1b

call kmain # call kernel proper
mov $end_dtors, %ebx # call the destructors
jmp 4f
sub $4, %ebx
call *(%ebx)
cmp $start_dtors, %ebx
jb 3b
hlt # halt cpu
jmp hang

The Kernel will be coded in kernel.c. Where you can find that there is a Magic Number check, and if things goes ok, it will print some messages into the screen writing directly to the Video Memory.


#include "kernel_video.h"
extern "C" void kmain( void* mbd, unsigned int magic )
if ( magic != KERNEL_MAGIC_NUMBER )
kvideo_write_str((char*)"Kernel Panic. Invalid Magic Number");


// Locked and Loaded
kvideo_write_str((char*)"loading Kernel");
kvideo_write_str((char*)"Work In Progress.");

The video functions are coded in the kernel_video files. And basically let us write strings and handle some non printable chars.

#ifndef _KVIDEO_H_
#define _KVIDEO_H_

#define _KVIDEO_RAM_SIZE 2000

void kvideo_clrscr();
void kvideo_handle_non_printable_char(char c);
void kvideo_write_char(char c);
void kvideo_write_str(char* str);


#include "kernel_video.h"

unsigned short* _kvideo_ram = (unsigned short*)0xB8000; // where the video ram starts
unsigned int _kvideo_block_offset = 0;
unsigned int _kvideo_block = 0;

void kvideo_clrscr()
unsigned int i = 0;
for (;i < _KVIDEO_RAM_SIZE;i++)
_kvideo_ram[i] = (unsigned char)' ' | 0x0700;
_kvideo_block_offset = 0;
_kvideo_block = 0;

void kvideo_handle_non_printable_char(char c)
if (c == 'n')
_kvideo_block_offset = 0;
else if (c == 'r')
_kvideo_block_offset = 0;

void kvideo_write_char(char c)
if (c < 30)
if (_kvideo_block_offset >= _KVIDEO_HORIZONTAL_CHARACTERS)
_kvideo_block_offset = 0;

if (_kvideo_block >= _KVIDEO_RAM_SIZE)
_kvideo_ram[_kvideo_block + _kvideo_block_offset] = (unsigned char)c | 0x0700;

void kvideo_write_str(char* str)
while (*str)

With all of this files, you just need to build and link using

as -o kernel_loader.o kernel_loader.s
g++ -o kernel_video.o -c kernel_video.c -nostdlib -fno-builtin -nostartfiles -nodefaultlibs -fno-exceptions -fno-rtti -fno-stack-protector
g++ -o kernel.o -c kernel.c -nostdlib -fno-builtin -nostartfiles -nodefaultlibs -fno-exceptions -fno-rtti -fno-stack-protector
ld -T kernel_linker.ld -o kernel.bin kernel_loader.o kernel_video.o kernel.o

And that will generate a binary file with the kernel, video functions and loader that can be run in your PC or in a virtual machine like QEMU. In order to execute this binary in QEMU, simple write

qemu -no-kvm -net none -kernel kernel.bin

And you will see something like this:

Python and PyGame meet WebOS =)

The PalmPDK includes a toolchain of GCC in order to build C and C++ applications, and Python is a nice one. The idea behind this is to allow webOS to run Python apps, but what is more important for my is PyGame. If Python is able to run in webOS and PyGame works without much modifications, then a lot of game can be ported to webOS, and very nice applications can be done.
I wrote this idea in the Palm WebOS C/C++ a long time ago, and after a few tries without success, Thomas Perl got it working with the help of the WebOS Internals PDK He did a great job there, and also he is making a library for binding the Plug-In API using SWIG.
My idea now that this is working, and it works nice as you can see in Thomas’ website, is to create a “Python environment” application and a “template application”. The Python environment application will just check if you have the lastest binaries of Python + PyGame inside your SD Card (The complete Python and PyGame lib is nearly 27MBytes). If not, it will download it, and install in on the SD Card.
Once this is done, then it comes the time for the “template application”. This will be a standard SDL project, that will check if you have the “Python environment”. If not it will redirect you to the App. Catalog in order to install it. But if you have it, it will launch “”, a file bundled in this project from where you can start writing your own python application.
The idea behind this is to make Python a popular language for creating games and application for WebOS. May be in a future even it’s bundled by default in WebOS and this will turn into a nice experience but useless. Until that happens, this solution seems to be very good.

webOS low level programming – basic app

For Windows and Mac OS, Palm provides a low level SDK called Plug-in Development Kit (PDK) which allows you to develop C and C++ native applications or plug-ins that can be called from a web app. In the OS X version of the PDK, a plug-in is installed in XCode (yes, you can use another IDE, but the installer gives a beautiful wizard for Xcode). So let’s create a simple application!

Inside Xcode, go to ‘File’ and click on ‘New Project’ (or Shift + Command + N). You will see a new template called SDL Application, and that is what we are going to create.

This action will create a project with a simple structure that includes some necessary files in order to run the application on your Mac. Yes, you can (fully) develop an SDL application in your Mac (or PC) and then use nearly the same code inside the Palm Pre. What is interesting about this project wizard is that includes all the header files needed, and a script file to build the project for the device.

I will add ‘-lpdl’ to ‘Other Linker Flags’which will let me do some specific calls to the OS. This has to be handle will care. Due to the fact that this functions are only available in webOS, you should place some define to avoid executing and compiling that code when building the app for the desktop simulator.

Let’s create an empty main.cpp file with it’s header file, and the start building the application step by step

Main.h includes SDL header file and also some Palm’s headers.  What is more important, it will have definitions that will help running the app in the Mac and in the device properly.

Main.cpp  has functions to initialize the screen, setup the device and handles events from the user interface.

The most important part of this tutorial comes now. How it looks like in the emulator? and how to build and deploy this application into the device?

For running this application on the emulator, comment the RUNNING_ON_DEVICE definition and press ‘Build and Run’ or ‘Build and Debug’ in Xcode. You will only see something a blank screen of 320×480 pixels as this application doesn’t do something more instersting.

Running inside the Palm Pre requires going to the terminal and editing a few script files. The wizard generates a file called ‘’ that gives us a hand with the build process but doesn’t help too much in the deployment process. This file must be editing in order to know where is the source code and how the output file is called. It will looks similar to this

As you may noticed, if you run this script, you will get a binary file called ‘simple’ app. Eventhough you can copy this file to the device and run the code, the recommended way is to generate a package and install that package using palm tools. So the next step involves packaging the application. As in the web application a description file is requiered. I created a STAGING directory and inside that directory created a file called ‘appinfo.json’ with the following content.

As this is a native application it is very important to make our binary fila as executable, using ‘echo filemode.755=simpleapp >’.

Once this files are created, we can call palm-package with STAGING as argument, and the install the package using palm-install.

This action installs the application and places an icon in the launcher. Clicking the icon runs the full screen application.