Chapter 8. System Calls
8.1. System Calls
So far, all we've done is use well-defined kernel mechanisms to register /proc files and
device handlers. This is fine if you want to do something the kernel programmers thought you'd want, such as
write a device driver. But what if you want to do something unusual, to change the behavior of the system in
some way? Then, you're mostly on your own.
This is where kernel programming gets dangerous. While writing the example below, I killed the open()
system call. This meant I couldn't open any files, I couldn't run any programs, and I couldn't shut down the
computer. I had to pull the power switch. Luckily, no files died. To ensure you won't lose any files either,
please run sync right before you do the insmod and the rmmod.
Forget about /proc files, forget about device files. They're just minor details. The real process-to-kernel
communication mechanism, the one used by all processes, is system calls. When a process requests a service
from the kernel (such as opening a file, forking to a new process, or requesting more memory), this is the
mechanism used. If you want to change the behavior of the kernel in interesting ways, this is the place to do
it. By the way, if you want to see which system calls a program uses, run strace <command> <arguments>.
In general, a process is not supposed to be able to access the kernel. It can't access kernel memory and it can't
call kernel functions. The hardware of the CPU enforces this (that's the reason why it's called `protected
mode').
System calls are an exception to this general rule. What happens is that the process fills the registers with the
appropriate values and then executes a special instruction which jumps to a previously defined location in the
kernel (of course, that location is readable by user processes, but it is not writable by them). On Intel CPUs,
this is done by means of interrupt 0x80. The hardware knows that once you jump to this location, you are no
longer running in restricted user mode, but as the operating system kernel, and therefore you're allowed to
do whatever you want.
The location in the kernel a process can jump to is called system_call. The procedure at that location checks
the system call number, which tells the kernel what service the process requested. Then, it looks at the table of
system calls (sys_call_table) to find the address of the kernel function to call. Then it calls the function,
and after it returns, does a few system checks and then returns to the process (or to a different process, if
the process's time ran out). If you want to read this code, it's in the source file
arch/<architecture>/kernel/entry.S, after the line ENTRY(system_call).
So, if we want to change the way a certain system call works, what we need to do is to write our own function
to implement it (usually by adding a bit of our own code, and then calling the original function) and then
change the pointer in sys_call_table to point to our function. Because our module might be removed later and
we don't want to leave the system in an unstable state, it's important for cleanup_module to restore the
table to its original state.
The source code here is an example of such a kernel module. We want to `spy' on a certain user, and to
printk() a message whenever that user opens a file. To this end, we replace the system call to open a
file with our own function, called our_sys_open. This function checks the uid (user ID) of the current
process, and if it's equal to the uid we spy on, it calls printk() to display the name of the file to be opened.
Then, either way, it calls the original open() function with the same parameters, to actually open the file.