Removing All Syscall Invocations from Kernel Space

There's an effort under way to reduce and ultimately remove all system call invocations from within kernel space. Dominik Brodowski was leading this effort, and he posted some patches to remove a lot of instances from the kernel. Among other things, he said, these patches would make it easier to clean up and optimize the syscall entry points, and also easier to clean up the parts of the kernel that still needed to pretend to be in userspace, just so they could keep using syscalls.

The rationale behind these patches, as expressed by Andy Lutomirski, ultimately was to prevent user code from ever gaining access to kernel memory. Sharing syscalls between kernel space and user space made that impossible at the moment. Andy hoped the patches would go into the kernel quickly, without needing to wait for further cleanup.

Linus Torvalds had absolutely no criticism of these patches, and he indicated that this was a well desired change. He offered to do a little extra housekeeping himself with the kernel release schedule to make Dominik's tasks easier. Linus also agreed with Andy that any cleanup effort could wait—he didn't mind accepting ugly patches to update the syscall calling conventions first, and then accept the cleanup patches later.

Ingo Molnar predicted that with Dominik's changes, the size of the compiled kernel would decrease—always a good thing. But Dominik said no, and in fact he ran some quick numbers for Ingo and found that with his patches, the compiled kernel was actually a few bytes larger. Ingo was surprised but not mortified, saying the slight size increase would not be a showstopper.

This project is similar—although maybe smaller in scope—to the effort to get rid of the big kernel lock (BKL). In the case of the BKL, no one could figure out for years even how to begin to replace it, until finally folks decided to convert all BKL instances into identical local implementations that could be replaced piecemeal with more specialized and less heavyweight locks. After that, it was just a question of slogging through each one until finally even the most finicky instances were replaced with more specialized locking code.

Dominik seems to be using a similar technique now, in which areas of the kernel that still need syscalls can masquerade as user space, while areas of the kernel that are easier to fix get cleaned up first.

Note: if you're mentioned above and want to post a response above the comment section, send a message with your response text to