How and Why for BPF
BPF is a powerful component in the Linux kernel and the tools that make use of it are vastly varied and numerous. In this article we examine the general usefulness of BPF and guide you on a path towards taking advantage of BPF’s utility and power. One aspect of BPF, like many technologies, is that at first blush it can appear overwhelming. We seek to remove that feeling and to get you started.
What is BPF?
BPF is the name, and no longer an acronym, but it was originally Berkeley Packet Filter and then eBPF for Extended BPF, and now just BPF. BPF is a kernel and user-space observability scheme for Linux.
A description is that BPF is a verified-to-be-safe, fast to switch-to, mechanism, for running code in Linux kernel space to react to events such as function calls, function returns, and trace points in kernel or user space.
To use BPF one runs a program that is translated to instructions that will be run in kernel space. Those instructions may be interpreted or translated to native instructions. For most users it doesn’t matter the exact nature.
While in the kernel, the BPF code can perform actions for events, like, create stack traces, count the events or collect counts into buckets for histograms.
Through this BPF programs provide both fast and immensely powerful and flexible means for deep observability of what is going on in the Linux kernel or in user space. Observability into user space from kernel space is possible, of course, because the kernel can control and observe code executing in user mode.
Running BPF programs amounts to having a user program make BPF system calls which are checked for appropriate privileges and verified to execute within limits. For example, in the Linux kernel version 5.4.44, the BPF system call checks for privilege with:
if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN)) return -EPERM;
The BPF system call checks for a sysctl controlled value and for a capability. The sysctl variable can be set to one with the command
but to set it to zero you must reboot and make sure to not have your system configured to set it to one at boot time.
Because BPF is doing the work in kernel space significant time and overhead is saved avoiding context switches and by not necessitating transferring large amounts of data back to user space.
Not all kernel functions can be traced. For example if you were to try
funccount-bpfcc '*_copy_to_user' you may get output like:
cannot attach kprobe, Invalid argument Failed to attach BPF program b'trace_count_3' to kprobe b'_copy_to_user'
This is kind of mysterious. If you check the output from dmesg you would see something like: