The Linux kernel has always been an ideal place to implement monitoring/observability, networking, and security. Unfortunately this was often impractical as it required changing kernel source code or loading kernel modules, and resulted in layers of abstractions stacked on top of each other. eBPF is a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. By making the Linux kernel programmable, infrastructure software can leverage existing layers, making them more intelligent and feature-rich without continuing to add additional layers of complexity to the system.
eBPF has resulted in the development of a completely new generation of tooling in areas such as networking, security, application profiling/tracing and performance troubleshooting that no longer rely on existing kernel functionality but instead actively reprogram runtime behavior without compromising execution efficiency or safety.
eBPF.io is a place for everybody to learn and collaborate on the topic of eBPF. eBPF is an open community and everybody can particpate and share. Whether you want to read a first introduction to eBPF, find further reading material or make your first steps to becoming contributors to major eBPF projects, eBPF.io will help you along the way.
The following chapters are a quick introduction into eBPF. If you would like to learn more about eBPF, see the eBPF & XDP Reference Guide. Whether you are a developer looking to build an eBPF program, or interested in leveraging a solution that uses eBPF, it is useful to understand the basic concepts and architecture.
eBPF programs are event-driven and are run when the kernel or an application passes a certain hook point. Pre-defined hooks include system calls, function entry/exit, kernel tracepoints, network events, and several others.
If a predefined hook does not exist for a particular need, it is possible to create a kernel probe (kprobe) or user probe (uprobe) to attach eBPF programs almost anywhere in kernel or user applications.
In a lot of scenarios, eBPF is not used directly but indirectly via projects like Cilium, bcc, or bpftrace which provide an abstraction on top of eBPF and do not require to write programs directly but instead offer the ability to specify intent-based definitions which are then implemented with eBPF.
If no higher-level abstraction exists, programs need to be written directly. The Linux kernel expects eBPF programs to be loaded in the form of bytecode. While it is of course possible to write bytecode directly, the more common development practice is to leverage a compiler suite like LLVM to compile pseudo-C code into eBPF bytecode.
When the desired hook has been identified, the eBPF program can be loaded into the Linux kernel using the bpf system call. This is typically done using one of the available eBPF libraries. The next section provides an introduction into the available development toolchains.
As the program is loaded into the Linux kernel, it passes through two steps before being attached to the requested hook:
The verification step ensures that the eBPF program is safe to run. It validates that the program meets several conditions, for example:
The Just-in-Time (JIT) compilation step translates the generic bytecode of the program into the machine specific instruction set to optimize execution speed of the program. This makes eBPF programs run as efficiently as natively compiled kernel code or as code loaded as a kernel module.
A vital aspect of eBPF programs is the ability to share collected information and to store state. For this purpose, eBPF programs can leverage the concept of eBPF maps to store and retrieve data in a wide set of data structures. eBPF maps can be accessed from eBPF programs as well as from applications in user space via a system call.
The following is an incomplete list of supported map types to given an understanding of the diversity in data structures. For various map types, both a shared and a per-CPU variation is available.
eBPF programs cannot call into arbitrary kernel functions. Allowing this would bind eBPF programs to particular kernel versions and would complicate compatibility of programs. Instead, eBPF programs can make function calls into helper functions, a well-known and stable API offered by the kernel.
The set of available helper calls is constantly evolving. Examples of available helper calls:
eBPF programs are composable with the concept of tail and function calls. Function calls allow defining and calling functions within an eBPF program. Tail calls can call and execute another eBPF program and replace the execution context, similar to how the execve() system call operates for regular processes.
With great power there must also come great responsibility.
eBPF is an incredibly powerful technology and now runs at the heart of many critical software infrastructure components. During the development of eBPF, the safety of eBPF was the most crucial aspect when eBPF was considered for inclusion into the Linux kernel. eBPF safety is ensured through several layers:
Unless unprivileged eBPF is enabled, all processes that intend to load eBPF programs into the Linux kernel must be running in privileged mode (root) or require the capability CAP_BPF. This means that untrusted programs cannot load eBPF programs.
If unprivileged eBPF is enabled, unprivileged processes can load certain eBPF programs subject to a reduced functionality set and with limited access to the kernel.
If a process is allowed to load an eBPF program, all programs still pass through the eBPF verifier. The eBPF verifier ensures the safety of the program itself. This means, for example:
Upon successful completion of the verification, the eBPF program runs through a hardening process according to whether the program is loaded from a privileged or unprivileged process. This step includes:
eBPF programs cannot access arbitrary kernel memory directly. Access to data and data structures that lie outside of the context of the program must be accessed via eBPF helpers. This guarantees consistent data access and makes any such access subject to the privileges of the eBPF program, e.g. an eBPF program running is allowed to modify the data of certain data structures if the modification can be guaranteed to be safe. An eBPF program cannot randomly modify data structures in the kernel.
Let’s start with an analogy. Do you remember GeoCities? 20 years ago, web pages used to be almost exclusively written in static markup language (HTML). A web page was basically a document with an application (browser) able to display it. Looking at web pages today, web pages have become full-blown applications and web-based technology has replaced a vast majority of applications written in languages requiring compilation. What enabled this evolution?
Why did the evolution happen? Programmers were no longer as bound to users running particular browser versions. Instead of convincing standards bodies that a new HTML tag was needed, the availability of the necessary building blocks decoupled the pace of innovation of the underlying browser from the application running on top. This is of course a bit oversimplified as HTML did evolve over time and contributed to the success but the evolution of HTML itself would not have been sufficient.
For all of the above, exact counter parts can be found in eBPF for the same reason.
Now let’s return to eBPF. In order to understand the programmability impact of eBPF on the Linux kernel, it helps to have a high-level understanding of the architecture of the Linux kernel and how it interacts with applications and the hardware.
The main purpose of the Linux kernel is to abstract the hardware or virtual hardware and provide a consistent API (system calls) allowing for applications to run and share the resources. In order to achieve this, a wide set of subsystems and layers are maintained to distribute these responsibilities. Each subsystem typically allows for some level of configuration to account for different needs of users. If a desired behavior cannot be configured, a kernel change is required, historically, leaving two options:
|Native Support||Kernel Module|
Several development toolchains exist to assist in the development and management of eBPF programs. All of them address different needs of users:
BCC is a framework that enables users to write python programs with eBPF programs embedded inside them. The framework is primarily targeted for use cases which involve application and system profiling/tracing where an eBPF program is used to collect statistics or generate events and a counterpart in user space collects the data and displays it in a human readable form. Running the python program will generate the eBPF bytecode and load it into the kernel.
bpftrace is a high-level tracing language for Linux eBPF and available in recent Linux kernels (4.x). bpftrace uses LLVM as a backend to compile scripts to eBPF bytecode and makes use of BCC for interacting with the Linux eBPF subsystem as well as existing Linux tracing capabilities: kernel dynamic tracing (kprobes), user-level dynamic tracing (uprobes), and tracepoints. The bpftrace language is inspired by awk, C and predecessor tracers such as DTrace and SystemTap.
The eBPF Go library provides a generic eBPF library that decouples the process of getting to the eBPF bytecode and the loading and management of eBPF programs. eBPF programs are typically created by writing a higher level language and then use the clang/LLVM compiler to compile to eBPF bytecode.
The libbpf library is a C/C++-based generic eBPF library which helps to decouple the loading of eBPF object files generated from the clang/LLVM compiler into the kernel and generally abstracts interaction with the BPF system call by providing easy to use library APIs for applications.
If you would like to learn more about eBPF, continue reading using the following additional materials: