eBPF Summit 2020 - Virtual - Oct 28-29, 2020 - CFP and Registration now open. Click here.

    eBPF Documentation

    What is eBPF?

    The Linux kernel has always been an ideal place to implement monitoring/observability, networking, and security. Unfortunately this was often impractical as it required changing kernel source code or loading kernel modules, and resulted in layers of abstractions stacked on top of each other. eBPF is a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. By making the Linux kernel programmable, infrastructure software can leverage existing layers, making them more intelligent and feature-rich without continuing to add additional layers of complexity to the system.

    eBPF has resulted in the development of a completely new generation of tooling in areas such as networking, security, application profiling/tracing and performance troubleshooting that no longer rely on existing kernel functionality but instead actively reprogram runtime behavior without compromising execution efficiency or safety.

    What is eBPF.io?

    eBPF.io is a place for everybody to learn and collaborate on the topic of eBPF. eBPF is an open community and everybody can particpate and share. Whether you want to read a first introduction to eBPF, find further reading material or make your first steps to becoming contributors to major eBPF projects, eBPF.io will help you along the way.

    Introduction to eBPF

    The following chapters are a quick introduction into eBPF. If you would like to learn more about eBPF, see the eBPF & XDP Reference Guide. Whether you are a developer looking to build an eBPF program, or interested in leveraging a solution that uses eBPF, it is useful to understand the basic concepts and architecture.

    Hook Overview

    eBPF programs are event-driven and are run when the kernel or an application passes a certain hook point. Pre-defined hooks include system calls, function entry/exit, kernel tracepoints, network events, and several others.

    If a predefined hook does not exist for a particular need, it is possible to create a kernel probe (kprobe) or user probe (uprobe) to attach eBPF programs almost anywhere in kernel or user applications.

    How are eBPF programs written?

    In a lot of scenarios, eBPF is not used directly but indirectly via projects like Cilium, bcc, or bpftrace which provide an abstraction on top of eBPF and do not require to write programs directly but instead offer the ability to specify intent-based definitions which are then implemented with eBPF.

    If no higher-level abstraction exists, programs need to be written directly. The Linux kernel expects eBPF programs to be loaded in the form of bytecode. While it is of course possible to write bytecode directly, the more common development practice is to leverage a compiler suite like LLVM to compile pseudo-C code into eBPF bytecode.

    Loader & Verification Architecture

    When the desired hook has been identified, the eBPF program can be loaded into the Linux kernel using the bpf system call. This is typically done using one of the available eBPF libraries. The next section provides an introduction into the available development toolchains.

    As the program is loaded into the Linux kernel, it passes through two steps before being attached to the requested hook:

    Verification

    The verification step ensures that the eBPF program is safe to run. It validates that the program meets several conditions, for example:

    • The process loading the eBPF program holds the required capabilities (privileges). Unless unprivileged eBPF is enabled, only privileged processes can load eBPF programs.
    • The program does not crash or otherwise harm the system.
    • The program always runs to completion (i.e. the program does not sit in a loop forever, holding up further processing).

    JIT Compilation

    The Just-in-Time (JIT) compilation step translates the generic bytecode of the program into the machine specific instruction set to optimize execution speed of the program. This makes eBPF programs run as efficiently as natively compiled kernel code or as code loaded as a kernel module.

    Maps

    A vital aspect of eBPF programs is the ability to share collected information and to store state. For this purpose, eBPF programs can leverage the concept of eBPF maps to store and retrieve data in a wide set of data structures. eBPF maps can be accessed from eBPF programs as well as from applications in user space via a system call.

    The following is an incomplete list of supported map types to given an understanding of the diversity in data structures. For various map types, both a shared and a per-CPU variation is available.

    • Hash tables, Arrays
    • LRU (Least Recently Used)
    • Ring Buffer
    • Stack Trace
    • LPM (Longest Prefix match)
    • ...

    Helper Calls

    eBPF programs cannot call into arbitrary kernel functions. Allowing this would bind eBPF programs to particular kernel versions and would complicate compatibility of programs. Instead, eBPF programs can make function calls into helper functions, a well-known and stable API offered by the kernel.

    The set of available helper calls is constantly evolving. Examples of available helper calls:

    • Generate random numbers
    • Get current time & date
    • eBPF map access
    • Get process/cgroup context
    • Manipulate network packets and forwarding logic

    Tail & Function Calls

    eBPF programs are composable with the concept of tail and function calls. Function calls allow defining and calling functions within an eBPF program. Tail calls can call and execute another eBPF program and replace the execution context, similar to how the execve() system call operates for regular processes.

    eBPF Safety

    With great power there must also come great responsibility.

    eBPF is an incredibly powerful technology and now runs at the heart of many critical software infrastructure components. During the development of eBPF, the safety of eBPF was the most crucial aspect when eBPF was considered for inclusion into the Linux kernel. eBPF safety is ensured through several layers:

    Required Privileges

    Unless unprivileged eBPF is enabled, all processes that intend to load eBPF programs into the Linux kernel must be running in privileged mode (root) or require the capability CAP_BPF. This means that untrusted programs cannot load eBPF programs.

    If unprivileged eBPF is enabled, unprivileged processes can load certain eBPF programs subject to a reduced functionality set and with limited access to the kernel.

    Verifier

    If a process is allowed to load an eBPF program, all programs still pass through the eBPF verifier. The eBPF verifier ensures the safety of the program itself. This means, for example:

    • Programs are validated to ensure they always run to completion, e.g. an eBPF program may never block or sit in a loop forever. eBPF programs may contain so called bounded loops but the program is only accepted if the verifier can ensure that the loop contains an exit condition which is guaranteed to become true.
    • Programs may not use any uninitialized variables or access memory out of bounds.
    • Programs must fit within the size requirements of the system. It is not possible to load arbitrarily large eBPF programs.
    • Program must have a finite complexity. The verifier will evaluate all possible execution paths and must be capable of completing the analysis within the limits of the configured upper complexity limit.

    Hardening

    Upon successful completion of the verification, the eBPF program runs through a hardening process according to whether the program is loaded from a privileged or unprivileged process. This step includes:

    • Program execution protection: The kernel memory holding an eBPF program is protected and made read-only. If for any reason, whether it is a kernel bug or malicious manipulation, the eBPF program is attempted to be modified, the kernel will crash instead of allowing it to continue executing the corrupted/manipulated program.
    • Mitigation against Spectre: Under speculation CPUs may mispredict branches and leave observable side effects that could be extracted through a side channel. To name a few examples: eBPF programs mask memory access in order redirect access under transient instructions to controlled areas, the verifier also follows program paths accessible only under speculative execution and the JIT compiler emits Retpolines in case tail calls cannot be converted to direct calls.
    • Constant blinding: All constants in the code are blinded to prevent JIT spraying attacks. This prevents attackers from injecting executable code as constants which in the presence of another kernel bug, could allow an attacker to jump into the memory section of the eBPF program to execute code.

    Abstracted Runtime Context

    eBPF programs cannot access arbitrary kernel memory directly. Access to data and data structures that lie outside of the context of the program must be accessed via eBPF helpers. This guarantees consistent data access and makes any such access subject to the privileges of the eBPF program, e.g. an eBPF program running is allowed to modify the data of certain data structures if the modification can be guaranteed to be safe. An eBPF program cannot randomly modify data structures in the kernel.

    Why eBPF?

    The Power of Programmability

    Let’s start with an analogy. Do you remember GeoCities? 20 years ago, web pages used to be almost exclusively written in static markup language (HTML). A web page was basically a document with an application (browser) able to display it. Looking at web pages today, web pages have become full-blown applications and web-based technology has replaced a vast majority of applications written in languages requiring compilation. What enabled this evolution?

    The short-answer is programmability with the introduction of JavaScript. It unlocked a massive revolution resulting in browsers to evolve into almost independent operating systems.

    Why did the evolution happen? Programmers were no longer as bound to users running particular browser versions. Instead of convincing standards bodies that a new HTML tag was needed, the availability of the necessary building blocks decoupled the pace of innovation of the underlying browser from the application running on top. This is of course a bit oversimplified as HTML did evolve over time and contributed to the success but the evolution of HTML itself would not have been sufficient.

    Before taking this example and applying it to eBPF, let's look at a couple of key aspects that were vital in the introduction of JavaScript:

    • Safety: Untrusted code runs in the browser of the user. This was solved by sandboxing JavaScript programs and abstracting access to browser data.
    • Continuous Delivery: Evolution of program logic must be possible without requiring to constantly ship new browser versions. This was solved by providing the right low-level building blocks sufficient to build arbitrary logic.
    • Performance: Programmability must be provided with minimal overhead. This was solved with the introduction of a Just-in-Time (JIT) compiler.

    For all of the above, exact counter parts can be found in eBPF for the same reason.

    eBPF's impact on the Linux Kernel

    Now let’s return to eBPF. In order to understand the programmability impact of eBPF on the Linux kernel, it helps to have a high-level understanding of the architecture of the Linux kernel and how it interacts with applications and the hardware.

    The main purpose of the Linux kernel is to abstract the hardware or virtual hardware and provide a consistent API (system calls) allowing for applications to run and share the resources. In order to achieve this, a wide set of subsystems and layers are maintained to distribute these responsibilities. Each subsystem typically allows for some level of configuration to account for different needs of users. If a desired behavior cannot be configured, a kernel change is required, historically, leaving two options:

  • Change kernel source code and convince the Linux kernel community that the change is required.
  • Wait several years for the new kernel version to become a commodity.
  • Write a kernel module
  • Fix it up regularly, as every kernel release may break it
  • Risk corrupting your Linux kernel due to lack of security boundaries
  • Native SupportKernel Module

        With eBPF, a new option is available that allows for reprogramming the behavior of the Linux kernel without requiring changes to kernel source code or loading a kernel module. In many ways, this is very similar to how JavaScript and other scripting languages unlocked the evolution of systems which had become difficult or expensive to change.

        Development Toolchains

        Several development toolchains exist to assist in the development and management of eBPF programs. All of them address different needs of users:

        bcc

        BCC is a framework that enables users to write python programs with eBPF programs embedded inside them. The framework is primarily targeted for use cases which involve application and system profiling/tracing where an eBPF program is used to collect statistics or generate events and a counterpart in user space collects the data and displays it in a human readable form. Running the python program will generate the eBPF bytecode and load it into the kernel.

        bpftrace

        bpftrace is a high-level tracing language for Linux eBPF and available in recent Linux kernels (4.x). bpftrace uses LLVM as a backend to compile scripts to eBPF bytecode and makes use of BCC for interacting with the Linux eBPF subsystem as well as existing Linux tracing capabilities: kernel dynamic tracing (kprobes), user-level dynamic tracing (uprobes), and tracepoints. The bpftrace language is inspired by awk, C and predecessor tracers such as DTrace and SystemTap.

        eBPF Go Library

        The eBPF Go library provides a generic eBPF library that decouples the process of getting to the eBPF bytecode and the loading and management of eBPF programs. eBPF programs are typically created by writing a higher level language and then use the clang/LLVM compiler to compile to eBPF bytecode.

        libbpf C/C++ Library

        The libbpf library is a C/C++-based generic eBPF library which helps to decouple the loading of eBPF object files generated from the clang/LLVM compiler into the kernel and generally abstracts interaction with the BPF system call by providing easy to use library APIs for applications.

        Further Reading

        If you would like to learn more about eBPF, continue reading using the following additional materials:

        Documentation

        Tutorials

        Talks

        Generic

        Cilium

        Hubble

        Books

        Articles & Blogs