Bottom Lines
- eBPF enables the creation of mini-programs that run in response to events.
- eBPF is a revolutionary technology because it lets programmers execute custom bytecode within the kernel without having to change the kernel or load kernel modules.
- eBPF is event-driven, i.e. each eBPF program is an event handler. These events are called “hooks”.
- eBPF programs interact with user-space programs via eBPF maps that are key-value pairs.
Memory is partitioned between kernel and user space in the Linux architecture. The kernel space is where the kernel core code and device drivers are executed. Kernel-space processes have complete access to all hardware, including the CPU, memory, and storage. All other processes operate in user space, which is dependent on the kernel for hardware access. The user space is the space where user applications are run. The user space code has limited direct access to hardware and relies on kernel space to complete its operation. In other words, processes in the user-space connect with the kernel through system calls to perform privileged tasks such as disc or network I/O.
While this separation provides a safe segregation of processes, the syscall interface is insufficient in certain circumstances, and developers want further flexibility to execute custom code directly in the kernel without modifying the kernel’s source code, and eBPF makes that possible.
Moreso, kernel modules, on the other hand, pose security issues due to their ability to execute arbitrary code directly in the kernel space. A kernel module with erroneous code may easily crash the kernel. Through eBPF, Linux offers a mechanism for running safe, certified sandboxed programmes in the kernel area.
So What is eBPF?
eBPF, as in extended Berkeley Packet Filter, is an in-kernel virtual machine running programs passed from user space. eBPF is a mechanism for Linux applications to execute code in Linux kernel space, it allows you to package the user space application logic to be executed in the Linux kernel space as bytecode.
The website analogy may help you grasp eBPF: HTML is designed to be static. Javascript enables you to create dynamic HTML web pages. For instance, on mouse click do X, or on page, load perform Y. eBPF is what JavaScript is to HTML. Instead of a fixed kernel, eBPF enables the creation of mini-programs that run in response to events such as network transmission and are executed in a secure virtual machine inside the kernel.
The primary difference is that although it is possible for a malfunctioning Javascript to break your website, eBPF has inbuilt protection to prevent you from breaking the kernel.
Futhermore, eBPF is very popular with teams that need to operate in high-performance environments. For example, Netflix has about 15 eBPF programs running on every server instance, Facebook, in contrast, has about 40 eBPF programs that are active on every server with another 100 eBPF programs that get spawned and killed as needed (source)
How does eBPF Work?
eBPF programs are event-driven, meaning they can be hooked to certain events and run by the kernel when that particular event occurs. The program can store information in maps, print to ring buffers, or call a subset of kernel functions defined by a special API. The kernel manages the map and ring buffer structures, and multiple eBPF programs can access the same map to share data.
eBPF programs follow these steps:
-
The bytecode of the eBPF program is sent to the kernel along with a program type that determines where the program needs to be attached, which in-kernel helper functions the verifier will allow to be called, whether network packet data can be accessed directly, and what type of object will pass as the first argument to the program.
-
The kernel runs a verifier on the bytecode. The verifier runs several security checks on the bytecode, ensuring that the program terminates and does not contain any loop that could potentially lock up the kernel. It also stimulates the execution of the eBPF program and checks the state of the virtual machine at every step to ensure the register and stack states are valid. Finally, it uses the program type to restrict the allowed kernel function calls from the program.
-
The bytecode is JIT-compiled into native code and attached to the specified location.
-
When the specified event occurs, the program is executed and writes data to the ring buffer or the map.
-
The map or ring buffer can be read by the user space to get the program result.
What are the benefits of eBPF?
eBPF is most commonly used to trace and profile userspace processes and, more recently, as a way to enhance observability capabilities. It has many distinct benefits over other methods:
-
eBPF applications are sandboxed and verified, ensuring that the kernel does not crash or stall in a loop. This improves the security of kernel modules.
-
eBPF shifts packet filtering from user space to kernel space, reducing superfluous packet copies and resulting in a large speed boost. The software runs rapidly since it is JIT-compiled.
-
Using eBPF does not need the modification of kernel source code or the creation of full-fledged kernel modules. An eBPF application is simple to create and run.
eBPF Tools
You can use several open-source tools to build custom programs that get loaded into the kernel at runtime in case you want to get your hands dirty. The list includes:
-
BPF Compiler Collection (BCC) provides a toolkit for building efficient kernel manipulation programs.
-
bpftrace is a high-level tracing language for eBPF programs.
-
There are also language-specific tools like gobpf for Golang, libbpf for C/C++, and redbpf for Rust.
Conclusion
eBPF is a fantastic addition to the Linux kernel. The ability to execute code in the kernel in a safe and sandboxed manner is useful for observability, network traffic management, and containerisation.
Reference Materials
eBPF Essentials
- ebpf.io - A gateway to discover all the basics of eBPF, including a listing of the main related projects and of community resources.
- Cilium’s BPF and XDP Reference Guide - In-depth documentation about most features and aspects of eBPF.
Kernel Documentation
- BPF Documentation - Index for BPF-related documentation coming with the Linux kernel.
- linux/Documentation/networking/filter.rst - eBPF specification (somewhat outdated; information should still be valid, but not exhaustive).
- BPF Design Q&A - Frequently Asked Questions on the decisions behind the BPF infrastructure.
- HOWTO interact with BPF subsystem - Frequently Asked Questions about contributing to eBPF development.
User Space eBPF
- uBPF - Written in C. Contains an interpreter, a JIT compiler for x86_64 architecture, an assembler and a disassembler.
- A generic implementation - With support for FreeBSD kernel, FreeBSD user space, Linux kernel, Linux user space and macOS user space. Used for the VALE software switch’s BPF extension module.
- rbpf - Written in Rust. Interpreter for Linux, macOS and Windows, and JIT-compiler for x86_64 under Linux.
- PREVAIL - A user space verifier for eBPF using an abstract interpretation layer, with support for loops.
- oster - Written in Go. A tool for tracing execution of Go programs by attaching eBPF to uprobes.
- wachy - A tracing profiler that aims to make eBPF uprobe-based debugging easier to use. This is done by displaying traces in a UI next to the source code and allowing interactive drilldown analysis.
Other
- IO Visor’s Unofficial eBPF spec - Summary of eBPF syntax and operation codes.
⚠ DISCLAIMER
Opinions expressed are solely my own and do not express the views or opinions of my employer.