System call instrumentation on Linux/x86‑64 using memory‑indirect calls, part I
Rambles around computer science Diverting trains of thought, wasting precious time
Date: Mon, 15 Jun 2026
My library, libsystrap, provides a straightforward method for instrumenting system calls within the Linux x86-64 userland. However, the current architecture is plagued by a double-trap overhead.
The Current Bottleneck
Currently, system calls are replaced with the ud2 instruction. This triggers a SIGILL trap, and the actual system call is then executed from within the signal handler. This creates a secondary trap and introduces several complex edge cases.
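The rewriting step can be pictured with a small sketch. It is not libsystrap's code: the function name patch_syscall_sites is hypothetical, and it assumes the offsets of genuine syscall instructions have already been found by a disassembler, since a blind search for the bytes 0f 05 could match in the middle of some other instruction.

```python
SYSCALL = bytes([0x0F, 0x05])  # x86-64 `syscall`
UD2 = bytes([0x0F, 0x0B])      # x86-64 `ud2`; executing it raises SIGILL on Linux


def patch_syscall_sites(code: bytes, site_offsets: list[int]) -> bytes:
    """Return a copy of `code` with each listed syscall site replaced by ud2.

    `site_offsets` is assumed to come from a disassembler (hypothetical input).
    """
    patched = bytearray(code)
    for off in site_offsets:
        if bytes(patched[off:off + 2]) != SYSCALL:
            raise ValueError(f"no syscall instruction at offset {off:#x}")
        patched[off:off + 2] = UD2  # same length, so nothing else moves
    return bytes(patched)


# mov $39, %eax ; syscall ; ret   (39 is getpid on x86-64)
example = bytes([0xB8, 0x27, 0x00, 0x00, 0x00, 0x0F, 0x05, 0xC3])
patched = patch_syscall_sites(example, [5])
```

Because ud2 is also two bytes long, the patch never disturbs neighbouring instructions. The cost is that every instrumented call takes the SIGILL trap before the real system call is made.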
Exploring Existing Research
Several recent papers have attempted to solve this. While some are general, others focus specifically on system call instrumentation:
- Liteinst: Focuses on instruction punning.
- E9Patch: A closely related approach to Liteinst.
- zpoline: Specifically targets system call instrumentation.
- lazypoline / K23: Follow-up works to improve the robustness of zpoline.
The Core Conflict: Instruction Length
The fundamental issue is a quirk of Intel's instruction encoding. Most useful jump instructions require five bytes (a one-byte opcode plus a 32-bit offset), but we often need to patch smaller instructions. For example, the syscall instruction (0f 05) is only two bytes long.
1. Instruction Punning
Instruction punning attempts to "cheat" the length requirement. If we have a syscall followed by other instructions:
| Byte Position | Original Meaning | Punning Meaning |
|---|---|---|
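| 0f | syscall, byte 1 | Jump/call opcode (overwritten with e9 or e8) |
| 05 | syscall, byte 2 | WW: least significant byte of offset (overwritten, ours to choose) |
| xx | Next instruction, byte 1 | Offset byte 2 (unchanged) |
| yy | Next instruction, byte 2 | Offset byte 3 (unchanged) |
| zz | Next instruction, byte 3 | Offset byte 4, most significant (unchanged) |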
Because x86 is little-endian, WW is the least significant byte. This means the jump target is largely fixed, with only a 256-byte range of "wiggle room."
The Statistical Gamble: If the high-order bytes (xx, yy, zz) aren't zero or very small, there is a decent chance the jump lands in a usable memory area. If it doesn't, the system falls back to ud2.
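To make the 256-byte window concrete, here is a minimal sketch of the arithmetic. It assumes a 5-byte jmp rel32 (opcode e9) is written over the two syscall bytes as e9 WW, with the three bytes that follow reused unchanged as the rest of the offset. The function name and the example addresses are made up for illustration.

```python
def punned_jump_targets(site: int, following: bytes) -> tuple[int, int]:
    """Lowest and highest targets reachable by `e9 WW xx yy zz` placed at `site`.

    `following` holds the three bytes after the 2-byte syscall. They become
    the upper three bytes of the rel32 displacement and cannot be changed.
    """
    if len(following) != 3:
        raise ValueError("need exactly three following bytes")
    next_ip = site + 5  # rel32 is relative to the end of the 5-byte jmp
    targets = []
    for ww in range(256):  # WW is the only byte we are free to choose
        rel32 = int.from_bytes(bytes([ww]) + following, "little", signed=True)
        targets.append(next_ip + rel32)
    return min(targets), max(targets)


# Followed by `mov %rax,(%rbx)` (48 89 03): a positive displacement.
lo, hi = punned_jump_targets(0x401000, bytes([0x48, 0x89, 0x03]))

# Followed by `mov %rax,%rbx` (48 89 c3): the top byte has its sign bit set,
# so every candidate target lies below address zero, which is unusable.
lo_bad, hi_bad = punned_jump_targets(0x401000, bytes([0x48, 0x89, 0xC3]))
```

In the first case the window is 0x3c95805 to 0x3c95904, and the patch works only if trampoline code can be mapped there. In the second case no choice of WW helps, which is the situation where the ud2 fallback is needed.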
Note: E9Patch uses complex compound punning to increase coverage. While this requires significant virtual address space (roughly one page per site), physical pages can be shared to save RAM.
2. The zpoline Approach
zpoline avoids statistical guesswork. It replaces the 2-byte syscall with:
ff d0 # call *%rax
Since %rax contains the system call number (a small non-negative integer), this triggers a call to a very low memory address.
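The following toy model shows why every system call number ends up in the same handler. It reflects my understanding of zpoline's design rather than anything stated above: the low addresses are filled with nop instructions, and the handler sits just past them. Nothing is really mapped at address zero here; the trampoline is only a byte string, and SLED_LEN is an assumed bound chosen for illustration.

```python
NOP = 0x90
SLED_LEN = 512  # assumed upper bound on system call numbers (illustrative)


def build_trampoline(handler_stub: bytes) -> bytes:
    """Model of the low-memory mapping: a nop sled followed by the handler."""
    if not handler_stub or handler_stub[0] == NOP:
        raise ValueError("handler stub must start with a non-nop byte")
    return bytes([NOP]) * SLED_LEN + handler_stub


def landing_point(trampoline: bytes, rax: int) -> int:
    """Address at which `call *%rax` stops sliding through nops."""
    if not 0 <= rax < len(trampoline):
        # Outside the mapping the real program would fault.
        raise MemoryError(f"call to unmapped address {rax:#x}")
    pc = rax
    while trampoline[pc] == NOP:
        pc += 1
    return pc


trampoline = build_trampoline(b"\xc3")  # a lone `ret` stands in for the handler
```

Whatever small value %rax holds, landing_point returns the same address, SLED_LEN. A value beyond the mapping raises MemoryError, standing in for the crash that the real scheme produces for out-of-range numbers.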
The Trade-offs:
- Null Pointer Risk: Requires mapping code at address zero, bypassing hardware null-pointer protection.
- Mitigation 1: Use Intel memory protection keys (MPK) for execute-only memory.
- Mitigation 2: Validate return addresses via a hash table or bitmap.
- Privilege Requirements: Mapping low memory on Linux requires root/system privileges.
- Stability: If %rax contains a high value, the program crashes instead of returning ENOSYS.
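The bitmap form of return-address validation can be sketched as follows. This is an illustration under assumptions, not the code of zpoline or its successors: the class name SiteBitmap is hypothetical, and it covers a single contiguous code region with one bit per byte.

```python
class SiteBitmap:
    """One bit per byte of a code region; set bits mark valid return addresses."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def add_site(self, site: int) -> None:
        # `call *%rax` is two bytes long, so it pushes site + 2 as the return address.
        off = site + 2 - self.base
        if not 0 <= off < self.size:
            raise ValueError("site lies outside the covered region")
        self.bits[off // 8] |= 1 << (off % 8)

    def is_valid_return(self, ret_addr: int) -> bool:
        off = ret_addr - self.base
        if not 0 <= off < self.size:
            return False
        return bool(self.bits[off // 8] & (1 << (off % 8)))


bitmap = SiteBitmap(base=0x401000, size=0x1000)
bitmap.add_site(0x401005)  # a rewritten syscall at this address
```

The handler would read the return address from the stack and reject the call unless is_valid_return says it follows a known patched site. A bitmap costs one eighth of the covered region's size in memory, whereas a hash table scales with the number of sites instead.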
A New Direction: x86 Segmentation
I wondered if other encoding corners could offer better trade-offs. I turned to x86 segmentation, a feature that remains active even in 64-bit mode.
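In protected mode, memory translation follows this logic: a segment selector picks a descriptor from a descriptor table, and the base address stored in that descriptor is added to the offset to form the linear address.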
Linux allows users to manipulate the Local Descriptor Table (LDT) via the modify_ldt() system call. I hypothesized that a 2-byte instruction could indirect through this table to reach the instrumentation code.
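For reference, a selector is a 16-bit value with a fixed layout: bits 0 and 1 hold the requested privilege level, bit 2 chooses between the GDT (0) and the LDT (1), and the remaining 13 bits index into the chosen table. The helper names below are mine, purely for illustration.

```python
def make_selector(index: int, ldt: bool = True, rpl: int = 3) -> int:
    """Pack a 16-bit segment selector: index (13 bits), table bit, privilege level."""
    if not 0 <= index < 8192:
        raise ValueError("descriptor index must fit in 13 bits")
    if not 0 <= rpl <= 3:
        raise ValueError("privilege level must be 0..3")
    return (index << 3) | (int(ldt) << 2) | rpl


def split_selector(selector: int) -> tuple[int, bool, int]:
    """Unpack a selector into (index, refers_to_ldt, rpl)."""
    return selector >> 3, bool(selector & 0b100), selector & 0b11


first_ldt_entry = make_selector(0)   # LDT entry 0 at user privilege
user_cs = split_selector(0x33)       # 64-bit user code segment on Linux
```

So the selector for LDT entry 0 at user privilege is 0x7, and entry 1 is 0xf. Decoding 0x33 gives GDT entry 6 at privilege level 3, a table that user code has no means to edit.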
The "Near Miss"
Initially, I optimistically hoped to use a "long call" (lcall in AT&T, call far in Intel). I imagined something like:
ff 18 # lcall *(%rax)
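The two byte pairs ff d0 and ff 18 differ only in their second byte, the ModRM byte, whose middle three bits select the operation within the ff opcode group. A small decoder makes that visible. It is a sketch that handles only %rax as the operand register and only the two call forms discussed here.

```python
def decode_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


# Within opcode ff, the reg field picks the operation: /2 is a near
# indirect call and /3 is a far indirect call.
FF_CALLS = {2: "call", 3: "lcall"}


def describe(insn: bytes) -> str:
    """Render a 2-byte `ff` call through %rax in AT&T syntax."""
    if len(insn) != 2 or insn[0] != 0xFF:
        raise ValueError("expected a 2-byte instruction starting with ff")
    mod, reg, rm = decode_modrm(insn[1])
    if reg not in FF_CALLS or rm != 0:
        raise ValueError("sketch only covers call/lcall through %rax")
    if mod == 0b11:
        operand = "%rax"      # the register itself holds the target
    elif mod == 0b00:
        operand = "(%rax)"    # the register points at the target in memory
    else:
        raise ValueError("displacement forms are longer than two bytes")
    if reg == 3 and mod == 0b11:
        raise ValueError("a far call needs a memory operand")
    return f"{FF_CALLS[reg]} *{operand}"
```

Here describe(bytes([0xFF, 0xD0])) gives call *%rax, and describe(bytes([0xFF, 0x18])) gives lcall *(%rax). The far form always takes a memory operand: %rax would hold the address of a selector-and-offset pair, not the selector itself.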
Why this was naive:
- It is illogical to store a 16-bit selector in a 64-bit register like %rax.
- The l in lcall stands for "long," not "Local" (as in LDT). %rax could only hold a local selector, as we cannot control the Global Descriptor Table (GDT).
"The lcall instruction allows calling into a different code segment, whereas a 'near call' stays within the current segment."
In modern "flat" memory models (standard for 32-bit Unix and mandatory for 64-bit), the segment base is simply zero.
