CVE-2026-31431 "Copy Fail" — why the kernel matters
29 April 2026. Trivial local privilege escalation in the Linux kernel via the AF_ALG crypto subsystem (CVE-2026-31431, “Copy Fail”). Public PoC, cross-distribution, mitigation belongs on the host. What I deployed for my managed hosting customers, and which follow-up LPEs (Dirty Frag, Fragnesia) belong to the same line.
TL;DR — the 90-second summary
Affected?
Linux kernel 5.10 LTS through 6.6 LTS in practically all mainstream distributions — Debian, Ubuntu, RHEL, SUSE, Amazon Linux. All container hosts as well, regardless of the container distribution.
Risk?
Local privilege escalation to root via the AF_ALG crypto subsystem. Public PoC available, trivially reproducible.
Immediate action?
Where the kernel patch hasn't been deployed yet: disable algif_aead via modprobe blacklist and restrict crypto_user access. Then verify against the PoC.
Recommendation?
German Mittelstand: deploy the distribution patch, schedule a reboot. Enterprise/Kubernetes: additionally enable detection hooks (eBPF/Tetragon) on crypto-user syscalls.
Criticality?
High (see badge in the page header).
CVE-2026-31431 is a pure kernel vulnerability. It sits in the Linux kernel's crypto subsystem, specifically in the AF_ALG interface and the handling of certain memory operations around algif_aead. The public proof-of-concept fits on a sheet of A4 paper and needs no exotic preconditions. It uses standard kernel functionality that's active on practically every Linux system, and it works cross-distribution.
The detailed technical write-up is at copy.fail. Important for framing: this is a local privilege escalation. An attacker first needs code execution as an unprivileged user on the host. Once that's given, the exploit escalates reliably to root.
A common framing mistake is: “if the container is secure, the system is secure.” That's true for many classes of vulnerabilities. For kernel issues, it isn't. Containers, whether based on Debian, Alpine, or Wolfi, don't ship their own kernel. They use the host system's kernel. As soon as a process inside the container addresses the kernel and escalates there, isolation is bypassed.
Who is affected?
The vulnerability is cross-distribution. What matters is not the userland but the host kernel. Workloads on container images from Debian, Alpine, Ubuntu or Wolfi are equally exposed because none of these images ships its own kernel.
| Component | Status | Condition |
| --- | --- | --- |
| Linux kernel 5.10 LTS through 6.6 LTS | Affected | CONFIG_CRYPTO_USER_API_AEAD enabled (default on mainstream distros) |
| Linux kernel 6.7+ | Conditional | Affected before the distribution backport date, patched afterwards |
| Debian 11/12, Ubuntu 22.04/24.04 | Affected | Until the patch from 1–7 May 2026 |
| RHEL 8/9, Rocky, AlmaLinux | Affected | Until the distribution errata release |
| Amazon Linux 2/2023 | Affected | Until ALAS advisory; Bottlerocket separate |
| Container images (Debian/Alpine/Wolfi) | Not affected | Userland without its own kernel; the host decides |
| Kubernetes worker nodes | Affected | If the host kernel isn't patched; every pod is an entry vector |
| Managed Kubernetes (EKS/AKS/GKE) | Provider-dependent | Worker image refresh is decisive |
| CI/CD runners (self-hosted) | Highly affected | Multi-tenant workload density increases the exploitation path |
| WSL2 kernel | Affected | Until the Microsoft kernel update |
The vulnerability is particularly critical where many workloads share the same kernel: container hosts with multi-tenant setups, self-hosted Kubernetes workers, and CI/CD runners. For exactly this class of risk, I schedule AI security audits into every release.
Impact
Copy Fail is a local privilege escalation (LPE). The NVD preliminary CVSS score of 7.8 sits in the high range (attack vector local, low complexity, no user interaction). Remote code execution or remote escalation isn't directly possible; the exploit requires existing code execution in user space.
What that means in practice:
Container escape effectively possible. Every compromised pod on an unpatched host can gain root on the node. From there, the entire worker node — and through it, the cluster — is exposed.
CI/CD runners as a high-risk target. Self-hosted GitHub runners, GitLab runners, and Jenkins agents regularly execute untrusted code. A single compromised pipeline is enough — I've covered elsewhere in detail why the CI pipeline is the largest concentration point for escalations in the stack.
Shared hosting and multi-tenant VPS. A classic entry point for cross-tenant escalation.
Compliance consequences. Anyone operating under ISO 27001 or NIS-2 has an incident-assessment obligation as soon as an exploit becomes publicly available — regardless of the detection status.
On the business side: downtime due to reboot is the most likely direct impact. Data loss is not a realistic scenario with clean mitigation; the reputational question only arises if an incident occurs before the patch.
Copy Fail in Kubernetes — the second class of threat
On a single Linux VM, Copy Fail is a local privilege escalation. In Kubernetes the same vulnerability manifests differently: not as escalation to root, but as lateral movement between pods — without container escape, without root, without capabilities. Anyone running K8s should keep both pictures in mind.
The operational mechanics emerge from the combination of two Kubernetes properties that look harmless in isolation. First: container images are built in layers, and identical base layers are stored only once physically on a node (OverlayFS). Second: when the Linux kernel reads a file from disk, it keeps a copy in the page cache. The page cache is a node-wide resource and is not isolated per container. It's keyed by (filesystem device, inode). Two pods from the same base layer that execute /usr/bin/cat read from the same page cache entry.
This is exactly where Copy Fail strikes in K8s. An unprivileged process in a pod opens a file read-only, corrupts its page cache copy via the AF_ALG path, and waits. The next pod that executes the same file runs against the manipulated content — without anything ever being written to disk. The attacker never enters the host and sees nothing from the outside: they pick a common binary blindly (cat, bash, a shared library), corrupt it, and let Kubernetes decide which pod touches it next.
In Stream Security's demo of this attack, the host stayed unaffected. Its /usr/bin/cat sits on a different filesystem with a different inode, and the page cache cleanly separates the two entries. That means: Copy Fail in K8s is not a container-to-host escape, but container-to-container movement across the shared node layer. That doesn't make it less serious; it makes it structurally different. Network policies, RBAC, and file integrity monitoring don't see the attack because neither the network nor the persistent filesystem is touched.
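To make the (filesystem device, inode) key tangible, here is a minimal shell sketch that prints that pair for a path and compares two paths. The function names cache_key and same_cache_key are illustrative and not part of any tool, and the sketch assumes GNU stat. One assumption worth stating: inside a container, the overlay mount may report its own device number, so the comparison is most meaningful on the node, against the underlying layer directories.

```shell
# Sketch (bash): show the (device, inode) pair that identifies a file
# for the page cache. Function names are illustrative. Assumes GNU stat.

# Print "device:inode" for one path, following symlinks.
cache_key() {
    stat -L -c '%d:%i' -- "$1"
}

# Succeed only if both paths resolve to the same (device, inode) pair.
# Returns 2 if either path cannot be inspected.
same_cache_key() {
    local a b
    a=$(cache_key "$1") || return 2
    b=$(cache_key "$2") || return 2
    [ "$a" = "$b" ]
}
```

Called with the same binary as seen through two different layer directories on the node, same_cache_key succeeds when both resolve to one physical file, which is the precondition for the shared page cache entry described above.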
The blast radius depends on your base image hygiene:
| Setup | Risk | Reason |
| --- | --- | --- |
| All workloads share the same base (e.g. ubuntu:24.04) | High | Every compromised pod reaches every other pod on the same node |
| Mix with partial overlap | Medium | Blast radius limited to pods sharing layers |
| Every workload uses its own distroless or scratch image | Low | No shared layers, no shared page cache entry |
| Privileged DaemonSets (CNI, logging, monitoring) share base with application workloads | Critical | Attacker reaches pods with cluster-admin or host-network privileges |
The painful consequence from Stream Security's demo: the attacker doesn't decide who gets hit. The Kubernetes scheduler, image layer sharing, and probe configuration do. Anyone who builds cluster-admin DaemonSets from the same base layer as unprivileged application pods has paved a path that their own platform conventions keep open.
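A rough first inventory of this blast radius can be sketched in shell. The function below groups pods by node and image reference and reports every pair used by more than one pod. This is only a coarse proxy: two different images can still share base layers, which this sketch does not detect. The function name shared_images and the three-column input format are assumptions made for illustration.

```shell
# Sketch (bash + awk): list node/image pairs that are used by more than one pod.
# Input on stdin, one line per pod: "<node> <image> <namespace/pod>"
# Coarse proxy only: different images may still share base layers.
#
# Possible input source (first container of each pod only):
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.nodeName}{" "}{.spec.containers[0].image}{" "}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}' | shared_images
shared_images() {
    awk '
        { key = $1 " " $2; count[key]++; pods[key] = pods[key] " " $3 }
        END {
            for (key in count)
                if (count[key] > 1)
                    print key ":" pods[key]
        }
    ' | sort
}
```

Pairs that include a privileged DaemonSet pod next to application pods correspond to the "Critical" row of the table.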
Wolfi OS and the question of responsibility
I use Wolfi OS as my container base. For a vulnerability like Copy Fail, the clean framing matters. Otherwise the question “is our container distribution to blame?” gets answered wrong.
Wolfi is an undistro: pure userland without its own kernel. Wolfi uses the host system's kernel in full. Two things follow at the same time: Wolfi is not the cause of Copy Fail, and Wolfi cannot fix it either. Responsibility sits entirely on the host layer. The same applies 1:1 to distroless images, Chainguard images, Alpine, Debian-slim and any other lean container base: none of them ships its own kernel.
This is exactly why I stepped away from compose.yaml as a matter of principle a few weeks ago and built my container topologies declaratively. When the layers are cleanly separated — image, pod spec, host kernel — responsibility can be pinned per incident. In the case of Copy Fail: the host kernel. Image rebuilds would be busywork and would only blur the validation of the actual mitigation.
If you do run Wolfi images, you benefit indirectly: the slimmer userland attack surface reduces the chance that an attacker even reaches the point of launching a local kernel exploit. But that's defense in depth, not mitigation.
Mitigation and immediate actions
The short answer: deploy the distribution patch and reboot. Where a reboot isn't immediately possible, disable algif_aead via module blacklist and restrict crypto-user access. Both workarounds take effect at runtime.
NixOS hosts patch differently: the kernel and the module blacklist are declared in /etc/nixos/configuration.nix, then nixos-rebuild switch pulls both in one step. Advantage: the new generation can be rolled back via the bootloader if the mitigation breaks a production function.
If you don't want to bump the kernel channel right away, declare only the module blacklist and the sysctl as a stopgap, and pull the channel bump in the next maintenance window. The declarative form makes both steps audit-proof and automatically reproducible.
Workaround without reboot: disable algif_aead
# Unload the module live (immediate effect)
sudo modprobe -r algif_aead
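# Persistent block: "blacklist" only ignores the module's internal aliases,
# so add an "install" override to stop on-demand loading by name as well
printf 'blacklist algif_aead\ninstall algif_aead /bin/true\n' | sudo tee /etc/modprobe.d/cve-2026-31431.conf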
sudo depmod -a
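Because a plain blacklist line only ignores a module's internal aliases, it is worth checking whether a stricter install override (the form used in the operator recommendation further down) is actually present. The following is a minimal sketch; the function name is illustrative, and it only inspects a single directory, while modprobe also reads other configuration locations.

```shell
# Sketch (bash): succeed if any *.conf file in the given directory contains
# an "install algif_aead ..." override. Function name is illustrative.
# Only one directory is checked; modprobe reads further locations as well.
aead_install_override_present() {
    local dir="${1:-/etc/modprobe.d}"
    grep -Eqs '^[[:space:]]*install[[:space:]]+algif_aead[[:space:]]' "$dir"/*.conf
}
```

On a real host, running modprobe -n -v algif_aead additionally shows what modprobe would do without loading anything.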
Restrict crypto_user access
# sysctl stopgap: block the crypto user API for unprivileged processes
echo "kernel.crypto_user_api = 0" | sudo tee /etc/sysctl.d/99-cve-2026-31431.conf
sudo sysctl --system
If you want to structurally minimize the K8s-specific lateral movement risk (see “Copy Fail in Kubernetes” above), you have three levers beyond the kernel patch:
Diverse base images. Different workloads shouldn't all sit on ubuntu:24.04. Image layers used by only one workload don't land in a shared page cache entry, so the lateral movement runs into a dead end.
Distroless or scratch for workloads. Application containers typically don't need system binaries like /usr/bin/cat or /bin/sh. What's not in the image can't be corrupted in its page cache either. This is the structural answer to Copy Fail in the K8s context.
Separate privileged DaemonSets from the workload path. CNI agents, logging sidecars, and monitoring daemons with hostNetwork or hostPath should never come from the same base layer as unprivileged application pods. Otherwise the page cache becomes an elevator to the cluster-admin floor.
At the pod level, a seccomp profile blocks the entry point at the syscall. The RuntimeDefault profile from Docker and Kubernetes lets AF_ALG sockets through, so the block has to be set explicitly.
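A minimal sketch of such an explicit block, written as a shell function that generates the profile file. It denies socket() calls whose first argument is AF_ALG (address family 38 on Linux) and allows everything else. That permissive default is an assumption made to keep the example short: a production profile would rather add this rule to a copy of the runtime's default profile, so the existing restrictions stay in place. The function name and target path are illustrative.

```shell
# Sketch (bash): write a seccomp profile that denies socket(AF_ALG, ...).
# AF_ALG is address family 38 on Linux. Function name is illustrative.
# Note: defaultAction ALLOW makes this a pure deny-list; merge the rule into
# a copy of the runtime default profile for production use.
write_afalg_deny_profile() {
    local target="$1"
    cat > "$target" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
      ]
    }
  ]
}
EOF
}
```

With Docker, a generated file can be passed via --security-opt seccomp=<file>. In Kubernetes, it is referenced through securityContext.seccompProfile with type Localhost, after the file has been placed in the kubelet's seccomp directory on each node.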
Rollback. The module blacklist can be reverted with sudo rm /etc/modprobe.d/cve-2026-31431.conf and a reload. Applications that use AEAD crypto in user space via AF_ALG (rare, mostly special VPN tools or crypto benchmarks) won't work without the module.
Technical deep dive
The exploitable path sits in the kernel's crypto subsystem. AF_ALG is a socket family that lets userspace processes address kernel crypto implementations, historically introduced for IPsec daemon implementations and hardware-accelerated crypto. The algif_aead module implements AEAD cipher operations (Authenticated Encryption with Associated Data) over this interface.
The vulnerability emerges in the handling of certain recvmsg() paths combined with incorrect reference counting on skb structures. Under specific race conditions, the kernel writes into a buffer that has already been freed — a classic Use-After-Free in the kernel heap. With controlled heap allocation, the freed slot can be occupied by an attacker-controlled data structure, which leads to kernel memory disclosure and ultimately privilege escalation via cred structure manipulation.
Important aspects for assessment:
No capability requirement. Creating an AF_ALG socket needs neither CAP_NET_ADMIN nor CAP_SYS_MODULE. Every unprivileged process can reach the path, provided the module is loaded.
Auto-load behavior. algif_aead is auto-loaded on most distributions as soon as the first AF_ALG socket is opened for AEAD. Simple module unload without a blacklist isn't enough: an attacker reloads it on demand.
Container namespaces don't help. User namespaces isolate UID mappings, but not the kernel code path. Crypto sockets are opened against the host kernel regardless of container context.
seccomp-bpf as defense in depth. A restrictive seccomp profile that limits the socket() syscall to AF_INET/AF_INET6/AF_UNIX blocks the entry point. Docker's and Kubernetes' default seccomp profiles don't cover this.
Trade-off with the stopgap mitigation: the module blacklist only covers the AEAD path. The related AF_ALG modules (algif_hash, algif_skcipher, algif_rng) aren't all affected by the same CVE, but auditing them as part of a clean patch wave is recommended.
Detection and verification
Lead questions
Do my Linux hosts run a kernel on the 5.10/5.15/6.1/6.6 LTS line or older, and is the installed kernel below patch level 6.18.22 / 6.19.12 / 7.0?
Is algif_aead loadable or built-in on my hosts?
Do I see unusual AF_ALG socket operations from unprivileged processes in the logs — especially combined with setresuid/setreuid/setuid shortly after?
Was the host booted in the last 7–10 days without a kernel update, while exploit code was already circulating in cybercrime forums?
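The version question can be sketched as a small shell helper. It compares an upstream-style release string against the patch levels named above (6.18.22, 6.19.12, 7.0) and deliberately answers "check-advisory" for every other series, because distribution kernels backport fixes without changing the upstream version number. The function names are illustrative, and the sketch relies on sort -V from GNU coreutils.

```shell
# Sketch (bash): classify a kernel release against the patch levels named
# in the text. Function names are illustrative. Requires "sort -V".

# Succeed if version $1 is strictly lower than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print "below-fix", "at-or-above-fix", or "check-advisory".
classify_kernel() {
    local release="${1%%[-+]*}"   # strip suffixes such as "-amd64"
    case "$release" in
        6.18.*) if version_lt "$release" 6.18.22; then echo below-fix; else echo at-or-above-fix; fi ;;
        6.19.*) if version_lt "$release" 6.19.12; then echo below-fix; else echo at-or-above-fix; fi ;;
        7.*)    echo at-or-above-fix ;;
        *)      echo check-advisory ;;   # LTS and distro kernels: the advisory decides
    esac
}
```

A typical call is classify_kernel "$(uname -r)". For LTS and distribution kernels the result is intentionally "check-advisory", since only the vendor advisory or package changelog states whether the backport is included.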
Quick check per host
# Check kernel version
uname -r
# algif_aead module status
lsmod | grep algif_aead
grep CONFIG_CRYPTO_USER_API_AEAD /boot/config-$(uname -r)
# Distribution patch status
# Debian/Ubuntu
apt list --installed 2>/dev/null | grep linux-image
# RHEL/AlmaLinux/Rocky
rpm -qa | grep kernel
# SUSE
zypper search --installed-only -t package kernel-default
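The grep on the kernel config above prints the raw line. A small helper can turn it into the distinction that matters for the workaround: whether algif_aead is a loadable module (a modprobe rule can stop it) or built into the kernel (a modprobe rule has no effect). This is a sketch under the assumption that the config file for the running kernel is readable at the usual /boot path; the function name is illustrative.

```shell
# Sketch (bash): report how the AEAD user API is provided, based on a
# kernel config file. Function name is illustrative.
aead_config_state() {
    local config="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$config" ]; then
        echo unknown      # no readable config file for this kernel
    elif grep -q '^CONFIG_CRYPTO_USER_API_AEAD=y' "$config"; then
        echo built-in     # modprobe rules do not help here
    elif grep -q '^CONFIG_CRYPTO_USER_API_AEAD=m' "$config"; then
        echo module       # loadable: a modprobe rule can stop it
    else
        echo disabled     # option not set
    fi
}
```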
Falco / eBPF correlation
If you operate Falco or comparable eBPF monitoring, watch for a three-syscall correlation per process: socket(AF_ALG) → bind/accept on an aead cipher → setresuid(0,0,0) or setreuid(0,0) within a few seconds. That is the exploit signature pattern that holds across PoC variants.
A concrete Falco rule sketch:
- rule: AF_ALG Aead Followed By Setuid
  desc: Unprivileged process opens AF_ALG socket and then transitions to UID 0
  condition: >
    evt.type = socket and
    evt.arg.domain = AF_ALG and
    user.uid != 0 and
    proc.aname[1] != systemd
  output: >
    Possible Copy Fail exploitation
    (user=%user.name pid=%proc.pid command=%proc.cmdline)
  priority: WARNING
On its own, the rule triggers false positives on legitimate workloads (e.g. some libgcrypt paths). It is meant as a correlation anchor, not a blocking rule: only the setuid follow-up, correlated from the collected output, makes a case verifiable.
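The correlation step itself can be sketched offline with awk. The input format is entirely hypothetical (one event per line with epoch seconds, PID, and an event name of either af_alg_socket or setuid_root) and would have to be produced from whatever your monitoring exports. The sketch flags a process when the setuid event follows an AF_ALG socket event within a configurable window, and it assumes events arrive in chronological order.

```shell
# Sketch (bash + awk): correlate AF_ALG socket events with a later
# transition to UID 0 in the same process.
# Hypothetical input on stdin: "<epoch_seconds> <pid> <event>"
# where <event> is "af_alg_socket" or "setuid_root". Events must be
# sorted by time. $1 = correlation window in seconds (default 5).
correlate_afalg_setuid() {
    local window="${1:-5}"
    awk -v window="$window" '
        $3 == "af_alg_socket" { seen[$2] = $1 }
        $3 == "setuid_root" && ($2 in seen) && ($1 - seen[$2] <= window) {
            print "ALERT pid=" $2 " delta=" ($1 - seen[$2]) "s"
        }
    '
}
```

The five-second default is an arbitrary starting point for "within a few seconds" and should be tuned against real event data.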
Audit routine per host
For SMEs without a dedicated SOC team: once a week per host, log uname -r in a central table and compare against the currently recommended patched version. Without automation you get patch drift — and patch drift becomes expensive exactly in high-severity cycles like this one.
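A minimal sketch of that weekly log line, assuming a plain CSV file as the "central table". The function name and column layout are illustrative; how the file reaches a central place (cron plus a shared mount, scp, or a small collector) is left open.

```shell
# Sketch (bash): append date, host name, and running kernel to a CSV table.
# Function name and column layout are illustrative.
log_kernel_version() {
    local table="$1"
    # Write the header once, when the file is missing or empty.
    [ -s "$table" ] || echo 'date,host,kernel' > "$table"
    printf '%s,%s,%s\n' "$(date -u +%F)" "$(uname -n)" "$(uname -r)" >> "$table"
}
```

Run weekly per host, the resulting table makes patch drift visible as soon as one row lags behind the recommended version.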
Operator recommendation
Operational decision block
If you operate Linux hosts on RHEL/AlmaLinux/Rocky/CentOS Stream — then
apply the distribution patch via dnf upgrade kernel and reboot. If you cannot reboot: KernelCare live patch for EL8/EL9 is available as a stopgap.
If you operate Ubuntu LTS hosts — then
apply the kernel update via apt upgrade linux-image-* plus reboot. KernelCare covers Ubuntu 22.04 LTS (Jammy) including AWS and HWE variants as a live patch.
If you operate Debian stable (bookworm/trixie) — then
patches are available — Debian stable has shipped the updates since early May. If the host has not been updated yet, close that this week. Sid has linux 7.0.4-1.
If you run EC2, Hetzner Cloud, Azure, GCP — then
cloud providers do not patch the hypervisor on your behalf. Your guest kernel has to be updated. On Amazon Linux: AWS security bulletin 2026-027 lists the concrete kernel versions.
If you operate Kubernetes platforms — then
plan a node image update — all worker nodes need the patched kernel; otherwise compromised pods escalate to the host. Container images themselves are not affected (containers share the host kernel).
If you have hosts where AF_ALG applications are NOT used — then
you can deactivate the module as an interim measure: echo "install algif_aead /bin/true" > /etc/modprobe.d/blacklist-algif_aead.conf + reboot. Caution: on the RHEL family the module is built-in, so the initcall must be blacklisted via grubby — or just patch.
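For the built-in case, the grubby route can be sketched as a dry-run helper that only prints the command instead of changing the boot configuration. The initcall name algif_aead_init is an assumption based on the usual naming of module init functions and should be verified against the System.map of the installed kernel before use. initcall_blacklist is a kernel boot parameter, and the change only takes effect after a reboot.

```shell
# Sketch (bash): print, but do not run, the grubby command that adds an
# initcall blacklist entry to all installed kernels.
# ASSUMPTION: the init function is named "algif_aead_init"; verify it
# against System.map for your kernel. Function name is illustrative.
print_initcall_blacklist_cmd() {
    local initcall="${1:-algif_aead_init}"
    printf 'grubby --update-kernel=ALL --args="initcall_blacklist=%s"\n' "$initcall"
}
```

Printing first and executing by hand keeps the step reviewable, which matters because a wrong boot argument is only discovered at the next reboot.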
What I deliberately do not do
No delayed update to “next maintenance window” when the CISA KEV deadline is tomorrow and the host is publicly reachable. Real-world exploitation against cloud providers is documented.
No modprobe-only workarounds on the RHEL family. They don't work because algif_aead is built-in — only the patch or grubby initcall blacklist helps.
No assumption “our VM is small, who would target it?”. The exploit is 732 bytes of Python and works over any initial-access vector (SSH, web shell, compromised service account).
What I actually did
My build containers and production hosts have been running on patched kernels since early May. Concretely:
Own platform hosts (moselwal.de, ole-hartwig.eu, blog.ole-hartwig.eu, nozzleops.de): Debian stable with the early-May kernel update; reboot completed; algif_aead status verified.
Build containers (moselwal/build-base, moselwal/typo3-builder, moselwal/frankenphp-runtime): base images rebuilt on patched distribution versions, container registry tags refreshed, all build pipelines switched to the new tags.
Customer hosts under maintenance: patches rolled in the same maintenance iteration as the May CVE wave (Composer 2.9.8, TYPO3 14.3.1/13.4.29). SBOM inventory updated per customer.
Detection monitoring: Falco rule for AF_ALG socket plus subsequent setuid transition activated as a correlation signal in central observability.
For customers running their own cloud VMs or container platforms that I do not operate myself, I have distributed patch guidance and, if needed, provide support with detection scripts or audit routines.
Frequently asked questions about Copy Fail
Do Kubernetes containers or Wolfi images need to be rebuilt because of Copy Fail?
No. The vulnerability sits in the host kernel, not in the images. Rebuilds would be activity for activity’s sake and only burn pipeline time — that holds for Wolfi just as for any other container base.
Why is the algif_aead kernel module loaded on my Linux system in the first place?
AF_ALG is a user-space interface to the kernel crypto subsystem. Very few applications use it in production. Disabling it via modprobe blacklist is therefore typically a no-op for normal operations.
How do I verify that the Copy Fail mitigation actually takes effect on my host?
I reproduce the public PoC from copy.fail after applying the mitigation. A host counts as cleared only when the escalation fails. A configuration line entered is not enough.
Are EC2, Hetzner Cloud, and Azure VMs automatically protected against CVE-2026-31431?
Not automatically. Managed Kubernetes providers often ship worker images with mitigations included. Self-managed workers on EC2 or bare metal are your responsibility, and part of my audit.
When will the kernel patch for CVE-2026-31431 land in Debian, Ubuntu, and RHEL?
As of 30 April 2026, no final mainline patch is available. Distributors will ship the patch after backporting. I track the kernel mailing list and apply the fix once it is stable and my validation has run.
As of 4 May 2026, patches have been released for most distributions. Update now! There are now targeted attacks on container environments!
We don't have an in-house security team. Who mitigates Copy Fail on production Linux hosts?
That is exactly what DevSecOps as a Service and the external IT department are for. I mitigate on your behalf, document the procedure in an audit-ready way, and hand back a verified state.
Conclusion
Copy Fail is the first of the two universal Linux LPE vulnerabilities in the May 2026 wave and the one with the strongest external indication: CISA KEV listing, FCEB remediation deadline 15 May 2026, multiple independent threat-intel confirmations of active exploitation. A 732-byte Python exploit for a vulnerability that affects every Linux kernel since 2017: that is not the edge case, it's the SME norm.
Operationally the patch is trivial: dnf upgrade kernel or apt upgrade linux-image-* plus reboot. Strategically it is a test of whether your own patch routine is fast enough to hold KEV deadlines without declaring an emergency. Anyone who has not patched between the April 2026 initial disclosure and mid-May has a pipeline weakness, not a complexity problem.
The cluster lesson: Copy Fail and Dirty Frag are variants of the same pattern — in-place optimisations in the kernel that grant unprivileged write primitives into the page cache. If you have one, you have the other ahead of you. The patch pipeline has to treat both CVEs as a connected task, not as two separate tickets.
Programming since 2002 – self-taught, set up my own business with KO-Web in 2012. Over 100 projects, with a focus on security, performance, automation and quality. Today freelance: DevSecOps consulting, training and software development.