Description
In the Linux kernel, the following vulnerability has been resolved:fs: Prevent file descriptor table allocations exceeding INT_MAXWhen sysctl_nr_open is set to a very high value (for example, 1073741816as set by systemd), processes attempting to use file descriptors nearthe limit can trigger massive memory allocation attempts that exceedINT_MAX, resulting in a WARNING in mm/slub.c: WARNING: CPU: 0 PID: 44 at mm/slub.c:5027 __kvmalloc_node_noprof+0x21a/0x288This happens because kvmalloc_array() and kvmalloc() check if therequested size exceeds INT_MAX and emit a warning when the allocation isnot flagged with __GFP_NOWARN.Specifically, when nr_open is set to 1073741816 (0x3ffffff8) and aprocess calls dup2(oldfd, 1073741880), the kernel attempts to allocate:- File descriptor array: 1073741880 * 8 bytes = 8,589,935,040 bytes- Multiple bitmaps: ~400MB- Total allocation size: > 8GB (exceeding INT_MAX = 2,147,483,647)Reproducer:1. Set /proc/sys/fs/nr_open to 1073741816: # echo 1073741816 > /proc/sys/fs/nr_open2. Run a program that uses a high file descriptor: #include #include int main() { struct rlimit rlim = {1073741824, 1073741824}; setrlimit(RLIMIT_NOFILE, &rlim); dup2(2, 1073741880); // Triggers the warning return 0; }3. Observe WARNING in dmesg at mm/slub.c:5027systemd commit a8b627a introduced automatic bumping of fs.nr_open to themaximum possible value. The rationale was that systems with memorycontrol groups (memcg) no longer need separate file descriptor limitssince memory is properly accounted. However, this change overlookedthat:1. The kernel's allocation functions still enforce INT_MAX as a maximum size regardless of memcg accounting2. Programs and tests that legitimately test file descriptor limits can inadvertently trigger massive allocations3. The resulting allocations (>8GB) are impractical and will always failsystemd's algorithm starts with INT_MAX and keeps halving the valueuntil the kernel accepts it. On most systems, this results in nr_openbeing set to 1073741816 (0x3ffffff8), which is just under 1GB of filedescriptors.While processes rarely use file descriptors near this limit in normaloperation, certain selftests (liketools/testing/selftests/core/unshare_test.c) and programs that test filedescriptor limits can trigger this issue.Fix this by adding a check in alloc_fdtable() to ensure the requestedallocation size does not exceed INT_MAX. This causes the operation tofail with -EMFILE instead of triggering a kernel warning and avoids theimpractical >8GB memory allocation request.
POC
Reference
No PoCs from references.
Github
- https://github.com/w4zu/Debian_security