Description
In the Linux kernel, the following vulnerability has been resolved:

xdp: fix invalid wait context of page_pool_destroy()

If a driver uses a page pool, it creates the pool with page_pool_create(). The pool's reference count starts at 1, and the pool is destroyed only when the count reaches 0. page_pool_destroy() is used to destroy a page pool; it decrements the reference count. When a page pool is destroyed, ->disconnect() is called, which is mem_allocator_disconnect(). This function internally calls mutex_lock().

If the driver uses XDP, it registers a memory model with xdp_rxq_info_reg_mem_model(). xdp_rxq_info_reg_mem_model() internally increments the page pool reference count if the memory model is a page pool, so the reference count is now 2. To destroy the page pool, the driver must call both page_pool_destroy() and xdp_unreg_mem_model(). xdp_unreg_mem_model() internally calls page_pool_destroy(); only page_pool_destroy() decrements the reference count.

If a driver calls page_pool_destroy() and then xdp_unreg_mem_model(), it triggers an invalid wait context warning, because xdp_unreg_mem_model() calls page_pool_destroy() under rcu_read_lock(), and page_pool_destroy() internally calls mutex_lock().

The splat looks like:

    =============================
    [ BUG: Invalid wait context ]
    6.10.0-rc6+ #4 Tainted: G W
    -----------------------------
    ethtool/1806 is trying to lock:
    ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
    other info that might help us debug this:
    context-{5:5}
    3 locks held by ethtool/1806:
    stack backtrace:
    CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
    Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
    Call Trace:
    dump_stack_lvl+0x7e/0xc0
    __lock_acquire+0x1681/0x4de0
    ? _printk+0x64/0xe0
    ? __pfx_mark_lock.part.0+0x10/0x10
    ? __pfx___lock_acquire+0x10/0x10
    lock_acquire+0x1b3/0x580
    ? mem_allocator_disconnect+0x73/0x150
    ? __wake_up_klogd.part.0+0x16/0xc0
    ? __pfx_lock_acquire+0x10/0x10
    ? dump_stack_lvl+0x91/0xc0
    __mutex_lock+0x15c/0x1690
    ? mem_allocator_disconnect+0x73/0x150
    ? __pfx_prb_read_valid+0x10/0x10
    ? mem_allocator_disconnect+0x73/0x150
    ? __pfx_llist_add_batch+0x10/0x10
    ? console_unlock+0x193/0x1b0
    ? lockdep_hardirqs_on+0xbe/0x140
    ? __pfx___mutex_lock+0x10/0x10
    ? tick_nohz_tick_stopped+0x16/0x90
    ? __irq_work_queue_local+0x1e5/0x330
    ? irq_work_queue+0x39/0x50
    ? __wake_up_klogd.part.0+0x79/0xc0
    ? mem_allocator_disconnect+0x73/0x150
    mem_allocator_disconnect+0x73/0x150
    ? __pfx_mem_allocator_disconnect+0x10/0x10
    ? mark_held_locks+0xa5/0xf0
    ? rcu_is_watching+0x11/0xb0
    page_pool_release+0x36e/0x6d0
    page_pool_destroy+0xd7/0x440
    xdp_unreg_mem_model+0x1a7/0x2a0
    ? __pfx_xdp_unreg_mem_model+0x10/0x10
    ? kfree+0x125/0x370
    ? bnxt_free_ring.isra.0+0x2eb/0x500
    ? bnxt_free_mem+0x5ac/0x2500
    xdp_rxq_info_unreg+0x4a/0xd0
    bnxt_free_mem+0x1356/0x2500
    bnxt_close_nic+0xf0/0x3b0
    ? __pfx_bnxt_close_nic+0x10/0x10
    ? ethnl_parse_bit+0x2c6/0x6d0
    ? __pfx___nla_validate_parse+0x10/0x10
    ? __pfx_ethnl_parse_bit+0x10/0x10
    bnxt_set_features+0x2a8/0x3e0
    __netdev_update_features+0x4dc/0x1370
    ? ethnl_parse_bitset+0x4ff/0x750
    ? __pfx_ethnl_parse_bitset+0x10/0x10
    ? __pfx___netdev_update_features+0x10/0x10
    ? mark_held_locks+0xa5/0xf0
    ? _raw_spin_unlock_irqrestore+0x42/0x70
    ? __pm_runtime_resume+0x7d/0x110
    ethnl_set_features+0x32d/0xa20

To fix this problem, the fix uses rhashtable_lookup_fast() instead of rhashtable_lookup() with rcu_read_lock(). Using xa without rcu_read_lock() here is safe. xa is freed by __xdp_mem_allocator_rcu_free(), which is invoked through call_rcu() from mem_xa_remove(). mem_xa_remove() is called by page_pool_destroy() when the reference count reaches 0. xa is already well protected by the reference count mechanism in the control plane, so removing rcu_read_lock() around page_pool_destroy() is safe.
POC
Reference
No PoCs from references.
Github
- https://github.com/fkie-cad/nvd-json-data-feeds