{"id":"AZL-65892","summary":"CVE-2025-38472 affecting package kernel for versions less than 6.6.104.2-1","details":"In the Linux kernel, the following vulnerability has been resolved:\n\nnetfilter: nf_conntrack: fix crash due to removal of uninitialised entry\n\nA crash in conntrack was reported while trying to unlink the conntrack\nentry from the hash bucket list:\n    [exception RIP: __nf_ct_delete_from_lists+172]\n    [..]\n #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]\n #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]\n #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]\n    [..]\n\nThe nf_conn struct is marked as allocated from slab but appears to be in\na partially initialised state:\n\n ct hlist pointer is garbage; looks like the ct hash value\n (hence crash).\n ct-\u003estatus is equal to IPS_CONFIRMED|IPS_DYING, which is expected\n ct-\u003etimeout is 30000 (=30s), which is unexpected.\n\nEverything else looks like normal udp conntrack entry.  If we ignore\nct-\u003estatus and pretend its 0, the entry matches those that are newly\nallocated but not yet inserted into the hash:\n  - ct hlist pointers are overloaded and store/cache the raw tuple hash\n  - ct-\u003etimeout matches the relative time expected for a new udp flow\n    rather than the absolute 'jiffies' value.\n\nIf it were not for the presence of IPS_CONFIRMED,\n__nf_conntrack_find_get() would have skipped the entry.\n\nTheory is that we did hit following race:\n\ncpu x \t\t\tcpu y\t\t\tcpu z\n found entry E\t\tfound entry E\n E is expired\t\t\u003cpreemption\u003e\n nf_ct_delete()\n return E to rcu slab\n\t\t\t\t\tinit_conntrack\n\t\t\t\t\tE is re-inited,\n\t\t\t\t\tct-\u003estatus set to 0\n\t\t\t\t\treply tuplehash hnnode.pprev\n\t\t\t\t\tstores hash value.\n\ncpu y found E right before it was deleted on cpu x.\nE is now re-inited on cpu z.  cpu y was preempted before\nchecking for expiry and/or confirm bit.\n\n\t\t\t\t\t-\u003erefcnt set to 1\n\t\t\t\t\tE now owned by skb\n\t\t\t\t\t-\u003etimeout set to 30000\n\nIf cpu y were to resume now, it would observe E as\nexpired but would skip E due to missing CONFIRMED bit.\n\n\t\t\t\t\tnf_conntrack_confirm gets called\n\t\t\t\t\tsets: ct-\u003estatus |= CONFIRMED\n\t\t\t\t\tThis is wrong: E is not yet added\n\t\t\t\t\tto hashtable.\n\ncpu y resumes, it observes E as expired but CONFIRMED:\n\t\t\t\u003cresumes\u003e\n\t\t\tnf_ct_expired()\n\t\t\t -\u003e yes (ct-\u003etimeout is 30s)\n\t\t\tconfirmed bit set.\n\ncpu y will try to delete E from the hashtable:\n\t\t\tnf_ct_delete() -\u003e set DYING bit\n\t\t\t__nf_ct_delete_from_lists\n\nEven this scenario doesn't guarantee a crash:\ncpu z still holds the table bucket lock(s) so y blocks:\n\n\t\t\twait for spinlock held by z\n\n\t\t\t\t\tCONFIRMED is set but there is no\n\t\t\t\t\tguarantee ct will be added to hash:\n\t\t\t\t\t\"chaintoolong\" or \"clash resolution\"\n\t\t\t\t\tlogic both skip the insert step.\n\t\t\t\t\treply hnnode.pprev still stores the\n\t\t\t\t\thash value.\n\n\t\t\t\t\tunlocks spinlock\n\t\t\t\t\treturn NF_DROP\n\t\t\t\u003cunblocks, then\n\t\t\t crashes on hlist_nulls_del_rcu pprev\u003e\n\nIn case CPU z does insert the entry into the hashtable, cpu y will unlink\nE again right away but no crash occurs.\n\nWithout 'cpu y' race, 'garbage' hlist is of no consequence:\nct refcnt remains at 1, eventually skb will be free'd and E gets\ndestroyed via: nf_conntrack_put -\u003e nf_conntrack_destroy -\u003e nf_ct_destroy.\n\nTo resolve this, move the IPS_CONFIRMED assignment after the table\ninsertion but before the unlock.\n\nPablo points out that the confirm-bit-store could be reordered to happen\nbefore hlist add resp. the timeout fixup, so switch to set_bit and\nbefore_atomic memory barrier to prevent this.\n\nIt doesn't matter if other CPUs can observe a newly inserted entry right\nbefore the CONFIRMED bit was set:\n\nSuch event cannot be distinguished from above \"E is the old incarnation\"\ncase: the entry will be skipped.\n\nAlso change nf_ct_should_gc() to first check the confirmed bit.\n\nThe gc sequence is:\n 1. Check if entry has expired, if not skip to next entry\n 2. Obtain a reference to the expired entry.\n 3. Call nf_ct_should_gc() to double-check step 1.\n\nnf_ct_should_gc() is thus called only for entries that already failed an\nexpiry check. After this patch, once the confirmed bit check pas\n---truncated---","modified":"2026-04-01T05:20:39.946837Z","published":"2025-07-28T12:15:29Z","upstream":["CVE-2025-38472"],"references":[{"type":"WEB","url":"https://nvd.nist.gov/vuln/detail/CVE-2025-38472"}],"affected":[{"package":{"name":"kernel","ecosystem":"Azure Linux:3","purl":"pkg:rpm/azure-linux/kernel"},"ranges":[{"type":"ECOSYSTEM","events":[{"introduced":"0"},{"fixed":"6.6.104.2-1"}]}],"database_specific":{"source":"https://github.com/microsoft/AzureLinuxVulnerabilityData/blob/main/osv/AZL-65892.json"}}],"schema_version":"1.7.5"}