In the Linux kernel, the following vulnerability has been resolved:

btrfs: adjust subpage bit start based on sectorsize

When running machines with 64k page size and a 16k nodesize we started seeing tree log corruption in production. This turned out to be because we were not writing out dirty blocks sometimes, so this in fact affects all metadata writes.

When writing out a subpage EB we scan the subpage bitmap for a dirty range. If the range isn't dirty we do

    bit_start++;

to move onto the next bit. The problem is the bitmap is based on the number of sectors that an EB has. So in this case, we have a 64k pagesize, 16k nodesize, but a 4k sectorsize. This means our bitmap is 4 bits for every node. With a 64k page size we end up with 4 nodes per page.

To make this easier this is how everything looks

    [0         16k       32k       48k     ] logical address
    [0         4         8         12      ] radix tree offset
    [               64k page               ] folio
    [ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
    [ | | | |  | | | |   | | | |   | | | | ] bitmap

Now we use all of our addressing based on fs_info->sectorsize_bits, so as you can see the above our 16k eb->start turns into radix entry 4.

When we find a dirty range for our eb, we correctly do bit_start += sectors_per_node, because if we start at bit 0, the next bit for the next eb is 4, to correspond to eb->start 16k.

However if our range is clean, we will do bit_start++, which will now put us offset from our radix tree entries.

In our case, assume that the first time we check the bitmap the block is not dirty, we increment bit_start so now it == 1, and then we loop around and check again. This time it is dirty, and we go to find that start using the following equation

    start = folio_start + bit_start * fs_info->sectorsize;

so in the case above, eb->start 0 is now dirty, and we calculate start as

    0 + 1 * fs_info->sectorsize = 4096
    4096 >> 12 = 1

Now we're looking up the radix tree for 1, and we won't find an eb. What's worse is now we're using bit_start == 1, so we do bit_start += sectors_per_node, which is now 5. If that eb is dirty we will run into the same thing, we will look at an offset that is not populated in the radix tree, and now we're skipping the writeout of dirty extent buffers.

The best fix for this is to not use sectorsize_bits to address nodes, but that's a larger change. Since this is a fs corruption problem fix it simply by always using sectors_per_node to increment the start bit.
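
The misaligned scan described above can be reproduced in a small user-space model. This is only a sketch, not the kernel code: `scan_folio`, `eb_exists_at`, the `late_dirty` parameter, and the constants are hypothetical names chosen for illustration. It assumes a single folio at logical address 0 and models "clean on the first check, dirty on the second" as dirty bits that appear between the first and second probe, which is one reading of the scenario in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Geometry from the description: 64k page, 16k nodesize, 4k sectorsize. */
#define SECTORSIZE_BITS   12u
#define SECTORSIZE        (1u << SECTORSIZE_BITS)
#define NODESIZE          (16u * 1024u)
#define FOLIO_SIZE        (64u * 1024u)
#define SECTORS_PER_NODE  (NODESIZE / SECTORSIZE)    /* 4  */
#define SECTORS_PER_PAGE  (FOLIO_SIZE / SECTORSIZE)  /* 16 */

/*
 * Stand-in for the radix tree lookup: with the folio at address 0,
 * extent buffers are only indexed at 0, 4, 8 and 12.
 */
bool eb_exists_at(uint64_t start)
{
    uint64_t index = start >> SECTORSIZE_BITS;

    return index % SECTORS_PER_NODE == 0;
}

/*
 * Walk the dirty bitmap of one folio and count the extent buffers
 * that the lookup actually finds.
 *
 * dirty      - dirty bits visible to the first probe
 * late_dirty - bits that become dirty before the second probe
 *              (assumption: models an eb dirtied mid-scan)
 * clean_step - increment used on a clean bit:
 *              1 for the old code, SECTORS_PER_NODE for the fix
 */
unsigned scan_folio(uint16_t dirty, uint16_t late_dirty, unsigned clean_step)
{
    unsigned bit_start = 0, found = 0, probes = 0;

    assert(clean_step == 1 || clean_step == SECTORS_PER_NODE);

    while (bit_start < SECTORS_PER_PAGE) {
        if (probes++ == 1)
            dirty |= late_dirty;

        if (!(dirty & (1u << bit_start))) {
            bit_start += clean_step;
            continue;
        }

        /* folio_start is 0 in this model */
        uint64_t start = (uint64_t)bit_start * SECTORSIZE;

        bit_start += SECTORS_PER_NODE;
        if (eb_exists_at(start))
            found++;
    }
    return found;
}
```

In this model, calling `scan_folio(0x0000, 0xFFFF, 1)` probes bits 0, 1, 5, 9 and 13, so every lookup after the first lands on an unpopulated index and no dirty extent buffer is found. With `clean_step` set to `SECTORS_PER_NODE` the scan stays on bits 0, 4, 8 and 12 and finds the three extent buffers that were dirty when probed; the one at offset 0 was probed before it became dirty, so neither variant picks it up in this pass.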
No PoCs from references.
- https://github.com/w4zu/Debian_security