e9e3d724e2
The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_Slab set (indicating it was allocated from the Slab allocator (which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to fill convert the passed in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_Slab set or 2) not use a page list to send this data Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked. We do a put page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DOS triggerable by an uprivlidged user, so I'm CCing security on this as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: security@kernel.org CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
---|---|---|
.. | ||
cache_lib.c | ||
cache_lib.h | ||
callback.c | ||
callback.h | ||
callback_proc.c | ||
callback_xdr.c | ||
client.c | ||
delegation.c | ||
delegation.h | ||
dir.c | ||
direct.c | ||
dns_resolve.c | ||
dns_resolve.h | ||
file.c | ||
fscache-index.c | ||
fscache.c | ||
fscache.h | ||
getroot.c | ||
idmap.c | ||
inode.c | ||
internal.h | ||
iostat.h | ||
Kconfig | ||
Makefile | ||
mount_clnt.c | ||
namespace.c | ||
nfs2xdr.c | ||
nfs3acl.c | ||
nfs3proc.c | ||
nfs3xdr.c | ||
nfs4_fs.h | ||
nfs4filelayout.c | ||
nfs4filelayout.h | ||
nfs4filelayoutdev.c | ||
nfs4namespace.c | ||
nfs4proc.c | ||
nfs4renewd.c | ||
nfs4state.c | ||
nfs4xdr.c | ||
nfsroot.c | ||
pagelist.c | ||
pnfs.c | ||
pnfs.h | ||
proc.c | ||
read.c | ||
super.c | ||
symlink.c | ||
sysctl.c | ||
unlink.c | ||
write.c |