linux-hardened/fs
Filipe Manana 61391d5622 Btrfs: fix hang on error (such as ENOSPC) when writing extent pages
When running low on available disk space and having several processes
doing buffered file IO, I got the following trace in dmesg:

[ 4202.720152] INFO: task kworker/u8:1:5450 blocked for more than 120 seconds.
[ 4202.720401]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.720596] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.720874] kworker/u8:1    D 0000000000000001     0  5450      2 0x00000000
[ 4202.720904] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 4202.720908]  ffff8801f62ddc38 0000000000000082 ffff880203ac2490 00000000001d3f40
[ 4202.720913]  ffff8801f62ddfd8 00000000001d3f40 ffff8800c4f0c920 ffff880203ac2490
[ 4202.720918]  00000000001d4a40 ffff88020fe85a40 ffff88020fe85ab8 0000000000000001
[ 4202.720922] Call Trace:
[ 4202.720931]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.720950]  [<ffffffffa01ec48d>] btrfs_start_ordered_extent+0x6d/0x110 [btrfs]
[ 4202.720956]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.720972]  [<ffffffffa01ec559>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
[ 4202.720988]  [<ffffffffa0201987>] normal_work_helper+0x137/0x2c0 [btrfs]
[ 4202.720994]  [<ffffffff810680e5>] process_one_work+0x1f5/0x530
(...)
[ 4202.721027] 2 locks held by kworker/u8:1/5450:
[ 4202.721028]  #0:  (%s-%s){++++..}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721037]  #1:  ((&work->normal_work)){+.+...}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721054] INFO: task btrfs:7891 blocked for more than 120 seconds.
[ 4202.721258]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.721444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.721699] btrfs           D 0000000000000001     0  7891   7890 0x00000001
[ 4202.721704]  ffff88018c2119e8 0000000000000086 ffff8800a33d2490 00000000001d3f40
[ 4202.721710]  ffff88018c211fd8 00000000001d3f40 ffff8802144b0000 ffff8800a33d2490
[ 4202.721714]  ffff8800d8576640 ffff88020fe85bc0 ffff88020fe85bc8 7fffffffffffffff
[ 4202.721718] Call Trace:
[ 4202.721723]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.721727]  [<ffffffff816a2ebc>] schedule_timeout+0x1dc/0x270
[ 4202.721732]  [<ffffffff8109bd79>] ? mark_held_locks+0xb9/0x140
[ 4202.721736]  [<ffffffff816a90c0>] ? _raw_spin_unlock_irq+0x30/0x40
[ 4202.721740]  [<ffffffff8109bf0d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[ 4202.721744]  [<ffffffff816a488f>] wait_for_completion+0xdf/0x120
[ 4202.721749]  [<ffffffff8107fa90>] ? try_to_wake_up+0x310/0x310
[ 4202.721765]  [<ffffffffa01ebee4>] btrfs_wait_ordered_extents+0x1f4/0x280 [btrfs]
[ 4202.721781]  [<ffffffffa020526e>] btrfs_mksubvol.isra.62+0x30e/0x5a0 [btrfs]
[ 4202.721786]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.721799]  [<ffffffffa02056a9>] btrfs_ioctl_snap_create_transid+0x1a9/0x1b0 [btrfs]
[ 4202.721813]  [<ffffffffa020583a>] btrfs_ioctl_snap_create_v2+0x10a/0x170 [btrfs]
(...)

It turns out that extent_io.c:__extent_writepage(), which ends up being called
through filemap_fdatawrite_range() in btrfs_start_ordered_extent(), was getting
-ENOSPC when calling the fill_delalloc callback. In this situation, it returned
without the writepage_end_io_hook callback (inode.c:btrfs_writepage_end_io_hook)
ever being called for the respective page, which prevents the ordered extent's
bytes_left count from ever reaching 0, and therefore a finish_ordered_fn work
is never queued into the endio_write_workers queue. This makes the task that
called btrfs_start_ordered_extent() hang forever on the wait queue of the ordered
extent.

This is fairly easy to reproduce using a small filesystem and fsstress on
a quad core vm:

    mkfs.btrfs -f -b `expr 2100 \* 1024 \* 1024` /dev/sdd
    mount /dev/sdd /mnt

    fsstress -p 6 -d /mnt -n 100000 -x \
        "btrfs subvolume snapshot -r /mnt /mnt/mysnap" \
	    -f allocsp=0 \
	    -f bulkstat=0 \
	    -f bulkstat1=0 \
	    -f chown=0 \
	    -f creat=1 \
	    -f dread=0 \
	    -f dwrite=0 \
	    -f fallocate=1 \
	    -f fdatasync=0 \
	    -f fiemap=0 \
	    -f freesp=0 \
	    -f fsync=0 \
	    -f getattr=0 \
	    -f getdents=0 \
	    -f link=0 \
	    -f mkdir=0 \
	    -f mknod=0 \
	    -f punch=1 \
	    -f read=0 \
	    -f readlink=0 \
	    -f rename=0 \
	    -f resvsp=0 \
	    -f rmdir=0 \
	    -f setxattr=0 \
	    -f stat=0 \
	    -f symlink=0 \
	    -f sync=0 \
	    -f truncate=1 \
	    -f unlink=0 \
	    -f unresvsp=0 \
	    -f write=4

So just ensure that if an error happens while writing the extent page
we call the writepage_end_io_hook callback. Also make it return the
error code and ensure the caller (extent_write_cache_pages) processes
all pages in the page vector even if an error happens only for some
of them, so that ordered extents end up released.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:20 -07:00
..
9p mm: implement ->map_pages for page cache 2014-04-07 16:35:53 -07:00
adfs fs/adfs/super.c: add __init to init_inodecache() 2014-04-07 16:36:08 -07:00
affs fs/affs/super.c: bugfix / double free 2014-05-06 13:05:00 -07:00
afs AFS: Pass an afs_call* to call->async_workfn() instead of a work_struct* 2014-05-23 13:05:22 +01:00
autofs4 autofs: fix lockref lookup 2014-05-06 13:04:59 -07:00
befs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
bfs fs/bfs/inode.c: add __init to init_inodecache() 2014-04-07 16:36:08 -07:00
btrfs Btrfs: fix hang on error (such as ENOSPC) when writing extent pages 2014-06-09 17:20:20 -07:00
cachefiles Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-05-05 15:17:02 -07:00
cifs cifs: fix actimeo=0 corner case when cifs_i->time == jiffies 2014-04-24 22:37:03 -05:00
coda Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
configfs
cramfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
debugfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
devpts fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
dlm net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
ecryptfs Merge branch 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2014-04-04 14:03:05 -07:00
efivarfs efivarfs: 'efivarfs_file_write' function reorganization 2014-03-04 16:16:16 +00:00
efs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
exofs Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2014-04-10 14:33:02 -07:00
exportfs
ext2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
ext4 These are regression and bug fixes for ext4. 2014-04-20 20:43:47 -07:00
f2fs Merge branch 'akpm' (incoming from Andrew) 2014-04-07 16:38:06 -07:00
fat Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
freevxfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
fscache FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree 2014-02-17 13:47:35 -08:00
fuse fuse: add renameat2 support 2014-04-28 16:43:44 +02:00
gfs2 mm: implement ->map_pages for page cache 2014-04-07 16:35:53 -07:00
hfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
hfsplus Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
hostfs mm + fs: store shadow entries in page cache 2014-04-03 16:21:01 -07:00
hpfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
hppfs
hugetlbfs hugetlb: ensure hugepage access is denied if hugepages are not supported 2014-05-06 13:04:58 -07:00
isofs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
jbd jbd: Revise KERN_EMERG error messages 2013-12-04 12:27:46 +01:00
jbd2 jbd2: improve error messages for inconsistent journal heads 2014-03-12 16:38:03 -04:00
jffs2 MTD updates for 3.15: 2014-04-07 10:17:30 -07:00
jfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
kernfs kernfs: move the last knowledge of sysfs out from kernfs 2014-06-03 08:11:18 -07:00
lockd lockd: ensure we tear down any live sockets when socket creation fails during lockd_up 2014-03-28 10:43:08 -04:00
logfs mm + fs: store shadow entries in page cache 2014-04-03 16:21:01 -07:00
minix Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
ncpfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-04-12 17:31:22 -07:00
nfs mm: implement ->map_pages for page cache 2014-04-07 16:35:53 -07:00
nfs_common
nfsd nfsd4: warn on finding lockowner without stateid's 2014-05-21 11:11:21 -04:00
nilfs2 mm: implement ->map_pages for page cache 2014-04-07 16:35:53 -07:00
nls nls: have register_nls() set ->owner 2014-01-25 03:14:05 -05:00
notify fanotify: fix -EOVERFLOW with large files on 64-bit 2014-05-06 13:04:59 -07:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
ocfs2 ocfs2: fix double kmem_cache_destroy in dlm_init 2014-05-23 09:37:30 -07:00
omfs mm + fs: store shadow entries in page cache 2014-04-03 16:21:01 -07:00
openpromfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
proc mm: add !pte_present() check on existing hugetlb_entry callbacks 2014-06-06 13:21:16 -07:00
pstore Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
qnx4 fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
qnx6 fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
ramfs fs/ramfs: move ramfs_aops to inode.c 2014-01-23 16:36:58 -08:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
romfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
squashfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
sysfs kernfs: move the last knowledge of sysfs out from kernfs 2014-06-03 08:11:18 -07:00
sysv Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
ubifs UBIFS: fix remount error path 2014-05-05 09:31:33 +03:00
udf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
ufs fs/ufs: remove unused ufs_super_block_third pointer 2014-04-07 16:36:16 -07:00
xfs xfs: list_lru_init returns a negative error 2014-05-15 09:23:24 +10:00
aio.c aio: fix potential leak in aio_run_iocb(). 2014-05-01 08:37:43 -04:00
anon_inodes.c vfs: Allocate anon_inode_inode in anon_inode_init() 2014-03-27 09:52:54 -07:00
attr.c fs: fix iversion handling 2013-12-05 16:36:21 -06:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c exec: kill the unnecessary mm->def_flags setting in load_elf_binary() 2014-04-07 16:35:52 -07:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c binfmt_misc: add missing 'break' statement 2014-04-03 16:21:16 -07:00
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Ensure we only enable integrity metadata for reads and writes 2014-04-09 08:00:06 -06:00
bio.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
block_dev.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
buffer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
char_dev.c
compat.c locks: rename file-private locks to "open file description locks" 2014-04-22 08:23:58 -04:00
compat_binfmt_elf.c binfmt_elf: add ELF_HWCAP2 to compat auxv entries 2014-03-04 08:05:21 +00:00
compat_ioctl.c fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types 2014-03-06 16:30:44 +01:00
coredump.c coredump: fix va_list corruption 2014-04-19 13:23:31 -07:00
dcache.c dcache: add missing lockdep annotation 2014-05-31 09:13:21 -07:00
dcookies.c fs/compat: fix lookup_dcookie() parameter handling 2014-01-29 16:22:40 -08:00
direct-io.c xfs: update for 3.15-rc1 2014-04-04 15:50:08 -07:00
drop_caches.c drop_caches: add some documentation and info message 2014-04-03 16:21:04 -07:00
eventfd.c eventfd_ctx_fdget(): use fdget() instead of fget() 2014-01-25 03:13:04 -05:00
eventpoll.c epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL 2014-01-02 14:40:30 -08:00
exec.c metag: Reduce maximum stack size to 256MB 2014-05-15 00:00:35 +01:00
fcntl.c locks: rename file-private locks to "open file description locks" 2014-04-22 08:23:58 -04:00
fhandle.c
file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
filesystems.c sys_sysfs: Add CONFIG_SYSFS_SYSCALL 2014-04-03 16:21:05 -07:00
fs-writeback.c One of the main highlights this time, is not the patches themselves 2014-04-04 14:49:16 -07:00
fs_struct.c
inode.c Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
internal.h
ioctl.c
ioprio.c
Kconfig kernfs: add CONFIG_KERNFS 2014-02-07 16:08:57 -08:00
Kconfig.binfmt
libfs.c
locks.c locks: only validate the lock vs. f_mode in F_SETLK codepaths 2014-05-09 11:41:54 -04:00
Makefile kernfs: add CONFIG_KERNFS 2014-02-07 16:08:57 -08:00
mbcache.c ext4: each filesystem creates and uses its own mb_cache 2014-03-18 19:24:49 -04:00
mount.h reduce m_start() cost... 2014-04-01 23:19:09 -04:00
mpage.c
namei.c fix races between __d_instantiate() and checks of dentry flags 2014-04-19 12:30:58 -04:00
namespace.c VFS: Make delayed_free() call free_vfsmnt() 2014-04-01 23:19:18 -04:00
no-block.c
open.c These are regression and bug fixes for ext4. 2014-04-20 20:43:47 -07:00
pipe.c switch pipe_read() to copy_page_to_iter() 2014-04-01 23:19:22 -04:00
pnode.c smarter propagate_mnt() 2014-04-01 23:19:08 -04:00
pnode.h smarter propagate_mnt() 2014-04-01 23:19:08 -04:00
posix_acl.c posix_acl: handle NULL ACL in posix_acl_equiv_mode 2014-05-06 13:58:42 -04:00
proc_namespace.c reduce m_start() cost... 2014-04-01 23:19:09 -04:00
read_write.c Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2014-03-31 14:32:17 -07:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c vfs: fix vmplice_to_user() 2014-05-28 01:54:52 -04:00
stack.c
stat.c
statfs.c
super.c fs: Don't return 0 from get_anon_bdev 2014-04-16 11:53:08 -07:00
sync.c Revert "writeback: do not sync data dirtied after sync start" 2014-02-22 02:02:28 +01:00
timerfd.c timerfd: support CLOCK_BOOTTIME clock 2014-01-23 16:57:40 -08:00
utimes.c
xattr.c