linux-hardened/fs/btrfs
Filipe Manana b1517622f2 Btrfs: fix deadlock between dedup on same file and starting writeback
If we are deduping two ranges of the same file we need to make sure that
we lock all pages in ascending order, that is, lock first the pages from
the range with lower offset and then the pages from the other range, as
otherwise we can deadlock with a concurrent task that is starting delalloc
(writeback). Example trace:

[74073.052218] INFO: task kworker/u32:10:17997 blocked for more than 120 seconds.
[74073.053889]       Tainted: G        W       4.9.0-rc7-btrfs-next-36+ #1
[74073.055071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[74073.056696] kworker/u32:10  D    0 17997      2 0x00000000
[74073.058606] Workqueue: writeback wb_workfn (flush-btrfs-53176)
[74073.061370]  ffff880031e79858 ffff8802159d2580 ffff880237004580 ffff880031e79240
[74073.064784]  ffff88023f4978c0 ffffc9000817b638 ffffffff814c15e1 0000000000000000
[74073.068386]  ffff88023f4978d8 ffff88023f4978c0 000000000017b620 ffff880031e79240
[74073.071712] Call Trace:
[74073.072884]  [<ffffffff814c15e1>] ? __schedule+0x48f/0x6f4
[74073.075395]  [<ffffffff814c1c8b>] ? bit_wait+0x2f/0x2f
[74073.077511]  [<ffffffff814c18d2>] schedule+0x8c/0xa0
[74073.079440]  [<ffffffff814c4b36>] schedule_timeout+0x43/0xff
[74073.081637]  [<ffffffff8110953e>] ? time_hardirqs_on+0x9/0x14
[74073.083809]  [<ffffffff81095c67>] ? trace_hardirqs_on_caller+0x16/0x197
[74073.086314]  [<ffffffff810bde98>] ? timekeeping_get_ns+0x1e/0x32
[74073.100654]  [<ffffffff810be048>] ? ktime_get+0x41/0x52
[74073.102619]  [<ffffffff814c10f0>] io_schedule_timeout+0xa0/0x102
[74073.104771]  [<ffffffff814c10f0>] ? io_schedule_timeout+0xa0/0x102
[74073.106969]  [<ffffffff814c1ca6>] bit_wait_io+0x1b/0x39
[74073.108954]  [<ffffffff814c1fb8>] __wait_on_bit_lock+0x4f/0x99
[74073.110981]  [<ffffffff8112b692>] __lock_page+0x6b/0x6d
[74073.112833]  [<ffffffff8108ceb4>] ? autoremove_wake_function+0x3a/0x3a
[74073.115010]  [<ffffffffa031178b>] lock_page+0x2f/0x32 [btrfs]
[74073.116999]  [<ffffffffa0311d9f>] lock_delalloc_pages+0xc7/0x1a0 [btrfs]
[74073.119243]  [<ffffffffa0313d15>] find_lock_delalloc_range+0xc3/0x1a4 [btrfs]
[74073.121636]  [<ffffffffa0313e81>] writepage_delalloc.isra.31+0x8b/0x134 [btrfs]
[74073.124229]  [<ffffffffa0315d69>] __extent_writepage+0x1c1/0x2bf [btrfs]
[74073.126372]  [<ffffffffa03160f2>] extent_write_cache_pages.isra.30.constprop.49+0x28b/0x36c [btrfs]
[74073.129371]  [<ffffffffa03165b9>] extent_writepages+0x4b/0x5c [btrfs]
[74073.131440]  [<ffffffffa02fcb59>] ? insert_reserved_file_extent.constprop.42+0x261/0x261 [btrfs]
[74073.134303]  [<ffffffff811b4ce4>] ? writeback_sb_inodes+0xe0/0x4a1
[74073.136298]  [<ffffffffa02fab7f>] btrfs_writepages+0x28/0x2a [btrfs]
[74073.138248]  [<ffffffff81138200>] do_writepages+0x23/0x2c
[74073.139910]  [<ffffffff811b3cab>] __writeback_single_inode+0x105/0x6d2
[74073.142003]  [<ffffffff811b4e96>] writeback_sb_inodes+0x292/0x4a1
[74073.136298]  [<ffffffffa02fab7f>] btrfs_writepages+0x28/0x2a [btrfs]
[74073.138248]  [<ffffffff81138200>] do_writepages+0x23/0x2c
[74073.139910]  [<ffffffff811b3cab>] __writeback_single_inode+0x105/0x6d2
[74073.142003]  [<ffffffff811b4e96>] writeback_sb_inodes+0x292/0x4a1
[74073.143911]  [<ffffffff811b511b>] __writeback_inodes_wb+0x76/0xae
[74073.145787]  [<ffffffff811b53ca>] wb_writeback+0x1cc/0x4d7
[74073.147452]  [<ffffffff811b60cd>] wb_workfn+0x194/0x37d
[74073.149084]  [<ffffffff811b60cd>] ? wb_workfn+0x194/0x37d
[74073.150726]  [<ffffffff8106ce77>] ? process_one_work+0x154/0x4e4
[74073.152694]  [<ffffffff8106cf96>] process_one_work+0x273/0x4e4
[74073.154452]  [<ffffffff8106d6db>] worker_thread+0x1eb/0x2ca
[74073.156138]  [<ffffffff8106d4f0>] ? rescuer_thread+0x2b6/0x2b6
[74073.157837]  [<ffffffff81072a81>] kthread+0xd5/0xdd
[74073.159339]  [<ffffffff810729ac>] ? __kthread_unpark+0x5a/0x5a
[74073.161088]  [<ffffffff814c6257>] ret_from_fork+0x27/0x40
[74073.162680] INFO: lockdep is turned off.
[74073.163855] INFO: task do-dedup:30264 blocked for more than 120 seconds.
[74073.181180]       Tainted: G        W       4.9.0-rc7-btrfs-next-36+ #1
[74073.181180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[74073.185296] fdm-stress      D    0 30264  29974 0x00000000
[74073.186810]  ffff880089595118 ffff880211b8eac0 ffff880237030380 ffff880089594b00
[74073.188998]  ffff88023f2978c0 ffffc900063abb68 ffffffff814c15e1 0000000000000000
[74073.191070]  ffff88023f2978d8 ffff88023f2978c0 00000000003abb50 ffff880089594b00
[74073.193286] Call Trace:
[74073.193990]  [<ffffffff814c15e1>] ? __schedule+0x48f/0x6f4
[74073.195418]  [<ffffffff814c1c8b>] ? bit_wait+0x2f/0x2f
[74073.196796]  [<ffffffff814c18d2>] schedule+0x8c/0xa0
[74073.198163]  [<ffffffff814c4b36>] schedule_timeout+0x43/0xff
[74073.199621]  [<ffffffff81095df5>] ? trace_hardirqs_on+0xd/0xf
[74073.201100]  [<ffffffff810bde98>] ? timekeeping_get_ns+0x1e/0x32
[74073.202686]  [<ffffffff810be048>] ? ktime_get+0x41/0x52
[74073.204051]  [<ffffffff814c10f0>] io_schedule_timeout+0xa0/0x102
[74073.205585]  [<ffffffff814c10f0>] ? io_schedule_timeout+0xa0/0x102
[74073.207123]  [<ffffffff814c1ca6>] bit_wait_io+0x1b/0x39
[74073.208238]  [<ffffffff814c1fb8>] __wait_on_bit_lock+0x4f/0x99
[74073.208871]  [<ffffffff8112b692>] __lock_page+0x6b/0x6d
[74073.209430]  [<ffffffff8108ceb4>] ? autoremove_wake_function+0x3a/0x3a
[74073.210101]  [<ffffffff8112b800>] lock_page+0x2f/0x32
[74073.210636]  [<ffffffff8112c502>] pagecache_get_page+0x5e/0x153
[74073.211270]  [<ffffffffa03257eb>] gather_extent_pages+0x4e/0x109 [btrfs]
[74073.212166]  [<ffffffffa032a04c>] btrfs_dedupe_file_range+0x1e1/0x4dd [btrfs]
[74073.213257]  [<ffffffff8118d9b5>] vfs_dedupe_file_range+0x1c1/0x221
[74073.214086]  [<ffffffff8119e0c4>] do_vfs_ioctl+0x442/0x600
[74073.214767]  [<ffffffff811a7874>] ? rcu_read_unlock+0x5b/0x5d
[74073.215619]  [<ffffffff811a7953>] ? __fget+0x6b/0x77
[74073.216338]  [<ffffffff8119e2d9>] SyS_ioctl+0x57/0x79
[74073.217149]  [<ffffffff814c5fea>] entry_SYSCALL_64_fastpath+0x18/0xad
[74073.218102]  [<ffffffff81109552>] ? time_hardirqs_off+0x9/0x14
[74073.218968]  [<ffffffff810938ce>] ? trace_hardirqs_off_caller+0x1f/0xaa
[74073.219938] INFO: lockdep is turned off.

What happened was the following:

      CPU 1                                       CPU 2

                                             btrfs_dedupe_file_range()
                                               --> using same inode as source
                                                   and target
                                               --> src range is [768K, 1Mb[
                                               --> dst range is [0, 256K[
                                              btrfs_cmp_data_prepare()
                                               --> calls gather_extent_pages()
                                                   for range [768K, 1Mb[ and
                                                   locks all pages in that range

 do_writepages()
  btrfs_writepages()
   extent_writepages()
    extent_write_cache_pages()
     __extent_writepage()
      writepage_delalloc()
       find_lock_delalloc_range()
         --> finds range [0, 1Mb[
         lock_delalloc_pages()
          --> locks all pages in the
              range [0, 768K[
          --> tries to lock page at
              offset 768K
                --> deadlock

                                               --> calls gather_extent_pages()
                                                   to lock pages in the range
                                                   [0, 256K[
                                                    --> deadlock, task at CPU 1
                                                        already locked that
                                                        range and it's trying
                                                        to lock the range we
                                                        locked previously

So fix this by making sure that during a dedup we always lock first the
pages from the range with lower offset.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2017-02-22 15:55:02 -08:00
..
tests Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-12-16 10:53:01 -08:00
acl.c posix_acl: Clear SGID bit when setting file permissions 2016-09-22 10:55:32 +02:00
async-thread.c btrfs: fix crash when tracepoint arguments are freed by wq callbacks 2017-01-09 11:24:50 +01:00
async-thread.h btrfs: limit async_work allocation and worker func duration 2016-12-13 11:01:30 -08:00
backref.c btrfs: remove unused parameter from __add_inline_refs 2017-02-17 12:03:54 +01:00
backref.h
btrfs_inode.h btrfs: Better csum error message for data csum mismatch 2017-02-17 12:03:48 +01:00
check-integrity.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
check-integrity.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.c btrfs: Better csum error message for data csum mismatch 2017-02-17 12:03:48 +01:00
compression.h btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00
ctree.c btrfs: remove unused parameter from tree_move_next_or_upnext 2017-02-17 12:03:52 +01:00
ctree.h btrfs: use btrfs_debug instead of pr_debug in transaction abort 2017-02-17 12:03:56 +01:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: fix over-80 lines introduced by previous cleanups 2017-02-14 15:50:57 +01:00
delayed-inode.h btrfs: Make btrfs_inode_delayed_dir_index_count take btrfs_inode 2017-02-14 15:50:53 +01:00
delayed-ref.c btrfs: qgroup: Move half of the qgroup accounting time out of commit trans 2017-02-17 12:03:55 +01:00
delayed-ref.h Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head 2017-02-14 15:50:59 +01:00
dev-replace.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
dev-replace.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
dir-item.c btrfs: fix over-80 lines introduced by previous cleanups 2017-02-14 15:50:57 +01:00
disk-io.c btrfs: remove unused parameter from btrfs_check_super_valid 2017-02-17 12:03:52 +01:00
disk-io.h btrfs: merge two superblock writing helpers 2017-02-17 12:03:51 +01:00
export.c btrfs: Make btrfs_ino take a struct btrfs_inode 2017-02-14 15:50:51 +01:00
export.h
extent-tree.c btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
extent_io.c btrfs: remove unused parameter from extent_write_cache_pages 2017-02-17 12:03:53 +01:00
extent_io.h btrfs: embed extent_changeset::range_changed to the structure 2017-02-17 12:03:49 +01:00
extent_map.c btrfs: Fix slab accounting flags 2016-07-26 13:52:25 +02:00
extent_map.h btrfs: cleanup, stop casting for extent_map->lookup everywhere 2016-01-15 19:22:28 +01:00
file-item.c btrfs: Make btrfs_ino take a struct btrfs_inode 2017-02-14 15:50:51 +01:00
file.c btrfs: fix over-80 lines introduced by previous cleanups 2017-02-14 15:50:57 +01:00
free-space-cache.c btrfs: btrfs_truncate_free_space_cache always allocates path 2017-02-17 12:03:56 +01:00
free-space-cache.h btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
free-space-tree.c btrfs: remove unused parameter from clean_tree_block 2017-02-17 12:03:51 +01:00
free-space-tree.h Btrfs: implement the free space B-tree 2015-12-17 12:16:47 -08:00
hash.c btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
inode-map.c btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
inode-map.h Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-01-15 19:25:02 +01:00
inode.c btrfs: remove unused parameter from add_pending_csums 2017-02-17 12:03:53 +01:00
ioctl.c Btrfs: fix deadlock between dedup on same file and starting writeback 2017-02-22 15:55:02 -08:00
Kconfig
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h
lzo.c btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00
Makefile Btrfs: add free space tree sanity tests 2015-12-17 12:16:47 -08:00
math.h
ordered-data.c Btrfs: clean up btrfs_ordered_update_i_size 2017-02-14 15:50:58 +01:00
ordered-data.h Btrfs: specify a new ordered extent type for create_io_em 2017-02-17 12:03:48 +01:00
orphan.c
print-tree.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
print-tree.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
props.c btrfs: Make btrfs_ino take a struct btrfs_inode 2017-02-14 15:50:51 +01:00
props.h
qgroup.c btrfs: qgroup: Move half of the qgroup accounting time out of commit trans 2017-02-17 12:03:55 +01:00
qgroup.h btrfs: qgroup: Move half of the qgroup accounting time out of commit trans 2017-02-17 12:03:55 +01:00
raid56.c btrfs: raid56: Remove unused variable in lock_stripe_add 2017-02-14 15:50:59 +01:00
raid56.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
relocation.c btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
root-tree.c Btrfs: constify struct btrfs_{,disk_}key wherever possible 2017-02-14 15:50:58 +01:00
scrub.c btrfs: convert btrfs_inc_block_group_ro to accept fs_info 2017-02-17 12:03:56 +01:00
send.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
send.h Btrfs: use linux/sizes.h to represent constants 2016-01-07 14:38:02 +01:00
struct-funcs.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
super.c btrfs: remove unused parameter from btrfs_fill_super 2017-02-17 12:03:53 +01:00
sysfs.c btrfs: convert printk(KERN_* to use pr_* calls 2016-09-26 18:08:44 +02:00
sysfs.h btrfs: sysfs: introduce helper for syncing bits with sysfs files 2016-01-21 18:50:40 +01:00
transaction.c btrfs: remove unused parameter from btrfs_prepare_extent_commit 2017-02-17 12:03:52 +01:00
transaction.h btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c btrfs: remove unused parameter from __add_inode_ref 2017-02-17 12:03:54 +01:00
tree-log.h btrfs: Make btrfs_del_inode_ref take btrfs_inode 2017-02-14 15:50:54 +01:00
ulist.c btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
ulist.h btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
uuid-tree.c btrfs: return the actual error value from from btrfs_uuid_tree_iterate 2016-12-19 18:08:15 +01:00
volumes.c btrfs: remove unused parameter from init_first_rw_device 2017-02-17 12:03:54 +01:00
volumes.h Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-12-16 10:53:01 -08:00
xattr.c btrfs: fix over-80 lines introduced by previous cleanups 2017-02-14 15:50:57 +01:00
xattr.h btrfs: Switch to generic xattr handlers 2016-05-17 19:17:09 -04:00
zlib.c btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00