linux-hardened/fs
David Howells f09b443d0e FS-Cache: Synchronise object death state change vs operation submission
When an object is being marked as no longer live, do this under the object
spinlock to prevent a race with operation submission targeted on that object.

The problem occurs due to the following pair of intertwined sequences when the
cache tries to create an object that would take it over the hard available
space limit:

 NETFS INTERFACE
 ===============
 (A) The netfs calls fscache_acquire_cookie().  object creation is deferred to
     the object state machine and the netfs is allowed to continue.

	OBJECT STATE MACHINE KTHREAD
	============================
	(1) The object is looked up on disk by fscache_look_up_object()
	    calling cachefiles_walk_to_object().  The latter finds that the
	    object is not yet represented on disk and calls
	    fscache_object_lookup_negative().

	(2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET
	    and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to
	    start queuing read operations.

 (B) The netfs calls fscache_read_or_alloc_pages().  This calls
     fscache_wait_for_deferred_lookup() which sees FSCACHE_COOKIE_LOOKING_UP
     become clear, allowing the read to begin.

 (C) A read operation is set up and passed to fscache_submit_op() to deal
     with.

	(3) cachefiles_walk_to_object() calls cachefiles_has_space(), which
	    fails (or one of the file operations to create stuff fails).
	    cachefiles returns an error to fscache.

	(4) fscache_look_up_object() transits to the LOOKUP_FAILURE state,

	(5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and
	    FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP
	    then transits to the KILL_OBJECT state.

	(6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt
	    to reject any further requests from the netfs.

	(7) object->n_ops is examined and found to be 0.
	    fscache_kill_object() transits to the DROP_OBJECT state.

 (D) fscache_submit_op() locks the object spinlock, sees if it can dispatch
     the op immediately by calling fscache_object_is_active() - which fails
     since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.

 (E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set.
     It then queues the object and increments object->n_ops.

	(8) fscache_drop_object() releases the object and eventually
	    fscache_put_object() calls cachefiles_put_object() which suffers
	    an assertion failure here:

		ASSERTCMP(object->fscache.n_ops, ==, 0);

Locking the object spinlock in step (6) around the clearance of
FSCACHE_OBJECT_IS_LIVE ensures that the the decision trees in
fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE
flag being cleared mid-decision: either the op is queued before step (7) - in
which case fscache_kill_object() will see n_ops>0 and will deal with the op -
or the op will be rejected.

This, combined with rejecting op submission if the target object is dying, fix
the problem.

The problem shows up as the following oops:

CacheFiles: Assertion failed
CacheFiles: 1 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/cachefiles/interface.c:339!
...
RIP: 0010:[<ffffffffa014fd9c>]  [<ffffffffa014fd9c>] cachefiles_put_object+0x2a4/0x301 [cachefiles]
...
Call Trace:
 [<ffffffffa008674b>] fscache_put_object+0x18/0x21 [fscache]
 [<ffffffffa00883e6>] fscache_object_work_func+0x3ba/0x3c9 [fscache]
 [<ffffffff81054dad>] process_one_work+0x226/0x441
 [<ffffffff81055d91>] worker_thread+0x273/0x36b
 [<ffffffff81055b1e>] ? rescuer_thread+0x2e1/0x2e1
 [<ffffffff81059b9d>] kthread+0x10e/0x116
 [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
 [<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
2015-04-02 14:28:53 +01:00
..
9p Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
adfs
affs fs/affs/super.c: fix switch indentation 2015-02-17 14:34:53 -08:00
afs Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
autofs4 assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
befs fs/befs/linuxvfs.c: remove unnecessary casting 2015-02-17 14:34:50 -08:00
bfs
btrfs Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
cachefiles FS-Cache: Count culled objects and objects rejected due to lack of space 2015-02-24 10:05:27 +00:00
ceph Revert "locks: keep a count of locks on the flctx lists" 2015-02-16 14:32:03 -05:00
cifs Revert "locks: keep a count of locks on the flctx lists" 2015-02-16 14:32:03 -05:00
coda fs/coda/dir.c: forward declaration clean-up 2015-02-17 14:34:50 -08:00
configfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
cramfs
debugfs debugfs: Provide a file creation function that also takes an initial size 2015-02-17 12:21:51 -05:00
devpts
dlm netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
ecryptfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
efivarfs * Move efivarfs from the misc filesystem section to pseudo filesystem, 2015-01-29 19:16:40 +01:00
efs
exofs vfs: remove get_xip_mem 2015-02-16 17:56:03 -08:00
exportfs move d_rcu from overlapping d_child to overlapping d_alias 2014-11-03 15:20:29 -05:00
ext2 ext2: get rid of most mentions of XIP in ext2 2015-02-16 17:56:04 -08:00
ext3 ext3: destroy sbi mutexes in put_super 2015-01-05 11:13:55 +01:00
ext4 Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
f2fs Merge tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs 2015-02-12 19:28:50 -08:00
fat fs: fat: use MSDOS_SB macro to get msdos_sb_info 2015-02-17 14:34:51 -08:00
freevxfs
fscache FS-Cache: Synchronise object death state change vs operation submission 2015-04-02 14:28:53 +01:00
fuse Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
gfs2 Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
hfs fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp 2014-12-10 17:41:16 -08:00
hfsplus hfsplus: fix longname handling 2014-12-18 19:08:10 -08:00
hostfs
hpfs
hppfs vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
hugetlbfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
isofs isofs: Fix bug in the way to check if the year is a leap year 2015-01-07 09:51:49 +01:00
jbd jbd: Deletion of an unnecessary check before the function call "iput" 2014-11-18 10:15:29 +01:00
jbd2 Lots of bugs fixes, including Zheng and Jan's extent status shrinker 2014-12-12 09:28:03 -08:00
jffs2 Merge JFFS2 updates from David Woodhouse 2015-02-16 18:05:26 -08:00
jfs Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
kernfs kernfs: remove KERNFS_STATIC_NAME 2015-02-13 21:21:36 -08:00
lockd Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linux 2015-02-12 10:39:41 -08:00
logfs
minix
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
nfs Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
nfs_common
nfsd nfsd4: fix v3-less build 2015-02-16 11:43:13 -05:00
nilfs2 Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
nls
notify fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask 2015-02-10 14:30:28 -08:00
ntfs fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info 2015-01-20 14:03:04 -07:00
ocfs2 ocfs2: set append dio as a ro compat feature 2015-02-16 17:56:05 -08:00
omfs FS/OMFS: block number sanity check during fill_super operation 2014-10-14 02:18:22 +02:00
openpromfs
overlayfs Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
proc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
pstore pstore: Fix sprintf format specifier in pstore_dump() 2015-01-16 16:01:29 -08:00
qnx4
qnx6
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2015-02-10 15:52:38 -08:00
ramfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
reiserfs fs/reiserfs/inode.c: replace 0 by NULL for pointers 2015-02-17 14:34:51 -08:00
romfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
squashfs Squashfs: Add LZ4 compression configuration option 2014-11-27 18:48:44 +00:00
sysfs driver core patches for 3.20-rc1 2015-02-15 11:11:47 -08:00
sysv
ubifs Merge branch 'for-linus-v3.20' of git://git.infradead.org/linux-ubifs 2015-02-15 10:11:39 -08:00
udf udf: remove bool assignment to 0/1 2015-02-05 16:34:25 +01:00
ufs fs/ufs/super.c: fix potential race condition 2015-02-17 14:34:51 -08:00
xfs Merge branch 'akpm' (patches from Andrew) 2015-02-12 18:54:28 -08:00
aio.c Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
binfmt_elf.c Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2014-12-11 17:56:37 -08:00
binfmt_elf_fdpic.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_em86.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_flat.c
binfmt_misc.c unfuck binfmt_misc.c (broken by commit e6084d4) 2014-12-17 08:27:14 -05:00
binfmt_script.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
block_dev.c Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block 2015-02-12 14:13:23 -08:00
buffer.c fs: clarify rate limit suppressed buffer I/O errors 2014-10-21 13:55:11 -06:00
char_dev.c fs: introduce f_op->mmap_capabilities for nommu mmap support 2015-01-20 14:02:58 -07:00
compat.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
compat_binfmt_elf.c
compat_ioctl.c
coredump.c coredump: add %i/%I in core_pattern to report the tid of the crashed thread 2014-10-14 02:18:21 +02:00
dax.c dax: add dax_zero_page_range 2015-02-16 17:56:04 -08:00
dcache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
dcookies.c
direct-io.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
drop_caches.c vmscan: per memory cgroup slab shrinkers 2015-02-12 18:54:09 -08:00
eventfd.c eventfd: don't take the spinlock in eventfd_poll 2015-02-17 14:34:52 -08:00
eventpoll.c epoll: optimize setting task running after blocking 2015-02-13 21:21:40 -08:00
exec.c fs: create proper filename objects using getname_kernel() 2015-01-23 00:22:20 -05:00
fcntl.c vfs: renumber FMODE_NONOTIFY and add to uniqueness check 2015-01-08 15:10:52 -08:00
fhandle.c
file.c fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) 2014-12-10 17:41:10 -08:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
filesystems.c
fs-writeback.c Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
fs_pin.c switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
fs_struct.c
inode.c Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
internal.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
ioctl.c fsioctl.c: make generic_block_fiemap() signal-tolerant 2015-02-10 14:30:30 -08:00
Kconfig dax: does not work correctly with virtual aliasing caches 2015-02-16 17:56:04 -08:00
Kconfig.binfmt fs/binfmt_som: Drop kernel support for HP-UX SOM binaries 2015-02-17 16:29:36 +01:00
libfs.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
locks.c locks: fix list insertion when lock is split in two 2015-02-17 17:08:23 -05:00
Makefile Merge branch 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2015-02-17 14:25:58 -08:00
mbcache.c
mount.h switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
mpage.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
namei.c audit: replace getname()/putname() hacks with reference counters 2015-01-23 00:23:58 -05:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
no-block.c
nsfs.c take the targets of /proc/*/ns/* symlinks to separate fs 2014-12-10 21:30:20 -05:00
open.c Merge branch 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 15:27:47 -08:00
pipe.c
pnode.c mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. 2014-12-02 10:46:50 -06:00
pnode.h
posix_acl.c
proc_namespace.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
read_write.c Merge branch 'iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 15:48:33 -08:00
readdir.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
select.c all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
seq_file.c bitmap, cpumask, nodemask: remove dedicated formatting functions 2015-02-13 21:21:39 -08:00
signalfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
splice.c fs: add vfs_iter_{read,write} helpers 2015-01-29 00:13:13 -05:00
stack.c
stat.c
statfs.c
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
sync.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
timerfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
utimes.c
xattr.c new helper: audit_file() 2014-11-19 13:01:26 -05:00