Commit graph

39473 commits

Author SHA1 Message Date
David Howells
f09b443d0e FS-Cache: Synchronise object death state change vs operation submission
When an object is being marked as no longer live, do this under the object
spinlock to prevent a race with operation submission targeted on that object.

The problem occurs due to the following pair of intertwined sequences when the
cache tries to create an object that would take it over the hard available
space limit:

 NETFS INTERFACE
 ===============
 (A) The netfs calls fscache_acquire_cookie().  Object creation is deferred to
     the object state machine and the netfs is allowed to continue.

	OBJECT STATE MACHINE KTHREAD
	============================
	(1) The object is looked up on disk by fscache_look_up_object()
	    calling cachefiles_walk_to_object().  The latter finds that the
	    object is not yet represented on disk and calls
	    fscache_object_lookup_negative().

	(2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET
	    and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to
	    start queuing read operations.

 (B) The netfs calls fscache_read_or_alloc_pages().  This calls
     fscache_wait_for_deferred_lookup() which sees FSCACHE_COOKIE_LOOKING_UP
     become clear, allowing the read to begin.

 (C) A read operation is set up and passed to fscache_submit_op() to deal
     with.

	(3) cachefiles_walk_to_object() calls cachefiles_has_space(), which
	    fails (or one of the file operations to create stuff fails).
	    cachefiles returns an error to fscache.

	(4) fscache_look_up_object() transits to the LOOKUP_FAILURE state.

	(5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and
	    FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP
	    then transits to the KILL_OBJECT state.

	(6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt
	    to reject any further requests from the netfs.

	(7) object->n_ops is examined and found to be 0.
	    fscache_kill_object() transits to the DROP_OBJECT state.

 (D) fscache_submit_op() locks the object spinlock, sees if it can dispatch
     the op immediately by calling fscache_object_is_active() - which fails
     since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.

 (E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set.
     It then queues the object and increments object->n_ops.

	(8) fscache_drop_object() releases the object and eventually
	    fscache_put_object() calls cachefiles_put_object() which suffers
	    an assertion failure here:

		ASSERTCMP(object->fscache.n_ops, ==, 0);

Locking the object spinlock in step (6) around the clearance of
FSCACHE_OBJECT_IS_LIVE ensures that the decision trees in
fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE
flag being cleared mid-decision: either the op is queued before step (7) - in
which case fscache_kill_object() will see n_ops>0 and will deal with the op -
or the op will be rejected.

This, combined with rejecting op submission if the target object is dying,
fixes the problem.

The problem shows up as the following oops:

CacheFiles: Assertion failed
CacheFiles: 1 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/cachefiles/interface.c:339!
...
RIP: 0010:[<ffffffffa014fd9c>]  [<ffffffffa014fd9c>] cachefiles_put_object+0x2a4/0x301 [cachefiles]
...
Call Trace:
 [<ffffffffa008674b>] fscache_put_object+0x18/0x21 [fscache]
 [<ffffffffa00883e6>] fscache_object_work_func+0x3ba/0x3c9 [fscache]
 [<ffffffff81054dad>] process_one_work+0x226/0x441
 [<ffffffff81055d91>] worker_thread+0x273/0x36b
 [<ffffffff81055b1e>] ? rescuer_thread+0x2e1/0x2e1
 [<ffffffff81059b9d>] kthread+0x10e/0x116
 [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
 [<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
2015-04-02 14:28:53 +01:00
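
Illustrative sketch for f09b443d0e (not part of the commit): the locking rule
described in the message can be modelled in userspace C.  Everything named
toy_* is hypothetical, a C11 atomic_flag stands in for the object spinlock,
and the real decision trees in fscache_submit_op() are more involved.  The
point is only that the "is it live?" test and the n_ops increment happen
under the same lock as the clearing of the live flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cache object; not the kernel structure. */
struct toy_object {
	atomic_flag lock;	/* plays the role of the object spinlock */
	bool is_live;		/* plays the role of FSCACHE_OBJECT_IS_LIVE */
	int n_ops;		/* operations queued against the object */
};

static void toy_object_init(struct toy_object *obj)
{
	assert(obj != NULL);
	atomic_flag_clear(&obj->lock);
	obj->is_live = true;
	obj->n_ops = 0;
}

static void toy_lock(struct toy_object *obj)
{
	while (atomic_flag_test_and_set_explicit(&obj->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_object *obj)
{
	atomic_flag_clear_explicit(&obj->lock, memory_order_release);
}

/* Submission path: test the live flag and bump n_ops in one locked section. */
static bool toy_submit_op(struct toy_object *obj)
{
	bool queued = false;

	toy_lock(obj);
	if (obj->is_live) {
		obj->n_ops++;
		queued = true;
	}
	toy_unlock(obj);
	return queued;	/* false means the op was rejected */
}

/*
 * Kill path: clear the live flag under the same lock.  Any op queued before
 * this point is visible in the returned count; any later op is rejected.
 * (Reading n_ops inside the locked section is a simplification.)
 */
static int toy_kill_object(struct toy_object *obj)
{
	int pending;

	toy_lock(obj);
	obj->is_live = false;
	pending = obj->n_ops;
	toy_unlock(obj);
	return pending;
}
```

With this shape, a submission can never slip in between "flag cleared" and
"n_ops examined": it is either counted by the kill path or refused.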
David Howells
6515d1dbf4 FS-Cache: Handle a new operation submitted against a killed object
Reject new operations that are being submitted against an object if that
object has failed its lookup or creation states or has been killed by the
cache backend for some other reason, such as having been culled.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
2015-04-02 14:28:53 +01:00
David Howells
30ceec6284 FS-Cache: When submitting an op, cancel it if the target object is dying
When submitting an operation, prefer to cancel the operation immediately
rather than queuing it for later processing if the object is marked as dying
(ie. the object state machine has reached the KILL_OBJECT state).

Whilst we're at it, change the series of related test_bit() calls into a
READ_ONCE() and bitwise-AND operators to reduce the number of load
instructions (test_bit() has a volatile address).

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
2015-04-02 14:28:53 +01:00
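
Illustrative sketch for 30ceec6284 (not part of the commit): replacing a
series of test_bit() calls with one load and bitwise-AND tests might look
roughly like this in userspace C.  The TOY_* flag names, the TOY_READ_ONCE()
macro and the three-way verdict are assumptions made for the example; they
are not the kernel's definitions or its actual decision tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits packed into one unsigned long word. */
#define TOY_OBJECT_IS_AVAILABLE	(1UL << 0)
#define TOY_OBJECT_IS_DYING	(1UL << 1)

/* Userspace approximation of a single, non-repeatable load of a word. */
#define TOY_READ_ONCE(x) (*(const volatile unsigned long *)&(x))

enum toy_verdict { TOY_RUN_NOW, TOY_QUEUE, TOY_CANCEL };

static enum toy_verdict toy_classify(const unsigned long *flags_word)
{
	unsigned long flags;

	assert(flags_word != NULL);

	/* One load; every test below works on this local snapshot. */
	flags = TOY_READ_ONCE(*flags_word);

	if (flags & TOY_OBJECT_IS_DYING)
		return TOY_CANCEL;	/* prefer cancelling over queuing */
	if (flags & TOY_OBJECT_IS_AVAILABLE)
		return TOY_RUN_NOW;
	return TOY_QUEUE;
}
```

Because all tests use the same snapshot, the compiler needs a single load
instruction, and the tests cannot observe different values of the word.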
David Howells
3c3059841a FS-Cache: Move fscache_report_unexpected_submission() to make it more available
Move fscache_report_unexpected_submission() up within operation.c so that it
can be called from fscache_submit_exclusive_op() too.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
2015-04-02 14:28:53 +01:00
David Howells
182d919b84 FS-Cache: Count culled objects and objects rejected due to lack of space
Count the number of objects that get culled by the cache backend and the
number of objects that the cache backend declines to instantiate due to lack
of space in the cache.

These numbers are made available through /proc/fs/fscache/stats

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
2015-02-24 10:05:27 +00:00
Linus Torvalds
b2b89ebfc0 File locking related fixes for v3.20 (pile #2)

Merge tag 'locks-v3.20-2' of git://git.samba.org/jlayton/linux

Pull file locking fixes from Jeff Layton:
 "A small set of patches to fix problems with the recent file locking
  changes that we discussed earlier this week"

* tag 'locks-v3.20-2' of git://git.samba.org/jlayton/linux:
  locks: fix list insertion when lock is split in two
  locks: remove conditional lock release in middle of flock_lock_file
  locks: only remove leases associated with the file being closed
  Revert "locks: keep a count of locks on the flctx lists"
2015-02-18 10:21:47 -08:00
Linus Torvalds
402521b8f7 MTD updates for 3.20-rc1
NAND:
 
  * Add new Hisilicon NAND driver for Hip04
  * Add default reboot handler, to ensure all outstanding erase transactions
    complete in time
  * jz4740: convert to use GPIO descriptor API
  * Atmel: add support for sama5d4
  * Change default bitflip threshold to 75% of correction strength
  * Miscellaneous cleanups and bugfixes
 
 SPI NOR:
 
  * Freescale QuadSPI:
    - Fix a few probe() and remove() issues
    - Add a MAINTAINERS entry for this driver
    - Tweak transfer size to increase read performance
    - Add suspend/resume support
  * Add Micron quad I/O support
  * ST FSM SPI: miscellaneous fixes
 
 JFFS2:
 
  * gracefully handle corrupted 'offset' field found on flash
 
 Other:
 
  * bcm47xxpart: add tweaks for a few new devices
  * mtdconcat: set return lengths properly for mtd_write_oob()
  * map_ram: enable use with mtdoops
  * maps: support fallback to ROM/UBI for write-protected NOR flash

Merge tag 'for-linus-20150216' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "NAND:

   - Add new Hisilicon NAND driver for Hip04
   - Add default reboot handler, to ensure all outstanding erase
     transactions complete in time
   - jz4740: convert to use GPIO descriptor API
   - Atmel: add support for sama5d4
   - Change default bitflip threshold to 75% of correction strength
   - Miscellaneous cleanups and bugfixes

  SPI NOR:

   - Freescale QuadSPI:
      - Fix a few probe() and remove() issues
      - Add a MAINTAINERS entry for this driver
      - Tweak transfer size to increase read performance
      - Add suspend/resume support
   - Add Micron quad I/O support
   - ST FSM SPI: miscellaneous fixes

  JFFS2:

   - gracefully handle corrupted 'offset' field found on flash

  Other:

   - bcm47xxpart: add tweaks for a few new devices
   - mtdconcat: set return lengths properly for mtd_write_oob()
   - map_ram: enable use with mtdoops
   - maps: support fallback to ROM/UBI for write-protected NOR flash"

* tag 'for-linus-20150216' of git://git.infradead.org/linux-mtd: (46 commits)
  mtd: hisilicon: && vs & typo
  jffs2: fix handling of corrupted summary length
  mtd: hisilicon: add device tree binding documentation
  mtd: hisilicon: add a new NAND controller driver for hisilicon hip04 Soc
  mtd: avoid registering reboot notifier twice
  mtd: concat: set the return lengths properly
  mtd: kconfig: replace PPC_OF with PPC
  mtd: denali: remove unnecessary stubs
  mtd: nand: remove redundant local variable
  MAINTAINERS: add maintainer entry for FREESCALE QUAD SPI driver
  mtd: fsl-quadspi: improve read performance by increase AHB transfer size
  mtd: fsl-quadspi: Remove unnecessary 'map_failed' label
  mtd: fsl-quadspi: Remove unneeded success/error messages
  mtd: fsl-quadspi: Fix the error paths
  mtd: nand: omap: drop condition with no effect
  mtd: nand: jz4740: Convert to GPIO descriptor API
  mtd: nand: Request strength instead of bytes for soft BCH
  mtd: nand: default bitflip-reporting threshold to 75% of correction strength
  mtd: atmel_nand: introduce a new compatible string for sama5d4 chip
  mtd: atmel_nand: return max bitflips in all sectors in pmecc_correction()
  ...
2015-02-18 08:01:44 -08:00
Linus Torvalds
533cf7aef2 Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "These are fixes for two bugs introduced during the merge window"

* 'for-3.20' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix v3-less build
  nfsd: fix comparison in fh_fsid_match()
2015-02-17 17:00:54 -08:00
Linus Torvalds
038911597e Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull lazytime mount option support from Al Viro:
 "Lazytime stuff from tytso"

* 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ext4: add optimization for the lazytime mount option
  vfs: add find_inode_nowait() function
  vfs: add support for a lazytime mount option
2015-02-17 16:12:34 -08:00
Linus Torvalds
66dc830d14 Merge branch 'iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:
 "More iov_iter work - missing counterpart of iov_iter_init() for
  bvec-backed ones and vfs_read_iter()/vfs_write_iter() - wrappers for
  sync calls of ->read_iter()/->write_iter()"

* 'iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: add vfs_iter_{read,write} helpers
  new helper: iov_iter_bvec()
2015-02-17 15:48:33 -08:00
Linus Torvalds
05016b0f0a Merge branch 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull getname/putname updates from Al Viro:
 "Rework of getname/getname_kernel/etc., mostly from Paul Moore.  Gets
  rid of quite a pile of kludges between namei and audit..."

* 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  audit: replace getname()/putname() hacks with reference counters
  audit: fix filename matching in __audit_inode() and __audit_inode_child()
  audit: enable filename recording via getname_kernel()
  simpler calling conventions for filename_mountpoint()
  fs: create proper filename objects using getname_kernel()
  fs: rework getname_kernel to handle up to PATH_MAX sized filenames
  cut down the number of do_path_lookup() callers
2015-02-17 15:27:47 -08:00
Linus Torvalds
c6b1de1b64 Merge branch 'debugfs_automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull debugfs patches from Al Viro:
 "debugfs patches, mostly to make it possible for something like tracefs
  to be transparently automounted on given directory in debugfs.

  New primitive in there is debugfs_create_automount(name, parent, func,
  arg), which creates a directory and makes its ->d_automount() return
  func(arg).  Another missing primitive was debugfs_create_file_size() -
  open-coded in quite a few places.  Dave's patch adds it and converts
  the open-code instances to calling it"

* 'debugfs_automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  debugfs: Provide a file creation function that also takes an initial size
  new primitive: debugfs_create_automount()
  debugfs: split end_creating() into success and failure cases
  debugfs: take mode-dependent parts of debugfs_get_inode() into callers
  fold debugfs_mknod() into callers
  fold debugfs_create() into caller
  fold debugfs_mkdir() into caller
  debugfs_mknod(): get rid useless arguments
  fold debugfs_link() into caller
  debugfs: kill __create_file()
  debugfs: split the beginning and the end of __create_file() off
  debugfs_{mkdir,create,link}(): get rid of redundant argument
2015-02-17 15:18:19 -08:00
Linus Torvalds
50652963ea Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc VFS updates from Al Viro:
 "This cycle a lot of stuff sits on topical branches, so I'll be sending
  more or less one pull request per branch.

  This is the first pile; more to follow in a few.  In this one are
  several misc commits from early in the cycle (before I went for
  separate branches), plus the rework of mntput/dput ordering on umount,
  switching to use of fs_pin instead of convoluted games in
  namespace_unlock()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch the IO-triggering parts of umount to fs_pin
  new fs_pin killing logics
  allow attaching fs_pin to a group not associated with some superblock
  get rid of the second argument of acct_kill()
  take count and rcu_head out of fs_pin
  dcache: let the dentry count go down to zero without taking d_lock
  pull bumping refcount into ->kill()
  kill pin_put()
  mode_t whack-a-mole: chelsio
  file->f_path.dentry is pinned down for as long as the file is open...
  get rid of lustre_dump_dentry()
  gut proc_register() a bit
  kill d_validate()
  ncpfs: get rid of d_validate() nonsense
  selinuxfs: don't open-code d_genocide()
2015-02-17 14:56:45 -08:00
Linus Torvalds
e2b74f232e Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - a pile of minor fs fixes and cleanups

 - kexec updates

 - random misc fixes in various places: vmcore, rbtree, eventfd, ipc, seccomp.

 - a series of python-based kgdb helper scripts

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
  seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO
  samples/seccomp: improve label helper
  ipc,sem: use current->state helpers
  scripts/gdb: disable pagination while printing from breakpoint handler
  scripts/gdb: define maintainer
  scripts/gdb: convert CpuList to generator function
  scripts/gdb: convert ModuleList to generator function
  scripts/gdb: use a generator instead of iterator for task list
  scripts/gdb: ignore byte-compiled python files
  scripts/gdb: port to python3 / gdb7.7
  scripts/gdb: add basic documentation
  scripts/gdb: add lx-lsmod command
  scripts/gdb: add class to iterate over CPU masks
  scripts/gdb: add lx_current convenience function
  scripts/gdb: add internal helper and convenience function for per-cpu lookup
  scripts/gdb: add get_gdbserver_type helper
  scripts/gdb: add internal helper and convenience function to retrieve thread_info
  scripts/gdb: add is_target_arch helper
  scripts/gdb: add helper and convenience function to look up tasks
  scripts/gdb: add task iteration class
  ...
2015-02-17 14:35:02 -08:00
Fabian Frederick
0445f01a53 fs/affs/super.c: fix switch indentation
Fix checkpatch error:

  ERROR: switch and case should be at the same indent

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:53 -08:00
Fabian Frederick
0cdfe18ad5 fs/affs/inode.c: remove double extern affs_symlink_inode_operations
affs_symlink_inode_operations was already declared extern in affs.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Fabian Frederick
211c2af014 fs/affs/bitmap.c: remove unnecessary return
return is not needed at the end of function.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Fabian Frederick
b4478e3530 fs/affs/amigaffs.c: remove else after return
else is unnecessary after return -ENAMETOOLONG

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Fabian Frederick
f157853e40 fs/affs: define AFFSNAMEMAX to replace constant use
30 was used all over the place to compare name length against
AFFS maximum name length.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
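
Illustrative sketch for f157853e40 (not part of the commit): a named limit
replaces the bare constant in each length comparison.  The value 30 comes
from the commit message; the helper toy_affs_check_len() and its use of
-ENAMETOOLONG are assumptions made for the example, not the AFFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define AFFSNAMEMAX 30	/* value quoted in the commit message */

/* Hypothetical helper: reject names longer than the AFFS limit. */
static int toy_affs_check_len(const char *name)
{
	assert(name != NULL);
	if (strlen(name) > AFFSNAMEMAX)
		return -ENAMETOOLONG;
	return 0;
}
```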
Fabian Frederick
eeb36f8e93 fs/affs: use unsigned int for string lengths
- Some min() were used with different types.

- Create a new variable in __affs_hash_dentry() to process
  affs_check_name()/min() return

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Fabian Frederick
4d29e571e1 fs/affs/super.c: destroy sbi mutex in affs_kill_sb()
Call mutex_destroy() on superblock mutex in affs_kill_sb() otherwise mutex
debugging code isn't able to detect that mutex is used after being freed.
(thanks to Jan Kara for complete definition).

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Fabian Frederick
92b20708f9 fs/affs/file.c: fix direct IO writes beyond EOF
Use the same fallback to normal IO in case of write
operations beyond EOF as fat direct IO. This patch fixes

fsx file -d -Z -r 4096 -w 4096

Report:
  129(129 mod 256): TRUNCATE DOWN from 0x3ff01 to 0xb3f6
  130(130 mod 256): WRITE    0x22000 thru 0x2dfff (0xc000 bytes) HOLE

Thanks to Jan for helping me on this problem.

The ideal solution suggested by Jan Kara would be to use
cont_write_begin() but affs direct_IO shouldn't be used a lot anyway...

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Fabian Frederick
afe305dcc9 fs/affs/file.c: replace if/BUG by BUG_ON
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Geert Uytterhoeven
08fe100d91 fs/affs: fix casting in printed messages
- "inode.i_ino" is "unsigned long",
- "loff_t" is always "unsigned long long",
- "sector_t" should be cast to "unsigned long long" for printing,
- "u32" should not be cast to "unsigned int" for printing.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
Chris Mason
e22553e2a2 eventfd: don't take the spinlock in eventfd_poll
The spinlock in eventfd_poll is trying to protect the count of events so
it can decide if it should return POLLIN, POLLERR, or POLLOUT.  But,
because of the way we drop the lock after calling poll_wait, and drop it
again before returning, we have the same pile of races with the lock as
we do with a single read of ctx->count().

This replaces the lock with a read barrier and single read.

eventfd_write does a single bump of ctx->count, so this should not add
new races with adding events.  eventfd_read is similar, it will do a
single decrement with the lock held, and so we're making the race with
concurrent readers slightly larger.

This spinlock is the top CPU user in kernel code during one of our
workloads.  Removing it gives us a ~2% boost.

[arnd@arndb.de: avoid unused variable warning]
[dan.carpenter@oracle.com: type bug in eventfd_poll()]
Signed-off-by: Chris Mason <clm@fb.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
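
Illustrative sketch for e22553e2a2 (not part of the commit): deriving the
poll mask from one read of the counter, modelled in userspace C.  The
function name is hypothetical, the thresholds are an assumption about how
the three events are chosen, and the read barrier mentioned in the message
is omitted here.

```c
#include <assert.h>
#include <limits.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical helper: compute the event mask from a single counter read. */
static short toy_eventfd_poll_mask(const volatile unsigned long long *counter)
{
	unsigned long long count;
	short events = 0;

	assert(counter != NULL);
	count = *counter;	/* the only read of the shared counter */

	if (count > 0)
		events |= POLLIN;	/* something to read */
	if (count == ULLONG_MAX)
		events |= POLLERR;	/* counter has hit its maximum */
	if (ULLONG_MAX - 1 > count)
		events |= POLLOUT;	/* room for at least one more add */
	return events;
}
```

All three tests use the same local copy, so the result is always consistent
with one value of the counter, even if that value is stale by the time the
caller acts on it.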
WANG Chao
34b4776429 vmcore: fix PT_NOTE n_namesz, n_descsz overflow issue
When updating PT_NOTE header size (ie.  p_memsz), an overflow issue
happens with the following bogus note entry:

  n_namesz = 0xFFFFFFFF
  n_descsz = 0x0
  n_type   = 0x0

This kind of note entry should be dropped while updating p_memsz.  But
because n_namesz is 32 bits wide, (n_namesz + 3) & (~3) overflows to
0x0, so the note entry size looks sane and the entry is kept.

When userspace (eg.  the crash utility) tries to access such a bogus note,
it can lead to unexpected behavior (eg.  the crash utility segfaults
because it reads a bogus address).

The source of the bogus note hasn't been identified yet.  At least we can
drop the bogus note so that user space isn't surprised.

Signed-off-by: WANG Chao <chaowang@redhat.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Randy Wright <rwright@hp.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Rashika Kheria <rashika.kheria@gmail.com>
Cc: Greg Pearson <greg.pearson@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:52 -08:00
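
Illustrative sketch for 34b4776429 (not part of the commit): the wraparound
described in the message can be reproduced with plain integer arithmetic.
The toy_* names are hypothetical and the 64-bit variant is just one way to
avoid the wrap; the commit message does not say how the kernel fix is
implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the note entry quoted in the commit message. */
struct toy_note {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* 32-bit round-up to a multiple of 4: wraps to 0 for 0xFFFFFFFF. */
static uint32_t toy_align4_u32(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* The same round-up done in 64 bits, so no 32-bit input can wrap. */
static uint64_t toy_align4_u64(uint32_t n)
{
	return ((uint64_t)n + 3u) & ~(uint64_t)3;
}

/* Name plus descriptor size, computed without wraparound. */
static uint64_t toy_note_payload_size(const struct toy_note *note)
{
	assert(note != NULL);
	return toy_align4_u64(note->n_namesz) +
	       toy_align4_u64(note->n_descsz);
}
```

For n_namesz = 0xFFFFFFFF the 32-bit helper yields 0, which is why the bogus
entry looks harmless, while the 64-bit helper yields 0x100000000 and makes
the oversized entry easy to detect.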
Fred Chou
d6bd428275 fs: fat: use MSDOS_SB macro to get msdos_sb_info
Use the MSDOS_SB macro to get msdos_sb_info, instead of coding it
directly.

Signed-off-by: Fred Chou <fred.chou.nd@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:51 -08:00
Fabian Frederick
714b71a3a9 fs/reiserfs/inode.c: replace 0 by NULL for pointers
Fix sparse warning:

  fs/reiserfs/inode.c:2769:19: warning: Using plain integer as NULL pointer

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:51 -08:00
Fabian Frederick
ed3ad79f87 fs/ufs/super.c: fix potential race condition
Let locking subsystem decide on mutex management.  As reported by Andrew
Morton this patch fixes a bug:

: lock_ufs() is assuming that on non-preempt uniprocessor, the calling
: code will run atomically up to the matching unlock_ufs().
:
: But that isn't true. The very first site I looked at (ufs_frag_map)
: does sb_bread() under lock_ufs().  And sb_bread() will call schedule(),
: very commonly.
:
: The ->mutex_owner stuff is a bit hacky but should work OK.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:51 -08:00
Fabian Frederick
61da3ae241 fs/ufs/super.c: remove unnecessary casting
Fix the following coccinelle warning:

  fs/ufs/super.c:1418:7-28: WARNING: casting value returned by memory allocation function to (struct ufs_inode_info *) is useless.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:51 -08:00
Fabian Frederick
b625032b10 fs/coda/dir.c: forward declaration clean-up
- Move operation structures to avoid forward declarations.

- Fix some checkpatch warnings:

WARNING: Missing a blank line after declarations
+		struct inode *host_inode = file_inode(host_file);
+		mutex_lock(&host_inode->i_mutex);

ERROR: that open brace { should be on the previous line
+const struct dentry_operations coda_dentry_operations =
+{

ERROR: that open brace { should be on the previous line
+const struct inode_operations coda_dir_inode_operations =
+{

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:50 -08:00
Fabian Frederick
111d639dd6 fs/befs/linuxvfs.c: remove unnecessary casting
Fix the following coccinelle warning:

  fs/befs/linuxvfs.c:278:14-36: WARNING: casting value returned by memory allocation function to (struct befs_inode_info *) is useless.

[akpm@linux-foundation.org: avoid 80-col ugliness]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:50 -08:00
Linus Torvalds
9cd77374f0 Merge branch 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc update from Helge Deller:
 "The major change in here is the removal of the old HP-UX compat code
  which should have made it possible to load and execute 32-bit HP-UX
  binaries on PA-RISC Linux.  Since it was never functional and since
  nobody cares about old 32-bit HPUX binaries any longer, it's now time
  to free up 3200 lines of kernel code (CONFIG_HPUX and
  CONFIG_BINFMT_SOM).

  Other than that we wire up the execveat() syscall, fix sparse errors
  and have some whitespace cleanups"

* 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  fs/binfmt_som: Drop kernel support for HP-UX SOM binaries
  parisc: Remove unused function
  parisc: macro whitespace fixes
  parisc/uaccess: fix sparse errors
  parisc: hpux - Remove HPUX syscall numbers
  parisc: hpux - Remove hpux gateway page
  parisc: hpux - Delete files in hpux subdirectory
  parisc: hpux - Do not compile hpux subdirectory
  parisc: hpux - Drop support for HP-UX binaries
  parisc: Add error checks when building up signal trampoline handler
  parisc: Wire up execveat syscall
2015-02-17 14:25:58 -08:00
Jeff Layton
2e2f756f81 locks: fix list insertion when lock is split in two
In the case where we're splitting a lock in two, the current code
inserts the new "left" lock in the incorrect spot. It's inserted just
before "right" when it should instead be inserted just before the
new lock.

When we add a new lock, set "fl" to that value so that we can
add "left" before it.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-02-17 17:08:23 -05:00
Jeff Layton
267f112858 locks: remove conditional lock release in middle of flock_lock_file
As Linus pointed out:

    Say we have an existing flock, and now do a new one that conflicts. I
    see what looks like three separate bugs.

     - We go through the first loop, find a lock of another type, and
    delete it in preparation for replacing it

     - we *drop* the lock context spinlock.

     - BUG #1? So now there is no lock at all, and somebody can come in
    and see that unlocked state. Is that really valid?

     - another thread comes in while the first thread dropped the lock
    context lock, and wants to add its own lock. It doesn't see the
    deleted or pending locks, so it just adds it

     - the first thread gets the context spinlock again, and adds the lock
    that replaced the original

     - BUG #2? So now there are *two* locks on the thing, and the next
    time you do an unlock (or when you close the file), it will only
    remove/replace the first one.

...remove the "drop the spinlock" code in the middle of this function as
it has always been suspicious. This should eliminate the potential race
that can leave two locks for the same struct file on the list.

He also pointed out another thing as a bug -- namely that
flock_lock_file removes the lock from the list unconditionally when
doing a lock upgrade, without knowing whether it'll be able to set the
new lock. Bruce pointed out that this is expected behavior and may help
prevent certain deadlock situations.

We may want to revisit that at some point, but it's probably best that
we do so in the context of a different patchset.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-02-17 15:23:09 -05:00
Jeff Layton
c4e136cda1 locks: only remove leases associated with the file being closed
We don't want to remove all leases just because one filp was closed.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-02-17 15:22:57 -05:00
David Howells
e59b4e9187 debugfs: Provide a file creation function that also takes an initial size
Provide a file creation function that also takes an initial size so that the
caller doesn't have to set i_size, meaning that we don't have to deal with
->d_inode in the callers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-17 12:21:51 -05:00
Helge Deller
35e88d5c22 fs/binfmt_som: Drop kernel support for HP-UX SOM binaries
The parisc arch has been the only user of HP-UX SOM binaries.

Support for HP-UX executables was never finished and since we now drop support
for the HP-UX compat layer anyway, it does not make sense to keep the
BINFMT_SOM support.

Cc: linux-fsdevel@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2015-02-17 16:29:36 +01:00
Brian Norris
eb928d40a9 Merge JFFS2 updates from David Woodhouse
2015-02-16 18:05:26 -08:00
Joseph Qi
160cc26663 ocfs2: set append dio as a ro compat feature
Introduce a bit OCFS2_FEATURE_RO_COMPAT_APPEND_DIO and check it in the
write flow. If the bit is not set, fall back to the old way.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00
Joseph Qi
4813962bee ocfs2: wait for orphan recovery first once append O_DIRECT write crash
If one node has crashed leaving an orphan entry behind, another node that
does an append O_DIRECT write to the same file will overwrite
i_dio_orphaned_slot, and the old entry will then never be cleaned up.  If
this happens, wait for orphan recovery to finish first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00
Joseph Qi
3a83b342c8 ocfs2: complete the rest request through buffer io
Complete the rest of the request through buffered I/O after the direct write
has been performed.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00
Joseph Qi
d943d59dd3 ocfs2: do not fallback to buffer I/O write if appending
Now we can do direct I/O and no longer fall back to buffered I/O in the
case of an append O_DIRECT write.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00
Joseph Qi
49255dce65 ocfs2: allocate blocks in ocfs2_direct_IO_get_blocks
Allow block allocation in ocfs2_direct_IO_get_blocks.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00
Joseph Qi
24c40b329e ocfs2: implement ocfs2_direct_IO_write
Implement ocfs2_direct_IO_write.  Add the inode to the orphan dir first, and
then remove it once the append O_DIRECT write has finished.

This is to make sure block allocation and inode size are consistent.

[akpm@linux-foundation.org: fix it for "block: Add discard flag to blkdev_issue_zeroout() function"]
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00
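
The ordering described in the ocfs2_direct_IO_write commit above (add the
inode to the orphan dir, do the append, then remove the entry) can be
illustrated with a small toy model. This is a sketch under stated
assumptions, not the ocfs2 implementation: ToyInode, ToyFs, SimulatedCrash
and the recovery step that trims allocation back to the recorded size are
all hypothetical, chosen only to show why a leftover orphan entry lets a
later recovery pass bring block allocation and inode size back in line.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class SimulatedCrash(Exception):
    """Raised to model a node crashing in the middle of a write."""


@dataclass
class ToyInode:
    size: int = 0        # bytes the inode claims to contain
    allocated: int = 0   # bytes of blocks actually allocated


@dataclass
class ToyFs:
    inodes: Dict[str, ToyInode] = field(default_factory=dict)
    orphan_dir: Set[str] = field(default_factory=set)

    def append_direct_write(self, name: str, nbytes: int,
                            crash_after_alloc: bool = False) -> None:
        inode = self.inodes.setdefault(name, ToyInode())
        self.orphan_dir.add(name)        # 1. record the inode first
        inode.allocated += nbytes        # 2. allocate blocks for the append
        if crash_after_alloc:
            raise SimulatedCrash(name)   # size never updated, entry remains
        inode.size += nbytes             # 3. write finished, update size
        self.orphan_dir.discard(name)    # 4. remove the orphan entry

    def recover_orphans(self) -> None:
        """Assumed recovery: trim allocation to the size, drop the entry."""
        for name in sorted(self.orphan_dir):
            inode = self.inodes[name]
            inode.allocated = inode.size
        self.orphan_dir.clear()


fs = ToyFs()
fs.append_direct_write("log", 4096)      # completes normally
try:
    fs.append_direct_write("log", 8192, crash_after_alloc=True)
except SimulatedCrash:
    pass

inconsistent_before = fs.inodes["log"].allocated != fs.inodes["log"].size
orphan_left_behind = "log" in fs.orphan_dir

fs.recover_orphans()
consistent_after = fs.inodes["log"].allocated == fs.inodes["log"].size
```

Because the entry is added before any blocks are allocated, a crash at any
later point leaves a marker that recovery can find; removing it only at the
very end means a completed write leaves nothing behind.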
Joseph Qi
ed460cffc2 ocfs2: add orphan recovery types in ocfs2_recover_orphans
Define two orphan recovery types, which indicate whether the file needs to
be truncated or not.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:04 -08:00
Joseph Qi
06ee5c75b5 ocfs2: add functions to add and remove inode in orphan dir
Add functions to add an inode to the orphan dir and to remove an inode from
the orphan dir.  Here we do not call ocfs2_prepare_orphan_dir and
ocfs2_orphan_add directly, because append O_DIRECT also adds the inode to
the orphan dir, which may result in more than one orphan entry for the same
inode.

[akpm@linux-foundation.org: avoid dynamic stack allocation]
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Cc: Fengguang Wu <fengguang.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:04 -08:00
Joseph Qi
026749a86e ocfs2: prepare some interfaces used in append direct io
Currently, in the case of an append O_DIRECT write (block not allocated yet),
ocfs2 falls back to buffered I/O.  This has some disadvantages.
First, it is not the expected behavior.  Second, it consumes a large
amount of page cache, e.g. in a mass backup scenario.  Third, modern
filesystems such as ext4 support this feature.

In this patch set, the direct I/O write no longer falls back to a buffered
I/O write because block allocation is now enabled in direct I/O.

This patch (of 9):

Prepare some interfaces which will be used in append O_DIRECT write.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:04 -08:00
Matthew Wilcox
d92576f116 dax: does not work correctly with virtual aliasing caches
The DAX code accesses the underlying storage through the kernel's linear
mapping, which may not be cache-coherent with user mappings on ARM, MIPS
or SPARC.  Temporarily disable the DAX code until this problem is
resolved.

The original XIP code also had this problem, but it was never noticed.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:04 -08:00
Ross Zwisler
923ae0ff92 ext4: add DAX functionality
This is a port of the DAX functionality found in the current version of
ext2.

[matthew.r.wilcox@intel.com: heavily tweaked]
[akpm@linux-foundation.org: remap_pages went away]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:04 -08:00