This is another clean up in the logging code. This per-transaction
list was largely unused. Its main function was to ensure that the
number of buffers in a transaction was correct, however that counter
was only used to check the number of buffers in the bd_list_tr, plus
an assert at the end of each transaction. With the assert now changed
to use the calculated buffer counts, we can remove both bd_list_tr and
its associated counter.
This should make the code easier to understand as well as shrinking
a couple of structures.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The main part of this patch merges the two functions used to
write metadata and data buffers to the log. Most of the code
is common between the two functions, so this provides a nice
clean up, and makes the code more readable.
The gfs2_get_log_desc() function is also extended to take two more
arguments, and thus avoid having to set the length and data1
fields of this strucuture as a separate operation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Prior to this patch, we have two ways of sending i/o to the log.
One of those is used when we need to allocate both the data
to be written itself and also a buffer head to submit it. This
is done via sb_getblk and friends. This is used mostly for writing
log headers.
The other method is used when writing blocks which have some
in-place counterpart. This is the case for all the metadata
blocks which are journalled, and when journaled data is in use,
for unescaped journalled data blocks.
This patch replaces both of those two methods, and about half
a dozen separate i/o submission points with a single i/o
submission function. We also go direct to bio rather than
using buffer heads, since this allows us to build i/o
requests of the maximum size for the block device in
question. It also reduces the memory required for flushing
the log, which can be very useful in low memory situations.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In the future, the qadata structure will be eliminated and merged
back in with the block reservation structure, after we extend the
lifespan of that. This patch is a step forward in eliminating the
qadata structure. It adds a variable to the do_grow function to
determine when unstuffing is necessary, and has been done.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In the resource group code, we have no less than three different
kinds of block references: block relative to the file system (u64),
block relative to the rgrp (u32), and block relative to the bitmap.
This is a small step to making the code more readable; it renames
variable blk to biblk to solidify in my mind that it's relative to
the bitmap and nothing else.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch just fixes a bunch of function parameter comments.
Slowly, over the years, the comments have gotten out of date
(mostly my fault, as I haven't been good at keeping them up to date).
This patch rectifies some of that.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch changes block reservations so it uses slab storage.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch makes function gfs2_page_add_databufs static.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch renames function gfs2_close to gfs2_release.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since we always write the buffer directly after this function
returns, we might as well merge it into here. This is a clean
up in preparation for some further updates to the log code
which are coming soon.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The "pull" argument to log_write_header() is only used
for debug purposes and it is not really needed any more. There
are other tests for this particular problem, so I think we can
dispose of it in order to simplify the code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch instructs DLM to prevent an "in place" conversion, where the
lock just stays on the granted queue, and instead forces the conversion to
the back of the convert queue. This is done on upward conversions only.
This is useful in cases where, for example, a lock is frequently needed in
PR on one node, but another node needs it temporarily in EX to update it.
This may happen, for example, when the rindex is being updated by gfs2_grow.
The gfs2_grow needs to have the lock in EX, but the other nodes need to
re-read it to retrieve the updates. The glock is already granted in PR on
the non-growing nodes, so this prevents them from continually re-granting
the lock in PR, and forces the EX from gfs2_grow to go through.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix merge between commit 3adadc08cc ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d ("net ax25: Simplify and
cleanup the ax25 sysctl handling")
The former moved around the sysctl register/unregister calls, the
later simply removed them.
With help from Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
These are two low-risk bug fixes for ext4, fixing a compile warning
and a potential deadlock.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABCAAGBQJPlgZ8AAoJENNvdpvBGATwewkP/ioo2U05O4tzmt05+HICw1ZK
vh1x6oaO3bUMa21pKBzS60rDc+EDu61E+bjVrsasOmom8DZyOP92SiwaDnIsKn6p
JBSNwzIOPmuPflEY3tnOsnOZ1umZcB16uhki1Rk1HE0nRPdKiyKJKZnbSzmUGWUW
gJwHbHddxZKTmDrEy4CxfbwwKKVm2SQUO5crLohFst4JsXc1h6muEfkcAZvCfZ68
1PQIkTkJUXArQuTuxzP89r7L8tqHJv4iOz+PT0FlluGWvgJUWIOVvjdJfPuQTmLi
UNzvtoQxuxjdZuCK/D16kNTkOEPzOhMlNW1djAntdCQohHIJG0Hd5bFju9bybSLz
838sTCEFxRS7rdBEXiksWsPCVDz/QVnPft0RG9jqXd6dRPFr/XJ1rAeDTjW2vmWw
ZO28p99aolA5At02AlSf9IgMIME0gKejnvpRo703UW456BlFIXPK3e/nbtE7Eb5A
HcZhvIwncWE4cbq2/AboielPSnyx6Z3SJS0hBIQ2wG40xcL/jxYL7K2/trkUr2KH
H3/4RsrSlLDXqHRJ4cVW75zKMgyNvc+60HDlAxE62LqKFR7K93hdlHpnkySy/1St
FaIiipH8Tmt+u6tqn6rlR82vRxd/dkLgQMpCWm4Et4THXlvisZkbxaDXrEGx79qg
v8eEdmHeJuLcQesm9TrS
=Ygid
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
"These are two low-risk bug fixes for ext4, fixing a compile warning
and a potential deadlock."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
super.c: unused variable warning without CONFIG_QUOTA
jbd2: use GFP_NOFS for blkdev_issue_flush
sb info is only checked with quota support.
fs/ext4/super.c: In function ‘parse_options’:
fs/ext4/super.c:1600:23: warning: unused variable ‘sbi’ [-Wunused-variable]
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
flush request is issued in transaction commit code path, so looks using
GFP_KERNEL to allocate memory for flush request bio falls into the classic
deadlock issue. I saw btrfs and dm get it right, but ext4, xfs and md are
using GFP.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
This includes one short patch fixing the behavior of
the QUECVT flag, which the gfs2 folks are waiting on.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPlYXZAAoJEDgbc8f8gGmqpzYP/RFkCn8mC5y5cM8lWBk2JQAJ
u7khyqowm3TWxjIpX85n7Uxq1vEX4RxFiRzCeiZj3ZoWE3PEQim8Tqrw8SFs8lcT
y7oYL6TBkgCbM1ROuKDqXRiw8oRAfRud3cqtRvQzxuds3AoaoyYvE6N+to2y9XlR
5DuUBJEtrpKOEdW1ZeXeUmCnvDwrUyEFuIlACoyochzbk6ug1EF926dgSaViE4ZG
OFcGMy8ELNqVYibVcJof2ZfztTvrMcXPIpsJrkK5tIW6w6q+2+eN4Xc2/xMZ4OYc
5AHHXxrqbK1ZABLrqsK/lUQi0Z241kAnqIi33i2nl3mhWSDF3K5CNXmrF9rvGsN7
wEqsfdGOnwFQucF1VU95neo+jYMnom9VGodpvSop7Xy5r+i59MPcfMDfz/I1KqX7
vBDuM5rwisYNfOb6wsfFNcBhkf1ktgo2h2iH5UdIaWfHApF1Lnls7D2j/o7r2uxF
tRd4sPhRt2eIn68XRggbWOVxMfdUKtaW50ZhKzW9osMItYX748O8XfQdk0sQUbD9
ZXbFEfbfsfRgMKhMSyNFcGDh6ePsT/cmZL/zR5VKVEHuprL3hEDPhCui5GT0Sm1G
9sXpLu9p51r0d4OIJpScOFMv8aD64w/mwLJ3r5nrGZz2APK9SwWJOqX82fyqivQc
uvO42yNGkwSGnBjXKiM6
=KDNZ
-----END PGP SIGNATURE-----
Merge tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm fixes from David Teigland:
"This includes one short patch fixing the behavior of the QUECVT flag,
which the gfs2 folks are waiting on."
* tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: fix QUECVT when convert queue is empty
The QUECVT flag should not prevent conversions from
being granted immediately when the convert queue is
empty.
Signed-off-by: David Teigland <teigland@redhat.com>
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retest the RB_EMPTY_NODE() condition under the spin lock
to ensure that we don't call rb_erase() more than once on the
same state owner.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
... since exit_mmap() is coming and it will munmap() everything anyway.
In all other cases aio_free_ring() has ctx->mm == current->mm; moreover,
all other callers of vm_munmap() have mm == current->mm, so this will
allow us to get rid of mm argument of vm_munmap().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The NFSv4 spec is ambiguous about whether or not it is permissible
to reuse open owner names, so play it safe. This patch adds a timestamp
to the state_owner structure, and combines that with the IDA based
uniquifier.
Fixes a regression whereby the Linux server returns NFS4ERR_BAD_SEQID.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This continues the theme started with vm_brk() and vm_munmap():
vm_mmap() does the same thing as do_mmap(), but additionally does the
required VM locking.
This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have
to export our internal do_mmap_pgoff() function.
Some day we hopefully don't have to export do_mmap() either, if all
modular users can become the simpler vm_mmap() instead. We're actually
very close to that already, with the notable exception of the (broken)
use in i810, and a couple of stragglers in binfmt_elf.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Like the vm_brk() function, this is the same as "do_munmap()", except it
does the VM locking for the caller.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It does the same thing as "do_brk()", except it handles the VM locking
too.
It turns out that all external callers want that anyway, so we can make
do_brk() static to just mm/mmap.c while at it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When hostname contains colon (e.g. when it is an IPv6 address) it needs
to be enclosed in brackets to make parsing of NFS device string possible.
Fix nfs_do_root_mount() to enclose hostname properly when needed. NFS code
actually does not need this as it does not parse the string passed by
nfs_do_root_mount() but the device string is exposed to userspace in
/proc/mounts.
CC: Josh Boyer <jwboyer@redhat.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In the recent update of the cifs_iovec_write code to use async writes,
the handling of the file position was broken. That patch added a local
"offset" variable to handle the offset, and then only updated the
original "*poffset" before exiting.
Unfortunately, it copied off the original offset from the beginning,
instead of doing so after generic_write_checks had been called. Fix
this by moving the initialization of "offset" after that in the
function.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Pull nfsd bugfixes from J. Bruce Fields:
"One bugfix, and one minor header fix from Jeff Layton while we're
here"
* 'for-3.4' of git://linux-nfs.org/~bfields/linux:
nfsd: include cld.h in the headers_install target
nfsd: don't fail unchecked creates of non-special files
If the file wasn't opened for writing, then truncate and ftruncate
need to report the appropriate errors.
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Since we may be simulating flock() locks using NFS byte range locks,
we can't rely on the VFS having checked the file open mode for us.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
All callers of nfs4_handle_exception() that need to handle
NFS4ERR_OPENMODE correctly should set exception->inode
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Pull fuse updates from Miklos Szeredi.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: use flexible array in fuse.h
fuse: allow nanosecond granularity
fuse: O_DIRECT support for files
fuse: fix nlink after unlink
The bitfield member mount_opt was too small by one bit to hold the mount
option that enabled to include data extents in the integrity checker.
Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR
option was added (git rebase silently merges so that the increase of the
size of the bitfield member is lost), the bit limit was removed entirely.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
When a filesystem is mounted with the degraded option, it is
possible that some of the devices are not there.
btrfs_ioctl_dev_info() crashs in this case because the device
name is a NULL pointer. This ioctl was only used for scrub.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
It is basically a good thing if we are interruptible when waiting for
free space, but the generality in which it is implemented currently
leads to system calls being interruptible that are not documented this
way. For example git can't handle interrupted unlink(), leading to
corrupt repos under space pressure.
Instead we raise the bar to only be interruptible by SIGKILL.
Thanks to David Sterba for suggesting this.
Signed-off-by: Arne Jansen <sensille@gmx.net>
The caller expects this function to return with the lock held and
releases it immediately on error.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
A user reported a panic where we were trying to fix a bad mirror but the
mirror number we were giving was 0, which is invalid. This is because we
don't do the transid verification until after the read, so as far as the
read code is concerned the read was a success. So instead store the mirror
we read from so that if there is some failure post read we know which mirror
to try next and which mirror needs to be fixed if we find a good copy of the
block. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Fix a bug, where in case we need to adjust stripe_size so that the
length of the resulting chunk is less than or equal to max_chunk_size,
DUP chunks turn out to be only half as big as they could be.
Cc: Arne Jansen <sensille@gmx.net>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
iref_to_path and iterate_irefs both increment the eb's refcount to use it
after releasing the path. Both depend on consistent data remaining in the
extent buffer and need a read lock to protect it.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Avoid calling free_extent_buffer more than once when the iterator function
returns non-zero. The only code that uses this is scrub repair for corrupted
nodatasum blocks.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Make free_ipath() behave like most other freeing functions in the
kernel and gracefully do nothing when passed a NULL pointer.
Besides this making the bahaviour consistent with functions such as
kfree(), vfree(), btrfs_free_path() etc etc, it also fixes a real NULL
deref issue in fs/btrfs/ioctl.c::btrfs_ioctl_ino_to_path(). In that
function we have this code:
...
ipath = init_ipath(size, root, path);
if (IS_ERR(ipath)) {
ret = PTR_ERR(ipath);
ipath = NULL;
goto out;
}
...
out:
btrfs_free_path(path);
free_ipath(ipath);
...
If we ever take the true branch of that 'if' statement we'll end up
passing a NULL pointer to free_ipath() which will subsequently
dereference it and we'll go "Boom" :-(
This patch will avoid that.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
clear_extent_bit()
{
next_node = rb_next(&state->rb_node);
...
clear_state_bit(state); <-- this may free next_node
if (next_node) {
state = rb_entry(next_node);
...
}
}
clear_state_bit() calls merge_state() which may free the next node
of the passing extent_state, so clear_extent_bit() may end up
referencing freed memory.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Currently it returns a set of bits that were cleared, but this return
value is not used at all.
Moreover it doesn't seem to be useful, because we may clear the bits
of a few extent_states, but only the cleared bits of last one is
returned.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Our code is not ready to cope with a sectorsize that's not equal to PAGE_SIZE.
It will lead to hanging-on while writing something.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Normally when there are 2 copies of a block, we add both to the
reada extent tree and prefetch only the one that is easier to reach.
This way we can better utilize multiple devices.
In case of DUP this makes no sense as both copies reside on the
same device.
Signed-off-by: Arne Jansen <sensille@gmx.net>
When inserting into the radix tree returns EEXIST, get the existing
entry without giving up the spinlock in between.
There was a race for both the zones trees and the extent tree.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Follow those instructions, and you'll trigger a warning in the
beginning of d_set_d_op():
# mkfs.btrfs /dev/loop3
# mount /dev/loop3 /mnt
# btrfs sub create /mnt/sub
# btrfs sub snap /mnt /mnt/snap
# touch /mnt/snap/sub
touch: cannot touch `tmp': Permission denied
__d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and
then simple_lookup() reset it to simple_dentry_operations, which
triggered the warning.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
This fixes a scalability problem reported by Andi Kleen and Tim Chen;
they were quite secretive about the precise nature of their workload,
but they later admitted that it only showed up when they were using a
large sparse file, so the amount of data I/O that was needed was close
to zero. I'm not sure how realistic this is and it's only a
regression if you consider changes made since 2.6.39 to be a
"regression" vis-a-vis the policy regarding post-merge window bug
fixes, but Linus agreed it was worth fixing, so I'm including it in
this pull request.
This also fixes the journalled quota mount options, which I
accidentally broke while I was cleaning up the mount option handling.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABCAAGBQJPjbs2AAoJENNvdpvBGATw/4UQAOMsxRlzbMnOAmohIBmesJiB
nTPX41dNw0QCXstMFjvSyRbUJI0NZPGg6ZDhbtoQ/c42/izZOVNd7gh7QQYHRCt2
Oqh6WS159P04ixcAe8NgCm5B5AV/C5Er/vSOUZENBaBo430vvrZasifsyUgQnx+b
PxXlYsfPVzaQVeSxCu/68OeiBRLcLwmKZ6MicOaWt9GCNoCsWzgU+/LNskYuscPI
841yQL6/BE7redU2E9qoEn/xjxx57hJOj2iiIAuqGPmNmRQqq3VtvTqNZHldNBLp
Hz5mB3zSZsPg0uwvp+OxrnpP37NQCCn1L64UFIXxqGF47mcCYw7schAoGBtqwGQS
neGUfkzG4beKk7kojyDawvnrrvVn4iCMaIkR1ZjzjPOk+QBPagckrS2nmuObbYzJ
l4lmHq1v8nOh0clxqJPioNp5/Y13sbpEOY4tAa6sLpzKLKXF330RNuwwrKKHB7zo
ZhCvSwVEjmxacgPCPhFJnR3fxtjoXWR8WvJs7H+gZB/QaC8hjEYw6xvvrkw8mAiS
RNe3cYdYAz6kOJdtJrJaMzp/1CYdHydf+0WvNYCQ/d1poPr7uU5wQY61hdm2gFxh
owsbVAOiEFjZWJHqrRRdcg2irTpINJTS3iRe4g/ltcvYzzRSeOcZNWvkKFspq3i8
EUMHRBxLzPMRa+gU6Unm
=jb3f
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 regression fixes from Ted Ts'o:
"This fixes a scalability problem reported by Andi Kleen and Tim Chen;
they were quite secretive about the precise nature of their workload,
but they later admitted that it only showed up when they were using a
large sparse file, so the amount of data I/O that was needed was close
to zero.
I'm not sure how realistic this is and it's only a regression if you
consider changes made since 2.6.39 to be a "regression" vis-a-vis the
policy regarding post-merge window bug fixes, but Linus agreed it was
worth fixing, so I'm including it in this pull request.
This also fixes the journalled quota mount options, which I
accidentally broke while I was cleaning up the mount option handling."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix handling of journalled quota options
ext4: address scalability issue by removing extent cache statistics
Pull vfs fixes from Al Viro:
"A bunch of endianness fixes and a couple of nfsd error value fixes.
Speaking of endianness stuff, I'm rather tempted to slap
ccflags-y += -D__CHECK_ENDIAN__
in fs/Makefile, if not making it default for the entire tree; nfsd
regressions I've caught make one hell of a pile and we'd obviously
benefit from having that kind of stuff caught earlier..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
lockd: fix the endianness bug
ocfs2: ->e_leaf_clusters endianness breakage
ocfs2: ->rl_count endianness breakage
ocfs: ->rl_used breakage on big-endian
ocfs2: ->l_next_free_req breakage on big-endian
btrfs: btrfs_root_readonly() broken on big-endian
ext4: fix endianness breakage in ext4_split_extent_at()
nfsd: fix compose_entry_fh() failure exits
nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid()
nfsd: fix endianness breakage in TEST_STATEID handling
nfsd: fix error values returned by nfsd4_lockt() when nfsd_open() fails
nfsd: fix b0rken error value for setattr on read-only mount
Pull CIFS fixes from Steve French.
* git://git.samba.org/sfrench/cifs-2.6:
Fix number parsing in cifs_parse_mount_options
Cleanup handling of NULL value passed for a mount option
Commit 26092bf5 broke handling of journalled quota mount options by
trying to parse argument of every mount option as a number. Fix this
by dealing with the quota options before we call match_int().
Thanks to Jan Kara for discovering this regression.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Andi Kleen and Tim Chen have reported that under certain circumstances
the extent cache statistics are causing scalability problems due to
cache line bounces.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Pull the minimal btrfs branch from Chris Mason:
"We have a use-after-free in there, along with errors when mount -o
discard is enabled, and a BUG_ON(we should compile with UP more
often)."
* 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: use commit root when loading free space cache
Btrfs: fix use-after-free in __btrfs_end_transaction
Btrfs: check return value of bio_alloc() properly
Btrfs: remove lock assert from get_restripe_target()
Btrfs: fix eof while discarding extents
Btrfs: fix uninit variable in repair_eb_io_failure
Revert "Btrfs: increase the global block reserve estimates"
With this change, calling
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time. For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set. The same is true for file capabilities.
Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.
To determine if the NO_NEW_PRIVS bit is set, a task may call
prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)
This functionality is desired for the proposed seccomp filter patch
series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.
Another potential use is making certain privileged operations
unprivileged. For example, chroot may be considered "safe" if it cannot
affect privileged tasks.
Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use. It is fixed in a subsequent patch.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: James Morris <james.l.morris@oracle.com>
->root_flags is __le64 and all accesses to it go through the helpers
that do proper conversions. Except for btrfs_root_readonly(), which
checks bit 0 as in host-endian...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The function kstrtoul() used to parse number strings in the mount
option parser is set to expect a base 10 number . This treats the octal
numbers passed for mount options such as file_mode as base10 numbers
leading to incorrect behavior.
Change the 'base' argument passed to kstrtoul from 10 to 0 to
allow it to auto-detect the base of the number passed.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@samba.org>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
->ee_len is __le16, so assigning cpu_to_le32() to it is going to do
Bad Things(tm) on big-endian hosts...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Restore the original logics ("fail on mountpoints, negatives and in
case of fh_compose() failures"). Since commit 8177e (nfsd: clean up
readdirplus encoding) that got broken -
rv = fh_compose(fhp, exp, dchild, &cd->fh);
if (rv)
goto out;
if (!dchild->d_inode)
goto out;
rv = 0;
out:
is equivalent to
rv = fh_compose(fhp, exp, dchild, &cd->fh);
out:
and the second check has no effect whatsoever...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->ts_id_status gets nfs errno, i.e. it's already big-endian; no need
to apply htonl() to it. Broken by commit 174568 (NFSD: Added TEST_STATEID
operation) last year...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nfsd_open() already returns an NFS error value; only vfs_test_lock()
result needs to be fed through nfserrno(). Broken by commit 55ef12
(nfsd: Ensure nfsv4 calls the underlying filesystem on LOCKT)
three years ago...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A user reported that booting his box up with btrfs root on 3.4 was way
slower than on 3.3 because I removed the ideal caching code. It turns out
that we don't load the free space cache if we're in a commit for deadlock
reasons, but since we're reading the cache and it hasn't changed yet we are
safe reading the inode and free space item from the commit root, so do that
and remove all of the deadlock checks so we don't unnecessarily skip loading
the free space cache. The user reported this fixed the slowness. Thanks,
Tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Here are some minor fixes for the driver core and kobjects that people have
reported recently.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk+HTLAACgkQMUfUDdst+ylwlACcCcVRLdGIyW7Qjd1VB8hAVCSy
27EAniaKfzOiPAkrb9718D6gfdBTVU0B
=c1Bm
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and kobject fixes from Greg KH:
"Here are some minor fixes for the driver core and kobjects that people
have reported recently.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kobject: provide more diagnostic info for kobject_add_internal() failures
sysfs: handle 'parent deleted before child added'
sysfs: Prevent crash on unset sysfs group attributes
sysfs: Update the name hash for an entry after changing the namespace
drivers/base: fix compiler warning in SoC export driver - idr should be ida
drivers/base: Remove unneeded spin_lock_init call for soc_lock
Pull timer fixes from Thomas Gleixner:
"The itimer removal one is not strictly a fix, but I really wanted to
avoid a rebase of the urgent ones."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "clocksource: Load the ACPI PM clocksource asynchronously"
clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot()
itimer: Use printk_once instead of WARN_ONCE
nohz: Fix stale jiffies update in tick_nohz_restart()
tick: Document TICK_ONESHOT config option
proc: stats: Use arch_idle_time for idle and iowait times if available
itimer: Schedule silent NULL pointer fixup in setitimer() for removal
49b25e0540 introduced a use-after-free bug
that caused spurious -EIO's to be returned.
Do the check before we free the transaction.
Cc: David Sterba <dsterba@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
bio_alloc() has the possibility of returning NULL.
So, it is necessary to check the return value.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This fixes a regression introduced by fc67c450. spin_is_locked() always
returns 0 on UP kernels, which caused assert in get_restripe_target() to
be fired on every call from btrfs_reduce_alloc_profile() on UP systems.
Remove it completely for now, it's not clear if it's going to be needed
in future.
Reported-by: Bobby Powers <bobbypowers@gmail.com>
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We miscalculate the length of extents we're discarding, and it leads to
an eof of device.
Reported-by: Daniel Blueman <daniel@quora.org>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We'd have to be passing bogus extent buffers for this uninit variable to
actually be used, but set it to zero just in case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Pull vfs fixes from Al Viro:
"Regression fix in mtdchar_open(), fix for a really old leak
(almost never hit in practice - it's a b0rken failure exit in
simple_fill_super()) and a typo fix in vfs.txt (misspelled
method type)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
typo fix in Documentation/filesystems/vfs.txt
dentry leak in simple_fill_super() failure exit
fix breakage in mtdchar_open(), sanitize failure exits
This reverts commit 5500cdbe14.
We've had a number of complaints of early enospc that bisect down
to this patch. We'll hae to fix the reservations differently.
CC: stable@kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Allow blank user= and ip= mount option. Also clean up redundant
checks for NULL values since the token parser will not actually
match mount options with NULL values unless explicitly specified.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Acked-by: Jeff Layton <jlayton@samba.org>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Allow a v3 unchecked open of a non-regular file succeed as if it were a
lookup; typically a client in such a case will want to fall back on a
local open, so succeeding and giving it the filehandle is more useful
than failing with nfserr_exist, which makes it appear that nothing at
all exists by that name.
Similarly for v4, on an open-create, return the same errors we would on
an attempt to open a non-regular file, instead of returning
nfserr_exist.
This fixes a problem found doing a v4 open of a symlink with
O_RDONLY|O_CREAT, which resulted in the current client returning EEXIST.
Thanks also to Trond for analysis.
Cc: stable@kernel.org
Reported-by: Orion Poplawski <orion@cora.nwra.com>
Tested-by: Orion Poplawski <orion@cora.nwra.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull GFS2 fixes from Steven Whitehouse
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Allow caching of rindex glock
GFS2: Make sure rindex is uptodate before starting transactions
GFS2: use depends instead of select in kconfig
GFS2: put glock reference in error patch of read_rindex_entry
Derrik Pates reports that an utimensat with a NULL argument results in the
current time being sent from the kernel with 1 second granularity.
Reported-by: Derrik Pates <demon@now.ai>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
In scsi at least two cases of the parent device being deleted before the
child is added have been observed.
1/ scsi is performing async scans and the device is removed prior to the
async can thread running (can happen with an in-opportune / unlikely
unplug during initial scan).
2/ libsas discovery event running after the parent port has been torn
down (this is a bug in libsas).
Result in crash signatures like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
...
Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
Stack:
00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
Call Trace:
[<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
[<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8125e641>] kobject_add_varg+0x41/0x50
[<ffffffff8125e70b>] kobject_add+0x64/0x66
[<ffffffff8131122b>] device_add+0x12d/0x63a
In this scenario the parent is still valid (because we have a
reference), but it has been device_del()'d which means its kobj->sd
pointer is NULL'd via:
device_del()->kobject_del()->sysfs_remove_dir()
...and then sysfs_create_dir() (without this fix) goes ahead and
de-references parent_sd via sysfs_ns_type():
return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;
This scenario is being fixed in scsi/libsas, but if other subsystems
present the same ordering the system need not immediately crash.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch allows caching of the rindex glock. We were previously
setting the GL_NOCACHE bit when the glock was released. That forced
the rindex inode to be invalidated, which caused us to re-read
rindex at the next access. However, it caused the glock to be
unnecessarily bounced around the cluster. This patch allows
the glock to remain cached, but it still causes the rindex to be
re-read once it has been written to by gfs2_grow.
Ben and I have tested single-node gfs2_grow cases and I've tested
clustered gfs2_grow cases on my four-node cluster.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is needed to allow renaming network devices that have been moved
to another network namespace.
Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d_genocide() does _not_ evict dentries; it just removes extra ref
pinning each of those. Normally it's followed by shrinking the
tree (it's done just before generic_shutdown_super() by kill_litter_super()),
but in case of simple_fill_super() nothing of that kind will follow.
Just do shrink_dcache_parent() manually.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I have a new optimized x86 "strncpy_from_user()" that will use these
same helper functions for all the same reasons the name lookup code uses
them. This is preparation for that.
This moves them into an architecture-specific header file. It's
architecture-specific for two reasons:
- some of the functions are likely to want architecture-specific
implementations. Even if the current code happens to be "generic" in
the sense that it should work on any little-endian machine, it's
likely that the "multiply by a big constant and shift" implementation
is less than optimal for an architecture that has a guaranteed fast
bit count instruction, for example.
- I expect that if architectures like sparc want to start playing
around with this, we'll need to abstract out a few more details (in
particular the actual unaligned accesses). So we're likely to have
more architecture-specific stuff if non-x86 architectures start using
this.
(and if it turns out that non-x86 architectures don't start using
this, then having it in an architecture-specific header is still the
right thing to do, of course)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
1) Fix inaccuracies in network driver interface documentation, from Ben
Hutchings.
2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.
3) Compile warning, locking, and refcounting fixes in netfilter's
xt_CT, from Pablo Neira Ayuso.
4) phonet sendmsg needs to validate user length just like any other
datagram protocol, fix from Sasha Levin.
5) Ipv6 multicast code uses wrong loop index, from RongQing Li.
6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
and Yuval Mintz.
7) mlx4 erroneously allocates 4 pages at a time, regardless of page
size, fix from Thadeu Lima de Souza Cascardo.
8) SCTP socket option wasn't extended in a backwards compatible way,
fix from Thomas Graf.
9) Add missing address change event emissions to bonding, from Shlomo
Pongratz.
10) /proc/net/dev regressed because it uses a private offset to track
where we are in the hash table, but this doesn't track the offset
pullback that the seq_file code does resulting in some entries being
missed in large dumps.
Fix from Eric Dumazet.
11) do_tcp_sendpage() unloads the send queue way too fast, because it
invokes tcp_push() when it shouldn't. Let the natural sequence
generated by the splice paths, and the assosciated MSG_MORE
settings, guide the tcp_push() calls.
Otherwise what goes out of TCP is spaghetti and doesn't batch
effectively into GSO/TSO clusters.
From Eric Dumazet.
12) Once we put a SKB into either the netlink receiver's queue or a
socket error queue, it can be consumed and freed up, therefore we
cannot touch it after queueing it like that.
Fixes from Eric Dumazet.
13) PPP has this annoying behavior in that for every transmit call it
immediately stops the TX queue, then calls down into the next layer
to transmit the PPP frame.
But if that next layer can take it immediately, it just un-stops the
TX queue right before returning from the transmit method.
Besides being useless work, it makes several facilities unusable, in
particular things like the equalizers. Well behaved devices should
only stop the TX queue when they really are full, and in PPP's case
when it gets backlogged to the downstream device.
David Woodhouse therefore fixed PPP to not stop the TX queue until
it's downstream can't take data any more.
14) IFF_UNICAST_FLT got accidently lost in some recent stmmac driver
changes, re-add. From Marc Kleine-Budde.
15) Fix link flaps in ixgbe, from Eric W. Multanen.
16) Descriptor writeback fixes in e1000e from Matthew Vick.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
net: fix a race in sock_queue_err_skb()
netlink: fix races after skb queueing
doc, net: Update ndo_start_xmit return type and values
doc, net: Remove instruction to set net_device::trans_start
doc, net: Update netdev operation names
doc, net: Update documentation of synchronisation for TX multiqueue
doc, net: Remove obsolete reference to dev->poll
ethtool: Remove exception to the requirement of holding RTNL lock
MAINTAINERS: update for Marvell Ethernet drivers
bonding: properly unset current_arp_slave on slave link up
phonet: Check input from user before allocating
tcp: tcp_sendpages() should call tcp_push() once
ipv6: fix array index in ip6_mc_add_src()
mlx4: allocate just enough pages instead of always 4 pages
stmmac: re-add IFF_UNICAST_FLT for dwmac1000
bnx2x: Clear MDC/MDIO warning message
bnx2x: Fix BCM57711+BCM84823 link issue
bnx2x: Clear BCM84833 LED after fan failure
bnx2x: Fix BCM84833 PHY FW version presentation
bnx2x: Fix link issue for BCM8727 boards.
...
commit 2f53384424 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.
We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.
Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.
For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge batch of fixes from Andrew Morton:
"The simple_open() cleanup was held back while I wanted for laggards to
merge things.
I still need to send a few checkpoint/restore patches. I've been
wobbly about merging them because I'm wobbly about the overall
prospects for success of the project. But after speaking with Pavel
at the LSF conference, it sounds like they're further toward
completion than I feared - apparently davem is at the "has stopped
complaining" stage regarding the net changes. So I need to go back
and re-review those patchs and their (lengthy) discussion."
* emailed from Andrew Morton <akpm@linux-foundation.org>: (16 patches)
memcg swap: use mem_cgroup_uncharge_swap fix
backlight: add driver for DA9052/53 PMIC v1
C6X: use set_current_blocked() and block_sigmask()
MAINTAINERS: add entry for sparse checker
MAINTAINERS: fix REMOTEPROC F: typo
alpha: use set_current_blocked() and block_sigmask()
simple_open: automatically convert to simple_open()
scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
libfs: add simple_open()
hugetlbfs: remove unregister_filesystem() when initializing module
drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
fs/xattr.c:setxattr(): improve handling of allocation failures
fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
sysrq: use SEND_SIG_FORCED instead of force_sig()
proc: fix mount -t proc -o AAA
Many users of debugfs copy the implementation of default_open() when
they want to support a custom read/write function op. This leads to a
proliferation of the default_open() implementation across the entire
tree.
Now that the common implementation has been consolidated into libfs we
can replace all the users of this function with simple_open().
This replacement was done with the following semantic patch:
<smpl>
@ open @
identifier open_f != simple_open;
identifier i, f;
@@
-int open_f(struct inode *i, struct file *f)
-{
(
-if (i->i_private)
-f->private_data = i->i_private;
|
-f->private_data = i->i_private;
)
-return 0;
-}
@ has_open depends on open @
identifier fops;
identifier open.open_f;
@@
struct file_operations fops = {
...
-.open = open_f,
+.open = simple_open,
...
};
</smpl>
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
debugfs and a few other drivers use an open-coded version of
simple_open() to pass a pointer from the file to the read/write file
ops. Add support for this simple case to libfs so that we can remove
the many duplicate copies of this simple function.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was introduced by d1d5e05ffd ("hugetlbfs: return error code when
initializing module") but as Al pointed out, is a bad idea.
Quoted comments from Al:
"Note that unregister_filesystem() in module init is *always* wrong;
it's not an issue here (it's done too early to care about and
realistically the box is not going anywhere - it'll panic when attempt
to exec /sbin/init fails, if not earlier), but it's a damn bad
example.
Consider a normal fs module. Somebody loads it and in parallel with
that we get a mount attempt on that fs type. It comes between
register and failure exits that causes unregister; at that point we
are screwed since grabbing a reference to module as done by mount is
enough to prevent exit, but not to prevent the failure of init. As
the result, module will get freed when init fails, mounted fs of that
type be damned."
So remove it.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allocation can be as large as 64k.
- Add __GFP_NOWARN so the a falied kmalloc() is silent
- Fall back to vmalloc() if the kmalloc() failed
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allocation can be as large as 64k. As David points out, "falling
back to vmalloc here is much better solution than failing to retreive
the attribute - it will work no matter how fragmented memory gets. That
means we don't get incomplete backups occurring after days or months of
uptime and successful backups".
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This size is user controllable, up to a maximum of XATTR_LIST_MAX (64k).
So it's trivial for someone to trigger a stream of order:4 page
allocation errors.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The proc_parse_options() call from proc_mount() runs only once at boot
time. So on any later mount attempt, any mount options are ignored
because ->s_root is already initialized.
As a consequence, "mount -o <options>" will ignore the options. The
only way to change mount options is "mount -o remount,<options>".
To fix this, parse the mount options unconditionally.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Tested-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the call from gfs2_blk2rgrd to function
gfs2_rindex_update and replaces it with individual calls.
The former way turned out to be too problematic.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Pull CIFS fixes from Steve French.
* git://git.samba.org/sfrench/cifs-2.6:
Fix UNC parsing on mount
Remove unnecessary check for NULL in password parser
CIFS: Fix VFS lock usage for oplocked files
Revert "CIFS: Fix VFS lock usage for oplocked files"
cifs: writing past end of struct in cifs_convert_address()
cifs: silence compiler warnings showing up with gcc-4.7.0
CIFS: Fix VFS lock usage for oplocked files
The code cleanup of cifs_parse_mount_options resulted in a new bug being
introduced in the parsing of the UNC. This results in vol->UNC being
modified before vol->UNC was allocated.
Reported-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The password parser has an unnecessary check for a NULL value which
triggers warnings in source checking tools. The code contains artifacts
from the old parsing code which are no longer required.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We can deadlock if we have a write oplock and two processes
use the same file handle. In this case the first process can't
unlock its lock if the second process blocked on the lock in the
same time.
Fix it by using posix_lock_file rather than posix_lock_file_wait
under cinode->lock_mutex. If we request a blocking lock and
posix_lock_file indicates that there is another lock that prevents
us, wait untill that lock is released and restart our call.
Cc: stable@kernel.org
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
"s6->sin6_scope_id" is an int bits but strict_strtoul() writes a long
so this can corrupt memory on 64 bit systems.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
gcc-4.7.0 has started throwing these warnings when building cifs.ko.
CC [M] fs/cifs/cifssmb.o
fs/cifs/cifssmb.c: In function ‘CIFSSMBSetCIFSACL’:
fs/cifs/cifssmb.c:3905:9: warning: array subscript is above array bounds [-Warray-bounds]
fs/cifs/cifssmb.c: In function ‘CIFSSMBSetFileInfo’:
fs/cifs/cifssmb.c:5711:8: warning: array subscript is above array bounds [-Warray-bounds]
fs/cifs/cifssmb.c: In function ‘CIFSSMBUnixSetFileInfo’:
fs/cifs/cifssmb.c:6001:25: warning: array subscript is above array bounds [-Warray-bounds]
This patch cleans up the code a bit by using the offsetof macro instead
of the funky "&pSMB->hdr.Protocol" construct.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We can deadlock if we have a write oplock and two processes
use the same file handle. In this case the first process can't
unlock its lock if another process blocked on the lock in the
same time.
Fix this by removing lock_mutex protection from waiting on a
blocked lock and protect only posix_lock_file call.
Cc: stable@kernel.org
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Pull second try at vfs part d#2 from Al Viro:
"Miklos' first series (with do_lookup() rewrite split into edible
chunks) + assorted bits and pieces.
The 'untangling of do_lookup()' series is is a splitup of what used to
be a monolithic patch from Miklos, so this series is basically "how do
I convince myself that his patch is correct (or find a hole in it)".
No holes found and I like the resulting cleanup, so in it went..."
Changes from try 1: Fix a boot problem with selinux, and commit messages
prettied up a bit.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
vfs: fix out-of-date dentry_unhash() comment
vfs: split __lookup_hash
untangling do_lookup() - take __lookup_hash()-calling case out of line.
untangling do_lookup() - switch to calling __lookup_hash()
untangling do_lookup() - merge d_alloc_and_lookup() callers
untangling do_lookup() - merge failure exits in !dentry case
untangling do_lookup() - massage !dentry case towards __lookup_hash()
untangling do_lookup() - get rid of need_reval in !dentry case
untangling do_lookup() - eliminate a loop.
untangling do_lookup() - expand the area under ->i_mutex
untangling do_lookup() - isolate !dentry stuff from the rest of it.
vfs: move MAY_EXEC check from __lookup_hash()
vfs: don't revalidate just looked up dentry
vfs: fix d_need_lookup/d_revalidate order in do_lookup
ext3: move headers to fs/ext3/
migrate ext2_fs.h guts to fs/ext2/ext2.h
new helper: ext2_image_size()
get rid of pointless includes of ext2_fs.h
ext2: No longer export ext2_fs.h to user space
mtdchar: kill persistently held vfsmount
...
64252c75a2 "vfs: remove dget() from
dentry_unhash()" changed the implementation but not the comment.
Cc: Sage Weil <sage@newdream.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Split __lookup_hash into two component functions:
lookup_dcache - tries cached lookup, returns whether real lookup is needed
lookup_real - calls i_op->lookup
This eliminates code duplication between d_alloc_and_lookup() and
d_inode_lookup().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Everything arriving into if (!dentry) will have need_reval = 1.
Indeed, the only way to get there with need_reval reset to 0 would
be via
if (unlikely(d_need_lookup(dentry)))
goto unlazy;
if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
status = d_revalidate(dentry, nd);
if (unlikely(status <= 0)) {
if (status != -ECHILD)
need_reval = 0;
goto unlazy;
...
unlazy:
/* no assignments to dentry */
if (dentry && unlikely(d_need_lookup(dentry))) {
dput(dentry);
dentry = NULL;
}
and if d_need_lookup() had already been false the first time around, it
will remain false on the second call as well.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
d_lookup() *will* fail after successful d_invalidate(), if we are
holding i_mutex all along. IOW, we don't need to jump back to
l: - we know what path will be taken there and can do that (i.e.
d_alloc_and_lookup()) directly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Duplicate the revalidation-related parts into if (!dentry) branch.
Next step will be to pull them under i_mutex.
This and the next 8 commits are more or less a splitup of patch
by Miklos; folks, when you are working with something that convoluted,
carve your patches up into easily reviewed steps, especially when
a lot of codepaths involved are rarely hit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The only caller of __lookup_hash() that needs the exec permission check on
parent is lookup_one_len().
All lookup_hash() callers already checked permission in LOOKUP_PARENT walk.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
__lookup_hash() calls ->lookup() if the dentry needs lookup and on success
revalidates the dentry (all under dir->i_mutex).
While this is harmless it doesn't make a lot of sense.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Doing revalidate on a dentry which has not yet been looked up makes no sense.
Move the d_need_lookup() check before d_revalidate().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1. TRACE_EVENT(sched_process_exec) forgets to actually use the
old pid argument, it sets ->old_pid = p->pid.
2. search_binary_handler() uses the wrong pid number. tracepoint
needs the global pid_t from the root namespace, while old_pid
is the virtual pid number as it seen by the tracer/parent.
With this patch we have two pid_t's in search_binary_handler(),
not really nice. Perhaps we should switch to "struct pid*", but
in this case it would be better to cleanup the current code
first and move the "depth == 0" code outside.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Smith <dsmith@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/20120330162636.GA4857@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Artem's cleanup of the MTD API continues apace.
Fixes and improvements for ST FSMC and SuperH FLCTL NAND, amongst others.
More work on DiskOnChip G3, new driver for DiskOnChip G4.
Clean up debug/warning printks in JFFS2 to use pr_<level>.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iEYEABECAAYFAk92K6UACgkQdwG7hYl686NrMACfWQJRWasR78MWKfkT2vWZwTFJ
X5AAoKiSYO2pfo5gWJGOAahNC1zUqMX0
=i3Vb
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.4' of git://git.infradead.org/mtd-2.6
Pull MTD changes from David Woodhouse:
- Artem's cleanup of the MTD API continues apace.
- Fixes and improvements for ST FSMC and SuperH FLCTL NAND, amongst
others.
- More work on DiskOnChip G3, new driver for DiskOnChip G4.
- Clean up debug/warning printks in JFFS2 to use pr_<level>.
Fix up various trivial conflicts, largely due to changes in calling
conventions for things like dmaengine_prep_slave_sg() (new inline
wrapper to hide new parameter, clashing with rewrite of previously last
parameter that used to be an 'append' flag, and is now a bitmap of
'unsigned long flags').
(Also some header file fallout - like so many merges this merge window -
and silly conflicts with sparse fixes)
* tag 'for-linus-3.4' of git://git.infradead.org/mtd-2.6: (120 commits)
mtd: docg3 add protection against concurrency
mtd: docg3 refactor cascade floors structure
mtd: docg3 increase write/erase timeout
mtd: docg3 fix inbound calculations
mtd: nand: gpmi: fix function annotations
mtd: phram: fix section mismatch for phram_setup
mtd: unify initialization of erase_info->fail_addr
mtd: support ONFI multi lun NAND
mtd: sm_ftl: fix typo in major number.
mtd: add device-tree support to spear_smi
mtd: spear_smi: Remove default partition information from driver
mtd: Add device-tree support to fsmc_nand
mtd: fix section mismatch for doc_probe_device
mtd: nand/fsmc: Remove sparse warnings and errors
mtd: nand/fsmc: Add DMA support
mtd: nand/fsmc: Access the NAND device word by word whenever possible
mtd: nand/fsmc: Use dev_err to report error scenario
mtd: nand/fsmc: Use devm routines
mtd: nand/fsmc: Modify fsmc driver to accept nand timing parameters via platform
mtd: fsmc_nand: add pm callbacks to support hibernation
...
Pull cifs fixes from Steve French.
* git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Update CIFS version number to 1.77
CIFS: Add missed forcemand mount option
[CIFS] Fix trivial sparse warning with asyn i/o patch
cifs: handle "sloppy" option appropriately
cifs: use standard token parser for mount options
cifs: remove /proc/fs/cifs/OplockEnabled
cifs: convert cifs_iovec_write to use async writes
cifs: call cifs_update_eof with i_lock held
cifs: abstract out function to marshal up the iovec array for async writes
cifs: fix up get_numpages
cifs: make cifsFileInfo_get return the cifsFileInfo pointer
cifs: fix allocation in cifs_write_allocate_pages
cifs: allow caller to specify completion op when allocating writedata
cifs: add pid field to cifs_writedata
cifs: add new cifsiod_wq workqueue
CIFS: Change mid_q_entry structure fields
CIFS: Expand CurrentMid field
CIFS: Separate protocol-specific code from cifs_readv_receive code
CIFS: Separate protocol-specific code from demultiplex code
CIFS: Separate protocol-specific code from transport routines
Pull btrfs fixes and features from Chris Mason:
"We've merged in the error handling patches from SuSE. These are
already shipping in the sles kernel, and they give btrfs the ability
to abort transactions and go readonly on errors. It involves a lot of
churn as they clarify BUG_ONs, and remove the ones we now properly
deal with.
Josef reworked the way our metadata interacts with the page cache.
page->private now points to the btrfs extent_buffer object, which
makes everything faster. He changed it so we write an whole extent
buffer at a time instead of allowing individual pages to go down,,
which will be important for the raid5/6 code (for the 3.5 merge
window ;)
Josef also made us more aggressive about dropping pages for metadata
blocks that were freed due to COW. Overall, our metadata caching is
much faster now.
We've integrated my patch for metadata bigger than the page size.
This allows metadata blocks up to 64KB in size. In practice 16K and
32K seem to work best. For workloads with lots of metadata, this cuts
down the size of the extent allocation tree dramatically and fragments
much less.
Scrub was updated to support the larger block sizes, which ended up
being a fairly large change (thanks Stefan Behrens).
We also have an assortment of fixes and updates, especially to the
balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and
the defragging code (Liu Bo)."
Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to
removal of the second argument to k[un]map_atomic() in commit
7ac687d9e0.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits)
Btrfs: update the checks for mixed block groups with big metadata blocks
Btrfs: update to the right index of defragment
Btrfs: do not bother to defrag an extent if it is a big real extent
Btrfs: add a check to decide if we should defrag the range
Btrfs: fix recursive defragment with autodefrag option
Btrfs: fix the mismatch of page->mapping
Btrfs: fix race between direct io and autodefrag
Btrfs: fix deadlock during allocating chunks
Btrfs: show useful info in space reservation tracepoint
Btrfs: don't use crc items bigger than 4KB
Btrfs: flush out and clean up any block device pages during mount
btrfs: disallow unequal data/metadata blocksize for mixed block groups
Btrfs: enhance superblock sanity checks
Btrfs: change scrub to support big blocks
Btrfs: minor cleanup in scrub
Btrfs: introduce common define for max number of mirrors
Btrfs: fix infinite loop in btrfs_shrink_device()
Btrfs: fix memory leak in resolver code
Btrfs: allow dup for data chunks in mixed mode
Btrfs: validate target profiles only if we are going to use them
...
Git commit a25cac5198 "proc: Consider NO_HZ when printing idle and
iowait times" changes the code for /proc/stat to use get_cpu_idle_time_us
and get_cpu_iowait_time_us if the system is running with nohz enabled.
For architectures which define arch_idle_time (currently s390 only)
this is a change for the worse. The result of arch_idle_time is supposed
to be the exact sleep time of the target cpu and should be used instead
of the value kept by the scheduler.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120330122308.18720283@de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x32 support for x86-64 from Ingo Molnar:
"This tree introduces the X32 binary format and execution mode for x86:
32-bit data space binaries using 64-bit instructions and 64-bit kernel
syscalls.
This allows applications whose working set fits into a 32 bits address
space to make use of 64-bit instructions while using a 32-bit address
space with shorter pointers, more compressed data structures, etc."
Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}
* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
x32: Fix alignment fail in struct compat_siginfo
x32: Fix stupid ia32/x32 inversion in the siginfo format
x32: Add ptrace for x32
x32: Switch to a 64-bit clock_t
x32: Provide separate is_ia32_task() and is_x32_task() predicates
x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
x86/x32: Fix the binutils auto-detect
x32: Warn and disable rather than error if binutils too old
x32: Only clear TIF_X32 flag once
x32: Make sure TS_COMPAT is cleared for x32 tasks
fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
fs: Fix close_on_exec pointer in alloc_fdtable
x32: Drop non-__vdso weak symbols from the x32 VDSO
x32: Fix coding style violations in the x32 VDSO code
x32: Add x32 VDSO support
x32: Allow x32 to be configured
x32: If configured, add x32 system calls to system call tables
x32: Handle process creation
x32: Signal-related system calls
x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h>
...
This reverts commit b43d17f319.
Dave Jones reports that it causes lockups on his laptop, and his debug
output showed a lot of processes hung waiting for page_writeback (or
more commonly - processes hung waiting for a lock that was held during
that writeback wait).
The page_writeback hint made Ted suggest that Dave look at this commit,
and Dave verified that reverting it makes his problems go away.
Ted says:
"That commit fixes a race which is seen when you write into fallocated
(and hence uninitialized) disk blocks under *very* heavy memory
pressure. Furthermore, although theoretically it could trigger under
normal direct I/O writes, it only seems to trigger if you are issuing
a huge number of AIO writes, such that a just-written page can get
evicted from memory, and then read back into memory, before the
workqueue has a chance to update the extent tree.
This race has been around for a little over a year, and no one noticed
until two months ago; it only happens under fairly exotic conditions,
and in fact even after trying very hard to create a simple repro under
lab conditions, we could only reproduce the problem and confirm the
fix on production servers running MySQL on very fast PCIe-attached
flash devices.
Given that Dave was able to hit this problem pretty quickly, if we
confirm that this commit is at fault, the only reasonable thing to do
is to revert it IMO."
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull nfsd changes from Bruce Fields:
Highlights:
- Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features,
moving us closer to a complete 4.1 implementation.
- Bernd Schubert fixed a long-standing problem with readdir cookies on
ext2/3/4.
- Jeff Layton performed a long-overdue overhaul of the server reboot
recovery code which will allow us to deprecate the current code (a
rather unusual user of the vfs), and give us some needed flexibility
for further improvements.
- Like the client, we now support numeric uid's and gid's in the
auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.
Plus miscellaneous bugfixes and cleanup.
Thanks to everyone!
There are also some delegation fixes waiting on vfs review that I
suppose will have to wait for 3.5. With that done I think we'll finally
turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly
symbolic as it's been on by default in distro's for a while).
And the list of 4.1 todo's should be achievable for 3.5 as well:
http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues
though we may still want a bit more experience with it before turning it
on by default.
* 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
nfsd4: use auth_unix unconditionally on backchannel
nfsd: fix NULL pointer dereference in cld_pipe_downcall
nfsd4: memory corruption in numeric_name_to_id()
sunrpc: skip portmap calls on sessions backchannel
nfsd4: allow numeric idmapping
nfsd: don't allow legacy client tracker init for anything but init_net
nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
nfsd: add the infrastructure to handle the cld upcall
nfsd: add a header describing upcall to nfsdcld
nfsd: add a per-net-namespace struct for nfsd
sunrpc: create nfsd dir in rpc_pipefs
nfsd: add nfsd4_client_tracking_ops struct and a way to set it
nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
NFSD: Fix nfs4_verifier memory alignment
NFSD: Fix warnings when NFSD_DEBUG is not defined
nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
ext4: return 32/64-bit dir name hash according to usage type
fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
...
Single fix for a commit from the first batch of patches through Andrew.
* emailed from Andrew Morton <akpm@linux-foundation.org>:
pagemap: remove remaining unneeded spin_lock()
Commit 025c5b2451 ("thp: optimize away unnecessary page table
locking") moves spin_lock() into pmd_trans_huge_lock() in order to avoid
locking unless pmd is for thp. So this spin_lock() is a bug.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Sterba had put in patches to look for mixed data/metadata groups
with metadata bigger than 4KB. But these ended up in the wrong place
and it wasn't testing the feature flag correctly.
This updates the tests to make sure our sizes are matching
Signed-off-by: Chris Mason <chris.mason@oracle.com>