Commit 43cedbf0e8 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.
Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.
Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.
Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
Merge Andrew's second set of patches:
- MM
- a few random fixes
- a couple of RTC leftovers
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
rtc/rtc-88pm80x: remove unneed devm_kfree
rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
tmpfs: distribute interleave better across nodes
mm: remove redundant initialization
mm: warn if pg_data_t isn't initialized with zero
mips: zero out pg_data_t when it's allocated
memcg: gix memory accounting scalability in shrink_page_list
mm/sparse: remove index_init_lock
mm/sparse: more checks on mem_section number
mm/sparse: optimize sparse_index_alloc
memcg: add mem_cgroup_from_css() helper
memcg: further prevent OOM with too many dirty pages
memcg: prevent OOM with too many dirty pages
mm: mmu_notifier: fix freed page still mapped in secondary MMU
mm: memcg: only check anon swapin page charges for swap cache
mm: memcg: only check swap cache pages for repeated charging
mm: memcg: split swapin charge function into private and public part
mm: memcg: remove needless !mm fixup to init_mm when charging
mm: memcg: remove unneeded shmem charge type
...
Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.
PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.
[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull nfsd changes from J. Bruce Fields:
"This has been an unusually quiet cycle--mostly bugfixes and cleanup.
The one large piece is Stanislav's work to containerize the server's
grace period--but that in itself is just one more step in a
not-yet-complete project to allow fully containerized nfs service.
There are a number of outstanding delegation, container, v4 state, and
gss patches that aren't quite ready yet; 3.7 may be wilder."
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
NFSd: make boot_time variable per network namespace
NFSd: make grace end flag per network namespace
Lockd: move grace period management from lockd() to per-net functions
LockD: pass actual network namespace to grace period management functions
LockD: manage grace list per network namespace
SUNRPC: service request network namespace helper introduced
NFSd: make nfsd4_manager allocated per network namespace context.
LockD: make lockd manager allocated per network namespace
LockD: manage grace period per network namespace
Lockd: add more debug to host shutdown functions
Lockd: host complaining function introduced
LockD: manage used host count per networks namespace
LockD: manage garbage collection timeout per networks namespace
LockD: make garbage collector network namespace aware.
LockD: mark host per network namespace on garbage collect
nfsd4: fix missing fault_inject.h include
locks: move lease-specific code out of locks_delete_lock
locks: prevent side-effects of locks_release_private before file_lock is initialized
NFSd: set nfsd_serv to NULL after service destruction
NFSd: introduce nfsd_destroy() helper
...
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I don't think there's a practical difference for the range of values
these interfaces should see, but it would be safer to be unambiguous.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The gss_mech_list_pseudoflavors() function provides a list of
currently registered GSS pseudoflavors. This list does not include
any non-GSS flavors that have been registered with the RPC client.
nfs4_find_root_sec() currently adds these extra flavors by hand.
Instead, nfs4_find_root_sec() should be looking at the set of flavors
that have been explicitly registered via rpcauth_register(). And,
other areas of code will soon need the same kind of list that
contains all flavors the kernel currently knows about (see below).
Rather than cloning the open-coded logic in nfs4_find_root_sec() to
those new places, introduce a generic RPC function that generates a
full list of registered auth flavors and pseudoflavors.
A new rpc_authops method is added that lists a flavor's
pseudoflavors, if it has any. I encountered an interesting module
loader loop when I tried to get the RPC client to invoke
gss_mech_list_pseudoflavors() by name.
This patch is a pre-requisite for server trunking discovery, and a
pre-requisite for fixing up the in-kernel mount client to do better
automatic security flavor selection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch replaces the usage of simple_strtoul with kstrtoint in
get_int(), since the simple_str* family doesn't account for overflow
and is deprecated.
Also, in this specific case, the long from strtol is silently converted
to an int by the caller.
As Joe Perches <joe@perches.com> suggested, this patch also removes
the redundant temporary variable rv, since kstrtoint() will not write to
anint unless it's successful.
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Neaten code style in get_int().
Also use sizeof() instead of hard coded number as suggested by
Joe Perches <joe@perches.com>.
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Callers of xdr_read_pages() will want to know exactly how much XDR
data is encoded in the pages after the data realignment.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that xdr_inline_decode() will automatically cross into the page
buffers, we need to ensure that it doesn't exceed the total reply
message length.
This patch sets up a counter that tracks the number of words
remaining in the reply message, and ensures that xdr_inline_decode,
xdr_read_pages and xdr_enter_page respect the end of message boundary.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the rq_flavor into struct svc_cred, and use it in setclientid and
exchange_id comparisons as well.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Instead of keeping the principal name associated with a request in a
structure that's private to auth_gss and using an accessor function,
move it to svc_cred.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This new routine is responsible for service registration in a specified
network context.
The idea is to separate service creation from per-net operations.
Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch also changes svcauth_unix_purge() function: added network namespace
as a parameter and thus loop over all networks was replaced by only one call
for ip map cache purge.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Loads of these:
linux/net/sunrpc/rpcb_clnt.c:942:2: warning: suggest braces around
empty body in ‘do’ statement [-Wempty-body]
show up when I unset CONFIG_PROC_SYSCTL. Seen with
gcc (GCC) 4.6.1 20110908 (Red Hat 4.6.1-9)
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This allows us to turn on/off the dprintk() debugging interfaces for
those distributions that don't ship the 'rpcdebug' utility.
It also allows us to add Kbuild dependencies. Specifically, we already
know that dprintk() in general relies on CONFIG_SYSCTL. Now it turns out
that the NFS dprintks depend on CONFIG_CRC32 after we added support
for the filehandle hash.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
(different address spaces)
- svc_partial_recvfrom now takes a struct kvec, so the variable
save_iovbase needs to be an ordinary (void *)
Make a bunch of variables in net/sunrpc/xprtsock.c static
Fix a couple of "warning: symbol 'foo' was not declared. Should it be
static?" reports.
Fix a couple of conflicting function declarations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Sparse complains that the definition function definition and the
implementation aren't anotated the same way.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv4.0 clients must send endpoint information for their callback
service to NFSv4.0 servers during their first contact with a server.
Traditionally on Linux, user space provides the callback endpoint IP
address via the "clientaddr=" mount option.
During an NFSv4 migration event, it is possible that an FSID may be
migrated to a destination server that is accessible via a different
source IP address than the source server was. The client must update
callback endpoint information on the destination server so that it can
maintain leases and allow delegation.
Without a new "clientaddr=" option from user space, however, the
kernel itself must construct an appropriate IP address for the
callback update. Provide an API in the RPC client for upper layer
RPC consumers to acquire a source address for a remote.
The mechanism used by the mount.nfs command is copied: set up a
connected UDP socket to the designated remote, then scrape the source
address off the socket. We are careful to select the correct network
namespace when setting up the temporary UDP socket.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When the cl_xprt field is updated, the cl_server field will also have
to change. Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A migration event will replace the rpc_xprt used by an rpc_clnt. To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().
Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: fix lockdep splats and layering violations ]
[ cel: forward ported to 3.4 ]
[ cel: remove rpc_max_reqs(), add rpc_net_ns() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, wait queue, used for polling of RPC pipe changes from user-space,
is a part of RPC pipe. But the pipe data itself can be released on NFS umount
prior to dentry-inode pair, connected to it (is case of this pair is open by
some process).
This is not a problem for almost all pipe users, because all PipeFS file
operations checks pipe reference prior to using it.
Except evenfd. This thing registers itself with "poll" file operation and thus
has a reference to pipe wait queue. This leads to oopses on destroying eventfd
after NFS umount (like rpc_idmapd do) since not pipe data left to the point
already.
The solution is to wait queue from pipe data to internal RPC inode data. This
looks more logical, because this wiat queue used only for user-space processes,
which already holds inode reference.
Note: upcalls have to get pipe->dentry prior to dereferecing wait queue to make
sure, that mount point won't disappear from underneath us.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The svcrdma transport was un-marshalling requests in-place. This resulted
in sparse warnings due to __beXX data containing both NBO and HBO data.
The code has been restructured to do byte-swapping as the header is
parsed instead of when the header is validated immediately after receipt.
Also moved extern declarations for the workqueue and memory pools to the
private header file.
Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Include RPC statistics from all data servers in /proc/self/mountstats for pNFS
filelayout mounts.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Our dprintk() debugging facility doesn't specify any verbosity level
for it's printk() calls, but it should.
The default verbosity for printk's is KERN_DEFAULT. You might argue
that these are debugging printk's and thus the verbosity should be
KERN_DEBUG. That would mean that to see NFS and SUNRPC debugging
output an admin would also have to boost the syslog verbosity, which
would be insufferably noisy.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
With static RPC slots, the xprt backlog queue stats were useful in showing
when the transport (TCP) was starved by lack of RPC slots. The new dynamic
RPC slot code, commit d9ba131d8f, always
provides an RPC slot and so only uses the xprt backlog queue when the
tcp_max_slot_table_entries value has been hit or when an allocation error
occurs. All requests are now placed on the xprt sending or pending queue which
need to be monitored for debugging.
The max_slot stat shows the maximum number of dynamic RPC slots reached which is
useful when debugging performance issues.
Add the new fields at the end of the mountstats xprt stanza so that mountstats
outputs the previous correct values and ignores the new fields. Bump
NFS_IOSTATS_VERS.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The tracepoint code relies on the queue->name being defined in order to
be able to display the name of the waitqueue on which an RPC task is
sleeping.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
This patch introduces per-net Lockd initialization and destruction routines.
The logic is the same as in global Lockd up and down routines. Probably the
solution is not the best one. But at least it looks clear.
So per-net "up" routine are called only in case of lockd is running already. If
per-net resources are not allocated yet, then service is being registered with
local portmapper and lockd sockets created.
Per-net "down" routine is called on every lockd_down() call in case of global
users counter is not zero.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
v2: Added comment to BUG_ON's in svc_destroy() to make code looks clearer.
This patch introduces network namespace filter for service destruction
function.
Nothing special here - just do exactly the same operations, but only for
tranports in passed networks namespace context.
BTW, BUG_ON() checks for empty service transports lists were returned into
svc_destroy() function. This is because of swithing generic svc_close_all() to
networks namespace dependable svc_close_net().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since the scheme of limiting the number of TCP slots to whatever will
fit in the current TCP window seems to be working well (Andy reports
getting within 20% of the 'iperf' send performance on a 10GigE link)
we should just let that be the default mode of operation.
Users may still set their own limits using the tcp_max_slot_table_entries
parameter if they need to.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Service transports are parametrized by network namespace. And thus lookup of
transport instance have to take network namespace into account.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
This patch makes it possible to create NFSd program entry ("/proc/net/rpc/nfsd")
in passed network namespace context instead of hard-coded "init_net".
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix build errors in linux/sunrpc/stats.h when CONFIG_PROC_FS
is not enabled:
- add parameter names to inline functions
- fix placement of '(' in rpc_proc_unregister()
Fixes these errors:
include/linux/sunrpc/stats.h:72:63: error: parameter name omitted
include/linux/sunrpc/stats.h:73:46: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'net'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch makes it possible to create NFS program entry ("/proc/net/rpc/nfs")
in passed network namespace context instead of hard-coded "init_net".
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
All cache users now uses network-namespace-aware routines, so generic ones
are obsolete.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
This patch makes GSS auth cache details allocated and registered per network
namespace context.
Thus with this patch rsi_cache and rsc_cache contents for network namespace "X"
are controlled from proc file system mount for the same network namespace "X".
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
This patch prepares infrastructure for network namespace aware cache detail
allocation.
One note about adding network namespace link to cache structure. It's going to
be used later in NFS DNS cache parsing routine (nfs_dns_parse for rpc_pton()
call).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
On service shutdown we can be sure, that no more users of it left except
current. Thus it looks like using current network namespace context is safe in
this case.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Lockd and NFSd services will handle requests from and to many network
nsamespaces. And thus have to be registered and unregistered per network
namespace.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Parametrize rpc_uaddr2sockaddr() by network context and thus force it's callers to pass
in network context instead of using hard-coded "init_net".
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Parametrize rpc_pton() by network context and thus force it's callers to pass
in network context instead of using hard-coded "init_net".
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>