linux-hardened/net
Chuck Lever ae72950abf xprtrdma: Add data structure to manage RDMA Send arguments
Problem statement:

Recently Sagi Grimberg <sagi@grimberg.me> observed that kernel RDMA-
enabled storage initiators don't handle delayed Send completion
correctly. If Send completion is delayed beyond the end of a ULP
transaction, the ULP may release resources that are still being used
by the HCA to complete a long-running Send operation.

This is a common design trait amongst our initiators. Most Send
operations are faster than the ULP transaction they are part of.
Waiting for a completion for these is typically unnecessary.

Infrequently, a network partition or some other problem crops up
where an ordering problem can occur. In NFS parlance, the RPC Reply
arrives and completes the RPC, but the HCA is still retrying the
Send WR that conveyed the RPC Call. In this case, the HCA can try
to use memory that has been invalidated or DMA unmapped, and the
connection is lost. If that memory has been re-used for something
else (possibly not related to NFS), and the Send retransmission
exposes that data on the wire.

Thus we cannot assume that it is safe to release Send-related
resources just because a ULP reply has arrived.

After some analysis, we have determined that the completion
housekeeping will not be difficult for xprtrdma:

 - Inline Send buffers are registered via the local DMA key, and
   are already left DMA mapped for the lifetime of a transport
   connection, thus no additional handling is necessary for those
 - Gathered Sends involving page cache pages _will_ need to
   DMA unmap those pages after the Send completes. But like
   inline send buffers, they are registered via the local DMA key,
   and thus will not need to be invalidated

In addition, RPC completion will need to wait for Send completion
in the latter case. However, nearly always, the Send that conveys
the RPC Call will have completed long before the RPC Reply
arrives, and thus no additional latency will be accrued.

Design notes:

In this patch, the rpcrdma_sendctx object is introduced, and a
lock-free circular queue is added to manage a set of them per
transport.

The RPC client's send path already prevents sending more than one
RPC Call at the same time. This allows us to treat the consumer
side of the queue (rpcrdma_sendctx_get_locked) as if there is a
single consumer thread.

The producer side of the queue (rpcrdma_sendctx_put_locked) is
invoked only from the Send completion handler, which is a single
thread of execution (soft IRQ).

The only care that needs to be taken is with the tail index, which
is shared between the producer and consumer. Only the producer
updates the tail index. The consumer compares the head with the
tail to ensure that the a sendctx that is in use is never handed
out again (or, expressed more conventionally, the queue is empty).

When the sendctx queue empties completely, there are enough Sends
outstanding that posting more Send operations can result in a Send
Queue overflow. In this case, the ULP is told to wait and try again.
This introduces strong Send Queue accounting to xprtrdma.

As a final touch, Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
suggested a mechanism that does not require signaling every Send.
We signal once every N Sends, and perform SGE unmapping of N Send
operations during that one completion.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Suggested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17 13:47:56 -05:00
..
6lowpan
9p net/9p: switch p9_fd_read to kernel_write 2017-09-04 19:05:16 -04:00
802 net: introduce __skb_put_[zero, data, u8] 2017-06-20 13:30:14 -04:00
8021q net: 8021q: skip packets if the vlan is down 2017-10-04 18:16:48 -07:00
appletalk networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
atm net: atm: make atmdev_ops const 2017-08-09 22:43:50 -07:00
ax25 net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t 2017-07-04 22:35:19 +01:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-09 16:28:45 -07:00
bluetooth Revert "Bluetooth: Add option for disabling legacy ioctl interfaces" 2017-09-28 13:20:32 -07:00
bpf
bridge netfilter: ebtables: fix race condition in frame_filter_net_init() 2017-09-29 13:36:06 +02:00
caif net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
can rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
ceph libceph: don't allow bidirectional swap of pg-upmap-items 2017-09-19 20:34:29 +02:00
core net: rtnetlink: fix info leak in RTM_GETSTATS call 2017-10-03 10:18:00 -07:00
dcb rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
dccp net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() 2017-08-31 11:43:47 -07:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2017-09-03 17:08:42 -07:00
dns_resolver
dsa net: dsa: Fix network device registration order 2017-09-28 10:12:53 -07:00
ethernet networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
hsr net/hsr: Check skb_put_padto() return value 2017-08-22 13:40:23 -07:00
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-05 20:03:35 -07:00
ife
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2017-10-09 10:39:52 -07:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2017-10-09 10:39:52 -07:00
ipx net, ipx: convert ipx_route.refcnt from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
iucv iucv: Convert sk_wmem_alloc accesses to refcount_t. 2017-07-03 02:31:22 -07:00
kcm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-01 17:42:05 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-15 20:23:23 -07:00
l2tp l2tp: fix l2tp_eth module loading 2017-10-01 22:35:07 -07:00
l3mdev
lapb net, lapb: convert lapb_cb.refcnt from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
llc net, llc: convert llc_sap.refcnt from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
mac80211 mac80211: fix deadlock in driver-managed RX BA session start 2017-09-06 15:22:02 +02:00
mac802154
mpls rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
ncsi net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references 2017-09-05 09:11:45 -07:00
netfilter netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-10-09 15:18:04 +02:00
netlabel
netlink netlink: do not set cb_running if dump's start() errs 2017-10-09 10:27:49 -07:00
netrom net, netrom: convert nr_node.refcount from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
nfc NFC: Add sockaddr length checks before accessing sa_family in bind handlers 2017-06-23 00:38:31 +02:00
nsh nsh: add GSO support 2017-08-29 15:16:52 -07:00
openvswitch openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()' 2017-09-12 20:37:31 -07:00
packet packet: only test po->has_vnet_hdr once in packet_snd 2017-09-28 10:24:31 -07:00
phonet rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
psample networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
qrtr rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
rds rds: Fix incorrect statistics counting 2017-09-07 20:07:13 -07:00
rfkill
rose
rxrpc rxrpc: Make service connection lookup always check for retry 2017-09-05 14:39:17 -07:00
sched net_sched: remove cls_flower idr on failure 2017-09-21 15:13:52 -07:00
sctp sctp: Fix a big endian bug in sctp_diag_dump() 2017-09-26 21:16:29 -07:00
smc net/smc: no close wait in case of process shut down 2017-09-21 15:31:03 -07:00
strparser strparser: initialize all callbacks 2017-08-24 21:57:50 -07:00
sunrpc xprtrdma: Add data structure to manage RDMA Send arguments 2017-11-17 13:47:56 -05:00
switchdev net: switchdev: Remove bridge bypass support from switchdev 2017-08-07 14:48:48 -07:00
tipc tipc: Unclone message at secondary destination lookup 2017-10-08 21:13:23 -07:00
tls tls: make tls_sw_free_resources static 2017-09-14 09:55:21 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-21 17:06:42 -07:00
vmw_vsock hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK) 2017-08-28 15:38:18 -07:00
wimax
wireless nl80211: Define policy for packet pattern attributes 2017-10-04 14:39:21 +02:00
x25 X25: constify null_x25_address 2017-08-03 09:13:51 -07:00
xfrm xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock 2017-09-28 09:39:05 +02:00
compat.c net: compat: assert the size of cmsg copied in is as expected 2017-09-20 15:36:18 -07:00
Kconfig net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. 2017-09-04 13:25:20 +02:00
Makefile nsh: add GSO support 2017-08-29 15:16:52 -07:00
socket.c net: fixes for skb_send_sock 2017-08-16 11:27:52 -07:00
sysctl_net.c