linux-hardened/net
Neil Brown 61d0a8e6a8 NFS/RPC: fix problems with reestablish_timeout and related code.
[[resending with correct cc:  - "vfs.kernel.org" just isn't right!]]

xprt->reestablish_timeout is used to cause TCP connection attempts to
back off if the connection fails so as not to hammer the network,
but to still allow immediate connections when there is no reason to
believe there is a problem.

It is not used for the first connection (when transport->sock is NULL)
but only on reconnects.

It is currently set:

 a/ to 0 when xs_tcp_state_change finds a state of TCP_FIN_WAIT1
    on the assumption that the client has closed the connection
    so the reconnect should be immediate when needed.
 b/ to at least XS_TCP_INIT_REEST_TO when xs_tcp_state_change
    detects TCP_CLOSING or TCP_CLOSE_WAIT on the assumption that the
    server closed the connection so a small delay at least is
    required.
 c/ as above when xs_tcp_state_change detects TCP_SYN_SENT, so that
    it is never 0 while a connection has been attempted, else
    the doubling will produce 0 and there will be no backoff.
 d/ to double its value (up to a limit) when delaying a connection,
    thus providing exponential backoff, and
 e/ to XS_TCP_INIT_REEST_TO in xs_setup_tcp as simple initialisation.

So you can see it is highly dependent on xs_tcp_state_change being
called as expected.  However, experimental evidence shows that
xs_tcp_state_change does not see all state changes.
("rpcdebug -m rpc trans" can help show what actually happens).

Results show:
 TCP_ESTABLISHED is reported when a connection is made.  TCP_SYN_SENT
 is never reported, so rule 'c' above is never effective.

 When the server closes the connection, TCP_CLOSE_WAIT and
 TCP_LAST_ACK *might* be reported, and TCP_CLOSE is always
 reported.  Thus rule 'b' above will sometimes be effective, but
 not reliably.

 When the client closes the connection, it used to result in
 TCP_FIN_WAIT1, TCP_FIN_WAIT2, TCP_CLOSE.  However since commit
 f75e674 (SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on
 reconnect) we don't see *any* events on client-close.  I think this
 is because xs_restore_old_callbacks is called to disconnect
 xs_tcp_state_change before the socket is closed.
 In any case, rule 'a' no longer applies.

So all that is left are rule d, which successfully doubles the
timeout which is never reset, and rule e which initialises the timeout.

Even if the rules worked as expected, there would be a problem because
a successful connection does not reset the timeout, so a sequence
of events where the server repeatedly closes the connection (e.g.
during failover testing) will cause longer and longer timeouts for no
good reason.

This patch:

 - sets reestablish_timeout to 0 in xs_close thus effecting rule 'a'
 - sets it to 0 in xs_tcp_data_ready to ensure that a successful
   connection resets the timeout
 - sets it to at least XS_TCP_INIT_REEST_TO after it is doubled,
   thus effecting rule c

I have not reimplemented rule b, as the new version of rule c
seems sufficient.

I suspect other code in xs_tcp_data_ready needs to be revised as well.
For example I don't think connect_cookie is being incremented as often
as it should be.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-23 14:36:37 -04:00
..
9p virtio: add virtio IDs file 2009-09-23 22:26:32 +09:30
802 net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
8021q vlan: adds drops accounting 2009-09-03 20:02:17 -07:00
appletalk Have atalk_route_packet() return NET_RX_SUCCESS not NET_XMIT_SUCCESS 2009-09-14 17:02:47 -07:00
atm atm/br2684: netif_stop_queue() when atm device busy and netif_wake_queue() when we can send packets again. 2009-09-02 23:46:10 -07:00
ax25 net: Move rx skb_orphan call to where needed 2009-06-23 16:36:25 -07:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2009-09-22 07:51:28 -07:00
bridge net: Add DEVTYPE support for Ethernet based devices 2009-09-11 12:54:55 -07:00
can can: fix NOHZ local_softirq_pending 08 warning 2009-09-15 01:31:34 -07:00
core mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
dcb dcbnl: Add implementations of dcbnl setapp/getapp commands 2009-09-01 01:24:36 -07:00
dccp mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
decnet mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
dsa netdev: convert pseudo-devices to netdev_tx_t 2009-09-01 01:13:07 -07:00
econet Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-08-12 17:44:53 -07:00
ethernet net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
ieee802154 ieee802154: add locking for seq numbers 2009-09-15 18:25:16 +04:00
ipv4 mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
ipv6 seq_file: constify seq_operations 2009-09-23 07:39:29 -07:00
ipx headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
irda net: file_operations should be const 2009-09-02 01:03:53 -07:00
iucv af_iucv: fix race when queueing skbs on the backlog queue 2009-09-16 20:57:39 -07:00
key net: file_operations should be const 2009-09-02 01:03:53 -07:00
lapb net: remove NET_RX_BAD and NET_RX_CN* defines 2009-07-05 19:15:35 -07:00
llc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-09-02 00:32:56 -07:00
mac80211 rc80211_minstrel: fix contention window calculation 2009-09-16 16:21:00 -04:00
netfilter mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
netlabel Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-07-30 19:22:43 -07:00
netlink mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
netrom Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-09-02 00:32:56 -07:00
packet af_packet: style cleanups 2009-07-23 18:01:10 -07:00
phonet Phonet: Netlink event for autoconfigured addresses 2009-09-14 17:03:27 -07:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2009-09-17 20:53:52 -07:00
rfkill rfkill: add the GPS radio type 2009-08-04 16:44:23 -04:00
rose net: constify remaining proto_ops 2009-09-14 17:03:09 -07:00
rxrpc trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
sched trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
sctp mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
sunrpc NFS/RPC: fix problems with reestablish_timeout and related code. 2009-09-23 14:36:37 -04:00
tipc tipc: fix test of bearer_priority range in tipc_register_media() 2009-08-29 00:19:42 -07:00
unix net: unix: fix sending fds in multiple buffers 2009-09-11 11:31:45 -07:00
wanrouter headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
wimax wimax: fix warning caused by not checking retval of rfkill_set_hw_state() 2009-06-11 11:12:48 -07:00
wireless trivial: remove unnecessary semicolons 2009-09-21 15:14:58 +02:00
x25 headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
xfrm net: file_operations should be const 2009-09-02 01:03:53 -07:00
compat.c net/compat/wext: send different messages to compat tasks 2009-07-15 08:53:39 -07:00
Kconfig net/compat/wext: send different messages to compat tasks 2009-07-15 08:53:39 -07:00
Makefile net: remove redundant sched/ in net/Makefile 2009-07-12 20:11:14 -07:00
nonet.c
socket.c Move magic numbers into magic.h 2009-09-23 07:39:28 -07:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00
TUNABLE