linux-hardened/net
Eric Dumazet 75c119afe1 tcp: implement rb-tree based retransmit queue
Using a linear list to store all skbs in write queue has been okay
for quite a while : O(N) is not too bad when N < 500.

Things get messy when N is the order of 100,000 : Modern TCP stacks
want 10Gbit+ of throughput even with 200 ms RTT flows.

40 ns per cache line miss means a full scan can use 4 ms,
blowing away CPU caches.

SACK processing often can use various hints to avoid parsing
whole retransmit queue. But with high packet losses and/or high
reordering, hints no longer work.

Sender has to process thousands of unfriendly SACK, accumulating
a huge socket backlog, burning a cpu and massively dropping packets.

Using an rb-tree for retransmit queue has been avoided for years
because it added complexity and overhead, but now is the time
to be more resistant and say no to quadratic behavior.

1) RTX queue is no longer part of the write queue : already sent skbs
are stored in one rb-tree.

2) Since reaching the head of write queue no longer needs
sk->sk_send_head, we added an union of sk_send_head and tcp_rtx_queue

Tested:

 On receiver :
 netem on ingress : delay 150ms 200us loss 1
 GRO disabled to force stress and SACK storms.

for f in `seq 1 10`
do
 ./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT|tail -1
done | awk '{print $0} {sum += $0} END {printf "%7u\n",sum}'

Before patch :

323.87
351.48
339.59
338.62
306.72
204.07
304.93
291.88
202.47
176.88
   2840

After patch:

1700.83
2207.98
2070.17
1544.26
2114.76
2124.89
1693.14
1080.91
2216.82
1299.94
  18053

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 00:28:54 +01:00
..
6lowpan
9p net/9p: switch p9_fd_read to kernel_write 2017-09-04 19:05:16 -04:00
802 net: introduce __skb_put_[zero, data, u8] 2017-06-20 13:30:14 -04:00
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
appletalk networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
atm net: atm: make atmdev_ops const 2017-08-09 22:43:50 -07:00
ax25 net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t 2017-07-04 22:35:19 +01:00
batman-adv This cleanup patchset includes the following patches: 2017-10-06 10:12:52 -07:00
bluetooth Revert "Bluetooth: Add option for disabling legacy ioctl interfaces" 2017-09-28 13:20:32 -07:00
bpf bpf: add meta pointer for direct access 2017-09-26 13:36:44 -07:00
bridge net: bridge: Pass extack to down to netdev_master_upper_dev_link 2017-10-04 21:39:34 -07:00
caif net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
can rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
ceph libceph: don't allow bidirectional swap of pg-upmap-items 2017-09-19 20:34:29 +02:00
core Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
dcb rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
dccp net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() 2017-08-31 11:43:47 -07:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2017-09-03 17:08:42 -07:00
dns_resolver
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
ethernet networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
hsr net/hsr: Check skb_put_padto() return value 2017-08-22 13:40:23 -07:00
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-05 20:03:35 -07:00
ife
ipv4 tcp: implement rb-tree based retransmit queue 2017-10-07 00:28:54 +01:00
ipv6 net/ipv6: Convert icmpv6_push_pending_frames to void 2017-10-06 09:52:31 -07:00
ipx net, ipx: convert ipx_route.refcnt from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
iucv iucv: Convert sk_wmem_alloc accesses to refcount_t. 2017-07-03 02:31:22 -07:00
kcm kcm: Remove redundant unlikely() 2017-09-26 09:54:06 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-15 20:23:23 -07:00
l2tp l2tp: fix l2tp_eth module loading 2017-10-01 22:35:07 -07:00
l3mdev
lapb net, lapb: convert lapb_cb.refcnt from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
llc net, llc: convert llc_sap.refcnt from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
mac80211 mac80211: fix deadlock in driver-managed RX BA session start 2017-09-06 15:22:02 +02:00
mac802154 mac802154: Fix MAC header and payload encrypted 2017-09-20 13:37:16 +02:00
mpls rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
ncsi net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references 2017-09-05 09:11:45 -07:00
netfilter netfilter: ipset: ipset list may return wrong member count for set with timeout 2017-09-18 17:35:32 +02:00
netlabel
netlink netlink: do not proceed if dump's start() errs 2017-09-30 16:13:31 +01:00
netrom net, netrom: convert nr_node.refcount from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
nfc net: nfc: llcp_core: use setup_timer() helper. 2017-09-25 13:19:20 -07:00
nsh nsh: add GSO support 2017-08-29 15:16:52 -07:00
openvswitch net: Add extack to upper device linking 2017-10-04 21:39:33 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
phonet rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
psample networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
qrtr rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
rds RDS: IB: Initialize max_items based on underlying device attributes 2017-10-05 21:16:33 -07:00
rfkill net: rfkill: gpio: Switch to devm_acpi_dev_add_driver_gpios() 2017-06-13 11:07:51 +02:00
rose
rxrpc rxrpc: Make service connection lookup always check for retry 2017-09-05 14:39:17 -07:00
sched net: add rb_to_skb() and other rb tree helpers 2017-10-07 00:28:53 +01:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-23 10:16:53 -07:00
strparser strparser: initialize all callbacks 2017-08-24 21:57:50 -07:00
sunrpc IB: Correct MR length field to be 64-bit 2017-09-25 11:47:23 -04:00
switchdev net: switchdev: Remove bridge bypass support from switchdev 2017-08-07 14:48:48 -07:00
tipc tipc: use only positive error codes in messages 2017-10-01 04:03:35 +01:00
tls tls: make tls_sw_free_resources static 2017-09-14 09:55:21 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-21 17:06:42 -07:00
vmw_vsock VSOCK: add sock_diag interface 2017-10-05 18:44:17 -07:00
wimax
wireless nl80211: fix null-ptr dereference on invalid mesh configuration 2017-09-18 22:51:07 +02:00
x25 X25: constify null_x25_address 2017-08-03 09:13:51 -07:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-01 17:42:05 -07:00
compat.c net: compat: assert the size of cmsg copied in is as expected 2017-09-20 15:36:18 -07:00
Kconfig net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. 2017-09-04 13:25:20 +02:00
Makefile nsh: add GSO support 2017-08-29 15:16:52 -07:00
socket.c net: fixes for skb_send_sock 2017-08-16 11:27:52 -07:00
sysctl_net.c