linux-hardened/net
David Ahern 58956317c8 neighbor: Improve garbage collection
The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago gc kicks in
   walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is over kill and contributes to trashing
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:03:10 -08:00
..
6lowpan
9p Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-03 10:35:52 -07:00
802
8021q net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
appletalk
atm Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
ax25 ax25: remove blank line at EOF 2018-07-24 14:10:42 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
bluetooth net: core: dev: Add extack argument to dev_open() 2018-12-06 13:26:06 -08:00
bpf bpf: add tests for direct packet access from CGROUP_SKB 2018-10-19 13:49:34 -07:00
bpfilter net: bpfilter: Set user mode helper's command line 2018-10-22 19:37:36 -07:00
bridge bridge: Add br_fdb_clear_offload() 2018-12-07 12:59:08 -08:00
caif Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
can can: raw: check for CAN FD capable netdev in raw_sendmsg() 2018-11-09 17:19:34 +01:00
ceph libceph: fall back to sendmsg for slab pages 2018-11-19 17:59:47 +01:00
core neighbor: Improve garbage collection 2018-12-07 16:03:10 -08:00
dcb net: dcb: Add priority-to-DSCP map getters 2018-07-27 13:17:50 -07:00
dccp net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
decnet net/decnet: add missing indentation 2018-11-16 19:42:49 -08:00
dns_resolver dns: Allow the dns resolver to retrieve a server set 2018-10-04 09:40:52 -07:00
dsa net: dsa: Set the master device's MTU to account for DSA overheads 2018-12-06 12:18:17 -08:00
ethernet net: ethernet: provide nvmem_get_mac_address() 2018-12-03 15:40:30 -08:00
hsr
ieee802154 net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net 2018-09-21 19:45:52 -07:00
ife
ipv4 net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
ipv6 net: core: dev: Add extack argument to dev_open() 2018-12-06 13:26:06 -08:00
iucv iucv: Remove SKB list assumptions. 2018-11-10 16:55:11 -08:00
kcm Revert "kcm: remove any offset before parsing messages" 2018-09-17 18:43:42 -07:00
key Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-07-27 09:33:37 -07:00
l2tp l2tp: fix a sock refcnt leak in l2tp_tunnel_register 2018-11-14 22:49:31 -08:00
l3mdev l3mdev: add function to retreive upper master 2018-12-03 14:15:26 -08:00
lapb
llc llc: do not use sk_eat_skb() 2018-10-22 19:59:20 -07:00
mac80211 mac80211: implement ieee80211_tx_rate_update to update rate 2018-10-12 13:05:40 +02:00
mac802154 mac802154: Remove VLA usage of skcipher 2018-09-28 12:46:07 +08:00
mpls net/mpls: Handle kernel side filtering of route dumps 2018-10-16 00:14:07 -07:00
ncsi net/ncsi: Add NCSI Mellanox OEM command 2018-11-27 16:37:20 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-28 22:10:54 -08:00
netlabel netlabel: check for IPV4MASK in addrinfo_get 2018-09-21 18:58:34 -07:00
netlink netlink: Add answer_flags to netlink_callback 2018-10-16 00:13:12 -07:00
netrom
nfc Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-10-24 14:43:41 +01:00
nsh
openvswitch net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
packet packet: copy user buffers before orphan or clone 2018-11-23 11:08:03 -08:00
phonet
psample
qrtr
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-12 21:38:46 -07:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-04 21:33:03 -07:00
rose
rxrpc rxrpc: Fix life check 2018-11-15 11:35:40 -08:00
sched net: netem: use a list in addition to rbtree 2018-12-05 20:18:41 -08:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-28 22:10:54 -08:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-24 17:01:43 -08:00
strparser bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
switchdev switchdev: Replace port obj add/del SDO with a notification 2018-11-23 18:02:24 -08:00
tipc tipc: fix node keep alive interval calculation 2018-12-05 20:52:31 -08:00
tls bpf: helper to pop data from messages 2018-11-28 22:07:57 +01:00
unix Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
vmw_vsock vsock: split dwork to avoid reinitializations 2018-08-07 12:39:13 -07:00
wimax wimax: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
wireless nl80211: Add per peer statistics to compute FCS error rate 2018-10-12 12:56:34 +02:00
x25 x25: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
xfrm Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-03 18:25:17 -07:00
compat.c y2038: socket: Change recvmmsg to use __kernel_timespec 2018-08-29 15:42:24 +02:00
Kconfig bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
Makefile
socket.c socket: do a generic_file_splice_read when proto_ops has no splice_read 2018-11-17 21:34:11 -08:00
sysctl_net.c