linux-hardened/net
David S. Miller 5e9965c15b Merge branch 'kill_rtcache'
The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former of which is
controllable by external entitites.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~%10.

The general flow of this patch series is that first the routing cache
is removed.  We build a completely new rtable entry every lookup
request.

Next we make some simplifications due to the fact that removing the
routing cache causes several members of struct rtable to become no
longer necessary.

Then we need to make some amends such that we can legally cache
pre-constructed routes in the FIB nexthops.  Firstly, we need to
invalidate routes which are hit with nexthop exceptions.  Secondly we
have to change the semantics of rt->rt_gateway such that zero means
that the destination is on-link and non-zero otherwise.

Now that the preparations are ready, we start caching precomputed
routes in the FIB nexthops.  Output and input routes need different
kinds of care when determining if we can legally do such caching or
not.  The details are in the commit log messages for those changes.

The patch series then winds down with some more struct rtable
simplifications and other tidy ups that remove unnecessary overhead.

On a SPARC-T3 output route lookups are ~876 cycles.  Input route
lookups are ~1169 cycles with rpfilter disabled, and about ~1468
cycles with rpfilter enabled.

These measurements were taken with the kbench_mod test module in the
net_test_tools GIT tree:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git

That GIT tree also includes a udpflood tester tool and stresses
route lookups on packet output.

For example, on the same SPARC-T3 system we can run:

	time ./udpflood -l 10000000 10.2.2.11

with routing cache:
real    1m21.955s       user    0m6.530s        sys     1m15.390s

without routing cache:
real    1m31.678s       user    0m6.520s        sys     1m25.140s

Performance undoubtedly can easily be improved further.

For example fib_table_lookup() performs a lot of excessive
computations with all the masking and shifting, some of it
conditionalized to deal with edge cases.

Also, Eric's no-ref optimization for input route lookups can be
re-instated for the FIB nexthop caching code path.  I would be really
pleased if someone would work on that.

In fact anyone suitable motivated can just fire up perf on the loading
of the test net_test_tools benchmark kernel module.  I spend much of
my time going:

bash# perf record insmod ./kbench_mod.ko dst=172.30.42.22 src=74.128.0.1 iif=2
bash# perf report

Thanks to helpful feedback from Joe Perches, Eric Dumazet, Ben
Hutchings, and others.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22 17:04:15 -07:00
..
9p net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
802 tokenring: delete all remaining driver support 2012-05-15 20:23:16 -04:00
8021q netpoll: move np->dev and np->dev_name init into __netpoll_setup() 2012-07-17 09:02:36 -07:00
appletalk net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
atm net: Remove casts to same type 2012-06-04 11:45:11 -04:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-19 11:17:30 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-10 23:56:33 -07:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
bridge net: fix race condition in several drivers when reading stats 2012-07-22 12:12:32 -07:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-19 11:17:30 -07:00
can can: gw: Remove pointless casts 2012-07-10 22:36:17 +02:00
ceph net: Fix non-kernel-doc comments with kernel-doc start marker 2012-07-10 23:13:45 -07:00
core Merge branch 'kill_rtcache' 2012-07-22 17:04:15 -07:00
dcb net: Fix non-kernel-doc comments with kernel-doc start marker 2012-07-10 23:13:45 -07:00
dccp ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code. 2012-07-20 13:36:54 -07:00
decnet net: Document dst->obsolete better. 2012-07-20 13:31:21 -07:00
dns_resolver Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-05-21 20:27:36 -07:00
dsa dsa: Convert compare_ether_addr to ether_addr_equal 2012-05-09 20:49:19 -04:00
ethernet ipx: move peII functions 2012-07-19 10:48:00 -07:00
ieee802154 6lowpan: Change byte order when storing/accessing to len field 2012-07-16 22:52:02 -07:00
ipv4 Merge branch 'kill_rtcache' 2012-07-22 17:04:15 -07:00
ipv6 net: Document dst->obsolete better. 2012-07-20 13:31:21 -07:00
ipx ipx: move peII functions 2012-07-19 10:48:00 -07:00
irda irda: Fix typo in irda 2012-07-16 23:23:52 -07:00
iucv net: remove skb_orphan_try() 2012-06-15 15:30:15 -07:00
key net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
l2tp net: l2tp_eth: provide tx_dropped counter 2012-06-29 00:52:32 -07:00
lapb lapb: Neaten debugging 2012-05-17 18:45:20 -04:00
llc net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
mac802154 mac802154: sparse warnings: make symbols static 2012-07-12 07:54:45 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-19 11:17:30 -07:00
netlabel netlabel: use GFP flags from caller instead of GFP_ATOMIC 2012-03-22 19:29:57 -04:00
netlink net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
netrom net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
nfc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
openvswitch Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch 2012-07-20 16:16:34 -07:00
packet net: added support for 40GbE link. 2012-06-27 15:42:24 -07:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
rds net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
rfkill rfkill: Add the capability to switch all devices of all type in __rfkill_switch_all(). 2012-06-06 15:18:17 -04:00
rose net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-10 23:56:33 -07:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-19 11:17:30 -07:00
sctp Merge branch 'kill_rtcache' 2012-07-22 17:04:15 -07:00
sunrpc ipv6: add ipv6_addr_hash() helper 2012-07-18 11:28:46 -07:00
tipc tipc: remove print_buf and deprecated log buffer code 2012-07-13 19:34:43 -04:00
unix net: make sock diag per-namespace 2012-07-16 22:31:34 -07:00
wanrouter net/wanrouter: Deprecate and schedule for removal 2012-05-24 16:22:53 -04:00
wimax net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
x25 net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
xfrm net: Document dst->obsolete better. 2012-07-20 13:31:21 -07:00
compat.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-05-21 20:27:36 -07:00
Kconfig net: drop NET dependency from HAVE_BPF_JIT 2012-05-21 12:50:12 -07:00
Makefile econet: remove ancient bug ridden protocol 2012-05-18 01:35:08 -04:00
nonet.c
socket.c net: netprio_cgroup: rework update socket logic 2012-07-22 12:44:01 -07:00
sysctl_net.c net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00