linux-hardened/drivers/net
Jacob Keller 671889e674 i40e: avoid race condition when sending filters to firmware for addition
Refactor how we add new filters to firmware to avoid a race condition
that can occur due to removing filters from the hash temporarily.

To understand the race condition, suppose that you have a number of MAC
filters, but have not yet added any VLANs. Now, add two VLANs in rapid
succession. A possible resulting flow would look something like the
following:

(1) lock hash for add VLAN
(2) add the new MAC/VLAN combos for each current MAC filter
(3) unlock hash
(4) lock hash for filter sync
(5) notice that we have a VLAN, so prepare to update all MAC filters
    with VLAN=-1 to be VLAN=0.
(6) move NEW and REMOVE filters to temporary list
(7) unlock hash
(8) lock hash for add VLAN
(9) add new MAC/VLAN combos. Notice that no MAC filters are currently in
    the hash list, so we don't add any VLANs <--- BUG!
(10) unlock hash
(11) sync the temporary lists to firmware
(12) lock hash for post-sync
(13) move the temporary elements back to the main list
....

Because we take filters out of the main hash into temporary lists, we
introduce a narrow window where it is possible that other callers to the
list will not see some of the filters which were previously added but
have not yet been finalized. This results in sometimes dropping VLAN
additions, and could also result in failing to add a MAC address on the
newly added VLAN.

One obvious way to avoid this race condition would be to lock the entire
firmware process. Unfortunately this does not work because adminq
firmware commands take a mutex which results in a sleep while atomic
BUG(). So, we can't use the simplest approach.

An alternative approach is to simply not remove the filters from the
hash list while adding. Instead, add an i40e_new_mac_filter structure
which we will use to track added filters. This avoids the need to remove
the filter from the hash list. We'll store a pointer to the original
i40e_mac_filter, along with our own copy of the state.

We won't update the state directly, so as to avoid race with other code
that may modify the state while under the lock. We are safe to read
f->macaddr and f->vlan since these only change in two locations. The
first is on filter creation, which must have already occurred. The
second is inside i40e_correct_vlan_filters which was previously run
after creation of this object and can't be run again until after. Thus,
we should be safe to read the MAC address and VLAN while outside the
lock.

We also aren't going to run into a use-after-free issue because the only
place where we free filters is when they are marked FAILED or when we
remove them inside the sync subtask. Since the subtask has its own
critical flag to prevent duplicate runs, we know this won't happen. We
also know that the only location to transition a filter from NEW to
FAILED is inside the subtask also, so we aren't worried about that
either.

Use the wrapper i40e_new_mac_filter for additions, and once we've
finalized the addition to firmware, we will update the filter state
inside a lock, and then free the wrapper structure.

In order to avoid a possible race condition with filter deletion, we
won't update the original filter state unless it is still
I40E_FILTER_NEW when we finish the firmware sync.

This approach is more complex, but avoids race conditions related to
filters being temporarily removed from the list. We do not need the same
behavior for deletion because we always unconditionally removed the
filters from the list regardless of the firmware status.

Change-Id: I14b74bc2301f8e69433fbe77ebca532db20c5317
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-11 20:39:01 -08:00
..
appletalk net/appletalk: Fix kernel memory disclosure 2017-01-09 16:34:39 -05:00
arcnet
bonding net: remove ndo_neigh_{construct, destroy} from stacked devices 2017-02-06 11:25:57 -05:00
caif
can can: flexcan: switch imx6 and vf610 to timestamp based offloading 2017-02-06 15:13:45 +01:00
cris
dsa net: dsa: remove unnecessary phy*.h includes 2017-02-10 13:51:04 -05:00
ethernet i40e: avoid race condition when sending filters to firmware for addition 2017-02-11 20:39:01 -08:00
fddi fddi: skfp: Use more common logging styles 2016-12-29 11:37:14 -05:00
fjes drivers: net: generalize napi_complete_done() 2017-01-30 15:10:42 -05:00
hamradio NET: mkiss: Fix panic 2017-02-10 13:41:13 -05:00
hippi Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hyperv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-07 16:29:30 -05:00
ieee802154 ieee802154: atusb: fix driver to work with older firmware versions 2017-01-12 22:12:43 +01:00
ipvlan ipvtap: IP-VLAN based tap driver 2017-02-11 20:59:41 -05:00
irda net: Remove usage of net_device last_rx member 2017-01-18 17:22:49 -05:00
phy Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
plip
ppp net: make ndo_get_stats64 a void function 2017-01-08 17:51:44 -05:00
slip net: make ndo_get_stats64 a void function 2017-01-08 17:51:44 -05:00
team net: remove ndo_neigh_{construct, destroy} from stacked devices 2017-02-06 11:25:57 -05:00
usb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
vmxnet3 drivers: net: generalize napi_complete_done() 2017-01-30 15:10:42 -05:00
wan net: wan: slic_ds26522: Remove .owner field for driver 2017-02-07 11:41:15 -05:00
wimax
wireless Some more updates: 2017-02-10 14:31:51 -05:00
xen-netback drivers: net: generalize napi_complete_done() 2017-01-30 15:10:42 -05:00
dummy.c net: dummy: Introduce dummy virtual functions 2017-01-24 14:07:22 -05:00
eql.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
geneve.c geneve: avoid use-after-free of skb->data 2016-12-02 14:07:11 -05:00
gtp.c gtp: let userspace handle packets for invalid tunnels 2017-01-29 13:54:00 -05:00
ifb.c net-tc: convert tc_from to tc_from_ingress and tc_redirected 2017-01-08 20:58:52 -05:00
Kconfig ipvtap: IP-VLAN based tap driver 2017-02-11 20:59:41 -05:00
LICENSE.SRC
loopback.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
macsec.c net: make ndo_get_stats64 a void function 2017-01-08 17:51:44 -05:00
macvlan.c tap: Abstract type of virtual interface from tap implementation 2017-02-11 20:59:41 -05:00
macvtap.c tap: tap as an independent module 2017-02-11 20:59:41 -05:00
Makefile ipvtap: IP-VLAN based tap driver 2017-02-11 20:59:41 -05:00
mdio.c net: mdio: add mdio45_ethtool_ksettings_get 2017-01-02 16:59:10 -05:00
mii.c
netconsole.c
nlmon.c net: make ndo_get_stats64 a void function 2017-01-08 17:51:44 -05:00
ntb_netdev.c
rionet.c
sb1000.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
Space.c
sungem_phy.c
tap.c tap: tap as an independent module 2017-02-11 20:59:41 -05:00
tun.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-07 16:29:30 -05:00
veth.c net: make ndo_get_stats64 a void function 2017-01-08 17:51:44 -05:00
virtio_net.c virtio_net: XDP support for adjust_head 2017-02-07 10:05:12 -05:00
vrf.c net: rename dst_neigh_output back to neigh_output 2017-02-11 21:25:18 -05:00
vxlan.c vxlan: remove vni zero check and drop for COLLECT_METADATA 2017-02-11 21:25:50 -05:00
xen-netfront.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00