Commit graph

2964 commits

Author SHA1 Message Date
Or Gerlitz
abac7b011d mlxsw: spectrum: Add tos to the ipv4 acl block
Add ecn and dscp fields to the ipv4 acl block.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:18:23 -07:00
Or Gerlitz
80d0fe4710 mlxsw: acl: Add ip tos acl element
Define new element for ip tos (ecn, dscp) and place it into scratch area.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:18:23 -07:00
Or Gerlitz
fcbca8217d mlxsw: spectrum_flower: Add support for ip ttl
Support offloading rules that match on ip ttl.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:18:23 -07:00
Or Gerlitz
046759a6cf mlxsw: spectrum: Add ttl to the ipv4 acl block
Add ttl field to the ipv4 acl block.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:18:23 -07:00
Or Gerlitz
5f57e09091 mlxsw: acl: Add ip ttl acl element
Define new element for ip ttl and place it into scratch area.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:18:23 -07:00
Zhu Yanjun
e36fef66f4 mlx4_en: remove unnecessary returned value check
The function __mlx4_zone_remove_one_entry always returns zero. So
it is not necessary to check it.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-15 14:29:49 -07:00
Ido Schimmel
6f497930af mlxsw: spectrum_switchdev: Check status of memory allocation
We can't rely on kzalloc() always succeeding, so check its return value.

Suppresses the following smatch error:

mlxsw_sp_switchdev_event() error: potential null dereference
'switchdev_work->fdb_info.addr'.  (kzalloc returns
 null)

Fixes: af06137892 ("mlxsw: spectrum_switchdev: Add support for learning FDB through notification")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 08:15:52 -07:00
Ido Schimmel
a9265b804d mlxsw: spectrum_switchdev: Remove unused variable
Commit 10e23eb299 ("mlxsw: spectrum: Remove support for bypass bridge
port attributes/vlan set") removed statements that used 'bridge_vlan',
but didn't remove the variable itself resulting in the following warning
with W=1:

warning: variable ‘bridge_vlan’ set but not used
[-Wunused-but-set-variable]

Remove the variable and suppress the warning.

Fixes: 10e23eb299 ("mlxsw: spectrum: Remove support for bypass bridge port attributes/vlan set")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 08:15:52 -07:00
Ido Schimmel
7387dbbcdb mlxsw: spectrum_router: Fix use-after-free in route replace
While working on IPv6 route replace I realized we can have a
use-after-free in IPv4 in case the replaced route is offloaded and the
only one using its FIB info.

The problem is that fib_table_insert() drops the reference on the FIB
info of the replaced routes which is eventually freed via call_rcu().
Since the driver doesn't hold a reference on this FIB info it can cause
a use-after-free when it tries to clear the RTNH_F_OFFLOAD flag stored
in fi->fib_flags.

After running the following commands in a loop for enough time with a
KASAN enabled kernel I finally got the below trace.

$ ip route add 192.168.50.0/24 via 192.168.200.1 dev enp3s0np3
$ ip route replace 192.168.50.0/24 dev enp3s0np5
$ ip route del 192.168.50.0/24 dev enp3s0np5

BUG: KASAN: use-after-free in mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
Read of size 4 at addr ffff8803717d9820 by task kworker/u4:2/55
[...]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_router_neighs_update_work+0x1cd0/0x1ce0 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
__asan_load4+0x61/0x80
mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
mlxsw_sp_fib_entry_offload_refresh+0xb6/0x370 [mlxsw_spectrum]
mlxsw_sp_router_fib_event_work+0xd1c/0x2780 [mlxsw_spectrum]
[...]
Freed by task 5131:
 save_stack_trace+0x16/0x20
 save_stack+0x46/0xd0
 kasan_slab_free+0x70/0xc0
 kfree+0x144/0x570
 free_fib_info_rcu+0x2e7/0x410
 rcu_process_callbacks+0x4f8/0xe30
 __do_softirq+0x1d3/0x9e2

Fix this by taking a reference on the FIB info when creating the nexthop
group it represents and drop it when the group is destroyed.

Fixes: 599cf8f95f ("mlxsw: spectrum_router: Add support for route replace")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 08:15:52 -07:00
Ido Schimmel
a4e75b76b2 mlxsw: spectrum_router: Add missing rollback
With this patch the error path of mlxsw_sp_nexthop_init() is symmetric
with mlxsw_sp_nexthop_fini(). Noticed during code review.

Fixes: a8c9701427 ("mlxsw: spectrum_router: Refactor nexthop init routine")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 08:15:51 -07:00
Arnd Bergmann
de92cd6cf4 net/mlx5: IPSec, fix 64-bit division correctly
The new IPSec offload code introduced a build error:

drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.o: In function `mlx5e_ipsec_build_inverse_table':
ipsec_rxtx.c:(.text+0x556): undefined reference

Another patch was added on top to fix the build error, but
that introduced a new bug, as we now use the remainder of
the division rather than the result.

This makes it use the correct helper function instead.

Fixes: 5dfd87b67c ("net/mlx5: IPSec, Fix 64-bit division on 32-bit builds")
Fixes: 2ac9cfe782 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-10 19:34:00 +01:00
Huy Nguyen
d968f0f2e4 net/mlx5e: Initialize CEE's getpermhwaddr address buffer to 0xff
Latest change in open-lldp code uses bytes 6-11 of perm_addr buffer
as the Ethernet source address for the host TLV packet.
Since our driver does not fill these bytes, they stay at zero and
the open-lldp code ends up sending the TLV packet with zero source
address and the switch drops this packet.

The fix is to initialize these bytes to 0xff. The open-lldp code
considers 0xff:ff:ff:ff:ff:ff as the invalid address and falls back to
use the host's mac address as the Ethernet source address.

Fixes: 3a6a931dfb ("net/mlx5e: Support DCBX CEE API")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-06 15:13:20 +03:00
Ilan Tayari
fb000f7817 net/mlx5: Add Makefiles for subdirectories
Currently it is not possible to build just one .o file inside
a subdirectory, because the subdirectories lack a Makefile.

Add a Makefile to the mlx5 subdirectories.

Fixes: e29341fb3a ("net/mlx5: FPGA, Add basic support for Innova")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-06 15:13:20 +03:00
Ilan Tayari
111a676367 net/mlx5: Build wq.o even if MLX5_CORE_EN is not selected
Both the ethernet and FPGA portions of MLX5 now require the wq functions,
and we get a link error when CONFIG_MLX5_CORE_EN is disabled:

drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.o: In function `mlx5_fpga_conn_create_cq':
conn.c:(.text+0x10b3): undefined reference to `mlx5_cqwq_create'
conn.c:(.text+0x10c6): undefined reference to `mlx5_cqwq_get_size'
conn.c:(.text+0x12bc): undefined reference to `mlx5_cqwq_destroy'

Build wq.o even if MLX5_CORE_EN is not selected.

Fixes: 537a505741 ("net/mlx5: FPGA, Add high-speed connection routines")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-06 15:13:20 +03:00
Ilan Tayari
2a41d15b79 net/mlx5: FPGA, Fix datatype mismatch
Fix warnings when building with -Wall:
drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c:313:36: warning: cast to restricted __be32
drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c:314:37: warning: cast to restricted __be32

Fixes: bebb23e6cb ("net/mlx5: Accel, Add IPSec acceleration interface")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-06 15:13:20 +03:00
Ilan Tayari
c8af01692e net/mlx5: FPGA, make mlx5_fpga_device_brb static
Fix warning when building with -Wall:
drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c:105:5: warning: symbol 'mlx5_fpga_device_brb' was not declared. Should it be static?

Fixes: c43051d72a ("net/mlx5: FPGA, Add SBU bypass and reset flows")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-06 15:13:20 +03:00
Ilan Tayari
5dfd87b67c net/mlx5: IPSec, Fix 64-bit division on 32-bit builds
Fix warnings when building 386 kernel:
>> ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!

Fixes: 2ac9cfe782 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-06 15:13:19 +03:00
Ilan Tayari
aa07b63384 net/mlx5: Add missing include in lib/gid.c
Fix warnings when building with -Wall:
drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c:38:6: warning: symbol 'mlx5_init_reserved_gids' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c:47:6: warning: symbol 'mlx5_cleanup_reserved_gids' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c:55:5: warning: symbol 'mlx5_core_reserve_gids' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c:79:6: warning: symbol 'mlx5_core_unreserve_gids' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c:92:5: warning: symbol 'mlx5_core_reserved_gid_alloc' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c:109:6: warning: symbol 'mlx5_core_reserved_gid_free' was not declared. Should it be static?

Fixes: 52ec462eca ("net/mlx5: Add reserved-gids support")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-06 15:13:19 +03:00
David S. Miller
3a3f7d130e Merge https://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some overlapping changes in the mlx5 driver.

A merge conflict resolution posted by Stephen Rothwell was used as a
guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 03:42:10 -07:00
Zhu Yanjun
3b68067bd2 mlx4_en: make mlx4_log_num_mgm_entry_size static
The variable mlx4_log_num_mgm_entry_size is only called in main.c.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:41:26 -07:00
Or Gerlitz
c1c1d86bde net/mlxfw: Properly handle dependancy with non-loadable mlx5
If mlx5 is set to be built-in and mlxfw as a module, we
get a link error:

drivers/built-in.o: In function `mlx5_firmware_flash':
(.text+0x5aed72): undefined reference to `mlxfw_firmware_flash'

Since we don't want to mandate selecting mlxfw for mlx5 users, we
use the IS_REACHABLE macro to make sure that a stub is exposed
to the caller.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:32:25 -07:00
Stephen Rothwell
6992c6c5dd net/mlx5: fix memcpy limit?
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:57:27 -07:00
Colin Ian King
4120dab095 net/mlx5: fix spelling mistake: "Allodating" -> "Allocating"
Trivial fix to spelling mistake in mlx5_core_dbg debug message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 14:36:43 -07:00
David S. Miller
ea23b42739 mlx5-fixes-2017-06-28
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZU3mtAAoJEEg/ir3gV/o+4k0IAKj5XCn3cviZlXMJRMHBvamt
 yWrMI90XgjoPhGPx3K9mf+bMhHOGiZR0Q2DFDJZa5U64DDBVPNvag7fy74GYgj1D
 Cet1zohkQ2xdb/R3jfML8tG2IVfvETWo3cgJGFtGUBlOULvpwinSK4A+8oUUGszc
 K1vAY0j3+Ncfjk+CZJ8hWqaIk1dyYtjtyn0ACOUOftqBa6+UZY7LbLTTOI7hOZoX
 3M35W7ntgGoBScONlxpDUXNUewia4ADTiQPWwHdT9+xNlwz1fzmCHlYi5pY+z9TC
 PKbbe1O4l1nsMftwqJVQNHrFnq+x/X69J5vlgobWkk0dQCRQWE9qanG8BfXPykY=
 =DUG8
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2017-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-06-28

This series contains some fixes for the mlx5 core and netdev driver.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Fix TX carrier errors report in get stats ndo") Kernels >= v4.7

("net/mlx5: Cancel delayed recovery work when unloading the driver") Kernels >= v4.10
* When applied to net-next this will introduce a contextual conflict, it
should be easy to resolve, (a spin_lock was changed to spin_lock_irqsave in net-next),
if you need any help with this please let me know.

("net/mlx5: Fix driver load error flow when firmware is stuck") Kernels >= v4.4*
* This patch fixes: 6c780a0267 ("net/mlx5: Wait for FW readiness before initializing command interface")
which was submitted two weeks ago and queued up for v4.4.

Sorry about the mess, but other than the above, this series doesn't introduce
any conflict with the current mlx5 IPSec offload series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 14:11:48 -07:00
David S. Miller
b079115937 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30 12:43:08 -04:00
Inbar Karmy
ec327f7a43 net/mlx4_en: Do not allocate redundant TX queues when TC is disabled
Currently the number of TX queues that are allocated doesn't depend
on the number of TCs, the module always loads with max num of UP
per channel.
In order to prevent the allocation of unnecessary memory, the
module will load with minimum number of UPs per channel, and the
user will be able to control the number of TX queues per channel
by changing the number of TC to 8 using the tc command.
The variable num_up will hold the information about the current
number of UPs.
Due to the change, needed to remove the lines that set the value of
UP to be different than zero in the func "mlx4_en_select_queue",
since now the num of TX queues that are allocated is only one per channel
in default.
In order not to force the UP to be zero in case of only one TC, added
a condition before forcing it in the func "mlx4_en_fill_qp_context".

Tested:
After the module is loaded with minimum number of UP per channel, to
increase num of TCs to 8, use:
tc qdisc add dev ens8 root mqprio num_tc 8
In order to decrease the number of TCs to minimum number of UP per channel,
use:
tc qdisc del dev ens8 root

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Tarick Bedeir <tarick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 15:56:15 -04:00
Inbar Karmy
f21ad61424 net/mlx4_en: Add dynamic variable to hold the number of user priorities (UP)
Until this patch, the number of UPs was hard coded for eight.
Replace this with a variable in struct "mlx4_en_port_profile".
Currently, the variable will hold the maximum number of UP,
as before.
The patch creates an infrastructure to add an option for dynamic
change of the actual number of TCs.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Tarick Bedeir <tarick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 15:56:15 -04:00
Ido Schimmel
6b27c8adf2 mlxsw: spectrum_router: Fix NULL pointer dereference
In case a VLAN device is enslaved to a bridge we shouldn't create a
router interface (RIF) for it when it's configured with an IP address.
This is already handled by the driver for other types of netdevs, such
as physical ports and LAG devices.

If this IP address is then removed and the interface is subsequently
unlinked from the bridge, a NULL pointer dereference can happen, as the
original 802.1d FID was replaced with an rFID which was then deleted.

To reproduce:
$ ip link set dev enp3s0np9 up
$ ip link add name enp3s0np9.111 link enp3s0np9 type vlan id 111
$ ip link set dev enp3s0np9.111 up
$ ip link add name br0 type bridge
$ ip link set dev br0 up
$ ip link set enp3s0np9.111 master br0
$ ip address add dev enp3s0np9.111 192.168.0.1/24
$ ip address del dev enp3s0np9.111 192.168.0.1/24
$ ip link set dev enp3s0np9.111 nomaster

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Petr Machata <petrm@mellanox.com>
Tested-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 12:59:48 -04:00
David S. Miller
5185ad616b mlx5-updates-2017-06-27 (Innova IPsec offload support)
This patchset adds support for Innova IPSec network interface card.
 
 About Innova device:
 --------------------
 Innova is a network card with a ConnectX chip and an FPGA chip as a
  bump-on-the-wire.
 
                Internal
 +----------+   Link       +-----------------+
 |          +--------------+      FPGA       |  +------+
 | ConnectX |              |  Shell          +--+ QSFP |
 |          +--------------+    +-------+    |  | Port |
 +----------+      I2C     |    |  SBU  |    |  +------+
                           |    +-------+    |
                           +--+----------+---+
                              |          |
                           +--+--+   +---+---+
                           | DDR |   | Flash |
                           +-----+   +-------+
 
 The FPGA synthesized logic is loaded from dedicated flash storage and has
  access to its own dedicated DDR RAM.
 The ConnectX chip firmware programs the FPGA by accessing its configuration
 space over either the slow internal I2C link or the high-speed internal link.
 
 The FPGA logic is divided into a "Shell" and a "Sandbox Unit" (SBU).
 mlx5_core driver (with CONFIG_MLX5_FPGA) handles all shell functionality,
 while other components may handle the various SBU functionalities.
 
 The driver opens high-speed reliable communication channels with the shell and
 the SBU over the internal link.
 These channels may be used for high-bandwidth configuration or for SBU-specific
 out-of-band data paths.
 
 About Innova IPSec device:
 --------------------------
 Innova IPSec is a network card that allows offloading IPSec cryptography operations
 from the host CPU to the NIC. It is an Innova card with an IPSec SBU.
 The hardware keeps the database of IPSec Security Associations (SADB) in the FPGA's
 DDR memory.
 
                Internal
 +----------+   Link       +-----------------+
 |          +--------------+      FPGA       |  +------+
 | ConnectX |              |  Shell          +--+ QSFP |
 |          +--------------+    +-------+    |  | Port |
 +----------+ Internal I2C |    | IPSec |    |  +------+
                           |    |  SBU  |    |
                           |    +-------+    |
                           +--+----------+---+
                              |          |
                           +--+--+   +---+---+
                           | DDR |   |       |
                           |     |   | Flash |
                           |SADB |   |       |
                           +-----+   +-------+
 
 Modes and ciphers:
 Currently the following modes and ciphers are supported:
 IPv4 and IPv6
 ESP tunnel and transport modes
 AES 128 and 256 bit encryption, with GCM authentication (RFC4106)
 
 IV is generated using seqiv, in sync with Linux's geniv.
 
 More modes and ciphers may be added later.
 
 Notes:
 In the future similar functionality will be included in a single-chip NIC.
 
 About the driver:
 -----------------
 Patches 1-4 prepare some existing driver code for the new feature:
   * Add support for reserved GIDs in the hardware GID table
   * Allow multiple modules to enable hardware RoCE support independently
 Patches 5-6 define structs and helper functions for QP work-queues.
 Patches 7-11 add various FPGA-related features required for Innova.
 IPSec.
 Patch 12 adds abstraction layer for Mellanox IPSec-offload capable devices.
 atches 13-16 add IPSec offload support to the mlx5 netdevice.
 
 This driver services the new IPSec offload API introduced in commit
 d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
 
 Configuration Path:
 If Innova IPSec device is detected, the mlx5e netdevice gets the new
 NETIF_F_HW_ESP feature and the xdo callbacks, indicating ESP offload
 capabilities, and also the matching TX checksum and GSO features.
 
 The driver configures offloaded Security Associations (SAs) by sending
 an ADD_SA or DEL_SA message to the IPSec SBU, which updates the SADB in DDR.
 These messages and their responses are sent over a high-speed channel.
 Counters for ethtool are retrieved by the driver from the SBU.
 
 Data path:
 On receive path, the SBU decrypts ESP packets which match the offloaded SADB,
 but keeps them encapsulated.
 The SBU injects metadata (Mellanox owned ethertype) indicating that crypto-offload
 has taken place, the SA with which it was done, and the authentication result.
 
 The ConnectX chip performs RX checksum offload on the packet, and RSS using the
 ESP SPI value.  The driver detects the special ethertype, and attaches a struct
 secpath to the RX SKB, including flags to indicate that crypto offload took place,
 the authentication result, and which xfrm_state was used for decryption, in the
 olen and ovec members. The RX SKB may have useful CHECKSUM_COMPLETE. A separate
 patchset will add support for that in the xfrm stack.
 
 On transmit path, the stack encapsulates the packet but does not encrypt it, and
 indicates in the SKB's secpath that crypto offload is to be performed and the SA
 to use to do so.
 The driver avoids performing crypto-offload for ESP fragments, and packets with
 IP options, as the SBU cannot currently do that.  For eligible packets, the driver
 prepends a special ethertype with metadata instructing the hardware to perform crypto offload.
 The stack builds regular (non-GSO) SKBs so that they contain a placeholder for the ESP trailer.
 The driver trims it off, because the SBU automatically appends the trailer for offloaded packets.
 The ConnectX chip performs TX checksum offload on inner UDP or TCP packets,
 and GSO for TCP packets (duplicating the prepended metadata).
 The segmented packets then undergo encryption in the SBU before going on the wire.
 
 Performance:
 We measure single stream of TCP on Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
 Using AES-NI with ESP GSO we get constant 4.1 Gbps.
 Using crypto offload we get constant 18 Gbps.
 
 Note that these numbers require CHECKSUM_COMPLETE support in XFRM, which we submit separately.
 
 -  Ilan Tayari
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZUmf1AAoJEEg/ir3gV/o+ukIIALp/5+E1W0cC9xvY1X9dTETW
 cKsHvDJ7G1CxUy18W8Mf9z+WOqC6hGCqS+yicOb+umfIqkTcLHDb2irlqprYLC+F
 oYl1HqgHTaiAYByqL90qiyPcFbfsaNIqA9KOsED2qdZ1yxjoYBiJnSDZDAdO/0lN
 Lt1czNswFc5ovnEUGn8bkjLZZH2pJoJWEI4g4hN9cq33BLLq8A795F/ZjwCJTQ1X
 qXdKcEmktBrgZiSiTVFxxpQVhO/uB0HmzaZzrY1k1P5e6yhHEr422mcOcF9KcSL4
 aeyRYHjoIh51vPMbScPjvfbO/PwooU3LWLlxLVNLG0MmkSaGyJeUXg/wHsGI910=
 =JN0A
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-06-27 (Innova IPsec offload support)

This patchset adds support for Innova IPSec network interface card.

About Innova device:
--------------------
Innova is a network card with a ConnectX chip and an FPGA chip as a
 bump-on-the-wire.

               Internal
+----------+   Link       +-----------------+
|          +--------------+      FPGA       |  +------+
| ConnectX |              |  Shell          +--+ QSFP |
|          +--------------+    +-------+    |  | Port |
+----------+      I2C     |    |  SBU  |    |  +------+
                          |    +-------+    |
                          +--+----------+---+
                             |          |
                          +--+--+   +---+---+
                          | DDR |   | Flash |
                          +-----+   +-------+

The FPGA synthesized logic is loaded from dedicated flash storage and has
 access to its own dedicated DDR RAM.
The ConnectX chip firmware programs the FPGA by accessing its configuration
space over either the slow internal I2C link or the high-speed internal link.

The FPGA logic is divided into a "Shell" and a "Sandbox Unit" (SBU).
mlx5_core driver (with CONFIG_MLX5_FPGA) handles all shell functionality,
while other components may handle the various SBU functionalities.

The driver opens high-speed reliable communication channels with the shell and
the SBU over the internal link.
These channels may be used for high-bandwidth configuration or for SBU-specific
out-of-band data paths.

About Innova IPSec device:
--------------------------
Innova IPSec is a network card that allows offloading IPSec cryptography operations
from the host CPU to the NIC. It is an Innova card with an IPSec SBU.
The hardware keeps the database of IPSec Security Associations (SADB) in the FPGA's
DDR memory.

               Internal
+----------+   Link       +-----------------+
|          +--------------+      FPGA       |  +------+
| ConnectX |              |  Shell          +--+ QSFP |
|          +--------------+    +-------+    |  | Port |
+----------+ Internal I2C |    | IPSec |    |  +------+
                          |    |  SBU  |    |
                          |    +-------+    |
                          +--+----------+---+
                             |          |
                          +--+--+   +---+---+
                          | DDR |   |       |
                          |     |   | Flash |
                          |SADB |   |       |
                          +-----+   +-------+

Modes and ciphers:
Currently the following modes and ciphers are supported:
IPv4 and IPv6
ESP tunnel and transport modes
AES 128 and 256 bit encryption, with GCM authentication (RFC4106)

IV is generated using seqiv, in sync with Linux's geniv.

More modes and ciphers may be added later.

Notes:
In the future similar functionality will be included in a single-chip NIC.

About the driver:
-----------------
Patches 1-4 prepare some existing driver code for the new feature:
  * Add support for reserved GIDs in the hardware GID table
  * Allow multiple modules to enable hardware RoCE support independently
Patches 5-6 define structs and helper functions for QP work-queues.
Patches 7-11 add various FPGA-related features required for Innova.
IPSec.
Patch 12 adds abstraction layer for Mellanox IPSec-offload capable devices.
atches 13-16 add IPSec offload support to the mlx5 netdevice.

This driver services the new IPSec offload API introduced in commit
d77e38e612 ("xfrm: Add an IPsec hardware offloading API")

Configuration Path:
If Innova IPSec device is detected, the mlx5e netdevice gets the new
NETIF_F_HW_ESP feature and the xdo callbacks, indicating ESP offload
capabilities, and also the matching TX checksum and GSO features.

The driver configures offloaded Security Associations (SAs) by sending
an ADD_SA or DEL_SA message to the IPSec SBU, which updates the SADB in DDR.
These messages and their responses are sent over a high-speed channel.
Counters for ethtool are retrieved by the driver from the SBU.

Data path:
On receive path, the SBU decrypts ESP packets which match the offloaded SADB,
but keeps them encapsulated.
The SBU injects metadata (Mellanox owned ethertype) indicating that crypto-offload
has taken place, the SA with which it was done, and the authentication result.

The ConnectX chip performs RX checksum offload on the packet, and RSS using the
ESP SPI value.  The driver detects the special ethertype, and attaches a struct
secpath to the RX SKB, including flags to indicate that crypto offload took place,
the authentication result, and which xfrm_state was used for decryption, in the
olen and ovec members. The RX SKB may have useful CHECKSUM_COMPLETE. A separate
patchset will add support for that in the xfrm stack.

On transmit path, the stack encapsulates the packet but does not encrypt it, and
indicates in the SKB's secpath that crypto offload is to be performed and the SA
to use to do so.
The driver avoids performing crypto-offload for ESP fragments, and packets with
IP options, as the SBU cannot currently do that.  For eligible packets, the driver
prepends a special ethertype with metadata instructing the hardware to perform crypto offload.
The stack builds regular (non-GSO) SKBs so that they contain a placeholder for the ESP trailer.
The driver trims it off, because the SBU automatically appends the trailer for offloaded packets.
The ConnectX chip performs TX checksum offload on inner UDP or TCP packets,
and GSO for TCP packets (duplicating the prepended metadata).
The segmented packets then undergo encryption in the SBU before going on the wire.

Performance:
We measure single stream of TCP on Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
Using AES-NI with ESP GSO we get constant 4.1 Gbps.
Using crypto offload we get constant 18 Gbps.

Note that these numbers require CHECKSUM_COMPLETE support in XFRM, which we submit separately.

-  Ilan Tayari
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 12:30:16 -04:00
Colin Ian King
46ccf725bf net/mlx4: fix spelling mistake: "enforcment" -> "enforcement"
Trivial fix to spelling mistake in mlx4_dbg debug message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 12:25:01 -04:00
Ilan Tayari
164f16f702 net/mlx5e: IPSec, Add IPSec ethtool stats
Add Innova IPSec SBU counters to the ethtool -S stats.
Add IPSec offload error counters to the ethtool -S stats.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:48 +03:00
Ilan Tayari
2ac9cfe782 net/mlx5e: IPSec, Add Innova IPSec offload TX data path
In the TX data path, prepend a special metadata ethertype which
instructs the hardware to perform cryptography.

In addition, fill Software-Parser segment in TX descriptor so
that the hardware may parse the ESP protocol, and perform TX
checksum offload on the inner payload.

Support GSO, by providing the inverse of gso_size in the metadata.
This allows the FPGA to update the ESP header (seqno and seqiv) on the
resulting packets, by calculating the packet number within the GSO
back from the TCP sequence number.

Note that for GSO SKBs, the stack does not include an ESP trailer,
unlike the non-GSO case.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:48 +03:00
Ilan Tayari
899a59d301 net/mlx5e: IPSec, Add Innova IPSec offload RX data path
In RX data path, the hardware prepends a special metadata ethertype
which indicates that the packet underwent decryption, and the result of
the authentication check.

Communicate this to the stack in skb->sp.

Make wqe_size large enough to account for the injected metadata.

Support only Linked-list RQ type.

IPSec offload RX packets may have useful CHECKSUM_COMPLETE information,
which the stack may not be able to use yet.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
547eede070 net/mlx5e: IPSec, Innova IPSec offload infrastructure
Add Innova IPSec ESP crypto offload configuration paths.
Detect Innova IPSec device and set the NETIF_F_HW_ESP flag.
Configure Security Associations using the API introduced in a previous
patch.

Add Software-parser hardware descriptor layout
Software-Parser (swp) is a hardware feature in ConnectX which allows the
host software to specify protocol header offsets in the TX path, thus
overriding the hardware parser.
This is useful for protocols that the ASIC may not be able to parse on
its own.

Note that due to inline metadata, XDP is not supported in Innova IPSec.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
bebb23e6cb net/mlx5: Accel, Add IPSec acceleration interface
Add routines for manipulating the hardware IPSec SA database (SADB).

In Innova IPSec, a Security Association (SA) is added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Add implementation for Innova IPSec (FPGA-based) hardware.

These routines will be used by the IPSec offload support in a later patch
However they may also be used by others such as RDMA and RoCE IPSec.

mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
to work directly with mlx5_core rather than Innova FPGA or other mlx5
acceleration providers.

In this patchset we add Innova IPSec support and mlx5/accel delegates
IPSec offloads to Innova routines.

In the future, when IPSec/TLS or any other acceleration gets integrated
into ConnectX chip, mlx5/accel layer will provide the integrated
acceleration, rather than the Innova one.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
a9956d35d1 net/mlx5: FPGA, Add SBU infrastructure
Add interface to initialize and interact with Innova FPGA SBU
connections.
A client driver may use these functions to set up a high-speed DMA
connection with its SBU hardware logic, and send/receive messages
over this connection.

A later patch in this patchset will make use of these functions for
Innova IPSec offload in mlx5 Ethernet driver.

Add commands to retrieve Innova FPGA SBU capabilities, and to
read/write Innova FPGA configuration space registers and memory,
over internal I2C.

At high level, the FPGA configuration space is divided such:
 0x00000000 - 0x007fffff is reserved for the SBU
 0x00800000 - 0xffffffff is reserved for the Shell
0x400000000 - ...        is DDR memory

A later patchset will add support for accessing FPGA CrSpace and memory
over a high-speed connection. This is the reason for the ACCESS_TYPE
enumeration, which currently only supports I2C.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
c43051d72a net/mlx5: FPGA, Add SBU bypass and reset flows
The Innova FPGA includes shell hardware and Sandbox-Unit (SBU) hardware.
The shell hardware is handled by mlx5_core itself, while the SBU is
handled by a client driver.

Reset the SBU to a well-known initial state when initializing a new
device, and set the FPGA to bypass mode when uninitializing a device.
This allows the client driver to assume that its device has been
reset when a new device is detected.

During SBU reset, the FPGA is put into SBU-bypass mode. In this mode
packets do not pass through the SBU, so it cannot affect the network
data stream at all.

A factory-image does not have an SBU, so skip these flows.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
537a505741 net/mlx5: FPGA, Add high-speed connection routines
An FPGA high-speed connection has two endpoints, an FPGA QP and a
ConnectX QP.
Add library routines to create and connect the endpoints of an
FPGA high-speed connection.

These routines allow creating and interacting with both types of
connections: Shell and Sandbox Unit (SBU).

Shell connection provides an interface to the FPGA's address space,
which includes the configuration space and the DDR.
Use of the shell connection will be introduced in a later patchset.

SBU connection provides a command and/or data interface to the
application-specific logic within the FPGA.
Use of the SBU connection will be introduced in a later patch in
this patchset.

Some struct definitions are added to a new header file sdk.h, which
will be extended in later patches in the patchset.
This header file will contain the in-kernel FPGA client driver API.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
6062118d5c net/mlx5: FPGA, Add FW commands for FPGA QPs
The FPGA QP is a high-bandwidth communication channel between the host
CPU and the FPGA device. It allows performing DMA operations between
host memory and the FPGA logic via the ConnectX chip.

Add ConnectX FW commands which create and manipulate FPGA QPs.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
9410733c44 net/mlx5: FPGA, Move FPGA init/cleanup to init_once
The FPGA init and cleanup routines should be called just once per
device.
Move them to the init_once and cleanup_once routines.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
3f2b7edd7c net/mlx5: Add QP WQ support
A QP in ConnectX is a concatenation of RQ and SQ which share a QP-number
and work together.
Add support for allocating and managing the work-queue buffer for a QP, in
a similar way to how SQs and RQs are already supported.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
4b67379376 net/mlx5: Make get_cqe routine not ethernet-specific
Move mlx5e_get_cqe routine to wq.h and rename it to
mlx5_cqwq_get_cqe.

This allows it to be used by other CQ users outside of the
ethernet driver code.

A later patch in this patchset will make use of it from
FPGA code for the FPGA high-speed connection.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
a6f7d2aff6 net/mlx5: Add support for multiple RoCE enable
Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it as
well.
Add support for counting number of enables, so that FPGA and IB can work
in parallel and independently.
Program the HW to enable RoCE on the first enable call, and program to
disable RoCE on the last disable call.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
52ec462eca net/mlx5: Add reserved-gids support
Reserved GIDs are entries in the GID table in use by the mlx5_core
and its submodules (e.g. FPGA, SRIOV, E-Swtich, netdev).
The entries are reserved at the high indexes of the GID table.

A mlx5 submodule may reserve a certain amount of GIDs for its own use
during the load sequence by calling mlx5_core_reserve_gids, and must
also take care to un-reserve these GIDs when it closes.
Reservation is only allowed during the load sequence and before any
interfaces (e.g. mlx5_ib or mlx5_en) are up.

After reservation, a submodule may call mlx5_core_reserved_gid_alloc/
free to allocate entries from the reserved GIDs pool.

Reserve a GID table entry for every supported FPGA QP.

A later patch in the patchset will remove them from being reported to
IB core.
Another such patch will make use of these for FPGA QPs in Innova NIC.

Added lib/mlx5.h to serve as a library for mlx5 submodlues, and to
expose only public mlx5 API, more mlx5 library files will be added in
future submissions.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Ilan Tayari
9ade8c7c3c net/mlx5: Set interface flags before cleanup in unload_one
In load_one, the interface flags are changed from down to up,
only after initializing the interfaces.
In unload_one, the flags are changed from up to down before the
interface cleanup.

Change the cleanup order to be opposite to initialization order.

This fixes flag consistency between init and cleanup.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
Gal Pressman
8ff93de766 net/mlx5e: Fix TX carrier errors report in get stats ndo
Symbol error during carrier counter from PPCNT was mistakenly reported as
TX carrier errors in get_stats ndo, although it's an RX counter.

Fixes: 269e6b3af3 ("net/mlx5e: Report additional error statistics in get stats ndo")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 14:49:57 +03:00
Mohamad Haj Yahia
2a0165a034 net/mlx5: Cancel delayed recovery work when unloading the driver
Draining the health workqueue will ignore future health works including
the one that report hardware failure and thus we can't enter error state
Instead cancel the recovery flow and make sure only recovery flow won't
be scheduled.

Fixes: 5e44fca504 ('net/mlx5: Only cancel recovery work when cleaning up device')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 14:49:57 +03:00
Gal Pressman
8ce59b16b4 net/mlx5: Fix driver load error flow when firmware is stuck
When wait for firmware init fails, previous code would mistakenly
return success and cause inconsistency in the driver state.

Fixes: 6c780a0267 ("net/mlx5: Wait for FW readiness before initializing command interface")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 14:49:57 +03:00
Colin Ian King
593814d1be net/mlx4: fix spelling mistake: "coalesing" -> "coalescing"
Trivial fix to spelling mistake in en_dbg debug message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:18:29 -04:00
David S. Miller
d4d0249ae2 mlx5-updates-2017-06-23
This series provides some updates to the mlx5 core and netdevice drivers.
 
 Three patches from Tariq, Introduces page reuse mechanism in non-Striding
 RQ RX datapath, we allow the the RX descriptor to reuse its allocated page
 as much as it could, until the page is fully consumed. RX page reuse
 reduces the stress on page allocator and improves RX performance especially
 with high speeds (100Gb/s).
 
 Next four patches of the series from Or allows to offload tc flower matching
 on ttl/hoplimit and header re-write of hoplimit.
 
 The rest of  the series from Yotam and Or enhances mlx5 to support FW flashing
 through the mlxfw module, in a similar manner done by the mlxsw driver.
 Currently, only ethtool based flashing is implemented, where both Eth and IB ports
 are supported.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZTR2/AAoJEEg/ir3gV/o+HHkH/jA8uoIQgsIKP8+D6ppwChCm
 pcNpNWi5eWjBwRbeDKqTVRtmKEapFf2bjhWaegsybr7oGhzyh+t6nvCEiJVoKLpd
 gmcSKy7nY2PSepuMEg7bqlfj5caS3b0Nlz5sqPdclOXKYDVLytvcelOvr9OnqdAT
 JqxWkJMhEpLSgFniuvIyc1uJq9j6ARW7SYWx/dKp+gTDI2KCwQ3DinsrZ+RPh42F
 rxLFhGkcFuQ5cIr3QIdCM+pbr6LlAf2Fvz1Y791BARZg0XZp149C0c4smieEjU9M
 ya9CWlbSvnsYZcSMXtUO8ETnrtvrEfcEM0MpZo8l4y28eWih/Ib5BiHevQG3wHA=
 =L44M
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-06-23

This series provides some updates to the mlx5 core and netdevice drivers.

Three patches from Tariq, Introduces page reuse mechanism in non-Striding
RQ RX datapath, we allow the the RX descriptor to reuse its allocated page
as much as it could, until the page is fully consumed. RX page reuse
reduces the stress on page allocator and improves RX performance especially
with high speeds (100Gb/s).

Next four patches of the series from Or allows to offload tc flower matching
on ttl/hoplimit and header re-write of hoplimit.

The rest of  the series from Yotam and Or enhances mlx5 to support FW flashing
through the mlxfw module, in a similar manner done by the mlxsw driver.
Currently, only ethtool based flashing is implemented, where both Eth and IB ports
are supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23 14:24:28 -04:00
Myron Stowe
bbad7c2138 net/mlx5e: Use device ID defines
Use Mellanox device ID definitions in the driver's mlx5 ID table so tools
such as 'grep' and 'cscope' can be used to help find correlated material
(such as INTx Masking quirks: d76d2fe05f PCI: Convert Mellanox broken
INTx quirks to be for listed devices only).

No functional change intended.

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 11:04:00 -04:00
Or Gerlitz
e2e086c196 net/mlx5e: IPoIB, Support the flash device ethtool callback
This callback further invokes the mlxfw module to flash the new
firmware file to the device.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:14 +03:00
Or Gerlitz
3ffaabecd1 net/mlx5e: Support the flash device ethtool callback
This callback further invokes the mlxfw module to flash the new
firmware file to the device.

As the firmware flash process takes about 20 seconds and ethtool
takes the rtnl lock during the flash_device callback, we release
the rtnl lock at the beginning of the flash process and take it
again before leaving the callback.

This way, rtnl is not held during the process. To make sure the
device does not get deleted while being flashed, we take a
reference to it before releasing rtnl lock.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Or Gerlitz
62bd22cf32 net/mlx5: Add mlxfw callbacks
Add mlx5 implementation for the ones defined by the mlxfw
shared module to be used while flashing the device firmware.

The callbacks do their job through the MCQI, MCC and MCDA registers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Or Gerlitz
d2ad488b00 net/mlx5: Add helper functions to set/query MCC/MCDA/MCQI registers
To be used by the mlx5 callbacks exposed to the mlxfw module.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Or Gerlitz
c2df61376b mlxfw: Make the module selectable
There are upcoming NIC (mlx5) use-cases where people want to avoid
building the mlxfw module, allow for that. The mlxsw module is
untouched and keeps selecting mlxfw.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Or Gerlitz
0c0316f516 net/mlx5e: Add header re-write offloading of IPv6 hop-limit
For environments where flow-based ipv6 router is offloaded.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Or Gerlitz
a8e4f0c4ce net/mlx5e: Use macro for TC header re-write offload field mapping
Use a macro for the static mapping between the enumeration of field
supported by the firmware for header re-write to the corresponding
network header field. This improves the readability of the code and
doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Or Gerlitz
a8ade55ffd net/mlx5e: Offload TC matching on ip ttl
Enable offloading of TC matching on ip ttl / hop-limit

As matching on ttl is supported only by newer HW brands (ConnectX-5),
we should do capability check before attempting to offload that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Or Gerlitz
1f97a5265f net/mlx5e: Relocate the TC match on ip tos offload code section
The code section for offloading matches on ip tos (L3) should come
before and not after the one that deals with tcp/udp (L4) matches.

Otherwise, we might come up with wrong min-inline requirement, when
one attempts to match on both L3 and L4.

Fixes: fd7da28b28 ('net/mlx5e: Offload TC matching on ip tos / traffic-class')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Tariq Toukan
accd588332 net/mlx5e: Introduce RX Page-Reuse
Introduce a Page-Reuse mechanism in non-Striding RQ RX datapath.

A WQE (RX descriptor) buffer is a page, that in most cases was fully
wasted on a packet that is much smaller, requiring a new page for
the next round.

In this patch, we implement a page-reuse mechanism, that resembles a
`SW Striding RQ`.
We allow the WQE to reuse its allocated page as much as it could,
until the page is fully consumed.  In each round, the WQE is capable
of receiving packet of maximal size (MTU). Yet, upon the reception of
a packet, the WQE knows the actual packet size, and consumes the exact
amount of memory needed to build a linear SKB. Then, it updates the
buffer pointer within the page accordingly, for the next round.

Feature is mutually exclusive with XDP (packet-per-page)
and LRO (session size is a power of two, needs unused page).

Performance tests:
iperf tcp tests show huge gain:

--------------------------------------------
num streams | BW before | BW after | ratio |
          1 |      22.2 |     30.9 | 1.39x |
          8 |      64.2 |     93.6 | 1.46x |
         64 |      56.7 |     91.4 | 1.61x |
--------------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Tariq Toukan
bce2b2bf66 net/mlx5e: Enhance RX SKB headroom logic
In the RX memory scheme of non Striding RQ, we use linear SKBs.
Keeping NET_IP_ALIGN in headroom can improve performance on some archs.
In addition, take this headroom into account when calculating the
LRO WQE size.

These are not needed in Striding RQ as they're done implicitly
within the non-linear SKB allocation.

Fixes: 1bfecfca56 ("net/mlx5e: Build RX SKB on demand")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
Tariq Toukan
78aedd3279 net/mlx5e: Build SKB with exact frag_size
Build the SKB over the receive packet instead of the
whole page. Getting the SKB's linear data and shared_info
closer improves locality.
In addition, this opens up the possibility to make use of
other parts of the page in the downstream page-reuse patch.

Fixes: 1bfecfca56 ("net/mlx5e: Build RX SKB on demand")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-22 14:30:13 +03:00
David S. Miller
3d09198243 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 17:35:22 -04:00
Feras Daoud
1170fbd8ff net/mlx5e: IPoIB, Add ioctl support to IPoIB device driver
Add ioctl support to IPoIB device driver. For now, this
ioctl will support timestamp get and set.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Feras Daoud
3844b07ee4 net/mlx5e: IPoIB, Add PTP support to IPoIB device driver
Enable PTP for IPoIB rdma_netdev and add the ability
to get the time stamping parameters using ethtool.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Erez Shitrit
4ec5cf781b net/mlx5e: IPoIB, Get more TX statistics
Add misses counters (bytes, packet, gso, xmit_more) in TX flow for ipoib
traffic.

Fixes: 58545449b7b ("net/mlx5e: IPoIB, Xmit flow")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Erez Shitrit
807c441597 net/mlx5e: IPoIB, Handle change_mtu
Add the ndo that supports change mtu for IPoIB.
The callback called from the ipoib ULP driver, that gives the ability to
change the SW and HW resources accordingly in the lower driver.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Erez Shitrit
c139dbfddd net/mlx5e: Use hard_mtu as part of the mlx5e_priv struct
The mtu extra space that kept for the HW is specific for each link type,
and it is different in mlx5e and mlx5i modules.
Now it is kept in the priv structures, set by the mlx5e/mlx5i driver
accordingly.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Erez Shitrit
b6dc510fac net/mlx5e: IPoIB, Change parameters default values
Add function that sets the default values for ipoib, setting/clearing
abilities that IPoIB doesn't support, like RQ size in this case.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Erez Shitrit
7ca42c8094 net/mlx5e: Add new profile function update_carrier
Updating the carrier involves specific HW setting, each profile should
use its own function for that.

Both IPoIB and VF representor don't need carrier update function, since
VF representor has only a logical link to VF and IPoIB manages its own
link via ib_core upper layer.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Erez Shitrit
076b0936e5 net/mlx5e: IPoIB, Add ethtool support
Add support for the following:
	"ethtool -S" (statistics).
	"ethtool -i" (driver info).
	"ethtool -g/G" (rings parameters).
	"ethtool -l/L" (channels parameters).
	"ethtool -c/C" (coalesce options).

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Feras Daoud
c66f2091c9 net/mlx5e: Prevent PFC call for non ethernet ports
Port flow control supported only for ethernet ports,
therefore, prevent any call if the port type differs from
MLX5_CAP_PORT_TYPE_ETH.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
Saeed Mahameed
4301ba7b3e net/mlx5e: IPoIB, Move to a separate directory
IPoIB netdevice driver was only introduced in previous kernel release
and it is growing in terms of features and LOC, move it to a separate
directory.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-19 18:40:20 +03:00
David S. Miller
273889e306 mlx5-updates-2017-06-16
This series provide some updates and cleanups for mlx5 core and netdevice
 driver.
 
 From Eli Cohen, add a missing event string.
 From Or Gerlitz, some checkpatch cleanups.
 From Moni, Disalbe HW level LAG when SRIOV is enabled.
 From Tariq, A code reuse cleanup in aRFS flow.
 From Itay Aveksis, Typo fix.
 From Gal Pressman, ethtool statistics updates and "update stats" deferred work optimizations.
 From Majd Dibbiny, Fast unload support on kernel shutdown.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZQv0+AAoJEEg/ir3gV/o+1iQH/15I5Pr9KoCWSTN9aUglRupU
 8HmJhkf7Novaro6WtIybgMGkdoNTrmHgyTEngAkRq5a5Ws/LrC/1wLH+lVMDh+Fx
 /2a5cfPsK483gHWBtAbasBD8SHnsyTIeVnEhuDsevHQNkz3HGuKOgx5ZHF1sdkHU
 bj/QU06LNPKAlMDI/wKod13MB4+AdTFemaJRCCgXFvu/p/EfVvB+TStdOsrxj1kx
 lDIwkCykJSJsg38HoLXt7Z12nWwgHGf2De04RukKeJ6C6KTdKcUu5EYbaL9BSZZT
 jiIayYjRgeXzNhY4R5yLPc0FkecNIgC90YJShUN3nR3PWa+ytaHpfJQPOS4/AW8=
 =Tjmk
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox mlx5 updates and cleanups 2017-06-16

mlx5-updates-2017-06-16

This series provide some updates and cleanups for mlx5 core and netdevice
driver.

From Eli Cohen, add a missing event string.
From Or Gerlitz, some checkpatch cleanups.
From Moni, Disalbe HW level LAG when SRIOV is enabled.
From Tariq, A code reuse cleanup in aRFS flow.
From Itay Aveksis, Typo fix.
From Gal Pressman, ethtool statistics updates and "update stats" deferred work optimizations.
From Majd Dibbiny, Fast unload support on kernel shutdown.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 15:22:42 -04:00
Martin KaFai Lau
821b2e2944 bpf: mlx5e: Report bpf_prog ID during XDP_QUERY_PROG
Add support to mlx5e to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:58:36 -04:00
Martin KaFai Lau
2e37e9b0f5 bpf: mlx4: Report bpf_prog ID during XDP_QUERY_PROG
Add support to mlx4 to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:58:36 -04:00
Johannes Berg
d58ff35122 networking: make skb_push & __skb_push return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:40 -04:00
Johannes Berg
4df864c1d9 networking: make skb_put & friends return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:39 -04:00
Johannes Berg
b080db5853 networking: convert many more places to skb_put_zero()
There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:35 -04:00
Tariq Toukan
4c07c13240 net/mlx4_en: Refactor mlx4_en_free_tx_desc
Some code re-ordering, functionally equivalent.

- The !tx_info->inl check is evaluated anyway in both flows
  (common case/end case). Run it first, this might finish
  the flows earlier.
- dma_unmap calls are identical in both flows, get it out
  of the if block into the common area.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Tariq Toukan
9573e0d39f net/mlx4_en: Replace TXBB_SIZE multiplications with shift operations
Define LOG_TXBB_SIZE, log of TXBB_SIZE, and use it with a shift
operation instead of a multiplication with TXBB_SIZE.
Operations are equivalent as TXBB_SIZE is a power of two.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Tariq Toukan
77788b5bf6 net/mlx4_en: Increase default TX ring size
Increase the default TX ring size (from 512 to 1024) to match
the RX ring size.
This gives the XDP TX ring a better chance to keep up with the
rate of its RX ring in case of a high load of XDP_TX actions.

Tested:
Ethtool counter rx_xdp_tx_full used to increase, after applying this
patch it stopped.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Tariq Toukan
6c78511b05 net/mlx4_en: Poll XDP TX completion queue in RX NAPI
Instead of having their own NAPIs, XDP TX completion queues get
polled within the corresponding RX NAPI.
This prevents any possible race on TX ring prod/cons indices,
between the context that issues the transmits (RX NAPI) and the
context that handles the completions (was previously done in
a separate NAPI).

This also improves performance, as it decreases the number
of NAPIs running on a CPU, saving the overhead of syncing
and switching between the contexts.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 12.0 Mpps | 13.8 Mpps |  15% |
IPv6 | 12.0 Mpps | 13.8 Mpps |  15% |
-------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Tariq Toukan
36ea796498 net/mlx4_en: Improve XDP xmit function
Several performance improvements in XDP TX datapath,
including:
- Ring a single doorbell for XDP TX ring per NAPI budget,
  instead of doing it per a lower threshold (was 8).
  This includes removing the flow of immediate doorbell ringing
  in case of a full TX ring.
- Compiler branch predictor hints.
- Calculate values in compile time rather than in runtime.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 10.3 Mpps | 12.0 Mpps |  17% |
IPv6 | 10.3 Mpps | 12.0 Mpps |  17% |
-------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Tariq Toukan
f28186d6b5 net/mlx4_en: Improve stack xmit function
Several small code and performance improvements in stack TX datapath,
including:
- Compiler branch predictor hints.
- Minimize variables scope.
- Move tx_info non-inline flow handling to a separate function.
- Calculate data_offset in compile time rather than in runtime
  (for !lso_header_size branch).
- Avoid trinary-operator ("?") when value can be preset in a matching
  branch.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Tariq Toukan
cc26a49086 net/mlx4_en: Improve transmit CQ polling
Several small performance improvements in TX CQ polling,
including:
- Compiler branch predictor hints.
- Minimize variables scope.
- More proper check of cq type.
- Use boolean instead of int for a binary indication.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Packet-rate tests for both regular stack and XDP use cases:
No noticeable gain, no degradation.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Tariq Toukan
9bcee89ac4 net/mlx4_en: Improve receive data-path
Several small performance improvements in RX datapath,
including:
- Compiler branch predictor hints.
- Replace a multiplication with a shift operation.
- Minimize variables scope.
- Write-prefetch for packet header.
- Avoid trinary-operator ("?") when value can be preset in a matching
  branch.
- Save a branch by updating RX ring doorbell within
  mlx4_en_refill_rx_buffers(), which now returns void.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON
(enable by ethtool -L <interface> rx 1).

XDP_DROP packet rate:
Same (28.1 Mpps), lower CPU utilization (from ~100% to ~92%).

Drop packets in TC:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 4.14 Mpps | 4.18 Mpps |   1% |
-------------------------------------

XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 10.1 Mpps | 10.3 Mpps |   2% |
IPv6 | 10.1 Mpps | 10.3 Mpps |   2% |
-------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:23 -04:00
Saeed Mahameed
4931c6ef04 net/mlx4_en: Optimized single ring steering
Avoid touching RX QP RSS context when loading with only
one RX ring, to allow optimized A0 RX steering.

Enable by:
- loading mlx4_core with module param: log_num_mgm_entry_size = -6.
- then: ethtool -L <interface> rx 1

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_DROP packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 20.5 Mpps | 28.1 Mpps |  37% |
IPv6 | 18.4 Mpps | 28.1 Mpps |  53% |
-------------------------------------

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:22 -04:00
Tariq Toukan
cf97050d54 net/mlx4_en: Remove unused argument in TX datapath function
Remove owner argument, as it is obsolete and unused.
This also saves the overhead of calculating its value in data-path.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 22:53:22 -04:00
Majd Dibbiny
8812c24d28 net/mlx5: Add fast unload support in shutdown flow
Adding a support to flush all HW resources with one FW command and
skip all the heavy unload flows of the driver on kernel shutdown.
There's no need to free all the SW context since a new fresh kernel
will be loaded afterwards.

Regarding the FW resources, they should be closed, otherwise we will
have leakage in the FW. To accelerate this flow, we execute one command
in the beginning that tells the FW that the driver isn't going to close
any of the FW resources and asks the FW to clean up everything.
Once the commands complete, it's safe to close the PCI resources and
finish the routine.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:19:44 +03:00
Majd Dibbiny
4525abeaae net/mlx5: Expose command polling interface
Add a new interface for commands execution that allows the
caller to wait for the command's completion in a busy-wait
loop (polling mode).

This is useful if we want to execute a command in a polling mode
while the driver is working in events mode for the rest of
the commands.
This interface will be used in the downstream patches.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:19:43 +03:00
Gal Pressman
3834a5e626 net/mlx5e: Optimize update stats work
Unlike ethtool stats, get_stats ndo provides information cached by
update stats work that is running in the background without updating
them explicitly.
We cannot update all counters inside the ndo because some
updates require firmware commands that cannot be performed under a
spinlock.

update_stats work does not need to update ALL counters, since only
some of them are needed by ndo_get_stats.
This patch will allow for a minimal run of update_stats using an extra
parameter which will update necessary counters only and cut 13
firmware commands in each iteration of the work.

Work duration previous to this patch: ~4200us.
Work duration after this patch: ~700us (17% of the original time).

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
2017-06-16 00:19:32 +03:00
Gal Pressman
432609a4cd net/mlx5e: Move and optimize query out of buffer function
Move "query queue counter out of buffer" helper function out of
qp.c to en_main.c, since mlx5e netdev driver is the only one to use it.

Also allocate the output buffer on the stack instead of the heap, to reduce
number of heap allocs on update_stats work.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
2017-06-16 00:19:02 +03:00
Gal Pressman
0883b4f456 net/mlx5e: Reduce number of heap allocated buffers for update stats
Allocating buffers on the heap every 200ms is something we should avoid,
let's use buffers located on the stack instead.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
2017-06-16 00:18:40 +03:00
Gal Pressman
ebc88870da net/mlx5e: Rename physical symbol errors counter
Rename rx_symbol_errors_phy to rx_pcs_symbol_err_phy, in order to
prevent confusion with rx_symbol_err_phy counter.

rx_pcs_symbol_err_phy counter counts the number of symbol errors that
were detected on the PCS (regardless of traffic) and weren't
corrected by FEC correction algorithm or that FEC algorithm was
not active on this interface.

rx_symbol_err_phy refers to errors on packet level (physical error
during a packet receive).

Fixes: 5db0a4f64c ("net/mlx5e: Expose physical layer statistical...")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
2017-06-16 00:18:15 +03:00
Itay Aveksis
3e432ab6d9 net/mlx5e: Fix typo in warning if CQ moderation is not supported
Signed-off-by: Itay Aveksis <itayav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:41 +03:00
Tariq Toukan
22303f790d net/mlx5e: Use function to map aRFS into traffic type
For a better code reuse and readability, use the existing
function arfs_get_tt() to map arfs_type into mlx5e_traffic_types,
instead of duplicating the switch-case logic.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:41 +03:00
Moni Shoua
552db7bca5 net/mlx5: Undo LAG upon request to create virtual functions
LAG cannot work if virtual functions are present. Therefore, if LAG is
configured, the attempt to create virtual functions will fail. This gives
precedence to LAG over SRIOV which is not the desired behavior as users
might want to use the bonding/teaming driver also want to work with SRIOV.
In that case we don't want to force an order of actions, first create
virtual functions and only than configure a bonding/teaming net device.
To fix, if LAG is configured during a request to create virtual
functions, remove it and continue.

We ignore ENODEV when trying to forbid lag. This makes sense
because "No such device" means that lag is forbidden anyway.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:41 +03:00
Or Gerlitz
2fe30e23cd net/mlx5: Avoid space after casting
Fix checkpatch complaints on that:

 CHECK: No space is necessary after a cast

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:41 +03:00
Or Gerlitz
e53eef6350 net/mlx5: Align to match opening parenthesis
Fixed checkpatch complaints of the form:

 CHECK: Alignment should match open parenthesis

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:40 +03:00
Or Gerlitz
8963ca45e7 net/mlx5: Avoid blank lines before/after closing/opening braces
Fixed checkpatch complaints on that:

 CHECK: Blank lines aren't necessary before a close brace '}'
 CHECK: Blank lines aren't necessary after an open brace '{'

and one on missing blank line..

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:40 +03:00
Or Gerlitz
1b9f533a90 net/mlx5: Avoid using multiple blank lines
Fixed bunch of this checkpatch complaint:

 CHECK: Please don't use multiple blank lines

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:40 +03:00
Or Gerlitz
bd10838af2 net/mlx5: Fix some spelling mistakes
Fixed few places where endianness was misspelled and
one spot whwere output was:

CHECK: 'endianess' may be misspelled - perhaps 'endianness'?
CHECK: 'ouput' may be misspelled - perhaps 'output'?

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:40 +03:00
Eli Cohen
14160ea227 net/mlx5: Update eqe_type_str() event names
Add missing NIC_VPORT_CHANGE event.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:40 +03:00
Or Gerlitz
31ac93386d net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it
The error flow of mlx5e_create_netdev calls the cleanup call
of the given profile without checking if it exists, fix that.

Currently the VF reps don't register that callback and we crash
if getting into error -- can be reproduced by the user doing ctrl^C
while attempting to change the sriov mode from legacy to switchdev.

Fixes: 26e59d8077 '(net/mlx5e: Implement mlx5e interface attach/detach callbacks')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Sabrina Dubroca <sdubroca@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-15 23:27:46 +03:00
Or Gerlitz
9cfb4f7192 net/mlx5e: Remove TC header re-write offloading of ip tos
Currently the firmware API is partial and allows to offload only
the dscp part of the tos, also, ipv6 support isn't there yet.

As such, remove the offloading option of ipv4 dscp till the FW
APIs are more comprehensive.

Fixes: d79b6df6b1 ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-15 23:27:46 +03:00
Or Gerlitz
9d1cef196b net/mlx5: Properly check applicability of devlink eswitch commands
Currently we don't check that the link type is Eth and hence crash
on IB ports when attempting to deref esw->xxx, fix that.

To avoid repeating this check over and over, put the existing
checks and the one on link type in a single helper.

Fixes: 7768d1971d ('net/mlx5: E-Switch, Add control for encapsulation')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Mohamad Badarnah <mohamadb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-15 23:27:46 +03:00
Chris Mi
5f195c2c5c net/mlx5e: Fix min inline value for VF rep SQs
The offending commit only changed the code path for PF/VF, but it
didn't take care of VF representors. As a result, since
params->tx_min_inline_mode for VF representors is kzalloced to 0
(MLX5_INLINE_MODE_NONE), all VF reps SQs were set to that mode.

This actually works on CX5 by default but broke CX4. Fix that by
adding a call to query the min inline mode from the VF rep build up code.

Fixes: a6f402e499 ("net/mlx5e: Tx, no inline copy on ConnectX-5")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-15 23:27:46 +03:00
Maor Dickman
f0b381178b net/mlx5e: Fix timestamping capabilities reporting
Misuse of (BIT) macro caused to report wrong flags for
"Hardware Transmit Timestamp Modes" and "Hardware Receive
Filter Modes"

Fixes: ef9814deaf ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-15 23:27:46 +03:00
Eli Cohen
6c780a0267 net/mlx5: Wait for FW readiness before initializing command interface
Before attempting to initialize the command interface we must wait till
the fw_initializing bit is clear.

If we fail to meet this condition the hardware will drop our
configuration, specifically the descriptors page address.  This scenario
can happen when the firmware is still executing an FLR flow and did not
finish yet so the driver needs to wait for that to finish.

Fixes: e3297246c2 ('net/mlx5_core: Wait for FW readiness on startup')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-15 23:27:46 +03:00
David S. Miller
0ddead90b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 11:59:32 -04:00
Dan Carpenter
f5165a5492 net/mlxfw: fix a NULL dereference
If we hit this error path we end up returning ERR_PTR(0) which is NULL.
The caller is not expecting that so it results in a NULL dereference.

Fixes: 410ed13cae ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 15:32:18 -04:00
Arkadi Sharshevsky
2ea109039c mlxsw: spectrum: Add support for access cable info via ethtool
Add support for access cable info via ethtool.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 15:16:30 -04:00
Arkadi Sharshevsky
7ca36994a3 mlxsw: reg: Add MCIA register for cable info access
The MCIA register is used to access the SFP+ and QSFP connector's
EPROM. It will be used to query the cable info.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 15:16:30 -04:00
David S. Miller
66e037ca57 mlx5-updates-2017-06-11
This series provides updates to mlx5 header rewrite feature, from Or Gerlitz.
 and three more small updates From maor and eran.
 
 -------
 Or says:
 
 Packets belonging to flows which are different by matching may still need
 to go through the same header re-writes (e.g set the current routing hop
 MACs and issue TTL decrement).  To minimize the number of modify header
 IDs, we add a cache for header re-write IDs which is keyed by the binary
 chain of modify header actions.
 
 The caching is supported for both eswitch and NIC use-cases, where the
 actual conversion of the code to use caching comes in separate patches,
 one per use-case.
 
 Using a per field mask field, the TC pedit action supports modifying
 partial fields. The last patch enables offloading that.
 -------
 
 From Maor, update flow table commands layout to the latest HW spec.
 From Eran, ethtool connector type reporting updates.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZPUrBAAoJEEg/ir3gV/o+ptQH+gIXfHH1mrJp0ZwM/hhLidYE
 Bj/ie9Y1Ir6q3RU2+g/NLqejtvTIzAyhMfiq4ag4eCVVRuGGjPXRZJWivWXUCbjm
 /XLaXTK62qNxJNAWyzgxEJSUI1URMtQWIf9SF8LMLGiNfZfx8b7o/Q08P18tNxbb
 AeNzmgYetva9lBialWF0dPDaAvd8THngBPF7LYDWghEXbDPvYTfvABN0qUrHs1s2
 /LbmZ7L9U+RDSusz/klYW+/WorNiOm44nwk+KgdnsrVITZVVfblI6VEvgEjprhZF
 c3SFUPYz079WjndNHVWezqsl4gIZIoM9FAtmoiBg+hxTmWJVI9uwllUcarW9s+k=
 =0Cyo
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-06-11

This series provides updates to mlx5 header rewrite feature, from Or Gerlitz.
and three more small updates From maor and eran.

-------
Or says:

Packets belonging to flows which are different by matching may still need
to go through the same header re-writes (e.g set the current routing hop
MACs and issue TTL decrement).  To minimize the number of modify header
IDs, we add a cache for header re-write IDs which is keyed by the binary
chain of modify header actions.

The caching is supported for both eswitch and NIC use-cases, where the
actual conversion of the code to use caching comes in separate patches,
one per use-case.

Using a per field mask field, the TC pedit action supports modifying
partial fields. The last patch enables offloading that.
-------

From Maor, update flow table commands layout to the latest HW spec.
From Eran, ethtool connector type reporting updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-11 18:10:42 -04:00
Majd Dibbiny
91828bd899 net/mlx5: Enable 4K UAR only when page size is bigger than 4K
When the page size isn't bigger than 4K, there is no added value of enabling 4K
UAR feature in the Firmware.

Modified the condition of enabling the 4K UAR accordingly.

Fixes: f502d83495 ("net/mlx5: Activate support for 4K UARs")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-11 13:10:36 +03:00
Tal Gilboa
53acd76ce5 net/mlx5e: Fix wrong indications in DIM due to counter wraparound
DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for
changing the channel interrupt moderation values in order to reduce CPU
overhead for all traffic types.
Each iteration of the algorithm, DIM calculates the difference in
throughput, packet rate and interrupt rate from last iteration in order
to make a decision. DIM relies on counters for each metric. When these
counters get to their type's max value they wraparound. In this case
the delta between 'end' and 'start' samples is negative and when
translated to unsigned integers - very high. This results in a false
indication to the algorithm and might result in a wrong decision.

The fix calculates the 'distance' between 'end' and 'start' samples in a
cyclic way around the relevant type's max value. It can also be viewed as
an absolute value around the type's max value instead of around 0.

Testing show higher stability in DIM profile selection and no wraparound
issues.

Fixes: cb3c7fd4f8 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-11 13:10:36 +03:00
Tal Gilboa
c3164d2fc4 net/mlx5e: Added BW check for DIM decision mechanism
DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for
changing the channel interrupt moderation values in order to reduce CPU
overhead for all traffic types.
Until now only interrupt and packet rate were sampled.
We found a scenario on which we get a false indication since a change in
DIM caused more aggregation and reduced packet rate while increasing BW.

We now regard a change as succesfull iff:
current_BW > (prev_BW + threshold) or
current_BW ~= prev_BW and current_PR > (prev_PR + threshold) or
current_BW ~= prev_BW and current_PR ~= prev_PR and
    current_IR < (prev_IR - threshold)
Where BW = Bandwidth, PR = Packet rate and IR = Interrupt rate

Improvements (ConnectX-4Lx 25GbE, single RX queue, LRO off)
    --------------------------------------------------
    packet size | before[Mb/s] | after[Mb/s] | gain  |
    2B          | 343.4        | 359.4       |  4.5% |
    16B         | 2739.7       | 2814.8      |  2.7% |
    64B         | 9739         | 10185.3     |  4.5% |

Fixes: cb3c7fd4f8 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-11 13:10:36 +03:00
Huy Nguyen
f729860a17 net/mlx5: Remove several module events out of ethtool stats
Remove the following module event counters out of ethtool stats. The
reason for removing these event counters is that these events do not
occur without techinician's intervention.
  module_pwr_budget_exd
  module_long_range
  module_no_eeprom
  module_enforce_part
  module_unknown_id
  module_unknown_status
  module_plug

Fixes: bedb7c909c ("net/mlx5e: Add port module event counters to ethtool stats")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed by: Gal Pressman <galp@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-11 13:10:36 +03:00
Mohamad Haj Yahia
3fece5d676 net/mlx5: Continue health polling until it is explicitly stopped
The issue is that when we get an assert we will stop polling the health
and thus we cant enter error state when we have a real health issue.

Fixes: fd76ee4da5 ('net/mlx5_core: Fix internal error detection conditions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-11 13:10:36 +03:00
Mohamad Haj Yahia
57f35c93a2 net/mlx5: Fix create vport flow table flow
Send vport number to the create flow table inner method instead of
ignoring the vport argument and sending always 0.

Fixes: b3ba51498b ('net/mlx5: Refactor create flow table method to accept underlay QP')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-11 13:10:36 +03:00
Ido Schimmel
2e915e0b68 mlxsw: spectrum: Pass port argument to module mapping functions
Previous patch made it unnecessary to map ports to modules before we
allocate their struct. We can now therefore pass the port struct to
these functions, thereby making them consistent with other functions
that operate on ports.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:33:41 -04:00
Ido Schimmel
5b15385964 mlxsw: spectrum: Simplify port split flow
In commit be94535f95 ("mlxsw: spectrum: Make split flow match firmware
requirements") we had to modify the port split flow to overcome quirks
in the device's firmware. This resulted in asymmetrical code with
regards to port creation and removal.

The problem in the firmware is long gone and since we can now enforce a
minimal firmware version, we can simplify the code and make it symmetric
again.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:33:41 -04:00
Ido Schimmel
d7a60306c6 mlxsw: spectrum_router: Mark only first LPM tree as reserved
In new firmware versions (that we can now enforce via
request_firmware()), only the first LPM tree is reserved and not the
first two as in older versions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:33:40 -04:00
Arkadi Sharshevsky
be7432b952 mlxsw: spectrum: Remove support for bridge bypass FDB add/del
The FDB add/del are now done through the notification chain. The FDBs
are synced with the bridge and there is no need for extra dumping.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:28 -04:00
Arkadi Sharshevsky
af06137892 mlxsw: spectrum_switchdev: Add support for learning FDB through notification
Add support for learning FDB through notification. The driver defers
the hardware update via ordered work queue. Support for stacked devices
is also provided. In case of a successful FDB add a notification is
sent back to bridge.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:27 -04:00
Arkadi Sharshevsky
1b40dc3d86 mlxsw: spectrum_switchdev: Change switchdev notifier API
The current API for sending switchdev notifications implies only FDB
add/del. In order to support notification about successful FDB offload
the API is changed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:27 -04:00
Arkadi Sharshevsky
10e23eb299 mlxsw: spectrum: Remove support for bypass bridge port attributes/vlan set
The bridge port attributes/vlan for mlxsw devices should be set only
from bridge code. The vlans are synced totally with the bridge so
there is no need to special dump support.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:27 -04:00
Arkadi Sharshevsky
c7b566cd2e mlxsw: spectrum_switchdev: Add support for querying supported bridge flags
Add support for querying supported bridge flags.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:26 -04:00
Arkadi Sharshevsky
a989cdb473 mlxsw: spectrum: Remove support for bridge FDB learning sync
Currently the mlxsw driver supports an option for disabling syncing
the hardware learned FDBs with the software bridge. This behavior
breaks the bridge offload model and thus it is removed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:26 -04:00
Arkadi Sharshevsky
6b26b51b1d net: bridge: Add support for notifying devices about FDB add/del
Currently the bridge doesn't notify the underlying devices about new
FDBs learned. The FDB sync is placed on the switchdev notifier chain
because devices may potentially learn FDB that are not directly related
to their ports, for example:

1. Mixed SW/HW bridge - FDBs that point to the ASICs external devices
                        should be offloaded as CPU traps in order to
			perform forwarding in slow path.
2. EVPN - Externally learned FDBs for the vtep device.

Notification is sent only about static FDB add/del. This is done due
to fact that currently this is the only scenario supported by switch
drivers.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:25 -04:00
Jiri Pirko
a5fcf8a6c9 net: propagate tc filter chain index down the ndo_setup_tc call
We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 09:55:53 -04:00
Eran Ben Elisha
46e9d0b61e net/mlx5e: Fill advertised and supported port data from Hardware info
Translate hardware port connector type data into link mode supported and
advertised info instead of caching it in driver.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:12:00 +03:00
Eran Ben Elisha
5b4793f817 net/mlx5e: Add support for reading connector type from PTYS
Read port connector type from the firmware instead of caching it in the
driver metadata.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:12:00 +03:00
Maor Gottlieb
0c90e9c6b5 net/mlx5: Update flow table commands layout
Update struct mlx5_ifc_create(modify)_flow_table_bits according to
the last device specification.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:12:00 +03:00
Or Gerlitz
2b64beba02 net/mlx5e: Support header re-write of partial fields in TC pedit offload
Using a per field mask field, the TC pedit action supports modifying
partial fields. E.g if using the TC tool, the following example would
make the kernel to only re-write two bytes of the src ip address:

tc filter add dev enp1s0 protocol ip parent ffff: prio 30
	flower skip_sw ip_proto udp dst_port 8001
	action pedit ex munge ip src set 10.1.0.0 retain 0xffff0000

We add driver support for offload these partial re-writes, by setting
the per FW action offset-in-field and length-from-offset attributes.

The 1st bit set in the mask specifies both the offset and the right
shift to apply on the value such that the 1st bit which needs to be
set will reside in bit 0 of the FW data field.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:12:00 +03:00
Or Gerlitz
3099eb5a8e net/mlx5e: Use modify header ID cache for offloaded TC NIC flows
Use the modify header ID cache for the header re-write part of offloading
TC NIC flows.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:12:00 +03:00
Or Gerlitz
1a9527bb17 net/mlx5e: Use modify header ID cache for offloaded TC E-Switch flows
Use the modify header ID cache for the header re-write part of offloading
TC eswitch flows.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:12:00 +03:00
Or Gerlitz
11c9c548ce net/mlx5e: Add cache for HW modify header IDs
Packets belonging to flows which are different by matching may still need
to go through the same header re-write. Add a cache for header re-write IDs
keyed by the binary chain of modify header actions.

The caching is supported for both eswitch and NIC use-cases, where the
actual conversion of the code to use caching comes in next patches, one
per use-case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:12:00 +03:00
Or Gerlitz
513f8f7fc0 net/mlx5e: Use short attribute form when adding/deleting offloaded TC flows
Instead of going through flow->nic/esw_attr for each usage, assign
an attr pointer per the context (nic or esw) and use that.

This patch doesn't add any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:11:59 +03:00
Or Gerlitz
de6ea92382 net/mlx5e: Remove limitation of single NIC offloaded TC action per rule
Remove the limitation that offloaded NIC filters can have only one
action. This allows us for example to provide flow tag as a note
to upper layers / apps that that HW header re-write was applied.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08 14:11:59 +03:00
Tariq Toukan
808df6a209 net/mlx4_en: Bump driver version
Remove date and bump version for mlx4_en driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:33:01 -04:00
Tariq Toukan
cea2a6d81d net/mlx4_core: Bump driver version
Remove date and bump version for mlx4_core driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:33:00 -04:00
David S. Miller
216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Jiri Pirko
bd5ddba52d spectrum_flower: Implement gact trap TC action offload
Just use the previously prepared infrastructure and offload the gact
trap action to ACL.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:24 -04:00
Jiri Pirko
df7eea963e acl: Introduce ACL trap action
Use trap/discard flex action to implement trap.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:24 -04:00
Jiri Pirko
0db7b386f5 mlxsw: spectrum: Introduce ACL trap
Introduce an ACL trap and put it into ip2me trap group.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:24 -04:00
Jiri Pirko
be8408e144 mlxsw: pci: Fix size of trap_id field in CQE
The "trap_id" is 9bits long. So far, this was not a problem since we
used only traps with ids that fit into 8bits. But the ACL traps that are
going to be introduced use the 9th bit.

Fixes: eda6500a98 ("mlxsw: Add PCI bus implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:23 -04:00
Colin Ian King
928a759593 net/mlxfw: remove redundant goto on error check
The check to see of err is set and the subsequent goto is extraneous
as the next statement is where the goto is jumping to. Remove this
redundant check and goto.

Detected by CoverityScan, CID#1437734 ("Identical code for
different branches")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:21:29 -04:00
Ido Shamay
269f9883fe net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport
The Granular QoS per VF feature must be enabled in FW before it can be
used.

Thus, the driver cannot modify a QP's qos_vport value (via the UPDATE_QP FW
command) if the feature has not been enabled -- the FW returns an error if
this is attempted.

Fixes: 08068cd568 ("net/mlx4: Added qos_vport QP configuration in VST mode")
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 11:29:54 -04:00
Ido Schimmel
de5ed99e97 mlxsw: spectrum_router: Align RIF index allocation with existing code
The way we usually allocate an index is by letting the allocation
function return an error instead of an invalid index.

Do the same for RIF index.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:49:49 -04:00
Ido Schimmel
da0abcf93f mlxsw: Fix typo inside enumeration
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:49:48 -04:00
Ido Schimmel
cb4cc0e0b1 mlxsw: spectrum: Tidy up header file
Make it clear where functions are defined and move misplaced declaration
to their correct place.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:49:48 -04:00
Yotam Gigi
a4e1ce24f7 mlxsw: spectrum: Rename the firmware file
Change the firmware file name to be in "mellanox" directory.

This commit is a followup to the linux-firmware commit a4c72696f5f4
("Mellanox: Add firmware for mlxsw_spectrum")

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:48:03 -04:00
Talat Batheesh
6dc06c08be net/mlx4: Fix the check in attaching steering rules
Our previous patch (cited below) introduced a regression
for RAW Eth QPs.

Fix it by checking if the QP number provided by user-space
exists, hence allowing steering rules to be added for valid
QPs only.

Fixes: 89c557687a ("net/mlx4_en: Avoid adding steering rules with invalid ring")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:10:05 -04:00
Or Gerlitz
fd7da28b28 net/mlx5e: Offload TC matching on ip tos / traffic-class
Enable offloading of TC matching on ipv4 tos or ipv6 traffic-class.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 18:12:24 -04:00
Or Gerlitz
e77834ec0a net/mlx5e: Offload TC matching on tcp flags
Enable offloading of TC matching on tcp flags.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 18:12:24 -04:00
Ido Schimmel
0266f79778 mlxsw: spectrum: Add bridge dependency for spectrum
When BRIDGE is a loadable module, MLXSW_SPECTRUM mustn't be built-in:

drivers/built-in.o: In function `mlxsw_sp_bridge_device_create':
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:145: undefined reference to `br_vlan_enabled'
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:158: undefined reference to `br_multicast_enabled'
drivers/built-in.o: In function `mlxsw_sp_dev_rif_type':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2972: undefined reference to `br_vlan_enabled'
drivers/built-in.o: In function `mlxsw_sp_inetaddr_vlan_event':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3310: undefined reference to `br_vlan_enabled'

Add Kconfig dependency to enforce usable configurations.

Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 15:06:45 -04:00
Yotam Gigi
ce6ef68f43 mlxsw: spectrum: Implement the ethtool flash_device callback
Add callback to the ethtool flash_device op. This callback uses the mlxfw
module to flash the new firmware file to the device.

As the firmware flash process takes about 20 seconds and ethtool takes the
rtnl lock during the flash_device callback, release the rtnl lock at the
beginning of the flash process and take it again before leaving the
callback. This way, the rtnl is not held during the process. To make sure
the device does not get deleted during the flash process, take a reference
to it before releasing the rtnl lock.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 12:25:41 -04:00
Jakub Kicinski
d897a638e9 sched: add helper for updating statistics on all actions
Forgetting to disable preemption around tcf_action_stats_update()
seems to be a common mistake.  Add a helper function for updating
stats on all actions of a filter.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 17:58:13 -04:00
Arnd Bergmann
9831724a08 net/mlxfw: select CONFIG_XZ_DEC
The new mlxfw code fails to build without the xz library:

drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.o: In function `mlxfw_mfa2_xz_dec_run':
:(.text.mlxfw_mfa2_xz_dec_run+0x8): undefined reference to `xz_dec_run'
drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.o: In function `mlxfw_mfa2_file_component_get':
:(.text.mlxfw_mfa2_file_component_get+0x218): undefined reference to `xz_dec_init'
:(.text.mlxfw_mfa2_file_component_get+0x2c0): undefined reference to `xz_dec_end'

This adds a Kconfig 'select' statement for it, which is also what
the other user of that library has.

Fixes: 410ed13cae ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 12:53:12 -04:00
Arnd Bergmann
f0d7ae95ff net/mlx5: avoid build warning for uniprocessor
Building the driver with CONFIG_SMP disabled results in a harmless
warning:

ethernet/mellanox/mlx5/core/main.c: In function 'mlx5_irq_set_affinity_hint':
ethernet/mellanox/mlx5/core/main.c:615:6: error: unused variable 'irq' [-Werror=unused-variable]

It's better to express the conditional compilation using IS_ENABLED()
here, as that lets the compiler see what the intented use for the variable
is, and that it can be silently discarded.

Fixes: b665d98edc ("net/mlx5: Tolerate irq_set_affinity_hint() failures")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30 14:09:28 -04:00
David S. Miller
34aa83c2fc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net'
restricting a HW workaround alongside cleanups in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 20:46:35 -04:00
Ido Schimmel
e4f3c1c17b mlxsw: spectrum_router: Implement common RIF core
The mlxsw driver currently implements three types of RIFs. VLAN and FID
RIFs for L3 interfaces on top of VLAN-aware and VLAN-unaware bridges
(respectively) and Subport RIFs for all other L3 interfaces.

All the RIF types follow a common configuration procedure, which only
differs in the type-specific bits. The patch exploits this fact and
consolidates the common code paths, thereby simplifying the code and
making it more extensible.

This work also prepares the driver for use with future ASICs, where the
range of the Subport RIFs will be extended and their configuration
modified accordingly. By merely implementing a new RIF operations and
selecting it during initialization, the same driver could be re-used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:49 -04:00
Ido Schimmel
a110748725 mlxsw: spectrum: Implement common FID core
The device supports three types of FIDs. 802.1Q and 802.1D FIDs for
VLAN-aware and VLAN-unaware bridges (respectively) and rFIDs to
transport packets to the router block.

The different users (e.g., bridge, router, ACLs) of the FIDs
infrastructure need not know about the internal FIDs implementation and
can therefore interact with it using a restricted set of exported
functions.

By encapsulating the entire FID logic and hiding it from the rest of the
driver we get a code base that it much simpler and easier to work with
and extend.

For example, in the current Spectrum ASIC only 802.1D FIDs can be
assigned a VNI, but future ASICs will also support 802.1Q FIDs. With
this patch in place, support for future ASICs can be easily added by
implementing a new FID operations according to their capabilities.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:49 -04:00
Ido Schimmel
c9ec53f034 mlxsw: spectrum_router: Determine VR first when creating RIF
All RIF types are associated with a virtual router (VR), so determine VR
first when creating a RIF.

That way, we can more easily integrate the common RIF core in the
following patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:49 -04:00
Ido Schimmel
8e3482d6ad mlxsw: spectrum_router: Flood packets to router after RIF creation
If a packet ingress the router but can't be assigned an ingress RIF,
it's dropped.

Therefore, in the case of RIF configured on top of a bridge, it makes
sense to start flooding broadcast packets to the router only after the
RIF was created.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:48 -04:00
Ido Schimmel
1b8f09a05f mlxsw: spectrum_router: Destroy RIF only based on its struct
Now that all the information to create a RIF is contained within the RIF
struct itself, we can also simplify the destruction logic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:48 -04:00
Ido Schimmel
ab01ae9169 mlxsw: spectrum_router: Configure RIFs based on RIF struct
All the information necessary for the configuration of RIFs can now be
found in the RIF struct itself, so reduce the arguments list.

This gets us one step closer to the common RIF core.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:48 -04:00
Ido Schimmel
4d93ceebf0 mlxsw: spectrum_router: Extend the RIF struct
Currently, when a Subport RIF is configured, the LAG status and VLAN of
the underlying port are read from the port itself. This is problematic,
as we would like to have common code to configure all types of RIFs,
which aren't necessarily bound to a port.

Instead, embed the RIF in a struct specific to the Subport type, which
contains all the necessary information.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:48 -04:00
Ido Schimmel
a13a594da0 mlxsw: spectrum_router: Allocate RIF prior to its configuration
In the following patches the RIF's configuration function is going to
expect a RIF struct with all the necessary information.

Therefore, allocate the RIF just before it's configured to the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:47 -04:00
Ido Schimmel
caa3ddf8e3 mlxsw: spectrum_router: Allocate FID prior to RIF configuration
The following patches are going to re-arrange the FID and RIF code, so
that when the RIF is configured to the device based on the information
present in the RIF struct (which points to a FID).

For this reason, move the FID allocation to just before the RIF
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:47 -04:00
Ido Schimmel
c57529e1d5 mlxsw: spectrum: Replace vPorts with Port-VLAN
As explained in the cover letter, since the introduction of the bridge
offload in the mlxsw driver, information related to the offloaded bridge
and bridge ports was stored in the individual port struct,
mlxsw_sp_port.

This lead to a bloated struct storing both physical properties of the
port (e.g., autoneg status) as well as logical properties of an upper
bridge port (e.g., learning, mrouter indication). While this might work
well for simple devices, it proved to be hard to extend when stacked
devices were taken into account and more advanced use-cases (e.g., IGMP
snooping) considered.

This patch removes the excess information from the above struct and
instead stores it in more appropriate structs that represent the bridge
port, the bridge itself and a VLAN configured on the bridge port.

The membership of a port in a bridge is denoted using the Port-VLAN
struct, which points to the bridge port and also member in the bridge
VLAN group of the VLAN it represents. This allows us to completely
remove the vPort abstraction and consolidate many of the code paths
relating to VLAN-aware and unaware bridges.

Note that the FID / vFID code is currently duplicated, but this will
soon go away when the common FID core will be introduced.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:47 -04:00
Ido Schimmel
ed9ddd3aad mlxsw: spectrum: Don't create FIDs upon creation of VLAN uppers
Up until now we used to create FIDs upon the creation of VLAN uppers on
top of the VLAN-aware bridge. This was done so that in case a router
interface (RIF) was configured on top of the bridge, the FID would
already be there.

Instead, simplify the code and only create the FID upon RIF creation.

This is an intermediary step towards the introduction of the common FID
core, in which this code would be completely removed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:46 -04:00
Ido Schimmel
f0cebd81c9 mlxsw: spectrum: Don't lose bridge port device during enslavement
Currently, when port netdevs (or their uppers) are enslaved to a bridge,
we simply propagate the CHANGEUPPER event all the way down and lose the
context of the actual netdevice used as the bridge port.

This leads to a lot of information hanging off the ports (and vPorts),
which doesn't logically belong there, such as mrouter indication and
unknown unicast flood state.

Following patches are going to put the mlxsw_sp_port struct on diet and
instead introduce a bridge port struct, where the above mentioned
information belongs. But in order to do that, we need to be able to
determine the bridge port netdevice, so propagate it down.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:46 -04:00
Ido Schimmel
7cbecf245a mlxsw: spectrum_router: Replace vPorts with Port-VLAN
We're going to get rid of vPorts completely later in the patchset, but
the router code is self-contained, so it's a good candidate to start the
transition with.

Convert all the functions that expects to operate on a vPort to operate
on a Port-VLAN instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:45 -04:00
Ido Schimmel
ce95e15456 mlxsw: spectrum: Change signature of FID leave function
When a vPort is destroyed, it leaves the FID it's currently mapped to
(if any) and drops the reference. The FID's leave function expects to
get the vPort as its argument, but this will have to change when the
vPort model is retired.

Change the function signature to expect a Port-VLAN struct instead and
patch the call sites accordingly.

The code introduced in this patch will be removed later in the patchset,
but this intermediary step is required in order to ease the code review.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:45 -04:00
Ido Schimmel
31a08a523a mlxsw: spectrum: Introduce Port-VLAN structure
This is the first step in the transition from the vPort model to a
unified Port-VLAN structure. The new structure is defined and created /
destroyed upon invocation of the 8021q ndos, but it's not actually used
throughout the code.

Subsequent patches will initialize it correctly and also create /
destroy it upon switchdev's VLAN object.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:45 -04:00
Ido Schimmel
4aafc368da mlxsw: spectrum: Set port's mode according to FID mappings
We currently transition the port to "Virtual mode" upon the creation of
its first VLAN upper, as we need to classify incoming packets to a FID
using {Port, VID} and not only the VID.

However, it's more appropriate to transition the port to this mode when
the {Port, VID} are actually mapped to a FID. Either during the
enslavement of the VLAN upper to a VLAN-unaware bridge or the
configuration of a router port.

Do this change now in preparation for the introduction of the FID core,
where this operation will be encapsulated.

To prevent regressions, this patch also explicitly configures an OVS
slave to "Virtual mode". Otherwise, a packet that didn't hit an ACL rule
could be classified to an existing FID based on a global VID-to-FID
mapping, thus not incurring a FID mis-classification, which would
otherwise trap the packet to the CPU to be processed by the OVS daemon.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 15:18:45 -04:00
Ido Schimmel
03ea01e9db mlxsw: spectrum_router: Adjust RIF configuration for new firmware versions
In new firmware versions, when configuring a {Port, VID} as a router
interface, the driver is responsible for enabling the STP filter and
disabling learning.  Otherwise, packets are discarded.

This change doesn't break existing firmware versions, but is required
for newer firmware versions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
Yotam Gigi
6b7421992b mlxsw: spectrum: Validate firmware revision on init
Make the spectrum module check the current device firmware version, and if
it is below the supported version, use the libfirmware API to request a
firmware file with the supported firmware version and flash it to the
device using the mlxfw module.

The firmware file names are expected to be of Mellanox Firmware Archive
version 2 (MFA2) format and their name are expected to be in the following
pattern: "mlxsw_spectrum-<major>.<minor>.<sub-minor>.mfa2".

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
Yotam Gigi
c41d007588 mlxsw: core: Create the mlxsw_fw_rev struct
This struct was previously an anonymous struct defined inside the
mlxsw_bus_info struct. Extract it to a struct named mlxsw_fw_rev, as it
will be needed later by the spectrum driver.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
Yotam Gigi
e5e5c88a1f mlxsw: spectrum: Add the needed callbacks for mlxfw integration
The mlxfw module defines several needed callbacks in order to flash the
device's firmware. As the mlxfw module is shared between several different
drivers, those callbacks are the glue functionality that is responsible
for hardware interaction. Add those callbacks using the MCQI, MCC, MCDA
registers.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
Yotam Gigi
4625d59d6d mlxsw: reg: Add Management Component Data Access register
The MCDA register allows reading and writing a firmware component.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
Yotam Gigi
191839de90 mlxsw: reg: Add Management Component Control register
The MCC register allows controlling and querying the firmware flash state
machine (FSM).

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
Yotam Gigi
4f2402d46b mlxsw: reg: Add Management Component Query Information register
The MCQI register queries information about firmware components. It will
be needed by the mlxfw module to query various options about the
components, such as their max size, alignment and max write size.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
Yotam Gigi
410ed13cae Add the mlxfw module for Mellanox firmware flash process
The mlxfw module is in charge of common logic needed to flash Mellanox
devices firmware, which consists of:
 - Parse the Mellanox Firmware Archive version 2 (MFA2) format, which is
   the format used to store the Mellanox firmware. The MFA2 format file can
   hold firmware for many different silicon variants, differentiated by a
   unique ID called PSID. In addition, the MFA2 file data section is
   compressed using xz compression to save both file-system space and
   memory at extraction time.
 - Implement the firmware flash state machine logic, which is a common
   logic for Mellanox products needed to flash the firmware to the device.

As the module is shared between different Mellanox products, it defines a
set of callbacks to be implemented by the specific driver for hardware
interaction.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 17:46:17 -04:00
David S. Miller
abc7a4ef84 mlx5-update-2017-05-23
First patch from Leon, came to remove the redundant usage of mlx5_vzalloc,
 and directly use kvzalloc across all mlx5 drivers.
 
 2nd patch from Noa, adds new device IDs into the supported devices list.
 
 3rd and 4th patches from Ilan are adding the basic infrastructure and
 support for Mellanox's mlx5 FPGA.
 
 Last two patches from Tariq came to modify the outdated driver version
 reported in ethtool and in mlx5_ib to more reflect the current driver state
 and remove the redundant date string reported in the version.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZJBu5AAoJEEg/ir3gV/o+bCAH/Av6h8TcoR6p+NzfJBqfY+6G
 dXfXbxxpAGGZQeC7PDU+3BG/sw8bvQxJ4P23zs4BPw40X78s4+zUknzdnw6ZxvFh
 Kk9C8MbsEG60k1hw9mzSx3cYyYjol/6UdLPxVVJC4dJbYAXJl5ZZfiUf+2J85mic
 3YZpvE4HymyOo2pGRDyi0i3ailr5njhQQ4bBxYLmyY0mbAm/hf/O2oKcQPuwvzTI
 Gwkaj4S207rA1xDO6PCp884P5mhstmL3Cs9JnsQirISn5n/5L51Ezt7/63C1ysab
 2t/JM+MiEMUXk+plnGwtkPUo5/am41e044GmJfMT6Kerq+ZCwdO5/rQ7wc4qJXU=
 =fYgO
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-update-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-update-2017-05-23

First patch from Leon, came to remove the redundant usage of mlx5_vzalloc,
and directly use kvzalloc across all mlx5 drivers.

2nd patch from Noa, adds new device IDs into the supported devices list.

3rd and 4th patches from Ilan are adding the basic infrastructure and
support for Mellanox's mlx5 FPGA.

Last two patches from Tariq came to modify the outdated driver version
reported in ethtool and in mlx5_ib to more reflect the current driver state
and remove the redundant date string reported in the version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 12:01:22 -04:00
Jiri Pirko
8a41d845c4 mlxsw: spectrum_flower: Add support for tcp flags
Allow to offload rules that contain tcp flags within the mask.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24 16:22:11 -04:00
Jiri Pirko
dea2d6457f mlxsw: spectrum: Add acl block containing tcp flags for ipv4
Add acl block called "ipv4" which contains tcp flags.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24 16:22:11 -04:00
Jiri Pirko
b4d39b4b54 mlxsw: acl: Add tcp flags acl element
Define new element for tcp flags and place it into scratch area.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24 16:22:11 -04:00
Tariq Toukan
b665d98edc net/mlx5: Tolerate irq_set_affinity_hint() failures
Add tolerance to failures of irq_set_affinity_hint().
Its role is to give hints that optimizes performance,
and should not block the driver load.

In non-SMP systems, functionality is not available as
there is a single core, and all these calls definitely
fail.  Hence, do not call the function and avoid the
warning prints.

Fixes: db058a186f ("net/mlx5_core: Set irq affinity hints")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:32 +03:00
Mohamad Haj Yahia
73dd3a4839 net/mlx5: Avoid using pending command interface slots
Currently when firmware command gets stuck or it takes long time to
complete, the driver command will get timeout and the command slot is
freed and can be used for new commands, and if the firmware receive new
command on the old busy slot its behavior is unexpected and this could
be harmful.
To fix this when the driver command gets timeout we return failure,
but we don't free the command slot and we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy we will stop processing new firmware
commands.

Fixes: 9cba4ebcf3 ('net/mlx5: Fix potential deadlock in command mode change')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Erez Shitrit
b57fe69196 net/mlx5e: IPoIB, handle RX packet correctly
IPoIB packet contains the pseudo header area, we need to pull it prior
to reset_mac_header in order to let the GRO work well.

In more details:
GRO checks the mac address of the new coming packet, it does that by
comparing the hard_header_len size of the current packet to the previous
one in that session, the comparison is over hard_header_len size.
Now, the driver prepares that area in the skb by allocating area from the
reserved part and resetting the correct mac header to it.

Fixes: 9d6bd752c6 ("net/mlx5e: IPoIB, RX handler")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Or Gerlitz
e3ca4e0583 net/mlx5e: Fix warnings around parsing of TC pedit actions
The sparse tool emits these correct complaints:

drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1005:25: warning: cast to restricted __be32
drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1007:25: warning: cast to restricted __be16

The value is provided from user-space in network order, but there's
no way for them to realize that, avoid the warnings by casting to the
appropriate type.

Fixes: d79b6df6b1 ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Or Gerlitz
d824bf3fe2 net/mlx5e: Properly enforce disallowing of partial field re-write offload
Currently we don't support partial header re-writes through TC pedit
action offloading. However, the code that enforces that wasn't err-ing
on cases where the first and last bits of the mask are set but there is
some zero bit between them, such as in the below example, fix that!

tc filter add dev enp1s0 protocol ip parent ffff: prio 10 flower
	ip_proto udp dst_port 2001 skip_sw
	action pedit munge ip src set 1.0.0.1 retain 0xff0000ff

Fixes: d79b6df6b1 ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Or Gerlitz
26c0274993 net/mlx5e: Allow TC csum offload if applied together with pedit action
When offloading header re-writes, the HW re-calculates the relevant L3/L4
checksums. Hence, when upper layers (as done by OVS) ask for TC checksum action
offload together with pedit offload, don't err. This command now works:

tc filter add dev ens1f0 protocol ip parent ffff: prio 20 flower skip_sw
	ip_proto tcp dst_port 9001
	action pedit ex munge tcp dport set 0x1234 pipe
        action csum tcp

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Or Gerlitz
cdc5a7f363 net/mlx5e: Use the correct delete call on offloaded TC encap entry detach
We wrongly direcly invoke hlist_del_rcu() and not hash_del_rcu() which
does a slightly different call now and may change later, fix that.

Fixes: a54e20b4fc ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Miroslav Lichvar
e341257548 net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid
filter and update drivers which can timestamp all packets, or which
explicitly list unsupported filters instead of using a default case, to
handle the filter.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 13:37:32 -04:00
David S. Miller
c6cd850d65 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-18 16:11:32 -04:00
Wei Yongjun
27902f0806 net/mlx5e: Fix possible memory leak
'encap_header' is malloced and should be freed before leaving from
the error handling cases, otherwise it will cause memory leak.

Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18 15:18:31 -04:00
Ido Schimmel
c0e01eac7a mlxsw: spectrum: Avoid possible NULL pointer dereference
In case we got an FDB notification for a port that doesn't exist we
execute an FDB entry delete to prevent it from re-appearing the next
time we poll for notifications.

If the operation failed we would trigger a NULL pointer dereference as
'mlxsw_sp_port' is NULL.

Fix it by reporting the error using the underlying bus device instead.

Fixes: 12f1501e75 ("mlxsw: spectrum: remove FDB entry in case we get unknown object notification")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18 11:27:21 -04:00
Arkadi Sharshevsky
d5d6add01e mlxsw: spectrum_dpipe: Fix sparse warnings
drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:52: warning:
Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:74: warning:
Using plain integer as NULL pointer

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18 11:05:31 -04:00
Arkadi Sharshevsky
6b1206bbbc mlxsw: spectrum_router: Fix rif counter freeing routine
During rif counter freeing the counter index can be invalid. Add check
of validity before freeing the counter.

Fixes: e0c0afd8aa ("mlxsw: spectrum: Support for counters on router interfaces")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18 11:04:00 -04:00
Arkadi Sharshevsky
6dd4aba36f mlxsw: spectrum_dpipe: Fix incorrect entry index
In case of disabled counters the entry index will be incorrect. Fix this
by moving the entry index set before the counter status check.

Fixes: 2ba5999f00 ("mlxsw: spectrum: Add Support for erif table entries access")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18 11:03:59 -04:00
Ido Schimmel
45a4a16cdb mlxsw: spectrum: Default ports to non-virtual mode
In virtual mode, packets are classified to FIDs based on their ingress
port and VLAN whereas in non-virtual mode only the VLAN is taken into
account.

Currently ports are initialized to use virtual mode due to the presence
of the PVID vPort. However, we're going to transition ports between both
modes based on the FIDs they use and not merely based on the presence on
a VLAN upper. Therefore, during initialization, no mode will be
explicitly set.

Since the Programmer's Reference Manual (PRM) doesn't specify a default,
explicitly set the port to non-virtual mode and later transition the
port between both modes based on the FIDs it uses.

In a follow-up patchset, this step will be moved to the common FID core
where it logically belongs.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
b02eae9b91 mlxsw: spectrum: Move PVID code to appropriate place
PVID is a port attribute and should therefore reside in the main driver
file and not the switchdev specific one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
7cbc4277c7 mlxsw: spectrum_switchdev: Don't batch learning operations
We no longer batch VLAN operations, so there's no need to set the
learning state for a range of VLANs.

Use a common function to set the learning state for a Port-VLAN, thereby
making the code saner more receptive for upcoming changes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
45bfe6b433 mlxsw: spectrum_switchdev: Don't batch STP operations
Simplify the code by using the common function that sets an STP state
for a Port-VLAN and remove the existing one that tries to batch it for
several VLANs.

This will help us in a follow-up patchset to introduce a unified
infrastructure for bridge ports, regardless if the bridge is VLAN-aware
or not.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
fe9ccc785d mlxsw: spectrum_switchdev: Don't batch VLAN operations
switchdev's VLAN object has the ability to describe a range of VLAN IDs,
but this is only used when VLAN operations are done using the SELF flag,
which is something we would like to remove as it allows one to bypass
the bridge driver.

Do VLAN operations on a per-VLAN basis, thereby simplifying the code and
preparing it for refactoring in a follow-up patchset.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
d341e2ce6b mlxsw: spectrum_switchdev: Remove redundant check
Since commit 97c242902c ("switchdev: Execute bridge ndos only for
bridge ports") switchdev code checks that port is bridged, so no need to
perform the same check in the driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
348b8fc3cf mlxsw: spectrum_router: Initialize RIFs in a separate function
The router interfaces (RIFs) array is currently initialized together
with the general router configuration. However, in a follow-up patchset
we're going to introduce a common RIF core that will require us to
initialize more RIF constructs, so move the RIF initialization to its
own function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
7e39d1153d mlxsw: spectrum_router: Move FIB notification block to router struct
The FIB notification block logically belongs inside the router specific
struct, so move it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
5f9efffbdb mlxsw: spectrum_router: Move RIFs array to its rightful place
The router interfaces (RIFs) array is of no interest to code outside the
routing realm, so declare it inside the router specific struct instead
of the chip-wide one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
5f6935c6a4 mlxsw: spectrum_switchdev: Reduce scope of bridge struct
Some attributes in the global chip struct are only relevant for bridge
operation, so encapsulate them in their own struct that isn't exposed to
non-bridge code.

This will also help us later, when we add more bridge-specific
attributes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
9011b677e7 mlxsw: spectrum_router: Reduce scope of router struct
In a similar fashion to previous patch, the router structure
('mlxsw_sp_router') doesn't need to be accessible to anyone, but the
router code located at spectrum_router.c

Make this apparent and reduce its scope by defining it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
33cbd87cc0 mlxsw: spectrum_buffer: Reduce scope of shared buffer struct
The shared buffer structure ('mlxsw_sp_sb') doesn't need to be
accessible to anyone, but the shared buffer code located at
spectrum_buffers.c

Make this apparent and reduce its scope by defining it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Arnd Bergmann
2432a3fb5c mlx5e: add CONFIG_INET dependency
We now reference the arp_tbl, which requires IPv4 support to be
enabled in the kernel, otherwise we get a link error:

drivers/net/built-in.o: In function `mlx5e_tc_update_neigh_used_value':
(.text+0x16afec): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_neigh_init':
en_rep.c:(.text+0x16c16d): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_netevent_event':
en_rep.c:(.text+0x16cbb5): undefined reference to `arp_tbl'

This adds a Kconfig dependency for it.

Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 15:48:01 -04:00
David S. Miller
42a928ced3 mlx5-fixes-2017-05-12
Misc fixes for mlx5 driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZGDM8AAoJEEg/ir3gV/o+svMH/1lAl+FIGCWgZ82/UbFHCZCW
 SIXEnP2id+Nic7JxQSa/RVQ43I75wkM8pN929hFoyz8p/pPLhZkpo7vX6yLWG2SL
 fTx/4Qn5jR/eow/D+fdrlyKMPg0A7fijY+ZnvDPsQkjtakIedCgc/A1xDufpKi7+
 I7nOJa4yACuZK0gzy32VGgpJw02q32eRTJjKHRiEYdmNQSIJpmRbG2m4e0z/me2s
 hrMt358/llPOZNwkAPD2SHZxH68oSq5EbSRmz5jDwXfTVFkjWNQVowwm3pCFjsQn
 3xDtkjakVPVBUagR3hngtLPcsDpUR4GU3tHr4l/UPp0wFBXVKgdupC7B+mCkSp4=
 =ubJG
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2017-05-12-V2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-05-12

This series contains some mlx5 fixes for net.
Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Fix ethtool pause support and advertise reporting") kernels >= 4.8
("net/mlx5e: Use the correct pause values for ethtool advertising") kernels >= 4.8

v1->v2:
 Dropped statistics spinlock patch, it needs some extra work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:38:04 -04:00
yuval.shaia@oracle.com
4762010f09 net/mlx4_core: Use min3 to select number of MSI-X vectors
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:20:26 -04:00
Tariq Toukan
7913d20596 net/mlx5: Bump driver version
Remove date and bump version for mlx5_core driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 14:24:18 +03:00
Ilan Tayari
e29341fb3a net/mlx5: FPGA, Add basic support for Innova
Mellanox Innova is a NIC with ConnectX and an FPGA on the same
board. The FPGA is a bump-on-the-wire and thus affects operation of
the mlx5_core driver on the ConnectX ASIC.

Add basic support for Innova in mlx5_core.

This allows using the Innova card as a regular NIC, by detecting
the FPGA capability bit, and verifying its load state before
initializing ConnectX interfaces.

Also detect FPGA fatal runtime failures and enter error state if
they ever happen.

All new FPGA-related logic is placed in its own subdirectory 'fpga',
which may be built by selecting CONFIG_MLX5_FPGA.
This prepares for further support of various Innova features in later
patchsets.
Additional details about hardware architecture will be provided as
more features get submitted.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 14:24:17 +03:00
Ilan Tayari
0179720d6b net/mlx5: Introduce trigger_health_work function
Introduce new function for entering bad-health state.

This function will be called from FPGA-related logic in a later patch from
asynchronous event (IRQ) context, for that we change the spin lock to an
IRQ-safe one.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 14:22:10 +03:00
Noa Osherovich
2e9d3e83ab net/mlx5: Update the list of the PCI supported devices
Add the BlueField device and VF IDs to the supported devices list.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:53:26 +03:00
Leon Romanovsky
1b9a07ee25 {net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc
Commit a7c3e901a4 ("mm: introduce kv[mz]alloc helpers") added
proper implementation of mlx5_vzalloc function to the MM core.

This made the mlx5_vzalloc function useless, so let's remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:53:18 +03:00
Yishai Hadas
508541146a net/mlx5: Use underlay QPN from the root name space
Root flow table is dynamically changed by the underlying flow steering
layer, and IPoIB/ULPs have no idea what will be the root flow table in
the future, hence we need a dynamic infrastructure to move Underlay QPs
with the root flow table.

Fixes: b3ba51498b ("net/mlx5: Refactor create flow table method to accept underlay QP")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Saeed Mahameed
5360fd473c net/mlx5e: IPoIB, Only support regular RQ for now
IPoIB doesn't support striding RQ at the moment, for this
we need to explicitly choose non striding RQ in IPoIB init,
even if the HW supports it.

Fixes: 8f493ffd88 ("net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Saeed Mahameed
20b6a1c782 net/mlx5e: Fix setup TC ndo
Fail-safe support patches introduced a trivial bug,
setup tc callback is doing a wrong check of the netdevice state,
the fix is simply to invert the condition.

Fixes: 6f9485af40 ("net/mlx5e: Fail safe tc setup")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Gal Pressman
e3c1950371 net/mlx5e: Fix ethtool pause support and advertise reporting
Pause bit should set when RX pause is on, not TX pause.
Also, setting Asym_Pause is incorrect, and should be turned off.

Fixes: 665bc53969 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Gal Pressman
b383b544f2 net/mlx5e: Use the correct pause values for ethtool advertising
Query the operational pause from firmware (PFCC register) instead of
always passing zeros.

Fixes: 665bc53969 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
50fb55d88c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix multiqueue in stmmac driver on PCI, from Andy Shevchenko.

 2) cdc_ncm doesn't actually fully zero out the padding area is
    allocates on TX, from Jim Baxter.

 3) Don't leak map addresses in BPF verifier, from Daniel Borkmann.

 4) If we randomize TCP timestamps, we have to do it everywhere
    including SYN cookies. From Eric Dumazet.

 5) Fix "ethtool -S" crash in aquantia driver, from Pavel Belous.

 6) Fix allocation size for ntp filter bitmap in bnxt_en driver, from
    Dan Carpenter.

 7) Add missing memory allocation return value check to DSA loop driver,
    from Christophe Jaillet.

 8) Fix XDP leak on driver unload in qed driver, from Suddarsana Reddy
    Kalluru.

 9) Don't inherit MC list from parent inet connection sockets, another
    syzkaller spotted gem. Fix from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  dccp/tcp: do not inherit mc_list from parent
  qede: Split PF/VF ndos.
  qed: Correct doorbell configuration for !4Kb pages
  qed: Tell QM the number of tasks
  qed: Fix VF removal sequence
  qede: Fix XDP memory leak on unload
  net/mlx4_core: Reduce harmless SRIOV error message to debug level
  net/mlx4_en: Avoid adding steering rules with invalid ring
  net/mlx4_en: Change the error print to debug print
  drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison
  DECnet: Use container_of() for embedded struct
  Revert "ipv4: restore rt->fi for reference counting"
  net: mdio-mux: bcm-iproc: call mdiobus_free() in error path
  net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
  ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
  net: cdc_ncm: Fix TX zero padding
  stmmac: pci: split out common_default_data() helper
  stmmac: pci: RX queue routing configuration
  stmmac: pci: TX and RX queue priority configuration
  stmmac: pci: set default number of rx and tx queues
  ...
2017-05-09 15:42:31 -07:00
Jack Morgenstein
83bd5118a1 net/mlx4_core: Reduce harmless SRIOV error message to debug level
Under SRIOV resource management, extra counters are allocated to VFs
from a free pool. If that pool is empty, the ALLOC_RES command for
a counter resource fails -- and this generates a misleading error
message in the message log.

Under SRIOV, each VF is allocated (i.e., guaranteed) 2 counters --
one counter per port. For ETH ports, the RoCE driver requests an
additional counter (above the guaranteed counters). If that request
fails, the VF RoCE driver simply uses the default (i.e., guaranteed)
counter for that port.

Thus, failing to allocate an additional counter does not constitute
a  problem, and the error message on the PF when this occurs should
be reduced to debug level.

Finally, to identify the situation that the reason for the failure is
that no resources are available to grant to the VF, we modified the
error returned by mlx4_grant_resource to -EDQUOT (Quota exceeded),
which more accurately describes the error.

Fixes: c3abb51bdb ("IB/mlx4: Add RoCE/IB dedicated counters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 11:22:46 -04:00
Talat Batheesh
89c557687a net/mlx4_en: Avoid adding steering rules with invalid ring
Inserting steering rules with illegal ring is an invalid operation,
block it.

Fixes: 820672812f ('net/mlx4_en: Manage flow steering rules with ethtool')
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 11:22:46 -04:00
Kamal Heib
505a9249c2 net/mlx4_en: Change the error print to debug print
The error print within mlx4_en_calc_rx_buf() should be a debug print.

Fixes: 51151a16a6 ('mlx4: allow order-0 memory allocations in RX path')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 11:22:46 -04:00
3341713c67 Updates #2 for 4.12 kernel merge window
- mlx5/IPoIB fixup patch
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7idAAoJELgmozMOVy/d8bEP/31hq3Vr1unNJYjU8yEaWI0l
 D/k3ZSJcnLaeoUE5Ts69pYVkncAM/g55bmS80bKyNONRJp0BMU0MxmmzQC/iMqMM
 LAvDf3Qyv8rTrEgjjmjWVQrP/v7K0+teD6YMmTkfB8fWqhi/vR+84Sv3c50GUXpL
 OZMrn6p/IXptWa24kvi5/mKNSmeOBQkp/1lRxyi6bhAcFhjwQZoO4tDkOL6Aj1oA
 FFeUcBDwMsOjbqhH59sM9ryOnGXPEwJUnLOPfq36pIrFzyUtbbcmFfMmefjJWMLK
 XW4VBXywYLzkCvr2tGeXMukJzroUPQhD6G/4uwnFlwad0tovlBCwPmOmXHbnhRbh
 JmriZaQEGj43o3yH/HHSeRvbwJ7e71xbH0tW21K9cB/CuYTixwlvYAMd218qXQ+k
 Q6qBRueWlgsaDatlkuRV0ezrILvGQhn6QgEbCUIluDjKp/d6aRtI4IM60Y5G02U/
 WLvnMQJ/MaGQXPVR1gWqWpy1ojvj/mAeTQf6aKw1Icjd3oSPgAI06B/uThXoWxh/
 FyvDl/+HNW2StxPudQPKWkMjamaD/cToSa4HjLjdPER9HiCHmUru6nRZOufhwtwh
 q2NEPx5kraRQNEMqjpopzNIWPpGBNBm+htkqXmTRpDdfvKgQ29vL49GCcV1VHtj9
 TpYkqu/6ZVPItAAfyklh
 =Ihe9
 -----END PGP SIGNATURE-----
mergetag object 67cf3623e0
 type commit
 tag for-next
 tagger Doug Ledford <dledford@redhat.com> 1493940800 -0400
 
 Updates #3 for 4.12 kernel merge window
 
 - The hfi1 15 patch set that landed late
 - IPoIB get_link_ksettings which landed late because I asked for a
   respin
 - One late rxe change
 - One -rc worthy fix that's in early
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7q5AAoJELgmozMOVy/dungQAKpWG7sq+23PhsXDTOGtKSRg
 6jrgt5lG+V4twAJogXazxYsanPLQOE6ItKUFaBqEh2fULGyzXyPv1PsLNAVi4MYR
 6vbsMh2qCW8+dy+uEBveirQzdwdJr8U7g1mvJYPDr1Rmaw852Sq+PbId3nEy/cFK
 boMn/zhSSJuF8VsTfFCHx1gVdRN7n7gclqBP7Phq4bzdZ8E0u5GIPjmPDePUTpwH
 r3yFSC3VV5JGGgOSUc66oPPoIgraAG9EQezn01Llk5n+HPJ3eM91spV1Y9tZ7EgA
 toNW+bxEmanFYtuR+qzd0ctozUt2Ke6AUdTgcZcSCFkIan2ZOssJ0y2UN3CsNxlQ
 rYcOBe5KBNcFVb5E3NoeG+iHo9jT86fBqzYheHKmQJy38xalAlOoX6mLOvn4pPzo
 DFYDd6PpEU9SQSmhpPj1UbhvCOzecBaV3cRF/TihnqlOrRiKtd/YCccqT5GguPCB
 RVW6h1qQ94rmr/k6gobT7lNDWn2bZAgUupUD19Nw+7YwauxjGhSFsXW357NVcs4B
 3a/uLNVvnoyIjm62Sm8YFz9Bz3pRutfP9b6pSJ5V4P7dBgXYFZU35buViwcAj5fL
 DNGx4a8Poi3VQR8lQjpK6MRCWAibUY27zW1nbWQrQ9TdcHhK4sJC9Mqe41W4FXpU
 qQLkYBChz+mkCzE44bN/
 =H9f5
 -----END PGP SIGNATURE-----

Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "As mentioned in my first pull request, this is the subsequent pull
  requests I had. This is all I have, and in fact this cleans out the
  RDMA subsystem's entire patchworks queue of kernel changes that are
  ready to go (well, it did for the weekend anyway, a few new patches
  are in, but they'll be coming during the -rc cycle).

  The first tag contains a single patch that would have conflicted if
  taken from my tree or DaveM's tree as it needed our trees merged to
  come cleanly.

  The second tag contains the patch series from Intel plus three other
  stragllers that came in late last week. I took them because it allowed
  me to legitimately claim that the RDMA patchworks queue was, for a
  short time, 100% cleared of all waiting kernel patches, woohoo! :-).

  I have it under my for-next tag, so it did get 0day and linux- next
  over the end of last week, and linux-next did show one minor conflict.

  Summary:

  'for-linus' tag:
   - mlx5/IPoIB fixup patch

  'for-next' tag:
   - the hfi1 15 patch set that landed late
   - IPoIB get_link_ksettings which landed late because I asked for a
     respin
   - one late rxe change
   - one -rc worthy fix that's in early"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Enable IPoIB acceleration

* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rxe: expose num_possible_cpus() cnum_comp_vectors
  IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
  IB/hfi1: Clean up on context initialization failure
  IB/hfi1: Fix an assign/ordering issue with shared context IDs
  IB/hfi1: Clean up context initialization
  IB/hfi1: Correctly clear the pkey
  IB/hfi1: Search shared contexts on the opened device, not all devices
  IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
  IB/hfi1: Use filedata rather than filepointer
  IB/hfi1: Name function prototype parameters
  IB/hfi1: Fix a subcontext memory leak
  IB/hfi1: Return an error on memory allocation failure
  IB/hfi1: Adjust default eager_buffer_size to 8MB
  IB/hfi1: Get rid of divide when setting the tx request header
  IB/hfi1: Fix yield logic in send engine
  IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
  IB/hfi1: Fix checks for Offline transient state
  IB/ipoib: add get_link_ksettings in ethtool
2017-05-08 20:07:29 -07:00
Michal Hocko
752ade68cb treewide: use kv[mz]alloc* rather than opencoded variants
There are many code paths opencoding kvmalloc.  Let's use the helper
instead.  The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator.  E.g.
allocation requests <= 32kB (with 4kB pages) are basically never failing
and invoke OOM killer to satisfy the allocation.  This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fallback to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously.  There is no guarantee something like that happens
though.

This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.

Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Erez Shitrit
693dfd5a3f IB/mlx5: Enable IPoIB acceleration
Enable mlx5 IPoIB acceleration by declaring
mlx5_ib_{alloc,free}_rdma_netdev and assigning the mlx5
IPoIB rdma_netdev callbacks.

In addition, this patch brings in sync mlx5's IPoIB parts for net and IB
trees. As a precaution, we disabled IPoIB acceleration by default (in
the mlx5_core Kconfig file).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 16:22:08 -04:00
Ido Schimmel
b1e455260c mlxsw: spectrum_router: Simplify VRF enslavement
When a netdev is enslaved to a VRF master, its router interface (RIF)
needs to be destroyed (if exists) and a new one created using the
corresponding virtual router (VR).

>From the driver's perspective, the above is equivalent to an inetaddr
event sent for this netdev. Therefore, when a port netdev (or its
uppers) are enslaved to a VRF master, call the same function that
would've been called had a NETDEV_UP was sent for this netdev in the
inetaddr notification chain.

This patch also fixes a bug when a LAG netdev with an existing RIF is
enslaved to a VRF. Before this patch, each LAG port would drop the
reference on the RIF, but would re-join the same one (in the wrong VR)
soon after. With this patch, the corresponding RIF is first destroyed
and a new one is created using the correct VR.

Fixes: 7179eb5acd ("mlxsw: spectrum_router: Add support for VRFs")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:47:58 -04:00
Eli Cohen
0a0ab1d2cc net/mlx5: E-Switch, Avoid redundant memory allocation
struct esw_mc_addr is a small struct that can be part of struct
mlx5_eswitch. Define it as a field and not as a pointer and save the
kzalloc call and then error flow handling.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:21 +03:00
Eran Ben Elisha
0f6e4cf674 net/mlx5e: Disable HW LRO when PCI is slower than link on striding RQ
We will activate the HW LRO only on servers with PCI BW > MAX LINK BW,
or when PCI BW > 16Gbps. On other cases we do not want LRO by default as
LRO sessions might get timeout and add redundant software overhead.

Tested:
	ethtool -k <ifs-name> | grep large-receive-offload
	On systems with and without the limitations.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:20 +03:00
Tariq Toukan
b1b03bded1 net/mlx5e: Use u8 as ownership type in mlx5e_get_cqe()
CQE ownership indication is as small as a single bit.
Use u8 to speedup the comparison.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:19 +03:00
Tariq Toukan
ad78af9b88 net/mlx5e: Use prefetchw when a write is to follow
"prefetchw()" prefetches the cacheline for write. Use it for
skb->data, as soon we'll be copying the packet header there.

Performance:
Single-stream packet-rate tested with pktgen.
Packets are dropped in tc level to zoom into driver data-path.
Larger gain is expected for smaller packets, as less time
is spent on handling SKB fragments, making the path shorter
and the improvement more significant.

---------------------------------------------
packet size | before    | after     | gain  |
64B         | 4,113,306 | 4,778,720 |  16%  |
1024B       | 3,633,819 | 3,950,593 | 8.7%  |

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:17 +03:00
Tariq Toukan
1f5b1e47ee net/mlx5e: Optimize poll ICOSQ completion queue
UMR operations are more frequent and important.
Check them first, and add a compiler branch predictor hint.

According to current design, ICOSQ CQ can contain at most one
pending CQE per napi. Poll function is optimized accordingly.

Performance:
Single-stream packet-rate tested with pktgen.
Packets are dropped in tc level to zoom into driver data-path.
Larger gain is expected for larger packet sizes, as BW is higher
and UMR posts are more frequent.

---------------------------------------------
packet size | before    | after     | gain  |
64B         | 4,092,370 | 4,113,306 |  0.5% |
1024B       | 3,421,435 | 3,633,819 |  6.2% |

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:16 +03:00
Hadar Hen Zion
a2fa1fe5ad net/mlx5e: Act on delay probe time updates
The user can change delay_first_probe_time parameter through sysctl.
Listen to NETEVENT_DELAY_PROBE_TIME_UPDATE notifications and update the
intervals for updating the neighbours 'used' value periodic task and
for flow HW counters query periodic task.
Both of the intervals will be update only in case the new delay prob
time value is lower the current interval.

Since the driver saves only one min interval value and not per device,
the users will be able to set lower interval value for updating
neighbour 'used' value periodic task but they won't be able to schedule
a higher interval for this periodic task.
The used interval for scheduling neighbour 'used' value periodic task is
the minimal delay prob time parameter ever seen by the driver.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:15 +03:00
Hadar Hen Zion
f6dfb4c3f2 net/mlx5e: Update neighbour 'used' state using HW flow rules counters
When IP tunnel encapsulation rules are offloaded, the kernel can't see
the traffic of the offloaded flow. The neighbour for the IP tunnel
destination of the offloaded flow can mistakenly become STALE and
deleted by the kernel since its 'used' value wasn't changed.

To make sure that a neighbour which is used by the HW won't become
STALE, we proactively update the neighbour 'used' value every
DELAY_PROBE_TIME period, when packets were matched and counted by the HW
for one of the tunnel encap flows related to this neighbour.

The periodic task that updates the used neighbours is scheduled when a
tunnel encap rule is successfully offloaded into HW and keeps re-scheduling
itself as long as the representor's neighbours list isn't empty.

Add, remove, lookup and status change operations done over the
representor's neighbours list or the neighbour hash entry encaps list
are all serialized by RTNL lock.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:14 +03:00
Hadar Hen Zion
232c001398 net/mlx5e: Add support to neighbour update flow
In order to offload TC encap rules, the driver does a lookup for the IP
tunnel neighbour according to the output device and the destination IP
given by the user.

To keep tracking after the validity state of such neighbours, we keep
the neighbours information (pair of device pointer and destination IP)
in a hash table maintained at the relevant egress representor and
register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update
netevent, we search for a match among the cached neighbours entries used for
encapsulation.

In case the neighbour isn't valid, we can't offload the flow into the
HW. We cache the flow (requested matching and actions) in the driver and
offload the rule later, when the neighbour is resolved and becomes
valid.

When a flow is only cached in the driver and not offloaded into HW
yet, we use EAGAIN return value to mark it internally, the TC ndo still
returns success.

Listen to kernel neighbour update netevents to trace relevant neighbours
validity state:

1. If a neighbour becomes valid, offload the related rules to HW.

2. If the neighbour becomes invalid, remove the related rules from HW.

3. If the neighbour mac address was changed, update the encap header.
   Remove all the offloaded rules using the old encap header from the HW
   and insert new rules to HW with updated encap header.

Access to the neighbors hash table is protected by RTNL lock of its
caller or by the table's spinlock.

Details of the locking/synchronization among the different actions
applied on the neighbour table:

Add/remove operations - protected by RTNL lock of its caller (all TC
commands are protected by RTNL lock). Add and remove operations are
initiated only when the user inserts/removes a TC rule into/from the driver.

Lookup/remove operations - since the lookup operation is done from
netevent notifier block, RTNL lock can't be used (atomic context).
Use the table's spin lock to protect lookups from TC user removal operation.
bh is used since netevent can be called from a softirq context.

Lookup/add operations - The hash table access functions are taking
care of the protection between lookup and add operations.

When adding/removing encap headers and rules to/from the HW, RTNL lock
is used. It can happen when:

1. The user inserts/removes a TC rule into/from the driver (TC commands
are protected by RTNL lock of it's caller).

2. The driver gets neighbour notification event, which reports about
neighbour validity status change. Before adding/removing encap headers
and rules to/from the HW, RTNL lock is taken.

A neighbour hash table entry should be freed when its encap list is empty.
Since The neighbour update netevent notification schedules a neighbour
update work that uses the neighbour hash entry, it can't be freed
unconditionally when the encap list becomes empty during TC delete rule flow.
Use reference count to protect from freeing neighbour hash table entry
while it's still in use.

When the user asks to unregister a netdvice used by one of the neigbours,
neighbour removal notification is received. Then we take a reference on the
neighbour and don't free it until the relevant encap entries (and flows) are
marked as invalid (not offloaded) and removed from HW.
As long as the encap entry is still valid (checked under RTNL lock) we
can safely access the neighbour device saved on mlx5e_neigh struct.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:13 +03:00
Hadar Hen Zion
37b498ff23 net/mlx5e: Add neighbour hash table to the representors
Add hash table to the representors which is to be used by the next patch
to save neighbours information in the driver.

In order to offload IP tunnel encapsulation rules, the driver must find
the tunnel dst neighbour according to the output device and the
destination address given by the user. The next patch will cache the
neighbors information in the driver to allow support in neigh update
flow for tunnel encap rules.

The neighbour entries are also saved in a list so we easily iterate over
them when querying statistics in order to provide 'used' feedback to the
kernel neighbour NUD core.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:12 +03:00
Hadar Hen Zion
033354d501 net/mlx5e: Read neigh parameters with proper locking
The nud_state and hardware address fields are protected by the neighbour
lock, we should acquire it before accessing those parameters.

Use this lock to avoid inconsistency between the neighbour validity state
and it's hardware address.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:10 +03:00
Hadar Hen Zion
0b67a38fe6 net/mlx5e: Use flag to properly monitor a flow rule offloading state
Instead of relaying on the 'flow->rule' pointer value which can be
valid or invalid (in case the FW returns an error while trying to offload
the rule), monitor the rule state using a flag.

In downstream patch which adds support to IP tunneling neigh update
flow, a TC rule could be cached in the driver and not offloaded into the
HW. In this case, the flow handle pointer stays NULL.

Check the offloaded flag to properly deal with rules which are currently
not offloaded when querying rule statistics.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:09 +03:00