This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.
net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup. This will signal to userspace that the route will not be
selected. The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down. This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected. The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.
With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15
cache
While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down. Now interface p8p1 is linked-up:
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache <local>
80.0.0.2 dev p8p1 src 80.0.0.1
cache
and the output changes to what one would expect.
If the sysctl is not set, the following output would be expected when
p8p1 is down:
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.
v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set. Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.
v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.
v4: Drop binary sysctl
v5: Whitespace fixups from Dave
v6: Style changes from Dave and checkpatch suggestions
v7: One more checkpatch fixup
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off. No action is taken,
but additional flags are passed to userspace to indicate carrier status.
This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.
v2: Split out kernel functionality into 2 patches, this patch simply
sets and clears new nexthop flag RTNH_F_LINKDOWN.
v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.
v5: Whitespace and variable declaration fixups suggested by Dave.
v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
all.
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
attributes, but a driver doesn't need to implement this in order to get
bridge link information.
So error out only on errors different than -EOPNOTSUPP.
(This is a follow-up patch for 7d4f8d8.)
Fixes: 8793d0a664 ("switchdev: add new switchdev_port_bridge_getlink")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug pops up with NetworkManager on Fedora 21. NetworkManager tends to
stop the interface (nicvf_stop() is called) before changing settings. In
stopped state MAC cannot be sent to a PF. However, when the interface is
restarted (nicvf_open() is called), we ping the PF using NIC_MBOX_MSG_READY
message, and the PF replies back with old MAC address, overriding what we
had after MAC setting from userspace. As a result, we cannot set MAC
address using NetworkManager.
This patch introduces special tracking of MAC change in stopped state so
that the correct new MAC address is sent to a PF when interface is reopen.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz says:
====================
Mellanox NIC drivers update, June 23 2015
This series has two fixes from Eran to his recent SRIOV counters work in
mlx4 and few more updates from Saeed and Achiad to the mlx5 Ethernet
code. All fixes here relate to net-next code, so no need for -stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prefetch the 1st cache line used by the buffer pointed by
the skb linear data.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate between mlx5e_get_cqe() and mlx5_cqwq_pop(), this helps for
better code readability and better CQ buffer management.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use container_of() instead.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coding Style fix, remove extra spaces.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to save PCI BW consumed by TX CQEs and to reduce the amount of
CPU cache misses caused by TX CQE reading, we request TX CQE generation
only when skb->xmit_more=0.
As a consequence of the above, a single TX CQE may now indicate the
transmission completion of multiple TX SKBs.
This also handles a problem introduced in commit b1b8105ebf41 "net/mlx5e:
Support NETIF_F_SG" where we didn't ask for NOP completions while the
driver didn't have the proper code to handle this case.
Fixes: b1b8105ebf41 ('net/mlx5e: Support NETIF_F_SG')
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NOP completion SKBs are always NULL.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is already assigned at mlx5e_build_rq_param()
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of counting number of gso fragments, we can use
skb_shinfo(skb)->gso_segs.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To save per-packet calculations, we use the following static mappings:
1) priv {channel, tc} to netdev txq (used @mlx5e_selec_queue())
2) netdev txq to priv sq (used @mlx5e_xmit())
Thanks to these static mappings, no more need for a separate implementation
of ndo_start_xmit when multiple TCs are configured.
We believe the performance improvement of such separation would be negligible, if any.
The previous way of dynamically calculating the above mappings required
allocating more TX queues than actually used (@alloc_etherdev_mqs()),
which is now no longer needed.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under SRIOV, the port rx/tx bytes/packets statistics should by read
from the HW instead of using the PF netdevice SW accounting. This is
needed in order to get the full port statistics and not just the PF
own ones
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NUM_ALL_STATS was not updated with the new four entries, instead
NUM_FLOW_STATS was updated, fix it. that caused off-by-four for all
counters below pf_*_*.
Fixes: b42de4d012 ('net/mlx4_en: Show PF own statistics via ethtool')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suman Tripathi says:
====================
drivers: net: xgene: Fix the ACPI support for RGMII/SGMII0/XFI ethernet interfaces of APM X-Gene SoC.
====================
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches fixes the code to check for IS_ERR rather
than NULL for clock interface.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ACPI support for SGMII0 and XFI1 interface of
2nd H/W version of APM X-Gene SoC ethernet controller.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the backward compatibility with the old firmware where
the Tx completion IRQ interrupt was absent whereas incase of new firmware
the Tx completion IRQ interrupt is present.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements couple of fixes to support ACPI for RGMII/SGMII0/XFI
interface of APM X-Gene SoC ethernet controller driver. This patch uses
the _SUN acpi object to fetch the port-id information whereas the FDT uses
port-id binding for port-id information.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple LAN device for debug or management purposes.
Device supports interrupts for RX and TX(completion).
Device does not have DMA ability.
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Tal Zilcer <talz@ezchip.com>
Acked-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Below case causes mii bus probe failed:
ifconfig eth0 down -> suspend/resume with Mega/fax mix off -> ifconfig eth0 up
In i.MX6SX/i.MX7D chip, Mega/fast mix off feature is supported that means most of
SOC power will be off including ENET MAC for power saving. Once ENET MAC power
off, all initialized MAC registers reset to default, so in the case, it must
init MAC prior to mii bus probe.
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While IEEE and CEE use the same structure to store apps, the selector
and priority fields for both are different. Only the priority field is
explained, add documentation explaining how the selector field differs
for both.
cgdcbxd code shows an example of how selector fields differ.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This particular BUG_ON condition was checking for attr set err in the
COMMIT phase, which isn't expected (it's a driver bug if PREPARE phase is
OK but COMMIT fails). But BUG_ON() is too strong for this case, so change
to WARN(). BUG_ON() would be warranted if the system was corrupted beyond
repair, but this is not the case here.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman says:
====================
switchdev; add VLAN support for port's bridge_getlink
One more missing piece of the puzzle. Add vlan dump support to switchdev
port's bridge_getlink. iproute2 "bridge vlan show" cmd already knows how
to show the vlans installed on the bridge and the device , but (until now)
no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
msg. Before this patch, "bridge vlan show":
$ bridge -c vlan show
port vlan ids
sw1p1 30-34 << bridge side vlans
57
sw1p1 << device side vlans (missing)
sw1p2 57
sw1p2
sw1p3
sw1p4
br0 None
(When the port is bridged, the output repeats the vlan list for the vlans
on the bridge side of the port and the vlans on the device side of the
port. The listing above show no vlans for the device side even though they
are installed).
After this patch:
$ bridge -c vlan show
port vlan ids
sw1p1 30-34 << bridge side vlan
57
sw1p1 30-34 << device side vlans
57
3840 PVID
sw1p2 57
sw1p2 57
3840 PVID
sw1p3 3842 PVID
sw1p4 3843 PVID
br0 None
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
One more missing piece of the puzzle. Add vlan dump support to switchdev
port's bridge_getlink. iproute2 "bridge vlan show" cmd already knows how
to show the vlans installed on the bridge and the device , but (until now)
no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
msg. Before this patch, "bridge vlan show":
$ bridge -c vlan show
port vlan ids
sw1p1 30-34 << bridge side vlans
57
sw1p1 << device side vlans (missing)
sw1p2 57
sw1p2
sw1p3
sw1p4
br0 None
(When the port is bridged, the output repeats the vlan list for the vlans
on the bridge side of the port and the vlans on the device side of the
port. The listing above show no vlans for the device side even though they
are installed).
After this patch:
$ bridge -c vlan show
port vlan ids
sw1p1 30-34 << bridge side vlan
57
sw1p1 30-34 << device side vlans
57
3840 PVID
sw1p2 57
sw1p2 57
3840 PVID
sw1p3 3842 PVID
sw1p4 3843 PVID
br0 None
I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
switchdev support adds an obj dump for VLAN objects, using the same
call-back scheme as FDB dump. Support included for both compressed and
un-compressed vlan dumps.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove handling of tx_ring in prb_setup_retire_blk_timer
for TPACKET_V3 because init_prb_bdqc is called only for zero tx_ring
and thus prb_setup_retire_blk_timer for zero tx_ring only.
And also in functon init_prb_bdqc there is no usage of tx_ring.
Thus removing tx_ring from init_prb_bdqc.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Suggested-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This howto made sense in the 1990s when users had to manually configure
ISA cards with jumpers or vendor utilities, but with the implementation
of PCI it became increasingly less and less relevant, to the point where
it has been well over a decade since I last updated it. And there is
no value in anyone else taking over updating it either.
However the references to it continue to spread as boiler plate text
from one Kconfig file into the next. We are not doing end users any
favours by pointing them at this old document, so lets kill it with
fire, once and for all, to hopefully stop any further spread.
No code is changed in this commit, just Kconfig help text.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiko Stuebner says:
====================
net: stmmac: dwmac-rk: add support for rk3368
Apart from small cleanups, this series provides support for the dwmac
on the new rk3368 ARM64 soc.
Tested on a R88 board using a RMII phy.
Changes since v1:
- Adapt to changes resulting from patch d42202dce0 ("net: stmmac:
dwmac-rk: Don't add function name in info or err messages")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add constants and callback functions for the dwmac on rk3368 socs.
As can be seen, the base structure is the same, only registers and
the bits in them moved slightly.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac settings like RGMII/RMII, speeds etc are done in the so called
"General Register Files", contain numerous other settings as well and
always seem to change between Rockchip SoCs. Therefore abstract the
register accesses into a per-soc ops struct to make this reusable on
other Rockchip SoCs.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first iteration of the dwmac-rk support did access an intermediate
clock directly below the pll selector. This was removed in a subsequent
revision, but the clock and one invocation remained. This results in
the driver trying to set the rate of a non-existent clock when the soc
and not some external source provides the phy clock for RMII phys.
So set the rate of the correct clock and remove the remaining now
completely unused definition.
Fixes: 436f5ae08f9d ("GMAC: add driver for Rockchip RK3288 SoCs integrated GMAC")
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a first version the driver did want to do some gpio wiggling, which
of course never made it into the kernel, but somehow these register
defines where forgotten. Remove them, as they shouldn't be here.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zero the statistics counters when setting up the global
registers. Otherwise the counters will remain from the last boot if
the power has not been removed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the contents of the scratch registers to be shown in debugfs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device map is used to route packets between cascaded switches.
Add dumping a switches device map via debugfs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the contents of the statistics counters to be shown in debugfs.
This is particularly useful for the cpu and dsa ports, which cannot be
seen using ethtools -S.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the code to retrieve a statistics counter into a function of its
own, so it can later be reused.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dump the Address Translation Unit via a file in debugfs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the contents of the registers to be shown in debugfs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the driver understand adapter version 2.
Cc: Rachel Lunnon <rachel_lunnon@stormagic.com>
Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If rcd length was zero, the page used for frag was not being released. It
was being replaced with a newly allocated page. This change takes care
of that memory leak.
Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement a handler for pci shutdown so that the driver has an
opportunity to make sure that device is quiesced before the PCI
switches to legacy IRQs. This way the possibility of
"screaming interrupt" is avoided.
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currenlty nf_tables chains added in one network namespace are being
run in all network namespace. The issues are myriad with the simplest
being an unprivileged user can cause any network packets to be dropped.
Address this by simply not running nf_tables chains in the wrong
network namespace.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macvtap should be compatible with tuntap for
maximum number of queues.
commit 'baf71c5c1f80d82e92924050a60b5baaf97e3094 (tuntap:
Increase the number of queues in tun.)' removes
the limitations and increases number of queues in tuntap.
Now, Its safe to increase number of queues in Macvtap as well.
This patch also modifies 'macvtap_del_queues' function
to avoid extra memory allocation in stack.
Changes from v1->v2 :
Michael S. Tsirkin, Jason Wang :
Better way to use linked list to
avoid use of extra memory in stack.
Sergei Shtylyov : Specify dependent commit's summary.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>