When the bnx2x driver is rmmoded, if VFs of a given PF will be assigned
to a VM then that PF will be unable to call `pci_disable_sriov()'.
If for that same PF there would also exist unassigned VFs in the hypervisor,
the result will be that after the removal there will still be virtual PCI
functions on the hypervisor.
If the bnx2x module were to be re-inserted, the result will be that the VFs
on the hypervisor will be re-probed directly following the PF's probe, even
though that in regular loading flow sriov is only enabled once PF is loaded.
The probed VF will then try to access its bar, causing a PCI error as the HW
is not in a state enabling such a request.
This patch adds a missing disablement procedure to the PF's removal, one that
sets registers viewable to the VF to indicate that the VFs have no permission
to access the bar, thus resulting in probe errors instead of PCI errors.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Buffers for FW statistics were allocated at an inappropriate time; In a machine
where the driver encounters problems allocating all of its queues, the driver
would still create FW requests for the statistics of the non-existing queues.
The wrong order of memory allocation could lead to zeroed statistics messages
being sent, leading to fw assert in case function 0 was down.
This changes the order of allocations, guaranteeing that statistic requests will
only be generated for actual queues.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6ff50cd555 ("tcp: gso: do not generate out of order packets")
had an heuristic that can trigger a warning in skb_try_coalesce(),
because skb->truesize of the gso segments were exactly set to mss.
This breaks the requirement that
skb->truesize >= skb->len + truesizeof(struct sk_buff);
It can trivially be reproduced by :
ifconfig lo mtu 1500
ethtool -K lo tso off
netperf
As the skbs are looped into the TCP networking stack, skb_try_coalesce()
warns us of these skb under-estimating their truesize.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The virtio_net driver's mergeable receive buffer allocator
uses 4KB packet buffers. For MTU-sized traffic, SKB truesize
is > 4KB but only ~1500 bytes of the buffer is used to store
packet data, reducing the effective TCP window size
substantially. This patch addresses the performance concerns
with mergeable receive buffers by allocating MTU-sized packet
buffers using page frag allocators. If more than MAX_SKB_FRAGS
buffers are needed, the SKB frag_list is used.
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code for privacy extentions is very mature, and making it
configurable only gives marginal memory/code savings in exchange
for obfuscation and hard to read code via CPP ifdef'ery.
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSXuCxAAoJEI9vqH3mFV2sdWwP/11xVQFtzoT01k9P4fZeFCMT
+dGztMBY6bODMEyeC9raX6sJhuOYNivS2IBFJ4qv9p+kcCplYE0EsB3MfOQ60NdV
eV2DdnHrPKgIPUutGf7mcZ5bXB8q8HEuEFn0ONsCdNTAxzrcLxaPZU3MeGhSfL0H
d/RPPmSW8+vLEMWUe58EB2J8Z1DJye4O0pbxskjGA64KrLssKBG3LWBQBdTsihkE
r2olYYT0j7osJ7loLbyll/CcacQTfJbQT8Xd0Y7MbLYR/G6EVMwb6PkzkFnUPMIQ
zLwMAOgL8/E250p4ab0Wy8vf9rW+938qR93oZEyO8TRDdF/mYNQb+GK+ZKUrMrxg
eZ/pOhKQ+xUKtt23xrvbXR3nTnI9AfiJ1nhuO4UmX5WIhyJXDcYYk+rujlEduxGm
IJ7p5cRWBdqC+iIiSzQJjyW4drIXU6QJ+ureOGo0vBcBJsLRbUFYLt+08Rb60/d1
zgS8L7DpKRUZlIrbD6/HxyUimJCTGERsvTSDZYCYs1+ri/YKV4a8Ukfsl8bWLbhd
Le5PyMeusWaM6MHCIOJPys2SpsKGl9UCaVr7Nz1Lr6HIqZJUN61rNVSIcUpufiqt
hPSYtljmz6746Z6QvNW0OD8+qE5foplCnv94Oqj9tb24gHfINPXkjiaEKhPDICi0
6cCQj4pCsjjlzXXmn5Fp
=sopZ
-----END PGP SIGNATURE-----
Merge tag 'xtensa-next-20131015' of git://github.com/czankel/xtensa-linux
Pull Xtensa patchset from Chris Zankel:
"The main patch fixes a bug that can cause a kernel panic, and was
introduced in rc1. The other two have been discovered by a uclibc
test and 'coccinelle'"
* tag 'xtensa-next-20131015' of git://github.com/czankel/xtensa-linux:
xtensa: Cocci spatch "noderef"
xtensa: don't use alternate signal stack on threads
xtensa: fix fast_syscall_spill_registers_fixup
This is a set of four patches that revert functionality introduced in the
merge window to sg. The locking changes turned out to introduce this bug:
[ 205.372901] [ BUG: lock held when returning to user space! ]
[...]
[ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0
The fix is large, so at this late stage we'd like to revert the functionality
and start again in the next merge window.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJSbneXAAoJEDeqqVYsXL0Mr+AIAJRkPeIzyX59Y16nAlxY8tZj
MCYFZ9v9TiVEXm2+AM/Z/Y6jVYG3wgkRzZARyOxeNeRsyWFMsm74WQG5AIopwL5B
liN9tMlcxl3Ofdh9Ww1Os7AaIhm3xmiRTNsNirNxgh5Y3+XH8MMTv5QR97XIt5Ys
arDJUAJKtYJIcgMbYpDdQ7vb8deNsrwC0mQ2vbdox6tMKn8wIigsyZKI/+55imJP
uPZjruhFQxhzMzi5+kR8q/Y2hj+2Lo0pSIEWjO2K/xqLzMI/8IkCwx5gYnNdxONE
Rrj/6VevFd0w3G9EWb8d4HhptyTMFkq4D8XoSxepSFdA1Ti4kVixwXw+PRdEU/I=
=pSy2
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of four patches that revert functionality introduced in
the merge window to sg. The locking changes turned out to introduce
this bug:
[ 205.372901] [ BUG: lock held when returning to user space! ]
[...]
[ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0
The fix is large, so at this late stage we'd like to revert the
functionality and start again in the next merge window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] Revert "sg: use rwsem to solve race during exclusive open"
[SCSI] Revert "sg: no need sg_open_exclusive_lock"
[SCSI] Revert "sg: checking sdp->detached isn't protected when open"
[SCSI] Revert "sg: push file descriptor list locking down to per-device locking"
Alexander Aring says:
====================
6lowpan: trivial changes
This patch series includes some trivial changes to prepare the 6lowpan stack
for upcomming patch-series which mainly fix fragmentation according to rfc4944
and udp handling(which is currently broken).
Changes since v3:
- really fix intendation in patch 3/5
Changes since v2:
- change intendation in patch 3/5
- fix typo in 5/5 unecessary -> unnecessary
- add missing 6lowpan tag in cover-letter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the assignment of skb->dev. We don't need it here because
we use the netdev_alloc_skb_ip_align function which already sets the
skb->dev.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses the netdev_alloc_skb instead dev_alloc_skb function and
drops the seperate assignment to skb->dev.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The err variable can only be zero in this case.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tail position of the event buffer should only be modified after
actually use that event.
If not the event buffer could be invalid before use, and segment fault
occurs when invoking perf top -G.
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Cc: David Ahern <dsahern@gmail.com>
Cc: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Link: http://lkml.kernel.org/r/1382600613-32177-1-git-send-email-zhouzhouyi@gmail.com
[ Simplified the logic using exit gotos and renamed write_tail method to mmap_consume ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Splitting -G and --call-graph for record command, so we could use '-G'
with no option.
The '-G' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.
It will be possible to configure unwind method via config file in
upcoming patches.
All current '-G' arguments is overtaken by --call-graph option.
NOTE: The documentation for top --call-graph option
was wrongly copied from report command.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Splitting -g and --call-graph for record command, so we could use '-g'
with no option.
The '-g' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.
It will be possible to configure unwind method via config file in
upcoming patches.
All current '-g' arguments is overtaken by --call-graph option.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-2-git-send-email-jolsa@redhat.com
[ reordered -g/--call-graph on --help and expanded the man page
according to comments by David Ahern and Namhyung Kim ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Following commit tightened up the buffer size for output to strict width
of used format columns:
99cf666 perf hists: Fix formatting of long symbol names
This works fine until you hit color overhead output which places extra
bytes into output buffer. We need to account for color overhead in the
output buffer. Adding maximum color byte size to the output buffer size.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382700293-1803-1-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Intel D410PT(LW) and D425KT Mini-ITX desktop boards both show up as
having LVDS but the hardware is not populated. This patch adds them to
the list of such systems. Patch is against 3.11.4
v2: Patch revised to match the D425KT exactly as the D425KTW does have
LVDS. According to Intel's documentation, the D410PTL and D410PLTW
don't.
Signed-off-by: Rob Pearce <rob@flitspace.org.uk>
Cc: stable@vger.kernel.org
[danvet: Pimp commit message to my liking and add cc: stable.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This isn't a real fix to the problem, but rather a stopgap measure while
trying to find a proper solution.
There are several laptops out there that fail to light up the eDP panel
in UEFI boot mode. They seem to be mostly IVB machines, including but
apparently not limited to Dell XPS 13, Asus TX300, Asus UX31A, Asus
UX32VD, Acer Aspire S7. They seem to work in CSM or legacy boot.
The difference between UEFI and CSM is that the BIOS provides a
different VBT to the kernel. The UEFI VBT typically specifies 18 bpp and
1.62 GHz link for eDP, while CSM VBT has 24 bpp and 2.7 GHz link. We end
up clamping to 18 bpp in UEFI mode, which we can fit in the 1.62 Ghz
link, and for reasons yet unknown fail to light up the panel.
Dithering from 24 to 18 bpp itself seems to work; if we use 18 bpp with
2.7 GHz link, the eDP panel lights up. So essentially this is a link
speed issue, and *not* a bpp clamping issue.
The bug raised its head since
commit 657445fe86
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sat May 4 10:09:18 2013 +0200
Revert "drm/i915: revert eDP bpp clamping code changes"
which started clamping bpp *before* computing the link requirements, and
thus affecting the required bandwidth. Clamping after the computations
kept the link at 2.7 GHz.
Even though the BIOS tells us to use 18 bpp through the VBT, it happily
boots up at 24 bpp and 2.7 GHz itself! Use this information to
selectively ignore the VBT provided value.
We can't ignore the VBT eDP bpp altogether, as there are other laptops
that do require the clamping to be used due to EDID reporting higher bpp
than the panel can support.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=59841
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67950
Tested-by: Ulf Winkelvos <ulf@winkelvos.de>
Tested-by: jkp <jkp@iki.fi>
CC: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Call intel_ddi_get_config() to get the pipe_bpp settings from
DDI.
The sync polarity settings from DDI are irrelevant for CRT
output, so override them with data from the ADPA register.
Note: This is already merged in drm-intel-next-queued as
commit 6801c18c0a
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Tue Sep 24 14:24:05 2013 +0300
drm/i915: Add HSW CRT output readout support
but is required for the following edp bpp bugfix.
v2: Extract intel_crt_get_flags()
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69691
Tested-by: Qingshuai Tian <qingshuai.tian@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... instead of NULL dereferences.
Spotted by coverity CID 402004.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
... due to a copy & paste error.
Spotted by coverity CID 710923.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: stable@vger.kernel.org
. Fix up /proc/PID/maps parsing, where perfectly fine mmap entries
were being trown away when synthesizing PERF_RECORD_MMAP for
preexisting threads, prevenging symbol resolution to work
for those threads, broken in the MMAP2 removal. Reported and
pinpointed by Markus Trippelsdorf,
. Fix mem leak in the python 'perf script' backend, due to missing Py_DECREFs
on dict entries, fix from Joseph Schuchart.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSbnfHAAoJENZQFvNTUqpA66kQAJUcIBvn72kmZXmw5wf0eroL
xBsgInu9rUoP1BVlaG2HVY9Sx8z9qi0bJM01UxLPypAfJD0yho4iKsnuW0u8GtMi
Ha+F7DQsWA0aspuRUVA/39ylwV+mcuu0isNdMm85xWxYB7XK6n7fyClbhqkLP+Y1
YPRsEN6H+/ZVveBeZkmh2+c/KQgIBYOQcZ2EtpCOv8Je3sgqNrXoPKmrgORvOVN5
hQg6M+VdcXtcnJ/MxlmKxUxHGTwAWRkl0GNcmYhWfULsogCX948zAAfju7+jHMbj
RbciiaZHLa1AtArEa7OU3F0HAFqPm16T3IH848vNPoaj7Y/t349f3mTt2aloKGxb
qt8uKSDgyZ9DeHLHMqcATh5GqV7UdmntXf8nSYFhwHfcqseAlw3DNsnU+aLx1Rrx
iFyq7K9ADcBLYUdFn0LPZiwqpvDNutC1OZZQgtqxQfc/WYIlmBSlAyXr/jCa8kD7
HrIeVGfI23bVKnoQjTgMMlR3lkNseiiCpPWtYFXk7YFPPVFTIGHb0ex20U83Ipbf
G4tL5CEG19FPzpC/C6NDcq9sthaDnzV2DS6hU+IUxdSgUUXaHOu4lVRUjC0ZPQ/M
apnWtrf38dLG72oFPWrY9fe+tGs7/BMYub2FPfqwdWHBa04J2LcFtKGxSZFTDaWN
eziDrrvFGOguKOh+KcHV
=/kKq
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
* Fix up /proc/PID/maps parsing, where perfectly fine mmap entries
were being trown away when synthesizing PERF_RECORD_MMAP for
preexisting threads, prevenging symbol resolution to work
for those threads, broken in the MMAP2 removal. Reported and
pinpointed by Markus Trippelsdorf,
* Fix mem leak in the python 'perf script' backend, due to missing Py_DECREFs
on dict entries, fix from Joseph Schuchart.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When introducing support for MMAP2 we considered more parts of each map
representation in /proc/PID/maps, and when disabling it we forgot to
reduce the number of expected parsed/assigned entries in the sscanf
call, fix it to expect the right number of desired fields, 5.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Based-on-a-patch-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-vrbo1wik997ahjzl1chm3bdm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On CTG+ read out the pipe bpp setting from hardware and fill it into
pipe config. Also check it appropriately.
v2: Don't do the pipe_bpp extraction inside the PCH only code block on
ILK+.
Avoid the PIPECONF read as we already have read it for the
PIPECONF_EANBLE check.
Note: This is already in drm-intel-next-queued as
commit 42571aefaf
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Fri Sep 6 23:29:00 2013 +0300
drm/i915: Add support for pipe_bpp readout
but is needed for the following bugfix.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the removal of the routing cache, we lost the
option to tweak the garbage collector threshold
along with the maximum routing cache size. So git
commit 703fb94ec ("xfrm: Fix the gc threshold value
for ipv4") moved back to a static threshold.
It turned out that the current threshold before we
start garbage collecting is much to small for some
workloads, so increase it from 1024 to 32768. This
means that we start the garbage collector if we have
more than 32768 dst entries in the system and refuse
new allocations if we are above 65536.
Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Two if statements do the same work, we can merge them to
one. And fix some typos. There is just code simplification,
no functional changes.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kmem_cache_zalloc had set the allocated memory to zero. I think no need
to initialize with 0. And move the comments to the function begin.
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix some typos
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While investigating on a recent vxlan regression, I found veth
was using a zero features set for vxlan tunnels.
We have to segment GSO frames, copy the payload, and do the checksum.
This patch brings a ~200% performance increase
We probably have to add hw_enc_features support
on other virtual devices.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei reported a performance regression on vxlan, caused
by commit 3347c96029 "ipv4: gso: make inet_gso_segment() stackable"
GSO vxlan packets were not properly segmented, adding IP fragments
while they were not expected.
Rename 'bool tunnel' to 'bool encap', and add a new boolean
to express the fact that UDP should be fragmented.
This fragmentation is triggered by skb->encapsulation being set.
Remove a "skb->encapsulation = 1" added in above commit,
as its not needed, as frags inherit skb->frag from original
GSO skb.
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a socket is freed/reallocated, we need to clear time_next_packet
or else we can inherit a prior value and delay first packets of the
new flow.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 4d961a101e, reversing
changes made to a00f6fcc7d.
Revert bond locking changes, they cause regressions and Veaceslav Falico
doesn't like how the commit messages were done at all.
Signed-off-by: David S. Miller <davem@davemloft.net>
Includes:
- ndo_busy_poll implementation
- Locking between napi and busy_poll
- Fix rx_post_starvation (replenish rx-queues in out-of-mememory scenario)
logic to accomodate busy_poll.
v2 changes:
[Eric D.'s comment] call alloc_pages() with GFP_ATOMIC even in ndo_busy_poll
context as it is not allowed to sleep.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various typo fixes to netdev-FAQ.txt:
- capitalize Linux
- hyphenate dual-word adjectives
- minor punctuation fixes
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch ed08495c3 "tcp: use RTT from SACK for RTO" always re-arms RTO upon
obtaining a RTT sample from newly sacked data.
But technically RTO should only be re-armed when the data sent before
the last (re)transmission of write queue head are (s)acked. Otherwise
the RTO may continue to extend during loss recovery on data sent
in the future.
Note that RTTs from ACK or timestamps do not have this problem, as the RTT
source must be from data sent before.
The new RTO re-arm policy is
1) Always re-arm RTO if SND.UNA is advanced
2) Re-arm RTO if sack RTT is available, provided the sacked data was
sent before the last time write_queue_head was sent.
Signed-off-by: Larry Brakmo <brakmo@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch ed08495c3 "tcp: use RTT from SACK for RTO" has a bug that
it does not check if the ACK acknowledge new data before taking
the RTT sample from TCP timestamps. This patch adds the check
back as required by the RFC.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->lsndtime may not always be the SYNACK timestamp if a passive
Fast Open socket sends data before handshake completes. And if the
remote acknowledges both the data and the SYNACK, the RTT sample
is already taken in tcp_ack(), so no need to call
tcp_update_ack_rtt() in tcp_synack_rtt_meas() aagain.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On very old FW versions < 4.0, the mailbox command to set interrupts
on the card succeeds even though it is not supported and should have
failed, leading to a scenario where interrupts do not work.
Hence warn users to upgrade to a suitable FW version to avoid seeing
broken functionality.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ding Tianhong says:
====================
bonding: patchset for rcu use in bonding
The slave list will add and del by bond_master_upper_dev_link() and
bond_upper_dev_unlink(), which will call call_netdevice_notifiers(),
even it is safe to call it in write bond lock now, but we can't sure
that whether it is safe later, because other drivers may deal
NETDEV_CHANGEUPPER in sleep way, so I didn't admit move the
bond_upper_dev_unlink() in write bond lock.
now the bond_for_each_slave only protect by rtnl_lock(), maybe use
bond_for_each_slave_rcu is a good way to protect slave list for bond,
but as a system slow path, it is no need to transform
bond_for_each_slave() to bond_for_each_slave_rcu() in slow path, so in
the patchset, I will remove the unused read bond lock for monitor
function, maybe it is a better way, I will wait to accept any relay
for it.
Thanks for the Veaceslav Falico opinion.
v2: add and modify commit for patchset and patch, it will be the first
step for the whole patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and add the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull parisc fix from Helge Deller:
"This is a 2-line patch to save the CPU register which holds our task
thread info pointer before calling a firmware function and then to
restore it again afterwards.
This is necessary because on some 64bit machines the high-order 32bits
are being clobbered by the firmware call, and thus we failed to bring
up secondary CPUs (and instead crashed the kernel) in some situations
eg if we had more than 4GB RAM. This patch fixes a bug which has been
since ever in the parisc linux kernel and which prevented some people
to use a 64bit kernel"
* 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM
Pull timer fix from Ingo Molnar:
"This tree contains a clockevents regression fix for certain ARM
subarchitectures"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents: Sanitize ticks to nsec conversion
Pull perf fixes from Ingo Molnar:
"The tree contains three fixes:
- Two tooling fixes
- Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
this merge window. (Patches were proposed to fix it but it was all
a bit late and we felt it's safer to just delay the ABI one more
kernel release and do it right)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Disable PERF_RECORD_MMAP2 support
perf scripting perl: Fix build error on Fedora 12
perf probe: Fix to initialize fname always before use it
Pull locking fix from Ingo Molnar:
"This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
kernels built with GCC 3.x (there are still such distros)"
Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Avoid gcc version dependent __builtin_constant_p() usage