Commit graph

77774 commits

Author SHA1 Message Date
Ard Biesheuvel
44511fb9e5 efi: Use correct type for struct efi_memory_map::phys_map
We have been getting away with using a void* for the physical
address of the UEFI memory map, since, even on 32-bit platforms
with 64-bit physical addresses, no truncation takes place if the
memory map has been allocated by the firmware (which only uses
1:1 virtually addressable memory), which is usually the case.

However, commit:

  0f96a99dab ("efi: Add "efi_fake_mem" boot option")

adds code that clones and modifies the UEFI memory map, and the
clone may live above 4 GB on 32-bit platforms.

This means our use of void* for struct efi_memory_map::phys_map has
graduated from 'incorrect but working' to 'incorrect and
broken', and we need to fix it.

So redefine struct efi_memory_map::phys_map as phys_addr_t, and
get rid of a bunch of casts that are now unneeded.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: izumi.taku@jp.fujitsu.com
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: linux-efi@vger.kernel.org
Cc: matt.fleming@intel.com
Link: http://lkml.kernel.org/r/1445593697-1342-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-28 12:28:06 +01:00
Ingo Molnar
790a2ee242 * Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway - Paul Gortmaker
 
  * Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64 - Leif Lindholm
 
  * Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses - Matt Fleming
 
  * Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel
 
  * Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module - Ben Hutchings
 
  * Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support - Taku Izumi
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
 +OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
 H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
 2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
 ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
 0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
 zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
 pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
 v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
 dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
 v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
 8xb/rET4JrhCG4gFMUZ7
 =1Oee
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi

Pull v4.4 EFI updates from Matt Fleming:

  - Make the EFI System Resource Table (ESRT) driver explicitly
    non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway. (Paul Gortmaker)

  - Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64. (Leif Lindholm)

  - Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses. (Matt Fleming)

  - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel

  - Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module. (Ben Hutchings)

  - Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support. (Taku Izumi)

Note: there is a semantic conflict between the following two commits:

  8a53554e12 ("x86/efi: Fix multiple GOP device support")
  ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")

I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:51:34 +02:00
Ingo Molnar
c7d77a7980 Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:05:18 +02:00
Taku Izumi
0f96a99dab efi: Add "efi_fake_mem" boot option
This patch introduces new boot option named "efi_fake_mem".
By specifying this parameter, you can add arbitrary attribute
to specific memory range.
This is useful for debugging of Address Range Mirroring feature.

For example, if "efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000"
is specified, the original (firmware provided) EFI memmap will be
updated so that the specified memory regions have
EFI_MEMORY_MORE_RELIABLE attribute (0x10000):

 <original>
   efi: mem36: [Conventional Memory|  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000100000000-0x00000020a0000000) (129536MB)

 <updated>
   efi: mem36: [Conventional Memory|  |MR|  |  |  |   |WB|WT|WC|UC] range=[0x0000000100000000-0x0000000180000000) (2048MB)
   efi: mem37: [Conventional Memory|  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000180000000-0x00000010a0000000) (61952MB)
   efi: mem38: [Conventional Memory|  |MR|  |  |  |   |WB|WT|WC|UC] range=[0x00000010a0000000-0x0000001120000000) (2048MB)
   efi: mem39: [Conventional Memory|  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000001120000000-0x00000020a0000000) (63488MB)

And you will find that the following message is output:

   efi: Memory: 4096M/131455M mirrored memory

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:09 +01:00
Ard Biesheuvel
a104171334 efi: Introduce EFI_NX_PE_DATA bit and set it from properties table
UEFI v2.5 introduces a runtime memory protection feature that splits
PE/COFF runtime images into separate code and data regions. Since this
may require special handling by the OS, allocate a EFI_xxx bit to
keep track of whether this feature is currently active or not.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:07 +01:00
Ard Biesheuvel
bf924863c9 efi: Add support for UEFIv2.5 Properties table
Version 2.5 of the UEFI spec introduces a new configuration table
called the 'EFI Properties table'. Currently, it is only used to
convey whether the Memory Protection feature is enabled, which splits
PE/COFF images into separate code and data memory regions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:07 +01:00
Matt Fleming
ae2ee627dc efifb: Add support for 64-bit frame buffer addresses
The EFI Graphics Output Protocol uses 64-bit frame buffer addresses
but these get truncated to 32-bit by the EFI boot stub when storing
the address in the 'lfb_base' field of 'struct screen_info'.

Add a 'ext_lfb_base' field for the upper 32-bits of the frame buffer
address and set VIDEO_TYPE_CAPABILITY_64BIT_BASE when the field is
useable.

It turns out that the reason no one has required this support so far
is that there's actually code in tianocore to "downgrade" PCI
resources that have option ROMs and 64-bit BARS from 64-bit to 32-bit
to cope with legacy option ROMs that can't handle 64-bit addresses.
The upshot is that basically all GOP devices in the wild use a 32-bit
frame buffer address.

Still, it is possible to build firmware that uses a full 64-bit GOP
frame buffer address. Chad did, which led to him reporting this issue.

Add support in anticipation of GOP devices using 64-bit addresses more
widely, and so that efifb works out of the box when that happens.

Reported-by: Chad Page <chad.page@znyx.com>
Cc: Pete Hawkins <pete.hawkins@znyx.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:06 +01:00
Leif Lindholm
7968c0e338 efi/arm64: Clean up efi_get_fdt_params() interface
As we now have a common debug infrastructure between core and arm64 efi,
drop the bit of the interface passing verbose output flags around.

Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:06 +01:00
e3d6e0e701 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "Three trivial commits:

   - Fix a kerneldoc regression

   - Export handle_bad_irq to unbreak a driver in next

   - Add an accessor for the of_node field so refactoring in next does
     not depend on merge ordering"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqdomain: Add an accessor for the of_node field
  genirq: Fix handle_bad_irq kerneldoc comment
  genirq: Export handle_bad_irq
2015-10-11 10:16:59 -07:00
4a06c8ac2f USB fixes for 4.3-rc5
Here are some small USB and PHY fixes and quirk updates for 4.3-rc5.
 Nothing major here, full details in the shortlog, and all of these have
 been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlYZOFUACgkQMUfUDdst+ymLSACeLNl7IWSxq2acJ5rhUl5+LRxp
 KtsAn3lMXJryk4xw2WpfJg30TXpWXnNM
 =n9ei
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB and PHY fixes and quirk updates for 4.3-rc5.

  Nothing major here, full details in the shortlog, and all of these
  have been in linux-next for a while"

* tag 'usb-4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: Add device quirk for Logitech PTZ cameras
  USB: chaoskey read offset bug
  USB: Add reset-resume quirk for two Plantronics usb headphones.
  usb: renesas_usbhs: Add support for R-Car H3
  usb: renesas_usbhs: fix build warning if 64-bit architecture
  usb: gadget: bdc: fix memory leak
  phy: berlin-sata: Fix module autoload for OF platform driver
  phy: rockchip-usb: power down phy when rockchip phy probe
  phy: qcom-ufs: fix build error when the component is built as a module
2015-10-10 11:17:45 -07:00
Marc Zyngier
10abc7df92 irqdomain: Add an accessor for the of_node field
As we're about to remove the of_node field from the irqdomain
structure, introduce an accessor for it. Subsequent patches
will take care of the actual repainting.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1444402211-1141-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09 17:17:30 +02:00
f6702681a0 xen: bug fixes for 4.3-rc4
- Fix VM save performance regression with x86 PV guests.
 - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI).
 - Other minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWE8wQAAoJEFxbo/MsZsTRVTMH/0eqSg2M78wv4sBl234Y3FE9
 AN8KFUdlkK7VN9v0uuLMDSKIWNUuFJIvo/2rElWGRiX2Q+/pfnQg3ZSFhub9S8uL
 T4LCvmG9viRFb2oUz792ewqncSw3X98Jpto4smA820gJRjndBSWm5HUKUtPAkv1M
 l5DFMEgOeHbu+wCbKD/ZPEt5K9GsIaNviSNoWtYHirZwrd00oLmNbWp+g8lIGQiT
 3vLW0SaZzjL6akKxihb/p3WZ9eNmyz8yk0V7dItUEVUB9qoaDDLJ5qIRSHHWTWQD
 Jza/GE32VallZLuEXGG5/D86MsnyVYHC+lZtwo2IptOGm8v7WuZRv094wI1ev5c=
 =aiDw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - Fix VM save performance regression with x86 PV guests

 - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI)

 - Other minor fixes.

* tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen/p2m: hint at the last populated P2M entry
  x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map
  x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
  xen/x86: Don't try to write syscall-related MSRs for PV guests
  xen: use correct type for HYPERVISOR_memory_op()
2015-10-06 15:05:02 +01:00
30c44659f4 Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.

Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.

The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.

strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result.  To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.

strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string.  Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated.  It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.

strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG.  It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.

So why did I waffle about this for so long?

Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.

And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.

So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches.  Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.

* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: use global strscpy() rather than private copy
  string: provide strscpy()
  Make asm/word-at-a-time.h available on all architectures
2015-10-04 16:31:13 +01:00
14f97d9713 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Bunch of fixes all over the place, all pretty small: amdgpu, i915,
  exynos, one qxl and one vmwgfx.

  There is also a bunch of mst fixes, I left some cleanups in the series
  as I didn't think it was worth splitting up the tested series"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
  drm/dp/mst: add some defines for logical/physical ports
  drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
  drm/dp/mst: split connector registration into two parts (v2)
  drm/dp/mst: update the link_address_sent before sending the link address (v3)
  drm/dp/mst: fixup handling hotplug on port removal.
  drm/dp/mst: don't pass port into the path builder function
  drm/radeon: drop radeon_fb_helper_set_par
  drm: handle cursor_set2 in restore_fbdev_mode
  drm/exynos: Staticize local function in exynos_drm_gem.c
  drm/exynos: fimd: actually disable dp clock
  drm/exynos: dp: remove suspend/resume functions
  drm/qxl: recreate the primary surface when the bo is not primary
  drm/amdgpu: only print meaningful VM faults
  drm/amdgpu/cgs: remove import_gpu_mem
  drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2
  drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
  drm/vmwgfx: Fix a command submission hang regression
  drm/exynos: remove unused mode_fixup() code
  drm/exynos: remove decon_mode_fixup()
  drm/exynos: remove fimd_mode_fixup()
  ...
2015-10-03 10:39:31 -04:00
27728bf04b Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Another week, another round of fixes.

  These have been brewing for a bit and in various iterations, but I
  feel pretty comfortable about the quality of them.  They fix real
  issues.  The pull request is mostly blk-mq related, and the only one
  not fixing a real bug, is the tag iterator abstraction from Christoph.
  But it's pretty trivial, and we'll need it for another fix soon.

  Apart from the blk-mq fixes, there's an NVMe affinity fix from Keith,
  and a single fix for xen-blkback from Roger fixing failure to free
  requests on disconnect"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: factor out a helper to iterate all tags for a request_queue
  blk-mq: fix racy updates of rq->errors
  blk-mq: fix deadlock when reading cpu_list
  blk-mq: avoid inserting requests before establishing new mapping
  blk-mq: fix q->mq_usage_counter access race
  blk-mq: Fix use after of free q->mq_map
  blk-mq: fix sysfs registration/unregistration race
  blk-mq: avoid setting hctx->tags->cpumask before allocation
  NVMe: Set affinity after allocating request queues
  xen/blkback: free requests on disconnection
2015-10-02 14:40:57 -04:00
8c25ab8b5a Merge git://git.infradead.org/intel-iommu
Pull IOVA fixes from David Woodhouse:
 "The main fix here is the first one, fixing the over-allocation of
   size-aligned requests.  The other patches simply make the existing
  IOVA code available to users other than the Intel VT-d driver, with no
  functional change.

  I concede the latter really *should* have been submitted during the
  merge window, but since it's basically risk-free and people are
  waiting to build on top of it and it's my fault I didn't get it in, I
  (and they) would be grateful if you'd take it"

* git://git.infradead.org/intel-iommu:
  iommu: Make the iova library a module
  iommu: iova: Export symbols
  iommu: iova: Move iova cache management to the iova library
  iommu/iova: Avoid over-allocating when size-aligned
2015-10-02 07:59:29 -04:00
Dave Airlie
ccf03d6995 drm/dp/mst: add some defines for logical/physical ports
This just removes the magic number.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-02 15:34:42 +10:00
Dave Airlie
d9515c5ec1 drm/dp/mst: split connector registration into two parts (v2)
In order to cache the EDID properly for tiled displays, we
need to retrieve it before we register the connector with
userspace, otherwise userspace can call get resources
and try and get the edid before we've even cached it.

This fixes some problems when hotplugging mst monitors,
with X/mutter running. As mutter seems to get 0 modes
for one of the monitors in the tile.

v2: fix warning in radeon
handle tile setting in cached path rather than
get edid path.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-02 15:34:41 +10:00
bde17b90dd Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "12 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dmapool: fix overflow condition in pool_find_page()
  thermal: avoid division by zero in power allocator
  memcg: remove pcp_counter_lock
  kprobes: use _do_fork() in samples to make them work again
  drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE
  memcg: make mem_cgroup_read_stat() unsigned
  memcg: fix dirty page migration
  dax: fix NULL pointer in __dax_pmd_fault()
  mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
  mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)
  userfaultfd: remove kernel header include from uapi header
  arch/x86/include/asm/efi.h: fix build failure
2015-10-01 22:20:11 -04:00
1bca1000fa Power management and ACPI material for v4.3-rc4
- intel_idle driver fixup for the recently added Skylake chips
    support (Len Brown).
 
  - Operating Performance Points (OPP) library fix related to the
    recently added support for new DT bindings and a fix for a typo
    in a comment (Viresh Kumar, Stephen Boyd).
 
  - ACPI EC driver fix for a recently introduced memory leak in an
    error code path (Lv Zheng).
 
  - ACPI PCI IRQ management fix for the issue where an ISA IRQ is
    shared with a PCI device which requires it to be configured in a
    different way and may cause an interrupt storm to happen as a
    result with an extra ACPI SCI IRQ handling simplification on top
    of it (Jiang Liu).
 
  - Update of the PCI power management documentation that became
    outdated and started to actively confuse the readers to make
    it actually reflect the code (Rafael J Wysocki).
 
  - turbostat fixes including an IVB Xeon regression fix (related to
    the --debug command line option), Skylake adjustment for the TSC
    running at a frequency that doesn't match the base one exactly,
    and a Knights Landing quirk to account for the fact that it only
    updates APERF and MPERF every 1024 clock cycles plus bumping up
    the turbostat version number (Len Brown, Hubert Chrzaniuk).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJWDag4AAoJEILEb/54YlRxy/UQAJa39EC2IQd+PrMlgMx3cp2N
 ssotwuQiQ0jL2V/qc36wfzgu3A5k0ldHHQGbgX0f/z9LjD+zLsZiPtHj27LrNtG5
 J9DgViLh9vut4XEsLlzj8W2z1OcTyAmZyTIiVeFlj/zM517oeXKVYMX2RuhHQk0r
 lwDI/hc1rtpUkdN7gkT9DqyO32r1LgNkDt6+ubRr/qrYVhYPXSrp4k9wxnr9j1Bx
 0G9bvCz8ETTclRPcfToGU9P86snk5FS3veSm231ioABdry7BxhTZHjQKSZyjuvx4
 l8YedxBc0ks7yyeN9lvWPbNSpHLjhYen+d9q1koQsHJYb+gWJ/KbSGu3kfg0bPDj
 Rzh1u76ak7MOYpkn+95MRhzIiFxG3IhUoqYhIGGyCNFGAJgPfFos2IJTISAxSmTE
 ebCyFEX07AdhjHac4RyRCnMVavZthgLyXHwXiNqG9gdW9aOEzN65svH2LLMBiKcH
 IGRCsjom1uCUT0y1gy3R7q1nTCi112IcXwvAziX7QKCNOxLIH8HJNiraVcyl2vY5
 BbDyTOQ7VboviWWSQ09+bQFq4CAhe4b9+nR4XhvHO9F0ffxBujBoCwjjFQY+yJIH
 9nYaYyUynpi1m0Y1AwlrI8wgVLDfNEE6UU63clHQ2PoOFfDDE+/5I/l3yuWubo0I
 cUtW1RVEgDaa61ehyFuS
 =ELup
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are fixes mostly, for a few changes made in this cycle (the
  intel_idle driver, the OPP library, the ACPI EC driver, turbostat) and
  for some issues that have just been discovered (ACPI PCI IRQ
  management, PCI power management documentation, turbostat), with a
  couple of cleanups on top of them.

  Specifics:

   - intel_idle driver fixup for the recently added Skylake chips
     support (Len Brown).

   - Operating Performance Points (OPP) library fix related to the
     recently added support for new DT bindings and a fix for a typo in
     a comment (Viresh Kumar, Stephen Boyd).

   - ACPI EC driver fix for a recently introduced memory leak in an
     error code path (Lv Zheng).

   - ACPI PCI IRQ management fix for the issue where an ISA IRQ is
     shared with a PCI device which requires it to be configured in a
     different way and may cause an interrupt storm to happen as a
     result with an extra ACPI SCI IRQ handling simplification on top of
     it (Jiang Liu).

   - Update of the PCI power management documentation that became
     outdated and started to actively confuse the readers to make it
     actually reflect the code (Rafael J Wysocki).

   - turbostat fixes including an IVB Xeon regression fix (related to
     the --debug command line option), Skylake adjustment for the TSC
     running at a frequency that doesn't match the base one exactly, and
     a Knights Landing quirk to account for the fact that it only
     updates APERF and MPERF every 1024 clock cycles plus bumping up the
     turbostat version number (Len Brown, Hubert Chrzaniuk)"

* tag 'pm+acpi-4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  tools/power turbosat: update version number
  tools/power turbostat: SKL: Adjust for TSC difference from base frequency
  tools/power turbostat: KNL workaround for %Busy and Avg_MHz
  tools/power turbostat: IVB Xeon: fix --debug regression
  ACPI / PCI: Remove duplicated penalty on SCI IRQ
  ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
  ACPI / EC: Fix a memory leak issue in acpi_ec_query()
  PM / OPP: Fix typo modifcation -> modification
  PCI / PM: Update runtime PM documentation for PCI devices
  PM / OPP: of_property_count_u32_elems() can return errors
  intel_idle: Skylake Client Support - updated
2015-10-01 22:06:40 -04:00
3deaa4f531 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

1) Fix regression in SKB partial checksum handling, from Pravin B
   Shalar.

2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse
   Brandeburg.

3) Cure softlockups during accept() in SCTP, from Karl Heiss.

4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from
   Aaron Conole.

5) IPV6 erroneously ignores output interface specifier in lookup key for
   route lookups, fix from David Ahern.

6) In Marvell DSA driver, forward unknown frames to CPU port, from
   Andrew Lunn.

7) Mission flow flag initializations in some code paths, from David
   Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: Initialize flow flags in input path
  net: dsa: fix preparation of a port STP update
  testptp: Silence compiler warnings on ppc64
  net/mlx4: Handle return codes in mlx4_qp_attach_common
  dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port
  skbuff: Fix skb checksum partial check.
  net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
  net sysfs: Print link speed as signed integer
  bna: fix error handling
  af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
  af_unix: Convert the unix_sk macro to an inline function for type safety
  net: sctp: Don't use 64 kilobyte lookup table for four elements
  l2tp: protect tunnel->del_work by ref_count
  net/ibm/emac: bump version numbers for correct work with ethtool
  sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
  sctp: Whitespace fix
  i40e/i40evf: check for stopped admin queue
  i40e: fix VLAN inside VXLAN
  r8169: fix handling rtl_readphy result
  net: hisilicon: fix handling platform_get_irq result
2015-10-01 21:55:35 -04:00
Greg Thelen
ef510194ce memcg: remove pcp_counter_lock
Commit 733a572e66 ("memcg: make mem_cgroup_read_{stat|event}() iterate
possible cpus instead of online") removed the last use of the per memcg
pcp_counter_lock but forgot to remove the variable.

Kill the vestigial variable.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01 21:42:35 -04:00
Greg Thelen
0610c25daa memcg: fix dirty page migration
The problem starts with a file backed dirty page which is charged to a
memcg.  Then page migration is used to move oldpage to newpage.

Migration:
 - copies the oldpage's data to newpage
 - clears oldpage.PG_dirty
 - sets newpage.PG_dirty
 - uncharges oldpage from memcg
 - charges newpage to memcg

Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
count.

However, because newpage is not yet charged, setting newpage.PG_dirty
does not increment the memcg's dirty page count.  After migration
completes newpage.PG_dirty is eventually cleared, often in
account_page_cleaned().  At this time newpage is charged to a memcg so
the memcg's dirty page count is decremented which causes underflow
because the count was not previously incremented by migration.  This
underflow causes balance_dirty_pages() to see a very large unsigned
number of dirty memcg pages which leads to aggressive throttling of
buffered writes by processes in non root memcg.

This issue:
 - can harm performance of non root memcg buffered writes.
 - can report too small (even negative) values in
   memory.stat[(total_)dirty] counters of all memcg, including the root.

To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
page_memcg() and set_page_memcg() helpers.

Test:
    0) setup and enter limited memcg
    mkdir /sys/fs/cgroup/test
    echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/test/cgroup.procs

    1) buffered writes baseline
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    2) buffered writes with compaction antagonist to induce migration
    yes 1 > /proc/sys/vm/compact_memory &
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    kill %
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    3) buffered writes without antagonist, should match baseline
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

                       (speed, dirty residue)
             unpatched                       patched
    1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
    2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
    3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages

    Notice that unpatched baseline performance (1) fell after
    migration (3): 841 -> 114 MB/s.  In the patched kernel, post
    migration performance matches baseline.

Fixes: c4843a7593 ("memcg: add per cgroup dirty page accounting")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01 21:42:35 -04:00
Andre Przywara
9ff42d10c3 userfaultfd: remove kernel header include from uapi header
As include/uapi/linux/userfaultfd.h is a user visible header file, it
should not include kernel-exclusive header files.

So trying to build the userfaultfd test program from the selftests
directory fails, since it contains a reference to linux/compiler.h.  As
it turns out, that header is not really needed there, so we can simply
remove it to fix that issue.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01 21:42:35 -04:00
46c8217c4a Changes for 4.3-rc4
- Fixes for mlx5 related issues
 - Fixes for ipoib multicast handling
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWCfALAAoJELgmozMOVy/dc+MQAKoD6echYpTkWE0otMuHQcYf
 zMaVVots+JdRKpA6OqHYQHgKGA80z21BpnjGYwcwB5zB1zPrJwz4vxwGlOBHt01T
 xLBReFgSKyJlgOWLXKfPx4bXUdivOBKm203wY0dh+/dC/VROGYoiXYTmSDsfsuKa
 8OXT1kWgzRVLtqwqj5GSkgWvtFZ28CjKh6d9egjqcj9tpbh2UupQDZzMyOtZ52X6
 Nz/Vo3u4T7qjzlhHOlCwHCDw+97x0yvmvLY1mWweGPfKOnxtXjkzQmTQEpyzU5Mo
 EwcqJucrBnmjbLAIBMrbR1mzTUQeD4dHz1jx+EzWE0lVnRL3twe1UaY40176sNlm
 aCBA4bIOQ242r3IJ++ss15ol1k5hu7PYKRn9Q8d2sSbQGcSnCHe/YOutQQ+FTEFG
 yE9xiLL+pgT8koauROnxg66E3HDM78NGTpjP3EuG4r2Qwa1iFANPfDB6kikuv8bO
 rG3qUJcloEPvfatZY+h5QC4UCoB0/W1DAhlfzE3tPBYPmhSEgQDfEOzXTKDakeF0
 VB903bYrOL3CVOun4I7fLrDc1leVeiAUKqO2orZs3qIpRWvAKyV/VjolAusMv2+F
 /4xPyh95AEMTFfmZogOCofQFk3eOnkWpLdrVTYCKy3i6NVBoy2wHldrl+LuCAN/m
 r/DNRBmazShashbeU6wg
 =8+cX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 - Fixes for mlx5 related issues
 - Fixes for ipoib multicast handling

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/ipoib: increase the max mcast backlog queue
  IB/ipoib: Make sendonly multicast joins create the mcast group
  IB/ipoib: Expire sendonly multicast joins
  IB/mlx5: Remove pa_lkey usages
  IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY
  IB/iser: Add module parameter for always register memory
  xprtrdma: Replace global lkey with lkey local to PD
2015-10-01 16:38:52 -04:00
Rafael J. Wysocki
dd953d318d Merge branches 'pm-pci' and 'acpi-pci'
* pm-pci:
  PCI / PM: Update runtime PM documentation for PCI devices

* acpi-pci:
  ACPI / PCI: Remove duplicated penalty on SCI IRQ
  ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
2015-10-01 22:30:12 +02:00
Christoph Hellwig
0bf6cd5b95 blk-mq: factor out a helper to iterate all tags for a request_queue
And replace the blk_mq_tag_busy_iter with it - the driver use has been
replaced with a new helper a while ago, and internal to the block we
only need the new version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-01 10:10:57 +02:00
Christoph Hellwig
f4829a9b7a blk-mq: fix racy updates of rq->errors
blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.

Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we known we won the race to complete the
request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-01 10:10:55 +02:00
Yoshihiro Shimoda
9ae7ce00cc usb: renesas_usbhs: fix build warning if 64-bit architecture
This patch fixes the following warning if 64-bit architecture environment:

./drivers/usb/renesas_usbhs/common.c:496:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  dparam->type = of_id ? (u32)of_id->data : 0;

Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2015-09-30 11:21:03 -05:00
Egbert Eich
4ad640e99e drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
drm_kms_helper_poll_enable() was converted to lock the mode_config
mutex in commit 8c4ccc4ab6
("drm/probe-helper: Grab mode_config.mutex in poll_init/enable").

This disregarded the cases where this function is called from a context
where this mutex is already locked.

Add a non-locking version as well.

Changes since v1:
- use function name suffix '_locked' for the function that
  is to be called from a locked context.

Signed-off-by: Egbert Eich <eich@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-09-30 16:04:08 +03:00
Pravin B Shelar
31b33dfb0a skbuff: Fix skb checksum partial check.
Earlier patch 6ae459bda tried to detect void ckecksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb->data.

Following patch fixes it by validating checksum start offset
after skb-data pointer is updated. Negative value of checksum
offset start means there is no need to checksum.

Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29 16:48:46 -07:00
Aaron Conole
4613012db1 af_unix: Convert the unix_sk macro to an inline function for type safety
As suggested by Eric Dumazet this change replaces the
#define with a static inline function to enjoy
complaints by the compiler when misusing the API.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29 13:47:07 -07:00
Akinobu Mita
4593fdbe7a blk-mq: fix sysfs registration/unregistration race
There is a race between cpu hotplug handling and adding/deleting
gendisk for blk-mq, where both are trying to register and unregister
the same sysfs entries.

null_add_dev
    --> blk_mq_init_queue
        --> blk_mq_init_allocated_queue
            --> add to 'all_q_list' (*)
    --> add_disk
        --> blk_register_queue
            --> blk_mq_register_disk (++)

null_del_dev
    --> del_gendisk
        --> blk_unregister_queue
            --> blk_mq_unregister_disk (--)
    --> blk_cleanup_queue
        --> blk_mq_free_queue
            --> del from 'all_q_list' (*)

blk_mq_queue_reinit
    --> blk_mq_sysfs_unregister (-)
    --> blk_mq_sysfs_register (+)

While the request queue is added to 'all_q_list' (*),
blk_mq_queue_reinit() can be called for the queue anytime by CPU
hotplug callback.  But blk_mq_sysfs_unregister (-) and
blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
before blk_mq_register_disk (++) and after blk_mq_unregister_disk (--)
is finished.  Because '/sys/block/*/mq/' is not exists.

There has already been BLK_MQ_F_SYSFS_UP flag in hctx->flags which can
be used to track these sysfs stuff, but it is only fixing this issue
partially.

In order to fix it completely, we just need per-queue flag instead of
per-hctx flag with appropriate locking.  So this introduces
q->mq_sysfs_init_done which is properly protected with all_q_mutex.

Also, we need to ensure that blk_mq_map_swqueue() is called with
all_q_mutex is held.  Since hctx->nr_ctx is reset temporarily and
updated in blk_mq_map_swqueue(), so we should avoid
blk_mq_register_hctx() seeing the temporary hctx->nr_ctx value
in CPU hotplug handling or adding/deleting gendisk .

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-09-29 11:32:45 -06:00
Vitaly Kuznetsov
0b34a166f2 x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
Currently there is a number of issues preventing PVHVM Xen guests from
doing successful kexec/kdump:

  - Bound event channels.
  - Registered vcpu_info.
  - PIRQ/emuirq mappings.
  - shared_info frame after XENMAPSPACE_shared_info operation.
  - Active grant mappings.

Basically, newly booted kernel stumbles upon already set up Xen
interfaces and there is no way to reestablish them. In Xen-4.7 a new
feature called 'soft reset' is coming. A guest performing kexec/kdump
operation is supposed to call SCHEDOP_shutdown hypercall with
SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
(with some help from toolstack) will do full domain cleanup (but
keeping its memory and vCPU contexts intact) returning the guest to
the state it had when it was first booted and thus allowing it to
start over.

Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
probably OK as by default all unknown shutdown reasons cause domain
destroy with a message in toolstack log: 'Unknown shutdown reason code
5. Destroying domain.'  which gives a clue to what the problem is and
eliminates false expectations.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-28 14:48:52 +01:00
Ingo Molnar
7c4f1c694b Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull RCU fixes from Paul E. McKenney, for two regressions
introduced in this merge window:

  - Fix bug with recent GCCs.
  - Fix false positive lockdep splat.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28 08:03:52 +02:00
c91d707295 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This includes a iser-target series from Jenny + Sagi @ Mellanox that
  addresses the few remaining active I/O shutdown bugs, along with a
  patch to support zero-copy for immediate data payloads that gives a
  nice performance improvement for small block WRITEs.

  Also included are some recent >= v4.2 regression bug-fixes.  The most
  notable is a RCU conversion regression for SPC-3 PR registrations, and
  recent removal of obsolete RFC-3720 markers that introduced a login
  regression bug with MSFT iSCSI initiators.

  Thanks to everyone who has been testing + reporting bugs for v4.x"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: Avoid OFMarker + IFMarker negotiation
  target: Make TCM_WRITE_PROTECT failure honor D_SENSE bit
  target: Fix target_sense_desc_format NULL pointer dereference
  target: Propigate backend read-only to core_tpg_add_lun
  target: Fix PR registration + APTPL RCU conversion regression
  iser-target: Skip data copy if all the command data comes as immediate
  iser-target: Change the recv buffers posting logic
  iser-target: Fix pending connections handling in target stack shutdown sequnce
  iser-target: Remove np_ prefix from isert_np members
  iser-target: Remove unused variables
  iser-target: Put the reference on commands waiting for unsol data
  iser-target: remove command with state ISTATE_REMOVE
2015-09-26 21:02:42 -04:00
518a7cb698 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
    instead of cloning them.  From Daniel Borkmann.

 2) When converting classical BPF into eBPF, fix the setting of the
    source reg to BPF_REG_X.  From Tycho Andersen.

 3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
    Linus Lussing.

 4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.

 5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
    Roopa Prabhu.

 6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.

 7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
    Florian Westphal.

 8) Check DMA mapping errors in bna driver, from Ivan Vecera.

 9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
    misclearing of interrupt status in TX timeout handler, etc.) from
    David Woodhouse.

10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
    Hugne.

11) Fix autobind races et al. in netlink code, from Herbert Xu with
    help from Tejun Heo and others.

12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.

13) Fix various races in timewait timer and reqsk_queue_hadh_req, from
    Eric Dumazet.

14) Fix array overruns in mac80211, from Johannes Berg and Dan
    Carpenter.

15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.

16) Fix race between poll_one_napi and napi_disable, from Neil Horman.

17) Fix byte order in geneve tunnel port config, from John W Linville.

18) Fix handling of ARP replies over lightweight tunnels, from Jiri
    Benc.

19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
    Kok and Roopa Prabhu.

20) Several reference count handling bug fixes in the PHY/MDIO layer
    from Russel King.

21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.

22) Fix crash in icmp_route_lookup(), from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net: Fix panic in icmp_route_lookup
  net: update docbook comment for __mdiobus_register()
  ppp: fix lockdep splat in ppp_dev_uninit()
  net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
  phy: marvell: add link partner advertised modes
  net: fix net_device refcounting
  phy: add phy_device_remove()
  phy: fixed-phy: properly validate phy in fixed_phy_update_state()
  net: fix phy refcounting in a bunch of drivers
  of_mdio: fix MDIO phy device refcounting
  phy: add proper phy struct device refcounting
  phy: fix mdiobus module safety
  net: dsa: fix of_mdio_find_bus() device refcount leak
  phy: fix of_mdio_find_bus() device refcount leak
  ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
  ip6_gre: Reduce log level in ip6gre_err() to debug
  fib_rules: fix fib rule dumps across multiple skbs
  bnx2x: byte swap rss_key to comply to Toeplitz specs
  net: revert "net_sched: move tp->root allocation into fw_init()"
  lwtunnel: remove source and destination UDP port config option
  ...
2015-09-26 06:01:33 -04:00
Jiang Liu
5ebc760353 ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
Avoid IRQs occupied by ISA IRQs when allocating IRQs for PCI link devices,
otherwise it may cause interrupt storm due to incompatible pin attributes.

This issue was triggered on a KVM virtual machine, which
 1) uses IRQ9 for SCI in high level mode.
 2) defines an PCI interrupt link device (LNKS) with IRQ9 as the only
    possible irq.
 3) has an PCI device referring to link device LNKS.
So it causes interrupt storm when enabling the PCI device because PCI IRQ
works in low level mode.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-26 01:53:07 +02:00
d4a748a10e Merge branch 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull another cgroup fix from Tejun Heo:
 "The cgroup writeback support got inadvertently enabled for traditional
  hierarchies revealing two regressions which are currently being worked
  on.  It shouldn't have been enabled on traditional hierarchies, so
  disable it on them.  This is enough to make the regressions go away
  for people who aren't experimenting with cgroup"

* 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup, writeback: don't enable cgroup writeback on traditional hierarchies
2015-09-25 16:20:55 -07:00
101688f534 NFS client bugfixes for Linux 4.3
Highlights include:
 
 Stable patches:
 - fix v4.2 SEEK on files over 2 gigs
 - Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
 - Fix recovery of recalled read delegations
 
 Bugfixes:
 - Fix a case where NFSv4 fails to send CLOSE after a server reboot
 - Fix sunrpc to wait for connections to complete before retrying
 - Fix sunrpc races between transport connect/disconnect and shutdown
 - Fix an infinite loop when layoutget fail with BAD_STATEID
 - nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
 - Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
 - Fix layoutreturn/close ordering issues.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWBWKdAAoJEGcL54qWCgDy2rUP/iIWUQSpUPfKKw7xquQUQe4j
 ci4nFxpJ/zhKj1u7x3wrkxZAXcEooYo+ZJ7ayzROKcfQL/sUSWGbSLdr3mqrQynv
 b0SDmnJK9V+CdBQrA+Jp5UGQxcumpMxsAfqVznT0qkf/wDp44DCVgDz5Aj8cRbWU
 6xPfMgVLEnXiId9IgKqg3sJ2NmvMZXuI9sHM6hp6OzRmQDjTcx+LgRz7tnQHgaEk
 zGz8R6eDm3OA0wfApqZwJ6JY793HsDdy30W9L0Yi2PVGXfzwoEB8AqgLVwSDIY1B
 5hG5zn3tg9PSz9vhJ7M2h4AgFHdB3w3XGdJUafwqZEeqEIagw1iFCWlMyo/lE2dG
 G7oob9Jiiwxjc3RDWn2wGaafymrrWZwl2nYzC4O3UvJ3hVJ0mEl1iJagK1m8LzfN
 fmnP7tTyPuoOXkzDogZ0YI3FrngO6430PoR2hUPkS1yce/a+IV0HQEmXbSDSwN80
 1d9zyC9TnPj6rFjZeaGxGK17BpkC0oIQCPq4OSJB4396wzAwMqoJjJVVWWeAK4UC
 PxzoXqAAaBFguSsDbuBMcXgiuUw/7DIZ/pdzsWSiCFgocgF5ZdJdieCNtGk0nbLM
 37R7HCauF93JDrkpUMKPnLXScb2IbEh31pFtKzptJYKwMxEiScXXiP3NE9hfX65i
 2zLkl2aBvd154RvVKNbp
 =GdeV
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable patches:
   - fix v4.2 SEEK on files over 2 gigs
   - Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
   - Fix recovery of recalled read delegations

  Bugfixes:
   - Fix a case where NFSv4 fails to send CLOSE after a server reboot
   - Fix sunrpc to wait for connections to complete before retrying
   - Fix sunrpc races between transport connect/disconnect and shutdown
   - Fix an infinite loop when layoutget fail with BAD_STATEID
   - nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
   - Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
   - Fix layoutreturn/close ordering issues"

* tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS41: make close wait for layoutreturn
  NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set
  NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget
  NFSv4: Recovery of recalled read delegations is broken
  NFS: Fix an infinite loop when layoutget fail with BAD_STATEID
  NFS: Do cleanup before resetting pageio read/write to mds
  SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose
  SUNRPC: Lock the transport layer on shutdown
  nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
  SUNRPC: Ensure that we wait for connections to complete before retrying
  SUNRPC: drop null test before destroy functions
  nfs: fix v4.2 SEEK on files over 2 gigs
  SUNRPC: Fix races between socket connection and destroy code
  nfs: fix pg_test page count calculation
  Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
2015-09-25 11:33:52 -07:00
Sagi Grimberg
c6790aa9f4 IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") allows ULPs that make use of the local dma key to keep
working as before by allocating a DMA MR with local permissions and
converted these consumers to use the MR associated with the PD
rather then device->local_dma_lkey.

ConnectIB has some known issues with memory registration
using the local_dma_lkey (SEND, RDMA, RECV seems to work ok).

Thus don't expose support for it (remove device->local_dma_lkey
setting), and take advantage of the above commit such that no regression
is introduced to working systems.

The local_dma_lkey support will be restored in CX4 depending on FW
capability query.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 10:46:51 -04:00
Nicholas Bellinger
eeeb952223 target: Propigate backend read-only to core_tpg_add_lun
This patch adds a DF_READ_ONLY flag that is used by IBLOCK to
signal when a backend has been set to read-only mode, in order
to propigate read-only status up to core_tpg_add_lun() for all
future LUN fabric exports.

With this is place, existing emulation for reporting read-only
in spc_emulate_modesense() and normal transport_lookup_cmd_lun()
TCM_WRITE_PROTECTED status checking just works as expected.

Reported-by: Joeue Deng <joeue404@gmail.com>
Reported-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-24 23:17:21 -07:00
Russell King
38737e490d phy: add phy_device_remove()
Add a phy_device_remove() function to complement phy_device_register(),
which undoes the effects of phy_device_register() by removing the phy
device from visibility, but not freeing it.

This allows these details to be moved out of the mdio bus code into
the phy code where this action belongs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 23:04:53 -07:00
Russell King
3e3aaf6494 phy: fix mdiobus module safety
Re-implement the mdiobus module refcounting to ensure that we actually
ensure that the mdiobus module code does not go away while we might call
into it.

The old scheme using bus->dev.driver was buggy, because bus->dev is a
class device which never has a struct device_driver associated with it,
and hence the associated code trying to obtain a refcount did nothing
useful.

Instead, take the approach that other subsystems do: pass the module
when calling mdiobus_register(), and record that in the mii_bus struct.
When we need to increment the module use count in the phy code, use
this stored pointer.  When the phy is deteched, drop the module
refcount, remembering that the phy device might go away at that point.

This doesn't stop the mii_bus going away while there are in-use phys -
it merely stops the underlying code vanishing.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 23:04:52 -07:00
ced255c0c5 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:

 - Power allocator governor changes to allow binding on thermal zones
   with missing power estimates information.  From Javi Merino.

 - Add compile test flags on thermal drivers that allow it without
   producing compilation errors.  From Eduardo Valentin.

 - Fixes around memory allocation on cpu_cooling.  From Javi Merino.

 - Fix on db8500 cpufreq code to allow autoload.  From Luis de
   Bethencourt.

 - Maintainer entries for cpu cooling device

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal: power_allocator: exit early if there are no cooling devices
  thermal: power_allocator: don't require tzp to be present for the thermal zone
  thermal: power_allocator: relax the requirement of two passive trip points
  thermal: power_allocator: relax the requirement of a sustainable_power in tzp
  thermal: Add a function to get the minimum power
  thermal: cpu_cooling: free power table on error or when unregistering
  thermal: cpu_cooling: don't call kcalloc() under rcu_read_lock
  thermal: db8500_cpufreq_cooling: Fix module autoload for OF platform driver
  thermal: cpu_cooling: Add MAINTAINERS entry
  thermal: ti-soc: Kconfig fix to avoid menu showing wrongly
  thermal: ti-soc: allow compile test
  thermal: qcom_spmi: allow compile test
  thermal: exynos: allow compile test
  thermal: armada: allow compile test
  thermal: dove: allow compile test
  thermal: kirkwood: allow compile test
  thermal: rockchip: allow compile test
  thermal: spear: allow compile test
  thermal: hisi: allow compile test
  thermal: Fix thermal_zone_of_sensor_register to match documentation
2015-09-24 20:14:26 -07:00
bfbaa60d18 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  ocfs2/dlm: fix deadlock when dispatch assert master
  membarrier: clean up selftest
  vmscan: fix sane_reclaim helper for legacy memcg
  lib/iommu-common.c: do not try to deref a null iommu->lazy_flush() pointer when n < pool->hint
  x86, efi, kasan: #undef memset/memcpy/memmove per arch
  mm: migrate: hugetlb: putback destination hugepage to active list
  mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified
  userfaultfd: register uapi generic syscall (aarch64)
  userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical
  userfaultfd: selftest: return an error if BOUNCE_VERIFY fails
  userfaultfd: selftest: avoid my_bcmp false positives with powerpc
  userfaultfd: selftest: only warn if __NR_userfaultfd is undefined
  userfaultfd: selftest: headers fixup
  userfaultfd: selftests: vm: pick up sanitized kernel headers
  userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"
2015-09-24 14:31:40 -07:00
Jiri Benc
b194f30c61 lwtunnel: remove source and destination UDP port config option
The UDP tunnel config is asymmetric wrt. to the ports used. The source and
destination ports from one direction of the tunnel are not related to the
ports of the other direction. We need to be able to respond to ARP requests
using the correct ports without involving routing.

As the consequence, UDP ports need to be fixed property of the tunnel
interface and cannot be set per route. Remove the ability to set ports per
route. This is still okay to do, as no kernel has been released with these
attributes yet.

Note that the ability to specify source and destination ports is preserved
for other users of the lwtunnel API which don't use routes for tunnel key
specification (like openvswitch).

If in the future we rework ARP handling to allow port specification, the
attributes can be added back.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 14:31:37 -07:00
Jiri Benc
63d008a4e9 ipv4: send arp replies to the correct tunnel
When using ip lwtunnels, the additional data for xmit (basically, the actual
tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
metadata dst. When replying to ARP requests, we need to send the reply to
the same tunnel the request came from. This means we need to construct
proper metadata dst for ARP replies.

We could perform another route lookup to get a dst entry with the correct
lwtstate. However, this won't always ensure that the outgoing tunnel is the
same as the incoming one, and it won't work anyway for IPv4 duplicate
address detection.

The only thing to do is to "reverse" the ip_tunnel_info.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 14:31:36 -07:00
Pravin B Shelar
6ae459bdaa skbuff: Fix skb checksum flag on skb pull
VXLAN device can receive skb with checksum partial. But the checksum
offset could be in outer header which is pulled on receive. This results
in negative checksum offset for the skb. Such skb can cause the assert
failure in skb_checksum_help(). Following patch fixes the bug by setting
checksum-none while pulling outer header.

Following is the kernel panic msg from old kernel hitting the bug.

------------[ cut here ]------------
kernel BUG at net/core/dev.c:1906!
RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
Call Trace:
<IRQ>
[<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
[<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
[<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
[<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
[<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
[<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
[<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
[<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
[<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
[<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
[<ffffffff81550128>] ip_local_deliver+0x88/0x90
[<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
[<ffffffff81550365>] ip_rcv+0x235/0x300
[<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
[<ffffffff8151c360>] netif_receive_skb+0x80/0x90
[<ffffffff81459935>] virtnet_poll+0x555/0x6f0
[<ffffffff8151cd04>] net_rx_action+0x134/0x290
[<ffffffff810683d8>] __do_softirq+0xa8/0x210
[<ffffffff8162fe6c>] call_softirq+0x1c/0x30
[<ffffffff810161a5>] do_softirq+0x65/0xa0
[<ffffffff810687be>] irq_exit+0x8e/0xb0
[<ffffffff81630733>] do_IRQ+0x63/0xe0
[<ffffffff81625f2e>] common_interrupt+0x6e/0x6e

Reported-by: Anupam Chanda <achanda@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 14:09:13 -07:00
Tejun Heo
9badce000e cgroup, writeback: don't enable cgroup writeback on traditional hierarchies
inode_cgwb_enabled() gates cgroup writeback support.  If it returns
true, each inode is attached to the corresponding memory domain which
gets mapped to io domain.  It currently only tests whether the
filesystem and bdi support cgroup writeback; however, cgroup writeback
support doesn't work on traditional hierarchies and thus it should
also test whether memcg and iocg are on the default hierarchy.

This caused traditional hierarchy setups to hit the cgroup writeback
path inadvertently and ended up creating separate writeback domains
for each memcg and mapping them all to the root iocg uncovering a
couple issues in the cgroup writeback path.

cgroup writeback was never meant to be enabled on traditional
hierarchies.  Make inode_cgwb_enabled() test whether both memcg and
iocg are on the default hierarchy.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Artem Bityutskiy <dedekind1@gmail.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net
2015-09-24 16:48:52 -04:00