Commit graph

60221 commits

Author SHA1 Message Date
Dominik Brodowski
1bec510a9e kexec: move sys_kexec_load() prototype to syscalls.h
As the syscall function should only be called from the system call table
but not from elsewhere in the kernel, move the prototype for
sys_kexec_load() to include/syscall.h.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:15 +02:00
Dominik Brodowski
c7b95d5156 mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:12 +02:00
Dominik Brodowski
a90f590a1b mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
Using this helper allows us to avoid the in-kernel calls to the
sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mmap_pgoff().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:11 +02:00
Dominik Brodowski
9d5b7c956b mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel
calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that
this function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as ksys_fadvise64_64().

Some compat stubs called sys_fadvise64(), which then just passed through
the arguments to sys_fadvise64_64(). Get rid of this indirection, and call
ksys_fadvise64_64() directly.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:10 +02:00
Dominik Brodowski
edf292c76b fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
Using the ksys_fallocate() wrapper allows us to get rid of in-kernel
calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_fallocate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:09 +02:00
Dominik Brodowski
36028d5dd7 fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
Using the ksys_p{read,write}64() wrappers allows us to get rid of
in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_p{read,write}64().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:09 +02:00
Dominik Brodowski
df260e21e6 fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
Using the ksys_truncate() wrapper allows us to get rid of in-kernel
calls to the sys_truncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_truncate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:08 +02:00
Dominik Brodowski
806cbae122 fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
Using this helper allows us to avoid the in-kernel calls to the
sys_sync_file_range() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_sync_file_range().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:07 +02:00
Dominik Brodowski
e2aaa9f423 kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
Using this helper allows us to avoid the in-kernel call to the
sys_setsid() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_setsid().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:06 +02:00
Dominik Brodowski
9b32105ec6 kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
Using this helper allows us to avoid the in-kernel calls to the
sys_unshare() syscall. The ksys_ prefix denotes that this function is meant
as a drop-in replacement for the syscall. In particular, it uses the same
calling convention as sys_unshare().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:06 +02:00
Dominik Brodowski
70f68ee81e fs: add ksys_sync() helper; remove in-kernel calls to sys_sync()
Using this helper allows us to avoid the in-kernel calls to the
sys_sync() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_sync().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:05 +02:00
Dominik Brodowski
3ce4a7bf66 fs: add ksys_read() helper; remove in-kernel calls to sys_read()
Using this helper allows us to avoid the in-kernel calls to the
sys_read() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_read().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:04 +02:00
Dominik Brodowski
76847e4344 fs: add ksys_lseek() helper; remove in-kernel calls to sys_lseek()
Using this helper allows us to avoid the in-kernel calls to the
sys_lseek() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_lseek().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:03 +02:00
Dominik Brodowski
cbb60b924b fs: add ksys_ioctl() helper; remove in-kernel calls to sys_ioctl()
Using this helper allows us to avoid the in-kernel calls to the
sys_ioctl() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_ioctl().

After careful review, at least some of these calls could be converted
to do_vfs_ioctl() in future.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:03 +02:00
Dominik Brodowski
454dab3f96 fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64()
Using this helper allows us to avoid the in-kernel calls to the
sys_getdents64() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_getdents64().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:02 +02:00
Dominik Brodowski
bae217ea8c fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
Using this wrapper allows us to avoid the in-kernel calls to the
sys_open() syscall. The ksys_ prefix denotes that this function is meant
as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_open().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:01 +02:00
Dominik Brodowski
2ca2a09d62 fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
Using the ksys_close() wrapper allows us to get rid of in-kernel calls
to the sys_close() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_close(), with one subtle
difference:

The few places which checked the return value did not care about the return
value re-writing in sys_close(), so simply use a wrapper around
__close_fd().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:00 +02:00
Dominik Brodowski
411d9475cf fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()
Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel
calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_ftruncate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:00 +02:00
Dominik Brodowski
55731b3cda fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers
Using the fs-interal do_fchownat() wrapper allows us to get rid of
fs-internal calls to the sys_fchownat() syscall.

Introducing the ksys_fchown() helper and the ksys_{,}chown() wrappers
allows us to avoid the in-kernel calls to the sys_{,l,f}chown() syscalls.
The ksys_ prefix denotes that these functions are meant as a drop-in
replacement for the syscalls. In particular, they use the same calling
convention as sys_{,l,f}chown().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:59 +02:00
Dominik Brodowski
cbfe20f565 fs: add do_faccessat() helper and ksys_access() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_faccessat() helper allows us to get rid of
fs-internal calls to the sys_faccessat() syscall.

Introducing the ksys_access() wrapper allows us to avoid the in-kernel
calls to the sys_access() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_access().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:58 +02:00
Dominik Brodowski
03450e271a fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_fchmodat() helper allows us to get rid of
fs-internal calls to the sys_fchmodat() syscall.

Introducing the ksys_fchmod() helper and the ksys_chmod() wrapper allows
us to avoid the in-kernel calls to the sys_fchmod() and sys_chmod()
syscalls. The ksys_ prefix denotes that these functions are meant as a
drop-in replacement for the syscalls. In particular, they use the same
calling convention as sys_fchmod() and sys_chmod().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:57 +02:00
Dominik Brodowski
46ea89eb65 fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_linkat() helper allows us to get rid of
fs-internal calls to the sys_linkat() syscall.

Introducing the ksys_link() wrapper allows us to avoid the in-kernel
calls to sys_link() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_link().

In the near future, the only fs-external user of ksys_link() should be
converted to use vfs_link() instead.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:57 +02:00
Dominik Brodowski
87c4e19262 fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_mknodat() helper allows us to get rid of
fs-internal calls to the sys_mknodat() syscall.

Introducing the ksys_mknod() wrapper allows us to avoid the in-kernel
calls to sys_mknod() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_mknod().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:56 +02:00
Dominik Brodowski
b724e846b4 fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_symlinkat() helper allows us to get rid of
fs-internal calls to the sys_symlinkat() syscall.

Introducing the ksys_symlink() wrapper allows us to avoid the in-kernel
calls to the sys_symlink() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In particular,
it uses the same calling convention as sys_symlink().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:55 +02:00
Dominik Brodowski
0101db7a30 fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_mkdirat() helper allows us to get rid of
fs-internal calls to the sys_mkdirat() syscall.

Introducing the ksys_mkdir() wrapper allows us to avoid the in-kernel calls
to the sys_mkdir() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mkdir().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:54 +02:00
Dominik Brodowski
f459dffae1 fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
Using this wrapper allows us to avoid the in-kernel calls to the
sys_rmdir() syscall. The ksys_ prefix denotes that this function is meant
as a drop-in replacement for the syscall. In particular, it uses the same
calling convention as sys_rmdir().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:54 +02:00
Dominik Brodowski
0f32ab8cfa fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
Using this wrapper allows us to avoid the in-kernel calls to the
sys_unlink() syscall. The ksys_ prefix denotes that this function is meant
s a drop-in replacement for the syscall. In particular, it uses the same
calling convention as sys_unlink().

In the near future, all callers of ksys_unlink() should be converted to
call do_unlinkat() directly or, at least, to operate on regular kernel
pointers.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:52 +02:00
Dominik Brodowski
447016e968 fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
Using this helper allows us to avoid the in-kernel calls to the sys_chdir()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_chdir().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:51 +02:00
Dominik Brodowski
e7a3e8b2ed fs: add ksys_write() helper; remove in-kernel calls to sys_write()
Using this helper allows us to avoid the in-kernel calls to the sys_write()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_write().

In the near future, the do_mounts / initramfs callers of ksys_write()
should be converted to use filp_open() and vfs_write() instead.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:51 +02:00
Dominik Brodowski
a16fe33ab5 fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
Using this helper allows us to avoid the in-kernel calls to the
sys_chroot() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_chroot().

In the near future, the fs-external callers of ksys_chroot() should be
converted to use kern_path()/set_fs_root() directly. Then ksys_chroot()
can be moved within sys_chroot() again.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:50 +02:00
Dominik Brodowski
c7248321a3 fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
Using ksys_dup() and ksys_dup3() as helper functions allows us to
avoid the in-kernel calls to the sys_dup() and sys_dup3() syscalls.
The ksys_ prefix denotes that these functions are meant as a drop-in
replacement for the syscalls. In particular, they use the same
calling convention as sys_dup{,3}().

In the near future, the fs-external callers of ksys_dup{,3}() should be
converted to call do_dup2() directly. Then, ksys_dup{,3}() can be moved
within sys_dup{,3}() again.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:49 +02:00
Dominik Brodowski
3a18ef5c1b fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
Using this helper allows us to avoid the in-kernel call to the sys_umount()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as ksys_umount().

In the near future, the only fs-external caller of ksys_umount() should be
converted to call do_umount() directly. Then, ksys_umount() can be moved
within sys_umount() again.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:48 +02:00
Dominik Brodowski
312db1aa1d fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
Using this helper allows us to avoid the in-kernel calls to the sys_mount()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_mount().

In the near future, all callers of ksys_mount() should be converted to call
do_mount() directly.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:48 +02:00
Dominik Brodowski
ab0d1e85bf fs/quota: use COMPAT_SYSCALL_DEFINE for sys32_quotactl()
While sys32_quotactl() is only needed on x86, it can use the recommended
COMPAT_SYSCALL_DEFINEx() machinery for its setup.

Acked-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:47 +02:00
Dominik Brodowski
cb0b476ab1 fs/quota: add kernel_quotactl() helper; remove in-kernel call to syscall
Using the fs-internal kernel_quotactl() helper allows us to get rid of
the fs-internal call to the sys_quotactl() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:46 +02:00
Dominik Brodowski
e1834a329d net: socket: move check for forbid_cmsg_compat to __sys_...msg()
The non-compat codepaths for sys_...msg() verify that MSG_CMSG_COMPAT
is not set. By moving this check to the __sys_...msg() functions
(and making it dependent on a static flag passed to this function), we
can call the __sys...msg() functions instead of the syscall functions
in all cases. __sys_recvmmsg() does not need this trickery, as the
check is handled within the do_sys_recvmmsg() function internal to
net/socket.c.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:15 +02:00
Dominik Brodowski
cc36dca0df net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_setsockopt() allows us to avoid the
internal calls to the sys_setsockopt() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:13 +02:00
Dominik Brodowski
005a1aeac4 net: socket: add __sys_shutdown() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_shutdown() allows us to avoid the
internal calls to the sys_shutdown() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:12 +02:00
Dominik Brodowski
6debc8d834 net: socket: add __sys_socketpair() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_socketpair() allows us to avoid the
internal calls to the sys_socketpair() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:11 +02:00
Dominik Brodowski
b21c8f838a net: socket: add __sys_getpeername() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_getpeername() allows us to avoid the
internal calls to the sys_getpeername() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:10 +02:00
Dominik Brodowski
8882a107b3 net: socket: add __sys_getsockname() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_getsockname() allows us to avoid the
internal calls to the sys_getsockname() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:09 +02:00
Dominik Brodowski
25e290eed9 net: socket: add __sys_listen() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_listen() allows us to avoid the
internal calls to the sys_listen() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:09 +02:00
Dominik Brodowski
1387c2c2f9 net: socket: add __sys_connect() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_connect() allows us to avoid the
internal calls to the sys_connect() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:08 +02:00
Dominik Brodowski
a87d35d87a net: socket: add __sys_bind() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_bind() allows us to avoid the
internal calls to the sys_bind() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:07 +02:00
Dominik Brodowski
9d6a15c3f2 net: socket: add __sys_socket() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_socket() allows us to avoid the
internal calls to the sys_socket() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:06 +02:00
Dominik Brodowski
4541e80560 net: socket: add __sys_accept4() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_accept4() allows us to avoid the
internal calls to the sys_accept4() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:06 +02:00
Dominik Brodowski
211b634b7f net: socket: add __sys_sendto() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_sendto() allows us to avoid the
internal calls to the sys_sendto() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:05 +02:00
Dominik Brodowski
7a09e1eb9c net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall
Using the net-internal helper __sys_recvfrom() allows us to avoid the
internal calls to the sys_recvfrom() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:04 +02:00
Dominik Brodowski
2de0db992d mm: use do_futex() instead of sys_futex() in mm_release()
sys_futex() is a wrapper to do_futex() which does not modify any
values here:

- uaddr, val and val3 are kept the same

- op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE.
  Therefore, val2 is always 0.

- as utime is set to NULL, *timeout is NULL

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:02 +02:00
Dominik Brodowski
d53238cd51 kernel: open-code sys_rt_sigpending() in sys_sigpending()
A similar but not fully equivalent code path is already open-coded
three times (in sys_rt_sigpending and in the two compat stubs), so
do it a fourth time here.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:00 +02:00
486adcea4a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main kernel side changes were:

   - Modernize the kprobe and uprobe creation/destruction tooling ABIs:

     The existing text based APIs (kprobe_events and uprobe_events in
     tracefs), are naive, limited ABIs in that they require user-space
     to clean up after themselves, which is both difficult and fragile
     if the tool is buggy or exits unexpectedly. In other words they are
     not really suited for modern, robust tooling.

     So introduce a modern, file descriptor based ABI that does not have
     these limitations: introduce the 'perf_kprobe' and 'perf_uprobe'
     PMUs and extend the perf_event_open() syscall to create events with
     a kprobe/uprobe attached to them. These [k,u]probe are associated
     with this file descriptor, so they are not available in tracefs.

     (Song Liu)

   - Intel Cannon Lake CPU support (Harry Pan)

   - Intel PT cleanups (Alexander Shishkin)

   - Improve the performance of pinned/flexible event groups by using RB
     trees (Alexey Budankov)

   - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification
     of hardware breakpoints, which new ABI variant massively speeds up
     existing tooling that uses hardware breakpoints to instrument (and
     debug) memory usage.

     (Milind Chabbi, Jiri Olsa)

   - Various Intel PEBS handling fixes and improvements, and other Intel
     PMU improvements (Kan Liang)

   - Various perf core improvements and optimizations (Peter Zijlstra)

   - ... misc cleanups, fixes and updates.

  There's over 200 tooling commits, here's an (imperfect) list of
  highlights:

   - 'perf annotate' improvements:

      * Recognize and handle jumps to other functions as calls, which
        improves the navigation along jumps and back. (Arnaldo Carvalho
        de Melo)

      * Add the 'P' hotkey in TUI annotation to dump annotation output
        into a file, to ease e-mail reporting of annotation details.
        (Arnaldo Carvalho de Melo)

      * Add an IPC/cycles column to the TUI (Jin Yao)

      * Improve s390 assembly annotation (Thomas Richter)

      * Refactor the output formatting logic to better separate it into
        interactive and non-interactive features and add the --stdio2
        output variant to demonstrate this. (Arnaldo Carvalho de Melo)

   - 'perf script' improvements:

      * Add Python 3 support (Jaroslav Škarvada)

      * Add --show-round-event (Jiri Olsa)

   - 'perf c2c' improvements:

      * Add NUMA analysis support (Jiri Olsa)

   - 'perf trace' improvements:

      * Improve PowerPC support (Ravi Bangoria)

   - 'perf inject' improvements:

      * Integrate ARM CoreSight traces (Robert Walker)

   - 'perf stat' improvements:

      * Add the --interval-count option (yuzhoujian)

      * Add the --timeout option (yuzhoujian)

   - 'perf sched' improvements (Changbin Du)

   - Vendor events improvements :

      * Add IBM s390 vendor events (Thomas Richter)

      * Add and improve arm64 vendor events (John Garry, Ganapatrao
        Kulkarni)

      * Update POWER9 vendor events (Sukadev Bhattiprolu)

   - Intel PT tooling improvements (Adrian Hunter)

   - PMU handling improvements (Agustin Vega-Frias)

   - Record machine topology in perf.data (Jiri Olsa)

   - Various overwrite related cleanups (Kan Liang)

   - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet)

   - ... and lots of other changes, cleanups and fixes, see the shortlog
     and Git history for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
  perf/x86/intel: Enable C-state residency events for Cannon Lake
  perf/x86/intel: Add Cannon Lake support for RAPL profiling
  perf/x86/pt, coresight: Clean up address filter structure
  perf vendor events s390: Add JSON files for IBM z14
  perf vendor events s390: Add JSON files for IBM z13
  perf vendor events s390: Add JSON files for IBM zEC12 zBC12
  perf vendor events s390: Add JSON files for IBM z196
  perf vendor events s390: Add JSON files for IBM z10EC z10BC
  perf mmap: Be consistent when checking for an unmaped ring buffer
  perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done()
  perf build: Fix check-headers.sh opts assignment
  perf/x86: Update rdpmc_always_available static key to the modern API
  perf annotate: Use absolute addresses to calculate jump target offsets
  perf annotate: Defer searching for comma in raw line till it is needed
  perf annotate: Support jumping from one function to another
  perf annotate: Add "_local" to jump/offset validation routines
  perf python: Reference Py_None before returning it
  perf annotate: Mark jumps to outher functions with the call arrow
  perf annotate: Pass function descriptor to its instruction parsing routines
  perf annotate: No need to calculate notes->start twice
  ...
2018-04-02 11:06:34 -07:00
701f3b3149 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in the locking subsystem in this cycle were:

   - Add the Linux Kernel Memory Consistency Model (LKMM) subsystem,
     which is an an array of tools in tools/memory-model/ that formally
     describe the Linux memory coherency model (a.k.a.
     Documentation/memory-barriers.txt), and also produce 'litmus tests'
     in form of kernel code which can be directly executed and tested.

     Here's a high level background article about an earlier version of
     this work on LWN.net:

        https://lwn.net/Articles/718628/

     The design principles:

      "There is reason to believe that Documentation/memory-barriers.txt
       could use some help, and a major purpose of this patch is to
       provide that help in the form of a design-time tool that can
       produce all valid executions of a small fragment of concurrent
       Linux-kernel code, which is called a "litmus test". This tool's
       functionality is roughly similar to a full state-space search.
       Please note that this is a design-time tool, not useful for
       regression testing. However, we hope that the underlying
       Linux-kernel memory model will be incorporated into other tools
       capable of analyzing large bodies of code for regression-testing
       purposes."

     [...]

      "A second tool is klitmus7, which converts litmus tests to
       loadable kernel modules for direct testing. As with herd7, the
       klitmus7 code is freely available from

         http://diy.inria.fr/sources/index.html

       (and via "git" at https://github.com/herd/herdtools7)"

     [...]

     Credits go to:

      "This patch was the result of a most excellent collaboration
       founded by Jade Alglave and also including Alan Stern, Andrea
       Parri, and Luc Maranget."

     ... and to the gents listed in the MAINTAINERS entry:

        LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
        M:      Alan Stern <stern@rowland.harvard.edu>
        M:      Andrea Parri <parri.andrea@gmail.com>
        M:      Will Deacon <will.deacon@arm.com>
        M:      Peter Zijlstra <peterz@infradead.org>
        M:      Boqun Feng <boqun.feng@gmail.com>
        M:      Nicholas Piggin <npiggin@gmail.com>
        M:      David Howells <dhowells@redhat.com>
        M:      Jade Alglave <j.alglave@ucl.ac.uk>
        M:      Luc Maranget <luc.maranget@inria.fr>
        M:      "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

     The LKMM project already found several bugs in Linux locking
     primitives and improved the understanding and the documentation of
     the Linux memory model all around.

   - Add KASAN instrumentation to atomic APIs (Dmitry Vyukov)

   - Add RWSEM API debugging and reorganize the lock debugging Kconfig
     (Waiman Long)

   - ... misc cleanups and other smaller changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  locking/Kconfig: Restructure the lock debugging menu
  locking/Kconfig: Add LOCK_DEBUGGING_SUPPORT to make it more readable
  locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches
  lockdep: Make the lock debug output more useful
  locking/rtmutex: Handle non enqueued waiters gracefully in remove_waiter()
  locking/atomic, asm-generic, x86: Add comments for atomic instrumentation
  locking/atomic, asm-generic: Add KASAN instrumentation to atomic operations
  locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h
  locking/atomic, asm-generic: Add asm-generic/atomic-instrumented.h
  locking/xchg/alpha: Remove superfluous memory barriers from the _local() variants
  tools/memory-model: Finish the removal of rb-dep, smp_read_barrier_depends(), and lockless_dereference()
  tools/memory-model: Add documentation of new litmus test
  tools/memory-model: Remove mention of docker/gentoo image
  locking/memory-barriers: De-emphasize smp_read_barrier_depends() some more
  locking/lockdep: Show unadorned pointers
  mutex: Drop linkage.h from mutex.h
  tools/memory-model: Remove rb-dep, smp_read_barrier_depends, and lockless_dereference
  tools/memory-model: Convert underscores to hyphens
  tools/memory-model: Add a S lock-based external-view litmus test
  tools/memory-model: Add required herd7 version to README file
  ...
2018-04-02 10:27:16 -07:00
8747a29173 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU subsystem changes in this cycle were:

  - Miscellaneous fixes, perhaps most notably removing obsolete code
    whose only purpose in life was to gather information for the
    now-removed RCU debugfs facility. Other notable changes include
    removing NO_HZ_FULL_ALL in favor of the nohz_full kernel boot
    parameter, minor optimizations for expedited grace periods, some
    added tracing, creating an RCU-specific workqueue using Tejun's new
    WQ_MEM_RECLAIM flag, and several cleanups to code and comments.

  - SRCU cleanups and optimizations.

  - Torture-test updates, perhaps most notably the adding of ARMv8
    support, but also including numerous cleanups and usability fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  rcu: Create RCU-specific workqueues with rescuers
  torture: Provide more sensible nreader/nwriter defaults for rcuperf
  torture: Grace periods do not piggyback off of themselves
  torture: Adjust rcuperf trace processing to allow for workqueues
  torture: Default jitter off when running rcuperf
  torture: Specify qemu memory size with --memory argument
  rcutorture: Add basic ARM64 support to run scripts
  rcutorture: Update kvm.sh header comment
  rcutorture: Record which grace-period primitives are tested
  rcutorture: Re-enable testing of dynamic expediting
  rcutorture: Avoid fake-writer use of undefined primitives
  rcutorture: Abstract function and module names
  rcutorture: Replace multi-instance kzalloc() with kcalloc()
  rcu: Remove SRCU throttling
  srcu: Remove dead code in srcu_gp_end()
  srcu: Reduce scans of srcu_data in counter wrap check
  srcu: Prevent sdp->srcu_gp_seq_needed_exp counter wrap
  srcu: Abstract function name
  rcu: Make expedited RCU CPU selection avoid unnecessary stores
  rcu: Trace expedited GP delays due to transitioning CPUs
  ...
2018-04-02 09:59:09 -07:00
cc67ccecd3 Merge branch 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull header file cleanup from Ingo Molnar:
 "Reduce <linux/interrupt.h> dependencies: a single change that drops
  two #includes from this frequently used kernel header"

* 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  headers: Drop two #included headers from <linux/interrupt.h>
2018-04-02 09:27:29 -07:00
320b164abb main drm pull request for v4.17
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJavDxYAAoJEAx081l5xIa+pCoP/iwjuxkSTdJpZUx5g0daGkCK
 O18moGqGPChb7qJovfHqCKZ1f9PGulQt7SxwFzzJXNbv0PbfMA/Og0EhMLBImb+Q
 VfYgq2vJLpmkikgcI5fBrzs9DRMQKKobGIzw24VS7IkPYA7d8KgAyywBwG0+LUFR
 G3sobClgapsfaUcleb3ZOeDwymGkGCuuYRpYE4giHtuMDIxCWLePKJKOaOIq8o6P
 A1557EvSbKuLQGI9X50jzJOoBE3TKRQYkzuM1GthdOF8RHaMNcFy44lDNO030HwZ
 hzwAIg5Izhu16PqZGyEdIQ6SJTv3isRJWEciPnOsijvjl1li3ehMdQfhGISa/jZO
 ivEGd32kaactiT0jJ5OyexergEViCPVKCIORksSIk46L84luDva9L22A3yu0mf3F
 ixB63bAiLH7Py77kH3DmeJdqhMxlVZXCbdBVFDvzZvY4O3Mx0Dv9mmN/nw1FVCFH
 scSYnXea9/o4IY5yGASU6FAUJEEGu20HAN12oHJw7/taqV/gbbEos3F7AGmjJE0f
 qe6Rt/8fwi7Lhm2va6EoOo6yltH/gL4/AgnsN76VzppNGbaIv7W8Qa4Y/ES1lAE1
 SATAEUJfU8kiLrVOolIElPbgfdJwv8TzoxiKB5wK/eoH20wf4BTmOuBMviaL2qXK
 Sz6wihq+IlMXW7Y7pIl/
 =DrA+
 -----END PGP SIGNATURE-----

Merge tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "Cannonlake and Vega12 support are probably the two major things. This
  pull lacks nouveau, Ben had some unforseen leave and a few other
  blockers so we'll see how things look or maybe leave it for this merge
  window.

  core:
   - Device links to handle sound/gpu pm dependency
   - Color encoding/range properties
   - Plane clipping into plane check helper
   - Backlight helpers
   - DP TP4 + HBR3 helper support

  amdgpu:
   - Vega12 support
   - Enable DC by default on all supported GPUs
   - Powerplay restructuring and cleanup
   - DC bandwidth calc updates
   - DC backlight on pre-DCE11
   - TTM backing store dropping support
   - SR-IOV fixes
   - Adding "wattman" like functionality
   - DC crc support
   - Improved DC dual-link handling

  amdkfd:
   - GPUVM support for dGPU
   - KFD events for dGPU
   - Enable PCIe atomics for dGPUs
   - HSA process eviction support
   - Live-lock fixes for process eviction
   - VM page table allocation fix for large-bar systems

  panel:
   - Raydium RM68200
   - AUO G104SN02 V2
   - KEO TX31D200VM0BAA
   - ARM Versatile panels

  i915:
   - Cannonlake support enabled
   - AUX-F port support added
   - Icelake base enabling until internal milestone of forcewake support
   - Query uAPI interface (used for GPU topology information currently)
   - Compressed framebuffer support for sprites
   - kmem cache shrinking when GPU is idle
   - Avoid boosting GPU when waited item is being processed already
   - Avoid retraining LSPCON link unnecessarily
   - Decrease request signaling latency
   - Deprecation of I915_SET_COLORKEY_NONE
   - Kerneldoc and compiler warning cleanup for upcoming CI enforcements
   - Full range ycbcr toggling
   - HDCP support

  i915/gvt:
   - Big refactor for shadow ppgtt
   - KBL context save/restore via LRI cmd (Weinan)
   - Properly unmap dma for guest page (Changbin)

  vmwgfx:
   - Lots of various improvements

  etnaviv:
   - Use the drm gpu scheduler
   - prep work for GC7000L support

  vc4:
   - fix alpha blending
   - Expose perf counters to userspace

  pl111:
   - Bandwidth checking/limiting
   - Versatile panel support

  sun4i:
   - A83T HDMI support
   - A80 support
   - YUV plane support
   - H3/H5 HDMI support

  omapdrm:
   - HPD support for DVI connector
   - remove lots of static variables

  msm:
   - DSI updates from 10nm / SDM845
   - fix for race condition with a3xx/a4xx fence completion irq
   - some refactoring/prep work for eventual a6xx support (ie. when we
     have a userspace)
   - a5xx debugfs enhancements
   - some mdp5 fixes/cleanups to prepare for eventually merging
     writeback
   - support (ie. when we have a userspace)

  tegra:
   - mmap() fixes for fbdev devices
   - Overlay plane for hw cursor fix
   - dma-buf cache maintenance support

  mali-dp:
   - YUV->RGB conversion support

  rockchip:
   - rk3399/chromebook fixes and improvements

  rcar-du:
   - LVDS support move to drm bridge
   - DT bindings for R8A77995
   - Driver/DT support for R8A77970

  tilcdc:
   - DRM panel support"

* tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux: (1646 commits)
  drm/i915: Fix hibernation with ACPI S0 target state
  drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt
  drm/i915: Specify which engines to reset following semaphore/event lockups
  drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
  drm/amdkfd: Use ordered workqueue to restore processes
  drm/amdgpu: Fix acquiring VM on large-BAR systems
  drm/amd/pp: clean header file hwmgr.h
  drm/amd/pp: use mlck_table.count for array loop index limit
  drm: Fix uabi regression by allowing garbage mode->type from userspace
  drm/amdgpu: Add an ATPX quirk for hybrid laptop
  drm/amdgpu: fix spelling mistake: "asssert" -> "assert"
  drm/amd/pp: Add new asic support in pp_psm.c
  drm/amd/pp: Clean up powerplay code on Vega12
  drm/amd/pp: Add smu irq handlers for legacy asics
  drm/amd/pp: Fix set wrong temperature range on smu7
  drm/amdgpu: Don't change preferred domian when fallback GTT v5
  drm/vmwgfx: Bump version patchlevel and date
  drm/vmwgfx: use monotonic event timestamps
  drm/vmwgfx: Unpin the screen object backup buffer when not used
  drm/vmwgfx: Stricter count of legacy surface device resources
  ...
2018-04-02 07:59:23 -07:00
Mark Brown
70a3550b8c
Merge remote-tracking branches 'spi/topic/atmel', 'spi/topic/bcm-qspi', 'spi/topic/bcm2835aux', 'spi/topic/dw' and 'spi/topic/gpio' into spi-next 2018-04-02 15:56:34 +01:00
Rafael J. Wysocki
103cf0e579 Merge branches 'pm-cpuidle' and 'pm-tools'
* pm-cpuidle:
  cpuidle: poll_state: Avoid invoking local_clock() too often
  PM: cpuidle/suspend: Add s2idle usage and time state attributes
  cpuidle: Enable coupled cpuidle support on Exynos3250 platform
  cpuidle: poll_state: Add time limit to poll_idle()
  ARM: cpuidle: Drop memory allocation error message from arm_idle_init_cpu()

* pm-tools:
  pm-graph: AnalyzeSuspend v5.0
  pm-graph: AnalyzeBoot v2.2
  pm-graph: config files and installer
2018-04-02 11:01:02 +02:00
Rafael J. Wysocki
a9f36db7ec Merge branch 'pm-cpufreq'
* pm-cpufreq: (38 commits)
  cpufreq: CPPC: Use transition_delay_us depending transition_latency
  cpufreq: tegra186: Don't validate the frequency table twice
  cpufreq: speedstep: Don't validate the frequency table twice
  cpufreq: sparc: Don't validate the frequency table twice
  cpufreq: sh: Don't validate the frequency table twice
  cpufreq: sfi: Don't validate the frequency table twice
  cpufreq: scpi: Don't validate the frequency table twice
  cpufreq: sc520: Don't validate the frequency table twice
  cpufreq: s3c24xx: Don't validate the frequency table twice
  cpufreq: qoirq: Don't validate the frequency table twice
  cpufreq: pxa: Don't validate the frequency table twice
  cpufreq: ppc_cbe: Don't validate the frequency table twice
  cpufreq: powernow: Don't validate the frequency table twice
  cpufreq: p4-clockmod: Don't validate the frequency table twice
  cpufreq: mediatek: Don't validate the frequency table twice
  cpufreq: longhaul: Don't validate the frequency table twice
  cpufreq: ia64-acpi: Don't validate the frequency table twice
  cpufreq: elanfreq: Don't validate the frequency table twice
  cpufreq: e_powersaver: Don't validate the frequency table twice
  cpufreq: cpufreq-dt: Don't validate the frequency table twice
  ...
2018-04-02 11:00:46 +02:00
Rafael J. Wysocki
e3a495c4ee Merge branches 'pm-core', 'pm-sleep' and 'acpi-pm'
* pm-core:
  driver core: Introduce device links reference counting
  PM / wakeirq: Add wakeup name to dedicated wake irqs

* pm-sleep:
  PM / hibernate: Change message when writing to /sys/power/resume
  PM / hibernate: Make passing hibernate offsets more friendly
  PCMCIA / PM: Avoid noirq suspend aborts during suspend-to-idle

* acpi-pm:
  ACPI / PM: Fix keyboard wakeup from suspend-to-idle on ASUS UX331UA
  ACPI / PM: Allow deeper wakeup power states with no _SxD nor _SxW
  ACPI / PM: Reduce LPI constraints logging noise
  ACPI / PM: Do not reconfigure GPEs for suspend-to-idle
2018-04-02 11:00:28 +02:00
Rafael J. Wysocki
0c9ed61bdd Merge branches 'acpi-battery', 'acpi-doc' and 'acpi-pmic'
* acpi-battery:
  Revert "ACPI: battery: Add the ThinkPad "Not Charging" quirk"
  ACPI: battery: do not export degraded capacity values over 100
  ACPI: battery: make function __battery_hook_unregister() static
  ACPI: battery: Add the ThinkPad "Not Charging" quirk
  thinkpad_acpi: Add support for battery thresholds
  power: add to_power_supply macro to the API
  battery: Add the battery hooking API

* acpi-doc:
  ACPI: sysfs: Update device object sysfs documentation

* acpi-pmic:
  ACPI / PMIC: Replace license boilerplate with SPDX license identifier
2018-04-02 10:58:13 +02:00
David S. Miller
c0b458a946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
   MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params->log_rq_size is renamed to be
   params->log_rq_mtu_frames.

3) In 'net-next' params->hard_mtu is added.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01 19:49:34 -04:00
Atul Gupta
e0be6bea25 ethtool: enable Inline TLS in HW
Ethtool option enables TLS record offload on HW, user
configures the feature for netdev capable of Inline TLS.
This allows user to define custom sk_prot for Inline TLS sock

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:37:32 -04:00
David S. Miller
d4069fe6fc Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-03-31

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add raw BPF tracepoint API in order to have a BPF program type that
   can access kernel internal arguments of the tracepoints in their
   raw form similar to kprobes based BPF programs. This infrastructure
   also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which
   returns an anon-inode backed fd for the tracepoint object that allows
   for automatic detach of the BPF program resp. unregistering of the
   tracepoint probe on fd release, from Alexei.

2) Add new BPF cgroup hooks at bind() and connect() entry in order to
   allow BPF programs to reject, inspect or modify user space passed
   struct sockaddr, and as well a hook at post bind time once the port
   has been allocated. They are used in FB's container management engine
   for implementing policy, replacing fragile LD_PRELOAD wrapper
   intercepting bind() and connect() calls that only works in limited
   scenarios like glibc based apps but not for other runtimes in
   containerized applications, from Andrey.

3) BPF_F_INGRESS flag support has been added to sockmap programs for
   their redirect helper call bringing it in line with cls_bpf based
   programs. Support is added for both variants of sockmap programs,
   meaning for tx ULP hooks as well as recv skb hooks, from John.

4) Various improvements on BPF side for the nfp driver, besides others
   this work adds BPF map update and delete helper call support from
   the datapath, JITing of 32 and 64 bit XADD instructions as well as
   offload support of bpf_get_prandom_u32() call. Initial implementation
   of nfp packet cache has been tackled that optimizes memory access
   (see merge commit for further details), from Jakub and Jiong.

5) Removal of struct bpf_verifier_env argument from the print_bpf_insn()
   API has been done in order to prepare to use print_bpf_insn() soon
   out of perf tool directly. This makes the print_bpf_insn() API more
   generic and pushes the env into private data. bpftool is adjusted
   as well with the print_bpf_insn() argument removal, from Jiri.

6) Couple of cleanups and prep work for the upcoming BTF (BPF Type
   Format). The latter will reuse the current BPF verifier log as
   well, thus bpf_verifier_log() is further generalized, from Martin.

7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read
   and write support has been added in similar fashion to existing
   IPv6 IPV6_TCLASS socket option we already have, from Nikita.

8) Fixes in recent sockmap scatterlist API usage, which did not use
   sg_init_table() for initialization thus triggering a BUG_ON() in
   scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and
   uses a small helper sg_init_marker() to properly handle the affected
   cases, from Prashant.

9) Let the BPF core follow IDR code convention and therefore use the
   idr_preload() and idr_preload_end() helpers, which would also help
   idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory
   pressure, from Shaohua.

10) Last but not least, a spelling fix in an error message for the
    BPF cookie UID helper under BPF sample code, from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:33:04 -04:00
Eric Dumazet
bf66337140 inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb->next,
meaning that we use two cache lines per skb when finding the insertion point.

By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
in a single cache line and save precious memory bandwidth.

Note that after the fast path added by Changli Gao in commit
d6bebca92c ("fragment: add fast path for in-order fragments")
this change wont help the fast path, since we still need
to access prev->len (2nd cache line), but will show great
benefits when slow path is entered, since we perform
a linear scan of a potentially long list.

Also, note that this potential long list is an attack vector,
we might consider also using an rb-tree there eventually.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:25:40 -04:00
Eric Dumazet
e5d672a078 rhashtable: reorganize struct rhashtable layout
While under frags DDOS I noticed unfortunate false sharing between
@nelems and @params.automatic_shrinking

Move @nelems at the end of struct rhashtable so that first cache line
is shared between all cpus, because almost never dirtied.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:25:39 -04:00
Ingo Molnar
169310f71f Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-31 07:30:17 +02:00
a44406ec3d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix RCU locking in xfrm_local_error(), from Taehee Yoo.

 2) Fix return value assignments and thus error checking in
    iwl_mvm_start_ap_ibss(), from Johannes Berg.

 3) Don't count header length twice in vti4, from Stefano Brivio.

 4) Fix deadlock in rt6_age_examine_exception, from Eric Dumazet.

 5) Fix out-of-bounds access in nf_sk_lookup_slow{v4,v6}() from Subash
    Abhinov.

 6) Check nladdr size in netlink_connect(), from Alexander Potapenko.

 7) VF representor SQ numbers are 32 not 16 bits, in mlx5 driver, from
    Or Gerlitz.

 8) Out of bounds read in skb_network_protocol(), from Eric Dumazet.

 9) r8169 driver sets driver data pointer after register_netdev() which
    is too late. Fix from Heiner Kallweit.

10) Fix memory leak in mlx4 driver, from Moshe Shemesh.

11) The multi-VLAN decap fix added a regression when dealing with device
    that lack a MAC header, such as tun. Fix from Toshiaki Makita.

12) Fix integer overflow in dynamic interrupt coalescing code. From Tal
    Gilboa.

13) Use after free in vrf code, from David Ahern.

14) IPV6 route leak between VRFs fix, also from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  net: mvneta: fix enable of all initialized RXQs
  net/ipv6: Fix route leaking between VRFs
  vrf: Fix use after free and double free in vrf_finish_output
  ipv6: sr: fix seg6 encap performances with TSO enabled
  net/dim: Fix int overflow
  vlan: Fix vlan insertion for packets without ethernet header
  net: Fix untag for vlan packets without ethernet header
  atm: iphase: fix spelling mistake: "Receiverd" -> "Received"
  vhost: validate log when IOTLB is enabled
  qede: Do not drop rx-checksum invalidated packets.
  hv_netvsc: enable multicast if necessary
  ip_tunnel: Resolve ipsec merge conflict properly.
  lan78xx: Crash in lan78xx_writ_reg (Workqueue: events lan78xx_deferred_multicast_write)
  qede: Fix barrier usage after tx doorbell write.
  vhost: correctly remove wait queue during poll failure
  net/mlx4_core: Fix memory leak while delete slave's resources
  net/mlx4_en: Fix mixed PFC and Global pause user control requests
  net/smc: use announced length in sock_recvmsg()
  llc: properly handle dev_queue_xmit() return value
  strparser: Fix sign of err codes
  ...
2018-03-30 18:47:28 -10:00
Masahiro Yamada
a95b37e20d kbuild: get <linux/compiler_types.h> out of <linux/kconfig.h>
Since commit 28128c61e0 ("kconfig.h: Include compiler types to avoid
missed struct attributes"), <linux/kconfig.h> pulls in kernel-space
headers to unrelated places.

Commit 0f9da844d8 ("MIPS: boot: Define __ASSEMBLY__ for its.S build")
suppress the build error by defining __ASSEMBLY__, but ITS (i.e. DTS)
is not assembly, and should not include <linux/compiler_types.h> in the
first place.

Looking at arch/s390/tools/Makefile, host programs gen_facilities and
gen_opcode_table now pull in <linux/compiler_types.h> as well.

The motivation for that commit was to define necessary attributes
before any struct is defined.  Obviously, this happens only in C.

It is enough to include <linux/compiler_types.h> only when compiling
C files, and only when compiling kernel space.  Move the include to
c_flags.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-03-31 12:22:38 +09:00
Andrey Ignatov
aac3fc320d bpf: Post-hooks for sys_bind
"Post-hooks" are hooks that are called right before returning from
sys_bind. At this time IP and port are already allocated and no further
changes to `struct sock` can happen before returning from sys_bind but
BPF program has a chance to inspect the socket and change sys_bind
result.

Specifically it can e.g. inspect what port was allocated and if it
doesn't satisfy some policy, BPF program can force sys_bind to fail and
return EPERM to user.

Another example of usage is recording the IP:port pair to some map to
use it in later calls to sys_connect. E.g. if some TCP server inside
cgroup was bound to some IP:port_n, it can be recorded to a map. And
later when some TCP client inside same cgroup is trying to connect to
127.0.0.1:port_n, BPF hook for sys_connect can override the destination
and connect application to IP:port_n instead of 127.0.0.1:port_n. That
helps forcing all applications inside a cgroup to use desired IP and not
break those applications if they e.g. use localhost to communicate
between each other.

== Implementation details ==

Post-hooks are implemented as two new attach types
`BPF_CGROUP_INET4_POST_BIND` and `BPF_CGROUP_INET6_POST_BIND` for
existing prog type `BPF_PROG_TYPE_CGROUP_SOCK`.

Separate attach types for IPv4 and IPv6 are introduced to avoid access
to IPv6 field in `struct sock` from `inet_bind()` and to IPv4 field from
`inet6_bind()` since those fields might not make sense in such cases.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:16:26 +02:00
Andrey Ignatov
d74bad4e74 bpf: Hooks for sys_connect
== The problem ==

See description of the problem in the initial patch of this patch set.

== The solution ==

The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.

It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.

Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
  significantly;
* there is no use-case for port.

As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.

Support is added for IPv4 and IPv6, for TCP and UDP.

IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.

== Implementation notes ==

The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk->sk_prot->connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk->sk_prot->connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.

bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.

bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:15:54 +02:00
Andrey Ignatov
4fbac77d2d bpf: Hooks for sys_bind
== The problem ==

There is a use-case when all processes inside a cgroup should use one
single IP address on a host that has multiple IP configured.  Those
processes should use the IP for both ingress and egress, for TCP and UDP
traffic. So TCP/UDP servers should be bound to that IP to accept
incoming connections on it, and TCP/UDP clients should make outgoing
connections from that IP. It should not require changing application
code since it's often not possible.

Currently it's solved by intercepting glibc wrappers around syscalls
such as `bind(2)` and `connect(2)`. It's done by a shared library that
is preloaded for every process in a cgroup so that whenever TCP/UDP
server calls `bind(2)`, the library replaces IP in sockaddr before
passing arguments to syscall. When application calls `connect(2)` the
library transparently binds the local end of connection to that IP
(`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid performance penalty).

Shared library approach is fragile though, e.g.:
* some applications clear env vars (incl. `LD_PRELOAD`);
* `/etc/ld.so.preload` doesn't help since some applications are linked
  with option `-z nodefaultlib`;
* other applications don't use glibc and there is nothing to intercept.

== The solution ==

The patch provides much more reliable in-kernel solution for the 1st
part of the problem: binding TCP/UDP servers on desired IP. It does not
depend on application environment and implementation details (whether
glibc is used or not).

It adds new eBPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` and
attach types `BPF_CGROUP_INET4_BIND` and `BPF_CGROUP_INET6_BIND`
(similar to already existing `BPF_CGROUP_INET_SOCK_CREATE`).

The new program type is intended to be used with sockets (`struct sock`)
in a cgroup and provided by user `struct sockaddr`. Pointers to both of
them are parts of the context passed to programs of newly added types.

The new attach types provides hooks in `bind(2)` system call for both
IPv4 and IPv6 so that one can write a program to override IP addresses
and ports user program tries to bind to and apply such a program for
whole cgroup.

== Implementation notes ==

[1]
Separate attach types for `AF_INET` and `AF_INET6` are added
intentionally to prevent reading/writing to offsets that don't make
sense for corresponding socket family. E.g. if user passes `sockaddr_in`
it doesn't make sense to read from / write to `user_ip6[]` context
fields.

[2]
The write access to `struct bpf_sock_addr_kern` is implemented using
special field as an additional "register".

There are just two registers in `sock_addr_convert_ctx_access`: `src`
with value to write and `dst` with pointer to context that can't be
changed not to break later instructions. But the fields, allowed to
write to, are not available directly and to access them address of
corresponding pointer has to be loaded first. To get additional register
the 1st not used by `src` and `dst` one is taken, its content is saved
to `bpf_sock_addr_kern.tmp_reg`, then the register is used to load
address of pointer field, and finally the register's content is restored
from the temporary field after writing `src` value.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:15:18 +02:00
Andrey Ignatov
5e43f899b0 bpf: Check attach type at prog load time
== The problem ==

There are use-cases when a program of some type can be attached to
multiple attach points and those attach points must have different
permissions to access context or to call helpers.

E.g. context structure may have fields for both IPv4 and IPv6 but it
doesn't make sense to read from / write to IPv6 field when attach point
is somewhere in IPv4 stack.

Same applies to BPF-helpers: it may make sense to call some helper from
some attach point, but not from other for same prog type.

== The solution ==

Introduce `expected_attach_type` field in in `struct bpf_attr` for
`BPF_PROG_LOAD` command. If scenario described in "The problem" section
is the case for some prog type, the field will be checked twice:

1) At load time prog type is checked to see if attach type for it must
   be known to validate program permissions correctly. Prog will be
   rejected with EINVAL if it's the case and `expected_attach_type` is
   not specified or has invalid value.

2) At attach time `attach_type` is compared with `expected_attach_type`,
   if prog type requires to have one, and, if they differ, attach will
   be rejected with EINVAL.

The `expected_attach_type` is now available as part of `struct bpf_prog`
in both `bpf_verifier_ops->is_valid_access()` and
`bpf_verifier_ops->get_func_proto()` () and can be used to check context
accesses and calls to helpers correspondingly.

Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and
Daniel Borkmann <daniel@iogearbox.net> here:
https://marc.info/?l=linux-netdev&m=152107378717201&w=2

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:14:44 +02:00
Tariq Toukan
619a8f2a42 net/mlx5e: Use linear SKB in Striding RQ
Current Striding RQ HW feature utilizes the RX buffers so that
there is no wasted room between the strides. This maximises
the memory utilization.
This prevents the use of build_skb() (which requires headroom
and tailroom), and demands to memcpy the packets headers into
the skb linear part.

In this patch, whenever a set of conditions holds, we apply
an RQ configuration that allows combining the use of linear SKB
on top of a Striding RQ.

To use build_skb() with Striding RQ, the following must hold:
1. packet does not cross a page boundary.
2. there is enough headroom and tailroom surrounding the packet.

We can satisfy 1 and 2 by configuring:
	stride size = MTU + headroom + tailoom.

This is possible only when:
a. (MTU - headroom - tailoom) does not exceed PAGE_SIZE.
b. HW LRO is turned off.

Using linear SKB has many advantages:
- Saves a memcpy of the headers.
- No page-boundary checks in datapath.
- No filler CQEs.
- Significantly smaller CQ.
- SKB data continuously resides in linear part, and not split to
  small amount (linear part) and large amount (fragment).
  This saves datapath cycles in driver and improves utilization
  of SKB fragments in GRO.
- The fragments of a resulting GRO SKB follow the IP forwarding
  assumption of equal-size fragments.

Some implementation details:
HW writes the packets to the beginning of a stride,
i.e. does not keep headroom. To overcome this we make sure we can
extend backwards and use the last bytes of stride i-1.
Extra care is needed for stride 0 as it has no preceding stride.
We make sure headroom bytes are available by shifting the buffer
pointer passed to HW by headroom bytes.

This configuration now becomes default, whenever capable.
Of course, this implies turning LRO off.

Performance testing:
ConnectX-5, single core, single RX ring, default MTU.

UDP packet rate, early drop in TC layer:

--------------------------------------------
| pkt size | before    | after     | ratio |
--------------------------------------------
| 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
|  500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
|   64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
--------------------------------------------

TCP streams: ~20% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30 16:54:49 -07:00
Saeed Mahameed
b2d3907c23 net/mlx5: Eliminate query xsrq dead code
1. This function is not used anywhere in mlx5 driver
2. It has a memcpy statement that makes no sense and produces build
warning with gcc8

drivers/net/ethernet/mellanox/mlx5/core/transobj.c: In function 'mlx5_core_query_xsrq':
drivers/net/ethernet/mellanox/mlx5/core/transobj.c:347:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]

Fixes: 01949d0109 ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30 16:16:17 -07:00
Prashant Bhole
f385178679 lib/scatterlist: add sg_init_marker() helper
sg_init_marker initializes sg_magic in the sg table and calls
sg_mark_end() on the last entry of the table. This can be useful to
avoid memset in sg_init_table() when scatterlist is already zeroed out

For example: when scatterlist is embedded inside other struct and that
container struct is zeroed out

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30 22:50:15 +02:00
Andy Shevchenko
9def051018 crypto: Deduplicate le32_to_cpu_array() and cpu_to_le32_array()
Deduplicate le32_to_cpu_array() and cpu_to_le32_array() by moving them
to the generic header.

No functional change implied.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-31 01:33:09 +08:00
Tal Gilboa
f97c3dc3c0 net/dim: Fix int overflow
When calculating difference between samples, the values
are multiplied by 100. Large values may cause int overflow
when multiplied (usually on first iteration).
Fixed by forcing 100 to be of type unsigned long.

Fixes: 4c4dbb4a73 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 12:56:22 -04:00
Toshiaki Makita
c769accdf3 vlan: Fix vlan insertion for packets without ethernet header
In some situation vlan packets do not have ethernet headers. One example
is packets from tun devices. Users can specify vlan protocol in tun_pi
field instead of IP protocol. When we have a vlan device with reorder_hdr
disabled on top of the tun device, such packets from tun devices are
untagged in skb_vlan_untag() and vlan headers will be inserted back in
vlan_insert_inner_tag().

vlan_insert_inner_tag() however did not expect packets without ethernet
headers, so in such a case size argument for memmove() underflowed.

We don't need to copy headers for packets which do not have preceding
headers of vlan headers, so skip memmove() in that case.
Also don't write vlan protocol in skb->data when it does not have enough
room for it.

Fixes: cbe7128c4b ("vlan: Fix out of order vlan headers with reorder header off")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 12:36:27 -04:00
David S. Miller
d162190bde Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. This batch comes with more input sanitization for xtables to
address bug reports from fuzzers, preparation works to the flowtable
infrastructure and assorted updates. In no particular order, they are:

1) Make sure userspace provides a valid standard target verdict, from
   Florian Westphal.

2) Sanitize error target size, also from Florian.

3) Validate that last rule in basechain matches underflow/policy since
   userspace assumes this when decoding the ruleset blob that comes
   from the kernel, from Florian.

4) Consolidate hook entry checks through xt_check_table_hooks(),
   patch from Florian.

5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
   very large compat offset arrays, so we have a reasonable upper limit
   and fuzzers don't exercise the oom-killer. Patches from Florian.

6) Several WARN_ON checks on xtables mutex helper, from Florian.

7) xt_rateest now has a hashtable per net, from Cong Wang.

8) Consolidate counter allocation in xt_counters_alloc(), from Florian.

9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
   from Xin Long.

10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
    Felix Fietkau.

11) Consolidate code through flow_offload_fill_dir(), also from Felix.

12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
    to remove a dependency with flowtable and ipv6.ko, from Felix.

13) Cache mtu size in flow_offload_tuple object, this is safe for
    forwarding as f87c10a8aa describes, from Felix.

14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too
    modular infrastructure, from Felix.

15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
    Ahmed Abdelsalam.

16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.

17) Support for counting only to nf_conncount infrastructure, patch
    from Yi-Hung Wei.

18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
    to nft_ct.

19) Use boolean as return value from ipt_ah and from IPVS too, patch
    from Gustavo A. R. Silva.

20) Remove useless parameters in nfnl_acct_overquota() and
    nf_conntrack_broadcast_help(), from Taehee Yoo.

21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.

22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.

23) Fix typo in xt_limit, from Geert Uytterhoeven.

24) Do no use VLAs in Netfilter code, again from Gustavo.

25) Use ADD_COUNTER from ebtables, from Taehee Yoo.

26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.

27) Use pr_*() and add pr_fmt(), from Arushi Singhal.

28) Add synproxy support to ctnetlink.

29) ICMP type and IGMP matching support for ebtables, patches from
    Matthias Schiffer.

30) Support for the revision infrastructure to ebtables, from
    Bernie Harris.

31) String match support for ebtables, also from Bernie.

32) Documentation for the new flowtable infrastructure.

33) Use generic comparison functions in ebt_stp, from Joe Perches.

34) Demodularize filter chains in nftables.

35) Register conntrack hooks in case nftables NAT chain is added.

36) Merge assignments with return in a couple of spots in the
    Netfilter codebase, also from Arushi.

37) Document that xtables percpu counters are stored in the same
    memory area, from Ben Hutchings.

38) Revert mark_source_chains() sanity checks that break existing
    rulesets, from Florian Westphal.

39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 11:41:18 -04:00
Mike Looijmans
9217e566bd of_net: Implement of_get_nvmem_mac_address helper
It's common practice to store MAC addresses for network interfaces into
nvmem devices. However the code to actually do this in the kernel lacks,
so this patch adds of_get_nvmem_mac_address() for drivers to obtain the
address from an nvmem cell provider.

This is particulary useful on devices where the ethernet interface cannot
be configured by the bootloader, for example because it's in an FPGA.

Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:40:18 -04:00
Russell King
e679c9c1db sfp/phylink: move module EEPROM ethtool access into netdev core ethtool
Provide a pointer to the SFP bus in struct net_device, so that the
ethtool module EEPROM methods can access the SFP directly, rather
than needing every user to provide a hook for it.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:11:06 -04:00
Florian Fainelli
c6ab3008b6 net: phy: phylink: Provide PHY interface to mac_link_{up, down}
In preparation for having DSA transition entirely to PHYLINK, we need to pass a
PHY interface type to the mac_link_{up,down} callbacks because we may have to
make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not pass
an entire phylink_link_state because not all parameters (pause, duplex etc.) are
defined when the link is down, only link and interface are.

Update mvneta accordingly since it currently implements phylink_mac_ops.

Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:11:06 -04:00
Gal Pressman
9daae9bd47 net: Call add/kill vid ndo on vlan filter feature toggling
NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit
flip in dev->features in order to keep the driver in a consistent state.
These features notify the driver of each added/removed vlan, but toggling
of vlan-filter does not notify the driver accordingly for each of the
existing vlans.

This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT
behavior (which notifies the driver about UDP ports in the same manner
that vids are reported).

Each toggling of the features propagates to the 8021q module, which
iterates over the vlans and call add/kill ndo accordingly.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 09:58:59 -04:00
John Fastabend
fa246693a1 bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT:
Add support for the BPF_F_INGRESS flag in skb redirect helper. To
do this convert skb into a scatterlist and push into ingress queue.
This is the same logic that is used in the sk_msg redirect helper
so it should feel familiar.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30 00:09:43 +02:00
John Fastabend
8934ce2fd0 bpf: sockmap redirect ingress support
Add support for the BPF_F_INGRESS flag in sk_msg redirect helper.
To do this add a scatterlist ring for receiving socks to check
before calling into regular recvmsg call path. Additionally, because
the poll wakeup logic only checked the skb recv queue we need to
add a hook in TCP stack (similar to write side) so that we have
a way to wake up polling socks when a scatterlist is redirected
to that sock.

After this all that is needed is for the redirect helper to
push the scatterlist into the psock receive queue.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30 00:09:43 +02:00
Al Viro
903ddaf493 take out orphan externs (empty_string/slash_string)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:46 -04:00
Sudarsana Reddy Kalluru
3a69cae80c qed: Adapter flash update support.
This patch adds the required driver support for updating the flash or
non volatile memory of the adapter. At highlevel, flash upgrade comprises
of reading the flash images from the input file, validating the images and
writing them to the respective paritions.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Michal Kalderon
50bc60cb15 qed*: Utilize FW 8.33.11.0
This FW contains several fixes and features

RDMA Features
- SRQ support
- XRC support
- Memory window support
- RDMA low latency queue support
- RDMA bonding support

RDMA bug fixes
- RDMA remote invalidate during retransmit fix
- iWARP MPA connect interop issue with RTR fix
- iWARP Legacy DPM support
- Fix MPA reject flow
- iWARP error handling
- RQ WQE validation checks

MISC
- Fix some HSI types endianity
- New Restriction: vlan insertion in core_tx_bd_data can't be set
  for LB packets

ETH
- HW QoS offload support
- Fix vlan, dcb and sriov flow of VF sending a packet with
  inband VLAN tag instead of default VLAN
- Allow GRE version 1 offloads in RX flow
- Allow VXLAN steering

iSCSI / FcoE
- Fix bd availability checking flow
- Support 256th sge proerly in iscsi/fcoe retransmit
- Performance improvement
- Fix handle iSCSI command arrival with AHS and with immediate
- Fix ipv6 traffic class configuration

DEBUG
- Update debug utilities

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:18:02 -04:00
David S. Miller
6e2135ce54 mlx5-updates-2018-03-27 (Misc updates & SQ recovery)
This series contains Misc updates and cleanups for mlx5e rx path
 and SQ recovery feature for tx path.
 
 From Tariq: (RX updates)
     - Disable Striding RQ when PCI devices, striding RQ limits the use
       of CQE compression feature, which is very critical for slow PCI
       devices performance, in this change we will prefer CQE compression
       over Striding RQ only on specific "slow"  PCIe links.
     - RX path cleanups
     - Private flag to enable/disable striding RQ
 
 From Eran: (TX fast recovery)
     - TX timeout logic improvements, fast SQ recovery and TX error reporting
       if a HW error occurs while transmitting on a specific SQ, the driver will
       ignore such error and will wait for TX timeout to occur and reset all
       the rings. Instead, the current series improves the resiliency for such
       HW errors by detecting TX completions with errors, which will report them
       and perform a fast recover for the specific faulty SQ even before a TX
       timeout is detected.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJauuK/AAoJEEg/ir3gV/o+Ro4IAIBcQl4S/6jPttGDOPwElU4b
 Q8MuQbnB9k36Kf6x3GrhBqwtoS3OYBLvfyOeZsghD8uyzklhnqR/psrCt1YhLXq7
 aI5iFbuQucCUwIYBscuZegXBs+PTJYALoil05hOwSkwhEBZR53ZvdRQ9Jg0G0QiY
 sYEYAl41A1DWQOLET5lwleUpVQnVnNvIcXqNHKBKbS4f8rmLpfsZ6YAxJiiIjeDJ
 gLbsaxUyY3p77594okBk6DuFBWZsEAKwumspdSc92OEMrZSYE5ugNL+UgVEE7I8F
 JgK/4OUarw3K7xwhAg//ZhTM1iRRdJv0Sz2GPhhWRzYB0jWACc3ZG2/M/WmxFIg=
 =Uslu
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-03-27 (Misc updates & SQ recovery)

This series contains Misc updates and cleanups for mlx5e rx path
and SQ recovery feature for tx path.

From Tariq: (RX updates)
    - Disable Striding RQ when PCI devices, striding RQ limits the use
      of CQE compression feature, which is very critical for slow PCI
      devices performance, in this change we will prefer CQE compression
      over Striding RQ only on specific "slow"  PCIe links.
    - RX path cleanups
    - Private flag to enable/disable striding RQ

From Eran: (TX fast recovery)
    - TX timeout logic improvements, fast SQ recovery and TX error reporting
      if a HW error occurs while transmitting on a specific SQ, the driver will
      ignore such error and will wait for TX timeout to occur and reset all
      the rings. Instead, the current series improves the resiliency for such
      HW errors by detecting TX completions with errors, which will report them
      and perform a fast recover for the specific faulty SQ even before a TX
      timeout is detected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:01:40 -04:00
Kirill Tkhai
f0b07bb151 net: Introduce net_rwsem to protect net_namespace_list
rtnl_lock() is used everywhere, and contention is very high.
When someone wants to iterate over alive net namespaces,
he/she has no a possibility to do that without exclusive lock.
But the exclusive rtnl_lock() in such places is overkill,
and it just increases the contention. Yes, there is already
for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
and this can't be sleepable. Also, sometimes it may be need
really prevent net_namespace_list growth, so for_each_net_rcu()
is not fit there.

This patch introduces new rw_semaphore, which will be used
instead of rtnl_mutex to protect net_namespace_list. It is
sleepable and allows not-exclusive iterations over net
namespaces list. It allows to stop using rtnl_lock()
in several places (what is made in next patches) and makes
less the time, we keep rtnl_mutex. Here we just add new lock,
while the explanation of we can remove rtnl_lock() there are
in next patches.

Fine grained locks generally are better, then one big lock,
so let's do that with net_namespace_list, while the situation
allows that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:53 -04:00
Alexander Shishkin
6ed70cf342 perf/x86/pt, coresight: Clean up address filter structure
This is a cosmetic patch that deals with the address filter structure's
ambiguous fields 'filter' and 'range'. The former stands to mean that the
filter's *action* should be to filter the traces to its address range if
it's set or stop tracing if it's unset. This is confusing and hard on the
eyes, so this patch replaces it with 'action' enum. The 'range' field is
completely redundant (meaning that the filter is an address range as
opposed to a single address trigger), as we can use zero size to mean the
same thing.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180329120648.11902-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29 16:07:22 +02:00
Ingo Molnar
2d074918fb Merge branch 'perf/urgent' into perf/core
Conflicts:
	kernel/events/hw_breakpoint.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29 16:03:48 +02:00
Greg Kroah-Hartman
86f690e8bf stm class/intel_th: Updates for 4.17
These are:
   * Mass conversion to GPL-2 SPDX header
   * Moved "hwtracing" to now its own submenu, to uncrowd the parent menu a bit
   * Added MAINTAINERS entry for drivers/hwtracing
   * Somewhat small Trace Hub fixes
   * Added ACPI glue layer for the Trace Hub
   * Added more module parameters to dummy_stm for better test coverage
 -----BEGIN PGP SIGNATURE-----
 
 iJkEABEIAEEWIQQSviFCoXpKPDNATbnrxfYkYwVX/wUCWrzG7yMcYWxleGFuZGVy
 LnNoaXNoa2luQGxpbnV4LmludGVsLmNvbQAKCRDrxfYkYwVX/0jcAQCE/maw6L5P
 NdT3Ck3EuDWzij9uimq3Y0UyI9F4Cd+vYgD+N8Skyiua+NqfBGB0oXNsJt2acirg
 O3G990RK9dX4H2c=
 =CuGw
 -----END PGP SIGNATURE-----

Merge tag 'stm-intel_th-for-greg-20180329' of git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm into char-misc-next

Alexander writes:

stm class/intel_th: Updates for 4.17

These are:
  * Mass conversion to GPL-2 SPDX header
  * Moved "hwtracing" to now its own submenu, to uncrowd the parent menu a bit
  * Added MAINTAINERS entry for drivers/hwtracing
  * Somewhat small Trace Hub fixes
  * Added ACPI glue layer for the Trace Hub
  * Added more module parameters to dummy_stm for better test coverage
2018-03-29 14:15:13 +02:00
Thomas Gleixner
71e6882b0a irqchip updates for 4.17
- New Qualcomm PDC irqchip
 - New Microsemi Ocelot irqchip
 - Suspend/resume support for some oddball GICv3 irqchip
 - Better GIC/GICv3 support for kexec
 - Various cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlq8xbMVHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDHlYQAKBxb1i3pZ9rmiuQPdrP8b2Yqalg
 M98ojKVGkX3gjsPQVEbeDW8r+/JShNJWzf7J885N0FDJ61+MAOJaA9Zgs9QkZiDp
 l2iNiiIYa9yFe1u/5vg5PEyIfw7HzzXSqW2EKpFqq2SoeGiX+cQGIdR+dmWADQgV
 VACM72aki0ERiZOdBrUy0sADYkL7qEUmeADjSYvSnqluIr10TWmKY11UGtTQI9gx
 aE2AZm+vcYsAeRndFybDAeZ02DT47nFgOETqss1OhO8Z99rrcRI98nyowYCWQGRH
 unRSPJxwaRF99/IFD+/zNGf9+C8eNTznqL1f/Qw7/I+3vAx0E4IcwN0H/y/9ylpg
 9OsKjJTO9GXy8PqPQnLWdeoS2YAVVWn8PppvhgNmIt9K7cAbie1ump5tBWRId9ZE
 G/YjLz0wnPXxbP6YoLm+YrOxvBH37zPihqZSm+hsNSV1IpZkn1e3QsJqxuTOdJyp
 OrEJKbDQxc9i5dZJQh5urhjEhFR6PQp9KqWcbzg1vrFgmGvbniE4r/ReH0aQHljx
 wDNGMuSS80xJNmKAs7rKW5HNoWb1963LIa2z27Sk94lQeFHsLxpOWMf/2mCSVZkv
 BgkZ0CfROcIdxXNcJaz6LlGVqdO/oIjdDGC5TWL+2yS3puatonjcduYYZEjB6gbv
 IETl4aD8E/tvpsxf
 =Zqil
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates for 4.17 from Marc Zyngier:

 - New Qualcomm PDC irqchip
 - New Microsemi Ocelot irqchip
 - Suspend/resume support for some oddball GICv3 irqchip
 - Better GIC/GICv3 support for kexec
 - Various cleanups and fixes
2018-03-29 13:21:29 +02:00
Rafael J. Wysocki
64bdff6980 PM: cpuidle/suspend: Add s2idle usage and time state attributes
Add a new attribute group called "s2idle" under the sysfs directory
of each cpuidle state that supports the ->enter_s2idle callback
and put two new attributes, "usage" and "time", into that group to
represent the number of times the given state was requested for
suspend-to-idle and the total time spent in suspend-to-idle after
requesting that state, respectively.

That will allow diagnostic information related to suspend-to-idle
to be collected without enabling advanced debug features and
analyzing dmesg output.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-03-29 13:06:08 +02:00
Jakub Kicinski
e59ac63490 bpf: add parenthesis around argument of BPF_LDST_BYTES()
BPF_LDST_BYTES() does not put it's argument in parenthesis
when referencing it.  This makes it impossible to pass pointers
obtained by address-of operator (e.g. BPF_LDST_BYTES(&insn)).
Add the parenthesis.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-03-28 19:36:13 -07:00
Alexei Starovoitov
c4f6699dfc bpf: introduce BPF_RAW_TRACEPOINT
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

>From bpf program point of view the access to the arguments look like:
struct bpf_raw_tracepoint_args {
       __u64 args[0];
};

int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
  // program can read args[N] where N depends on tracepoint
  // and statically verified at program load+attach time
}

kprobe+bpf infrastructure allows programs access function arguments.
This feature allows programs access raw tracepoint arguments.

Similar to proposed 'dynamic ftrace events' there are no abi guarantees
to what the tracepoints arguments are and what their meaning is.
The program needs to type cast args properly and use bpf_probe_read()
helper to access struct fields when argument is a pointer.

For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
   0xffffffff81132080 <+0>:     mov    %ecx,%ecx
   0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
        TP_PROTO(const struct net_device *dev,
                 const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second
this is valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.

The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.

On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.

Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

__bpf_raw_tp_map section logic was contributed by Steven Rostedt

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 22:55:19 +02:00
Alexei Starovoitov
cf14f27f82 macro: introduce COUNT_ARGS() macro
move COUNT_ARGS() macro from apparmor to generic header and extend it
to count till twelve.

COUNT() was an alternative name for this logic, but it's used for
different purpose in many other places.

Similarly for CONCATENATE() macro.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 22:55:19 +02:00
Stephen Rothwell
8ecd2953d0 ipc/shm: fix up for struct file no longer being available in shm.h
Stephen Rothewell <sfr@canb.auug.org.au> wrote:

> After merging the userns tree, today's linux-next build (powerpc
> ppc64_defconfig) produced this warning:
>
> In file included from include/linux/sched.h:16:0,
>                  from arch/powerpc/lib/xor_vmx_glue.c:14:
> include/linux/shm.h:17:35: error: 'struct file' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
>  bool is_file_shm_hugepages(struct file *file);
>                                    ^~~~
>
> and many, many more (most warnings, but some errors - arch/powerpc is
> mostly built with -Werror)

I dug through this and I discovered that the error was caused by the
removal of struct shmid_kernel from shm.h when building on powerpc.
Except for observing the existence of "struct file *shm_file" in
struct shmid_kernel I have no clue why the structure move would cause
such a failure.  I suspect shm.h always needed the forward declaration
and someting had been confusing gcc into not issuing the warning. --EWB

Fixes: a2e102cd3c ("shm: Move struct shmid_kernel into ipc/shm.c")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-03-28 13:29:57 -05:00
Alexander Shishkin
9ea393d8d8 stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
This adds SPDX GPL-2.0 header to to stm core files and removes the
GPLv2 boilerplate text.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
2018-03-28 18:47:17 +03:00