linux-hardened/kernel
Davidlohr Bueso e38513905e locking/rwsem: Rework zeroing reader waiter->task
Readers that are awoken will expect a nil ->task indicating
that a wakeup has occurred. Because of the way readers are
implemented, there's a small chance that the waiter will never
block in the slowpath (rwsem_down_read_failed), and therefore
requires some form of reference counting to avoid the following
scenario:

rwsem_down_read_failed()		rwsem_wake()
  get_task_struct();
  spin_lock_irq(&wait_lock);
  list_add_tail(&waiter.list)
  spin_unlock_irq(&wait_lock);
					  raw_spin_lock_irqsave(&wait_lock)
					  __rwsem_do_wake()
  while (1) {
    set_task_state(TASK_UNINTERRUPTIBLE);
					    waiter->task = NULL
    if (!waiter.task) // true
      break;
    schedule() // never reached

   __set_task_state(TASK_RUNNING);
 do_exit();
					    wake_up_process(tsk); // boom

... and therefore race with do_exit() when the caller returns.

There is also a mismatch between the smp_mb() and its documentation,
in that the serialization is done between reading the task and the
nil store. Furthermore, in addition to having the overlapping of
loads and stores to waiter->task guaranteed to be ordered within
that CPU, both wake_up_process() originally and now wake_q_add()
already imply barriers upon successful calls, which serves the
comment.

Now, as an alternative to perhaps inverting the checks in the blocker
side (which has its own penalty in that schedule is unavoidable),
with lockless wakeups this situation is naturally addressed and we
can just use the refcount held by wake_q_add(), instead doing so
explicitly. Of course, we must guarantee that the nil store is done
as the _last_ operation in that the task must already be marked for
deletion to not fall into the race above. Spurious wakeups are also
handled transparently in that the task's reference is only removed
when wake_up_q() is actually called _after_ the nil store.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman.Long@hpe.com
Cc: dave@stgolabs.net
Cc: jason.low2@hp.com
Cc: peter@hurleysoftware.com
Link: http://lkml.kernel.org/r/1463165787-25937-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-03 09:47:12 +02:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-31 22:28:28 -07:00
configs
debug mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings 2016-02-22 08:51:37 +01:00
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-05-25 17:05:40 -07:00
gcov gcov: disable for COMPILE_TEST 2016-05-10 17:12:49 +02:00
irq radix-tree: introduce radix_tree_empty 2016-05-20 17:58:30 -07:00
livepatch Merge branches 'for-4.7/core', 'for-4.7/livepatching-doc' and 'for-4.7/livepatching-ppc64' into for-linus 2016-05-17 12:06:35 +02:00
locking locking/rwsem: Rework zeroing reader waiter->task 2016-06-03 09:47:12 +02:00
power PM / Hibernate: Call flush_icache_range() on pages restored in-place 2016-04-28 13:35:48 +01:00
printk printk/nmi: flush NMI messages on the system panic 2016-05-20 17:58:30 -07:00
rcu debugobjects: insulate non-fixup logic related to static obj from fixup callbacks 2016-05-19 19:12:14 -07:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-05-25 17:11:43 -07:00
time timer: Export destroy_hrtimer_on_stack() 2016-05-31 11:44:08 -07:00
trace Three more changes. 2016-05-22 19:40:39 -07:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c
async.c async: export current_is_async() 2015-11-19 17:51:48 +01:00
audit.c Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit 2016-05-18 18:46:55 -07:00
audit.h security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: cleanup prune_tree_thread 2016-04-04 09:46:47 -04:00
audit_watch.c don't bother with ->d_inode->i_sb - it's always equal to ->d_sb 2016-04-10 17:11:51 -04:00
auditfilter.c audit: Fix typo in comment 2016-02-08 11:25:39 -05:00
auditsc.c Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit 2016-05-18 18:46:55 -07:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup: fix compile warning 2016-05-12 11:05:27 -04:00
cgroup_freezer.c cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends 2015-12-03 10:24:08 -05:00
cgroup_pids.c cgroup_pids: fix a typo. 2015-12-14 14:54:37 -05:00
compat.c
configs.c
context_tracking.c context_tracking: Switch to new static_branch API 2015-11-24 09:56:43 +01:00
cpu.c sched/hotplug: Make activate() the last hotplug step 2016-05-06 14:58:25 +02:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpuset.c cpuset: use static key better and convert to new API 2016-05-19 19:12:14 -07:00
crash_dump.c
cred.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
delayacct.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
dma.c
elfcore.c
exec_domain.c
exit.c wait: allow sys_waitid() to accept __WNOTHREAD/__WCLONE/__WALL 2016-05-23 17:04:14 -07:00
extable.c kernel/extable.c: remove duplicated include 2015-09-10 13:29:01 -07:00
fork.c mm: oom_reaper: remove some bloat 2016-05-26 15:35:44 -07:00
freezer.c
futex.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
groups.c
hung_task.c kernel/hung_task.c: use timeout diff when timeout is updated 2016-03-22 15:36:02 -07:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
kallsyms.c kallsyms: add support for relative offsets in kallsyms address table 2016-03-15 16:55:16 -07:00
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: don't profile branches in kcov 2016-04-28 19:34:04 -07:00
kexec.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2016-05-23 17:04:14 -07:00
kexec_core.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2016-05-23 17:04:14 -07:00
kexec_file.c kexec: introduce a protection mechanism for the crashkernel reserved memory 2016-05-23 17:04:14 -07:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c
ksysfs.c rcu: Remove TINY_RCU bloat from pointless boot parameters 2015-12-07 16:59:37 -08:00
kthread.c kernel/kthread.c:kthread_create_on_node(): clarify documentation 2015-09-04 16:54:41 -07:00
latencytop.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
Makefile ELF/MIPS build fix 2016-05-23 17:04:14 -07:00
membarrier.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
memremap.c memremap: add arch specific hook for MEMREMAP_WB mappings 2016-04-04 10:26:41 +02:00
module-internal.h
module.c module: preserve Elf information for livepatch modules 2016-04-01 15:00:10 +02:00
module_signing.c KEYS: Move the point of trust determination to __key_link() 2016-04-11 22:43:43 +01:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c cgroup: introduce cgroup namespaces 2016-02-16 13:04:58 -05:00
padata.c kernel/padata.c: hide unused functions 2016-05-19 19:12:14 -07:00
panic.c printk/nmi: flush NMI messages on the system panic 2016-05-20 17:58:30 -07:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
pid_namespace.c
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2016-03-22 15:36:02 -07:00
ptrace.c ptrace: change __ptrace_unlink() to clear ->ptrace under ->siglock 2016-03-22 15:36:02 -07:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2016-04-14 12:56:09 -07:00
seccomp.c Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-05-19 10:02:26 -07:00
signal.c kernel/signal.c: convert printk(KERN_<LEVEL> ...) to pr_<level>(...) 2016-05-23 17:04:14 -07:00
smp.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 13:50:29 -07:00
smpboot.c cpu/hotplug: Unpark smpboot threads from the state machine 2016-03-01 20:36:56 +01:00
smpboot.h cpu/hotplug: Create hotplug threads 2016-03-01 20:36:56 +01:00
softirq.c arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
stacktrace.c
stop_machine.c kernel/stop_machine.c: remove CONFIG_SMP dependencies 2016-01-16 11:17:24 -08:00
sys.c prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable 2016-05-23 17:04:14 -07:00
sys_ni.c vfs: add copy_file_range syscall and vfs helper 2015-12-01 14:00:53 -05:00
sysctl.c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-05-25 17:05:40 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use generic UUID library 2016-05-20 17:58:30 -07:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c taskstats: use the libnl API to align nlattr on 64-bit 2016-04-23 20:13:25 -04:00
test_kprobes.c
torture.c rcutorture: Dump trace buffer upon shutdown 2016-04-21 13:47:04 -07:00
tracepoint.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
tsacct.c time, acct: Drop irq save & restore from __acct_update_integrals() 2016-02-29 09:53:09 +01:00
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c kernel/*: switch to memdup_user_nul() 2016-01-04 10:27:55 -05:00
utsname.c
utsname_sysctl.c
watchdog.c watchdog: don't run proc_watchdog_update if new value is same as old 2016-03-17 15:09:34 -07:00
workqueue.c debugobjects: insulate non-fixup logic related to static obj from fixup callbacks 2016-05-19 19:12:14 -07:00
workqueue_internal.h sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() 2016-03-02 10:28:47 -05:00