Commit graph

330 commits

Author SHA1 Message Date
Paul E. McKenney
4b455dc3e1 rcu: Catch up rcu_report_qs_rdp() comment with reality
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:56 -08:00
Boqun Feng
67c583a7de RCU: Privatize rcu_node::lock
In patch:

"rcu: Add transitivity to remaining rcu_node ->lock acquisitions"

All locking operations on rcu_node::lock are replaced with the wrappers
because of the need of transitivity, which indicates we should never
write code using LOCK primitives alone(i.e. without a proper barrier
following) on rcu_node::lock outside those wrappers. We could detect
this kind of misuses on rcu_node::lock in the future by adding __private
modifier on rcu_node::lock.

To privatize rcu_node::lock, unlock wrappers are also needed. Replacing
spinlock unlocks with these wrappers not only privatizes rcu_node::lock
but also makes it easier to figure out critical sections of rcu_node.

This patch adds __private modifier to rcu_node::lock and makes every
access to it wrapped by ACCESS_PRIVATE(). Besides, unlock wrappers are
added and raw_spin_unlock(&rnp->lock) and its friends are replaced with
those wrappers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:54 -08:00
Chen Gang
1914aab543 rcu: Remove useless rcu_data_p when !PREEMPT_RCU
The related warning from gcc 6.0:

  In file included from kernel/rcu/tree.c:4630:0:
  kernel/rcu/tree_plugin.h:810:40: warning: ‘rcu_data_p’ defined but not used [-Wunused-const-variable]
   static struct rcu_data __percpu *const rcu_data_p = &rcu_sched_data;
                                          ^~~~~~~~~~

Also remove always redundant rcu_data_p in tree.c.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:53 -08:00
Paul E. McKenney
23a9bacd35 rcu: Set rdp->gpwrap when CPU is idle
Commit #e3663b1024d1 ("rcu: Handle gpnum/completed wrap while dyntick
idle") sets rdp->gpwrap on the wrong side of the "if" statement in
dyntick_save_progress_counter(), that is, it sets it when the CPU is
not idle instead of when it is idle.  Of course, if the CPU is not idle,
its rdp->gpnum won't be lagging beind the global rsp->gpnum, which means
that rdp->gpwrap will never be set.

This commit therefore moves this code to the proper leg of that "if"
statement.  This change means that the "else" cause is just "return 0"
and the "then" clause ends with "return 1", so also move the "return 0"
to follow the "if", dropping the "else" clause.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:52 -08:00
Paul E. McKenney
4914950aaa rcu: Stop treating in-kernel CPU-bound workloads as errors
Commit 4a81e8328d ("Reduce overhead of cond_resched() checks for RCU")
handles the error case where a nohz_full loops indefinitely in the kernel
with the scheduling-clock interrupt disabled.  However, this handling
includes IPIing the CPU running the offending loop, which is not what
we want for real-time workloads.  And there are starting to be real-time
CPU-bound in-kernel workloads, and these must be handled without IPIing
the CPU, at least not in the common case.  Therefore, this situation can
no longer be dismissed as an error case.

This commit therefore splits the handling out, so that the setting of
bits in the per-CPU rcu_sched_qs_mask variable is done relatively early,
but if the problem persists, resched_cpu() is eventually used to IPI the
CPU containing the offending loop.  Assuming that in-kernel CPU-bound
loops used by real-time tasks contain frequent calls cond_resched_rcu_qs()
(as in more than once per few tens of milliseconds), the real-time tasks
will never be IPIed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2016-02-23 19:59:51 -08:00
Paul E. McKenney
8994515cf0 rcu: Update rcu_report_qs_rsp() comment
The header comment for rcu_report_qs_rsp() was obsolete, dating well
before the advent of RCU grace-period kthreads.  This commit therefore
brings this comment back into alignment with current reality.

Reported-by: Lihao Liang <lihao.liang@cs.ox.ac.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:51 -08:00
Paul E. McKenney
bb53e416e0 rcu: Assign false instead of 0 for ->core_needs_qs
A zero seems to have escaped earlier true/false substitution efforts,
so this commit changes 0 to false for the ->core_needs_qs boolean field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:51 -08:00
Paul E. McKenney
648c630c64 Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', 'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD
doc.2015.12.05a:  Documentation updates
exp.2015.12.07a:  Expedited grace-period updates
fixes.2015.12.07a:  Miscellaneous fixes
list.2015.12.04b:  Linked-list updates
torture.2015.12.05a:  Torture-test updates
2015-12-07 17:02:54 -08:00
Paul E. McKenney
45fed3e7cf rcu: Make rcu_gp_init() be bool rather than int
The return value from rcu_gp_init() is always used as a bool, so
this commit makes it be a bool.

Reported-by: Iftekhar Ahmed <ahmedi@oregonstate.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:33 -08:00
Peter Zijlstra
e11f13355b rcu: Move wakeup out from under rnp->lock
This patch removes a potential deadlock hazard by moving the
wake_up_process() in rcu_spawn_gp_kthread() out from under rnp->lock.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:32 -08:00
Paul E. McKenney
7c9906ca5e rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}()
This commit replaces a local_irq_save()/local_irq_restore() pair with
a lockdep assertion that interrupts are already disabled.  This should
remove the corresponding overhead from the interrupt entry/exit fastpaths.

This change was inspired by the fact that Iftekhar Ahmed's mutation
testing showed that removing rcu_irq_enter()'s call to local_ird_restore()
had no effect, which might indicate that interrupts were always enabled
anyway.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:31 -08:00
Paul E. McKenney
d117c8aa1d rcu: Make cpu_needs_another_gp() be bool
The cpu_needs_another_gp() function is currently of type int, but only
returns zero or one.  Bow to reality and make it be of type bool.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:31 -08:00
Paul E. McKenney
a87f203e27 rcu: Eliminate unused rcu_init_one() argument
Now that the rcu_state structure's ->rda field is compile-time initialized,
there is no need to pass the per-CPU rcu_data structure into rcu_init_one().
This commit therefore eliminates this now-unused parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:19 -08:00
Paul E. McKenney
6b50e119c4 rcutorture: Print symbolic name for ->gp_state
Currently, ->gp_state is printed as an integer, which slows debugging.
This commit therefore prints a symbolic name in addition to the integer.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updated to fix relational operator called out by Dan Carpenter. ]
[ paulmck: More "const", as suggested by Josh Triplett. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-05 17:58:26 -08:00
Paul E. McKenney
b1adb3e273 rcutorture: Dump stack when GP kthread stalls
This commit increases debug information in the case where the grace-period
kthread is being prevented from running by dumping that kthread's stack.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Split into prior commit and this commit, as suggested by
  Josh Triplett. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-05 17:58:05 -08:00
Paul E. McKenney
a0e3a3aa28 rcutorture: Flag nonexistent RCU GP kthread
Currently, if the RCU grace-period kthread has not yet been created,
in which case the starvation-check code will print zero for the state,
which maps to TASK_RUNNING.  This could clearly be quite confusing, so
this commit prints ~0, which does not map to any legal ->state value.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-05 17:58:00 -08:00
Paul E. McKenney
46a5d164db rcu: Stop disabling interrupts in scheduler fastpaths
We need the scheduler's fastpaths to be, well, fast, and unnecessarily
disabling and re-enabling interrupts is not necessarily consistent with
this goal.  Especially given that there are regions of the scheduler that
already have interrupts disabled.

This commit therefore moves the call to rcu_note_context_switch()
to one of the interrupts-disabled regions of the scheduler, and
removes the now-redundant disabling and re-enabling of interrupts from
rcu_note_context_switch() and the functions it calls.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
  by Peter Zijlstra. ]
2015-12-04 12:27:31 -08:00
Paul E. McKenney
fecbf6f01f rcu: Simplify rcu_sched_qs() control flow
This commit applies an early-exit approach to rcu_sched_qs(), reducing
the nesting level and saving a line of code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:29 -08:00
Paul E. McKenney
3dc5dbe9a1 rcu: Move lock_class_key to local scope
Currently, the rcu_node_class[], rcu_fqs_class[], and rcu_exp_class[]
arrays needlessly pollute the global namespace within tree.c.  This
commit therefore converts them to static local variables within
rcu_init_one().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:29 -08:00
Paul E. McKenney
5a9be7c628 rcu: Add rcu_normal kernel parameter to suppress expediting
Although expedited grace periods can be quite useful, and although their
OS jitter has been greatly reduced, they can still pose problems for
extreme real-time workloads.  This commit therefore adds a rcu_normal
kernel boot parameter (which can also be manipulated via sysfs)
to suppress expedited grace periods, that is, to treat requests for
expedited grace periods as if they were requests for normal grace periods.
If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
This means that if you are relying on expedited grace periods to speed up
boot, you will want to specify rcu_expedited on the kernel command line,
and then specify rcu_normal via sysfs once boot completes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:53 -08:00
Paul E. McKenney
72611ab9f5 rcu: Add more diagnostics to expedited stall warning messages.
This commit adds print statements that check the rcu_node structure to
find which ->expmask bits and which ->exp_tasks structures are blocking
the current expedited grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:53 -08:00
Paul E. McKenney
73f36f9de8 rcu: Make expedited grace periods resolve stall-warning ties
Currently, if a grace period ends just as the stall-warning timeout
fires, an empty stall warning will be printed.  This is not helpful,
so this commit avoids these useless warnings by rechecking completion
after awakening in synchronize_sched_expedited_wait().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:52 -08:00
Paul E. McKenney
df5bd5144a rcu: Reduce expedited GP memory contention via per-CPU variables
Currently, the piggybacked-work checks carried out by sync_exp_work_done()
atomically increment a small set of variables (the ->expedited_workdone0,
->expedited_workdone1, ->expedited_workdone2, ->expedited_workdone3
fields in the rcu_state structure), which will form a memory-contention
bottleneck given a sufficiently large number of CPUs concurrently invoking
either synchronize_rcu_expedited() or synchronize_sched_expedited().

This commit therefore moves these for fields to the per-CPU rcu_data
structure, eliminating the memory contention.  The show_rcuexp() function
also changes to sum up each field in the rcu_data structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:52 -08:00
Paul E. McKenney
1307f21487 rcu: Invert sync_rcu_exp_select_cpus() "if" statement
This commit saves a couple lines of code and reduces indentation
by inverting the sense of an "if" statement in the function
sync_rcu_exp_select_cpus().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:51 -08:00
Paul E. McKenney
886ef5a18a rcu: Move smp_mb() from rcu_seq_snap() to rcu_exp_gp_seq_snap()
The memory barrier in rcu_seq_snap() is needed only for grace periods,
so this commit moves it to the grace-period-oriented wrapper
rcu_exp_gp_seq_snap().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:51 -08:00
Paul E. McKenney
06f60de19d rcu: Short-circuit synchronize_sched_expedited() if only one CPU
If there is only one CPU, then invoking synchronize_sched_expedited()
is by definition a grace period.  This commit checks for this condition
and does a short-circuit return in that case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:50 -08:00
Paul E. McKenney
6cf1008122 rcu: Add transitivity to remaining rcu_node ->lock acquisitions
The rule is that all acquisitions of the rcu_node structure's ->lock
must provide transitivity:  The lock is not acquired that frequently,
and sorting out exactly which required it and which did not would be
a maintenance nightmare.  This commit therefore supplies the needed
transitivity to the remaining ->lock acquisitions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-11-23 10:37:35 -08:00
Peter Zijlstra
2a67e741bb rcu: Create transitive rnp->lock acquisition functions
Providing RCU's memory-ordering guarantees requires that the rcu_node
tree's locking provide transitive memory ordering, which the Linux kernel's
spinlocks currently do not provide unless smp_mb__after_unlock_lock()
is used.  Having a separate smp_mb__after_unlock_lock() after each and
every lock acquisition is error-prone, hard to read, and a bit annoying,
so this commit provides wrapper functions that pull in the
smp_mb__after_unlock_lock() invocations.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-11-23 10:37:35 -08:00
Paul E. McKenney
d2856b046d Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEAD
exp.2015.10.07a:  Reduce OS jitter of RCU-sched expedited grace periods.
fixes.2015.10.06a:  Miscellaneous fixes.
2015-10-07 16:05:21 -07:00
Paul E. McKenney
338b0f760e rcu: Better hotplug handling for synchronize_sched_expedited()
Earlier versions of synchronize_sched_expedited() can prematurely end
grace periods due to the fact that a CPU marked as cpu_is_offline()
can still be using RCU read-side critical sections during the time that
CPU makes its last pass through the scheduler and into the idle loop
and during the time that a given CPU is in the process of coming online.
This commit therefore eliminates this window by adding additional
interaction with the CPU-hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
c58656382e rcu: Add tasks to expedited stall-warning messages
This commit adds task-print ability to the expedited RCU CPU stall
warning messages in preparation for adding stall warnings to
synchornize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
74611ecb0f rcu: Add online/offline info to expedited stall warning message
This commit makes the RCU CPU stall warning message print online/offline
indications immediately after the CPU number.  A "O" indicates global
offline, a "." global online, and a "o" indicates RCU believes that the
CPU is offline for the current grace period and "." otherwise, and an
"N" indicates that RCU believes that the CPU will be offline for the
next grace period, and "." otherwise, all right after the CPU number.
So for CPU 10, you would normally see "10-...:" indicating that everything
believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
dcdb8807ba rcu: Consolidate expedited CPU selection
Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus()
are identical aside from the the argument to smp_call_function_single(),
this commit consolidates them with a functional argument.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
66fe6cbee4 rcu: Prepare for consolidating expedited CPU selection
This commit brings sync_sched_exp_select_cpus() into alignment with
sync_rcu_exp_select_cpus(), as a first step towards consolidating them
into one function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
807226e2fb rcu: Stop excluding CPU hotplug in synchronize_sched_expedited()
Now that synchronize_sched_expedited() uses IPIs, a hook in
rcu_sched_qs(), and the ->expmask field in the rcu_node combining
tree, it is no longer necessary to exclude CPU hotplug.  Any
races with CPU hotplug will be detected when attempting to send
the IPI.  This commit therefore removes the code excluding
CPU hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:49 -07:00
Paul E. McKenney
83c2c735e7 rcu: Stop silencing lockdep false positive for expedited grace periods
This reverts commit af859beaab (rcu: Silence lockdep false positive
for expedited grace periods).  Because synchronize_rcu_expedited()
no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
acquisition is no longer nested, so the false positive no longer happens.
This commit therefore removes the extra lockdep data structures, as they
are no longer needed.
2015-10-07 16:02:49 -07:00
Paul E. McKenney
6587a23b6b rcu: Switch synchronize_sched_expedited() to IPI
This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
to smp_call_function_single(), thus moving from an IPI and a pair of
context switches to an IPI and a single pass through the scheduler.
Of course, if the scheduler actually does decide to switch to a different
task, there will still be a pair of context switches, but there would
likely have been a pair of context switches anyway, just a bit later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:01:12 -07:00
Petr Mladek
77f81fe08e rcu: Finish folding ->fqs_state into ->gp_state
Commit commit 4cdfc175c2 ("rcu: Move quiescent-state forcing
into kthread") started the process of folding the old ->fqs_state into
->gp_state, but did not complete it.  This situation does not cause
any malfunction, but can result in extremely confusing trace output.
This commit completes this task of eliminating ->fqs_state in favor
of ->gp_state.

The old ->fqs_state was also used to decide when to collect dyntick-idle
snapshots.  For this purpose, we add a boolean variable into the kthread,
which is set on the first call to rcu_gp_fqs() for a given grace period
and clear otherwise.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:15:59 -07:00
Paul E. McKenney
ee968ac61d rcu: Eliminate panic when silly boot-time fanout specified
This commit loosens rcutree.rcu_fanout_leaf range checks
and replaces a panic() with a fallback to compile-time values.
This fallback is accompanied by a WARN_ON(), and both occur when the
rcutree.rcu_fanout_leaf value is too small to accommodate the number of
CPUs.  For example, given the current four-level limit for the rcu_node
tree, a system with more than 16 CPUs built with CONFIG_FANOUT=2 must
have rcutree.rcu_fanout_leaf larger than 2.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:09:41 -07:00
Boqun Feng
bb73c52bad rcu: Don't disable preemption for Tiny and Tree RCU readers
Because preempt_disable() maps to barrier() for non-debug builds,
it forces the compiler to spill and reload registers.  Because Tree
RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
barrier() instances generate needless extra code for each instance of
rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
RCU and bloats Tiny RCU.

This commit therefore removes the preempt_disable() and preempt_enable()
from the non-preemptible implementations of __rcu_read_lock() and
__rcu_read_unlock(), respectively.  However, for debug purposes,
preempt_disable() and preempt_enable() are still invoked if
CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
atomic sections in non-preemptible kernels.

However, Tiny and Tree RCU operates by coalescing all RCU read-side
critical sections on a given CPU that lie between successive quiescent
states.  It is therefore necessary to compensate for removing barriers
from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
couple of the RCU functions invoked during quiescent states, namely to
rcu_all_qs() and rcu_note_context_switch().  However, note that the latter
is more paranoia than necessity, at least until link-time optimizations
become more aggressive.

This is based on an earlier patch by Paul E. McKenney, fixing
a bug encountered in kernels built with CONFIG_PREEMPT=n and
CONFIG_PREEMPT_COUNT=y.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:08:23 -07:00
Boqun Feng
b6a4ae766e rcu: Use rcu_callback_t in call_rcu*() and friends
As we now have rcu_callback_t typedefs as the type of rcu callbacks, we
should use it in call_rcu*() and friends as the type of parameters. This
could save us a few lines of code and make it clear which function
requires an rcu callbacks rather than other callbacks as its argument.

Besides, this can also help cscope to generate a better database for
code reading.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:05 -07:00
Paul E. McKenney
5b74c45890 rcu: Make ->cpu_no_qs be a union for aggregate OR
This commit converts the rcu_data structure's ->cpu_no_qs field
to a union.  The bytewise side of this union allows individual access
to indications as to whether this CPU needs to find a quiescent state
for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
side of the union allows testing whether or not a quiescent state is
needed at all, for either type of grace period.

For now, only .norm is used.  A later commit will introduce the expedited
usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
0d43eb34f9 rcu: Invert passed_quiesce and rename to cpu_no_qs
This commit inverts the sense of the rcu_data structure's ->passed_quiesce
field and renames it to ->cpu_no_qs.  This will allow a later commit to
use an "aggregate OR" operation to test expedited as well as normal grace
periods without added overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
97c668b8e9 rcu: Rename qs_pending to core_needs_qs
An upcoming commit needs to invert the sense of the ->passed_quiesce
rcu_data structure field, so this commit is taking this opportunity
to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

So if !rdp->core_needs_qs, then this CPU need not concern itself with
quiescent states, in particular, it need not acquire its leaf rcu_node
structure's ->lock to check.  Otherwise, it needs to report the next
quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
bce5fa12aa rcu: Move synchronize_sched_expedited() to combining tree
Currently, synchronize_sched_expedited() uses a single global counter
to track the number of remaining context switches that the current
expedited grace period must wait on.  This is problematic on large
systems, where the resulting memory contention can be pathological.
This commit therefore makes synchronize_sched_expedited() instead use
the combining tree in the same manner as synchronize_rcu_expedited(),
keeping memory contention down to a dull roar.

This commit creates a temporary function sync_sched_exp_select_cpus()
that is very similar to sync_rcu_exp_select_cpus().  A later commit
will consolidate these two functions, which becomes possible when
synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
smp_call_function_single().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
8203d6d0ee rcu: Use single-stage IPI algorithm for RCU expedited grace period
The current preemptible-RCU expedited grace-period algorithm invokes
synchronize_sched_expedited() to enqueue all tasks currently running
in a preemptible-RCU read-side critical section, then waits for all the
->blkd_tasks lists to drain.  This works, but results in both an IPI and
a double context switch even on CPUs that do not happen to be running
in a preemptible RCU read-side critical section.

This commit implements a new algorithm that causes less OS jitter.
This new algorithm IPIs all online CPUs that are not idle (from an
RCU perspective), but refrains from self-IPIs.  If a CPU receiving
this IPI is not in a preemptible RCU read-side critical section (or
is just now exiting one), it pushes quiescence up the rcu_node tree,
otherwise, it sets a flag that will be handled by the upcoming outermost
rcu_read_unlock(), which will then push quiescence up the tree.

The expedited grace period must of course wait on any pre-existing blocked
readers, and newly blocked readers must be queued carefully based on
the state of both the normal and the expedited grace periods.  This
new queueing approach also avoids the need to update boost state,
courtesy of the fact that blocked tasks are no longer ever migrated to
the root rcu_node structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:19 -07:00
Paul E. McKenney
b9585e940a rcu: Consolidate tree setup for synchronize_rcu_expedited()
This commit replaces sync_rcu_preempt_exp_init1(() and
sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
and sync_exp_reset_tree(), which will also be used by
synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
contains code specific to synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
7922cd0e56 rcu: Move rcu_report_exp_rnp() to allow consolidation
This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
that later commits can make synchronize_sched_expedited() use them.
The non-code-movement portion of this commit tags rcu_report_exp_rnp()
as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
f4ecea309d rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq
Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
there is no need for the sync_rcu_preempt_exp_wq global variable.  This
commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
It also initializes ->expedited_wq only once at boot instead of at the
start of each expedited grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:17 -07:00
Paul E. McKenney
19a5ecde08 rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex
In kernels built with CONFIG_PREEMPT=y, synchronize_rcu_expedited()
invokes synchronize_sched_expedited() while holding RCU-preempt's
root rcu_node structure's ->exp_funnel_mutex, which is acquired after
the rcu_data structure's ->exp_funnel_mutex.  The first thing that
synchronize_sched_expedited() will do is acquire RCU-sched's rcu_data
structure's ->exp_funnel_mutex.   There is no danger of an actual deadlock
because the locking order is always from RCU-preempt's expedited mutexes
to those of RCU-sched.  Unfortunately, lockdep considers both rcu_data
structures' ->exp_funnel_mutex to be in the same lock class and therefore
reports a deadlock cycle.

This commit silences this false positive by placing RCU-sched's rcu_data
structures' ->exp_funnel_mutex locks into their own lock class.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:01:22 -07:00
Paul E. McKenney
8ff4fbfd69 Merge branches 'fixes.2015.07.22a' and 'initexp.2015.08.04a' into HEAD
fixes.2015.07.22a: Miscellaneous fixes.
initexp.2015.08.04a: Initialization and expedited updates.
	(Single branch due to conflicts.)
2015-08-04 08:40:58 -07:00
Paul E. McKenney
af859beaab rcu: Silence lockdep false positive for expedited grace periods
In a CONFIG_PREEMPT=y kernel, synchronize_rcu_expedited()
acquires the ->exp_funnel_mutex in rcu_preempt_state, then invokes
synchronize_sched_expedited, which acquires the ->exp_funnel_mutex in
rcu_sched_state.  There can be no deadlock because rcu_preempt_state
->exp_funnel_mutex acquisition always precedes that of rcu_sched_state.
But lockdep does not know that, so it gives false-positive splats.

This commit therefore associates a separate lock_class_key structure
with the rcu_sched_state structure's ->exp_funnel_mutex, allowing
lockdep to see the lock ordering, avoiding the false positives.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-08-04 08:39:21 -07:00
Paul E. McKenney
f78f5b90c4 rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2015-07-22 15:27:32 -07:00
Alexei Starovoitov
46f00d18fc rcu: Make rcu_is_watching() really notrace
Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
and preempt_enable(), both of which can be traced.  This defeats the
purpose of the notrace on rcu_is_watching(), so this commit substitutes
preempt_disable_notrace() and preempt_enable_notrace().

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-22 15:27:31 -07:00
Paul E. McKenney
24560056de rcu: Add RCU-sched flavors of get-state and cond-sync
The get_state_synchronize_rcu() and cond_synchronize_rcu() functions
allow polling for grace-period completion, with an actual wait for a
grace period occurring only when cond_synchronize_rcu() is called too
soon after the corresponding get_state_synchronize_rcu().  However,
these functions work only for vanilla RCU.  This commit adds the
get_state_synchronize_sched() and cond_synchronize_sched(), which provide
the same capability for RCU-sched.

Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:26:58 -07:00
Paul E. McKenney
cdacbe1f91 rcu: Add fastpath bypassing funnel locking
In the common case, there will be only one expedited grace period in
the system at a given time, in which case it is not helpful to use
funnel locking.  This commit therefore adds a fastpath that bypasses
funnel locking when the root ->exp_funnel_mutex is not held.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:06 -07:00
Paul E. McKenney
32bb1c7999 rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
The grace-period kthread sleeps waiting to do a force-quiescent-state
scan, and when awakened sets rsp->gp_state to RCU_GP_DONE_FQS.
However, this is confusing because the kthread has not done the
force-quiescent-state, but is instead just starting to do it.  This commit
therefore renames RCU_GP_DONE_FQS to RCU_GP_DOING_FQS in order to make
things a bit easier on reviewers.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:05 -07:00
Paul E. McKenney
b9a425cfcb rcu: Pull out wait_event*() condition into helper function
The condition for the wait_event_interruptible_timeout() that waits
to do the next force-quiescent-state scan is a bit ornate:

	((gf = READ_ONCE(rsp->gp_flags)) &
	 RCU_GP_FLAG_FQS) ||
	(!READ_ONCE(rnp->qsmask) &&
	 !rcu_preempt_blocked_readers_cgp(rnp))

This commit therefore pulls this condition out into a helper function
and comments its component conditions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:04 -07:00
Paul E. McKenney
cf3620a6c7 rcu: Add stall warnings to synchronize_sched_expedited()
Although synchronize_sched_expedited() historically has no RCU CPU stall
warnings, the availability of the rcupdate.rcu_expedited boot parameter
invalidates the old assumption that synchronize_sched()'s stall warnings
would suffice.  This commit therefore adds RCU CPU stall warnings to
synchronize_sched_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:01 -07:00
Paul E. McKenney
2cd6ffafec rcu: Extend expedited funnel locking to rcu_data structure
The strictly rcu_node based funnel-locking scheme works well in many
cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get
all that much concurrency.  This commit therefore extends the funnel
locking into the per-CPU rcu_data structure, providing concurrency equal
to the number of CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:00 -07:00
Paul E. McKenney
704dd435ac rcu: Consolidate last open-coded expedited memory barrier
One of the requirements on RCU grace periods is that if there is a
causal chain of operations that starts after one grace period and
ends before another grace period, then the two grace periods must
be serialized.  There has been (and might still be) code that relies
on this, for example, certain types of reference-counting code that
does a call_rcu() within an RCU callback function.

This requirement is why there is an smp_mb() at the end of both
synchronize_sched_expedited() and synchronize_rcu_expedited().
However, this is the only smp_mb() in these functions, so it would
be nicer to consolidate it into rcu_exp_gp_seq_end().  This commit
does just that.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:59 -07:00
Paul E. McKenney
4f525a528b rcu: Apply rcu_seq operations to _rcu_barrier()
The rcu_seq operations were open-coded in _rcu_barrier(), so this commit
replaces the open-coding with the shiny new rcu_seq operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:57 -07:00
Paul E. McKenney
29fd930940 rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop
This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock()
polling loop in favor of the funnel-locking scheme that was abstracted
from synchronize_sched_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:56 -07:00
Paul E. McKenney
7fd0ddc5bf rcu: Fix synchronize_sched_expedited() type error for "s"
The type of "s" has been "long" rather than the correct "unsigned long"
for quite some time.  This commit fixes this type error.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:55 -07:00
Paul E. McKenney
b09e5f8601 rcu: Abstract funnel locking from synchronize_sched_expedited()
This commit abstracts funnel locking from synchronize_sched_expedited()
so that it may be used by synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:53 -07:00
Paul E. McKenney
28f00767e3 rcu: Abstract sequence counting from synchronize_sched_expedited()
This commit creates rcu_exp_gp_seq_start() and rcu_exp_gp_seq_end() to
bracket an expedited grace period, rcu_exp_gp_seq_snap() to snapshot the
sequence counter, and rcu_exp_gp_seq_done() to check to see if a full
expedited grace period has elapsed since the snapshot.  These will be
applied to synchronize_rcu_expedited().  These are defined in terms of
underlying rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), rcu_seq_done(),
which will be applied to _rcu_barrier().

One reason that this commit doesn't use the seqcount primitives themselves
is that the smp_wmb() in those primitive is insufficient due to the fact
that expedited grace periods do reads as well as writes.  In addition,
the read-side seqcount primitives detect a potentially partial change,
where the expedited primitives instead need a guaranteed full change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:51 -07:00
Peter Zijlstra
3a6d7c64d7 rcu: Make expedited GP CPU stoppage asynchronous
Sequentially stopping the CPUs slows down expedited grace periods by
at least a factor of two, based on rcutorture's grace-period-per-second
rate.  This is a conservative measure because rcutorture uses unusually
long RCU read-side critical sections and because rcutorture periodically
quiesces the system in order to test RCU's ability to ramp down to and
up from the idle state.  This commit therefore replaces the stop_one_cpu()
with stop_one_cpu_nowait(), using an atomic-counter scheme to determine
when all CPUs have passed through the stopped state.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:50 -07:00
Paul E. McKenney
385b73c06f rcu: Get rid of synchronize_sched_expedited()'s polling loop
This commit gets rid of synchronize_sched_expedited()'s mutex_trylock()
polling loop in favor of a funnel-locking scheme based on the rcu_node
tree.  The work-done check is done at each level of the tree, allowing
high-contention situations to be resolved quickly with reasonable levels
of mutex contention.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:48 -07:00
Paul E. McKenney
d6ada2cf2f rcu: Rework synchronize_sched_expedited() counter handling
Now that synchronize_sched_expedited() have a mutex, it can use simpler
work-already-done detection scheme.  This commit simplifies this scheme
by using something similar to the sequence-locking counter scheme.
A counter is incremented before and after each grace period, so that
the counter is odd in the midst of the grace period and even otherwise.
So if the counter has advanced to the second even number that is
greater than or equal to the snapshot, the required grace period has
already happened.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:47 -07:00
Peter Zijlstra
c190c3b16c rcu: Switch synchronize_sched_expedited() to stop_one_cpu()
The synchronize_sched_expedited() currently invokes try_stop_cpus(),
which schedules the stopper kthreads on each online non-idle CPU,
and waits until all those kthreads are running before letting any
of them stop.  This is disastrous for real-time workloads, which
get hit with a preemption that is as long as the longest scheduling
latency on any CPU, including any non-realtime housekeeping CPUs.
This commit therefore switches to using stop_one_cpu() on each CPU
in turn.  This avoids inflicting the worst-case scheduling latency
on the worst-case CPU onto all other CPUs, and also simplifies the
code a little bit.

Follow-up commits will simplify the counter-snapshotting algorithm
and convert a number of the counters that are now protected by the
new ->expedited_mutex to non-atomic.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:45 -07:00
Paul E. McKenney
13bd64947f rcu: Reset rcu_fanout_leaf if out of bounds
Currently if the rcu_fanout_leaf boot parameter is out of bounds (that
is, less than RCU_FANOUT_LEAF or greater than the number of bits in an
unsigned long), a warning is issued and execution continues with the
out-of-bounds value.  This can result in all manner of failures, so this
patch resets rcu_fanout_leaf to RCU_FANOUT_LEAF when an out-of-bounds
condition is detected.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:41 -07:00
Alexander Gordeev
cb00710239 rcu: Limit count of static data to the number of RCU levels
Although a number of RCU levels may be less than the current
maximum of four, some static data associated with each level
are allocated for all four levels. As result, the extra data
never get accessed and just wast memory. This update limits
count of allocated items to the number of used RCU levels.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:20 -07:00
Alexander Gordeev
199977bff9 rcu: Remove unnecessary fields from rcu_state structure
Members rcu_state::levelcnt[] and rcu_state::levelspread[]
are only used at init. There is no reason to keep them
afterwards.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:19 -07:00
Alexander Gordeev
05b84aec46 rcu: Limit rcu_capacity[] size to RCU_NUM_LVLS items
Number of items in rcu_capacity[] array is defined by macro
MAX_RCU_LVLS. However, that array is never accessed beyond
RCU_NUM_LVLS index. Therefore, we can limit the array to
RCU_NUM_LVLS items and eliminate MAX_RCU_LVLS. As result,
in most cases the memory is conserved.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:18 -07:00
Alexander Gordeev
9618138b09 rcu: Simplify rcu_init_geometry() capacity arithmetics
Current code suggests that introducing the extra level to
rcu_capacity[] array makes some of the arithmetic easier.
Well, in fact it appears rather confusing and unnecessary.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:15 -07:00
Alexander Gordeev
679f9858b1 rcu: Cleanup rcu_init_geometry() code and arithmetics
This update simplifies rcu_init_geometry() code flow
and makes calculation of the total number of rcu_node
structures more easy to read.

The update relies on the fact num_rcu_lvl[] is never
accessed beyond rcu_num_lvls index by the rest of the
code. Therefore, there is no need initialize the whole
num_rcu_lvl[].

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:14 -07:00
Alexander Gordeev
372b0ec24f rcu: Remove superfluous local variable in rcu_init_geometry()
Local variable 'n' mimics 'nr_cpu_ids' while the both are
used within one function. There is no reason for 'n' to
exist whatsoever.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:13 -07:00
Alexander Gordeev
75cf15a4c0 rcu: Panic if RCU tree can not accommodate all CPUs
Currently a condition when RCU tree is unable to accommodate
the configured number of CPUs is not permitted and causes
a fall back to compile-time values. However, the code has no
means to exceed the RCU tree capacity neither at compile-time
nor in run-time. Therefore, if the condition is met in run-
time then it indicates a serios problem elsewhere and should
be handled with a panic.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:12 -07:00
Paul E. McKenney
319362c90f rcu: Provide more diagnostics for stalled GP kthread
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:10 -07:00
Paul E. McKenney
d1ec4c34c7 rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL
The RCU_USER_QS Kconfig parameter is now just a synonym for NO_HZ_FULL,
so this commit eliminates RCU_USER_QS, replacing all uses with NO_HZ_FULL.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-06 13:52:18 -07:00
Linus Torvalds
e382608254 This patch series contains several clean ups and even a new trace clock
"monitonic raw". Also some enhancements to make the ring buffer even
 faster. But the biggest and most noticeable change is the renaming of
 the ftrace* files, structures and variables that have to deal with
 trace events.
 
 Over the years I've had several developers tell me about their confusion
 with what ftrace is compared to events. Technically, "ftrace" is the
 infrastructure to do the function hooks, which include tracing and also
 helps with live kernel patching. But the trace events are a separate
 entity altogether, and the files that affect the trace events should
 not be named "ftrace". These include:
 
   include/trace/ftrace.h	->	include/trace/trace_events.h
   include/linux/ftrace_event.h	->	include/linux/trace_events.h
 
 Also, functions that are specific for trace events have also been renamed:
 
   ftrace_print_*()		->	trace_print_*()
   (un)register_ftrace_event()	->	(un)register_trace_event()
   ftrace_event_name()		->	trace_event_name()
   ftrace_trigger_soft_disabled()->	trace_trigger_soft_disabled()
   ftrace_define_fields_##call() ->	trace_define_fields_##call()
   ftrace_get_offsets_##call()	->	trace_get_offsets_##call()
 
 Structures have been renamed:
 
   ftrace_event_file		->	trace_event_file
   ftrace_event_{call,class}	->	trace_event_{call,class}
   ftrace_event_buffer		->	trace_event_buffer
   ftrace_subsystem_dir		->	trace_subsystem_dir
   ftrace_event_raw_##call	->	trace_event_raw_##call
   ftrace_event_data_offset_##call->	trace_event_data_offset_##call
   ftrace_event_type_funcs_##call ->	trace_event_type_funcs_##call
 
 And a few various variables and flags have also been updated.
 
 This has been sitting in linux-next for some time, and I have not heard
 a single complaint about this rename breaking anything. Mostly because
 these functions, variables and structures are mostly internal to the
 tracing system and are seldom (if ever) used by anything external to that.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJViYhVAAoJEEjnJuOKh9ldcJ0IAI+mytwoMAN/CWDE8pXrTrgs
 aHlcr1zorSzZ0Lq6lKsWP+V0VGVhP8KWO16vl35HaM5ZB9U+cDzWiGobI8JTHi/3
 eeTAPTjQdgrr/L+ZO1ApzS1jYPhN3Xi5L7xublcYMJjKfzU+bcYXg/x8gRt0QbG3
 S9QN/kBt0JIIjT7McN64m5JVk2OiU36LxXxwHgCqJvVCPHUrriAdIX7Z5KRpEv13
 zxgCN4d7Jiec/FsMW8dkO0vRlVAvudZWLL7oDmdsvNhnLy8nE79UOeHos2c1qifQ
 LV4DeQ+2Hlu7w9wxixHuoOgNXDUEiQPJXzPc/CuCahiTL9N/urQSGQDoOVMltR4=
 =hkdz
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This patch series contains several clean ups and even a new trace
  clock "monitonic raw".  Also some enhancements to make the ring buffer
  even faster.  But the biggest and most noticeable change is the
  renaming of the ftrace* files, structures and variables that have to
  deal with trace events.

  Over the years I've had several developers tell me about their
  confusion with what ftrace is compared to events.  Technically,
  "ftrace" is the infrastructure to do the function hooks, which include
  tracing and also helps with live kernel patching.  But the trace
  events are a separate entity altogether, and the files that affect the
  trace events should not be named "ftrace".  These include:

    include/trace/ftrace.h         ->    include/trace/trace_events.h
    include/linux/ftrace_event.h   ->    include/linux/trace_events.h

  Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*()               ->    trace_print_*()
    (un)register_ftrace_event()    ->    (un)register_trace_event()
    ftrace_event_name()            ->    trace_event_name()
    ftrace_trigger_soft_disabled() ->    trace_trigger_soft_disabled()
    ftrace_define_fields_##call()  ->    trace_define_fields_##call()
    ftrace_get_offsets_##call()    ->    trace_get_offsets_##call()

  Structures have been renamed:

    ftrace_event_file              ->    trace_event_file
    ftrace_event_{call,class}      ->    trace_event_{call,class}
    ftrace_event_buffer            ->    trace_event_buffer
    ftrace_subsystem_dir           ->    trace_subsystem_dir
    ftrace_event_raw_##call        ->    trace_event_raw_##call
    ftrace_event_data_offset_##call->    trace_event_data_offset_##call
    ftrace_event_type_funcs_##call ->    trace_event_type_funcs_##call

  And a few various variables and flags have also been updated.

  This has been sitting in linux-next for some time, and I have not
  heard a single complaint about this rename breaking anything.  Mostly
  because these functions, variables and structures are mostly internal
  to the tracing system and are seldom (if ever) used by anything
  external to that"

* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  ring_buffer: Allow to exit the ring buffer benchmark immediately
  ring-buffer-benchmark: Fix the wrong type
  ring-buffer-benchmark: Fix the wrong param in module_param
  ring-buffer: Add enum names for the context levels
  ring-buffer: Remove useless unused tracing_off_permanent()
  ring-buffer: Give NMIs a chance to lock the reader_lock
  ring-buffer: Add trace_recursive checks to ring_buffer_write()
  ring-buffer: Allways do the trace_recursive checks
  ring-buffer: Move recursive check to per_cpu descriptor
  ring-buffer: Add unlikelys to make fast path the default
  tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
  tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
  tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
  tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
  tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
  tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
  tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
  tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
  tracing: Rename ftrace_event_name() to trace_event_name()
  tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
  ...
2015-06-26 14:02:43 -07:00
Paul E. McKenney
0868aa2216 Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', 'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEAD
array.2015.05.27a:  Remove all uses of RCU-protected array indexes.
doc.2015.05.27a:  Docuemntation updates.
fixes.2015.05.27a:  Miscellaneous fixes.
hotplug.2015.05.27a:  CPU-hotplug updates.
init.2015.05.27a:  Initialization/Kconfig updates.
tiny.2015.05.27a:  Updates to Tiny RCU.
torture.2015.05.27a:  Torture-testing updates.
2015-05-27 13:00:49 -07:00
Paul E. McKenney
1ce46ee597 rcu: Conditionally compile RCU's eqs warnings
This commit applies some warning-omission micro-optimizations to RCU's
various extended-quiescent-state functions, which are on the kernel/user
hotpath for CONFIG_NO_HZ_FULL=y.

Reported-by: Rik van Riel <riel@redhat.com>
Reported by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:59:07 -07:00
Paul E. McKenney
26730f55c2 rcu: Make RCU able to tolerate undefined CONFIG_RCU_KTHREAD_PRIO
This commit updates the initialization of the kthread_prio boot parameter
so that RCU will build even when CONFIG_RCU_KTHREAD_PRIO is undefined.
The kthread_prio boot parameter is set to CONFIG_RCU_KTHREAD_PRIO if
that is defined, otherwise to 1 if CONFIG_RCU_BOOST is defined and
to zero otherwise.  This commit then makes CONFIG_RCU_KTHREAD_PRIO
depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about
CONFIG_RCU_KTHREAD_PRIO unless they want to be.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27 12:59:06 -07:00
Paul E. McKenney
47d631af58 rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF
This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so
that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined.
The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF
when defined, otherwise it is set to 32 for 32-bit systems and 64 for
64-bit systems.  This commit then makes CONFIG_RCU_FANOUT_LEAF depend
on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about
CONFIG_RCU_FANOUT_LEAF unless they want to be.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27 12:59:05 -07:00
Paul E. McKenney
05c5df31af rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT
This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will
build even when CONFIG_RCU_FANOUT is undefined.  The RCU_FANOUT macro is
set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set
to 32 for 32-bit systems and 64 for 64-bit systems.  This commit then
makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig
users won't be asked about CONFIG_RCU_FANOUT unless they want to be.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27 12:59:05 -07:00
Paul E. McKenney
a3dc2948ce rcu: Enable diagnostic dump of rcu_node combining tree
The purpose of this commit is to make it easier to verify that RCU's
combining tree is set up correctly, which is useful to have when making
changes in how that tree is initialized.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
[ paulmck: Fold fix found by Fengguang's 0-day test robot. ]
2015-05-27 12:59:04 -07:00
Paul E. McKenney
7fa270010e rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameter
The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and
perhaps only) by rcutorture to verify that RCU works correctly in specific
rcu_node combining-tree configurations.  It therefore does not make
much sense have this as a question to people attempting to configure
their kernels.  So this commit creates an rcutree.rcu_fanout_exact=
boot parameter that rcutorture can use, and eliminates the original
CONFIG_RCU_FANOUT_EXACT Kconfig parameter.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27 12:59:04 -07:00
Paul E. McKenney
0f41c0ddad rcu: Provide diagnostic option to slow down grace-period scans
Grace-period scans of the rcu_node combining tree normally
proceed quite quickly, so that it is very difficult to reproduce
races against them.  This commit therefore allows grace-period
pre-initialization and cleanup to be artificially slowed down,
increasing race-reproduction probability.  A pair of pairs of new
Kconfig parameters are provided, RCU_TORTURE_TEST_SLOW_PREINIT to
enable the slowing down of propagating CPU-hotplug changes up the
combining tree along with RCU_TORTURE_TEST_SLOW_PREINIT_DELAY to
specify the delay in jiffies, and RCU_TORTURE_TEST_SLOW_CLEANUP
to enable the slowing down of the end-of-grace-period cleanup scan
along with RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY to specify the delay
in jiffies.  Boot-time parameters named rcutree.gp_preinit_delay and
rcutree.gp_cleanup_delay allow these delays to be specified at boot time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:59:02 -07:00
Paul E. McKenney
3eaaaf6cd6 rcu: Shut up spurious gcc uninitialized-variable warning
Because gcc doesn't realize that rcu_num_lvls must be strictly greater
than zero, some versions give a spurious warning about levelcnt[0] being
uninitialized in rcu_init_one().  This commit updates the condition on
the pre-existing panic() in order to educate gcc on this point.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:59:02 -07:00
Paul E. McKenney
eab128e830 rcu: Modulate grace-period slow init to normalize delay
Currently, the larger the gp_init_delay boot parameter, the slower
rcutorture will sequence through grace periods.  This commit avoids this
issue by decreasing the probability of slowing initialization of a given
grace period as the degree of slowness increases.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:59:01 -07:00
Paul E. McKenney
a738eec6c6 rcu: Correctly initialize ->rcu_qs_ctr_snap at online time
The rcu_data structure's ->rcu_qs_ctr_snap field is initialized at
CPU-online time from the current CPU's element of the per-CPU rcu_qs_ctr
variable.  Unfortunately, this is at CPU_UP_PREPARE time, so has nothing
to do with the CPU being onlined.  This commit therefore initializes
this variable from the incoming CPU's element of rcu_qs_ctr.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:38 -07:00
Paul E. McKenney
cce7f1fc01 rcu: Remove redundant offline check
Because offline CPUs are propagated up the rcu_node tree's ->qsmaskinit
bits just before each grace period starts, the ->qsmaskinit bit cannot
be clear when the corresponding ->qsmask bit is set.  Furthermore, this
condition used to correspond to a CPU that was on its way offline, and
making RCU's notion of an offline CPU more precise has eliminated this
situation.  This commit therefore removes the now-redundant offline
check from force_qs_rnp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:38 -07:00
Paul E. McKenney
c5b5539506 rcu: Remove dead code from force_qs_rnp()
Because force_qs_rnp() is invoked only from the force-quiescent-state
code which runs only in the context of the grace-period kthread, a grace
period must always be in progress throughout force_qs_rnp()'s execution.
This commit therefore removes the rcu_gp_in_progress() check and the
associated dead code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:37 -07:00
Paul E. McKenney
ea46351cea rcu: Eliminate HOTPLUG_CPU #ifdef in favor of IS_ENABLED()
This commit removes a HOTPLUG_CPU #ifdef, replacing it with
IS_ENABLED()-protected return statements.  This relies on the
optimizer to remove any resulting dead code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:37 -07:00
Nicholas Mc Guire
82072c4fcf rcu: Change function declaration to bool
rcu_cpu_has_callbacks() is declared int. The current declaration was introduced
in commit c0f4dfd4f9 (rcu: Make RCU_FAST_NO_HZ take advantage of numbered
callbacks). But it is actually returning bool and as the function description
states " * Return true if the specified CPU has any callback....", this probably
should be a bool as all (3) call-sites currently treat it as bool.

Type-checking coccinelle spatches are being used to locate type mismatches
between function signatures and return values in this case this produced:
./kernel/rcu/tree.c:3538 WARNING: return of wrong type
                    int != bool,

Patch was compile tested with x86_64_defconfig (implies CONFIG_TREE_RCU=y)

Patch is against 4.1-rc3 (localversion-next is -next-20150511) and fixes

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:04 -07:00
Nicolas Iooss
c92fb05795 rcu: Make rcu_*_data variables static
rcu_bh_data, rcu_sched_data and rcu_preempt_data are never used outside
kernel/rcu/tree.c and thus can be made static.

Doing so fixes a section mismatch warning reported by clang when
building LLVMLinux with -Wsection, because these variables were declared
in .data..percpu and defined in .data..percpu..shared_aligned since
commit 11bbb235c2 ("rcu: Use DEFINE_PER_CPU_SHARED_ALIGNED for
rcu_data").

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:03 -07:00
Paul E. McKenney
30ff1533b8 rcu: Make synchronize_sched_expedited() call wait_rcu_gp()
Currently, synchronize_sched_expedited() will call synchronize_sched()
if there is danger of counter wrap.  But if configuration says to
always do expedited grace periods, synchronize_sched() will just
call synchronize_sched_expedited() right back again.  In theory,
the old expedited operations will complete, the counters will
get back in synch, and the recursion will end.  But we could
easily run out of stack long before that time.  This commit
therefore makes synchronize_sched_expedited() invoke the underlying
wait_rcu_gp(call_rcu_sched) instead of synchronize_sched(), the same as
all the other calls out from synchronize_sched_expedited().

This bug was introduced by commit 1924bcb025 (Avoid counter wrap in
synchronize_sched_expedited()).

Reported-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:03 -07:00
Paul E. McKenney
81e701e437 rcu: Add more debug info on "kthread starved" RCU CPU stall warnings
This commit adds grace number and command-flags information to the
"kthread starved" message that is sometimes printed out as part of
RCU CPU stall warnings.  This message is caused by the corresponding
RCU grace-period kthread not having run for at least two seconds, and
this added information can be helpful when debugging.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:02 -07:00
Paul E. McKenney
cd73ca21cd rcu: Force wakeup of rcu_gp_kthread at grace-period end
The rcu_gp_kthread_wake() refuses to do a wakeup unless at least
one of the ->gp_flags bits are set, which normally will not be the
case when the last quiescent state is reported.  This results in
up to a 3-jiffy delay given default Kconfig settings.  This commit
therefore has rcu_report_qs_rsp() set RCU_GP_FLAG_FQS before invoking
rcu_gp_kthread_wake() in order to force a more immediate wakeup at
grace-period end, thus reducing grace-period latencies.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:01 -07:00
Paul E. McKenney
2927a689e8 rcu: Create an immutable rcu_data_p pointer to default rcu_data structure
This commit creates an immutable rcu_data_p pointer that references
rcu_preempt_data for TREE_PREEMPT_RCU builds and that references
rcu_sched_data for TREE_RCU builds.  This rcu_data_p pointer will enable
more code to move from #ifdef to IS_ENABLED().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:00 -07:00
Paul E. McKenney
b28a7c0166 rcu: Tell the compiler that rcu_state_p is immutable
This commit adds a "const" tag to the declarations of rcu_state_p,
which should allow the compiler to generate better code and also to
catch erroneous assignments to this variable.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:57:59 -07:00
Paul E. McKenney
7d0ae8086b rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE()
and WRITE_ONCE() APIs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck:  Updated to include kernel/torture.c as suggested by Jason Low. ]
2015-05-27 12:56:15 -07:00
Steven Rostedt (Red Hat)
af658dca22 tracing: Rename ftrace_event.h to trace_events.h
The term "ftrace" is really the infrastructure of the function hooks,
and not the trace events. Rename ftrace_event.h to trace_events.h to
represent the trace_event infrastructure and decouple the term ftrace
from it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:05:12 -04:00
Paul E. McKenney
8d7dc9283f rcu: Control grace-period delays directly from value
In a misguided attempt to avoid an #ifdef, the use of the
gp_init_delay module parameter was conditioned on the corresponding
RCU_TORTURE_TEST_SLOW_INIT Kconfig variable, using IS_ENABLED() at
the point of use in the code.  This meant that the compiler always saw
the delay, which meant that RCU_TORTURE_TEST_SLOW_INIT_DELAY had to be
unconditionally defined.  This in turn caused "make oldconfig" to ask
pointless questions about the value of RCU_TORTURE_TEST_SLOW_INIT_DELAY
in cases where it was not even used.

This commit avoids these pointless questions by defining gp_init_delay
under #ifdef.  In one branch, gp_init_delay is initialized to
RCU_TORTURE_TEST_SLOW_INIT_DELAY and is also a module parameter (thus
allowing boot-time modification), and in the other branch gp_init_delay
is a const variable initialized by default to zero.

This approach also simplifies the code at the delay point by eliminating
the IS_DEFINED().  Because gp_init_delay is constant zero in the no-delay
case intended for production use, the "gp_init_delay > 0" check causes
the delay to become dead code, as desired in this case.  In addition,
this commit replaces magic constant "10" with the preprocessor variable
PER_RCU_NODE_PERIOD, which controls the number of grace periods that
are allowed to elapse at full speed before a delay is inserted.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-04-14 19:33:59 -07:00
Paul E. McKenney
42528795ac Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a', 'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a', 'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEAD
doc.2015.02.26a:  Documentation changes
earlycb.2015.03.03a:  Permit early-boot RCU callbacks
fixes.2015.03.03a:  Miscellaneous fixes
gpexp.2015.02.26a:  In-kernel expediting of normal grace periods
hotplug.2015.03.20a:  CPU hotplug fixes
sysidle.2015.02.26b:  NO_HZ_FULL_SYSIDLE fixes
tiny.2015.02.26a:  TINY_RCU fixes
2015-03-20 08:31:01 -07:00
Paul E. McKenney
654e953340 rcu: Associate quiescent-state reports with grace period
As noted in earlier commit logs, CPU hotplug operations running
concurrently with grace-period initialization can result in a given
leaf rcu_node structure having all CPUs offline and no blocked readers,
but with this rcu_node structure nevertheless blocking the current
grace period.  Therefore, the quiescent-state forcing code now checks
for this situation and repairs it.

Unfortunately, this checking can result in false positives, for example,
when the last task has just removed itself from this leaf rcu_node
structure, but has not yet started clearing the ->qsmask bits further
up the structure.  This means that the grace-period kthread (which
forces quiescent states) and some other task might be attempting to
concurrently clear these ->qsmask bits.  This is usually not a problem:
One of these tasks will be the first to acquire the upper-level rcu_node
structure's lock and with therefore clear the bit, and the other task,
seeing the bit already cleared, will stop trying to clear bits.

Sadly, this means that the following unusual sequence of events -can-
result in a problem:

1.	The grace-period kthread wins, and clears the ->qsmask bits.

2.	This is the last thing blocking the current grace period, so
	that the grace-period kthread clears ->qsmask bits all the way
	to the root and finds that the root ->qsmask field is now zero.

3.	Another grace period is required, so that the grace period kthread
	initializes it, including setting all the needed qsmask bits.

4.	The leaf rcu_node structure (the one that started this whole
	mess) is blocking this new grace period, either because it
	has at least one online CPU or because there is at least one
	task that had blocked within an RCU read-side critical section
	while running on one of this leaf rcu_node structure's CPUs.
	(And yes, that CPU might well have gone offline before the
	grace period in step (3) above started, which can mean that
	there is a task on the leaf rcu_node structure's ->blkd_tasks
	list, but ->qsmask equal to zero.)

5.	The other kthread didn't get around to trying to clear the upper
	level ->qsmask bits until all the above had happened.  This means
	that it now sees bits set in the upper-level ->qsmask field, so it
	proceeds to clear them.  Too bad that it is doing so on behalf of
	a quiescent state that does not apply to the current grace period!

This sequence of events can result in the new grace period being too
short.  It can also result in the new grace period ending before the
leaf rcu_node structure's ->qsmask bits have been cleared, which will
result in splats during initialization of the next grace period.  In
addition, it can result in tasks blocking the new grace period still
being queued at the start of the next grace period, which will result
in other splats.  Sasha's testing turned up another of these splats,
as did rcutorture testing.  (And yes, rcutorture is being adjusted to
make these splats show up more quickly.  Which probably is having the
undesirable side effect of making other problems show up less quickly.
Can't have everything!)

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.0.x
Tested-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-20 08:28:25 -07:00
Paul E. McKenney
a77da14ce9 rcu: Yet another fix for preemption and CPU hotplug
As noted earlier, the following sequence of events can occur when
running PREEMPT_RCU and HOTPLUG_CPU on a system with a multi-level
rcu_node combining tree:

1.	A group of tasks block on CPUs corresponding to a given leaf
	rcu_node structure while within RCU read-side critical sections.
2.	All CPUs corrsponding to that rcu_node structure go offline.
3.	The next grace period starts, but because there are still tasks
	blocked, the upper-level bits corresponding to this leaf rcu_node
	structure remain set.
4.	All the tasks exit their RCU read-side critical sections and
	remove themselves from the leaf rcu_node structure's list,
	leaving it empty.
5.	But because there now is code to check for this condition at
	force-quiescent-state time, the upper bits are cleared and the
	grace period completes.

However, there is another complication that can occur following step 4 above:

4a.	The grace period starts, and the leaf rcu_node structure's
	gp_tasks pointer is set to NULL because there are no tasks
	blocked on this structure.
4b.	One of the CPUs corresponding to the leaf rcu_node structure
	comes back online.
4b.	An endless stream of tasks are preempted within RCU read-side
	critical sections on this CPU, such that the ->blkd_tasks
	list is always non-empty.

The grace period will never end.

This commit therefore makes the force-quiescent-state processing check only
for absence of tasks blocking the current grace period rather than absence
of tasks altogether.  This will cause a quiescent state to be reported if
the current leaf rcu_node structure is not blocking the current grace period
and its parent thinks that it is, regardless of how RCU managed to get
itself into this state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.0.x
Tested-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-20 08:27:33 -07:00
Paul E. McKenney
5c60d25fa1 rcu: Add diagnostics to grace-period cleanup
At grace-period initialization time, RCU checks that all quiescent
states were really reported for the previous grace period.  Now that
grace-period cleanup has been split out of grace-period initialization,
this commit also performs those checks at grace-period cleanup time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12 15:19:38 -07:00
Paul E. McKenney
88428cc5c2 rcu: Handle outgoing CPUs on exit from idle loop
This commit informs RCU of an outgoing CPU just before that CPU invokes
arch_cpu_idle_dead() during its last pass through the idle loop (via a
new CPU_DYING_IDLE notifier value).  This change means that RCU need not
deal with outgoing CPUs passing through the scheduler after informing
RCU that they are no longer online.  Note that removing the CPU from
the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
and orphaning callbacks is still done at CPU_DEAD time, the reason being
that at CPU_DEAD time we have another CPU that can adopt them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12 15:19:38 -07:00
Paul E. McKenney
c199068913 rcu: Eliminate ->onoff_mutex from rcu_node structure
Because that RCU grace-period initialization need no longer exclude
CPU-hotplug operations, this commit eliminates the ->onoff_mutex and
its uses.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12 15:19:37 -07:00
Paul E. McKenney
0aa04b055e rcu: Process offlining and onlining only at grace-period start
Races between CPU hotplug and grace periods can be difficult to resolve,
so the ->onoff_mutex is used to exclude the two events.  Unfortunately,
this means that it is impossible for an outgoing CPU to perform the
last bits of its offlining from its last pass through the idle loop,
because sleeplocks cannot be acquired in that context.

This commit avoids these problems by buffering online and offline events
in a new ->qsmaskinitnext field in the leaf rcu_node structures.  When a
grace period starts, the events accumulated in this mask are applied to
the ->qsmaskinit field, and, if needed, up the rcu_node tree.  The special
case of all CPUs corresponding to a given leaf rcu_node structure being
offline while there are still elements in that structure's ->blkd_tasks
list is handled using a new ->wait_blkd_tasks field.  In this case,
propagating the offline bits up the tree is deferred until the beginning
of the grace period after all of the tasks have exited their RCU read-side
critical sections and removed themselves from the list, at which point
the ->wait_blkd_tasks flag is cleared.  If one of that leaf rcu_node
structure's CPUs comes back online before the list empties, then the
->wait_blkd_tasks flag is simply cleared.

This of course means that RCU's notion of which CPUs are offline can be
out of date.  This is OK because RCU need only wait on CPUs that were
online at the time that the grace period started.  In addition, RCU's
force-quiescent-state actions will handle the case where a CPU goes
offline after the grace period starts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12 15:19:37 -07:00
Paul E. McKenney
cc99a310ca rcu: Move rcu_report_unblock_qs_rnp() to common code
The rcu_report_unblock_qs_rnp() function is invoked when the
last task blocking the current grace period exits its outermost
RCU read-side critical section.  Previously, this was called only
from rcu_read_unlock_special(), and was therefore defined only when
CONFIG_RCU_PREEMPT=y.  However, this function will be invoked even when
CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at
the beginnings of RCU grace periods.  The reason for this change is that
the last task on a given leaf rcu_node structure's ->blkd_tasks list
might well exit its RCU read-side critical section between the time that
recent CPU-hotplug operations were applied and when the new grace period
was initialized.  This situation could result in RCU waiting forever on
that leaf rcu_node structure, because if all that structure's CPUs were
already offline, there would be no quiescent-state events to drive that
structure's part of the grace period.

This commit therefore moves rcu_report_unblock_qs_rnp() to common code
that is built unconditionally so that the quiescent-state-forcing code
can clean up after this situation, avoiding the grace-period stall.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12 15:19:36 -07:00
Paul E. McKenney
999c286347 rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
Offline CPUs cannot safely invoke trace events, but such CPUs do execute
within rcu_cpu_notify().  Therefore, this commit removes the trace events
from rcu_cpu_notify().  These trace events are for utilization, against
which rcu_cpu_notify() execution time should be negligible.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11 13:22:39 -07:00
Paul E. McKenney
37745d2810 rcu: Provide diagnostic option to slow down grace-period initialization
Grace-period initialization normally proceeds quite quickly, so
that it is very difficult to reproduce races against grace-period
initialization.  This commit therefore allows grace-period
initialization to be artificially slowed down, increasing
race-reproduction probability.  A pair of new Kconfig parameters are
provided, CONFIG_RCU_TORTURE_TEST_SLOW_INIT to enable the slowdowns, and
CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY to specify the number of jiffies
of slowdown to apply.  A boot-time parameter named rcutree.gp_init_delay
allows boot-time delay to be specified.  By default, no delay will be
applied even if CONFIG_RCU_TORTURE_TEST_SLOW_INIT is set.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11 13:22:38 -07:00
Paul E. McKenney
237a0f2193 rcu: Detect stalls caused by failure to propagate up rcu_node tree
If all CPUs have passed through quiescent states, then stalls might be
due to starvation of the grace-period kthread or to failure to propagate
the quiescent states up the rcu_node combining tree.  The current stall
warning messages do not differentiate, so this commit adds a printout
of the root rcu_node structure's ->qsmask field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11 13:22:38 -07:00
Paul E. McKenney
78043c467a rcu: Put all orphan-callback-related code under same comment
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11 13:22:37 -07:00
Paul E. McKenney
b33078b609 rcu: Consolidate offline-CPU callback initialization
Currently, both rcu_cleanup_dead_cpu() and rcu_send_cbs_to_orphanage()
initialize the outgoing CPU's callback list.  However, only
rcu_cleanup_dead_cpu() invokes rcu_send_cbs_to_orphanage(), and
it does so unconditionally, which means that only one of these
initializations is required.  This commit therefore consolidates the
callback-list initialization with the rest of the callback handling in
rcu_send_cbs_to_orphanage().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11 13:22:36 -07:00
Yao Dongdong
9910affa89 rcu: Remove redundant check of cpu_online()
Because invoke_cpu_core() checks whether the current CPU is online,
there is no need for __call_rcu_core() to redundantly check it.
There should not be any performance degradation because the called
function is visible to the compiler.  This commit therefore removes
the redundant check.

Signed-off-by: Yao Dongdong <yaodongdong@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03 11:17:34 -08:00
Paul E. McKenney
e7580f3388 rcu: Get rcu_sched_force_quiescent_state() where it belongs
The very similar functions rcu_force_quiescent_state(),
rcu_bh_force_quiescent_state(), and rcu_sched_force_quiescent_state()
are supposed to be together, but have drifted apart.  This commit
restores rcu_sched_force_quiescent_state() to its rightful place.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03 11:17:19 -08:00
Paul E. McKenney
6629240575 rcu: Use IS_ENABLED() to CONFIG_RCU_FANOUT_EXACT #ifdef
This commit uses IS_ENABLED() to remove the #ifdef from the
rcu_init_levelspread() functions.  No effect on executable code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03 11:14:08 -08:00
Paul E. McKenney
4762767810 rcu: Move early boot callback tests earlier
Because callbacks can now be posted quite early in boot, move the
early boot callback tests to precede RCU initialization.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03 11:06:22 -08:00
Paul E. McKenney
34404ca8fb rcu: Move early-boot callbacks to no-CBs lists for no-CBs CPUs
When a CPU is first determined to be a no-CBs CPUs, this commit causes
any early boot callbacks to be moved to the no-CBs callback list,
allowing them to be invoked.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03 11:06:02 -08:00
Paul E. McKenney
5871968d53 rcu: Tighten up affinity and check for sysidle
If the RCU grace-period kthread invoking rcu_sysidle_check_cpu()
happens to be running on the tick_do_timer_cpu initially,
then rcu_bind_gp_kthread() won't bind it.  This kthread might
then migrate before invoking rcu_gp_fqs(), which will trigger the
WARN_ON_ONCE() in rcu_sysidle_check_cpu().  This commit therefore makes
rcu_bind_gp_kthread() do the binding even if the kthread is currently
on the same CPU.  Because this incurs added overhead, this commit also
causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread()
once at boot rather than at the beginning of each grace period.
And as long as rcu_bind_gp_kthread() is being modified, this commit
eliminates its #ifdef.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 16:04:37 -08:00
Paul E. McKenney
675da67f24 rcu: Fixes to NO_HZ_FULL sysidle accounting
On second and subsequent passes through quiescent-state forcing, the
isidle variable was initialized to false, which would prevent full sysidle
state from being reached if a grace period needed more than one round
of quiescent-state forcing (which most should not).  However, the check
for offline CPUs in the quiescent-state forcing main loop had the wrong
sense, which could prevent CPUs from ever entering full sysidle state.

This commit fixes both of these bugs.  Given that sysidle is not yet
wired up, this has no effect in old kernels, but might have proven
frustrating had anyone attempted to wire it up.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:11:03 -08:00
Paul E. McKenney
5afff48bdf rcu: Update from rcu_expedited variable to rcu_gp_is_expedited()
This commit updates open-coded tests of the rcu_expedited variable
to instead use rcu_gp_is_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:03:01 -08:00
Paul E. McKenney
1925d1967c rcu: Fix a couple of typos in rcu_all_qs() comment header
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:02:10 -08:00
Paul E. McKenney
39c8d313c3 rcu: Avoid clobbering early boot callbacks
When a CPU comes online, it initializes its callback list.  This
is a bad thing if this is the first time that the CPU has come
online and if that CPU has early boot callbacks.  This commit therefore
avoid initializing the callback list if there are callbacks present,
in which case the initial call_rcu() did the initialization for us.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:01:30 -08:00
Paul E. McKenney
143da9c2fc rcu: Prevent early-boot RCU callbacks from splatting
Currently, a call_rcu() that precedes rcu_init() will splat due to the
callback lists not having yet been initialized.  This commit causes the
first such callback to initialize the boot CPU's RCU callback list.

Note that this commit does not change rcu_init()-time initialization,
which means that the callback will be discarded at rcu_init() time.
Fixing this is the job of later commits.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:01:28 -08:00
Paul E. McKenney
2723249a31 rcu: Wire ->rda pointers at compile time
This commit wires up the rcu_state structures' ->rda pointers to the
per-CPU rcu_data structures at compile time, thus ensuring that this
linkage is present at early boot, in turn allowing posting of callbacks
before rcu_init() is executed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:01:27 -08:00
Paul E. McKenney
d3f3f3f25b rcu: Abstract default callback-list initialization from init_callback_list()
In preparation for early-boot posting of callbacks, this commit abstracts
initialization of the default (non-no-CB) callbacks list from the
init_callback_list() function into a new init_default_callback_list()
function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:01:25 -08:00
Lai Jiangshan
3f47da0f32 rcu_tree: Avoid touching rnp->completed when a new GP is started
In rcu_gp_init(), rnp->completed equals to rsp->completed in THEORY,
we don't need to touch it normally.  If something goes wrong,
it will complain and fixup rnp->completed and avoid oops.
This commit thus avoids the normal needless store to rnp->completed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-25 17:03:05 -08:00
Paul E. McKenney
78e691f4ae Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
doc.2015.01.07a: Documentation updates.
fixes.2015.01.15a: Miscellaneous fixes.
preempt.2015.01.06a: Changes to handling of lists of preempted tasks.
srcu.2015.01.06a: SRCU updates.
stall.2015.01.16a: RCU CPU stall-warning updates and fixes.
torture.2015.01.11a: RCU torture-test updates and fixes.
2015-01-15 23:34:34 -08:00
Paul E. McKenney
fb81a44b88 rcu: Add GP-kthread-starvation checks to CPU stall warnings
This commit adds a message that is printed if the relevant grace-period
kthread has not been able to run for the two seconds preceding the
stall warning.  (The two seconds is double the maximum interval between
successive bouts of quiescent-state forcing.)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15 23:33:15 -08:00
Paul E. McKenney
5cd37193ce rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.

Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)
In addition, if one of the RCU flavor's grace period has stalled, this
will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
quiescent state visible from other CPUs.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
  was used in preemptible code. ]
2015-01-15 23:33:14 -08:00
Paul E. McKenney
a94844b22a rcu: Optionally run grace-period kthreads at real-time priority
Recent testing has shown that under heavy load, running RCU's grace-period
kthreads at real-time priority can improve performance (according to 0day
test robot) and reduce the incidence of RCU CPU stall warnings.  However,
most systems do just fine with the default non-realtime priorities for
these kthreads, and it does not make sense to expose the entire user
base to any risk stemming from this change, given that this change is
of use only to a few users running extremely heavy workloads.

Therefore, this commit allows users to specify realtime priorities
for the grace-period kthreads, but leaves them running SCHED_OTHER
by default.  The realtime priority may be specified at build time
via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
rcutree.kthread_prio parameter.  Either way, 0 says to continue the
default SCHED_OTHER behavior and values from 1-99 specify that priority
of SCHED_FIFO behavior.  Note that a value of 0 is not permitted when
the RCU_BOOST Kconfig parameter is specified.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15 23:25:04 -08:00
Paul E. McKenney
917963d0b3 rcutorture: Check from beginning to end of grace period
Currently, rcutorture's Reader Batch checks measure from the end of
the previous grace period to the end of the current one.  This commit
tightens up these checks by measuring from the start and end of the same
grace period.  This involves adding rcu_batches_started() and friends
corresponding to the existing rcu_batches_completed() and friends.

We leave SRCU alone for the moment, as it does not yet have a way of
tracking both ends of its grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:08:02 -08:00
Paul E. McKenney
9733e4f0a9 rcu: Make _batches_completed() functions return unsigned long
Long ago, the various ->completed fields were of type long, but now are
unsigned long due to signed-integer-overflow concerns.  However, the
various _batches_completed() functions remained of type long, even though
their only purpose in life is to return the corresponding ->completed
field.  This patch cleans this up by changing these functions' return
types to unsigned long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:07:56 -08:00
Paul E. McKenney
e3663b1024 rcu: Handle gpnum/completed wrap while dyntick idle
Subtle race conditions can result if a CPU stays in dyntick-idle mode
long enough for the ->gpnum and ->completed fields to wrap.  For
example, consider the following sequence of events:

o	CPU 1 encounters a quiescent state while waiting for grace period
	5 to complete, but then enters dyntick-idle mode.

o	While CPU 1 is in dyntick-idle mode, the grace-period counters
	wrap around so that the grace period number is now 4.

o	Just as CPU 1 exits dyntick-idle mode, grace period 4 completes
	and grace period 5 begins.

o	The quiescent state that CPU 1 passed through during the old
	grace period 5 looks like it applies to the new grace period
	5.  Therefore, the new grace period 5 completes without CPU 1
	having passed through a quiescent state.

This could clearly be a fatal surprise to any long-running RCU read-side
critical section that happened to be running on CPU 1 at the time.  At one
time, this was not a problem, given that it takes significant time for
the grace-period counters to overflow even on 32-bit systems.  However,
with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long
idle periods are now becoming quite feasible.  It is therefore time to
close this race.

This commit therefore avoids this race condition by having the
quiescent-state forcing code detect when a CPU is falling too far
behind, and setting a new rcu_data field ->gpwrap when this happens.
Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed
fields are known to be untrustworthy, and can be ignored, along with
any associated quiescent states.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:05:28 -08:00
Paul E. McKenney
6ccd2ecd42 rcu: Improve diagnostics for spurious RCU CPU stall warnings
The current RCU CPU stall warning code will print "Stall ended before
state dump start" any time that the stall-warning code is triggered on
a CPU that has already reported a quiescent state for the current grace
period and if all quiescent states have been reported for the current
grace period.  However, a true stall can result in these symptoms, for
example, by preventing RCU's grace-period kthreads from ever running

This commit therefore checks for this condition, reporting the end of
the stall only if one of the grace-period counters has actually advanced.
Otherwise, it reports the last time that the grace-period kthread made
meaningful progress.  (In normal situations, the grace-period kthread
should make meaningful progress at least every jiffies_till_next_fqs
jiffies.)

Reported-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
2015-01-06 11:05:27 -08:00
Paul E. McKenney
fc908ed33e rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts
One way that an RCU CPU stall warning can happen is if the grace-period
kthread is not allowed to execute.  One proxy for this kthread's
forward progress is the number of force-quiescent-state (fqs) scans.
This commit therefore adds the number of fqs scans to the RCU CPU stall
warning printouts when CONFIG_RCU_CPU_STALL_INFO=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:05:25 -08:00
Paul E. McKenney
ab954c167e rcu: Remove redundant callback-list initialization
The RCU callback lists are initialized in both rcu_boot_init_percpu_data()
and rcu_init_percpu_data().  The former is intended for initializing
immutable data, so this commit removes the initialization from
rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data().
This change prepares for permitting callbacks to be queued very early
in boot.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:54 -08:00
Paul E. McKenney
6cd534ef8b rcu: Don't scan root rcu_node structure for stalled tasks
Now that blocked tasks are no longer migrated to the root rcu_node
structure, there is no need to scan the root rcu_node structure for
blocked tasks stalling the current grace period.  This commit therefore
removes this scan.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:53 -08:00
Paul E. McKenney
3ba4d0e09b rcu: Note quiescent state when CPU goes offline
The rcu_cleanup_dead_cpu() function (called after a CPU has gone
completely offline) has not reported a quiescent state because there
was probably at least one synchronize_rcu() between the time the CPU
went offline and the CPU_DEAD notifier, and this would have detected
the CPU's offline state via quiescent-state forcing.  However, the plan
is for CPUs to take themselves offline, at which point it makes sense
for them to report their own quiescent state.  This commit makes this
change in preparation for the new CPU-hotplug setup.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:51 -08:00
Paul E. McKenney
1be0085b51 rcu: Don't initiate RCU priority boosting on root rcu_node
Because there is no longer any preempted tasks on the root rcu_node, and
because there is no longer ever an rcub kthread for the root rcu_node,
this commit drops the code in force_qs_rnp() that attempts to awaken
the non-existent root rcub kthread.  This is strictly a performance
enhancement, removing a root rcu_node ->lock acquisition and release
along with some tests in rcu_initiate_boost(), ending with the test that
notes that there is no rcub kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:48 -08:00
Paul E. McKenney
a8f4cbadfb rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu()
Now that we are not migrating callbacks, there is no need to hold the
->orphan_lock across the the ->qsmaskinit bit-clearing process.
This commit therefore releases ->orphan_lock immediately after adopting
the orphaned RCU callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:45 -08:00
Paul E. McKenney
d19fb8d1f3 rcu: Don't migrate blocked tasks even if all corresponding CPUs offline
When the last CPU associated with a given leaf rcu_node structure
goes offline, something must be done about the tasks queued on that
rcu_node structure.  Each of these tasks has been preempted on one of
the leaf rcu_node structure's CPUs while in an RCU read-side critical
section that it have not yet exited.  Handling these tasks is the job of
rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
structure to the root rcu_node structure.

Unfortunately, this migration has to be done one task at a time because
each tasks allegiance must be shifted from the original leaf rcu_node to
the root, so that future attempts to deal with these tasks will acquire
the root rcu_node structure's ->lock rather than that of the leaf.
Worse yet, this migration must be done with interrupts disabled, which
is not so good for realtime response, especially given that there is
no bound on the number of tasks on a given rcu_node structure's list.
(OK, OK, there is a bound, it is just that it is unreasonably large,
especially on 64-bit systems.)  This was not considered a problem back
when rcu_preempt_offline_tasks() was first written because realtime
systems were assumed not to do CPU-hotplug operations while real-time
applications were running.  This assumption has proved of dubious validity
given that people are starting to run multiple realtime applications
on a single SMP system and that it is common practice to offline then
online a CPU before starting its real-time application in order to clear
extraneous processing off of that CPU.  So we now need CPU hotplug
operations to avoid undue latencies.

This commit therefore avoids migrating these tasks, instead letting
them be dequeued one by one from the original leaf rcu_node structure
by rcu_read_unlock_special().  This means that the clearing of bits
from the upper-level rcu_node structures must be deferred until the
last such task has been dequeued, because otherwise subsequent grace
periods won't wait on them.  This commit has the beneficial side effect
of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
CONFIG_RCU_BOOST builds.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:44 -08:00
Paul E. McKenney
b6a932d1d9 rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing
This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit
bit clearing up the rcu_node tree once a given rcu_node structure's
blkd_tasks list becomes empty.  This is the final commit in preparation
for the rework of RCU priority boosting:  It enables preempted tasks to
remain queued on their rcu_node structure even after all of that rcu_node
structure's CPUs have gone offline.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:43 -08:00
Paul E. McKenney
8af3a5e78c rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
in preparation for the rework of RCU priority boosting.  This new function
will be invoked from rcu_read_unlock_special() in the reworked scheme,
which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node
structure's ->qsmaskinit field has already been updated.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:41 -08:00
Paul E. McKenney
41050a0096 rcu: Fix rcu_barrier() race that could result in too-short wait
The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions.
It checks a given CPU's lists of callbacks, and if all three no-CBs lists
are empty, ignores that CPU.  However, these three lists could potentially
be empty even when callbacks are present if the check executed just as
the callbacks were being moved from one list to another.  It turns out
that recent versions of rcutorture can spot this race.

This commit plugs this hole by consolidating the per-list counts of
no-CBs callbacks into a single count, which is incremented before
the corresponding callback is posted and after it is invoked.  Then
rcu_barrier() checks this single count to reliably determine whether
the corresponding CPU has no-CBs callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:01:15 -08:00