This change updates the ti-soc-thermal driver to use
standard GPIO DT bindings to read the GPIO number associated
to thermal shutdown IRQ, in case the device features it.
Previously, the code was using a specific DT bindings.
As now OMAP supports the standard way to model GPIOs,
there is no point in having a ti specific binding.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
To reduce thermal maintenance load on Rui, SoC specific patches would be
applied by me now. Rui Zhang will pull in these changes from time to time (at
rc's). Additionally I would be sending him pull request for every merge
window and rc's (for fixes).
Branch names would be: next and fixes.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Fix build error in x86_pkg_temp_thermal.c. It requires that
X86_MCE & X86_THERMAL_VECTOR be enabled, so depend on the latter symbol,
since it depends on X86_MCE (indirectly).
Also, X86_PKG_TEMP_THERMAL is already inside an "if THERMAL" block,
so remove that duplicated dependency.
ERROR: "platform_thermal_package_rate_control" [drivers/thermal/x86_pkg_temp_thermal.ko] undefined!
ERROR: "platform_thermal_package_notify" [drivers/thermal/x86_pkg_temp_thermal.ko] undefined!
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Added documentation describing details of the x86 package temperature
thermal driver.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This driver register CPU digital temperature sensor as a thermal
zone at package level.
Each package will show up as one zone with at max two trip points.
These trip points can be both read and updated. Once a non zero
value is set in the trip point, if the package package temperature
goes above or below this setting, a thermal notification is
generated.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
In case emulated temperature is in use, using the trend
provided by driver layer can lead to bogus situation.
In this case, debugger user would set a temperature value,
but the trend would be from driver computation.
To avoid this situation, this patch changes the get_tz_trend()
to consider the emulated temperature whenever that is in use.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Amit Daniel Kachhap <amit.daniel@samsung.com>
Cc: Durgadoss R <durgadoss.r@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This patch adds the thermal data for TI DRA752 chips.
In this change it includes (autogen):
. Register offset definitions
. Bitfields and masks for all registers
. Conversion table
Also, the thermal limits, thresholds and extrapolation
rules are included. The extrapolation rule is simply
add +2C as margin.
All 5 sensors, MPU, GPU, CORE, DSPEVE and IVA, are defined
and exposed. Only MPU has cooling device.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This patch changes the driver to avoid the usage of IS_ERR_OR_NULL()
macro. This macro can lead to dangerous results, like returning
success (0) during a failure scenario (NULL pointer handling).
For this reason this patch is changing the driver after
revisiting the code. These are the cases:
i. For cases in which IS_ERR_OR_NULL() is used for checking
return values of functions that returns either PTR_ERR()
or a valid pointer, it has been translated to IS_ERR() check only.
ii. For cases that a NULL check is still needed, it has been
translated to if (!ptr || IS_ERR(ptr)).
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
In order to read the history buffer, it is required to
freeze BG FSM. This patch adds the missing piece of code
to freeze the FSM and also a contention area to avoid
other parts of the code to access the DTEMPs.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
For boards that provide a PCB sensor close to SoC junction
temperature, it is possible to remove the cumulative heat
reported by the SoC temperature sensor.
This patch changes the extrapolation computation to consider
an external sensor in the extrapolation equations.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Added callback registration for package threshold reports. Also added
a callback to check the rate control implemented in callback or not.
If there is no rate control implemented, then there is a default rate
control similar to core threshold notification by delaying for
CHECK_INTERVAL (5 minutes) between reports.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The variable 'descend' is initialized as -1 in function get_property(),
and will never get any chance to be updated by the following code.
if (freq != CPUFREQ_ENTRY_INVALID && descend != -1)
descend = !!(freq > table[i].frequency);
This makes function get_property() return the wrong frequency for given
cooling level if the frequency table is sorted in ascending. Fix it
by correcting the 'descend' check in if-condition to 'descend == -1'.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
'spear_thermal_id_table' is always compiled in and the driver
is dependent on OF. Hence use of of_match_ptr is unnecessary.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Vincenzo Frascino <vincenzo.frascino@st.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
'kirkwood_thermal_id_table' is always compiled in and the driver
is dependent on OF. Hence use of of_match_ptr is unnecessary.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
'dove_thermal_id_table' is always compiled in and the driver
is dependent on OF. Hence use of of_match_ptr is unnecessary.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
'armada_thermal_id_table' is always compiled in and the driver
is dependent on OF. Hence use of of_match_ptr is unnecessary.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Update driver path and status for TI SoC thermal drivers
on MAINTAINERS file.
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
devm_ioremap_resource does sanity checks on the given resource.
No need to duplicate this in the driver.
CC: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
CC: Vincenzo Frascino <vincenzo.frascino@st.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
This patch adds a requirement needing .get_trip_temp() callback
function for registering thermal zone device. This function is
used when thermal zone is updated and essential where thermal core
handles thermal trip based only polling way not hw interrupt.
Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Acked-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Setting policy results in invalid value error.
% echo "step_wise" > policy
% echo: write error: Invalid argument
Need clean up of the buffer which "echo" may add based on the arguments, before
comparing aganist list of governor names.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Tested-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Use the newly introduced devm_ioremap_resource().
devm_ioremap_resource() provides its own error messages; so all explicit
error messages can be removed from the failure code paths.
CC: Vincenzo Frascino <vincenzo.frascino@st.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Also, Masami Hiramatsu fixed up some minor bugs that were discovered
by sparse.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJRk+PYAAoJEOdOSU1xswtMgO4H/AlePQ4IEOfXgy43Xx8S7Uew
FmvHYmUi5l6UuFRLxcVZBsnSqo3i43UGyj9Lm+g1wBwNP3IDEf+t+fVaN4UP18KW
C+5OoEWyJLe4BlQoVsdBV1+lmivacG3uczvAPY6fibyTqbJN67uzufPLw8ruBMOo
dIIXWUR1sKRyy9+SF23q0rwf5ZfuavlMnmeTZ6omk0evbVT2q0lbpUeN/V1Rrxmc
ZUObTyoFvVMvPYnwutGh7+QB/o0LR9C6MPyyFTvz/o9bD9CDXdbTvSQatoYrbEm8
9/0hpetm+KJpa2M9mf4djeXWCIFX3gBuQ156LEFhS68Ug+HDWQ7Yv9iJQmmR6OE=
=ZToX
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This includes a fix to a memory leak when adding filters to traces.
Also, Masami Hiramatsu fixed up some minor bugs that were discovered
by sparse."
* tag 'trace-fixes-v3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/kprobes: Make print_*probe_event static
tracing/kprobes: Fix a sparse warning for incorrect type in assignment
tracing/kprobes: Use rcu_dereference_raw for tp->files
tracing: Fix leaks of filter preds
Pull x86 fixes from Thomas Gleixner:
- Fix for a CPU hot-add deadlock in microcode update code
- Fix for idle consolidation fallout
- Documentation update for initial kernel direct mapping
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Add missing comments for initial kernel direct mapping
x86/microcode: Add local mutex to fix physical CPU hot-add deadlock
x86: Fix idle consolidation fallout
Pull perf fixes from Thomas Gleixner:
- Fix for a task exit cleanup race caused by a missing a preempt
disable
- Cleanup of the event notification functions with a massive reduction
of duplicated code
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Factor out auxiliary events notification
perf: Fix EXIT event notification
Pull timer fixes from Thomas Gleixner:
- Cure for not using zalloc in the first place, which leads to random
crashes with CPUMASK_OFF_STACK.
- Revert a user space visible change which broke udev
- Add a missing cpu_online early return introduced by the new full
dyntick conversions
- Plug a long standing race in the timer wheel cpu hotplug code.
Sigh...
- Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu
up.
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
tick: Cleanup NOHZ per cpu data on cpu down
tick: Use zalloc_cpumask_var for allocating offstack cpumasks
Pull core fixes from Thomas Gleixner:
- Two fixlets for the fallout of the generic idle task conversion
- Documentation update
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu/idle: Wrap cpu-idle poll mode within rcu_idle_enter/exit
idle: Fix hlt/nohlt command-line handling in new generic idle
kthread: Document ways of reducing OS jitter due to per-CPU kthreads
Pull ARM fixes from Russell King:
"A small number of fixes for stuff from the last merge window, and in
one case (IRQ time accounting) the previous merge window."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7720/1: ARM v6/v7 cmpxchg64 shouldn't clear upper 32 bits of the old/new value
ARM: 7715/1: MCPM: adapt to GIC changes after upstream merge
ARM: 7714/1: mmc: mmci: Ensure return value of regulator_enable() is checked
ARM: 7712/1: Remove trailing whitespace in arch/arm/Makefile
ARM: 7711/1: dove: fix Dove cpu type from V7 to PJ4
ARM: finally enable IRQ time accounting config
Pull Ceph fixes from Sage Weil:
"Yes, this is a much larger pull than I would like after -rc1. There
are a few things included:
- a few fixes for leaks and incorrect assertions
- a few patches fixing behavior when mapped images are resized
- handling for cloned/layered images that are flattened out from
underneath the client
The last bit was non-trivial, and there is some code movement and
associated cleanup mixed in. This was ready and was meant to go in
last week but I missed the boat on Friday. My only excuse is that I
was waiting for an all clear from the testing and there were many
other shiny things to distract me.
Strictly speaking, handling the flatten case isn't a regression and
could wait, so if you like we can try to pull the series apart, but
Alex and I would much prefer to have it all in as it is a case real
users will hit with 3.10."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (33 commits)
rbd: re-submit flattened write request (part 2)
rbd: re-submit write request for flattened clone
rbd: re-submit read request for flattened clone
rbd: detect when clone image is flattened
rbd: reference count parent requests
rbd: define parent image request routines
rbd: define rbd_dev_unparent()
rbd: don't release write request until necessary
rbd: get parent info on refresh
rbd: ignore zero-overlap parent
rbd: support reading parent page data for writes
rbd: fix parent request size assumption
libceph: init sent and completed when starting
rbd: kill rbd_img_request_get()
rbd: only set up watch for mapped images
rbd: set mapping read-only flag in rbd_add()
rbd: support reading parent page data
rbd: fix an incorrect assertion condition
rbd: define rbd_dev_v2_header_info()
rbd: get rid of trivial v1 header wrappers
...
According to sparse warning, print_*probe_event static because
those functions are not directly called from outside.
Link: http://lkml.kernel.org/r/20130513115839.6545.83067.stgit@mhiramat-M0-7522
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Use rcu_dereference_raw() for accessing tp->files. Because the
write-side uses rcu_assign_pointer() for memory barrier,
the read-side also has to use rcu_dereference_raw() with
read memory barrier.
Link: http://lkml.kernel.org/r/20130513115834.6545.17022.stgit@mhiramat-M0-7522
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Special preds are created when folding a series of preds that
can be done in serial. These are allocated in an ops field of
the pred structure. But they were never freed, causing memory
leaks.
This was discovered using the kmemleak checker:
unreferenced object 0xffff8800797fd5e0 (size 32):
comm "swapper/0", pid 1, jiffies 4294690605 (age 104.608s)
hex dump (first 32 bytes):
00 00 01 00 03 00 05 00 07 00 09 00 0b 00 0d 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff814b52af>] kmemleak_alloc+0x73/0x98
[<ffffffff8111ff84>] kmemleak_alloc_recursive.constprop.42+0x16/0x18
[<ffffffff81120e68>] __kmalloc+0xd7/0x125
[<ffffffff810d47eb>] kcalloc.constprop.24+0x2d/0x2f
[<ffffffff810d4896>] fold_pred_tree_cb+0xa9/0xf4
[<ffffffff810d3781>] walk_pred_tree+0x47/0xcc
[<ffffffff810d5030>] replace_preds.isra.20+0x6f8/0x72f
[<ffffffff810d50b5>] create_filter+0x4e/0x8b
[<ffffffff81b1c30d>] ftrace_test_event_filter+0x5a/0x155
[<ffffffff8100028d>] do_one_initcall+0xa0/0x137
[<ffffffff81afbedf>] kernel_init_freeable+0x14d/0x1dc
[<ffffffff814b24b7>] kernel_init+0xe/0xdb
[<ffffffff814d539c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: stable@vger.kernel.org # 2.6.39+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
which enables some minor compile time optimization to avoid
uncessary code in mostly the suspend/resume path could cause
problems for userland.
In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
twice and simplifies suspend/resume, has the side effect
of causing the /sys/class/rtc/rtcN/hctosys flag to always be
zero, and this flag is commonly used by udev to setup the
/dev/rtc symlink to /dev/rtcN, which can cause pain for
older applications.
While the udev rules could use some work to be less fragile,
breaking userland should strongly be avoided. Additionally
the compile time optimizations are fairly minor, and the code
being optimized is likely to be reworked in the future, so
lets revert this change.
Reported-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org> #3.9
Cc: Feng Tang <feng.tang@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
regression) introduced during the 3.10-rc1 merge window. Also
included is a bug fix relating to allocating blocks after resizing an
ext3 file system when using the ext4 file system driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRkZBlAAoJENNvdpvBGATwLYQP/iWOBs2z93WG23cqkgqvL8o6
ZyeJdgy9dkFCArVDX5SSnGkJXZ3iqIKi5HoTKTJKfytgMzgiDAZcLsIHVv6NczwR
UGhjgS3HEdV5tJ46E6JnpB3NLSb+rAdc5kCdlsbzU46CP+JjFiYEhxVpK7ELuM/G
yctChbIH9FY+1OwxHccacBOaJU2ELhnH6B/8Ry/6gM2H0vfKeTNOdocOHdxvbNqg
ooGjytMfVopMQEfVG8aXtTfy341NFJH5fAYEahCcXxeO9ta6Unj9yOu5JV2wVrTt
39+DBsquGX6AVQsc9IxJ6YAN6ldwWN7l3huE9/AI0o/alwGsfVi5M+M/d1MMjDqf
Fgl2EzzBpZQeKKY9UXNi4LLgYdBiILMgKDOGoRKhRb8ynSSf/JX43+24FvidEi3o
o//J4aR+oSZfaovGAeikqyF1cumayhoNN8MINRN8igIinBiC4GjBFEl/Kl/1eAY/
lREGcsmYPXOkVPpM72waRYlP4GwNdOg4QSEY0SGljpwluO+dYtKQjHXcv/s/xL5v
j3GemzYVyjx4zaq1g3PxGfuD6VKFHr0T6jvzd6cHu17lnPlw9fwznHbEm9BEcXDY
gbGx9u+a2ZTqDwYVALbeoRpf9Zz6DUCse3ts4N3rbkXUQQiBYo7tybfVopIMAukb
CexvidDE/ryJrJJFBwoK
=6cRD
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 update from Ted Ts'o:
"Fixed regressions (two stability regressions and a performance
regression) introduced during the 3.10-rc1 merge window.
Also included is a bug fix relating to allocating blocks after
resizing an ext3 file system when using the ext4 file system driver"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
ext4: revert "ext4: use io_end for multiple bios"
ext4: limit group search loop for non-extent files
ext4: fix fio regression
Pull workqueue fix from Tejun Heo:
"A fix for a workqueue_congested() regression that broke fscache"
* 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: workqueue_congested() shouldn't translate WORK_CPU_UNBOUND into node number
An inactive timer's base can refer to a offline cpu's base.
In the current code, cpu_base's lock is blindly reinitialized each
time a CPU is brought up. If a CPU is brought online during the period
that another thread is trying to modify an inactive timer on that CPU
with holding its timer base lock, then the lock will be reinitialized
under its feet. This leads to following SPIN_BUG().
<0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
<0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
<4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
<4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
<4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
<4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
<4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
<4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
<4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
<4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
<4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
<4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
<4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
<4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
<4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)
As an example, this particular crash occurred when CPU #3 is executing
mod_timer() on an inactive timer whose base is refered to offlined CPU
#2. The code locked the timer_base corresponding to CPU #2. Before it
could proceed, CPU #2 came online and reinitialized the spinlock
corresponding to its base. Thus now CPU #3 held a lock which was
reinitialized. When CPU #3 finally ended up unlocking the old cpu_base
corresponding to CPU #2, we hit the above SPIN_BUG().
CPU #0 CPU #3 CPU #2
------ ------- -------
..... ...... <Offline>
mod_timer()
lock_timer_base
spin_lock_irqsave(&base->lock)
cpu_up(2) ..... ......
init_timers_cpu()
.... ..... spin_lock_init(&base->lock)
..... spin_unlock_irqrestore(&base->lock) ......
<spin_bug>
Allocation of per_cpu timer vector bases is done only once under
"tvec_base_done[]" check. In the current code, spinlock_initialization
of base->lock isn't under this check. When a CPU is up each time the
base lock is reinitialized. Move base spinlock initialization under
the check.
Signed-off-by: Tirupathi Reddy <tirupath@codeaurora.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bjørn Mork reported the following warning when running powertop.
[ 49.289034] ------------[ cut here ]------------
[ 49.289055] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common.isra.48+0x3d/0x125()
[ 49.289244] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-bisect-rcu-warn+ #107
[ 49.289251] ffffffff8157d8c8 ffffffff81801e28 ffffffff8137e4e3 ffffffff81801e68
[ 49.289260] ffffffff8103094f ffffffff81801e68 0000000000000000 ffff88023afcd9b0
[ 49.289268] 0000000000000000 0140000000000000 ffff88023bee7700 ffffffff81801e78
[ 49.289276] Call Trace:
[ 49.289285] [<ffffffff8137e4e3>] dump_stack+0x19/0x1b
[ 49.289293] [<ffffffff8103094f>] warn_slowpath_common+0x62/0x7b
[ 49.289300] [<ffffffff8103097d>] warn_slowpath_null+0x15/0x17
[ 49.289306] [<ffffffff810a9006>] rcu_eqs_exit_common.isra.48+0x3d/0x125
[ 49.289314] [<ffffffff81079b49>] ? trace_hardirqs_off_caller+0x37/0xa6
[ 49.289320] [<ffffffff810a9692>] rcu_idle_exit+0x85/0xa8
[ 49.289327] [<ffffffff8107076e>] trace_cpu_idle_rcuidle+0xae/0xff
[ 49.289334] [<ffffffff810708b1>] cpu_startup_entry+0x72/0x115
[ 49.289341] [<ffffffff813689e5>] rest_init+0x149/0x150
[ 49.289347] [<ffffffff8136889c>] ? csum_partial_copy_generic+0x16c/0x16c
[ 49.289355] [<ffffffff81a82d34>] start_kernel+0x3f0/0x3fd
[ 49.289362] [<ffffffff81a8274c>] ? repair_env_string+0x5a/0x5a
[ 49.289368] [<ffffffff81a82481>] x86_64_start_reservations+0x2a/0x2c
[ 49.289375] [<ffffffff81a82550>] x86_64_start_kernel+0xcd/0xd1
[ 49.289379] ---[ end trace 07a1cc95e29e9036 ]---
The warning is that 'rdtp->dynticks' has an unexpected value, which roughly
translates to - the calls to rcu_idle_enter() and rcu_idle_exit() were not
made in the correct order, or otherwise messed up.
And Bjørn's painstaking debugging indicated that this happens when the idle
loop enters the poll mode. Looking at the poll function cpu_idle_poll(), and
the implementation of trace_cpu_idle_rcuidle(), the problem becomes very clear:
cpu_idle_poll() lacks calls to rcu_idle_enter/exit(), and trace_cpu_idle_rcuidle()
calls them in the reverse order - first rcu_idle_exit(), and then rcu_idle_enter().
Hence the even/odd alternative sequencing of rdtp->dynticks goes for a toss.
And powertop readily triggers this because powertop uses the idle-tracing
infrastructure extensively.
So, to fix this, wrap the code in cpu_idle_poll() within rcu_idle_enter/exit(),
so that it blends properly with the calls inside trace_cpu_idle_rcuidle() and
thus get the function ordering right.
Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/519169BF.4080208@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
idle logic) moved code out of tick_nohz_stop_sched_tick() and missed
to bail out when the cpu is offline. That's causing subsequent
failures as an offline CPU is supposed to die and not to fiddle with
nohz magic.
Return false in can_stop_idle_tick() if the cpu is offline.
Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull powerpc fixes from Benjamin Herrenschmidt:
"This is mostly bug fixes (some of them regressions, some of them I
deemed worth merging now) along with some patches from Li Zhong
hooking up the new context tracking stuff (for the new full NO_HZ)"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc: Set show_unhandled_signals to 1 by default
powerpc/perf: Fix setting of "to" addresses for BHRB
powerpc/pmu: Fix order of interpreting BHRB target entries
powerpc/perf: Move BHRB code into CONFIG_PPC64 region
powerpc: select HAVE_CONTEXT_TRACKING for pSeries
powerpc: Use the new schedule_user API on userspace preemption
powerpc: Exit user context on notify resume
powerpc: Exception hooks for context tracking subsystem
powerpc: Syscall hooks for context tracking subsystem
powerpc/booke64: Fix kernel hangs at kernel_dbg_exc
powerpc: Fix irq_set_affinity() return values
powerpc: Provide __bswapdi2
powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3
powerpc/powernv: Detect OPAL v3 API version
powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS
powerpc: Bring all threads online prior to migration/hibernation
powerpc/rtas_flash: Fix validate_flash buffer overflow issue
powerpc/kexec: Fix kexec when using VMX optimised memcpy
powerpc: Fix build errors STRICT_MM_TYPECHECKS
...