linux-hardened/drivers
Daniel Lezcano 07209fcf33 thermal/drivers/step_wise: Fix temperature regulation misbehavior
There is a particular situation when the cooling device is cpufreq and the heat
dissipation is not efficient enough where the temperature increases little by
little until reaching the critical threshold and leading to a SoC reset.

The behavior is reproducible on a hikey6220 with bad heat dissipation (eg.
stacked with other boards).

Running a simple C program doing while(1); for each CPU of the SoC makes the
temperature to reach the passive regulation trip point and ends up to the
maximum allowed temperature followed by a reset.

This issue has been also reported by running the libhugetlbfs test suite.

What is observed is a ping pong between two cpu frequencies, 1.2GHz and 900MHz
while the temperature continues to grow.

It appears the step wise governor calls get_target_state() the first time with
the throttle set to true and the trend to 'raising'. The code selects logically
the next state, so the cpu frequency decreases from 1.2GHz to 900MHz, so far so
good. The temperature decreases immediately but still stays greater than the
trip point, then get_target_state() is called again, this time with the
throttle set to true *and* the trend to 'dropping'. From there the algorithm
assumes we have to step down the state and the cpu frequency jumps back to
1.2GHz. But the temperature is still higher than the trip point, so
get_target_state() is called with throttle=1 and trend='raising' again, we jump
to 900MHz, then get_target_state() is called with throttle=1 and
trend='dropping', we jump to 1.2GHz, etc ... but the temperature does not
stabilizes and continues to increase.

[  237.922654] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[  237.922678] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[  237.922690] thermal cooling_device0: cur_state=0
[  237.922701] thermal cooling_device0: old_target=0, target=1
[  238.026656] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[  238.026680] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1
[  238.026694] thermal cooling_device0: cur_state=1
[  238.026707] thermal cooling_device0: old_target=1, target=0
[  238.134647] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[  238.134667] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[  238.134679] thermal cooling_device0: cur_state=0
[  238.134690] thermal cooling_device0: old_target=0, target=1

In this situation the temperature continues to increase while the trend is
oscillating between 'dropping' and 'raising'. We need to keep the current state
untouched if the throttle is set, so the temperature can decrease or a higher
state could be selected, thus preventing this oscillation.

Keeping the next_target untouched when 'throttle' is true at 'dropping' time
fixes the issue.

The following traces show the governor does not change the next state if
trend==2 (dropping) and throttle==1.

[ 2306.127987] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 2306.128009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[ 2306.128021] thermal cooling_device0: cur_state=0
[ 2306.128031] thermal cooling_device0: old_target=0, target=1
[ 2306.231991] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[ 2306.232016] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1
[ 2306.232030] thermal cooling_device0: cur_state=1
[ 2306.232042] thermal cooling_device0: old_target=1, target=1
[ 2306.335982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1
[ 2306.336006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=1
[ 2306.336021] thermal cooling_device0: cur_state=1
[ 2306.336034] thermal cooling_device0: old_target=1, target=1
[ 2306.439984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[ 2306.440008] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0
[ 2306.440022] thermal cooling_device0: cur_state=1
[ 2306.440034] thermal cooling_device0: old_target=1, target=0

[ ... ]

After a while, if the temperature continues to increase, the next state becomes
2 which is 720MHz on the hikey. That results in the temperature stabilizing
around the trip point.

[ 2455.831982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 2455.832006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=0
[ 2455.832019] thermal cooling_device0: cur_state=1
[ 2455.832032] thermal cooling_device0: old_target=1, target=1
[ 2455.935985] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1
[ 2455.936013] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0
[ 2455.936027] thermal cooling_device0: cur_state=1
[ 2455.936040] thermal cooling_device0: old_target=1, target=1
[ 2456.043984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1
[ 2456.044009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0
[ 2456.044023] thermal cooling_device0: cur_state=1
[ 2456.044036] thermal cooling_device0: old_target=1, target=1
[ 2456.148001] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 2456.148028] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[ 2456.148042] thermal cooling_device0: cur_state=1
[ 2456.148055] thermal cooling_device0: old_target=1, target=2
[ 2456.252009] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[ 2456.252041] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0
[ 2456.252058] thermal cooling_device0: cur_state=2
[ 2456.252075] thermal cooling_device0: old_target=2, target=1

IOW, this change is needed to keep the state for a cooling device if the
temperature trend is oscillating while the temperature increases slightly.

Without this change, the situation above leads to a catastrophic crash by a
hardware reset on hikey. This issue has been reported to happen on an OMAP
dra7xx also.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Keerthy <j-keerthy@ti.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Tested-by: Keerthy <j-keerthy@ti.com>
Reviewed-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-10-31 19:32:17 -07:00
..
accessibility
acpi dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
amba
android ANDROID: binder: don't queue async transactions to thread. 2017-09-01 09:22:50 +02:00
ata Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2017-09-06 22:41:21 -07:00
atm
auxdisplay
base firmware: Restore support for built-in firmware 2017-09-16 10:58:48 -07:00
bcma
block The highlights include: 2017-09-12 20:03:53 -07:00
bluetooth
bus ARM: SoC driver updates for v4.14 2017-09-10 20:40:00 -07:00
cdrom
char dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
clk The diff is dominated by the Allwinner A10/A20 SoCs getting converted to 2017-09-13 11:04:14 -07:00
clocksource Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
connector
cpufreq dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
cpuidle Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
crypto dmaengine updates for 4.14-rc1 2017-09-07 14:03:05 -07:00
dax - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
dca
devfreq
dio
dma dmaengine updates for 4.14-rc1 2017-09-07 14:03:05 -07:00
dma-buf
edac
eisa
extcon
firewire
firmware dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
fmc
fpga
fsi
gpio - New Drivers 2017-09-07 13:51:13 -07:00
gpu amd fixes pull 2017-09-15 17:52:52 -07:00
hid media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
hsi
hv Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-07 09:25:15 -07:00
hwmon dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
hwspinlock
hwtracing
i2c i2c: i2c-stm32f7: add driver 2017-09-14 17:34:43 +02:00
ide Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
idle Power management updates for v4.14-rc1 2017-09-05 12:19:08 -07:00
iio - New Drivers 2017-09-07 13:51:13 -07:00
infiniband IB/mlx4: fix sprintf format warning 2017-09-13 18:53:15 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2017-09-16 11:24:26 -07:00
iommu IOMMU Updates for Linux v4.14 2017-09-09 15:03:24 -07:00
ipack
irqchip Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
isdn isdn: isdnloop: fix logic error in isdnloop_sendbuf 2017-09-07 20:03:54 -07:00
leds dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
lightnvm
macintosh powerpc/macintosh: constify wf_sensor_ops structures 2017-09-01 16:42:54 +10:00
mailbox Just behavorial changes to a controller driver: 2017-09-07 13:23:37 -07:00
mcb Char/Misc drivers for 4.14-rc1 2017-09-05 11:08:17 -07:00
md - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
media Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
memory ARM: SoC driver updates for v4.14 2017-09-10 20:40:00 -07:00
memstick
message
mfd dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
misc mm: treewide: remove GFP_TEMPORARY allocation flag 2017-09-13 18:53:16 -07:00
mmc mmc: renesas_sdhi: Add r8a7743/5 support 2017-09-01 15:31:01 +02:00
mtd This pull request contains updates for UBI: 2017-09-16 12:08:10 -07:00
mux
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-16 11:28:59 -07:00
nfc
ntb
nubus
nvdimm - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
nvme nvme-pci: implement the HMB entry number and size limitations 2017-09-11 12:29:40 -04:00
nvmem
of dma-mapping updates for 4.14: 2017-09-12 13:30:06 -07:00
oprofile
parisc
parport Char/Misc drivers for 4.14-rc1 2017-09-05 11:08:17 -07:00
pci Revert "PCI: Avoid race while enabling upstream bridges" 2017-09-15 01:33:51 -05:00
pcmcia
perf
phy Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
pinctrl pinctrl/amd: save pin registers over suspend/resume 2017-09-12 15:58:45 +02:00
platform dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
pnp dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
power power supply and reset changes for the v4.14 series 2017-09-09 14:44:39 -07:00
powercap
pps drivers/pps: use surrounding "if PPS" to remove numerous dependency checks 2017-09-08 18:26:51 -07:00
ps3
ptp
pwm pwm: Changes for v4.14-rc1 2017-09-11 13:04:32 -07:00
rapidio
ras
regulator - New Drivers 2017-09-07 13:51:13 -07:00
remoteproc rpmsg updates for v4.14 2017-09-09 14:34:38 -07:00
reset Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
rpmsg rpmsg: glink: initialize ret to zero to ensure error status check is correct 2017-09-04 10:52:30 -07:00
rtc RTC for 4.14 2017-09-13 10:56:00 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2017-09-12 06:01:59 -07:00
sbus
scsi SCSI misc on 20170913 2017-09-13 10:47:14 -07:00
sfi
sh
sn
soc Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
spi ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
spmi
ssb
staging Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:54:01 -07:00
target Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
tc
tee
thermal thermal/drivers/step_wise: Fix temperature regulation misbehavior 2017-10-31 19:32:17 -07:00
thunderbolt ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
tty dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
uio
usb Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
uwb
vfio vfio: platform: constify amba_id 2017-08-30 14:03:42 -06:00
vhost lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
video fbdev changes for v4.14: 2017-09-14 13:33:33 -07:00
virt
virtio SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
vlynq
vme
w1 power supply and reset changes for the v4.14 series 2017-09-09 14:44:39 -07:00
watchdog Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
xen mm: treewide: remove GFP_TEMPORARY allocation flag 2017-09-13 18:53:16 -07:00
zorro
Kconfig
Makefile