linux-hardened/drivers
Youquan Song 69a37beabf cpuidle: Quickly notice prediction failure for repeat mode
Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing
such a failure quickly therefore becomes important for power saving.

The cpuidle menu governor has a method for predicting a repeat pattern: if 8
consecutive C-state residencies are the same or very close, it predicts that
the next C-state residency will have the same duration.
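
The repeat pattern check described above can be sketched roughly as follows.
This is a simplified illustration, not the governor's actual code: the
function name, the SPREAD_LIMIT_US threshold, and the use of a min/max spread
as the "very close" test are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTERVALS 8           /* number of recent residencies examined (from the text) */
#define SPREAD_LIMIT_US 400   /* hypothetical "very close" threshold, in microseconds */

/*
 * Hypothetical helper: returns true and stores the predicted residency when
 * the last INTERVALS residencies are the same or very close to each other.
 * Closeness is judged by the min/max spread, which is a simplification.
 */
static bool detect_repeat_mode(const uint32_t intervals_us[INTERVALS],
			       uint32_t *predicted_us)
{
	uint64_t sum = 0;
	uint32_t min = intervals_us[0];
	uint32_t max = intervals_us[0];
	int i;

	for (i = 0; i < INTERVALS; i++) {
		sum += intervals_us[i];
		if (intervals_us[i] < min)
			min = intervals_us[i];
		if (intervals_us[i] > max)
			max = intervals_us[i];
	}

	/* Residencies differ too much: no repeat pattern. */
	if (max - min > SPREAD_LIMIT_US)
		return false;

	/* Predict that the next residency equals the recent average. */
	*predicted_us = (uint32_t)(sum / INTERVALS);
	return true;
}
```

With eight residencies of about 20 us each, this sketch reports repeat mode
and predicts about 20 us for the next idle period, even when the CPU is in
fact about to be idle for much longer.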

There is a real case involving the turbostat utility (tools/power/x86/turbostat)
in kernel 3.3 or earlier. On Sandybridge, turbostat reads 10 registers one by
one, so it generates 10 IPIs that wake up idle CPUs. The cpuidle menu governor
then predicts repeat mode and expects another IPI to wake the idle CPU soon,
so it keeps the idle CPU in the C1 state even though the CPU is totally idle.
However, after reading the 10 registers, turbostat sleeps for 5 seconds by
default, so the idle CPU stays in C1 for a long time, even though it is idle,
until a break event occurs.
On an idle Sandybridge system, running "./turbostat -v" shows deep C-state
residency fluctuating between 70% and 99%. With the patched kernel, deep
C-state residency stays above 99.98%.

In the patch, a timer is added when the menu governor detects repeat mode and
chooses a shallow C-state. The timer is set to a timeout value greater than
the predicted time, and we conclude that the repeat mode prediction has failed
if the timer fires. When repeat mode happens as expected, the timer does not
fire: the CPU is woken from its C-state first and cancels the timer itself.
When repeat mode does not happen, the timer expires and the menu governor
quickly notices that the repeat mode prediction failed, then re-evaluates
whether a deeper C-state is possible.
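
As a rough userspace model of this timer logic (not the patch itself), time
can be represented as plain integers. The struct, the function names, and the
TIMEOUT_MARGIN_US value are hypothetical; the patch works with kernel timers
rather than this simulation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical margin: the timeout must be greater than the predicted time. */
#define TIMEOUT_MARGIN_US 100

struct repeat_guard {
	bool armed;             /* a timer is pending */
	uint64_t deadline_us;   /* absolute time at which the timer would fire */
	bool prediction_failed; /* set when the timer fired before any wakeup */
};

/* Called when repeat mode is detected and a shallow C-state is chosen. */
static void guard_arm(struct repeat_guard *g, uint64_t now_us,
		      uint32_t predicted_us)
{
	g->armed = true;
	g->deadline_us = now_us + predicted_us + TIMEOUT_MARGIN_US;
	g->prediction_failed = false;
}

/*
 * Called when the CPU leaves the C-state at now_us. A wakeup before the
 * deadline means repeat mode held, so the timer is simply cancelled. A
 * wakeup at or after the deadline means the timer itself fired, so the
 * prediction failed and a deeper C-state should be re-evaluated.
 */
static bool guard_wakeup(struct repeat_guard *g, uint64_t now_us)
{
	if (!g->armed)
		return false;

	g->armed = false;
	g->prediction_failed = now_us >= g->deadline_us;
	return g->prediction_failed;
}
```

In this model a true return value from guard_wakeup() is the signal to stop
trusting the repeat pattern and to pick a C-state again for the remaining
idle time.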

Below is another case that clearly shows the benefit of the patch:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <pthread.h>

volatile int * shutdown;
volatile long * count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help	-h  Print this help\n"
		"  --thread	-n  Thread number\n"
		"  --loop     	-l  Loop times in shallow C-state\n"
		"  --delay	-t  Sleep time (us) in shallow C-state\n");
}

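/* Thread body: (loop - 1) short sleeps, then one long sleep, repeated. */
void *simple_loop(void *arg)
{
	int idle_num = 1;

	(void)arg;	/* unused */
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}

	return NULL;
}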

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, er = 0, thread_num = 8;
	pthread_t pt[1024];

	static char optstr[] = "n:l:t:h";	/* -h takes no argument */

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
			case 'n':
				thread_num = atoi(optarg);
				break;
			case 'l':
				loop = atoi(optarg);
				break;
			case 't':
				delay = atoi(optarg);
				break;
			case 'h':
			default:
				usage();
				exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask (SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for(i = 0; i < thread_num ; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop and build it.
Build the above test application, then run it.
The test platform can be Intel Sandybridge or another recent platform.
#./idle_predict -l 10 &
#./powertop

We will find that deep C-state residency fluctuates between 40% and 100% and
that much time is spent in the C1 state. This is because the menu governor
wrongly predicts that repeat mode continues, so it chooses the shallow C1
state even though it has the chance to sleep for 1 second in a deep C-state.

With the patched kernel, deep C-state residency stays above 99.6%.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
..
accessibility
acpi ACPI video: Ignore errors after _DOD evaluation. 2012-11-03 09:52:54 +08:00
amba
ata Merge branch 'samsung_platform_data' into staging/for_v3.7 2012-10-05 22:32:05 -03:00
atm sections: fix section conflicts in drivers/atm 2012-10-06 03:04:40 +09:00
auxdisplay
base Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2012-11-10 21:58:34 +01:00
bcma bcma: fix unregistration of cores 2012-10-15 14:45:51 -04:00
block loop: Make explicit loop device destruction lazy 2012-10-30 08:37:31 +01:00
bluetooth
bus
cdrom
char sonypi: suspend/resume callbacks should be conditionally compiled on CONFIG_PM_SLEEP 2012-10-25 12:05:50 -07:00
clk spi: Updates for v3.7 2012-10-02 17:26:42 -07:00
clocksource Power management updates for 3.7-rc1 2012-10-02 18:32:35 -07:00
connector
cpufreq cpufreq / powernow-k8: Change maintainer's email address 2012-10-31 21:02:57 +01:00
cpuidle cpuidle: Quickly notice prediction failure for repeat mode 2012-11-15 00:34:19 +01:00
crypto ARM: soc: late platform updates 2012-10-07 20:55:16 +09:00
dca
devfreq
dio
dma Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma 2012-10-26 14:59:01 -07:00
edac amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[] 2012-10-24 16:13:27 +02:00
eisa
extcon extcon : register for cable interest by cable name 2012-10-23 16:32:18 +09:00
firewire firewire: cdev: fix user memory corruption (i386 userland on amd64 kernel) 2012-10-09 18:26:28 +02:00
firmware firmware/memmap: avoid type conflicts with the generic memmap_init() 2012-10-19 14:07:47 -07:00
gpio Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2012-11-10 21:58:34 +01:00
gpu drm/vmwgfx: Fix a case where the code would BUG when trying to pin GMR memory 2012-11-09 20:49:06 +10:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2012-11-09 06:56:23 +01:00
hsi
hv Drivers: hv: Cleanup error handling in vmbus_open() 2012-10-24 15:46:27 -07:00
hwmon hwmon: Fix chip feature table headers 2012-11-05 21:54:40 +01:00
hwspinlock
i2c Merge branch 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux 2012-11-03 15:14:54 -07:00
ide sections: fix section conflicts in drivers/ide 2012-10-06 03:04:41 +09:00
idle
iio iio: Remove duplicates for light/ in Kconfig and Makefile 2012-10-19 19:44:06 +01:00
infiniband Merge branches 'cxgb4' and 'mlx4' into for-next 2012-10-23 09:03:49 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2012-11-02 16:11:15 -07:00
iommu iommu/tegra: smmu: Fix deadly typo 2012-10-24 16:58:53 +02:00
irqchip
isdn isdn: Make CONFIG_ISDN depend on CONFIG_NETDEVICES 2012-11-07 18:59:26 -05:00
leds Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds 2012-10-10 20:14:07 +09:00
lguest Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux 2012-10-07 21:04:56 +09:00
macintosh Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2012-10-06 03:16:12 +09:00
md MD RAID10: Fix oops when creating RAID10 arrays via dm-raid.c 2012-10-31 11:42:30 +11:00
media [media] Kconfig: Fix dependencies for driver autoselect options 2012-10-17 16:45:56 -03:00
memory
memstick
message
mfd 1. New drivers: 2012-10-07 17:29:24 +09:00
misc pwm: Changes for v3.7-rc1 2012-10-10 20:15:24 +09:00
mmc mmc: sdhci-s3c: fix the card detection in runtime-pm 2012-11-07 15:40:52 -05:00
mtd mtd: Disable mtdchar mmap on MMU systems 2012-10-09 15:08:42 +01:00
net gianfar: ethernet vanishes after restoring from hibernation 2012-11-09 17:08:36 -05:00
nfc
nubus
of of/platform: sparse fix 2012-10-17 15:53:03 -05:00
oprofile mm: use mm->exe_file instead of first VM_EXECUTABLE vma->vm_file 2012-10-09 16:22:18 +09:00
parisc
parport Xtensa patchset for 3.7 2012-10-09 16:11:46 +09:00
pci PCI/portdrv: Don't create hotplug slots unless port supports hotplug 2012-11-05 16:59:59 -07:00
pcmcia Merge branch 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into fixes 2012-10-19 15:40:18 -07:00
pinctrl pinctrl: samsung and exynos need to depend on OF && GPIOLIB 2012-11-06 10:02:14 +01:00
platform Merge branches 'fixes-for-37', 'ec' and 'thermal' into release 2012-10-09 01:47:35 -04:00
pnp
power Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux 2012-10-13 11:27:59 +09:00
pps idr: rename MAX_LEVEL to MAX_IDR_LEVEL 2012-10-06 03:04:56 +09:00
ps3
ptp
pwm pwm: Changes for v3.7-rc1 2012-10-10 20:15:24 +09:00
rapidio rapidio: update for destination ID allocation 2012-10-11 08:50:15 +09:00
regulator MFD bits for the 3.7 merge window. 2012-10-05 12:01:30 +09:00
remoteproc Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux 2012-10-07 21:04:56 +09:00
rpmsg Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux 2012-10-07 21:04:56 +09:00
rtc drivers/rtc/rtc-imxdi.c: add missing spin lock initialization 2012-10-25 14:37:53 -07:00
s390 s390/cio: fix length calculation in idset.c 2012-11-06 22:39:54 +01:00
sbus
scsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2012-11-10 21:58:34 +01:00
sfi
sh sh: Fix up more fallout from pointless ARM __iomem churn. 2012-10-15 14:08:48 +09:00
sn
spi spi: Some minor MXS fixes 2012-10-28 11:13:54 -07:00
ssb
staging Staging driver fixes for 3.7-rc3 2012-10-26 10:25:31 -07:00
target target: Fix incorrect usage of nested IRQ spinlocks in ABORT_TASK path 2012-11-01 00:38:45 -07:00
tc
thermal exynos4_tmu_driver_ids should be exynos_tmu_driver_ids. 2012-11-03 09:52:55 +08:00
tty Revert "serial: omap: fix software flow control" 2012-10-24 11:57:21 -07:00
uio mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
usb usb: gadget: g_ether: fix frame size check for 802.1Q 2012-11-07 21:12:26 -05:00
uwb
vfio vfio: Fix PCI INTx disable consistency 2012-10-10 09:10:32 -06:00
vhost vhost: fix mergeable bufs on BE hosts 2012-10-24 23:19:30 -04:00
video Bug-fixes: 2012-11-02 13:26:11 -07:00
virt
virtio virtio: Don't access index after unregister. 2012-11-09 14:54:24 +10:30
vlynq
vme
w1
watchdog Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2012-10-07 21:06:10 +09:00
xen Bug-fixes: 2012-11-10 06:56:21 +01:00
zorro
Kconfig
Makefile IPMI: Change link order 2012-10-16 18:07:12 -07:00