From 43e24e82f35291d4c1ca78877ce1b20d3aeb78f1 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 10 May 2017 16:57:49 +1000 Subject: [PATCH 1/7] powerpc/modules: If mprofile-kernel is enabled add it to vermagic On powerpc we can build the kernel with two different ABIs for mcount(), which is used by ftrace. Kernels built with one ABI do not know how to load modules built with the other ABI. The new style ABI is called "mprofile-kernel", for want of a better name. Currently if we build a module using the old style ABI, and the kernel with mprofile-kernel, when we load the module we'll oops something like: # insmod autofs4-no-mprofile-kernel.ko ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount ------------[ cut here ]------------ WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0 CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 #11 ... NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0 LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590 Call Trace: alloc_pages_current+0xc4/0x1d0 (unreliable) ftrace_process_locs+0x4a8/0x590 load_module+0x1c8c/0x28f0 SyS_finit_module+0x110/0x140 system_call+0x38/0xfc ... ftrace failed to modify [] 0xd000000002a31024 actual: 35:65:00:48 We can avoid this by including in the vermagic whether the kernel/module was built with mprofile-kernel. Which results in: # insmod autofs4-pg.ko autofs4: version magic '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 SMP mod_unload modversions ' should be '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269-dirty SMP mod_unload modversions mprofile-kernel' insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format Fixes: 8c50b72a3b4f ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel") Signed-off-by: Michael Ellerman Acked-by: Balbir Singh Acked-by: Jessica Yu Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/module.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h index 53885512b8d3..6c0132c7212f 100644 --- a/arch/powerpc/include/asm/module.h +++ b/arch/powerpc/include/asm/module.h @@ -14,6 +14,10 @@ #include +#ifdef CC_USING_MPROFILE_KERNEL +#define MODULE_ARCH_VERMAGIC "mprofile-kernel" +#endif + #ifndef __powerpc64__ /* * Thanks to Paul M for explaining this. From f48e91e87e67b56bef63393d1a02c6e22c1d7078 Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Mon, 8 May 2017 17:16:26 +1000 Subject: [PATCH 2/7] powerpc/tm: Fix FP and VMX register corruption In commit dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers"), a section of code was removed that copied the current state to checkpointed state. That code should not have been removed. When an FP (Floating Point) unavailable is taken inside a transaction, we need to abort the transaction. This is because at the time of the tbegin, the FP state is bogus so the state stored in the checkpointed registers is incorrect. To fix this, we treclaim (to get the checkpointed GPRs) and then copy the thread_struct FP live state into the checkpointed state. We then trecheckpoint so that the FP state is correctly restored into the CPU. The copying of the FP registers from live to checkpointed is what was missing. This simplifies the logic slightly from the original patch. tm_reclaim_thread() will now always write the checkpointed FP state. Either the checkpointed FP state will be written as part of the actual treclaim (in tm.S), or it'll be a copy of the live state. Which one we use is based on MSR[FP] from userspace. Similarly for VMX. Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers") Cc: stable@vger.kernel.org # 4.9+ Signed-off-by: Michael Neuling Reviewed-by: cyrilbur@gmail.com Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/process.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index d645da302bf2..baae104b16c7 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -864,6 +864,25 @@ static void tm_reclaim_thread(struct thread_struct *thr, if (!MSR_TM_SUSPENDED(mfmsr())) return; + /* + * If we are in a transaction and FP is off then we can't have + * used FP inside that transaction. Hence the checkpointed + * state is the same as the live state. We need to copy the + * live state to the checkpointed state so that when the + * transaction is restored, the checkpointed state is correct + * and the aborted transaction sees the correct state. We use + * ckpt_regs.msr here as that's what tm_reclaim will use to + * determine if it's going to write the checkpointed state or + * not. So either this will write the checkpointed registers, + * or reclaim will. Similarly for VMX. + */ + if ((thr->ckpt_regs.msr & MSR_FP) == 0) + memcpy(&thr->ckfp_state, &thr->fp_state, + sizeof(struct thread_fp_state)); + if ((thr->ckpt_regs.msr & MSR_VEC) == 0) + memcpy(&thr->ckvr_state, &thr->vr_state, + sizeof(struct thread_vr_state)); + giveup_all(container_of(thr, struct task_struct, thread)); tm_reclaim(thr, thr->ckpt_regs.msr, cause); From cde97f8492ac4425c4d0647a308e15e78cb4c218 Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Mon, 8 May 2017 17:16:27 +1000 Subject: [PATCH 3/7] selftests/powerpc: Test TM and VMX register state Test that the VMX checkpointed register state is maintained when a VMX unavailable exception is taken during a transaction. Thanks to Breno Leitao and Gustavo Bueno Romero for the original test this is based heavily on. Signed-off-by: Michael Neuling Reviewed-by: Cyril Bur [mpe: Add to .gitignore, always build it 64-bit to fix build errors] Signed-off-by: Michael Ellerman --- tools/testing/selftests/powerpc/tm/.gitignore | 1 + tools/testing/selftests/powerpc/tm/Makefile | 4 +- .../selftests/powerpc/tm/tm-vmx-unavail.c | 118 ++++++++++++++++++ 3 files changed, 122 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore index 427621792229..2f1f7b013293 100644 --- a/tools/testing/selftests/powerpc/tm/.gitignore +++ b/tools/testing/selftests/powerpc/tm/.gitignore @@ -11,3 +11,4 @@ tm-signal-context-chk-fpu tm-signal-context-chk-gpr tm-signal-context-chk-vmx tm-signal-context-chk-vsx +tm-vmx-unavail diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 5576ee6a51f2..958c11c14acd 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -2,7 +2,8 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu tm-signal-context-chk-vmx tm-signal-context-chk-vsx TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ - tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS) + tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \ + $(SIGNAL_CONTEXT_CHK_TESTS) include ../../lib.mk @@ -13,6 +14,7 @@ CFLAGS += -mhtm $(OUTPUT)/tm-syscall: tm-syscall-asm.S $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include $(OUTPUT)/tm-tmspr: CFLAGS += -pthread +$(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S diff --git a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c new file mode 100644 index 000000000000..137185ba4937 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c @@ -0,0 +1,118 @@ +/* + * Copyright 2017, Michael Neuling, IBM Corp. + * Licensed under GPLv2. + * Original: Breno Leitao & + * Gustavo Bueno Romero + * Edited: Michael Neuling + * + * Force VMX unavailable during a transaction and see if it corrupts + * the checkpointed VMX register state after the abort. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "tm.h" +#include "utils.h" + +int passed; + +void *worker(void *unused) +{ + __int128 vmx0; + uint64_t texasr; + + asm goto ( + "li 3, 1;" /* Stick non-zero value in VMX0 */ + "std 3, 0(%[vmx0_ptr]);" + "lvx 0, 0, %[vmx0_ptr];" + + /* Wait here a bit so we get scheduled out 255 times */ + "lis 3, 0x3fff;" + "1: ;" + "addi 3, 3, -1;" + "cmpdi 3, 0;" + "bne 1b;" + + /* Kernel will hopefully turn VMX off now */ + + "tbegin. ;" + "beq failure;" + + /* Cause VMX unavail. Any VMX instruction */ + "vaddcuw 0,0,0;" + + "tend. ;" + "b %l[success];" + + /* Check VMX0 sanity after abort */ + "failure: ;" + "lvx 1, 0, %[vmx0_ptr];" + "vcmpequb. 2, 0, 1;" + "bc 4, 24, %l[value_mismatch];" + "b %l[value_match];" + : + : [vmx0_ptr] "r"(&vmx0) + : "r3" + : success, value_match, value_mismatch + ); + + /* HTM aborted and VMX0 is corrupted */ +value_mismatch: + texasr = __builtin_get_texasr(); + + printf("\n\n==============\n\n"); + printf("Failure with error: %lx\n", _TEXASR_FAILURE_CODE(texasr)); + printf("Summary error : %lx\n", _TEXASR_FAILURE_SUMMARY(texasr)); + printf("TFIAR exact : %lx\n\n", _TEXASR_TFIAR_EXACT(texasr)); + + passed = 0; + return NULL; + + /* HTM aborted but VMX0 is correct */ +value_match: +// printf("!"); + return NULL; + +success: +// printf("."); + return NULL; +} + +int tm_vmx_unavail_test() +{ + int threads; + pthread_t *thread; + + SKIP_IF(!have_htm()); + + passed = 1; + + threads = sysconf(_SC_NPROCESSORS_ONLN) * 4; + thread = malloc(sizeof(pthread_t)*threads); + if (!thread) + return EXIT_FAILURE; + + for (uint64_t i = 0; i < threads; i++) + pthread_create(&thread[i], NULL, &worker, NULL); + + for (uint64_t i = 0; i < threads; i++) + pthread_join(thread[i], NULL); + + free(thread); + + return passed ? EXIT_SUCCESS : EXIT_FAILURE; +} + + +int main(int argc, char **argv) +{ + return test_harness(tm_vmx_unavail_test, "tm_vmx_unavail_test"); +} From bbb075ddf7d58762040ca413ad82e9974713def3 Mon Sep 17 00:00:00 2001 From: "Gautham R. Shenoy" Date: Fri, 12 May 2017 14:52:06 +0530 Subject: [PATCH 4/7] powerpc/powernv: Set NAPSTATELOST after recovering paca on P9 DD1 Commit 17ed4c8f81da ("powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1") promises to set the NAPSTATELOST bit in paca after recovering the correct paca for the thread waking up from stop1 on DD1, so that the GPRs can be correctly restored on the stop exit path. However, it loads the value 1 into r3, but stores the value in r0 into NAPSTATELOST(r13). Fix this by correctly set the NAPSTATELOST bit in paca after recovering the paca on POWER9 DD1. Fixes: 17ed4c8f81da ("powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1") Signed-off-by: Gautham R. Shenoy Reviewed-by: Nicholas Piggin Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/idle_book3s.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S index 07d4e0ad60db..4898d676dcae 100644 --- a/arch/powerpc/kernel/idle_book3s.S +++ b/arch/powerpc/kernel/idle_book3s.S @@ -416,7 +416,7 @@ power9_dd1_recover_paca: * which needs to be restored from the stack. */ li r3, 1 - stb r0,PACA_NAPSTATELOST(r13) + stb r3,PACA_NAPSTATELOST(r13) blr /* From d04c02f8aa96d82e4cbe783f85a820aae820e746 Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Mon, 15 May 2017 23:40:05 +0530 Subject: [PATCH 5/7] powerpc/kprobes: Fix handling of instruction emulation on probe re-entry Commit 22d8b3dec214c ("powerpc/kprobes: Emulate instructions on kprobe handler re-entry") enabled emulating instructions on kprobe re-entry, rather than single-stepping always. However, we didn't update the single stepping code to only be run if the emulation fails. Also, we missed re-enabling preemption if the instruction emulation was successful. Fix those issues. Fixes: 22d8b3dec214c ("powerpc/kprobes: Emulate instructions on kprobe handler re-entry") Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/kprobes.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index 160ae0fa7d0d..fc4343514bed 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -305,16 +305,17 @@ int kprobe_handler(struct pt_regs *regs) save_previous_kprobe(kcb); set_current_kprobe(p, regs, kcb); kprobes_inc_nmissed_count(p); - prepare_singlestep(p, regs); kcb->kprobe_status = KPROBE_REENTER; if (p->ainsn.boostable >= 0) { ret = try_to_emulate(p, regs); if (ret > 0) { restore_previous_kprobe(kcb); + preempt_enable_no_resched(); return 1; } } + prepare_singlestep(p, regs); return 1; } else { if (*addr != BREAKPOINT_INSTRUCTION) { From bfb9956ab4d8242f4594b5f4bee534b935384fd9 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Tue, 16 May 2017 20:42:53 +1000 Subject: [PATCH 6/7] powerpc/mm: Fix crash in page table dump with huge pages The page table dump code doesn't know about huge pages, so currently it crashes (or walks random memory, usually leading to a crash), if it finds a huge page. On Book3S we only see huge pages in the Linux page tables when we're using the P9 Radix MMU. Teaching the code to properly handle huge pages is a bit more involved, so for now just prevent the crash. Cc: stable@vger.kernel.org # v4.10+ Fixes: 8eb07b187000 ("powerpc/mm: Dump linux pagetables") Signed-off-by: Michael Ellerman --- arch/powerpc/mm/dump_linuxpagetables.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c index d659345a98d6..44fe4833910f 100644 --- a/arch/powerpc/mm/dump_linuxpagetables.c +++ b/arch/powerpc/mm/dump_linuxpagetables.c @@ -16,6 +16,7 @@ */ #include #include +#include #include #include #include @@ -391,7 +392,7 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start) for (i = 0; i < PTRS_PER_PMD; i++, pmd++) { addr = start + i * PMD_SIZE; - if (!pmd_none(*pmd)) + if (!pmd_none(*pmd) && !pmd_huge(*pmd)) /* pmd exists */ walk_pte(st, pmd, addr); else @@ -407,7 +408,7 @@ static void walk_pud(struct pg_state *st, pgd_t *pgd, unsigned long start) for (i = 0; i < PTRS_PER_PUD; i++, pud++) { addr = start + i * PUD_SIZE; - if (!pud_none(*pud)) + if (!pud_none(*pud) && !pud_huge(*pud)) /* pud exists */ walk_pmd(st, pud, addr); else @@ -427,7 +428,7 @@ static void walk_pagetables(struct pg_state *st) */ for (i = 0; i < PTRS_PER_PGD; i++, pgd++) { addr = KERN_VIRT_START + i * PGDIR_SIZE; - if (!pgd_none(*pgd)) + if (!pgd_none(*pgd) && !pgd_huge(*pgd)) /* pgd exists */ walk_pud(st, pgd, addr); else From e41e53cd4fe331d0d1f06f8e4ed7e2cc63ee2c34 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Thu, 18 May 2017 20:37:31 +1000 Subject: [PATCH 7/7] powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on an address. What this means in practice is that it should only return true for addresses in the linear mapping which are backed by a valid PFN. We are failing to properly check that the address is in the linear mapping, because virt_to_pfn() will return a valid looking PFN for more or less any address. That bug is actually caused by __pa(), used in virt_to_pfn(). eg: __pa(0xc000000000010000) = 0x10000 # Good __pa(0xd000000000010000) = 0x10000 # Bad! __pa(0x0000000000010000) = 0x10000 # Bad! This started happening after commit bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give you something bogus back. Until we can verify if that GCC bug is no longer an issue, or come up with another solution, this commit does the minimal fix to make virt_addr_valid() work, by explicitly checking that the address is in the linear mapping region. Fixes: bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") Signed-off-by: Michael Ellerman Reviewed-by: Paul Mackerras Reviewed-by: Balbir Singh Tested-by: Breno Leitao --- arch/powerpc/include/asm/page.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h index 2a32483c7b6c..8da5d4c1cab2 100644 --- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -132,7 +132,19 @@ extern long long virt_phys_offset; #define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT) #define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr)) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) + +#ifdef CONFIG_PPC_BOOK3S_64 +/* + * On hash the vmalloc and other regions alias to the kernel region when passed + * through __pa(), which virt_to_pfn() uses. That means virt_addr_valid() can + * return true for some vmalloc addresses, which is incorrect. So explicitly + * check that the address is in the kernel region. + */ +#define virt_addr_valid(kaddr) (REGION_ID(kaddr) == KERNEL_REGION_ID && \ + pfn_valid(virt_to_pfn(kaddr))) +#else #define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr)) +#endif /* * On Book-E parts we need __va to parse the device tree and we can't