linux-hardened

Author	SHA1	Message	Date
Namhyung Kim	1e10486ffe	ftrace: Add 'function-fork' trace option The function-fork option is same as event-fork that it tracks task fork/exit and set the pid filter properly. This can be useful if user wants to trace selected tasks including their children only. Link: http://lkml.kernel.org/r/20170417024430.21194-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-04-17 17:13:00 -04:00
Namhyung Kim	d879d0b8c1	ftrace: Fix function pid filter on instances When function tracer has a pid filter, it adds a probe to sched_switch to track if current task can be ignored. The probe checks the ftrace_ignore_pid from current tr to filter tasks. But it misses to delete the probe when removing an instance so that it can cause a crash due to the invalid tr pointer (use-after-free). This is easily reproducible with the following: # cd /sys/kernel/debug/tracing # mkdir instances/buggy # echo $$ > instances/buggy/set_ftrace_pid # rmdir instances/buggy ============================================================================ BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90 Read of size 8 by task kworker/0:1/17 CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198 Call Trace: dump_stack+0x68/0x9f kasan_object_err+0x21/0x70 kasan_report.part.1+0x22b/0x500 ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90 kasan_report+0x25/0x30 __asan_load8+0x5e/0x70 ftrace_filter_pid_sched_switch_probe+0x3d/0x90 ? fpid_start+0x130/0x130 __schedule+0x571/0xce0 ... To fix it, use ftrace_clear_pids() to unregister the probe. As instance_rmdir() already updated ftrace codes, it can just free the filter safely. Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org Fixes: `0c8916c342` ("tracing: Add rmdir to remove multibuffer instances") Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-04-17 16:44:23 -04:00
Steven Rostedt (VMware)	af0009fc16	tracing: Move trace_handle_return() out of line Currently trace_handle_return() looks like this: static inline enum print_line_t trace_handle_return(struct trace_seq s) { return trace_seq_has_overflowed(s) ? TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED; } Where trace_seq_overflowed(s) is: static inline bool trace_seq_has_overflowed(struct trace_seq s) { return s->full \|\| seq_buf_has_overflowed(&s->seq); } And seq_buf_has_overflowed(&s->seq) is: static inline bool seq_buf_has_overflowed(struct seq_buf s) { return s->len > s->size; } Making trace_handle_return() into: return (s->full \|\| (s->seq->len > s->seq->size)) ? TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED; One would think this is not an issue to keep as an inline. But because this is used in the TRACE_EVENT() macro, it is extended for every tracepoint in the system. Taking a look at a single tracepoint x86_irq_vector (was the first one I randomly chosen). As trace_handle_return is used in the TRACE_EVENT() macro of trace_raw_output_##call() we disassemble trace_raw_output_x86_irq_vector and do a diff: - is the original + is the out-of-line code I removed identical lines that were different just due to different addresses. --- /tmp/irq-vec-orig 2017-03-16 09:12:48.569384851 -0400 +++ /tmp/irq-vec-ool 2017-03-16 09:13:39.378153385 -0400 @@ -6,27 +6,23 @@ 53 push %rbx 48 89 fb mov %rdi,%rbx 4c 8b a7 c0 20 00 00 mov 0x20c0(%rdi),%r12 e8 f7 72 13 00 callq ffffffff81155c80 <trace_raw_output_prep> 83 f8 01 cmp $0x1,%eax 74 05 je ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23> 5b pop %rbx 41 5c pop %r12 5d pop %rbp c3 retq 41 8b 54 24 08 mov 0x8(%r12),%edx - 48 8d bb 98 10 00 00 lea 0x1098(%rbx),%rdi + 48 81 c3 98 10 00 00 add $0x1098,%rbx - 48 c7 c6 7b 8a a0 81 mov $0xffffffff81a08a7b,%rsi + 48 c7 c6 ab 8a a0 81 mov $0xffffffff81a08aab,%rsi - e8 c5 85 13 00 callq ffffffff81156f70 <trace_seq_printf> === here's the start of the main difference === + 48 89 df mov %rbx,%rdi + e8 62 7e 13 00 callq ffffffff81156810 <trace_seq_printf> - 8b 93 b8 20 00 00 mov 0x20b8(%rbx),%edx - 31 c0 xor %eax,%eax - 85 d2 test %edx,%edx - 75 11 jne ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58> - 48 8b 83 a8 20 00 00 mov 0x20a8(%rbx),%rax - 48 39 83 a0 20 00 00 cmp %rax,0x20a0(%rbx) - 0f 93 c0 setae %al + 48 89 df mov %rbx,%rdi + e8 4a c5 12 00 callq ffffffff8114af00 <trace_handle_return> 5b pop %rbx - 0f b6 c0 movzbl %al,%eax === end === 41 5c pop %r12 5d pop %rbp c3 retq If you notice, the original has 22 bytes of text more than the out of line version. As this is for every TRACE_EVENT() defined in the system, this can become quite large. text data bss dec hex filename 8690305 5450490 1298432 15439227 eb957b vmlinux-orig 8681725 5450490 1298432 15430647 eb73f7 vmlinux-handle This change has a total of 8580 bytes in savings. $ objdump -dr /tmp/vmlinux-orig \| grep '^[0-9a-f] <trace_raw_output' \| wc -l 324 That's 324 tracepoints. But this does not include modules (which contain many more tracepoints). For an allyesconfig build: $ objdump -dr vmlinux-allyes-orig \| grep '^[0-9a-f]* <trace_raw_output' \| wc -l 1401 That's 1401 tracepoints giving us: text data bss dec hex filename 137920629 140221067 53264384 331406080 13c0db00 vmlinux-allyes-orig 137827709 140221067 53264384 331313160 13bf7008 vmlinux-allyes-handle 92920 bytes in savings!!! Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-03-24 20:51:50 -04:00
Steven Rostedt (VMware)	dbeafd0d61	ftrace: Have function tracing start in early boot up Register the function tracer right after the tracing buffers are initialized in early boot up. This will allow function tracing to begin early if it is enabled via the kernel command line. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-03-24 20:51:48 -04:00
Steven Rostedt (VMware)	9afecfbb95	tracing: Postpone tracer start-up tests till the system is more robust As tracing can now be enabled very early in boot up, even before some critical system services (like scheduling), do not run the tracer selftests until after early_initcall() is performed. If a tracer is registered before such time, it is saved off in a list and the test is run when the system is able to handle more diverse functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-03-24 20:51:46 -04:00
Steven Rostedt (VMware)	e725c731e3	tracing: Split tracing initialization into two for early initialization Create an early_trace_init() function that will initialize the buffers and allow for ealier use of trace_printk(). This will also allow for future work to have function tracing start earlier at boot up. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-03-24 13:08:43 -04:00
Naveen N. Rao	35b6f55aa9	trace/kprobes: Allow return probes with offsets and absolute addresses Since the kernel includes many non-global functions with same names, we will need to use offsets from other symbols (typically _text/_stext) or absolute addresses to place return probes on specific functions. Also, the core register_kretprobe() API never forbid use of offsets or absolute addresses with kretprobes. Allow its use with the trace infrastructure. To distinguish kernels that support this, update ftrace README to explicitly call this out. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/183e7ce2921a08c9c755ee9a5da3134febc6695b.1487770934.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-03-03 19:07:18 -03:00
Anton Blanchard	6b0b755142	perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS We have uses of CONFIG_UPROBE_EVENT and CONFIG_KPROBE_EVENT as well as CONFIG_UPROBE_EVENTS and CONFIG_KPROBE_EVENTS. Consistently use the plurals. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: davem@davemloft.net Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/20170216060050.20866-1-anton@ozlabs.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 10:26:39 +01:00
Linus Torvalds	79b17ea740	This release has no new tracing features, just clean ups, minor fixes and small optimizations. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJYtDiAFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L KygH/3sxuM9MCeJ29JsjmV49fHcNqryNZdvSadmnysPm+dFPiI6IgIIbh5R8H89b 2V2gfQSmOTKHu3/wvJr/MprkGP275sWlZPORYFLDl/+NE/3q7g0NKOMWunLcv6dH QQRJIFjSMeGawA3KYBEcwBYMlgNd2VgtTxqLqSBhWth5omV6UevJNHhe3xzZ4nEE YbRX2mxwOuRHOyFp0Hem+Bqro4z1VXJ6YDxOvae2PP8krrIhIHYw9EI22GK68a2g EyKqKPPaEzfU8IjHIQCqIZta5RufnCrDbfHU0CComPANBRGO7g+ZhLO11a/Z316N lyV7JqtF680iem7NKcQlwEwhlLE= =HJnl -----END PGP SIGNATURE----- Merge tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release has no new tracing features, just clean ups, minor fixes and small optimizations" * tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (25 commits) tracing: Remove outdated ring buffer comment tracing/probes: Fix a warning message to show correct maximum length tracing: Fix return value check in trace_benchmark_reg() tracing: Use modern function declaration jump_label: Reduce the size of struct static_key tracing/probe: Show subsystem name in messages tracing/hwlat: Update old comment about migration timers: Make flags output in the timer_start tracepoint useful tracing: Have traceprobe_probes_write() not access userspace unnecessarily tracing: Have COMM event filter key be treated as a string ftrace: Have set_graph_function handle multiple functions in one write ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock tracing: Reset parser->buffer to allow multiple "puts" ftrace: Have set_graph_functions handle write with RDWR ftrace: Reset fgd->hash in ftrace_graph_write() ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY ftrace: Create a slight optimization on searching the ftrace_hash tracing: Add ftrace_hash_key() helper function ftrace: Convert graph filter to use hash tables ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip ...	2017-02-27 13:26:17 -08:00
Joel Fernandes	67d04bb2bc	tracing: Remove outdated ring buffer comment The comment about ring buffer's organization is outdated and the code sits elsewhere, remove the comment. Link: http://lkml.kernel.org/r/20170217041058.23904-1-joelaf@google.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-17 09:56:51 -05:00
Steven Rostedt (VMware)	0e684b6578	tracing: Reset parser->buffer to allow multiple "puts" trace_parser_put() simply frees the allocated parser buffer. But it does not reset the pointer that was freed. This means that if trace_parser_put() is called on the same parser more than once, it will corrupt the allocation system. Setting parser->buffer to NULL after free allows it to be called more than once without any ill effect. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:31 -05:00
Eric W. Biederman	93faccbbfa	fs: Better permission checking for submounts To support unprivileged users mounting filesystems two permission checks have to be performed: a test to see if the user allowed to create a mount in the mount namespace, and a test to see if the user is allowed to access the specified filesystem. The automount case is special in that mounting the original filesystem grants permission to mount the sub-filesystems, to any user who happens to stumble across the their mountpoint and satisfies the ordinary filesystem permission checks. Attempting to handle the automount case by using override_creds almost works. It preserves the idea that permission to mount the original filesystem is permission to mount the sub-filesystem. Unfortunately using override_creds messes up the filesystems ordinary permission checks. Solve this by being explicit that a mount is a submount by introducing vfs_submount, and using it where appropriate. vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let sget and friends know that a mount is a submount so they can take appropriate action. sget and sget_userns are modified to not perform any permission checks on submounts. follow_automount is modified to stop using override_creds as that has proven problemantic. do_mount is modified to always remove the new MS_SUBMOUNT flag so that we know userspace will never by able to specify it. autofs4 is modified to stop using current_real_cred that was put in there to handle the previous version of submount permission checking. cifs is modified to pass the mountpoint all of the way down to vfs_submount. debugfs is modified to pass the mountpoint all of the way down to trace_automount by adding a new parameter. To make this change easier a new typedef debugfs_automount_t is introduced to capture the type of the debugfs automount function. Cc: stable@vger.kernel.org Fixes: `069d5ac9ae` ("autofs: Fix automounts by using current_real_cred()->uid") Fixes: `aeaa4a79ff` ("fs: Call d_automount with the filesystems creds") Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2017-02-02 04:36:12 +13:00
Al Viro	f81dc7d7d5	splice_pipe_desc: kill ->flags no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-26 23:53:38 -05:00
Thomas Gleixner	a5a1d1c291	clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>	2016-12-25 11:04:12 +01:00
Linus Torvalds	179a7ba680	This release has a few updates: o STM can hook into the function tracer o Function filtering now supports more advance glob matching o Ftrace selftests updates and added tests o Softirq tag in traces now show only softirqs o ARM nop added to non traced locations at compile time o New trace_marker_raw file that allows for binary input o Optimizations to the ring buffer o Removal of kmap in trace_marker o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file o Other various fixes and clean ups Note, there are two patches marked for stable. These were discovered near the end of the 4.9 rc release cycle. By the time I had them tested it was just a matter of days before 4.9 would be released, and I figured I would just submit them in the merge window. They are old bugs and not critical. Nothing non-root could abuse. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJYUrFHFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L 2+AIAIr20kSQV/nA5htGAeCTobVk3WUxY6bvjd9mIJDKPP19akNLyREW0G3KnfCr yhx4aFRZG98fRu/6F8qieRosyN36lADDVYHelMFHMpcTOpE2aZGjaaOuNGxOEA9v FmMPTX+K3+dzKyFP4l68R3+5JuQ1/AqLTioTWeLW8IDQ2OOVsjD8+0BuXrNKMJDY o6U4Hk5U/vn+zHc6BmgBzloAXemBd7iJ1t5V3FRRGvm8yv3HU85Twc5ofGeYTWvB J8PboEywRlIzxg0Kd8mxnMI5PgaKZSEc2ub8E7cY/CZ5PYpDE2xDA2hJmJgfYp00 1VW+DHRpRZfElsCcya6S6P4bs5Y= =MGZ/ -----END PGP SIGNATURE----- Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release has a few updates: - STM can hook into the function tracer - Function filtering now supports more advance glob matching - Ftrace selftests updates and added tests - Softirq tag in traces now show only softirqs - ARM nop added to non traced locations at compile time - New trace_marker_raw file that allows for binary input - Optimizations to the ring buffer - Removal of kmap in trace_marker - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file - Other various fixes and clean ups" * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits) selftests: ftrace: Shift down default message verbosity kprobes/trace: Fix kprobe selftest for newer gcc tracing/kprobes: Add a helper method to return number of probe hits tracing/rb: Init the CPU mask on allocation tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too fgraph: Handle a case where a tracer ignores set_graph_notrace tracing: Replace kmap with copy_from_user() in trace_marker writing ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it tracing: Allow benchmark to be enabled at early_initcall() tracing: Have system enable return error if one of the events fail tracing: Do not start benchmark on boot up tracing: Have the reg function allow to fail ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline ring-buffer: Froce rb_update_write_stamp() to be inlined ring-buffer: Force inline of hotpath helper functions tracing: Make __buffer_unlock_commit() always_inline tracing: Make tracepoint_printk a static_key ring-buffer: Always inline rb_event_data() ring-buffer: Make rb_reserve_next_event() always inlined ...	2016-12-15 13:49:34 -08:00
Linus Torvalds	9465d9cc31	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...	2016-12-12 19:56:15 -08:00
Pavankumar Kondeti	c59f29cb14	tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results The 's' flag is supposed to indicate that a softirq is running. This can be detected by testing the preempt_count with SOFTIRQ_OFFSET. The current code tests the preempt_count with SOFTIRQ_MASK, which would be true even when softirqs are disabled but not serving a softirq. Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-12-12 13:51:02 -05:00
Steven Rostedt (Red Hat)	656c7f0d2d	tracing: Replace kmap with copy_from_user() in trace_marker writing Instead of using get_user_pages_fast() and kmap_atomic() when writing to the trace_marker file, just allocate enough space on the ring buffer directly, and write into it via copy_from_user(). Writing into the trace_marker file use to allocate a temporary buffer to perform the copy_from_user(), as we didn't want to write into the ring buffer if the copy failed. But as a trace_marker write is suppose to be extremely fast, and allocating memory causes other tracepoints to trigger, Peter Zijlstra suggested using get_user_pages_fast() and kmap_atomic() to keep the user space pages in memory and reading it directly. But Henrik Austad had issues with this because it required taking the mm->mmap_sem and causing long delays with the write. Instead, just allocate the space in the ring buffer and use copy_from_user() directly. If it faults, return -EFAULT and write "<faulted>" into the ring buffer. Link: http://lkml.kernel.org/r/20161208124018.72dd0f86@gandalf.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Henrik Austad <henrik@austad.us> Cc: Peter Zijlstra <peterz@infradead.org> Updates: `d696b58ca2` "tracing: Do not allocate buffer for trace_marker" Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-12-09 09:18:14 -05:00
Sebastian Andrzej Siewior	b32614c034	tracing/rb: Convert to hotplug state machine Install the callbacks via the state machine. The notifier in struct ring_buffer is replaced by the multi instance interface. Upon __ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke the trace_rb_cpu_prepare() on each CPU. This callback may now fail. This means __ring_buffer_alloc() will fail and cleanup (like previously) and during a CPU up event this failure will not allow the CPU to come up. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-02 00:52:34 +01:00
Joel Fernandes	80ec355210	trace: Add an option for boot clock as trace clock Unlike monotonic clock, boot clock as a trace clock will account for time spent in suspend useful for tracing suspend/resume. This uses earlier introduced infrastructure for using the fast boot clock. Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1480372524-15181-7-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 18:02:59 +01:00
Steven Rostedt (Red Hat)	52ffabe384	tracing: Make __buffer_unlock_commit() always_inline The function __buffer_unlock_commit() is called in a few places outside of trace.c. But for the most part, it should really be inlined, as it is in the hot path of the trace_events. For the callers outside of trace.c, create a new function trace_buffer_unlock_commit_nostack(), as the reason it was used was to avoid the stack tracing that trace_buffer_unlock_commit() could do. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-11-23 20:30:51 -05:00
Steven Rostedt (Red Hat)	4239174570	tracing: Make tracepoint_printk a static_key Currently, when tracepoint_printk is set (enabled by the "tp_printk" kernel command line), it causes trace events to print via printk(). This is a very dangerous operation, but is useful for debugging. The issue is, it's seldom used, but it is always checked even if it's not enabled by the kernel command line. Instead of having this feature called by a branch against a variable, turn that variable into a static key, and this will remove the test and jump. To simplify things, the functions output_printk() and trace_event_buffer_commit() were moved from trace_events.c to trace.c. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-11-23 15:52:45 -05:00
Steven Rostedt (Red Hat)	3e9a8aadca	tracing: Create a always_inlined __trace_buffer_lock_reserve() As Andi Kleen pointed out in the Link below, the trace events has quite a bit of code execution. A lot of that happens to be calling functions, where some of them should simply be inlined. One of these functions happens to be trace_buffer_lock_reserve() which is also a global, but it is used throughout the file it is defined in. Create a __trace_buffer_lock_reserve() that is always inlined that the file can benefit from. Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-11-23 11:29:58 -05:00
Chunyan Zhang	478409dd68	tracing: Add hook to function tracing for other subsystems to use Currently Function traces can be only exported to the ring buffer. This adds a trace_export concept which can process traces and export them to a registered destination as an addition to the current one that outputs to Ftrace - i.e. ring buffer. In this way, if we want function traces to be sent to other destinations rather than only to the ring buffer, we just need to register a new trace_export and implement its own .write() function for writing traces to storage. With this patch, only function tracing (trace type is TRACE_FN) is supported. Link: http://lkml.kernel.org/r/1479715043-6534-2-git-send-email-zhang.chunyan@linaro.org Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-11-22 17:40:00 -05:00
Steven Rostedt	fa32e8557b	tracing: Add new trace_marker_raw A new file is created: /sys/kernel/debug/tracing/trace_marker_raw This allows for appications to create data structures and write the binary data directly into it, and then read the trace data out from trace_pipe_raw into the same type of data structure. This saves on converting numbers into ASCII that would be required by trace_marker. Suggested-by: Olof Johansson <olof@lixom.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-11-15 15:13:59 -05:00
Masami Hiramatsu	60f1d5e3ba	ftrace: Support full glob matching Use glob_match() to support flexible glob wildcards (,?) and character classes ([) for ftrace. Since the full glob matching is slower than the current partial matching routines(pat, pat, pat), this leaves those routines and just add MATCH_GLOB for complex glob expression. e.g. ---- [root@localhost tracing]# echo 'schedgroup' > set_ftrace_filter [root@localhost tracing]# cat set_ftrace_filter sched_free_group sched_change_group sched_create_group sched_online_group sched_destroy_group sched_offline_group [root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter [root@localhost tracing]# head set_ftrace_filter sys_arch_prctl sys_rt_sigreturn sys_ioperm SyS_iopl sys_modify_ldt SyS_mmap SyS_set_thread_area SyS_get_thread_area SyS_set_tid_address sys_fork ---- Link: http://lkml.kernel.org/r/147566869501.29136.6462645009894738056.stgit@devbox Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-11-14 16:42:58 -05:00
Linus Torvalds	95107b30be	This release cycle is rather small. Just a few fixes to tracing. The big change is the addition of the hwlat tracer. It not only detects SMIs, but also other latency that's caused by the hardware. I have detected some latency from large boxes having bus contention. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJX9a77AAoJEKKk/i67LK/8UPEH/jcqMmOMhQYVQsNaJViA5uJM SV96gaLCc9cxXY04Hf7vx8RkVIyIqTCCQZ+RVZt4RSeqpsB2IzZ1u0CNKs2Z0MTv MdvQJoazRoDgVuPzKAsdAlDd0ykqHEFA5ayF3XDK4P2J97La+B4rQIqEiJX/aDrz i0NQQFg2ZF46mXJXn4oXe6nmr6WnbiEduawVjd7JvgILJO2hojDicOTQlNG41Nys 68fOV8mLk0OL7sFRjySLGcbdbKhP2YbNhxILXl8geLgS9+CFZXkE8oTRjjy9IMNA XrqbFLMWaRVv+Nig7bHIWKE8ZErC5WCYUw4LD2GTLMDx5AkAVLGFFp6TOiO4SG8= =ke23 -----END PGP SIGNATURE----- Merge tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release cycle is rather small. Just a few fixes to tracing. The big change is the addition of the hwlat tracer. It not only detects SMIs, but also other latency that's caused by the hardware. I have detected some latency from large boxes having bus contention" * tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Call traceoff trigger after event is recorded ftrace/scripts: Add helper script to bisect function tracing problem functions tracing: Have max_latency be defined for HWLAT_TRACER as well tracing: Add NMI tracing in hwlat detector tracing: Have hwlat trace migrate across tracing_cpumask CPUs tracing: Add documentation for hwlat_detector tracer tracing: Added hardware latency tracer ftrace: Access ret_stack->subtime only in the function profiler function_graph: Handle TRACE_BPUTS in print_graph_comment tracing/uprobe: Drop isdigit() check in create_trace_uprobe	2016-10-06 11:48:41 -07:00
Linus Torvalds	12b7bcb43e	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - uprobes enhancements (Masami Hiramatsu) - Uncore group events enhancements (David Carrillo-Cisneros) - x86 Intel: Add support for Skylake server uncore PMUs (Kan Liang) - x86 Intel: LBR cleanups and enhancements, for better branch annotation tracking (Peter Zijlstra) - x86 Intel: Add support for PTWRITE and power event tracing (Alexander Shishkin) - ... various fixes, cleanups and smaller enhancements. Lots of tooling changes - a couple of highlights: - Support event group view with hierarchy mode in 'perf top' and 'perf report' (Namhyung Kim) e.g.: $ perf record -e '{cycles,instructions}' make $ perf report --hierarchy --stdio ... # Overhead Command / Shared Object / Symbol # ...................... .................................. ... 25.74% 27.18%sh 19.96% 24.14%libc-2.24.so 9.55% 14.64%[.] __strcmp_sse2 1.54% 0.00%[.] __tfind 1.07% 1.13%[.] _int_malloc 0.95% 0.00%[.] __strchr_sse2 0.89% 1.39%[.] __tsearch 0.76% 0.00%[.] strlen - Add branch stack / basic block info to 'perf annotate --stdio', where for each branch, we add an asm comment after the instruction with information on how often it was taken and predicted. See example with color output at: http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png (Peter Zijlstra) - Add support for using symbols in address filters with Intel PT and ARM CoreSight (hardware assisted tracing facilities) (Adrian Hunter, Mathieu Poirier) - Add support for interacting with Coresight PMU ETMs/PTMs, that are IP blocks to perform hardware assisted tracing on a ARM CPU core (Mathieu Poirier) - Support generating cross arch probes, i.e. if you specify a vmlinux file for different arch than the one in the host machine, $ perf probe --definition function_name args will generate the probe definition string needed to append to the target machine /sys/kernel/debug/tracing/kprobes_events file, using scripting (Masami Hiramatsu). - Allow configuring the default 'perf report -s' sort order in ~/.perfconfig, for instance, "sym,dso" may be more fitting for kernel developers. (Arnaldo Carvalho de Melo) - ... plus lots of other changes, refactorings, features and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (149 commits) perf tests: Add dwarf unwind test for powerpc perf probe: Match linkage name with mangled name perf probe: Fix to cut off incompatible chars from group name perf probe: Skip if the function address is 0 perf probe: Ignore the error of finding inline instance perf intel-pt: Fix decoding when there are address filters perf intel-pt: Enable decoder to handle TIP.PGD with missing IP perf intel-pt: Read address filter from AUXTRACE_INFO event perf intel-pt: Record address filter in AUXTRACE_INFO event perf intel-pt: Add a helper function for processing AUXTRACE_INFO perf intel-pt: Fix missing error codes processing auxtrace_info perf intel-pt: Add support for recording the max non-turbo ratio perf intel-pt: Fix snapshot overlap detection decoder errors perf probe: Increase debug level of SDT debug messages perf record: Add support for using symbols in address filters perf symbols: Add dso__last_symbol() perf record: Fix error paths perf record: Rename label 'out_symbol_exit' perf script: Fix vanished idle symbols perf evsel: Add support for address filters ...	2016-10-03 12:47:28 -07:00
Linus Torvalds	4c04b4b534	Al Viro has been looking at the tracefs code, and has pointed out some issues. This contains one fix by me and one by Al. I'm sure that he'll come up with more but for now I tested these patches and they don't appear to have any negative impact on tracing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJX6FvrAAoJEKKk/i67LK/8EuIH/Arf6vJidYsmbe57WQp8PU3I bldem6ePj6zgZ2ZqPlSGCs1J2DcK4Bh3lPVxdx7rRKVWSd/Zoj+i83hvObusR8M7 Qs1G92bJTvvVO3aPfiN0GvKGdKfGn45L+j0BcBauiTRKqnj3PkhOhIP2/ks0ewSk qeq7R3xxo/FDs26AHS69Hm0PIIw7btyhXNX4GB3Il7IIA5/nUknw3C+bjVj86tYX R4iElcHEhplgoSjKuLgNIRZGUnEFtsm/fnohYXpHacLTUKNXnTDY230x/OKc1yyB 1vOfHS/y5s3XSJ1lcgSjYeNc51lK8NiDASaptZSUnOookKSAooUTFELNzpbc0sg= =+Fr3 -----END PGP SIGNATURE----- Merge tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs fixes from Steven Rostedt: "Al Viro has been looking at the tracefs code, and has pointed out some issues. This contains one fix by me and one by Al. I'm sure that he'll come up with more but for now I tested these patches and they don't appear to have any negative impact on tracing" * tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: fix memory leaks in tracing_buffers_splice_read() tracing: Move mutex to protect against resetting of seq data	2016-09-25 18:40:13 -07:00
Al Viro	1ae2293dd6	fix memory leaks in tracing_buffers_splice_read() Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-09-25 13:30:13 -04:00
Steven Rostedt (Red Hat)	1245800c0f	tracing: Move mutex to protect against resetting of seq data The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: `d7350c3f45` ("tracing/core: make the read callbacks reentrants") Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-09-25 10:27:08 -04:00
Steven Rostedt (Red Hat)	f971cc9aab	tracing: Have max_latency be defined for HWLAT_TRACER as well The hwlat tracer uses tr->max_latency, and if it's the only tracer enabled that uses it, the build will fail. Add max_latency and its file when the hwlat tracer is enabled. Link: http://lkml.kernel.org/r/d6c3b7eb-ba95-1ffa-0453-464e1e24262a@infradead.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-09-12 09:59:46 -04:00
Steven Rostedt (Red Hat)	e7c15cd8a1	tracing: Added hardware latency tracer The hardware latency tracer has been in the PREEMPT_RT patch for some time. It is used to detect possible SMIs or any other hardware interruptions that the kernel is unaware of. Note, NMIs may also be detected, but that may be good to note as well. The logic is pretty simple. It simply creates a thread that spins on a single CPU for a specified amount of time (width) within a periodic window (window). These numbers may be adjusted by their cooresponding names in /sys/kernel/tracing/hwlat_detector/ The defaults are window = 1000000 us (1 second) width = 500000 us (1/2 second) The loop consists of: t1 = trace_clock_local(); t2 = trace_clock_local(); Where trace_clock_local() is a variant of sched_clock(). The difference of t2 - t1 is recorded as the "inner" timestamp and also the timestamp t1 - prev_t2 is recorded as the "outer" timestamp. If either of these differences are greater than the time denoted in /sys/kernel/tracing/tracing_thresh then it records the event. When this tracer is started, and tracing_thresh is zero, it changes to the default threshold of 10 us. The hwlat tracer in the PREEMPT_RT patch was originally written by Jon Masters. I have modified it quite a bit and turned it into a tracer. Based-on-code-by: Jon Masters <jcm@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-09-02 12:47:51 -04:00
Masami Hiramatsu	8642562555	ftrace: probe: Add README entries for k/uprobe-events Add README entries for kprobe-events and uprobe-events. This allows user to check what options can be acceptable for running kernel. E.g. perf tools can choose correct types for the kernel. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naohiro Aota <naohiro.aota@hgst.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/147151069524.12957.12957179170304055028.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-08-23 15:39:57 -03:00
Wei Yongjun	67f20b0845	tracing: Using for_each_set_bit() to simplify trace_pid_write() Using for_each_set_bit() to simplify the code. Link: http://lkml.kernel.org/r/1467645004-11169-1-git-send-email-weiyj_lk@163.com Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-07-05 11:22:40 -04:00
Steven Rostedt (Red Hat)	501c237525	ftrace: Move toplevel init out of ftrace_init_tracefs() Commit `345ddcc882` ("ftrace: Have set_ftrace_pid use the bitmap like events do") placed ftrace_init_tracefs into the instance creation, and encapsulated the top level updating with an if conditional, as the top level only gets updated at boot up. Unfortunately, this triggers section mismatch errors as the init functions are called from a function that can be called later, and the section mismatch logic is unaware of the if conditional that would prevent it from happening at run time. To make everyone happy, create a separate ftrace_init_tracefs_toplevel() routine that only gets called by init functions, and this will be what calls other init functions for the toplevel directory. Link: http://lkml.kernel.org/r/20160704102139.19cbc0d9@gandalf.local.home Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: `345ddcc882` ("ftrace: Have set_ftrace_pid use the bitmap like events do") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-07-05 10:47:03 -04:00
Steven Rostedt (Red Hat)	be54f69c26	tracing: Skip more functions when doing stack tracing of events # echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [002] d..2 1982.525169: <stack trace> => save_stack_trace => __ftrace_trace_stack => trace_buffer_unlock_commit_regs => event_trigger_unlock_commit => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The above shows that we are seeing 6 functions before ever making it to the caller of the sched_switch event. # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 2146.335208: <stack trace> => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The stacktrace trigger isn't as bad, because it adds its own skip to the stacktracing, but still has two events extra. One issue is that if the stacktrace passes its own "regs" then there should be no addition to the skip, as the regs will not include the functions being called. This was an issue that was fixed by commit `7717c6be69` ("tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()" as adding the skip number for kprobes made the probes not have any stack at all. But since this is only an issue when regs is being used, a skip should be added if regs is NULL. Now we have: # echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [000] d..2 1297.676333: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => rest_init => start_kernel => x86_64_start_reservations => x86_64_start_kernel # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 1370.759745: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary And kprobes are not touched. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-06-23 18:48:56 -04:00
Andy Lutomirski	e2ace00117	tracing: Choose static tp_printk buffer by explicit nesting count Currently, the trace_printk code chooses which static buffer to use based on what type of atomic context (NMI, IRQ, etc) it's in. Simplify the code and make it more robust: simply count the nesting depth and choose a buffer based on the current nesting depth. The new code will only drop an event if we nest more than 4 deep, and the old code was guaranteed to malfunction if that happened. Link: http://lkml.kernel.org/r/07ab03aecfba25fcce8f9a211b14c9c5e2865c58.1464289095.git.luto@kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-06-20 09:54:20 -04:00
Steven Rostedt (Red Hat)	345ddcc882	ftrace: Have set_ftrace_pid use the bitmap like events do Convert set_ftrace_pid to use the bitmap like set_event_pid does. This allows for instances to use the pid filtering as well, and will allow for function-fork option to set if the children of a traced function should be traced or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-06-20 09:54:19 -04:00
Steven Rostedt (Red Hat)	76c813e266	tracing: Move pid_list write processing into its own function The addition of PIDs into a pid_list via the write operation of set_event_pid is a bit complex. The same operation will be needed for function tracing pids. Move the code into its own generic function in trace.c, so that we can avoid duplication of this code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-06-20 09:54:18 -04:00
Steven Rostedt (Red Hat)	5cc8976bd5	tracing: Move the pid_list seq_file functions to be global To allow other aspects of ftrace to use the pid_list logic, we need to reuse the seq_file functions. Making the generic part into functions that can be called by other files will help in this regard. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-06-20 09:54:17 -04:00
Steven Rostedt	d8275c454d	tracing: Move filtered_pid helper functions into trace.c As the filtered_pid functions are going to be used by function tracer as well as trace_events, move the code into the generic trace.c file. The functions moved are: trace_find_filtered_pid() trace_ignore_this_task() trace_filter_add_remove_task() Kernel Doc text was also added. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-06-20 09:54:17 -04:00
Steven Rostedt (Red Hat)	0fc1b09ff1	tracing: Use temp buffer when filtering events Filtering of events requires the data to be written to the ring buffer before it can be decided to filter or not. This is because the parameters of the filter are based on the result that is written to the ring buffer and not on the parameters that are passed into the trace functions. The ftrace ring buffer is optimized for writing into the ring buffer and committing. The discard procedure used when filtering decides the event should be discarded is much more heavy weight. Thus, using a temporary filter when filtering events can speed things up drastically. Without a temp buffer we have: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.790706626 seconds time elapsed ( +- 0.71% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.566904059 seconds time elapsed ( +- 0.27% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.690598511 seconds time elapsed ( +- 0.19% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.707486364 seconds time elapsed ( +- 0.30% ) The first run above is without any tracing, just to get a based figure. hackbench takes ~0.79 seconds to run on the system. The second run enables tracing all events where nothing is filtered. This increases the time by 100% and hackbench takes 1.57 seconds to run. The third run filters all events where the preempt count will equal "20" (this should never happen) thus all events are discarded. This takes 1.69 seconds to run. This is 10% slower than just committing the events! The last run enables all events and filters where the filter will commit all events, and this takes 1.70 seconds to run. The filtering overhead is approximately 10%. Thus, the discard and commit of an event from the ring buffer may be about the same time. With this patch, the numbers change: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.778233033 seconds time elapsed ( +- 0.38% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.582102692 seconds time elapsed ( +- 0.28% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.309230710 seconds time elapsed ( +- 0.22% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.786001924 seconds time elapsed ( +- 0.20% ) The first run is again the base with no tracing. The second run is all tracing with no filtering. It is a little slower, but that may be well within the noise. The third run shows that discarding all events only took 1.3 seconds. This is a speed up of 23%! The discard is much faster than even the commit. The one downside is shown in the last run. Events that are not discarded by the filter will take longer to add, this is due to the extra copy of the event. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-05-03 17:59:24 -04:00
Steven Rostedt (Red Hat)	904d1857ad	tracing: Remove unused function trace_current_buffer_lock_reserve() trace_current_buffer_lock_reserve() has no more users. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-29 18:11:54 -04:00
Steven Rostedt (Red Hat)	33fddff24d	tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL There's no real difference between trace_buffer_unlock_commit() and trace_buffer_unlock_commit_regs() except that the former passes NULL to ftrace_stack_trace() instead of regs. Have the former be a static inline of the latter which passes NULL for regs. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-29 17:44:01 -04:00
Steven Rostedt (Red Hat)	a9fe48dcde	tracing: Remove unused function trace_current_buffer_discard_commit() The function trace_current_buffer_discard_commit() has no callers, remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-29 16:14:13 -04:00
Steven Rostedt (Red Hat)	fa66ddb870	tracing: Move trace_buffer_unlock_commit{_regs}() to local header The functions trace_buffer_unlock_commit() and the _regs() version are only used within the kernel/trace directory. Move them to the local header and remove the export as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-29 16:14:12 -04:00
Steven Rostedt (Red Hat)	9cbb1506ab	tracing: Fold filter_check_discard() into its only user The function filter_check_discard() is small and only called by one user, its code can be folded into that one caller and make the code a bit less comlplex. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-29 16:14:08 -04:00
Steven Rostedt (Red Hat)	65da9a0a3b	tracing: Make filter_check_discard() local Nothing outside of the tracing directory calls filter_check_discard() or check_filter_check_discard(). They should not be called by modules. Move their prototypes into the local tracing header and remove their EXPORT_SYMBOL() macros. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-27 10:13:46 -04:00
Wang Xiaoqiang	4afe6495e5	tracing: Don't use the address of the buffer array name in copy_from_user With the following code snippet: ... char buf[64]; ... if (copy_from_user(&buf, ubuf, cnt)) ... Even though the value of "&buf" equals "buf", but there is no need to get the address of the "buf" again. Use "buf" instead of "&buf". Link: http://lkml.kernel.org/r/20160418152329.18b72bea@debian Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-26 14:42:03 -04:00
Steven Rostedt (Red Hat)	205506228b	tracing: Do not inherit event-fork option for instances As the event-fork option requires doing work when enabled and disabled, it can not be passed down to created instances. The instance must clear this flag when it is created, and must clear it when its removed. As more options may be created with this need, a macro ZEROED_TRACE_FLAGS is created that holds the flags that must not be inherited by the top level instance, and must be cleared on removal of instances. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-25 22:40:12 -04:00
Namhyung Kim	4b94f5b7b4	tracing: Add hist trigger 'log2' modifier Allow users to have numeric fields displayed as log2 values in case value range is very wide by appending '.log2' to field names. For example, # echo 'hist:key=bytes_req' > kmalloc/trigger # cat kmalloc/hist { bytes_req: 504 } hitcount: 1 { bytes_req: 11 } hitcount: 1 { bytes_req: 104 } hitcount: 1 { bytes_req: 48 } hitcount: 1 { bytes_req: 2048 } hitcount: 1 { bytes_req: 4096 } hitcount: 1 { bytes_req: 240 } hitcount: 1 { bytes_req: 392 } hitcount: 1 { bytes_req: 13 } hitcount: 1 { bytes_req: 28 } hitcount: 1 { bytes_req: 12 } hitcount: 1 { bytes_req: 64 } hitcount: 2 { bytes_req: 128 } hitcount: 2 { bytes_req: 32 } hitcount: 2 { bytes_req: 8 } hitcount: 11 { bytes_req: 10 } hitcount: 13 { bytes_req: 24 } hitcount: 25 { bytes_req: 160 } hitcount: 29 { bytes_req: 16 } hitcount: 33 { bytes_req: 80 } hitcount: 36 When using '.log2' modifier, the output looks like: # echo 'hist:key=bytes_req.log2' > kmalloc/trigger # cat kmalloc/hist { bytes_req: ~ 2^12 } hitcount: 1 { bytes_req: ~ 2^11 } hitcount: 1 { bytes_req: ~ 2^9 } hitcount: 2 { bytes_req: ~ 2^6 } hitcount: 3 { bytes_req: ~ 2^3 } hitcount: 13 { bytes_req: ~ 2^5 } hitcount: 19 { bytes_req: ~ 2^8 } hitcount: 49 { bytes_req: ~ 2^7 } hitcount: 57 { bytes_req: ~ 2^4 } hitcount: 74 Link: http://lkml.kernel.org/r/7ff396b246c6a881f46b979735fddf05a0d6c71a.1457029949.git.tom.zanussi@linux.intel.com Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 18:56:03 -04:00
Tom Zanussi	5463bfda32	tracing: Add support for named hist triggers Allow users to define 'named' hist triggers. All triggers created with the same 'name=xxx' option will update the same shared histogram data. This expands the hist trigger syntax from this: # echo hist:keys=xxx ... [ if filter] > event/trigger to this: # echo hist:name=xxx:keys=xxx ... [ if filter] > event/trigger Named histograms must use a 'compatible' set of keys and values, which means each event added to a set of named triggers must have the same names and types. Reading the 'hist' file of any of the participating events will produce the same output as any other participating event, which is to be expected since they share the same data. Link: http://lkml.kernel.org/r/1dbc84ee3322a75daaf5b3ef1d0cc0a2fb682fc7.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 18:56:01 -04:00
Tom Zanussi	52a7f16ded	tracing: Add support for multiple hist triggers per event Allow users to define any number of hist triggers per trace event. Any number of hist triggers may be added for a given event, which may differ by key, value, or filter. Reading the event's 'hist' file will display the output of all the hist triggers defined on an event concatenated in the order they were defined. Link: http://lkml.kernel.org/r/48a0c8dd34c344571de880fb35e211c6d9a28961.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 18:55:59 -04:00
Tom Zanussi	d0bad49bb0	tracing: Add enable_hist/disable_hist triggers Similar to enable_event/disable_event triggers, these triggers enable and disable the aggregation of events into maps rather than enabling and disabling their writing into the trace buffer. They can be used to automatically start and stop hist triggers based on a matching filter condition. If there's a paused hist trigger on system:event, the following would start it when the filter condition was hit: # echo enable_hist:system:event [ if filter] > event/trigger And the following would disable a running system:event hist trigger: # echo disable_hist:system:event [ if filter] > event/trigger See Documentation/trace/events.txt for real examples. Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 18:55:57 -04:00
Tom Zanussi	69a0200c2e	tracing: Add hist trigger support for stacktraces as keys It's often useful to be able to use a stacktrace as a hash key, for keeping a count of the number of times a particular call path resulted in a trace event, for instance. Add a special key named 'stacktrace' which can be used as key in a 'keys=' param for this purpose: # echo hist:keys=stacktrace ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87515e90b3785232a874a12156174635a348edb1.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:19:01 -04:00
Tom Zanussi	316961988b	tracing: Add hist trigger 'syscall' modifier Allow users to have syscall id fields displayed as syscall names in the output by appending '.syscall' to field names: # echo hist:keys=aaa.syscall ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2bab1e59933d76a14b545bd2e02f80b8b08ac4d3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:18:04 -04:00
Tom Zanussi	6b4827ad02	tracing: Add hist trigger 'execname' modifier Allow users to have common_pid field values displayed as program names in the output by appending '.execname' to a common_pid field name: # echo hist:keys=common_pid.execname ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/e172e81f10f5b8d1f08450e3763c850f39fbf698.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:17:56 -04:00
Tom Zanussi	c6afad49d1	tracing: Add hist trigger 'sym' and 'sym-offset' modifiers Allow users to have address fields displayed as symbols in the output by appending '.sym' or 'sym-offset' to field names: # echo hist:keys=aaa.sym,bbb.sym-offset ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87d4935821491c0275513f0fbfb9bab8d3d3f079.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:17:51 -04:00
Tom Zanussi	0c4a6b4666	tracing: Add hist trigger 'hex' modifier for displaying numeric fields Allow users to have numeric fields displayed as hex values in the output by appending '.hex' to field names: # echo hist:keys=aaa,bbb.hex:vals=ccc.hex ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/67bd431edda2af5798d7694818f7e8d71b6b3463.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:17:43 -04:00
Tom Zanussi	e86ae9baac	tracing: Add hist trigger support for clearing a trace Allow users to append 'clear' to an existing trigger in order to have the hash table cleared. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont/clear \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/ae15dd0d9b2f7af07a37c1ff682063e2dbcdf160.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:17:35 -04:00
Tom Zanussi	83e99914c9	tracing: Add hist trigger support for pausing and continuing a trace Allow users to append 'pause' or 'continue' to an existing trigger in order to have it paused or to have a paused trace continue. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/b672a92c14702cb924cdf6fc27ea1809bed04907.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:17:29 -04:00
Tom Zanussi	e62347d245	tracing: Add hist trigger support for user-defined sorting ('sort=' param) Allow users to specify keys and/or values to sort on. With this addition, keys and values specified using the 'keys=' and 'vals=' keywords can be used to sort the hist trigger output via a new 'sort=' keyword. If multiple sort keys are specified, the output will be sorted using the second key as a secondary sort key, etc. The default sort order is ascending; if the user wants a different sort order, '.descending' can be appended to the specific sort key. Before this addition, output was always sorted by 'hitcount' in ascending order. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy \ [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/b30a41db66ba486979c4f987aff5fab500ea53b3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:17:19 -04:00
Tom Zanussi	76a3b0c8ac	tracing: Add hist trigger support for compound keys Allow users to specify multiple trace event fields to use in keys by allowing multiple fields in the 'keys=' keyword. With this addition, any unique combination of any of the fields named in the 'keys' keyword will result in a new entry being added to the hash table. Link: http://lkml.kernel.org/r/0cfa24e6ac3b0dcece7737d94aa1f322ae3afc4b.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:16:33 -04:00
Tom Zanussi	f2606835d7	tracing: Add hist trigger support for multiple values ('vals=' param) Allow users to specify trace event fields to use in aggregated sums via a new 'vals=' keyword. Before this addition, the only aggregated sum supported was the implied value 'hitcount'. With this addition, 'hitcount' is also supported as an explicit value field, as is any numeric trace event field. This expands the hist trigger syntax from this: # echo hist:keys=xxx [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2a5d1adb5ba6c65d7bb2148e379f2fed47f29a68.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:16:23 -04:00
Tom Zanussi	7ef224d1d0	tracing: Add 'hist' event trigger command 'hist' triggers allow users to continually aggregate trace events, which can then be viewed afterwards by simply reading a 'hist' file containing the aggregation in a human-readable format. The basic idea is very simple and boils down to a mechanism whereby trace events, rather than being exhaustively dumped in raw form and viewed directly, are automatically 'compressed' into meaningful tables completely defined by the user. This is done strictly via single-line command-line commands and without the aid of any kind of programming language or interpreter. A surprising number of typical use cases can be accomplished by users via this simple mechanism. In fact, a large number of the tasks that users typically do using the more complicated script-based tracing tools, at least during the initial stages of an investigation, can be accomplished by simply specifying a set of keys and values to be used in the creation of a hash table. The Linux kernel trace event subsystem happens to provide an extensive list of keys and values ready-made for such a purpose in the form of the event format files associated with each trace event. By simply consulting the format file for field names of interest and by plugging them into the hist trigger command, users can create an endless number of useful aggregations to help with investigating various properties of the system. See Documentation/trace/events.txt for examples. hist triggers are implemented on top of the existing event trigger infrastructure, and as such are consistent with the existing triggers from a user's perspective as well. The basic syntax follows the existing trigger syntax. Users start an aggregation by writing a 'hist' trigger to the event of interest's trigger file: # echo hist:keys=xxx [ if filter] > event/trigger Once a hist trigger has been set up, by default it continually aggregates every matching event into a hash table using the event key and a value field named 'hitcount'. To view the aggregation at any point in time, simply read the 'hist' file in the same directory as the 'trigger' file: # cat event/hist The detailed syntax provides additional options for user control, and is described exhaustively in Documentation/trace/events.txt and in the virtual tracing/README file in the tracing subsystem. Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 12:16:14 -04:00
Steven Rostedt	c37775d578	tracing: Add infrastructure to allow set_event_pid to follow children Add the infrastructure needed to have the PIDs in set_event_pid to automatically add PIDs of the children of the tasks that have their PIDs in set_event_pid. This will also remove PIDs from set_event_pid when a task exits This is implemented by adding hooks into the fork and exit tracepoints. On fork, the PIDs are added to the list, and on exit, they are removed. Add a new option called event_fork that when set, PIDs in set_event_pid will automatically get their children PIDs added when they fork, as well as any task that exits will have its PID removed from set_event_pid. This works for instances as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-04-19 10:28:28 -04:00
Linus Torvalds	e46b4e2b46	Nothing major this round. Mostly small clean ups and fixes. Some visible changes: A new flag was added to distinguish traces done in NMI context. Preempt tracer now shows functions where preemption is disabled but interrupts are still enabled. Other notes: Updates were done to function tracing to allow better performance with perf. Infrastructure code has been added to allow for a new histogram feature for recording live trace event histograms that can be configured by simple user commands. The feature itself was just finished, but needs a round in linux-next before being pulled. This only includes some infrastructure changes that will be needed. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW8/WPAAoJEKKk/i67LK/8wrAH/j2gU9ZfjVxTu8068TBGWRJP yvvzq0cK5evB3dsVuUmKKRfU52nSv4J1WcFF569X0RulSLylR0dHlcxFJMn4kkgR bm0AHRrqOf87ub3VimcpG146iVQij37l5A0SRoFbvSPLQx1KUW18v99x41Ji8dv6 oWXRc6/YhdzEE7l0nUsVjmScQ4b2emsems3cxZzXOY+nRJsiim6i+VaDeatdyey1 csLVqtRCs+x62TVtxG3+GhcLdRoPRbnHAGzrKDFIn1SrQaRXCc54wN5d2hWxjgNI 1laOwaj070lnJiWfBLIP/K+lx+VKRx5/O0rKZX35foLUTqJJKSyjAbKXuMCcSAM= =2h2K -----END PGP SIGNATURE----- Merge tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Nothing major this round. Mostly small clean ups and fixes. Some visible changes: - A new flag was added to distinguish traces done in NMI context. - Preempt tracer now shows functions where preemption is disabled but interrupts are still enabled. Other notes: - Updates were done to function tracing to allow better performance with perf. - Infrastructure code has been added to allow for a new histogram feature for recording live trace event histograms that can be configured by simple user commands. The feature itself was just finished, but needs a round in linux-next before being pulled. This only includes some infrastructure changes that will be needed" * tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits) tracing: Record and show NMI state tracing: Fix trace_printk() to print when not using bprintk() tracing: Remove redundant reset per-CPU buff in irqsoff tracer x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c tracing: Fix crash from reading trace_pipe with sendfile tracing: Have preempt(irqs)off trace preempt disabled functions tracing: Fix return while holding a lock in register_tracer() ftrace: Use kasprintf() in ftrace_profile_tracefs() ftrace: Update dynamic ftrace calls only if necessary ftrace: Make ftrace_hash_rec_enable return update bool tracing: Fix typoes in code comment and printk in trace_nop.c tracing, writeback: Replace cgroup path to cgroup ino tracing: Use flags instead of bool in trigger structure tracing: Add an unreg_all() callback to trigger commands tracing: Add needs_rec flag to event triggers tracing: Add a per-event-trigger 'paused' field tracing: Add get_syscall_name() tracing: Add event record param to trigger_ops.func() tracing: Make event trigger functions available tracing: Make ftrace_event_field checking functions available ...	2016-03-24 10:52:25 -07:00
Joe Perches	a395d6a7e3	kernel/...: convert pr_warning to pr_warn Use the more common logging method with the eventual goal of removing pr_warning altogether. Miscellanea: - Realign arguments - Coalesce formats - Add missing space between a few coalesced formats Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [kernel/power/suspend.c] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-22 15:36:02 -07:00
Peter Zijlstra	7e6867bf83	tracing: Record and show NMI state The latency tracer format has a nice column to indicate IRQ state, but this is not able to tell us about NMI state. When tracing perf interrupt handlers (which often run in NMI context) it is very useful to see how the events nest. Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-03-22 18:04:10 -04:00
Steven Rostedt (Red Hat)	a29054d947	tracing: Fix crash from reading trace_pipe with sendfile If tracing contains data and the trace_pipe file is read with sendfile(), then it can trigger a NULL pointer dereference and various BUG_ON within the VM code. There's a patch to fix this in the splice_to_pipe() code, but it's also a good idea to not let that happen from trace_pipe either. Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Rabin Vincent <rabin.vincent@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-03-18 15:51:42 -04:00
Chunyu Hu	c8ca003b2f	tracing: Fix return while holding a lock in register_tracer() commit `d39cdd2036` ("tracing: Make tracer_flags use the right set_flag callback") introduces a potential mutex deadlock issue, as it forgets to free the mutex when allocaing the tracer_flags gets fail. The issue was found by Dan Carpenter through Smatch static code check tool. Link: http://lkml.kernel.org/r/1457958941-30265-1-git-send-email-chuhu@redhat.com Fixes: `d39cdd2036` ("tracing: Make tracer_flags use the right set_flag callback") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-03-18 10:36:21 -04:00
Chunyu Hu	d39cdd2036	tracing: Make tracer_flags use the right set_flag callback When I was updating the ftrace_stress test of ltp. I encountered a strange phenomemon, excute following steps: echo nop > /sys/kernel/debug/tracing/current_tracer echo 0 > /sys/kernel/debug/tracing/options/funcgraph-cpu bash: echo: write error: Invalid argument check dmesg: [ 1024.903855] nop_test_refuse flag set to 0: we refuse.Now cat trace_options to see the result The reason is that the trace option test will randomly setup trace option under tracing/options no matter what the current_tracer is. but the set_tracer_option is always using the set_flag callback from the current_tracer. This patch adds a pointer to tracer_flags and make it point to the tracer it belongs to. When the option is setup, the set_flag of the right tracer will be used no matter what the the current_tracer is. And the old dummy_tracer_flags is used for all the tracers which doesn't have a tracer_flags, having issue to use it to save the pointer of a tracer. So remove it and use dynamic dummy tracer_flags for tracers needing a dummy tracer_flags, as a result, there are no tracers sharing tracer_flags, so remove the check code. And save the current tracer to trace_option_dentry seems not good as it may waste mem space when mount the debug/trace fs more than one time. Link: http://lkml.kernel.org/r/1457444222-8654-1-git-send-email-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> [ Fixed up function tracer options to work with the change ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-03-08 11:19:08 -05:00
Steven Rostedt (Red Hat)	7717c6be69	tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs() While cleaning the stacktrace code I unintentially changed the skip depth of trace_buffer_unlock_commit_regs() from 0 to 6. kprobes uses this function, and with skipping 6 call backs, it can easily produce no stack. Here's how I tested it: # echo 'p:ext4_sync_fs ext4_sync_fs ' > /sys/kernel/debug/tracing/kprobe_events # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable # cat /sys/kernel/debug/trace sync-2394 [005] 502.457060: ext4_sync_fs: (ffffffff81317650) sync-2394 [005] 502.457063: kernel_stack: <stack trace> sync-2394 [005] 502.457086: ext4_sync_fs: (ffffffff81317650) sync-2394 [005] 502.457087: kernel_stack: <stack trace> sync-2394 [005] 502.457091: ext4_sync_fs: (ffffffff81317650) After putting back the skip stack to zero, we have: sync-2270 [000] 748.052693: ext4_sync_fs: (ffffffff81317650) sync-2270 [000] 748.052695: kernel_stack: <stack trace> => iterate_supers (ffffffff8126412e) => sys_sync (ffffffff8129c4b6) => entry_SYSCALL_64_fastpath (ffffffff8181f0b2) sync-2270 [000] 748.053017: ext4_sync_fs: (ffffffff81317650) sync-2270 [000] 748.053019: kernel_stack: <stack trace> => iterate_supers (ffffffff8126412e) => sys_sync (ffffffff8129c4b6) => entry_SYSCALL_64_fastpath (ffffffff8181f0b2) sync-2270 [000] 748.053381: ext4_sync_fs: (ffffffff81317650) sync-2270 [000] 748.053383: kernel_stack: <stack trace> => iterate_supers (ffffffff8126412e) => sys_sync (ffffffff8129c4b6) => entry_SYSCALL_64_fastpath (ffffffff8181f0b2) Cc: stable@vger.kernel.org # v4.4+ Fixes: `73dddbb57b` "tracing: Only create stacktrace option when STACKTRACE is configured" Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com> Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-01-14 09:28:19 -05:00
Chen Gang	e428abbbf6	tracing: #ifdef out uses of max trace when CONFIG_TRACER_MAX_TRACE is not set tracing_max_lat_fops is used only when TRACER_MAX_TRACE enabled, so also swith the related code. The related warning with defconfig under x86_64: CC kernel/trace/trace.o kernel/trace/trace.c:5466:37: warning: ‘tracing_max_lat_fops’ defined but not used [-Wunused-const-variable] static const struct file_operations tracing_max_lat_fops = { Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-11-10 10:16:05 -05:00
Dmitry Safonov	03e88ae6b3	tracing: Remove unused ftrace_cpu_disabled per cpu variable Since the ring buffer is lockless, there is no need to disable ftrace on CPU. And no one doing so: after commit `68179686ac` ("tracing: Remove ftrace_disable/enable_cpu()") ftrace_cpu_disabled stays the same after initialization, nothing changes it. ftrace_cpu_disabled shouldn't be used by any external module since it disables only function and graph_function tracers but not any other tracer. Link: http://lkml.kernel.org/r/1446836846-22239-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-11-07 13:25:14 -05:00
Jiaxing Wang	8b1291994d	tracing: Make tracing work when debugfs is not configured in Currently tracing_init_dentry() returns -ENODEV when debugfs is not configured in, which causes tracefs not populated with tracing files and directories, so we will get an empty directory even after we manually mount tracefs. We can make tracing_init_dentry() return NULL if debugfs is not configured in and can manually mount tracefs. But return -ENODEV if debugfs is configured in but not initialized or failed to create automount point as that would break backward compatibility with older tools. Link: http://lkml.kernel.org/r/1446797056-11683-1-git-send-email-hello.wjx@gmail.com Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-11-06 10:02:33 -05:00
Steven Rostedt (Red Hat)	43ed384339	tracing: Put back comma for empty fields in boot string parsing Both early_enable_events() and apply_trace_boot_options() parse a boot string that may get parsed later on. They both use strsep() which converts a comma into a nul character. To still allow the boot string to be parsed again the same way, the nul character gets converted back to a comma after the token is processed. The problem is that these two functions check for an empty parameter (two commas in a row ",,"), and continue the loop if the parameter is empty, but fails to place the comma back. In this case, the second parsing will end at this blank field, and not process fields afterward. In most cases, users should not have an empty field, but if its going to be checked, the code might as well be correct. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-11-03 22:15:14 -05:00
Jiaxing Wang	a4d1e68823	tracing: Apply tracer specific options from kernel command line. Currently, the trace_options parameter is only applied in tracer_alloc_buffers() when global_trace.current_trace is nop_trace, so a tracer specific option will not be applied even when the specific tracer is also enabled from kernel command line. For example, the 'func_stack_trace' option can't be enabled with the following kernel parameter: ftrace=function ftrace_filter=kfree trace_options=func_stack_trace We can enable tracer specific options by simply apply the options again if the specific tracer is also supplied from command line and started in register_tracer(). To make trace_boot_options_buf can be parsed again, a comma and a space is put back if they were replaced by strsep and strstrip respectively. Also make register_tracer() be __init to access the __init data, and in fact register_tracer is only called from __init code. Link: http://lkml.kernel.org/r/1446599669-9294-1-git-send-email-hello.wjx@gmail.com Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-11-03 21:51:43 -05:00
Sasha Levin	919cd97999	tracing: Allow dumping traces without tracking trace started cpus We don't init iter->started when dumping the ftrace buffer, and there's no real need to do so - so allow skipping that check if the iter doesn't have an initialized ->started cpumask. Link: http://lkml.kernel.org/r/1441385156-27279-1-git-send-email-sasha.levin@oracle.com Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-11-03 16:10:08 -05:00
Jiaxing Wang	681a4a2f45	tracing: Update instance_rmdir() to use tracefs_remove_recursive Update instancd_rmdir to use tracefs_remove_recursive instead of debugfs_remove_recursive.This was left in the transition from debugfs to tracefs. Link: http://lkml.kernel.org/r/1445169490-18315-2-git-send-email-hello.wjx@gmail.com Cc: stable@vger.kernel.org # 4.1+ Fixes: `8434dc9340` ("tracing: Convert the tracing facility over to use tracefs") Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-11-02 13:59:06 -05:00
Steven Rostedt (Red Hat)	37aea98b84	tracing: Add trace options for tracer options to instances Add the tracer options to instances options directory as well. Only add the options for tracers that are allowed to be enabled by an instance. But note, that tracer options are global. That is, tracer options enabled in an instance, also take affect at the top level and in other instances. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 15:22:58 -04:00
Steven Rostedt (Red Hat)	16270145ce	tracing: Add trace options for core options to instances Allow instances to have their own options, at least for the core options (non tracer specific ones). There are a few global options that should not be added to instances, like enabling of trace_printk, and the sched comm recording, which do not have a specific trace instance associated to them. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 15:22:57 -04:00
Steven Rostedt (Red Hat)	2d34f48955	tracing: Make ftrace_trace_stack() depend on general trace_array flag In preparation for the multi buffer instances to have their own trace_flags, the check in ftrace_trace_stack() needs to test the trace_array descriptor flag that is for the current event, not the global_trace descriptor. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 15:22:57 -04:00
Steven Rostedt (Red Hat)	9a38a8856f	tracing: Add a method to pass in trace_array descriptor to option files In preparation of having the multi buffer instances having their own trace option flags, the trace option files needs a way to not only pass in the flag they represent, but also the trace_array descriptor. A new field is added to the trace_array descriptor called trace_flags_index, which is a 32 byte character array representing a bit. This array is simply filled with the index of the array, where index_array[n] = n; Then the address of this array is passed to the file callbacks instead of the index of the flag index. Then to retrieve both the flag index and the trace_array descriptor: data is the passed in argument. index = (unsigned char )data; data -= index; /* Now data points to the address of the array in the trace_array */ tr = container_of(data, struct trace_array, trace_flags_index); Suggested-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 15:22:56 -04:00
Steven Rostedt (Red Hat)	983f938ae6	tracing: Move trace_flags from global to a trace_array field In preparation to make trace options per instance, the global trace_flags needs to be moved from being a global variable to a field within the trace instance trace_array structure. There's still more work to do, as there's some functions that use trace_flags without passing in a way to get to the current_trace array. For those, the global_trace is used directly (from trace.c). This includes setting and clearing the trace_flags. This means that when a new instance is created, it just gets the trace_flags of the global_trace and will not be able to modify them. Depending on the functions that have access to the trace_array, the flags of an instance may not affect parts of its trace, where the global_trace is used. These will be fixed in future changes. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 15:22:55 -04:00
Steven Rostedt (Red Hat)	5557720415	tracing: Move sleep-time and graph-time options out of the core trace_flags The sleep-time and graph-time options are only for the function graph tracer and are not used by anything else. As tracer options are now visible when the tracer is not activated, its better to move the function graph specific tracer options into the function graph tracer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 15:22:42 -04:00
Steven Rostedt (Red Hat)	b9f9108cad	tracing: Remove access to trace_flags in trace_printk.c In the effort to move the global trace_flags to the tracing instances, the direct access to trace_flags must be removed from trace_printk.c Instead, add a new trace_printk_enabled boolean that is set by a new access function trace_printk_control(), that will enable or disable trace_printk. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 04:35:18 -04:00
Steven Rostedt (Red Hat)	b5e87c0581	tracing: Add build bug if we have more trace_flags than bits Add a enum that denotes the last bit of the trace_flags and have a BUILD_BUG_ON(last_bit > 32). If we add more bits than we have in trace_flags, the kernel wont build. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 04:35:18 -04:00
Steven Rostedt (Red Hat)	41d9c0becc	tracing: Always show all tracer options in the options directory There are options that are unique to a specific tracer (like function and function graph). Currently, these options are only visible in the options directory when the tracer is enabled. This has been a pain, especially for something like the func_stack_trace option that if used inappropriately, could bring the system to a crawl. But the only way to see it, is to enable the function tracer. For example, if one had done: # cd /sys/kernel/tracing # echo __schedule > set_ftrace_filter # echo 1 > options/func_stack_trace # echo function > current_tracer The __schedule call will be traced and a stack trace will also be recorded there. Now when you were done, you may do... # echo nop > current_tracer # echo > set_ftrace_filter But you forgot to disable the func_stack_trace. The only way to disable it is to re-enable function tracing first. If you do not add a filter to set_ftrace_filter and just do: # echo function > current_tracer Now you would be performing a stack trace on every function! On some systems, that causes a live lock. Others may take a few minutes to fix your mistake. Having the func_stack_trace option visible allows you to check it and disable it before enabling the funtion tracer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-30 04:34:54 -04:00
Steven Rostedt (Red Hat)	73dddbb57b	tracing: Only create stacktrace option when STACKTRACE is configured Only create the stacktrace trace option when CONFIG_STACKTRACE is configured. Cleaned up the ftrace_trace_stack() function call a little to allow better encapsulation of the stacktrace trace flag. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-29 15:38:55 -04:00
Steven Rostedt (Red Hat)	8179e8a15b	tracing: Do not create function tracer options when not compiled in When the function tracer is not compiled in, do not create the option files for it. Fix up both the sched_wakeup and irqsoff tracers to handle the change. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-29 15:01:34 -04:00
Steven Rostedt (Red Hat)	729358da95	tracing: Only create function graph options when it is compiled in Do not create fuction graph tracer options when function graph tracer is not even compiled in. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-29 13:23:58 -04:00
Steven Rostedt (Red Hat)	a3418a364e	tracing: Use TRACE_FLAGS macro to keep enums and strings matched Use a cute little macro trick to keep the names of the trace flags file guaranteed to match the corresponding masks. The macro TRACE_FLAGS is defined as a serious of enum names followed by the string name of the file that matches it. For example: #define TRACE_FLAGS \ C(PRINT_PARENT, "print-parent"), \ C(SYM_OFFSET, "sym-offset"), \ C(SYM_ADDR, "sym-addr"), \ C(VERBOSE, "verbose"), Now we can define the following: #undef C #define C(a, b) TRACE_ITER_##a##_BIT enum trace_iterator_bits { TRACE_FLAGS }; The above creates: enum trace_iterator_bits { TRACE_ITER_PRINT_PARENT_BIT, TRACE_ITER_SYM_OFFSET_BIT, TRACE_ITER_SYM_ADDR_BIT, TRACE_ITER_VERBOSE_BIT, }; Then we can redefine C as: #undef C #define C(a, b) TRACE_ITER_##a = (1 << TRACE_ITER_##a##_BIT) enum trace_iterator_flags { TRACE_FLAGS }; Which creates: enum trace_iterator_flags { TRACE_ITER_PRINT_PARENT = (1 << TRACE_ITER_PRINT_PARENT_BIT), TRACE_ITER_SYM_OFFSET = (1 << TRACE_ITER_SYM_OFFSET_BIT), TRACE_ITER_SYM_ADDR = (1 << TRACE_ITER_SYM_ADDR_BIT), TRACE_ITER_VERBOSE = (1 << TRACE_ITER_VERBOSE_BIT), }; Then finally we can create the list of file names: #undef C #define C(a, b) b static const char trace_options[] = { TRACE_FLAGS NULL }; Which creates: static const char trace_options[] = { "print-parent", "sym-offset", "sym-addr", "verbose", NULL }; The importance of this is that the strings match the bit index. trace_options[TRACE_ITER_SYM_ADDR_BIT] == "sym-addr" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-29 13:23:57 -04:00
Steven Rostedt (Red Hat)	938db5f569	tracing: Remove unused tracing option "ftrace_preempt" There was a time where the function tracing would disable interrupts unless specifically told not to, where it would only disable preemption. With the new lockless code, the function tracing never disalbes interrupts and just uses disabling of preemption. Remove the option "ftrace_preempt" as it does nothing anyway. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-29 13:23:54 -04:00
Steven Rostedt (Red Hat)	03905582fd	tracing: Move "display-graph" option to main options In order to facilitate making all tracer options visible even when the tracer is not active, we need to get rid of duplicate options. Any option that is shared between multiple tracers really should be a main option. As the wakeup and irqsoff tracers both use the "display-graph" option, and use it exactly the same way, move that option from the tracer options to the main options and consolidate them. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-29 12:56:40 -04:00
Steven Rostedt (Red Hat)	ca475e831f	tracing: Make ftrace_trace_stack() static ftrace_trace_stack() is not called outside of trace.c. Make it a static function. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-28 09:41:11 -04:00
Steven Rostedt (Red Hat)	b7f0c959ed	tracing: Pass trace_array into trace_buffer_unlock_commit() In preparation for having trace options be per instance, the trace_array needs to be passed to the trace_buffer_unlock_commit(). The trace_event_buffer_lock_reserve() already passes in the trace_event_file where the trace_array can be derived from. Also added a "__init" to the boot up test event plus function tracing function function_test_events_call(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-25 17:38:44 -04:00
Steven Rostedt (Red Hat)	41907416bc	tracing: Remove unused function trace_current_buffer_lock_reserve() trace_current_buffer_lock_reserve() is not used by anything. Might as well get rid of it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-25 15:37:31 -04:00
Steven Rostedt (Red Hat)	d78a461427	tracing: Remove ftrace_trace_stack_regs() ftrace_trace_stack_regs() is used in only one place, and because that is such a simple function, just move its code into the location that it was used in (trace_buffer_unlock_commit_regs()). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-09-25 15:37:23 -04:00
Umesh Tiwari	5e2d5ef8ec	ftrace: correct the counter increment for trace_buffer data In ftrace_dump, for disabling buffer, iter.tr->trace_buffer.data is used. But for enabling, iter.trace_buffer->data is used. Even though, both point to same buffer, for readability, same convention should be used. Link: http://lkml.kernel.org/r/1434972306-20043-1-git-send-email-umesh.t@samsung.com Signed-off-by: Umesh Tiwari <umesh.t@samsung.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-07-20 22:30:45 -04:00
Gil Fruchter	72917235fd	tracing: Fix for non-continuous cpu ids Currently exception occures due to access beyond buffer_iter range while using index of cpu bigger than num_possible_cpus(). Below there is an example for such exception when we use cpus 0,1,16,17. In order to fix buffer allocation size for non-continuous cpu ids we allocate according to the max cpu id and not according to the amount of possible cpus. Example: $ cat /sys/kernel/debug/tracing/per_cpu/cpu1/trace Path: /bin/busybox CPU: 0 PID: 82 Comm: cat Not tainted 4.0.0 #29 task: 80734c80 ti: 80012000 task.ti: 80012000 [ECR ]: 0x00220100 => Invalid Read @ 0x00000000 by insn @ 0x800abafc [EFA ]: 0x00000000 [BLINK ]: ring_buffer_read_finish+0x24/0x64 [ERET ]: rb_check_pages+0x20/0x188 [STAT32]: 0x00001a00 : BTA: 0x800abafc SP: 0x80013f0c FP: 0x57719cf8 LPS: 0x200036b4 LPE: 0x200036b8 LPC: 0x00000000 r00: 0x8002aca0 r01: 0x00001606 r02: 0x00000000 r03: 0x00000001 r04: 0x00000000 r05: 0x804b4954 r06: 0x00030003 r07: 0x8002a260 r08: 0x00000286 r09: 0x00080002 r10: 0x00001006 r11: 0x807351a4 r12: 0x00000001 Stack Trace: rb_check_pages+0x20/0x188 ring_buffer_read_finish+0x24/0x64 tracing_release+0x4e/0x170 __fput+0x62/0x158 task_work_run+0xa2/0xd4 do_notify_resume+0x52/0x7c resume_user_mode_begin+0xdc/0xe0 Link: http://lkml.kernel.org/r/1433835155-6894-3-git-send-email-gilf@ezchip.com Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Gil Fruchter <gilf@ezchip.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-07-20 22:30:45 -04:00
Gil Fruchter	9fe6b778ca	tracing: Prefer kcalloc over kzalloc with multiply Use kcalloc for allocating an array instead of kzalloc with multiply, as that is what kcalloc is used for. Found with checkpatch. Link: http://lkml.kernel.org/r/1433835155-6894-2-git-send-email-gilf@ezchip.com Signed-off-by: Gil Fruchter <gilf@ezchip.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-07-20 22:30:42 -04:00
Steven Rostedt (Red Hat)	5d6ad960a7	tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_* The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. The FTRACE_EVENT_FL_* flags are flags to do with the trace_event files in the tracefs directory. They are not related to function tracing. Rename them to a more descriptive name. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-05-13 15:24:57 -04:00
Steven Rostedt (Red Hat)	2425bcb924	tracing: Rename ftrace_event_{call,class} to trace_event_{call,class} The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. The structures ftrace_event_call and ftrace_event_class have nothing to do with the function hooks, and are really trace_event structures. Rename ftrace_event_* to trace_event_*. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-05-13 14:06:10 -04:00
Steven Rostedt (Red Hat)	7f1d2f8210	tracing: Rename ftrace_event_file to trace_event_file The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. The structure ftrace_event_file is really about trace events and not "ftrace". Rename it to trace_event_file. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-05-13 14:05:16 -04:00
Drew Richardson	aabfa5f28f	ftrace: Provide trace clock monotonic raw Expose the NMI safe accessor to the monotonic raw clock to the tracer. The mono clock was added with commit `1b3e5c0936`. The advantage of the monotonic raw clock is that it will advance more constantly than the monotonic clock. Imagine someone is trying to optimize a particular program to reduce instructions executed for a given workload while minimizing the effect on runtime. Also suppose that NTP is running and potentially making larger adjustments to the monotonic clock. If NTP is adjusting the monotonic clock to advance more rapidly, the program will appear to use fewer instructions per second but run longer than if the monotonic raw clock had been used. The total number of instructions observed would be the same regardless of the clock source used, but how it's attributed to time would be affected. Conversely if NTP is adjusting the monotonic clock to advance more slowly, the program will appear to use more instructions per second but run more quickly. Of course there are many sources that can cause jitter in performance measurements on modern processors, but let's remove NTP from the list. The monotonic raw clock can also be useful for tracing early boot, e.g. when debugging issues with NTP. Link: http://lkml.kernel.org/r/20150508143037.GB1276@dreric01-Precision-T1650 Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Drew Richardson <drew.richardson@arm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-05-12 15:58:58 -04:00
Linus Torvalds	9ec3a646fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only	2015-04-26 17:22:07 -07:00
David Howells	7682c91843	VFS: kernel/: d_inode() annotations relayfs and tracefs are dealing with inodes of their own; those two act as filesystem drivers Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-15 15:06:55 -04:00
Linus Torvalds	eeee78cf77	Some clean ups and small fixes, but the biggest change is the addition of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints. Tracepoints have helper functions for the TP_printk() called __print_symbolic() and __print_flags() that lets a numeric number be displayed as a a human comprehensible text. What is placed in the TP_printk() is also shown in the tracepoint format file such that user space tools like perf and trace-cmd can parse the binary data and express the values too. Unfortunately, the way the TRACE_EVENT() macro works, anything placed in the TP_printk() will be shown pretty much exactly as is. The problem arises when enums are used. That's because unlike macros, enums will not be changed into their values by the C pre-processor. Thus, the enum string is exported to the format file, and this makes it useless for user space tools. The TRACE_DEFINE_ENUM() solves this by converting the enum strings in the TP_printk() format into their number, and that is what is shown to user space. For example, the tracepoint tlb_flush currently has this in its format file: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) After adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); Its format file will contain this: __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" }) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVLBTuAAoJEEjnJuOKh9ldjHMIALdRS755TXCZGOf0r7O2akOR wMPeum7C+ae1mH+jCsJKUC0/jUfQKaMt/UxoHlipDgcGg8kD2jtGnGCw4Xlwvdsr y4rFmcTRSl1mo0zDSsg6ujoupHlVYN0+JPjrd7S3cv/llJoY49zcanNLF7S2XLeM dZCtWRLWYpBiWO68ai6AqJTnE/eGFIqBI048qb5Eg8dbK243SSeSIf9Ywhb+VsA+ aq6F7cWI/H6j4tbeza8tAN19dcwenDro5EfCDY8ARQHJu1f6Y3+DLf2imjkd6Aiu JVAoGIjHIpI+djwCZC1u4gi4urjfOqYartrM3Q54tb3YWYqHeNqP2ASI2a4EpYk= =Ixwt -----END PGP SIGNATURE----- Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Some clean ups and small fixes, but the biggest change is the addition of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints. Tracepoints have helper functions for the TP_printk() called __print_symbolic() and __print_flags() that lets a numeric number be displayed as a a human comprehensible text. What is placed in the TP_printk() is also shown in the tracepoint format file such that user space tools like perf and trace-cmd can parse the binary data and express the values too. Unfortunately, the way the TRACE_EVENT() macro works, anything placed in the TP_printk() will be shown pretty much exactly as is. The problem arises when enums are used. That's because unlike macros, enums will not be changed into their values by the C pre-processor. Thus, the enum string is exported to the format file, and this makes it useless for user space tools. The TRACE_DEFINE_ENUM() solves this by converting the enum strings in the TP_printk() format into their number, and that is what is shown to user space. For example, the tracepoint tlb_flush currently has this in its format file: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) After adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); Its format file will contain this: __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" })" * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits) tracing: Add enum_map file to show enums that have been mapped writeback: Export enums used by tracepoint to user space v4l: Export enums used by tracepoints to user space SUNRPC: Export enums in tracepoints to user space mm: tracing: Export enums in tracepoints to user space irq/tracing: Export enums in tracepoints to user space f2fs: Export the enums in the tracepoints to userspace net/9p/tracing: Export enums in tracepoints to userspace x86/tlb/trace: Export enums in used by tlb_flush tracepoint tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM() tracing: Allow for modules to convert their enums to values tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation tracing: Give system name a pointer brcmsmac: Move each system tracepoints to their own header iwlwifi: Move each system tracepoints to their own header mac80211: Move message tracepoints to their own header tracing: Add TRACE_SYSTEM_VAR to xhci-hcd tracing: Add TRACE_SYSTEM_VAR to kvm-s390 tracing: Add TRACE_SYSTEM_VAR to intel-sst ...	2015-04-14 10:49:03 -07:00
Linus Torvalds	3f3c73de77	This adds the new tracefs file system. This has been in linux-next for more than one release, as I had it ready for the 4.0 merge window, but a last minute thing that needed to go into Linux first had to be done. That was that perf hard coded the file system number when reading /sys/kernel/debugfs/tracing directory making sure that the path had the debugfs mount # before it would parse the tracing file. This broke other use cases of perf, and the check is removed. Now when mounting /sys/kernel/debug, tracefs is automatically mounted in /sys/kernel/debug/tracing such that old tools will still see that path as expected. But now system admins can mount tracefs directly and not need to mount debugfs, which can expose security issues. A new directory is created when tracefs is configured such that system admins can now mount it separately (/sys/kernel/tracing). This branch is based off of Al Viro's vfs debugfs_automount branch at commit `163f9eb95a` debugfs: Provide a file creation function that also takes an initial size to get the debugfs_create_automount() operation. I just noticed that Al rebased the pull to add his Signed-off-by to that commit, and the commit is now `e59b4e9187`. I did a git diff of those two and see they are the same. Only the latter has Al's SOB. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVLA6YAAoJEEjnJuOKh9ldv6AH/1JUINDQwV+M0VTwzbLogloo Sco0byLhskmx5KLVD7Vs8BJAGrgHTdit32kzBGmLGJvVCKBa+c8lwmRw6rnXg3uX K4kGp7BIyn1/geXoGpCmDKaLGXhDcw49hRzejKDg/OqFtxKTsSeQtG8fo29ps9Do 0VaF6UDp8gYplC2N2BfpB59LVndrITQ3mSsBBeFPvS7IxFJXAhDBOq2yi0aI6HyJ ICo2L/bA9HLxMuceWrXbsun+RP68+AQlnFfAtok7AcuBzUYPCKY0shT2VMOUtpTt 1dGxMxq6q1ACfmY7gbp47WMX9aKjWcSEr0V+IYx/xex6Maf0Xsujsy99bKYUWvs= =OcgU -----END PGP SIGNATURE----- Merge tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs from Steven Rostedt: "This adds the new tracefs file system. This has been in linux-next for more than one release, as I had it ready for the 4.0 merge window, but a last minute thing that needed to go into Linux first had to be done. That was that perf hard coded the file system number when reading /sys/kernel/debugfs/tracing directory making sure that the path had the debugfs mount # before it would parse the tracing file. This broke other use cases of perf, and the check is removed. Now when mounting /sys/kernel/debug, tracefs is automatically mounted in /sys/kernel/debug/tracing such that old tools will still see that path as expected. But now system admins can mount tracefs directly and not need to mount debugfs, which can expose security issues. A new directory is created when tracefs is configured such that system admins can now mount it separately (/sys/kernel/tracing)" * tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have mkdir and rmdir be part of tracefs tracefs: Add directory /sys/kernel/tracing tracing: Automatically mount tracefs on debugfs/tracing tracing: Convert the tracing facility over to use tracefs tracefs: Add new tracefs file system tracing: Create cmdline tracer options on tracing fs init tracing: Only create tracer options files if directory exists debugfs: Provide a file creation function that also takes an initial size	2015-04-14 10:22:29 -07:00
Steven Rostedt (Red Hat)	9828413d47	tracing: Add enum_map file to show enums that have been mapped Add a enum_map file in the tracing directory to see what enums have been saved to convert in the print fmt files. As this requires the enum mapping to be persistent in memory, it is only created if the new config option CONFIG_TRACE_ENUM_MAP_FILE is enabled. This is for debugging and will increase the persistent memory footprint of the kernel. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-04-08 10:58:35 -04:00
Steven Rostedt (Red Hat)	3673b8e4ce	tracing: Allow for modules to convert their enums to values Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM() will have those enums converted into their values in the tracepoint print fmt strings. Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.au Acked-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-04-08 09:39:57 -04:00
Steven Rostedt (Red Hat)	0c564a538a	tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values Several tracepoints use the helper functions __print_symbolic() or __print_flags() and pass in enums that do the mapping between the binary data stored and the value to print. This works well for reading the ASCII trace files, but when the data is read via userspace tools such as perf and trace-cmd, the conversion of the binary value to a human string format is lost if an enum is used, as userspace does not have access to what the ENUM is. For example, the tracepoint trace_tlb_flush() has: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) Which maps the enum values to the strings they represent. But perf and trace-cmd do no know what value TLB_LOCAL_MM_SHOOTDOWN is, and would not be able to map it. With TRACE_DEFINE_ENUM(), developers can place these in the event header files and ftrace will convert the enums to their values: By adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format [...] __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" }) The above is what userspace expects to see, and tools do not need to be modified to parse them. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Guilherme Cox <cox@computer.org> Cc: Tony Luck <tony.luck@gmail.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-04-08 09:39:56 -04:00
Tejun Heo	1a40243bae	tracing: use %pb[l] to print bitmaps including cpumasks and nodemasks printk and friends can now format bitmaps using '%pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:37 -08:00
Linus Torvalds	41cbc01f6e	The updates included in this pull request for ftrace are: o Several clean ups to the code One such clean up was to convert to 64 bit time keeping, in the ring buffer benchmark code. o Adding of __print_array() helper macro for TRACE_EVENT() o Updating the sample/trace_events/ to add samples of different ways to make trace events. Lots of features have been added since the sample code was made, and these features are mostly unknown. Developers have been making their own hacks to do things that are already available. o Performance improvements. Most notably, I found a performance bug where a waiter that is waiting for a full page from the ring buffer will see that a full page is not available, and go to sleep. The sched event caused by it going to sleep would cause it to wake up again. It would see that there was still not a full page, and go back to sleep again, and that would wake it up again, until finally it would see a full page. This change has been marked for stable. Other improvements include removing global locks from fast paths. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU3M+GAAoJEEjnJuOKh9ldpWQIAJTUzeVXlU0cf3bVn768VW7e XS41WHF34l1tNevmKTh6fCPiw8+U0UMGLQt5WKtyaaARsZn2MlefLVuvHPKFlK2w +qcI4OEVHH97Qgf9HWJSsYgnZaOnOE+TENqnokEgXMimRMuVcd/S4QaGxwJVDcjm iBF5j2TaG4aGbx4a3J7KueoZ3K+39r3ut15hIGi/IZBZldQ1pt26ytafD/KA3CU3 BLRM2HLttAMsV1ds0EDLgZjSGICVetFcdOmI5Gwj7Qr3KrOTRPYJMNc8NdDL7Js9 v8VhujhFGvcCrhO/IKpVvd9yluz3RCF+Z7ihc+D/+1B3Nsm0PTwN3Fl5J+f89AA= =u2Mm -----END PGP SIGNATURE----- Merge tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The updates included in this pull request for ftrace are: o Several clean ups to the code One such clean up was to convert to 64 bit time keeping, in the ring buffer benchmark code. o Adding of __print_array() helper macro for TRACE_EVENT() o Updating the sample/trace_events/ to add samples of different ways to make trace events. Lots of features have been added since the sample code was made, and these features are mostly unknown. Developers have been making their own hacks to do things that are already available. o Performance improvements. Most notably, I found a performance bug where a waiter that is waiting for a full page from the ring buffer will see that a full page is not available, and go to sleep. The sched event caused by it going to sleep would cause it to wake up again. It would see that there was still not a full page, and go back to sleep again, and that would wake it up again, until finally it would see a full page. This change has been marked for stable. Other improvements include removing global locks from fast paths" * tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Do not wake up a splice waiter when page is not full tracing: Fix unmapping loop in tracing_mark_write tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT() tracing: Add TRACE_EVENT_FN example tracing: Add TRACE_EVENT_CONDITION sample tracing: Update the TRACE_EVENT fields available in the sample code tracing: Separate out initializing top level dir from instances tracing: Make tracing_init_dentry_tr() static trace: Use 64-bit timekeeping tracing: Add array printing helper tracing: Remove newline from trace_printk warning banner tracing: Use IS_ERR() check for return value of tracing_init_dentry() tracing: Remove unneeded includes of debugfs.h and fs.h tracing: Remove taking of trace_types_lock in pipe files tracing: Add ref count to tracer for when they are being read by pipe	2015-02-12 08:37:41 -08:00
Vikram Mulukutla	7215853e98	tracing: Fix unmapping loop in tracing_mark_write Commit `6edb2a8a38` introduced an array map_pages that contains the addresses returned by kmap_atomic. However, when unmapping those pages, map_pages[0] is unmapped before map_pages[1], breaking the nesting requirement as specified in the documentation for kmap_atomic/kunmap_atomic. This was caught by the highmem debug code present in kunmap_atomic. Fix the loop to do the unmapping properly. Link: http://lkml.kernel.org/r/1418871056-6614-1-git-send-email-markivx@codeaurora.org Cc: stable@vger.kernel.org # 3.5+ Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Lime Yang <limey@codeaurora.org> Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-09 18:47:09 -05:00
Steven Rostedt (Red Hat)	eae473581c	tracing: Have mkdir and rmdir be part of tracefs The tracing "instances" directory can create sub tracing buffers with mkdir, and remove them with rmdir. As a mkdir will also create all the files and directories that control the sub buffer the inode mutexes need to be released before this is done, to avoid deadlocks. It is better to let the tracing system unlock the inode mutexes before calling the functions that create the files within the new directory (or deletes the files from the one being destroyed). Now that tracing has been converted over to tracefs, the tracefs file system can be modified to accommodate this feature. It still releases the locks, but the filesystem itself can take care of the ugly business and let the user just do what it needs. The tracing system now attaches a descriptor to the directory dentry that can have userspace create or remove sub directories. If this descriptor does not exist for a dentry, then that dentry can not be used to create other directories. This descriptor holds a mkdir and rmdir method that only takes a character string as an argument. The tracefs file system will first make a copy of the dentry name before releasing the locks. Then it will pass the copied name to the methods. It is up to the tracing system that supplied the methods to handle races with duplicate names and such as all the inode mutexes would be released when the functions are called. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-03 12:48:43 -05:00
Steven Rostedt (Red Hat)	f76180bc07	tracing: Automatically mount tracefs on debugfs/tracing As tools currently rely on the tracing directory in debugfs, we can not just created a tracefs infrastructure and expect sysadmins to mount the new tracefs to have their old tools work. Instead, the debugfs tracing directory is still created and the tracefs file system is mounted there when the debugfs filesystem is mounted. No longer does the tracing infrastructure update the debugfs file system, but instead interacts with the tracefs file system. But now, it still appears to the user like nothing changed, except you also have the feature of mounting just the tracing system without needing all of debugfs! Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-03 12:48:42 -05:00
Steven Rostedt (Red Hat)	8434dc9340	tracing: Convert the tracing facility over to use tracefs debugfs was fine for the tracing facility as a quick way to get an interface. Now that tracing has matured, it should separate itself from debugfs such that it can be mounted separately without needing to mount all of debugfs with it. That is, users resist using tracing because it requires mounting debugfs. Having tracing have its own file system lets users get the features of tracing without needing to bring in the rest of the kernel's debug infrastructure. Another reason for tracefs is that debubfs does not support mkdir. Currently, to create instances, one does a mkdir in the tracing/instance directory. This is implemented via a hack that forces debugfs to do something it is not intended on doing. By converting over to tracefs, this hack can be removed and mkdir can be properly implemented. This patch does not address this yet, but it lays the ground work for that to be done. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-03 12:48:41 -05:00
Steven Rostedt (Red Hat)	09d23a1d8a	tracing: Create cmdline tracer options on tracing fs init The options for cmdline tracers are not created if the debugfs system is not ready yet. If tracing has started before debugfs is up, then the option files for the tracer are not created. Create them when creating the tracing directory if the current tracer requires option files. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-03 12:48:39 -05:00
Steven Rostedt (Red Hat)	0f67f04ffc	tracing: Only create tracer options files if directory exists Do not bother creating tracer options if no tracing directory exists. If a tracer is enabled via the command line, and is started before the tracing directory is created, then it wont have its tracer specific options created. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-03 12:48:38 -05:00
Steven Rostedt (Red Hat)	dfbc1534ea	Merge branch 'debugfs_automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into trace/ftrace/tracefs Pull in Al Viro's changes to debugfs that implement the new primitive: debugfs_create_automount(), that creates a directory in debugfs that will safely mount another file system automatically when debugfs is mounted. This will let tracefs automount itself on top of debugfs/tracing directory.	2015-02-02 11:47:31 -05:00
Steven Rostedt (Red Hat)	7eeafbcab4	tracing: Separate out initializing top level dir from instances The top level trace array is treated a little different than the instances, as it has to deal with more of the general tracing. The tr->dir is the tracing directory, which is an immutable dentry, where as the tr->dir of instances are the dentry that was created, and can be destroyed later. These should have different functions accessing them. As only tracing_init_dentry() deals with the top level array, fold the code for it into that function, and remove the trace_init_dentry_tr() that was also used by the instances to get their directory dentry. Add a tracing_get_dentry() to just get the tracing dir entry for instances as well as the top level array. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-02 10:22:34 -05:00
Steven Rostedt (Red Hat)	c602894814	tracing: Make tracing_init_dentry_tr() static tracing_init_dentry_tr() is not used outside of trace.c, it should be static. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-02-02 10:22:23 -05:00
Borislav Petkov	69a1c994cc	tracing: Remove newline from trace_printk warning banner Remove the output-confusing newline below: [ 0.191328] ******************************************************** [ 0.191493] NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE [ 0.191586] ** ... Link: http://lkml.kernel.org/r/1422375440-31970-1-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> [ added an extra '\n' by itself, to keep what it was suppose to do ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-01-27 17:51:24 -05:00
Steven Rostedt (Red Hat)	14a5ae40f0	tracing: Use IS_ERR() check for return value of tracing_init_dentry() tracing_init_dentry() will soon return NULL as a valid pointer for the top level tracing directroy. NULL can not be used as an error value. Instead, switch to ERR_PTR() and check the return status with IS_ERR(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-01-22 11:19:49 -05:00
Steven Rostedt (Red Hat)	83829b74f5	tracing: Remove extra call to init_ftrace_syscalls() trace_init() calls init_ftrace_syscalls() and then calls trace_event_init() which also calls init_ftrace_syscalls(). It makes more sense to only call it from trace_event_init(). Calling it twice wastes memory, as it allocates the syscall events twice, and loses the first copy of it. Link: http://lkml.kernel.org/r/54AF53BD.5070303@huawei.com Link: http://lkml.kernel.org/r/20150115040505.930398632@goodmis.org Reported-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-01-15 09:41:11 -05:00
Steven Rostedt (Red Hat)	d716ff71dd	tracing: Remove taking of trace_types_lock in pipe files Taking the global mutex "trace_types_lock" in the trace_pipe files causes a bottle neck as most the pipe files can be read per cpu and there's no reason to serialize them. The current_trace variable was given a ref count and it can not change when the ref count is not zero. Opening the trace_pipe files will up the ref count (and decremented on close), so that the lock no longer needs to be taken when accessing the current_trace variable. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-22 23:37:46 -05:00
Steven Rostedt (Red Hat)	cf6ab6d914	tracing: Add ref count to tracer for when they are being read by pipe When one of the trace pipe files are being read (by either the trace_pipe or trace_pipe_raw), do not allow the current_trace to change. By adding a ref count that is incremented when the pipe files are opened, will prevent the current_trace from being changed. This will allow for the removal of the global trace_types_lock from reading the pipe buffers (which is currently a bottle neck). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-22 15:39:40 -05:00
Linus Torvalds	a7c180aa7e	As the merge window is still open, and this code was not as complex as I thought it might be. I'm pushing this in now. This will allow Thomas to debug his irq work for 3.20. This adds two new features: 1) Allow traceopoints to be enabled right after mm_init(). By passing in the trace_event= kernel command line parameter, tracepoints can be enabled at boot up. For debugging things like the initialization of interrupts, it is needed to have tracepoints enabled very early. People have asked about this before and this has been on my todo list. As it can be helpful for Thomas to debug his upcoming 3.20 IRQ work, I'm pushing this now. This way he can add tracepoints into the IRQ set up and have users enable them when things go wrong. 2) Have the tracepoints printed via printk() (the console) when they are triggered. If the irq code locks up or reboots the box, having the tracepoint output go into the kernel ring buffer is useless for debugging. But being able to add the tp_printk kernel command line option along with the trace_event= option will have these tracepoints printed as they occur, and that can be really useful for debugging early lock up or reboot problems. This code is not that intrusive and it passed all my tests. Thomas tried them out too and it works for his needs. Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUjv3kAAoJEEjnJuOKh9ldLNsIANAe5EmDCBw0WjR72n+G3qOH NC8calXfkjqHU0bv8Q3dRv20KH4MHOy6l4+EiV9/ovt71LOF3NEyUJ3HuShf9a8b sWcUhYbX3D1hViQe5sOzv9AWhBCFlKQGoNmQnydX9xa8ivRsBaTGJIGktWlHcwBE jF1i3fj3l3vRQSS8qZFXp3bzreunlGyPoSHcT6eWQeos+utj4sKwQWTLXTLQeM+6 oQtFKRx7E5yX04qO1qFczS8qIEC6JH2C2jIRYEKUGepaELlnGkb8O7jQV/RaLF4/ 6P8VhZFG9YLS7fn7vWu0SnAN+Zwz5LzgjXAZt0FhGtIhLc18Oj8ouHH1UORsdQM= =Z4Un -----END PGP SIGNATURE----- Merge tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "As the merge window is still open, and this code was not as complex as I thought it might be. I'm pushing this in now. This will allow Thomas to debug his irq work for 3.20. This adds two new features: 1) Allow traceopoints to be enabled right after mm_init(). By passing in the trace_event= kernel command line parameter, tracepoints can be enabled at boot up. For debugging things like the initialization of interrupts, it is needed to have tracepoints enabled very early. People have asked about this before and this has been on my todo list. As it can be helpful for Thomas to debug his upcoming 3.20 IRQ work, I'm pushing this now. This way he can add tracepoints into the IRQ set up and have users enable them when things go wrong. 2) Have the tracepoints printed via printk() (the console) when they are triggered. If the irq code locks up or reboots the box, having the tracepoint output go into the kernel ring buffer is useless for debugging. But being able to add the tp_printk kernel command line option along with the trace_event= option will have these tracepoints printed as they occur, and that can be really useful for debugging early lock up or reboot problems. This code is not that intrusive and it passed all my tests. Thomas tried them out too and it works for his needs. Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org" * tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add tp_printk cmdline to have tracepoints go to printk() tracing: Move enabling tracepoints to just after rcu_init()	2014-12-16 12:53:59 -08:00
Steven Rostedt (Red Hat)	0daa230296	tracing: Add tp_printk cmdline to have tracepoints go to printk() Add the kernel command line tp_printk option that will have tracepoints that are active sent to printk() as well as to the trace buffer. Passing "tp_printk" will activate this. To turn it off, the sysctl /proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note, this only works if the cmdline option is used. Echoing 1 into the sysctl file without the cmdline option will have no affect. Note, this is a dangerous option. Having high frequency tracepoints send their data to printk() can possibly cause a live lock. This is another reason why this is only active if the command line option is used. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-15 10:17:38 -05:00
Steven Rostedt (Red Hat)	5f893b2639	tracing: Move enabling tracepoints to just after rcu_init() Enabling tracepoints at boot up can be very useful. The tracepoint can be initialized right after RCU has been. There's no need to wait for the early_initcall() to be called. That's too late for some things that can use tracepoints for debugging. Move the logic to enable tracepoints out of the initcalls and into init/main.c to right after rcu_init(). This also allows trace_printk() to be used early too. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-15 10:16:50 -05:00
Linus Torvalds	a7cb7bb664	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree update from Jiri Kosina: "Usual stuff: documentation updates, printk() fixes, etc" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) intel_ips: fix a type in error message cpufreq: cpufreq-dt: Move newline to end of error message ps3rom: fix error return code treewide: fix typo in printk and Kconfig ARM: dts: bcm63138: change "interupts" to "interrupts" Replace mentions of "list_struct" to "list_head" kernel: trace: fix printk message scsi: mpt2sas: fix ioctl in comment zbud, zswap: change module author email clocksource: Fix 'clcoksource' typo in comment arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help gpio: msm-v1: make boolean argument more obvious usb: Fix typo in usb-serial-simple.c PCI: Fix comment typo 'COMFIG_PM_OPS' powerpc: Fix comment typo 'CONIFG_8xx' powerpc: Fix comment typos 'CONFiG_ALTIVEC' clk: st: Spelling s/stucture/structure/ isci: Spelling s/stucture/structure/ usb: gadget: zero: Spelling s/infrastucture/infrastructure/ treewide: Fix company name in module descriptions ...	2014-12-12 10:08:06 -08:00
Linus Torvalds	350e4f4985	This code is a fork from the trace-3.19 pull as it needed the trace_seq clean ups from that branch. This code solves the issue of performing stack dumps from NMI context. The issue is that printk() is not safe from NMI context as if the NMI were to trigger when a printk() was being performed, the NMI could deadlock from the printk() internal locks. This has been seen in practice. With lots of review from Petr Mladek, this code went through several iterations, and we feel that it is now at a point of quality to be accepted into mainline. Here's what is contained in this patch set: o Creates a "seq_buf" generic buffer utility that allows a descriptor to be passed around where functions can write their own "printk()" formatted strings into it. The generic version was pulled out of the trace_seq() code that was made specifically for tracing. o The seq_buf code was change to model the seq_file code. I have a patch (not included for 3.19) that converts the seq_file.c code over to use seq_buf.c like the trace_seq.c code does. This was done to make sure that seq_buf.c is compatible with seq_file.c. I may try to get that patch in for 3.20. o The seq_buf.c file was moved to lib/ to remove it from being dependent on CONFIG_TRACING. o The printk() was updated to allow for a per_cpu "override" of the internal calls. That is, instead of writing to the console, a call to printk() may do something else. This made it easier to allow the NMI to change what printk() does in order to call dump_stack() without needing to update that code as well. o Finally, the dump_stack from all CPUs via NMI code was converted to use the seq_buf code. The caller to trigger the NMI code would wait till all the NMIs finished, and then it would print the seq_buf data to the console safely from a non NMI context. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhbrnAAoJEEjnJuOKh9ldsCoIAJ3sKIJ5B3jxJJTCHPAx/lZD GVbV1J1mu4kTAZuhJZOAxW8D6PZGZMyEjg0y6ScDEnBGcjAZ9gTiWCdakPktf9EX GfaPPqwiL9dZ18J9Qc6uR+7M1Ffpzzwbcc6lJrpoTcjRgkoH9wCiLS9ozFQyYzWb /7m5UbUM/PIk9WAjLYXPW6UUVtPTPT0RdEQKofMGTeah+vgqj4TXCOROdlxsXXWF 77vqBvPd5TUPWFH9ftzJGDtZS8SroXVKCu3fZIqHgzAU0yqwVtH/JzDTy9u2UYhX GzDEPeAIdp6m6Uyc406VuIf1QW0gfBgmA0ir80vFoP27uFMM6j5HlF7azgQfx34= =YBgA -----END PGP SIGNATURE----- Merge tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull nmi-safe seq_buf printk update from Steven Rostedt: "This code is a fork from the trace-3.19 pull as it needed the trace_seq clean ups from that branch. This code solves the issue of performing stack dumps from NMI context. The issue is that printk() is not safe from NMI context as if the NMI were to trigger when a printk() was being performed, the NMI could deadlock from the printk() internal locks. This has been seen in practice. With lots of review from Petr Mladek, this code went through several iterations, and we feel that it is now at a point of quality to be accepted into mainline. Here's what is contained in this patch set: - Creates a "seq_buf" generic buffer utility that allows a descriptor to be passed around where functions can write their own "printk()" formatted strings into it. The generic version was pulled out of the trace_seq() code that was made specifically for tracing. - The seq_buf code was change to model the seq_file code. I have a patch (not included for 3.19) that converts the seq_file.c code over to use seq_buf.c like the trace_seq.c code does. This was done to make sure that seq_buf.c is compatible with seq_file.c. I may try to get that patch in for 3.20. - The seq_buf.c file was moved to lib/ to remove it from being dependent on CONFIG_TRACING. - The printk() was updated to allow for a per_cpu "override" of the internal calls. That is, instead of writing to the console, a call to printk() may do something else. This made it easier to allow the NMI to change what printk() does in order to call dump_stack() without needing to update that code as well. - Finally, the dump_stack from all CPUs via NMI code was converted to use the seq_buf code. The caller to trigger the NMI code would wait till all the NMIs finished, and then it would print the seq_buf data to the console safely from a non NMI context One added bonus is that this code also makes the NMI dump stack work on PREEMPT_RT kernels. As printk() includes sleeping locks on PREEMPT_RT, printk() only writes to console if the console does not use any rt_mutex converted spin locks. Which a lot do" * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/nmi: Fix use of unallocated cpumask_var_t printk/percpu: Define printk_func when printk is not defined x86/nmi: Perform a safe NMI stack trace on all CPUs printk: Add per_cpu printk func to allow printk to be diverted seq_buf: Move the seq_buf code to lib/ seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions tracing: Have seq_buf use full buffer seq_buf: Add seq_buf_can_fit() helper function tracing: Add paranoid size check in trace_printk_seq() tracing: Use trace_seq_used() and seq_buf_used() instead of len tracing: Clean up tracing_fill_pipe_page() seq_buf: Create seq_buf_used() to find out how much was written tracing: Add a seq_buf_clear() helper and clear len and readpos in init tracing: Convert seq_buf fields to be like seq_file fields tracing: Convert seq_buf_path() to be like seq_path() tracing: Create seq_buf layer in trace_seq	2014-12-10 20:35:41 -08:00
Linus Torvalds	1dd7dcb6ea	There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: o Added some "!" (NOT) logic to the tracing filter. o Fixed the frame pointer logic to the x86_64 mcount trampolines o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY= =Z7Cm -----END PGP SIGNATURE----- Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: - Added some "!" (NOT) logic to the tracing filter. - Fixed the frame pointer logic to the x86_64 mcount trampolines - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them" * tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits) tracing: Truncated output is better than nothing tracing: Add additional marks to signal very large time deltas Documentation: describe trace_buf_size parameter more accurately tracing: Allow NOT to filter AND and OR clauses tracing: Add NOT to filtering logic ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter ftrace/x86: Get rid of ftrace_caller_setup ftrace/x86: Have save_mcount_regs macro also save stack frames if needed ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs ftrace/x86: Simplify save_mcount_regs on getting RIP ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file ftrace/x86: Have static tracing also use ftrace_caller_setup ftrace/x86: Have static function tracing always test for function graph kprobes: Add IPMODIFY flag to kprobe_ftrace_ops ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict kprobes/ftrace: Recover original IP if pre_handler doesn't change it tracing/trivial: Fix typos and make an int into a bool tracing: Deletion of an unnecessary check before iput() ...	2014-12-10 19:58:13 -08:00
Al Viro	ba00410b81	Merge branch 'iov_iter' into for-next	2014-12-08 20:39:29 -05:00
Dan Carpenter	3558a5ac50	tracing: Truncated output is better than nothing The initial reason for this patch is that I noticed that: if (len > TRACE_BUF_SIZE) is off by one. In this code, if len == TRACE_BUF_SIZE, then it means we have truncated the last character off the output string. If we truncate two or more characters then we exit without printing. After some discussion, we decided that printing truncated data is better than not printing at all so we should just use vscnprintf() and remove the test entirely. Also I have updated memcpy() to copy the NUL char instead of setting the NUL in a separate step. Link: http://lkml.kernel.org/r/20141127155752.GA21914@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-03 17:10:14 -05:00
Jiri Kosina	a02001086b	Merge Linus' tree to be be to apply submitted patches to newer code than current trivial.git base	2014-11-20 14:42:02 +01:00
Frans Klaver	eff264efee	kernel: trace: fix printk message s,produciton,production Signed-off-by: Frans Klaver <frans.klaver@xsens.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-11-20 14:29:19 +01:00
Steven Rostedt (Red Hat)	820b75f63d	tracing: Add paranoid size check in trace_printk_seq() To be really paranoid about writing out of bound data in trace_printk_seq(), add another check of len compared to size. Link: http://lkml.kernel.org/r/20141119144004.GB2332@dhcp128.suse.cz Suggested-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:16 -05:00
Steven Rostedt (Red Hat)	5ac4837841	tracing: Use trace_seq_used() and seq_buf_used() instead of len As the seq_buf->len will soon be +1 size when there's an overflow, we must use trace_seq_used() or seq_buf_used() methods to get the real length. This will prevent buffer overflow issues if just the len of the seq_buf descriptor is used to copy memory. Link: http://lkml.kernel.org/r/20141114121911.09ba3d38@gandalf.local.home Reported-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:15 -05:00
Steven Rostedt (Red Hat)	74f06bb723	tracing: Clean up tracing_fill_pipe_page() The function tracing_fill_pipe_page() logic is a little confusing with the use of count saving the seq.len and reusing it. Instead of subtracting a number that is calculated from the saved value of the seq.len from seq.len, just save the seq.len at the start and if we need to reset it, just assign it again. When the seq_buf overflow is len == size + 1, the current logic will break. Changing it to use a saved length for resetting back to the original value is more robust and will work when we change the way seq_buf sets the overflow. Link: http://lkml.kernel.org/r/20141118161546.GJ23958@pathway.suse.cz Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:14 -05:00
Steven Rostedt (Red Hat)	3a161d99c4	tracing: Create seq_buf layer in trace_seq Create a seq_buf layer that trace_seq sits on. The seq_buf will not be limited to page size. This will allow other usages of seq_buf instead of a hard set PAGE_SIZE one that trace_seq has. Link: http://lkml.kernel.org/r/20141104160221.864997179@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:09 -05:00
Steven Rostedt (Red Hat)	19a7fe2062	tracing: Add trace_seq_has_overflowed() and trace_handle_return() Adding a trace_seq_has_overflowed() which returns true if the trace_seq had too much written into it allows us to simplify the code. Instead of checking the return value of every call to trace_seq_printf() and friends, they can all be called normally, and at the end we can return !trace_seq_has_overflowed() instead. Several functions also return TRACE_TYPE_PARTIAL_LINE when the trace_seq overflowed and TRACE_TYPE_HANDLED otherwise. Another helper function was created called trace_handle_return() which takes a trace_seq and returns these enums. Using this helper function also simplifies the code. This change also makes it possible to remove the return values of trace_seq_printf() and friends. They should instead just be void functions. Link: http://lkml.kernel.org/r/20141114011410.365183157@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:39 -05:00
Rasmus Villemoes	d79ac28fde	tracing: Merge consecutive seq_puts calls Consecutive seq_puts calls with literal strings can be merged to a single call. This reduces the size of the generated code, and can also lead to slight .rodata reduction (because of fewer nul and padding bytes). It should also shave a off a few clock cycles. Link: http://lkml.kernel.org/r/1415479332-25944-3-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:33:34 -05:00
Rasmus Villemoes	fa6f0cc751	tracing: Replace seq_printf by simpler equivalents Using seq_printf to print a simple string or a single character is a lot more expensive than it needs to be, since seq_puts and seq_putc exist. These patches do seq_printf(m, s) -> seq_puts(m, s) seq_printf(m, "%s", s) -> seq_puts(m, s) seq_printf(m, "%c", c) -> seq_putc(m, c) Subsequent patches will simplify further. Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:32:19 -05:00
Luis Claudio R. Goncalves	933ff9f202	tracing: Fix traceoff_on_warning handling on boot command line According to the documentation, adding "traceoff_on_warning" to the boot command line should be enough to enable the feature. But right now it is necessary to specify "traceoff_on_warning=". Along with fixing that, also verify if the value passed, if any, is either "0" or "off". Link: http://lkml.kernel.org/r/20141112231400.GL12281@uudg.org Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:03:41 -05:00
Rabin Vincent	07906da788	tracing: Do not risk busy looping in buffer splice If the read loop in trace_buffers_splice_read() keeps failing due to memory allocation failures without reading even a single page then this function will keep busy looping. Remove the risk for that by exiting the function if memory allocation failures are seen. Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-10 16:47:31 -05:00
Rabin Vincent	e30f53aad2	tracing: Do not busy wait in buffer splice On a !PREEMPT kernel, attempting to use trace-cmd results in a soft lockup: # trace-cmd record -e raw_syscalls:* -F false NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61] ... Call Trace: [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90 [<ffffffff81092e25>] wait_on_pipe+0x35/0x40 [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0 [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0 [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290 [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff8112415d>] do_splice_to+0x6d/0x90 [<ffffffff81126971>] SyS_splice+0x7c1/0x800 [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8 The problem is this: tracing_buffers_splice_read() calls ring_buffer_wait() to wait for data in the ring buffers. The buffers are not empty so ring_buffer_wait() returns immediately. But tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1, meaning it only wants to read a full page. When the full page is not available, tracing_buffers_splice_read() tries to wait again with ring_buffer_wait(), which again returns immediately, and so on. Fix this by adding a "full" argument to ring_buffer_wait() which will make ring_buffer_wait() wait until the writer has left the reader's page, i.e. until full-page reads will succeed. Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.16+ Fixes: `b1169cc69b` ("tracing: Remove mock up poll wait function") Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-10 16:45:43 -05:00
Al Viro	946e51f2bf	move d_rcu from overlapping d_child to overlapping d_alias Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-03 15:20:29 -05:00
Linus Torvalds	e7fda6c4c3	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...	2014-08-05 17:46:42 -07:00
Linus Torvalds	b8c0aa46b3	This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume. Other changes include clean ups of uprobe code. Clean up of the trace_seq() code. And other various small fixes and clean ups to ftrace and tracing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJT35zXAAoJEKQekfcNnQGuOz0H/38zqM0nLFhrgvz3EPk2UOjn xqpX8qyb2V7TJZL+IqeXU2a5cQZl5ba0D4WtBGpxbTae3CJYiuQ87iKUNFoH0om5 FDpn80igb368k8V3qRdRsziKVCCf0XBd/NkHJXc0ZkfXGyzB2Ga4bBxALxp2gj9y bnO+vKo6+tWYKG4hyQb4P3LRXUrK8/LWEsPr39cH2QH1Rdj69Lx9CgrCdUVJmwcb Bj8hEiLXL/RYCFNn79A3wNTUvW0rG/AOIf4SLqXtasSRZ0ToaU0ZyDnrNv+0Ol47 rX8tSk+LfXchL9hpIvjCf1vlAYq3pO02favteR/jip3lx/dTjEDE4RJ9qtJzZ4Q= =fwQY -----END PGP SIGNATURE----- Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume Other changes include clean ups of uprobe code, clean up of the trace_seq() code, and other various small fixes and clean ups to ftrace and tracing" * tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) ftrace: Add warning if tramp hash does not match nr_trampolines ftrace: Fix trampoline hash update check on rec->flags ring-buffer: Use rb_page_size() instead of open coded head_page size ftrace: Rename ftrace_ops field from trampolines to nr_trampolines tracing: Convert local function_graph functions to static ftrace: Do not copy old hash when resetting tracing: let user specify tracing_thresh after selecting function_graph ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on() tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST s390/ftrace: remove check of obsolete variable function_trace_stop arm64, ftrace: Remove check of obsolete variable function_trace_stop Blackfin: ftrace: Remove check of obsolete variable function_trace_stop metag: ftrace: Remove check of obsolete variable function_trace_stop microblaze: ftrace: Remove check of obsolete variable function_trace_stop MIPS: ftrace: Remove check of obsolete variable function_trace_stop parisc: ftrace: Remove check of obsolete variable function_trace_stop sh: ftrace: Remove check of obsolete variable function_trace_stop sparc64,ftrace: Remove check of obsolete variable function_trace_stop tile: ftrace: Remove check of obsolete variable function_trace_stop ftrace: x86: Remove check of obsolete variable function_trace_stop ...	2014-08-04 11:50:00 -07:00
Thomas Gleixner	1b3e5c0936	ftrace: Provide trace clocks monotonic Expose the new NMI safe accessor to clock monotonic to the tracer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:55 -07:00
Tony Luck	58d4e21e50	tracing: Fix wraparound problems in "uptime" trace clock The "uptime" trace clock added in: commit `8aacf017b0` tracing: Add "uptime" trace clock that uses jiffies has wraparound problems when the system has been up more than 1 hour 11 minutes and 34 seconds. It converts jiffies to nanoseconds using: (u64)jiffies_to_usecs(jiffy) * 1000ULL but since jiffies_to_usecs() only returns a 32-bit value, it truncates at 2^32 microseconds. An additional problem on 32-bit systems is that the argument is "unsigned long", so fixing the return value only helps until 2^32 jiffies (49.7 days on a HZ=1000 system). Avoid these problems by using jiffies_64 as our basis, and not converting to nanoseconds (we do convert to clock_t because user facing API must not be dependent on internal kernel HZ values). Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com Cc: stable@vger.kernel.org # 3.10+ Fixes: `8aacf017b0` "tracing: Add "uptime" trace clock that uses jiffies" Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-21 09:56:12 -04:00
Stanislav Fomichev	6508fa761c	tracing: let user specify tracing_thresh after selecting function_graph Currently, tracing_thresh works only if we specify it before selecting function_graph tracer. If we do the opposite, tracing_thresh will change it's value, but it will not be applied. To fix it, we add update_thresh callback which is called whenever tracing_thresh is updated and for function_graph tracer we register handler which reinitializes tracer depending on tracing_thresh. Link: http://lkml.kernel.org/p/20140718111727.GA3206@stfomichev-desktop.yandex.net Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-18 15:48:52 -04:00
zhangwei(Jovi)	f0160a5a29	tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing, so add it, to be consistent with __trace_printk/__trace_bprintk. Those functions are all called by the same function: trace_printk(). Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-15 11:58:40 -04:00
zhangwei(Jovi)	8abfb8727f	tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs Currently trace option stacktrace is not applicable for trace_printk with constant string argument, the reason is in __trace_puts/__trace_bputs ftrace_trace_stack is missing. In contrast, when using trace_printk with non constant string argument(will call into __trace_printk/__trace_bprintk), then trace option stacktrace is workable, this inconstant result will confuses users a lot. Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-15 11:04:40 -04:00
Steven Rostedt (Red Hat)	ca65ef1ab6	Merge branch 'trace/ftrace/urgent' into trace/ftrace/core Needed `099ed15167` "tracing: Remove ftrace_stop/start() from reading the trace file" for the removal of ftrace_start/stop().	2014-07-09 11:02:34 -04:00
Steven Rostedt (Red Hat)	099ed15167	tracing: Remove ftrace_stop/start() from reading the trace file Disabling reading and writing to the trace file should not be able to disable all function tracing callbacks. There's other users today (like kprobes and perf). Reading a trace file should not stop those from happening. Cc: stable@vger.kernel.org # 3.0+ Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 12:45:54 -04:00
Namhyung Kim	d048a8c7b5	tracing: Add description of set_graph_notrace to tracing/README It was missing the description of set_graph_notrace file. Add it. Link: http://lkml.kernel.org/p/1402590233-22321-5-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:45 -04:00
Fabian Frederick	3f4d8f78a0	tracing: Remove unnecessary null test before debugfs_remove() This fixes checkpatch warning: "WARNING: debugfs_remove(NULL) is safe this check is probably not required" Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:38 -04:00
Steven Rostedt (Red Hat)	12306276fa	tracing: Move the trace_seq_* functions into its own trace_seq.c file The trace_seq_*() functions are a nice utility that allows users to manipulate buffers with printf() like formats. It has its own trace_seq.h header in include/linux and should be in its own file. Being tied with trace_output.c is rather awkward. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:35 -04:00
Steven Rostedt (Red Hat)	f0b70cc48c	tracing: Fix leak of per cpu max data in instances The freeing of an instance, if max data is configured, there will be per cpu data structures created. But these are not freed when the instance is deleted, which causes a memory leak. A new helper function is added that frees the individual buffers within a trace array, instead of duplicating the code. This way changes made for one are applied to the other (normal buffer vs max buffer). Link: http://lkml.kernel.org/r/87k38pbake.fsf@sejong.aot.lge.com Reported-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-10 12:06:30 -04:00
Namhyung Kim	a6af8fbf17	tracing: Cleanup saved_cmdlines_size changes The recent addition of saved_cmdlines_size file had some remaining (minor - mostly coding style) issues. Fix them by passing pointer name to sizeof() and using scnprintf(). Link: http://lkml.kernel.org/p/1402384295-23680-1-git-send-email-namhyung@kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-10 09:51:10 -04:00
Steven Rostedt (Red Hat)	8b8b36834d	ring-buffer: Check if buffer exists before polling The per_cpu buffers are created one per possible CPU. But these do not mean that those CPUs are online, nor do they even exist. With the addition of the ring buffer polling, it assumes that the caller polls on an existing buffer. But this is not the case if the user reads trace_pipe from a CPU that does not exist, and this causes the kernel to crash. Simple fix is to check the cpu against buffer bitmask against to see if the buffer was allocated or not and return -ENODEV if it is not. More updates were done to pass the -ENODEV back up to userspace. Link: http://lkml.kernel.org/r/5393DB61.6060707@oracle.com Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-10 09:46:00 -04:00
Steven Rostedt (Red Hat)	a9fcaaac37	tracing: Fix memory leak on instance deletion When an instance is created, it also gets a snapshot ring buffer allocated (with minimum of pages). But when it is deleted the snapshot buffer is not. There was a helper function added to match the allocation of these ring buffers to a way to free them, but it wasn't used by the deletion of an instance. Using that helper function solves this memory leak. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-06 23:17:28 -04:00
Steven Rostedt (Red Hat)	23aaa3c18e	tracing: Fix leak of ring buffer data when new instances creation fails Yoshihiro Yunomae reported that the ring buffer data for a trace instance does not get properly cleaned up when it fails. He proposed a patch that manually cleaned the data up and addad a bunch of labels. The labels are not needed because all trace array is allocated with a kzalloc which initializes it to 0 and all kfree()s can take a NULL pointer and will ignore it. Adding a new helper function free_trace_buffers() that can also take null buffers to free the buffers that were allocated by allocate_trace_buffers(). Link: http://lkml.kernel.org/r/20140605223522.32311.31664.stgit@yunodevel Reported-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-06 04:53:40 -04:00
Yoshihiro YUNOMAE	939c7a4f04	tracing: Introduce saved_cmdlines_size file Introduce saved_cmdlines_size file for changing the number of saved pid-comms. saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but 'no-existing processes' names are often lost in saved_cmdlines when we read the trace data. So, by introducing saved_cmdlines_size file, we can now change the 128 command names saved to something much larger if needed. When we write a value to saved_cmdlines_size, the number of the value will be stored in pid-comm list: # echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size Here, 1024 command names can be stored. The default number is 128 and the maximum number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we want to avoid losing any command names, we need to set 32768 to saved_cmdlines_size. We can read the maximum number of the list: # cat /sys/kernel/debug/tracing/saved_cmdlines_size 128 Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-05 12:35:49 -04:00
Yoshihiro YUNOMAE	198376cd88	tracing: Eliminate double free on failure of allocation on boot up If allocation of the max_buffer fails on boot up, the error path will free both per_cpu data structures from the buffers. With the new redesign of the code, those structures are freed if allocations failed. That is, the helper function that allocates the buffers will free the per cpu data on failure. No need to do it again. In fact, the second free will cause a bug as the code can not handle a double free. Link: http://lkml.kernel.org/p/20140603042803.27308.30956.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-03 19:58:31 -04:00
Steven Rostedt (Red Hat)	4c27e756bc	tracing: Move locking of trace_cmdline_lock into start/stop seq calls With the conversion of the saved_cmdlines output to use seq_read, there is now a race between accessing the values of the saved_cmdlines and the writing to them. The trace_cmdline_lock needs to be taken at the start and stop of the seq calls. A new __trace_find_cmdline() call is created to allow for the look up to happen without taking the lock. Fixes: `42584c81c5` tracing: Have saved_cmdlines use the seq_read infrastructure Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-30 13:03:40 -04:00
Steven Rostedt (Red Hat)	379cfdac37	tracing: Try again for saved cmdline if failed due to locking In order to prevent the saved cmdline cache from being filled when tracing is not active, the comms are only recorded after a trace event is recorded. The problem is, a comm can fail to be recorded if the trace_cmdline_lock is held. That lock is taken via a trylock to allow it to happen from any context (including NMI). If the lock fails to be taken, the comm is skipped. No big deal, as we will try again later. But! Because of the code that was added to only record after an event, we may not try again later as the recording is made as a oneshot per event per CPU. Only disable the recording of the comm if the comm is actually recorded. Fixes: `7ffbd48d5c` "tracing: Cache comms only after an event occurred" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-30 09:42:39 -04:00
Yoshihiro YUNOMAE	42584c81c5	tracing: Have saved_cmdlines use the seq_read infrastructure Current tracing_saved_cmdlines_read() implementation is naive; It allocates a large buffer, constructs output data to that buffer for each read operation, and then copies a portion of the buffer to the user space buffer. This has several issues such as slow memory allocation, high CPU usage, and even corruption of the output data. The seq_read infrastructure is made to handle this type of work. By converting it to use seq_read() the code becomes smaller, simplified, as well as correct. Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevel Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-29 23:08:07 -04:00
Steven Rostedt	2184db46e4	tracing: Print nasty banner when trace_printk() is in use trace_printk() is used to debug fast paths within the kernel. Places that gets called in any context (interrupt or NMI) or thousands of times a second. Something you do not want to do with a printk(). In order to make it completely lockless as it needs a temporary buffer to handle some of the string formatting, a page is created per cpu for every context (four per cpu; normal, softirq, irq, NMI). Since trace_printk() should only be used for debugging purposes, there's no reason to waste memory on these buffers on a production system. That means, trace_printk() should never be used unless a developer is debugging their kernel. There's macro magic to allocate the buffers if trace_printk() is used anywhere in the kernel. To help enforce that trace_printk() isn't used outside of development, when it is used, a nasty banner is displayed on bootup (or when a module is loaded that uses trace_printk() and the kernel core does not). Here's the banner: ******************************************************** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE trace_printk() being used. Allocating extra memory. This means that this is a DEBUG kernel and it is unsafe for produciton use. If you see this message and you are not debugging the kernel, report this immediately to your vendor! NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ******************************************************** That should hopefully keep developers from trying to sneak in a trace_printk() or two. Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-29 20:13:59 -04:00
Christoph Lameter	bdffd893a0	tracing: Replace __get_cpu_var uses with this_cpu_ptr Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr. Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-05 22:40:53 -04:00
Steven Rostedt (Red Hat)	b1169cc69b	tracing: Remove mock up poll wait function Now that the ring buffer has a built in way to wake up readers when there's data, using irq_work such that it is safe to do it in any context. But it was still using the old "poor man's" wait polling that checks every 1/10 of a second to see if it should wake up a waiter. This makes the latency for a wake up excruciatingly long. No need to do that anymore. Completely remove the different wait_poll types from the tracers and have them all use the default one now. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-04-30 08:40:05 -04:00
Steven Rostedt (Red Hat)	f487426104	tracing: Break out of tracing_wait_pipe() before wait_pipe() is called When reading from trace_pipe, if tracing is off but nothing was read it should block. If something is read and tracing is off, then EOF is returned. If tracing is on and there's nothing to read, it will block. But because the check of whether tracing is off and something was read is done after the block on the pipe, it is hit or miss if the EOF is returned or not leading to inconsistent behavior. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-04-29 16:07:28 -04:00
Fabian Frederick	ad1438a076	tracing: Add static to local functions This patch adds static to the following functions: -cycle_t buffer_ftrace_now -void free_snapshot -int trace_selftest_startup_dynamic_tracing Link: http://lkml.kernel.org/p/20140417214442.d7abc7c0b0e4b90e7fedecc9@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-04-21 14:00:46 -04:00
Steven Rostedt (Red Hat)	0b9b12c1b8	tracing: Move ftrace_max_lock into trace_array In preparation for having tracers enabled in instances, the max_lock should be unique as updating the max for one tracer is a separate operation than updating it for another tracer using a different max. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-04-21 13:59:27 -04:00
Steven Rostedt (Red Hat)	6d9b3fa5e7	tracing: Move tracing_max_latency into trace_array In preparation for letting the latency tracers be used by instances, remove the global tracing_max_latency variable and add a max_latency field to the trace_array that the latency tracers will now use. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-04-21 13:59:26 -04:00
Steven Rostedt (Red Hat)	4104d326b6	ftrace: Remove global function list and call function directly Instead of having a list of global functions that are called, as only one global function is allow to be enabled at a time, there's no reason to have a list. Instead, simply have all the users of the global ops, use the global ops directly, instead of registering their own ftrace_ops. Just switch what function is used before enabling the function tracer. This removes a lot of code as well as the complexity involved with it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-04-21 13:59:25 -04:00
Linus Torvalds	5166701b36	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it does shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...	2014-04-12 14:49:50 -07:00
Al Viro	a786c06d9f	missing bits of "splice: fix racy pipe->buffers uses" that commit has fixed only the parts of that mess in fs/splice.c itself; there had been more in several other ->splice_read() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-12 07:04:19 -04:00
Steven Rostedt (Red Hat)	17a280ea81	tracing: Add missing function triggers dump and cpudump to README The debugfs tracing README file lists all the function triggers except for dump and cpudump. These should be added too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-04-10 22:43:37 -04:00
Linus Torvalds	68114e5eb8	Most of the changes were largely clean ups, and some documentation. But there were a few features that were added. Uprobes now work with event triggers and multi buffers. Uprobes have support under ftrace and perf. The big feature is that the function tracer can now be used within the multi buffer instances. That is, you can now trace some functions in one buffer, others in another buffer, all functions in a third buffer and so on. They are basically agnostic from each other. This only works for the function tracer and not for the function graph trace, although you can have the function graph tracer running in the top level buffer (or any tracer for that matter) and have different function tracing going on in the sub buffers. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTOthtAAoJEKQekfcNnQGu5c8H/Ana/U+0tmksp1dbHkRHsKSH +Fsv4Jeu8gf1NaFKHEhkUTcFtnzE6qAPV2VCrcJwXbhAhhwZm+LjrnWdoy3215S3 cQW4LftLEonh2cM36Cos74TulMEYN6XmL6dQZV+CILKQkDrWU4qJjQ64okXEkqrd 9iG3p/mSXyvJcmnyg61ALnMOhZDLsXY3djBhWBPhiTPGS6BRb9zh4Pmw6Zv0n2rJ U93Gt/3AQrv1ybu73dUxqP0abp60oXOiWoF/R2jcbKqIM+K9RPJX79unCV3jq3u9 f+6jMlB9PgAMqQj6ihJdwxKDDuzwyrVdEPnsgvl4jarCBCtVVwhKedBaKN/KS8k= =HdXY -----END PGP SIGNATURE----- Merge tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Most of the changes were largely clean ups, and some documentation. But there were a few features that were added: Uprobes now work with event triggers and multi buffers and have support under ftrace and perf. The big feature is that the function tracer can now be used within the multi buffer instances. That is, you can now trace some functions in one buffer, others in another buffer, all functions in a third buffer and so on. They are basically agnostic from each other. This only works for the function tracer and not for the function graph trace, although you can have the function graph tracer running in the top level buffer (or any tracer for that matter) and have different function tracing going on in the sub buffers" * tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits) tracing: Add BUG_ON when stack end location is over written tracepoint: Remove unused API functions Revert "tracing: Move event storage for array from macro to standalone function" ftrace: Constify ftrace_text_reserved tracepoints: API doc update to tracepoint_probe_register() return value tracepoints: API doc update to data argument ftrace: Fix compilation warning about control_ops_free ftrace/x86: BUG when ftrace recovery fails ftrace: Warn on error when modifying ftrace function ftrace: Remove freelist from struct dyn_ftrace ftrace: Do not pass data to ftrace_dyn_arch_init ftrace: Pass retval through return in ftrace_dyn_arch_init() ftrace: Inline the code from ftrace_dyn_table_alloc() ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt tracing: Evaluate len expression only once in __dynamic_array macro tracing: Correctly expand len expressions from __dynamic_array macro tracing/module: Replace include of tracepoint.h with jump_label.h in module.h tracing: Fix event header migrate.h to include tracepoint.h tracing: Fix event header writeback.h to include tracepoint.h tracing: Warn if a tracepoint is not set via debugfs ...	2014-04-03 10:26:31 -07:00
Al Viro	fbb32750a6	pipe: kill ->map() and ->unmap() all pipe_buffer_operations have the same instances of those... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-01 23:19:19 -04:00
Steven Rostedt (Red Hat)	2c4a33aba5	tracing: Fix traceon trigger condition to actually turn tracing on While working on my tutorial for 2014 Linux Collaboration Summit I found that the traceon trigger did not work when conditions were used. The other triggers worked fine though. Looking into it, it is because of the way the triggers use the ring buffer to store the fields it will use for the condition. But if tracing is off, nothing is stored in the buffer, and the tracepoint exits before calling the trigger to test the condition. This is fine for all the triggers that only work when tracing is on, but for traceon trigger that is to work when tracing is off, nothing happens. The fix is simple, just use a temp ring buffer to record the event if tracing is off and the event has a trace event conditional trigger enabled. The rest of the tracepoint code will work just fine, but the tracepoint wont be recorded in the other buffers. Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-03-25 23:39:41 -04:00
Steven Rostedt	e1e232ca6b	tracing: Add trace_clock=<clock> kernel parameter Being able to change the trace clock at boot can be advantageous if you need a better source of when things happen across CPUs. The default trace clock is the fastest, but it uses local clocks which may not be synced across CPUs and it does not let you know when events took place with respect to events on other CPUs. The global trace clock can help in this case, and if you do not care about timings, the counter "clock" is the best, as that is just a simple atomic counter that is incremented for every event. Usage is to add "trace_clock=counter" on the kernel command line. You can replace counter with "global" or any of the clocks listed in /sys/kernel/debug/tracing/trace_clock Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Appreciated-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:32:54 -05:00
Steven Rostedt (Red Hat)	591dffdade	ftrace: Allow for function tracing instance to filter functions Create a "set_ftrace_filter" and "set_ftrace_notrace" files in the instance directories to let users filter of functions to trace for the given instance. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:29:07 -05:00
Steven Rostedt (Red Hat)	50512ab576	tracing: Convert tracer->enabled to counter As tracers will soon be used by instances, the tracer enabled field needs to be converted to a counter instead of a boolean. This counter is protected by the trace_types_lock mutex. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:13:17 -05:00
Steven Rostedt (Red Hat)	6b450d2533	tracing: Disable tracers before deletion of instance When an instance is about to be deleted, make sure the tracer is set to nop. If it isn't reset the tracer and set it to the nop tracer, otherwise memory leaks and bad pointers may result. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:13:16 -05:00
Steven Rostedt (Red Hat)	f1b21c9a40	tracing: Only let top level have option files Currently, only the top level instance can have tracing options. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:13:11 -05:00
Steven Rostedt (Red Hat)	607e2ea167	tracing: Set up infrastructure to allow tracers for instances Currently the tracers (function, function_graph, irqsoff, etc) can only be used by the top level tracing directory (not for instances). This sets up the infrastructure to allow instances to be able to run a separate tracer apart from the what the top level tracing is doing. As tracers need to adapt for being used by instances, the tracers must flag if they can be used by instances or not. Currently only the 'nop' tracer can be used by all instances. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:13:10 -05:00
Steven Rostedt (Red Hat)	bf6065b5c7	tracing: Pass trace_array to flag_changed callback As options (flags) may affect instances instead of being global the flag_changed() callbacks need to receive the trace_array descriptor of the instance they will be modifying. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:13:08 -05:00
Steven Rostedt (Red Hat)	8c1a49aedb	tracing: Pass trace_array to set_flag callback As options (flags) may affect instances instead of being global the set_flag() callbacks need to receive the trace_array descriptor of the instance they will be modifying. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-02-20 12:13:07 -05:00
Steven Rostedt (Red Hat)	3132e107d6	tracing: Check if tracing is enabled in trace_puts() If trace_puts() is used very early in boot up, it can crash the machine if it is called before the ring buffer is allocated. If a trace_printk() is used with no arguments, then it will be converted into a trace_puts() and suffer the same fate. Cc: stable@vger.kernel.org # 3.10+ Fixes: `09ae72348e` "tracing: Add trace_puts() for even faster trace_printk() tracing" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-23 12:27:59 -05:00
Steven Rostedt (Red Hat)	71485c4589	tracing: Fix formatting of trace README file Fix the formatting of the README file in the trace debugfs to fit in an 80 character window. Also add a comment about the event trigger counter with regards to traceon and traceoff. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-23 00:10:04 -05:00
Tom Zanussi	26f255646e	tracing/README: Add event file usage to tracing mini-HOWTO It would be useful to have a cheat-sheet for everything under tracing/events/ alongside the existing text describing the other files in the tracing/ dir. Add short descriptions of the directories and files under events/ along with examples, similar to the existing text for the other files in tracing/. Also clean up a few minor alignment problems noticed when adding the new text. Link: http://lkml.kernel.org/r/1389993104.3040.445.camel@empanada Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-22 23:06:57 -05:00
Al Viro	92fdd98cf8	tracing: Fix buggered tee(2) on tracing_pipe In kernel/trace/trace.c we have this: static void tracing_pipe_buf_release(struct pipe_inode_info pipe, struct pipe_buffer buf) { __free_page(buf->page); } static const struct pipe_buf_operations tracing_pipe_buf_ops = { .can_merge = 0, .map = generic_pipe_buf_map, .unmap = generic_pipe_buf_unmap, .confirm = generic_pipe_buf_confirm, .release = tracing_pipe_buf_release, .steal = generic_pipe_buf_steal, .get = generic_pipe_buf_get, }; with void generic_pipe_buf_get(struct pipe_inode_info pipe, struct pipe_buffer buf) { page_cache_get(buf->page); } and I don't see anything that would've prevented tee(2) called on the pipe that got stuff spliced into it from that sucker. ->ops->get() will be called, then buf gets copied into target pipe's ->bufs[] and eventually readers get to both copies of the buffer. With get_page(page) look at that page __free_page(page) look at that page __free_page(page) which is not a good thing, to put it mildly. AFAICS, that ought to use the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)), shouldn't it? [ SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL), and doesn't do anything special with it (no LRU logic). The __free_page() should be fine, as it wont actually free a page with reference count. Maybe there's a chance to leak memory? Anyway, This change is at a minimum good for being symmetric with generic_pipe_buf_get, it is fine to add. ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ SDR - Removed no longer used tracing_pipe_buf_release ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-19 16:53:13 -05:00
Steven Rostedt (Red Hat)	dced341b2d	tracing: Have trace buffer point back to trace_array The trace buffer has a descriptor pointer that goes back to the trace array. But it was never assigned. Luckily, nothing uses it (yet), but it will in the future. Although nothing currently uses this, if any of the new features get backported to older kernels, and because this is such a simple change, I'm marking it for stable too. Cc: stable@vger.kernel.org # v3.10+ Fixes: `12883efb67` "tracing: Consolidate max_tr into main trace_array structure" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-14 10:19:46 -05:00

... 2 3 4 5 6 ...

1035 commits