Since kernel 5.10, tracepoints no longer use the JUMP_LABEL engine to modify
the kernel's .text.
The Linux kernel introduced 'static_call' as a replacement for global function
pointers. It uses code patching so that direct calls can be used instead of
indirect calls. Related Linux kernel commits:
e6d6c071f21e7e478838 (diff-d7873f00dcd8c46df3e1e57b3225ff91036c83d5d7339d410b468418fc9a32a4)
Currently, only the x86(-64) architecture implements static calls.
This commit should address #69.
The latest SELinux changes (59bed0a813) introduced two problems on kernels
< 4.17. First, LKRG does not compile on such kernels because of a function
name mismatch. Second, even with that fixed, the same function overwrote a
pointer instead of the value of the SELinux state itself. The second bug
could never be triggered, because the first one made LKRG fail to compile on
such kernels.
This commit fixes both problems and addresses #60.
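The pointer-versus-value mistake described above can be illustrated in plain user-space C. This is only a minimal sketch, not LKRG's code: `struct demo_selinux_state`, `demo_restore_wrong()` and `demo_restore_right()` are hypothetical names, and the real SELinux state layout differs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's SELinux state; not the real layout. */
struct demo_selinux_state {
    bool enforcing;
};

/*
 * Buggy pattern: only the local copy of the pointer is redirected,
 * so the live state it used to point at is never modified.
 */
static void demo_restore_wrong(struct demo_selinux_state *live,
                               struct demo_selinux_state *saved)
{
    live = saved;   /* overwrites the pointer, not the state */
    (void)live;
}

/* Fixed pattern: write the saved value through the pointer. */
static void demo_restore_right(struct demo_selinux_state *live,
                               const struct demo_selinux_state *saved)
{
    live->enforcing = saved->enforcing;
}
```

After `demo_restore_wrong()` the live state is unchanged; only `demo_restore_right()` actually restores the saved value.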
New Linux kernels may be built with the CONFIG_GCC_PLUGIN_RANDSTRUCT
option. This randomly changes the order of fields in certain structures,
including selinux_state. Currently, LKRG cannot recreate the randomized
structure layout. Thus, we have to disable LKRG's SELinux monitoring on
kernels built with this option.
CONFIG_GCC_PLUGIN_RANDSTRUCT was introduced to make it harder for attackers
to overwrite particular fields of structures. LKRG's goal was the same.
So even with LKRG's monitoring disabled, we still have some mitigation
against SELinux state overwrites.
We might make LKRG capable of recreating randomized structures in the future.
Starting with Linux 5.6, a new kernel configuration option was
introduced: CONFIG_SECURITY_SELINUX_DISABLE.
Reflect in LKRG's selinux_struct that the SELinux "disabled"
field is optional on these newer kernels.
Since Linux kernel 5.7, SYSCALL_WRAPPER's magic has been backported to x86
(32 bits) as well. This commit adds support for it.
However, regardless of SYSCALL_WRAPPER's magic, LKRG was broken on IA-32,
and this commit "resurrects" that support. It also addresses #49 and #46.
We do not want to support RT kernels (at least not for now). RT kernels are
commonly used in medical and similar devices, where reliability is crucial.
It is safer not to support RT kernels in LKRG for now.
For more information, please read the entire discussion at #40.
Regardless of the fix for #47 (156d2bab39),
LOCKDEP might still report warning messages. This commit silences them on
non-debug builds. If P_LKRG_DEBUG_BUILD is enabled, this information is
still available.
If the kernel is compiled with CONFIG_OPTPROBES, we must synchronize with the
kprobe optimizer while creating the database. LKRG places many kretprobes,
which modify the .text section. In the standard scenario, after placing the
kprobes, LKRG can safely calculate the hash of all .text sections. However,
if CONFIG_OPTPROBES is enabled, the placed kprobes may be optimized.
Optimization modifies the .text section by converting kprobes into FTRACE,
which uses a different hooking mechanism. If LKRG is building the hash
database while the optimizer runs in parallel, the result is a false
positive (FP) at best and a deadlock at worst.
This fix addresses the described issue and the reported bug #47.
Since kernel 5.8, 'native_write_cr4' must be resolved manually. However, this is x86-specific code, which should not be executed on other platforms. This commit fixes that and addresses #48.
Some custom kernel builds might aggressively inline functions that are
critical from LKRG's perspective. That's problematic for the project.
However, some of the problems *might* be solved by uncommenting this new
definition (P_KERNEL_AGGRESSIVE_INLINING). Unfortunately, not all of the
problems can be solved by it (at least not for now). You need to experiment.
This can be useful to address issues like #40.
security_bprm_committed_creds does not return any value (void). LKRG's old
logic for handling the exec* family checked a return code, which is
incorrect for the current design. Fix it.
Since kernel 5.8, the function search_binary_handler is no longer exported.
On aggressively optimized kernels, `search_binary_handler` may be inlined.
However, GCC can split the function: the bulk of it goes into its own
function, named after the original plus .part.<some number>, and the rest is
inlined into other functions.
This is very problematic behavior from LKRG's point of view and was
reported as #41 and #45. This commit fixes the problem by replacing the
'search_binary_handler' (or 'do_execveat_common') hook with
security_bprm_committing_creds and security_bprm_committed_creds.
Additionally, this change is desirable from a security point of view.
On aggressively optimized kernels, the kprobe optimizer might not finish its
job before LKRG creates its own database. This is problematic because LKRG
might snapshot the hash of the kernel's .text section while its own hooks
are still non-optimized. As soon as the kprobe optimizer finishes, the
previously snapshotted hash is no longer correct and LKRG detects the
inconsistency.
To solve this unusual corner case correctly, LKRG can wait for the kprobe
optimizer before creating the database.
We switched to using late_initcall_sync() in order to have LKRG initialize
sufficiently late when it's linked into the kernel. That change was a
no-op when building/loading LKRG as a module on recent kernels, because
their module.h defines late_initcall_sync() as an alias for module_init().
However, it broke LKRG on some older kernels, where late_initcall_sync()
wasn't defined for modules at all.
This commit fixes that by explicitly using module_init() when building LKRG
as a module. This change is a no-op on recent kernels.
Fixes #37, updates ddc14c6544
There are unofficial versions of RANDKSTACK patches floating about
the web, including in VMWare's PhotonOS.
The randomized stack addresses conflict with LKRG's ADDR_LIMIT
checks a la:
```
[ 195.272462] [p_lkrg] <Exploit Detection> Detected ADDR_LIMIT
segment corruption! process[552 | sysctl] has different segment
address! [7ffffffff000 vs ffffffffffffffff]
```
Address this by ensuring that P_VERIFY_ADDR_LIMIT does not get
defined when CONFIG_PAX_RANDKSTACK is enabled.
This is a strange edge-case, and normally wouldn't be submitted as
a pull request to upstream projects, except that users seeking to
harden their kernels with public code are likely to run across
LKRG and some links to the PhotonOS patches or similar extracts
from Grsecurity's old patchsets. The commit is a no-op in 99% of
cases, but may result in one less bug report over the next decade.
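The guard described above can be sketched as a preprocessor condition. This is a simplified, hypothetical form: the real condition under which LKRG defines P_VERIFY_ADDR_LIMIT may involve other checks, and `demo_addr_limit_check_enabled()` is an illustrative helper, not an LKRG function.

```c
/*
 * Simplified assumption: enable the ADDR_LIMIT check unless the kernel
 * was built with a RANDKSTACK patch (CONFIG_PAX_RANDKSTACK).
 */
#if !defined(CONFIG_PAX_RANDKSTACK)
# define P_VERIFY_ADDR_LIMIT 1
#endif

/* Hypothetical helper reporting whether the check was compiled in. */
static int demo_addr_limit_check_enabled(void)
{
#ifdef P_VERIFY_ADDR_LIMIT
    return 1;
#else
    return 0;
#endif
}
```

Built without CONFIG_PAX_RANDKSTACK, the helper returns 1; building with `-DCONFIG_PAX_RANDKSTACK` compiles the check out.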
Based on #40 it looks like some people compile the kernel with
CONFIG_FTRACE and CONFIG_FUNCTION_TRACER but don't enable
CONFIG_DYNAMIC_FTRACE. Let's try to check that in this commit.
Since kernel 5.11, the TIF_SECCOMP flag is no longer used on the x86(-64)
architecture to track the per-thread SECCOMP state. This commit updates the
code accordingly.
With LKRG's current architecture, validating tasks as they wake up brings
only a small benefit, but it can have a noticeable performance impact. After
this commit, 'pint_validate' option 2 has the same meaning as option 1.
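One way to picture the change is a small mapping from the requested level to the level actually applied. The enum and function names below are hypothetical and the sketch does not validate out-of-range input; only the numeric levels come from the 'pint_validate' description in this log.

```c
/* Hypothetical names; values follow the 'pint_validate' levels in this log. */
enum demo_pint_validate {
    DEMO_PINT_DISABLED = 0,
    DEMO_PINT_CURRENT  = 1,  /* validate only currently running tasks */
    DEMO_PINT_WAKING   = 2,  /* formerly: also validate tasks being woken up */
    DEMO_PINT_ALL      = 3   /* validate all tasks (paranoid mode) */
};

/* Map the user-supplied level to the behaviour actually applied. */
static int demo_pint_effective_level(int requested)
{
    if (requested == DEMO_PINT_WAKING)
        return DEMO_PINT_CURRENT;   /* option 2 now behaves like option 1 */
    return requested;
}
```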
Since kernel 5.8, the 'search_binary_handler' function is no longer marked
with EXPORT, so LKRG can't place the correct hooks. In that case, use the
'do_execveat_common' function instead.
Many exploits use a vulnerability to corrupt 'addr_limit' and achieve a full R/W primitive in the kernel. This is a 'known' technique. We can't verify 'addr_limit' as part of the normal verification process because the kernel might legitimately modify it via a set_fs(KERNEL..) call. However, there are places where we can enforce such a policy, e.g. in the generic_permission() or capable() hooks, as well as in the syscall hook. I'm adding such verification to the execve() syscall as well. Since kernel 5.10, the set_fs/get_fs API (and the addr_limit variable) has been removed on the x86 platform, but that is not the case for the ARM architecture. Moreover, many Android exploits rely on 'addr_limit' corruption. This beta version of 'addr_limit' verification can be an effective and important feature.
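At its core, such a check compares the limit currently seen in a task with the value recorded as legitimate. The following user-space sketch only illustrates that comparison; `struct demo_task`, its fields and `demo_addr_limit_ok()` are hypothetical and do not reflect how LKRG stores or reads the value.

```c
#include <stdbool.h>

/* Hypothetical per-task record kept by the checker. */
struct demo_task {
    unsigned long long addr_limit;      /* value currently seen in the task */
    unsigned long long expected_limit;  /* value recorded as legitimate */
};

/* Return true when the task's limit still matches the recorded one. */
static bool demo_addr_limit_ok(const struct demo_task *t)
{
    return t->addr_limit == t->expected_limit;
}
```

A mismatch at one of the enforcement points (a hook where the kernel is not expected to have changed the limit) would then be treated as corruption.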
Docker/containers require the 'overlay' or 'overlay2' module to support storage/scratch spaces. To correctly support a Docker environment, LKRG needs to hook the 'ovl_create_or_link' function from these modules. However, 'overlay[2]' may not be loaded yet during LKRG initialization because Docker/containers initialize late. In that case we would produce an FP. This commit removes the requirement to load the 'overlay[2]' module before loading LKRG; the necessary hooks are now added dynamically.
We want LKRG to be loaded after the Linux kernel has finished the majority of its initialization work. Otherwise, some kernel attributes that are critical from LKRG's perspective, like RO data, might still be modified dynamically without informing LKRG. This might produce an FP.
p_kzfree() wraps the kzfree() call on kernels < 5.10 and kfree_sensitive()
otherwise. This reflects the changes made in the kernel since
23224e45004ed84c8466fd1e8e5860f541187029 and fixes the build against
kernel 5.10.
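The version switch behind such a wrapper can be sketched in user space as follows. `DEMO_KERNEL_VERSION` mirrors the encoding traditionally used by the kernel's KERNEL_VERSION() macro, and `demo_use_kfree_sensitive()` is a hypothetical helper for illustration only.

```c
#include <stdbool.h>

/* Same encoding the kernel's KERNEL_VERSION() macro has traditionally used. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* True when the wrapper should call kfree_sensitive() rather than kzfree(). */
static bool demo_use_kfree_sensitive(int version_code)
{
    return version_code >= DEMO_KERNEL_VERSION(5, 10, 0);
}
```

In a kernel build, the same comparison would typically be made at preprocessing time using LINUX_VERSION_CODE and KERNEL_VERSION() from `<linux/version.h>`.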
P_LKRG_TASK_OFF_DEBUG introduces extra lines of code which were not taken into account for seccomp() and the namespace API. This commit fixes it. Additionally, we add extra information in case of corruption (dump_stack()).
This is a relatively heavy feature. It makes it possible to have a 'ring-buffer' for each tracked task in the kernel. Such a buffer keeps a history of important events (from LKRG's perspective) related to that task.
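A per-task history of this kind can be pictured as a fixed-size ring buffer that overwrites its oldest entry once full. The sketch below is a generic user-space illustration; the capacity, `struct demo_event`, `struct demo_ring` and both functions are hypothetical and are not taken from LKRG.

```c
#include <stddef.h>

#define DEMO_RING_SIZE 4   /* hypothetical capacity */

struct demo_event {
    int type;              /* hypothetical event identifier */
};

struct demo_ring {
    struct demo_event slot[DEMO_RING_SIZE];
    size_t head;           /* index of the next slot to write */
    size_t count;          /* number of valid entries, at most DEMO_RING_SIZE */
};

/* Record an event, overwriting the oldest entry once the buffer is full. */
static void demo_ring_push(struct demo_ring *r, int type)
{
    r->slot[r->head].type = type;
    r->head = (r->head + 1) % DEMO_RING_SIZE;
    if (r->count < DEMO_RING_SIZE)
        r->count++;
}

/* Return the i-th oldest retained event type, or -1 if i is out of range. */
static int demo_ring_get(const struct demo_ring *r, size_t i)
{
    size_t oldest;

    if (i >= r->count)
        return -1;
    oldest = (r->head + DEMO_RING_SIZE - r->count) % DEMO_RING_SIZE;
    return r->slot[(oldest + i) % DEMO_RING_SIZE].type;
}
```

With a capacity of 4, pushing six events keeps only the last four, which bounds the memory cost per tracked task.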
When we detect an invalid binary, we forbid its execution by overwriting the
subprocess execution path with slashes.
On some kernels this may not work, because we would have to map the memory
page as writable. At this point of execution, we can instead stop the process
by using force_sig to send a signal; a signal sent with send_sig_info would
not arrive in time.
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
We don't need to introduce a custom LKRG counter lock to synchronize with the JUMP_LABEL engine and avoid a potential deadlock with FTRACE. We can check whether the jump_label lock is taken after acquiring the ftrace lock and before taking text_mutex.
This simplification changes the p_text_section_(un)lock API.
This also fixes the problem reported by Jacek.
1) We are hooking into FTRACE's internal functions to be able to monitor when new modifications are executed and react accordingly.
2) Linux kernel has bugs in FTRACE code. The LKRG may highlight them.
3) We are introducing 'p_state_init' variable to track when full LKRG's initialization is complete.
1) This is necessary for future FTRACE support. FTRACE is not fully synchronized with JUMP_LABEL (which I think is buggy logic in the kernel). However, we can add such logic manually. The way text_mutex is used by both subsystems makes it prone to deadlock if a third system wants to sync with both of them.
2) The new lock enforces changes in the p_text_section_(un)lock API, which we make in the same commit
3) Introduce a new API for LKRG's counter lock - trylock
4) Add a few minor changes:
- notrace attribute (we probably need to add such attributes to the majority of our functions)
- add information about the module name in case of KMOD notifier activity
With the current design of JUMP_LABEL support, we do not need to take this mutex manually. Our hooks are deep enough to be protected, and the integrity routine depends on the text mutex.
Introduce a new SELinux lock type - p_lkrg_selinux_lock. The verification routine can take this lock only when the atomic counter is zero, which means there are no other consumers of the SELinux variables.
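The "take the lock only when the counter is zero" idea can be sketched with C11 atomics. This is a hypothetical user-space model, not p_lkrg_selinux_lock itself: the use of -1 to mark the verifier as holder, and every name below, are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter: >0 means active consumers, -1 means verifier holds it. */
struct demo_selinux_lock {
    atomic_int users;
};

/* A consumer of SELinux variables registers itself unless the verifier is in. */
static bool demo_consumer_enter(struct demo_selinux_lock *l)
{
    int cur = atomic_load(&l->users);

    while (cur >= 0) {
        /* On failure, 'cur' is refreshed with the current counter value. */
        if (atomic_compare_exchange_weak(&l->users, &cur, cur + 1))
            return true;
    }
    return false;
}

static void demo_consumer_exit(struct demo_selinux_lock *l)
{
    atomic_fetch_sub(&l->users, 1);
}

/* The verifier succeeds only when the counter is exactly zero. */
static bool demo_verifier_trylock(struct demo_selinux_lock *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->users, &expected, -1);
}

static void demo_verifier_unlock(struct demo_selinux_lock *l)
{
    atomic_store(&l->users, 0);
}
```

While any consumer is registered, `demo_verifier_trylock()` fails, so verification never observes SELinux variables that are in the middle of being used or updated.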
If we want to track all changes in the kernel core .text section and, when a modification happens, to know exactly which bytes were modified, at which offset, and which symbol they correlate to, we can compile LKRG with this feature. Disabled by default. It's mostly useful for debugging.
It's OK to remove this lock, since LKRG's .text section validation syncs with the kernel through the JL mutex and the .text mutex. There is one corner case where the kernel does not take these mutexes: a new kernel module might be compiled without long nops. In that case, the Linux kernel might modify the module's .text and 'inject' long nops where needed; however, this is done while the new module is in the UNFORMED state. UNFORMED modules can't be verified yet, so we are fine.
Due to kernel commit f3ac60671954c ("sched/headers: Move task-stack
related APIs from <linux/sched.h> to <linux/sched/task_stack.h>") (Linux
v4.11) `linux/sched/task_stack.h' should be included to access
`task_stack_page'.
Compilation failure is appearing on armv8l arch:
In file included from ./include/linux/prefetch.h:15,
from ./arch/arm/include/asm/atomic.h:12,
from ./include/linux/atomic.h:7,
from ./include/asm-generic/bitops/lock.h:5,
from ./arch/arm/include/asm/bitops.h:243,
from ./include/linux/bitops.h:26,
from ./include/linux/kernel.h:12,
from /usr/src/RPM/BUILD/lkrg-0.8.1/src/modules/exploit_detection/../../p_lkrg_main.h:23,
from /usr/src/RPM/BUILD/lkrg-0.8.1/src/modules/exploit_detection/p_exploit_detection.c:18:
/usr/src/RPM/BUILD/lkrg-0.8.1/src/modules/exploit_detection/p_exploit_detection.c: In function 'p_iterate_processes':
./arch/arm/include/asm/processor.h:99:40: error: implicit declaration of function 'task_stack_page'; did you mean 'walk_stackframe'? [-Werror=implicit-function-declaration]
99 | ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
| ^~~~~~~~~~~~~~~
/usr/src/RPM/BUILD/lkrg-0.8.1/src/modules/exploit_detection/p_exploit_detection.c:779:30: note: in expansion of macro 'task_pt_regs'
779 | p_regs_set_ip(task_pt_regs(p_tmp), -1);
| ^~~~~~~~~~~~
cc1: some warnings being treated as errors
make[1]: *** [scripts/Makefile.build:265: /usr/src/RPM/BUILD/lkrg-0.8.1/src/modules/exploit_detection/p_exploit_detection.o] Error 1
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
* Various spelling corrections by codespell 1.17.1
* Various grammar corrections
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Co-authored-by: Solar Designer <solar@openwall.com>
This fixes LKRG build on Linux 5.8+, which renamed that header file. Thanks to
Andy Lavr for reporting this problem and suggesting a (different) fix, which
made us revisit our use of that header file.
We only need that header file on older kernels (< 4.4.72 or < RHEL 7.4) for the
one use of md5_transform() in get_random_long(). On newer kernels, we simply
use the kernel-provided get_random_long(). Further, 5.8's crypto/sha.h doesn't
declare md5_transform() anyway (linux/cryptohash.h on much older kernels did).
- Not all hooks are fatal. If a non-fatal hook can't be placed for any reason, continue initialization and print an appropriate message
- If the hook is fatal, stop initialization
[2] Add support for ISRA optimized functions:
- Some functions might be optimized by ISRA. However, some of the hooks can still work even on ISRA-optimized functions.
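Compiler-cloned functions usually show up in the symbol table under the original name plus a suffix such as `.isra.<number>` or `.part.<number>`. A lookup that tolerates this could match names as in the sketch below; `demo_symbol_matches()` is a hypothetical helper, not LKRG's lookup code, and it deliberately ignores chained suffixes.

```c
#include <stdbool.h>
#include <string.h>
#include <ctype.h>

/*
 * Return true if 'sym' is 'base' itself or 'base' followed by a
 * compiler-generated suffix of the form ".<tag>.<digits>".
 */
static bool demo_symbol_matches(const char *sym, const char *base,
                                const char *tag)
{
    size_t blen = strlen(base);
    size_t tlen = strlen(tag);
    const char *p;

    if (strncmp(sym, base, blen) != 0)
        return false;
    p = sym + blen;
    if (*p == '\0')
        return true;              /* exact match, function was not cloned */
    if (*p != '.' || strncmp(p + 1, tag, tlen) != 0)
        return false;
    p += 1 + tlen;
    if (*p != '.' || !isdigit((unsigned char)p[1]))
        return false;
    for (p += 1; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return false;         /* trailing junk or a chained suffix */
    }
    return true;
}
```

For example, with base `demo_func` and tag `isra`, both `demo_func` and `demo_func.isra.12` match, while `demo_func_extra` does not.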
1) Introduce 'smap_validate' to control if SMAP validation will be performed
0 - disable SMAP validation
1 - enable SMAP validation
2) Introduce 'smap_enforce' to control how LKRG reacts when SMAP validation fails:
0 - log & accept
1 - log & restore
2 - panic() - kill the kernel
1) Introduce 'kint_validate' to control kernel/system integrity logic:
0 - disabled
1 - validation is performed only when manually triggered
3 - validation is performed periodically by timer interrupt and on random events
2) Introduce 'kint_enforce' to control how LKRG reacts when kernel/system integrity fails:
0 - log & accept corruption
1 - log only (for SELinux and CR0.WP violation log & restore original values)
2 - panic() - kill the kernel
3) Introduce 'pint_validate' to control tasks validation logic:
0 - disabled
1 - validate only currently running tasks
2 - validate only currently running tasks + task which changes state to RUNNING
3 - validate all tasks in the system (paranoid mode)
4) Introduce 'pint_enforce' to control how LKRG reacts when task validation fails:
0 - log & accept corruption
1 - kill corrupted task
2 - panic() - kill the kernel
5) Introduce 'smep_validate' to control if SMEP validation will be performed
0 - disable SMEP validation
1 - enable SMEP validation
6) Introduce 'smep_enforce' to control how LKRG reacts when SMEP validation fails:
0 - log & accept
1 - log & restore
2 - panic() - kill the kernel
7) Introduce 'umh_validate' to control if UMH validation will be performed
0 - disable UMH validation
1 - allow only whitelisted binaries to execute via UMH
2 - completely block UMH
8) Introduce 'umh_enforce' to control how LKRG reacts when UMH validation fails:
0 - log only
1 - prevent execution
2 - panic() - kill the kernel
9) Introduce 'pcfi_validate' to control if pCFI validation will be performed
0 - disabled
1 - no stackwalk (weak pCFI)
2 - fully enabled
10) Introduce 'pcfi_enforce' to control how LKRG reacts when pCFI validation fails:
0 - log only
1 - kill corrupted task
2 - panic() - kill the kernel
11) Rename 'timestamp' to 'interval'
12) Rename 'force_run' to 'trigger'
13) Rename 'clean_message' to 'heartbeat'
14) Rename 'msr_enforce' to 'msr_validate'
15) Option 'hide' stays the same
16) Option 'log_level' stays the same
17) Option 'block_modules' stays the same