Minimal supplement to upstream Kernel Self Protection Project changes. Features already provided by SELinux + Yama and archs other than multiarch arm64 / x86_64 aren't in scope. Only tags have stable history. Shared IRC channel with KSPP: irc.libera.chat #linux-hardening
Find a file
Gavin Shan e02dc017c3 mm/page_alloc: fix nodes for reclaim in fast path
When @node_reclaim_node isn't 0, the page allocator tries to reclaim
pages if the amount of free memory in the zones are below the low
watermark.  On Power platform, none of NUMA nodes are scanned for page
reclaim because no nodes match the condition in zone_allows_reclaim().
On Power platform, RECLAIM_DISTANCE is set to 10 which is the distance
of Node-A to Node-A.  So the preferred node even won't be scanned for
page reclaim.

   __alloc_pages_nodemask()
   get_page_from_freelist()
      zone_allows_reclaim()

Anton proposed the test code as below:

   # cat alloc.c
      :
   int main(int argc, char *argv[])
   {
	void *p;
	unsigned long size;
	unsigned long start, end;

	start = time(NULL);
	size = strtoul(argv[1], NULL, 0);
	printf("To allocate %ldGB memory\n", size);

	size <<= 30;
	p = malloc(size);
	assert(p);
	memset(p, 0, size);

	end = time(NULL);
	printf("Used time: %ld seconds\n", end - start);
	sleep(3600);
	return 0;
   }

The system I use for testing has two NUMA nodes.  Both have 128GB
memory.  In below scnario, the page caches on node#0 should be reclaimed
when it encounters pressure to accommodate request of allocation.

   # echo 2 > /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 > /proc/sys/vm/drop_caches; \
   # taskset -c 0 cat file.32G > /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619712 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619840 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:          186816 kB

With the patch applied, the pagecache on node-0 is reclaimed when its
free memory is running out.  It's the expected behaviour.

   # echo 2 > /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 > /proc/sys/vm/drop_caches
   # taskset -c 0 cat file.32G > /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33605568 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:        1379520 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:           317120 kB

Fixes: 5f7a75acdb ("mm: page_alloc: do not cache reclaim distances")
Link: http://lkml.kernel.org/r/1486532455-29613-1-git-send-email-gwshan@linux.vnet.ibm.com
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>	[3.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:56 -08:00
arch powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write 2017-02-24 17:46:56 -08:00
block block: use for_each_thread() in sys_ioprio_set()/sys_ioprio_get() 2017-02-22 16:41:27 -08:00
certs certs: Add a secondary system keyring that can be added to dynamically 2016-04-11 22:48:09 +01:00
crypto crypto: xts - Add ECB dependency 2017-02-23 20:11:06 +08:00
Documentation zram: extend zero pages to same element pages 2017-02-24 17:46:56 -08:00
drivers zram: extend zero pages to same element pages 2017-02-24 17:46:56 -08:00
firmware WHENCE: use https://linuxtv.org for LinuxTV URLs 2015-12-04 10:35:11 -02:00
fs userfaultfd_copy: return -ENOSPC in case mm has gone 2017-02-24 17:46:55 -08:00
include mm/ksm: handle protnone saved writes when making page write protect 2017-02-24 17:46:56 -08:00
init Merge branch 'akpm' (patches from Andrew) 2017-02-22 19:29:24 -08:00
ipc userfaultfd: non-cooperative: add event for memory unmaps 2017-02-24 17:46:55 -08:00
kernel userfaultfd: non-cooperative: add event for exit() notification 2017-02-24 17:46:55 -08:00
lib Less anger inducing pull request for 4.11 2017-02-23 18:58:18 -08:00
mm mm/page_alloc: fix nodes for reclaim in fast path 2017-02-24 17:46:56 -08:00
net bpf: Fix bpf_xdp_event_output 2017-02-23 13:53:42 -05:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
scripts Merge branch 'akpm' (patches from Andrew) 2017-02-22 19:29:24 -08:00
security mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
sound mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
tools userfaultfd: non-cooperative: selftest: enable REMOVE event test for shmem 2017-02-24 17:46:54 -08:00
usr kbuild: initramfs cleanup, set target from Kconfig 2017-01-05 09:40:16 -08:00
virt mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:48:52 -04:00
.mailmap mailmap: add codeaurora.org names for nameless email commits 2017-01-10 18:31:55 -08:00
COPYING
CREDITS MAINTAINERS: Remove old e-mail address 2017-02-13 12:24:56 -05:00
Kbuild scripts/gdb: provide linux constants 2016-05-23 17:04:14 -07:00
Kconfig
MAINTAINERS Less anger inducing pull request for 4.11 2017-02-23 18:58:18 -08:00
Makefile A slightly quieter cycle for documentation this time around. 2017-02-22 18:51:29 -08:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.