Instead of passing the name of the memory cgroup which the cache is
created for in the memcg_name_argument, let's obtain it immediately in
memcg_create_kmem_cache.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are simple wrappers around memcg_{charge,uncharge}_kmem, so let's
zap them and call these functions directly.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the freeing page and its buddy page are not at the same zone, the
current holding zone->lock for the freeing page cann't prevent buddy page
getting allocated, this could trigger VM_BUG_ON_PAGE in page_is_buddy() at
a very tiny chance, such as:
cpu 0: cpu 1:
hold zone_1 lock
check page and it buddy
PageBuddy(buddy) is true hold zone_2 lock
page_order(buddy) == order is true alloc buddy
trigger VM_BUG_ON_PAGE(page_count(buddy) != 0)
zone_1->lock prevents the freeing page getting allocated
zone_2->lock prevents the buddy page getting allocated
they are not the same zone->lock.
If we can't remove the zone_id check statement, it's better handle this
rare race. This patch fixes this by placing the zone_id check before the
VM_BUG_ON_PAGE check.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ed4d4902eb ("mm, hugetlb: remove hugetlb_zero and
hugetlb_infinity") replaced 'unsigned long hugetlb_zero' with 'int zero'
leading to out-of-bounds access in proc_doulongvec_minmax(). Use
'.extra1 = NULL' instead of '.extra1 = &zero'. Passing NULL is
equivalent to passing minimal value, which is 0 for unsigned types.
Fixes: ed4d4902eb ("mm, hugetlb: remove hugetlb_zero and hugetlb_infinity")
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Whether there is a vm_ops->page_mkwrite or not, the page dirtying is
pretty much the same. Make sure the page references are the same in both
cases, then merge the two branches.
It's tempting to go even further and page-lock the !page_mkwrite case, to
get it in line with everybody else setting the page table and thus further
simplify the model. But that's not quite compelling enough to justify
dropping the pte lock, then relocking and verifying the entry for
filesystems without ->page_mkwrite, which notably includes tmpfs. Leave
it for now and lock the page late in the !page_mkwrite case.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shared anonymous mmaps are implemented with shmem files, so all VMAs with
shared writable semantics also have an underlying backing file.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increases the number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also adjust __SWP_TYPE_SHIFT, effectively increase size of
possible swap file to 128G.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also adjust __SWP_TYPE_SHIFT and increase number of bits
availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One bit in ->vm_flags is unused now!
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After removing vma->shared.nonlinear we have only one member of
vma->shared union, which doesn't make much sense.
This patch drops the union and move struct vma->shared.linear to
vma->shared.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't create non-linear mappings anymore. Let's drop code which
handles them in rmap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have to handle non-linear mappings for /proc/PID/{smaps,clear_refs}
which is unused now. Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't create non-linear mappings anymore. Let's drop code which
handles them on page fault.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have remap_file_pages(2) emulation in -mm tree for few release cycles
and we plan to have it mainline in v3.20. This patchset removes rest of
VM_NONLINEAR infrastructure.
Patches 1-8 take care about generic code. They are pretty
straight-forward and can be applied without other of patches.
Rest patches removes pte_file()-related stuff from architecture-specific
code. It usually frees up one bit in non-present pte. I've tried to reuse
that bit for swap offset, where I was able to figure out how to do that.
For obvious reason I cannot test all that arch-specific code and would
like to see acks from maintainers.
In total, remap_file_pages(2) required about 1.4K lines of not-so-trivial
kernel code. That's too much for functionality nobody uses.
Tested-by: Felipe Balbi <balbi@ti.com>
This patch (of 38):
We don't create non-linear mappings anymore. Let's drop code which
handles them on unmap/zap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.
Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.
Let's drop it and get rid of all these special-cased code.
The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.
I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on. I've checked Debian code search and source of all
packages in ALT Linux. No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.
There are few basic tests in LTP for the syscall. They work just fine
with emulation.
To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.
The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.
Before: 23.3 ( +- 4.31% ) seconds
After: 43.9 ( +- 0.85% ) seconds
Slowdown: 1.88x
I believe we can live with that.
Test case:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define MB (1024UL * 1024)
#define SIZE (4096 * MB)
int main(int argc, char **argv)
{
unsigned long *p;
long i, pass;
for (pass = 0; pass < 10; pass++) {
p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
for (i = 0; i < SIZE / 4096; i++) {
if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
perror("remap_file_pages");
return -1;
}
}
for (i = SIZE / 4096 - 1; i >= 0; i--)
assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
munmap(p, SIZE);
}
return 0;
}
[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_COMPACTION=y, CONFIG_DEBUG_FS=n:
mm/vmstat.c:690: warning: 'frag_start' defined but not used
mm/vmstat.c:702: warning: 'frag_next' defined but not used
mm/vmstat.c:710: warning: 'frag_stop' defined but not used
mm/vmstat.c:715: warning: 'walk_zones_in_node' defined but not used
It's all a bit of a tangly mess and it's unclear why CONFIG_COMPACTION
figures in there at all. Move frag_start/frag_next/frag_stop and
migratetype_names[] into the existing CONFIG_PROC_FS block.
walk_zones_in_node() gets a special ifdef.
Also move the #include lines up to where #include lines live.
[axel.lin@ingics.com: fix build error when !CONFIG_PROC_FS]
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here, free memory is allocated using kmem_cache_zalloc. So, use
kmem_cache_free instead of kfree.
This is done using Coccinelle and semantic patch used
is as follows:
@@
expression x,E,c;
@@
x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
... when != x = E
when != &x
?-kfree(x)
+kmem_cache_free(c,x)
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
compound_head() is implemented with assumption that there would be race
condition when checking tail flag. This assumption is only true when we
try to access arbitrary positioned struct page.
The situation that virt_to_head_page() is called is different case. We
call virt_to_head_page() only in the range of allocated pages, so there
is no race condition on tail flag. In this case, we don't need to
handle race condition and we can reduce overhead slightly. This patch
implements compound_head_fast() which is similar with compound_head()
except tail flag race handling. And then, virt_to_head_page() uses this
optimized function to improve performance.
I saw 1.8% win in a fast-path loop over kmem_cache_alloc/free, (14.063
ns -> 13.810 ns) if target object is on tail page.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We had to insert a preempt enable/disable in the fastpath a while ago in
order to guarantee that tid and kmem_cache_cpu are retrieved on the same
cpu. It is the problem only for CONFIG_PREEMPT in which scheduler can
move the process to other cpu during retrieving data.
Now, I reach the solution to remove preempt enable/disable in the
fastpath. If tid is matched with kmem_cache_cpu's tid after tid and
kmem_cache_cpu are retrieved by separate this_cpu operation, it means
that they are retrieved on the same cpu. If not matched, we just have
to retry it.
With this guarantee, preemption enable/disable isn't need at all even if
CONFIG_PREEMPT, so this patch removes it.
I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free in
CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
Below is the result of Christoph's slab_test reported by Jesper Dangaard
Brouer.
* Before
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 49 cycles kfree -> 62 cycles
10000 times kmalloc(16) -> 48 cycles kfree -> 64 cycles
10000 times kmalloc(32) -> 53 cycles kfree -> 70 cycles
10000 times kmalloc(64) -> 64 cycles kfree -> 77 cycles
10000 times kmalloc(128) -> 74 cycles kfree -> 84 cycles
10000 times kmalloc(256) -> 84 cycles kfree -> 114 cycles
10000 times kmalloc(512) -> 83 cycles kfree -> 116 cycles
10000 times kmalloc(1024) -> 81 cycles kfree -> 120 cycles
10000 times kmalloc(2048) -> 104 cycles kfree -> 136 cycles
10000 times kmalloc(4096) -> 142 cycles kfree -> 165 cycles
10000 times kmalloc(8192) -> 238 cycles kfree -> 226 cycles
10000 times kmalloc(16384) -> 403 cycles kfree -> 264 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 68 cycles
10000 times kmalloc(16)/kfree -> 68 cycles
10000 times kmalloc(32)/kfree -> 69 cycles
10000 times kmalloc(64)/kfree -> 68 cycles
10000 times kmalloc(128)/kfree -> 68 cycles
10000 times kmalloc(256)/kfree -> 68 cycles
10000 times kmalloc(512)/kfree -> 74 cycles
10000 times kmalloc(1024)/kfree -> 75 cycles
10000 times kmalloc(2048)/kfree -> 74 cycles
10000 times kmalloc(4096)/kfree -> 74 cycles
10000 times kmalloc(8192)/kfree -> 75 cycles
10000 times kmalloc(16384)/kfree -> 510 cycles
* After
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 46 cycles kfree -> 61 cycles
10000 times kmalloc(16) -> 46 cycles kfree -> 63 cycles
10000 times kmalloc(32) -> 49 cycles kfree -> 69 cycles
10000 times kmalloc(64) -> 57 cycles kfree -> 76 cycles
10000 times kmalloc(128) -> 66 cycles kfree -> 83 cycles
10000 times kmalloc(256) -> 84 cycles kfree -> 110 cycles
10000 times kmalloc(512) -> 77 cycles kfree -> 114 cycles
10000 times kmalloc(1024) -> 80 cycles kfree -> 116 cycles
10000 times kmalloc(2048) -> 102 cycles kfree -> 131 cycles
10000 times kmalloc(4096) -> 135 cycles kfree -> 163 cycles
10000 times kmalloc(8192) -> 238 cycles kfree -> 218 cycles
10000 times kmalloc(16384) -> 399 cycles kfree -> 262 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 65 cycles
10000 times kmalloc(16)/kfree -> 66 cycles
10000 times kmalloc(32)/kfree -> 65 cycles
10000 times kmalloc(64)/kfree -> 66 cycles
10000 times kmalloc(128)/kfree -> 66 cycles
10000 times kmalloc(256)/kfree -> 71 cycles
10000 times kmalloc(512)/kfree -> 72 cycles
10000 times kmalloc(1024)/kfree -> 71 cycles
10000 times kmalloc(2048)/kfree -> 71 cycles
10000 times kmalloc(4096)/kfree -> 71 cycles
10000 times kmalloc(8192)/kfree -> 65 cycles
10000 times kmalloc(16384)/kfree -> 511 cycles
Most of the results are better than before.
Note that this change slightly worses performance in !CONFIG_PREEMPT,
roughly 0.3%. Implementing each case separately would help performance,
but, since it's so marginal, I didn't do that. This would help
maintanance since we have same code for all cases.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>