c4cc6d07b2
For three years swapin_readahead has been cluttered with fanciful CONFIG_NUMA code, advancing addr, and stepping on to the next vma at the boundary, to line up the mempolicy for each page allocation. It _might_ be a good idea to allocate swap more according to vma layout; but the fact is, that's not how we do it at all, 2.6 even less than 2.4: swap is allocated as needed for pages as they sink to the bottom of the inactive LRUs. Sometimes that may match vma layout, but not so often that it's worth going to these misleading vma->vm_next lengths: rip all that out. Originally I intended to retain the incrementation of addr, but correct its initial value: valid_swaphandles generally supplies an offset below the target addr (this is readaround rather than readahead), but addr has not been adjusted accordingly, so in the interleave case it has usually been allocating the target page from the "wrong" node (though that may not matter very much). But look at the equivalent shmem_swapin code: either by oversight or by design, though it has all the apparatus for choosing a new mempolicy per page, it uses the same idx throughout, choosing the same mempolicy and interleave node for each page of the cluster. Which is actually a much better strategy: each node has its own LRUs and its own kswapd, so if you're betting on any particular relationship between swap and node, the best bet is that nearby swap entries belong to pages from the same node - even when the mempolicy of the target page is to interleave. And examining a map of nodes corresponding to swap entries on a numa=fake system bears this out. (We could later tweak swap allocation to make it even more likely, but this patch is merely about removing cruft.) So, neither adjust nor increment addr in swapin_readahead, and then shmem_swapin can use it too; the pseudo-vma to pass policy need only be set up once per cluster, and so few fields of pvma are used, let's skip the memset - from shmem_alloc_page also. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
---|---|---|
.. | ||
allocpercpu.c | ||
backing-dev.c | ||
bootmem.c | ||
bounce.c | ||
fadvise.c | ||
filemap.c | ||
filemap_xip.c | ||
fremap.c | ||
highmem.c | ||
hugetlb.c | ||
internal.h | ||
Kconfig | ||
madvise.c | ||
Makefile | ||
memory.c | ||
memory_hotplug.c | ||
mempolicy.c | ||
mempool.c | ||
migrate.c | ||
mincore.c | ||
mlock.c | ||
mmap.c | ||
mmzone.c | ||
mprotect.c | ||
mremap.c | ||
msync.c | ||
nommu.c | ||
oom_kill.c | ||
page-writeback.c | ||
page_alloc.c | ||
page_io.c | ||
page_isolation.c | ||
pdflush.c | ||
prio_tree.c | ||
quicklist.c | ||
readahead.c | ||
rmap.c | ||
shmem.c | ||
shmem_acl.c | ||
slab.c | ||
slob.c | ||
slub.c | ||
sparse-vmemmap.c | ||
sparse.c | ||
swap.c | ||
swap_state.c | ||
swapfile.c | ||
thrash.c | ||
tiny-shmem.c | ||
truncate.c | ||
util.c | ||
vmalloc.c | ||
vmscan.c | ||
vmstat.c |