Until now, we have allowed migration only for LRU pages, and that was enough to create high-order pages. Recently, however, embedded systems (e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory), so we have seen several reports of trouble with small high-order allocations. There have been several efforts to fix the problem (e.g., enhancing the compaction algorithm, SLUB fallback to order-0 pages, reserved memory, vmalloc and so on), but if there are lots of non-movable pages in the system, those solutions are void in the long run.

So, this patch adds a facility to turn non-movable pages into movable ones. For this feature, the patch introduces migration-related functions in address_space_operations as well as some page flags.

If a driver wants to make its own pages movable, it should define three functions, which are function pointers of struct address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What the VM expects from the driver's isolate_page function is to return *true* if the driver isolates the page successfully. On returning true, the VM marks the page as PG_isolated so that concurrent isolation on several CPUs skips the page. If a driver cannot isolate the page, it should return *false*.

Once a page is successfully isolated, the VM uses the page.lru fields, so the driver shouldn't expect the values in those fields to be preserved.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, the VM calls the driver's migratepage with the isolated page. The job of migratepage is to move the content of the old page to the new page and set up the fields of struct page newpage. Keep in mind that you should indicate to the VM that the oldpage is no longer movable via __ClearPageMovable() under page_lock if you migrated the oldpage successfully and return 0. If the driver cannot migrate the page at the moment, it can return -EAGAIN.
On -EAGAIN, the VM will retry page migration after a short time because it interprets -EAGAIN as a "temporary migration failure". On any error other than -EAGAIN, the VM gives up the page migration without retrying this time.

The driver shouldn't touch the page.lru field, which the VM is using, in these functions.

3. void (*putback_page)(struct page *);

If migration fails on an isolated page, the VM should return the isolated page to the driver, so it calls the driver's putback_page with the page that failed to migrate. In this function, the driver should put the isolated page back into its own data structure.

4. non-lru movable page flags

There are two page flags for supporting non-lru movable pages.

* PG_movable

The driver should use the function below to make a page movable under page_lock.

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It takes an address_space argument for registering the family of migration functions that will be called by the VM. Strictly speaking, PG_movable is not a real flag of struct page. Rather, the VM reuses the lower bits of page->mapping to represent it.

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so the driver shouldn't access page->mapping directly. Instead, the driver should use page_mapping, which masks off the low two bits of page->mapping so that it gets the right struct address_space.

For testing for a non-lru movable page, the VM supports the __PageMovable function. However, it doesn't guarantee to identify a non-lru movable page because the page->mapping field is unified with other variables in struct page. Also, if the driver releases the page after isolation by the VM, page->mapping doesn't have a stable value although it has PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable is a cheap way to tell whether a page is LRU or non-lru movable once the page has been isolated, because LRU pages can never have PAGE_MAPPING_MOVABLE in page->mapping.
It is also good for just peeking to test for non-lru movable pages before the more expensive check with lock_page during pfn scanning to select a victim.

To guarantee a non-lru movable page, the VM provides the PageMovable function. Unlike __PageMovable, PageMovable validates page->mapping and mapping->a_ops->isolate_page under lock_page. The lock_page prevents page->mapping from being destroyed suddenly.

A driver using __SetPageMovable should clear the flag via __ClearPageMovable under page_lock before releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, the VM marks an isolated page as PG_isolated under lock_page. So if a CPU encounters a PG_isolated non-lru movable page, it can skip it. The driver doesn't need to manipulate the flag because the VM sets/clears it automatically. Keep in mind that if the driver sees a PG_isolated page, it means the page has been isolated by the VM, so it shouldn't touch the page.lru field. PG_isolated is an alias of the PG_reclaim flag, so the driver shouldn't use that flag for its own purposes.

[opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
  Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
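The isolate/migrate/putback contract described in the commit message can be sketched in user space. This is only a minimal illustration under loud assumptions: `struct mock_page`, `struct mock_aops`, `busy_rounds`, the `drv_*` callbacks and `mock_vm_migrate()` are all hypothetical stand-ins invented here, not kernel types or functions, and locking (lock_page) is left out entirely. The sketch shows the VM side retrying on -EAGAIN and handing the page back through putback_page when migration does not succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct page; NOT the kernel layout. */
struct mock_page {
	char data[16];
	bool movable;		/* stands in for PAGE_MAPPING_MOVABLE */
	bool isolated;		/* stands in for PG_isolated */
	bool in_driver_list;	/* the driver's own bookkeeping */
};

/* Hypothetical, simplified version of the three callbacks. */
struct mock_aops {
	bool (*isolate_page)(struct mock_page *page);
	int (*migratepage)(struct mock_page *newpage, struct mock_page *oldpage);
	void (*putback_page)(struct mock_page *page);
};

/* Number of upcoming migratepage calls that will report -EAGAIN. */
static int busy_rounds;

static bool drv_isolate(struct mock_page *page)
{
	if (!page->in_driver_list)
		return false;		/* nothing to isolate */
	page->in_driver_list = false;	/* detach from the driver's list */
	return true;
}

static int drv_migrate(struct mock_page *newpage, struct mock_page *oldpage)
{
	if (busy_rounds > 0) {
		busy_rounds--;
		return -EAGAIN;		/* temporary failure, caller may retry */
	}
	memcpy(newpage->data, oldpage->data, sizeof(newpage->data));
	newpage->movable = true;	/* new page takes over the movable role */
	newpage->in_driver_list = true;
	oldpage->movable = false;	/* mirrors __ClearPageMovable(oldpage) */
	return 0;
}

static void drv_putback(struct mock_page *page)
{
	page->in_driver_list = true;	/* back into the driver's list */
}

const struct mock_aops drv_aops = {
	.isolate_page = drv_isolate,
	.migratepage = drv_migrate,
	.putback_page = drv_putback,
};

/* Mock of the VM side: isolate, migrate with bounded retries, put back. */
int mock_vm_migrate(const struct mock_aops *a_ops, struct mock_page *oldpage,
		    struct mock_page *newpage, int max_retries)
{
	int rc = -EAGAIN;
	int i;

	assert(a_ops && oldpage && newpage);
	if (!oldpage->movable || oldpage->isolated)
		return -EBUSY;		/* not movable, or another CPU has it */
	if (!a_ops->isolate_page(oldpage))
		return -EBUSY;
	oldpage->isolated = true;

	for (i = 0; i < max_retries; i++) {
		rc = a_ops->migratepage(newpage, oldpage);
		if (rc != -EAGAIN)
			break;		/* success, or an error not worth retrying */
	}

	oldpage->isolated = false;
	if (rc != 0)
		a_ops->putback_page(oldpage);	/* migration failed */
	return rc;
}
```

A caller would set up one movable page that sits on the driver's list, call `mock_vm_migrate(&drv_aops, &oldpage, &newpage, retries)`, and check the return value: 0 means the content now lives in `newpage`, anything else means `oldpage` was handed back to the driver unchanged.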
247 lines
6.6 KiB
C
#ifndef _LINUX_COMPACTION_H
#define _LINUX_COMPACTION_H

/* Return values for compact_zone() and try_to_compact_pages() */
/* When adding new states, please adjust include/trace/events/compaction.h */
enum compact_result {
	/* For more detailed tracepoint output - internal to compaction */
	COMPACT_NOT_SUITABLE_ZONE,
	/*
	 * compaction didn't start as it was not possible or direct reclaim
	 * was more suitable
	 */
	COMPACT_SKIPPED,
	/* compaction didn't start as it was deferred due to past failures */
	COMPACT_DEFERRED,

	/* compaction not active last round */
	COMPACT_INACTIVE = COMPACT_DEFERRED,

	/* For more detailed tracepoint output - internal to compaction */
	COMPACT_NO_SUITABLE_PAGE,
	/* compaction should continue to another pageblock */
	COMPACT_CONTINUE,

	/*
	 * The full zone was scanned but compaction did not succeed in
	 * compacting suitable pages.
	 */
	COMPACT_COMPLETE,
	/*
	 * direct compaction has scanned part of the zone but did not succeed
	 * in compacting suitable pages.
	 */
	COMPACT_PARTIAL_SKIPPED,

	/* compaction terminated prematurely due to lock contentions */
	COMPACT_CONTENDED,

	/*
	 * direct compaction partially compacted a zone and there might be
	 * suitable pages
	 */
	COMPACT_PARTIAL,
};

/* Used to signal whether compaction detected need_sched() or lock contention */
/* No contention detected */
#define COMPACT_CONTENDED_NONE	0
/* Either need_sched() was true or fatal signal pending */
#define COMPACT_CONTENDED_SCHED	1
/* Zone lock or lru_lock was contended in async compaction */
#define COMPACT_CONTENDED_LOCK	2

struct alloc_context; /* in mm/internal.h */

#ifdef CONFIG_COMPACTION
extern int PageMovable(struct page *page);
extern void __SetPageMovable(struct page *page, struct address_space *mapping);
extern void __ClearPageMovable(struct page *page);
extern int sysctl_compact_memory;
extern int sysctl_compaction_handler(struct ctl_table *table, int write,
			void __user *buffer, size_t *length, loff_t *ppos);
extern int sysctl_extfrag_threshold;
extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
			void __user *buffer, size_t *length, loff_t *ppos);
extern int sysctl_compact_unevictable_allowed;

extern int fragmentation_index(struct zone *zone, unsigned int order);
extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
			unsigned int order,
		unsigned int alloc_flags, const struct alloc_context *ac,
		enum migrate_mode mode, int *contended);
extern void compact_pgdat(pg_data_t *pgdat, int order);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern enum compact_result compaction_suitable(struct zone *zone, int order,
		unsigned int alloc_flags, int classzone_idx);

extern void defer_compaction(struct zone *zone, int order);
extern bool compaction_deferred(struct zone *zone, int order);
extern void compaction_defer_reset(struct zone *zone, int order,
				bool alloc_success);
extern bool compaction_restarting(struct zone *zone, int order);

/* Compaction has made some progress and retrying makes sense */
static inline bool compaction_made_progress(enum compact_result result)
{
	/*
	 * Even though this might sound confusing this in fact tells us
	 * that the compaction successfully isolated and migrated some
	 * pageblocks.
	 */
	if (result == COMPACT_PARTIAL)
		return true;

	return false;
}

/* Compaction has failed and it doesn't make much sense to keep retrying. */
static inline bool compaction_failed(enum compact_result result)
{
	/* All zones were scanned completely and still no result. */
	if (result == COMPACT_COMPLETE)
		return true;

	return false;
}

/*
 * Compaction has backed off for some reason. It might be throttling or
 * lock contention. Retrying is still worthwhile.
 */
static inline bool compaction_withdrawn(enum compact_result result)
{
	/*
	 * Compaction backed off due to watermark checks for order-0
	 * so the regular reclaim has to try harder and reclaim something.
	 */
	if (result == COMPACT_SKIPPED)
		return true;

	/*
	 * If compaction is deferred for high-order allocations, it is
	 * because sync compaction recently failed. If this is the case
	 * and the caller requested a THP allocation, we do not want
	 * to heavily disrupt the system, so we fail the allocation
	 * instead of entering direct reclaim.
	 */
	if (result == COMPACT_DEFERRED)
		return true;

	/*
	 * If compaction in async mode encounters contention or blocks higher
	 * priority task we back off early rather than cause stalls.
	 */
	if (result == COMPACT_CONTENDED)
		return true;

	/*
	 * Page scanners have met but we haven't scanned full zones so this
	 * is a back off in fact.
	 */
	if (result == COMPACT_PARTIAL_SKIPPED)
		return true;

	return false;
}


bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
					int alloc_flags);

extern int kcompactd_run(int nid);
extern void kcompactd_stop(int nid);
extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);

#else
static inline int PageMovable(struct page *page)
{
	return 0;
}
static inline void __SetPageMovable(struct page *page,
			struct address_space *mapping)
{
}

static inline void __ClearPageMovable(struct page *page)
{
}

static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
			unsigned int order, int alloc_flags,
			const struct alloc_context *ac,
			enum migrate_mode mode, int *contended)
{
	return COMPACT_CONTINUE;
}

static inline void compact_pgdat(pg_data_t *pgdat, int order)
{
}

static inline void reset_isolation_suitable(pg_data_t *pgdat)
{
}

static inline enum compact_result compaction_suitable(struct zone *zone, int order,
					int alloc_flags, int classzone_idx)
{
	return COMPACT_SKIPPED;
}

static inline void defer_compaction(struct zone *zone, int order)
{
}

static inline bool compaction_deferred(struct zone *zone, int order)
{
	return true;
}

static inline bool compaction_made_progress(enum compact_result result)
{
	return false;
}

static inline bool compaction_failed(enum compact_result result)
{
	return false;
}

static inline bool compaction_withdrawn(enum compact_result result)
{
	return true;
}

static inline int kcompactd_run(int nid)
{
	return 0;
}
static inline void kcompactd_stop(int nid)
{
}

static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx)
{
}

#endif /* CONFIG_COMPACTION */

#if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
struct node;
extern int compaction_register_node(struct node *node);
extern void compaction_unregister_node(struct node *node);

#else

static inline int compaction_register_node(struct node *node)
{
	return 0;
}

static inline void compaction_unregister_node(struct node *node)
{
}
#endif /* CONFIG_COMPACTION && CONFIG_SYSFS && CONFIG_NUMA */

#endif /* _LINUX_COMPACTION_H */
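
The three result helpers in the CONFIG_COMPACTION branch of the header above (compaction_made_progress, compaction_failed, compaction_withdrawn) partition the compact_result values into "retry", "give up" and "backed off". As a rough illustration of how a caller might combine them, the following user-space sketch copies the enum and the helper logic from the header and adds a classifier. `enum retry_hint` and `classify_compact_result()` are hypothetical names invented here and are not part of the kernel; the real allocator's retry policy takes more inputs than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the header above so the sketch builds outside the kernel. */
enum compact_result {
	COMPACT_NOT_SUITABLE_ZONE,
	COMPACT_SKIPPED,
	COMPACT_DEFERRED,
	COMPACT_INACTIVE = COMPACT_DEFERRED,
	COMPACT_NO_SUITABLE_PAGE,
	COMPACT_CONTINUE,
	COMPACT_COMPLETE,
	COMPACT_PARTIAL_SKIPPED,
	COMPACT_CONTENDED,
	COMPACT_PARTIAL,
};

/* Same logic as the header's helpers, written compactly. */
static inline bool compaction_made_progress(enum compact_result result)
{
	return result == COMPACT_PARTIAL;
}

static inline bool compaction_failed(enum compact_result result)
{
	return result == COMPACT_COMPLETE;
}

static inline bool compaction_withdrawn(enum compact_result result)
{
	return result == COMPACT_SKIPPED ||
	       result == COMPACT_DEFERRED ||
	       result == COMPACT_CONTENDED ||
	       result == COMPACT_PARTIAL_SKIPPED;
}

/* Hypothetical classification, for illustration only. */
enum retry_hint {
	RETRY_GIVE_UP,		/* whole zone scanned without result */
	RETRY_AFTER_PROGRESS,	/* some pageblocks were migrated */
	RETRY_AFTER_BACKOFF,	/* compaction backed off early */
	RETRY_UNDECIDED,	/* none of the helpers matched */
};

enum retry_hint classify_compact_result(enum compact_result result)
{
	/* Reject values beyond the last enumerator. */
	assert(result <= COMPACT_PARTIAL);

	if (compaction_made_progress(result))
		return RETRY_AFTER_PROGRESS;
	if (compaction_failed(result))
		return RETRY_GIVE_UP;
	if (compaction_withdrawn(result))
		return RETRY_AFTER_BACKOFF;
	return RETRY_UNDECIDED;
}
```

Note that COMPACT_INACTIVE shares its value with COMPACT_DEFERRED, so in this sketch it is classified as a back-off as well.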