Move the fallback arch_vma_name() to a sensible place (kernel/signal.c).
Currently it's in fs/proc/task_mmu.c, a file that is dependent on both
CONFIG_PROC_FS and CONFIG_MMU being enabled, but it's used from
kernel/signal.c from where it is called unconditionally.
[akpm@osdl.org: build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement /proc/pid/maps for NOMMU by reading the vm_area_list attached to
current->mm->context.vmlist.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set the backing device info capabilities for /dev/mem and /dev/kmem to
permit direct sharing under no-MMU conditions and full mapping capabilities
under MMU conditions. Make the BDI used by these available to all directly
mappable character devices.
Also comment the capabilities for /dev/zero.
[akpm@osdl.org: ifdef reductions]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The function is exported but not used from anywhere else. It's also marked as
"not for driver use" so noone out there should really care.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement do_no_pfn() for handling mapping of memory without a struct page
backing it. This avoids creating fake page table entries for regions which
are not backed by real memory.
This feature is used by the MSPEC driver and other users, where it is
highly undesirable to have a struct page sitting behind the page (for
instance if the page is accessed in cached mode via the struct page in
parallel to the the driver accessing it uncached, which can result in data
corruption on some architectures, such as ia64).
This version uses specific NOPFN_{SIGBUS,OOM} return values, rather than
expect all negative pfn values would be an error. It also bugs on cow
mappings as this would not work with the VM.
[akpm@osdl.org: micro-optimise]
Signed-off-by: Jes Sorensen <jes@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the node in order to optimize zone_to_nid.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GFP_THISNODE must be set to 0 in the non numa case otherwise we disable retry
and warnings for failing allocations in the SMP and UP case.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NUMA_BUILD constant is always available and will be set to 1 on
NUMA_BUILDs. That way checks valid only under CONFIG_NUMA can easily be done
without #ifdef CONFIG_NUMA
F.e.
if (NUMA_BUILD && <numa_condition>) {
...
}
[akpm: not a thing we'd normally do, but CONFIG_NUMA is special: it is
causing ifdef explosion in core kernel, so let's see if this is a comfortable
way in whcih to control that]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This moves the definition of struct page from mm.h to its own header file
page-struct.h. This is a prereq to fix SetPageUptodate which is broken on
s390:
#define SetPageUptodate(_page)
do {
struct page *__page = (_page);
if (!test_and_set_bit(PG_uptodate, &__page->flags))
page_test_and_clear_dirty(_page);
} while (0)
_page gets used twice in this macro which can cause subtle bugs. Using
__page for the page_test_and_clear_dirty call doesn't work since it causes
yet another problem with the page_test_and_clear_dirty macro as well.
In order to avoid all these problems caused by macros it seems to be a good
idea to get rid of them and convert them to static inline functions.
Because of header file include order it's necessary to have a seperate
header file for the struct page definition.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The VM is supposed to minimise the number of pages which get written off the
LRU (for IO scheduling efficiency, and for high reclaim-success rates). But
we don't actually have a clear way of showing how true this is.
So add `nr_vmscan_write' to /proc/vmstat and /proc/zoneinfo - the number of
pages which have been written by the vm scanner in this zone and globally.
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arch-independent zone-sizing determines the size of a node
(pgdat->node_spanned_pages) based on the physical memory that was
registered by the architecture. However, when
CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the
spanned_pages will be much larger and that mem_map will be allocated that
is used lated on memory hot-add.
This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE
to call push_node_boundaries() which will set the node beginning and end to
at *least* the requested boundary.
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The x86_64 code accounted for memmap and some portions of the the DMA zone as
holes. This was because those areas would never be reclaimed and accounting
for them as memory affects min watermarks. This patch will account for the
memmap as a memory hole. Architectures may optionally use set_dma_reserve()
if they wish to account for a portion of memory in ZONE_DMA as a hole.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At a basic level, architectures define structures to record where active
ranges of page frames are located. Once located, the code to calculate zone
sizes and holes in each architecture is very similar. Some of this zone and
hole sizing code is difficult to read for no good reason. This set of patches
eliminates the similar-looking architecture-specific code.
The patches introduce a mechanism where architectures register where the
active ranges of page frames are with add_active_range(). When all areas have
been discovered, free_area_init_nodes() is called to initialise the pgdat and
zones. The zone sizes and holes are then calculated in an architecture
independent manner.
Patch 1 introduces the mechanism for registering and initialising PFN ranges
Patch 2 changes ppc to use the mechanism - 139 arch-specific LOC removed
Patch 3 changes x86 to use the mechanism - 136 arch-specific LOC removed
Patch 4 changes x86_64 to use the mechanism - 74 arch-specific LOC removed
Patch 5 changes ia64 to use the mechanism - 52 arch-specific LOC removed
Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable.
It adjusts the watermarks slightly
Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
gensparse_defconfig and defconfig. Bob Picco has also tested and debugged on
IA64. Jack Steiner successfully boot tested on a mammoth SGI IA64-based
machine. These were on patches against 2.6.17-rc1 and release 3 of these
patches but there have been no ia64-changes since release 3.
There are differences in the zone sizes for x86_64 as the arch-specific code
for x86_64 accounts the kernel image and the starting mem_maps as memory holes
but the architecture-independent code accounts the memory as present.
The big benefit of this set of patches is a sizable reduction of
architecture-specific code, some of which is very hairy. There should be a
greater reduction when other architectures use the same mechanisms for zone
and hole sizing but I lack the hardware to test on.
Additional credit;
Dave Hansen for the initial suggestion and comments on early patches
Andy Whitcroft for reviewing early versions and catching numerous
errors
Tony Luck for testing and debugging on IA64
Bob Picco for fixing bugs related to pfn registration, reviewing a
number of patch revisions, providing a number of suggestions
on future direction and testing heavily
Jack Steiner and Robin Holt for testing on IA64 and clarifying
issues related to memory holes
Yasunori for testing on IA64
Andi Kleen for reviewing and feeding back about x86_64
Christian Kujau for providing valuable information related to ACPI
problems on x86_64 and testing potential fixes
This patch:
Define the structure to represent an active range of page frames within a node
in an architecture independent manner. Architectures are expected to register
active ranges of PFNs using add_active_range(nid, start_pfn, end_pfn) and call
free_area_init_nodes() passing the PFNs of the end of each zone.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
un-, de-, -free, -destroy, -exit, etc functions should in general return
void. Also,
There is very little, say, filesystem driver code can do upon failed
kmem_cache_destroy(). If it will be decided to BUG in this case, BUG
should be put in generic code, instead.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are a few places I've found in jbd that look like they may not be
16T-safe, or consistent with the use of unsigned longs for block
containers. Problems here would be somewhat hard to hit, would require
journal blocks past the 8T boundary, which would not be terribly common.
Still, should fix.
(some of these have come from the ext4 work on jbd as well).
I think there's one more possibility that the wrap() function may not be
safe IF your last block in the journal butts right up against the 232 block
boundary, but that seems like a VERY remote possibility, and I'm not
worrying about it at this point.
Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove whitespace from ext3 and jbd, before we clone ext4.
Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6: (30 commits)
i2c: Drop unimplemented slave functions
i2c: Constify i2c_algorithm declarations, part 2
i2c: Constify i2c_algorithm declarations, part 1
i2c: Let drivers constify i2c_algorithm data
i2c-isa: Restore driver owner
i2c-viapro: Add support for the VT8237A and VT8251
i2c: Warn on i2c client creation failure
i2c-core: Drop useless bitmaskings
i2c-algo-pcf: Discard the mdelay data struct member
i2c-algo-bit: Cleanups
i2c-isa: Fail adding driver on attach_adapter error
i2c: __must_check fixes (chip drivers)
i2c-dev: attach/detach_adapter cleanups
i2c-stub: Chip address as a module parameter
i2c: Plan i2c-isa for removal
i2c: New bus driver for TI OMAP boards
i2c-algo-bit: Discard the mdelay data struct member
i2c-matroxfb: Struct init conversion
i2c: Fix copy-n-paste in subsystem Kconfig
i2c-au1550: Add I2C support for Au1200
...
This patch adds pci_stop_bus_device() which stops a PCI device (detach
the driver, remove from the global list and so on) and any children.
This is needed for ACPI based PCI-to-PCI bridge hot-remove, and it will
be also needed for ACPI based PCI root bridge hot-remove.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are numerous drivers that can use multithreaded probing but having
some kind of global flag as the way to control this makes migration to
threaded probing hard and since it enables it everywhere and is almost
as likely to cause serious pain as holding a clog dance in a minefield.
If we have a pci_driver multithread_probe flag to inherit you can turn
it on for one driver at a time.
From playing so far however I think we need a different model at the
device layer which serializes until the called probe function says "ok
you can start another one now". That would need some kind of flag and
semaphore plus a helper function.
Anyway in the absence of that this is a starting point to usefully play
with this stuff
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patch 3 implements the core part of PCI-Express AER and aerdrv
port service driver.
When a root port service device is probed, the aerdrv will call
request_irq to register irq handler for AER error interrupt.
When a device sends an PCI-Express error message to the root port,
the root port will trigger an interrupt, by either MSI or IO-APIC,
then kernel would run the irq handler. The handler collects root
error status register and schedules a work. The work will call
the core part to process the error based on its type
(Correctable/non-fatal/fatal).
As for Correctable errors, the patch chooses to just clear the correctable
error status register of the device.
As for the non-fatal error, the patch follows generic PCI error handler
rules to call the error callback functions of the endpoint's driver. If
the device is a bridge, the patch chooses to broadcast the error to
downstream devices.
As for the fatal error, the patch resets the pci-express link and
follows generic PCI error handler rules to call the error callback
functions of the endpoint's driver. If the device is a bridge, the patch
chooses to broadcast the error to downstream devices.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Introduce msi_ht_cap_enabled() to check the MSI capability in the
Hypertransport configuration space.
It is used in a generic quirk quirk_msi_ht_cap() to check whether
MSI is enabled on hypertransport chipset, and a nVidia specific quirk
quirk_nvidia_ck804_msi_ht_cap() where two 2 HT MSI mappings have to
be checked.
Both quirks set the PCI_BUS_FLAGS_NO_MSI bus flag when MSI is disabled.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
0x08 is the HT capability, while PCI_CAP_ID_HT_IRQCONF would be
the subtype 0x80 that mpic_scan_ht_pic() uses.
Rename PCI_CAP_ID_HT_IRQCONF into PCI_CAP_ID_HT.
And by the way, use it in the ipath driver instead of defining its
own HT_CAPABILITY_ID.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c: Drop unimplemented slave functions
Drop the function declarations for slave mode support of i2c adapters.
This was never implemented, and by the time it is I bet we will want
something different anyway.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c: Let drivers constify i2c_algorithm data
Let drivers constify I2C algorithm method operations tables,
moving them from ".data" to ".rodata".
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c-algo-pcf: Discard the mdelay data struct member
Just as i2c-algo-bit, i2c-algo-pcf has an unused mdelay struct member,
which we can get rid of to spare some code and memory.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c-algo-bit: Discard the mdelay data struct member
The i2c_algo_bit_data structure has an mdelay member, which is not
used by the algorithm code (the code has always been ifdef'd out.)
Let's discard it to save some code and memory.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c-algo-sibyte: Merge into i2c-sibyte
Merge i2c-algo-sibyte into i2c-sibyte, as this is a complete,
hardware-dependent SMBus implementation and not a reusable algorithm.
Perform some basic coding style cleanups while we're here (mainly
space-based indentation replaced by tabulations.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch enables building of individual WAN protocol support
routines (parts of generic HDLC) as separate modules.
All protocol-private definitions are moved from hdlc.h file
to protocol drivers. User-space interface and interface
between generic HDLC and underlying low-level HDLC drivers
are unchanged.
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (47 commits)
Driver core: Don't call put methods while holding a spinlock
Driver core: Remove unneeded routines from driver core
Driver core: Fix potential deadlock in driver core
PCI: enable driver multi-threaded probe
Driver Core: add ability for drivers to do a threaded probe
sysfs: add proper sysfs_init() prototype
drivers/base: check errors
drivers/base: Platform notify needs to occur before drivers attach to the device
v4l-dev2: handle __must_check
add CONFIG_ENABLE_MUST_CHECK
add __must_check to device management code
Driver core: fixed add_bind_files() definition
Driver core: fix comments in drivers/base/power/resume.c
sysfs_remove_bin_file: no return value, dump_stack on error
kobject: must_check fixes
Driver core: add ability for devices to create and remove bin files
Class: add support for class interfaces for devices
Driver core: create devices/virtual/ tree
Driver core: add device_rename function
Driver core: add ability for classes to handle devices properly
...
Add the pm_trace attribute in /sys/power which has to be explicitly set to
one to really enable the "PM tracing" code compiled in when CONFIG_PM_TRACE
is set (which modifies the machine's CMOS clock in unpredictable ways).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change suspend_console() so that it waits for all consoles to flush the
remaining messages and make it possible to switch the console suspending off
with the help of a Kconfig option.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Stefan Seyfried <seife@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp use memory bitmaps to store its internal information during the
resume phase of the suspend-resume cycle.
If the pfns of saveable pages are saved during the suspend phase instead of
the kernel virtual addresses of these pages, we can use them during the resume
phase directly to set the corresponding bits in a memory bitmap. Then, this
bitmap is used to mark the page frames corresponding to the pages that were
saveable before the suspend (aka "unsafe" page frames).
Next, we allocate as many page frames as needed to store the entire suspend
image and make sure that there will be some extra free "safe" page frames for
the list of PBEs constructed later. Subsequently, the image is loaded and, if
possible, the data loaded from it are written into their "original" page
frames (ie. the ones they had occupied before the suspend).
The image data that cannot be written into their "original" page frames are
loaded into "safe" page frames and their "original" kernel virtual addresses,
as well as the addresses of the "safe" pages containing their copies, are
stored in a list of PBEs. Finally, the list of PBEs is used to copy the
remaining image data into their "original" page frames (this is done
atomically, by the architecture-dependent parts of swsusp).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove some things that are no longer used or defined elsewhere from suspend.h
and make the inline version of software_suspend() return the right error code.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current suspend code has to be run on one CPU, so we use the CPU
hotplug to take the non-boot CPUs offline on SMP machines. However, we
should also make sure that these CPUs will not be enabled by someone else
after we have disabled them.
The functions disable_nonboot_cpus() and enable_nonboot_cpus() are moved to
kernel/cpu.c, because they now refer to some stuff in there that should
better be static. Also it's better if disable_nonboot_cpus() returns an
error instead of panicking if something goes wrong, and
enable_nonboot_cpus() has no reason to panic(), because the CPUs may have
been enabled by the userland before it tries to take them online.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement async reads for swsusp resuming.
Crufty old PIII testbox:
15.7 MB/s -> 20.3 MB/s
Sony Vaio:
14.6 MB/s -> 33.3 MB/s
I didn't implement the post-resume bio_set_pages_dirty(). I don't really
understand why resume needs to run set_page_dirty() against these pages.
It might be a worry that this code modifies PG_Uptodate, PG_Error and
PG_Locked against the image pages. Can this possibly affect the resumed-into
kernel? Hopefully not, if we're atomically restoring its mem_map?
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jens Axboe <axboe@suse.de>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Switch the swsusp writeout code from 4k-at-a-time to 4MB-at-a-time.
Crufty old PIII testbox:
12.9 MB/s -> 20.9 MB/s
Sony Vaio:
14.7 MB/s -> 26.5 MB/s
The implementation is crude. A better one would use larger BIOs, but wouldn't
gain any performance.
The memcpys will be mostly pipelined with the IO and basically come for free.
The ENOMEM path has not been tested. It should be.
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the DIV_ROUND_UP() helper macro: divide `n' by `d', rounding up.
Stolen from the gfs2 tree(!) because the swsusp patches need it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I've come across some problems with the assembly version of the ELFNOTE
macro currently in -mm. (in
x86-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch)
The first is that older gas does not support :varargs in .macro
definitions (in my testing 2.17 does while 2.15 does not, I don't know
when it became supported). The Changes file says binutils >= 2.12 so I
think we need to avoid using it. There are no other uses in mainline or
-mm. Old gas appears to just ignore it so you get "too many arguments"
type errors.
Secondly it seems that passing strings as arguments to assembler macros
is broken without varargs. It looks like they get unquoted or each
character is treated as a separate argument or something and this causes
all manner of grief. I think this is because of the use of -traditional
when compiling assembly files.
Therefore I have translated the assembler macro into a pre-processor
macro.
I added the desctype as a separate argument instead of including it with
the descdata as the previous version did since -traditional means the
ELFNOTE definition after the #else needs to have the same number of
arguments (I think so anyway, the -traditional CPP semantics are pretty
fscking strange!).
With this patch I am able to define elfnotes in assembly like this with
both old and new assemblers.
ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "linux")
ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "2.6")
ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .long, __PAGE_OFFSET)
Which seems reasonable enough.
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch will pack any .note.* section into a PT_NOTE segment in the output
file.
To do this, we tell ld that we need a PT_NOTE segment. This requires us to
start explicitly mapping sections to segments, so we also need to explicitly
create PT_LOAD segments for text and data, and map the sections to them
appropriately. Fortunately, each section will default to its previous
section's segment, so it doesn't take many changes to vmlinux.lds.S.
This only changes i386 for now, but I presume the corresponding changes for
other architectures will be as simple.
This change also adds <linux/elfnote.h>, which defines C and Assembler macros
for actually creating ELF notes.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds support for the Atmel AVR32 architecture as well as the AT32AP7000
CPU and the AT32STK1000 development board.
AVR32 is a new high-performance 32-bit RISC microprocessor core, designed for
cost-sensitive embedded applications, with particular emphasis on low power
consumption and high code density. The AVR32 architecture is not binary
compatible with earlier 8-bit AVR architectures.
The AVR32 architecture, including the instruction set, is described by the
AVR32 Architecture Manual, available from
http://www.atmel.com/dyn/resources/prod_documents/doc32000.pdf
The Atmel AT32AP7000 is the first CPU implementing the AVR32 architecture. It
features a 7-stage pipeline, 16KB instruction and data caches and a full
Memory Management Unit. It also comes with a large set of integrated
peripherals, many of which are shared with the AT91 ARM-based controllers from
Atmel.
Full data sheet is available from
http://www.atmel.com/dyn/resources/prod_documents/doc32003.pdf
while the CPU core implementation including caches and MMU is documented by
the AVR32 AP Technical Reference, available from
http://www.atmel.com/dyn/resources/prod_documents/doc32001.pdf
Information about the AT32STK1000 development board can be found at
http://www.atmel.com/dyn/products/tools_card.asp?tool_id=3918
including a BSP CD image with an earlier version of this patch, development
tools (binaries and source/patches) and a root filesystem image suitable for
booting from SD card.
Alternatively, there's a preliminary "getting started" guide available at
http://avr32linux.org/twiki/bin/view/Main/GettingStarted which provides links
to the sources and patches you will need in order to set up a cross-compiling
environment for avr32-linux.
This patch, as well as the other patches included with the BSP and the
toolchain patches, is actively supported by Atmel Corporation.
[dmccr@us.ibm.com: Fix more pxx_page macro locations]
[bunk@stusta.de: fix `make defconfig']
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Permit __do_IRQ() to be dispensed with based on a configuration option.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace ctxid with sid in selinux_audit_rule_match interface for
consistency with other interfaces.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rename selinux_ctxid_to_string to selinux_sid_to_string to be
consistent with other interfaces.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate selinux_task_ctxid since it duplicates selinux_task_get_sid.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are many places where we need to determine the node of a zone.
Currently we use a difficult to read sequence of pointer dereferencing.
Put that into an inline function and use throughout VM. Maybe we can find
a way to optimize the lookup in the future.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently one can enable slab reclaim by setting an explicit option in
/proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final
option if the freeing of unmapped file backed pages is not enough to free
enough pages to allow a local allocation.
However, that means that the slab can grow excessively and that most memory
of a node may be used by slabs. We have had a case where a machine with
46GB of memory was using 40-42GB for slab. Zone reclaim was effective in
dealing with pagecache pages. However, slab reclaim was only done during
global reclaim (which is a bit rare on NUMA systems).
This patch implements slab reclaim during zone reclaim. Zone reclaim
occurs if there is a danger of an off node allocation. At that point we
1. Shrink the per node page cache if the number of pagecache
pages is more than min_unmapped_ratio percent of pages in a zone.
2. Shrink the slab cache if the number of the nodes reclaimable slab pages
(patch depends on earlier one that implements that counter)
are more than min_slab_ratio (a new /proc/sys/vm tunable).
The shrinking of the slab cache is a bit problematic since it is not node
specific. So we simply calculate what point in the slab we want to reach
(current per node slab use minus the number of pages that neeed to be
allocated) and then repeately run the global reclaim until that is
unsuccessful or we have reached the limit. I hope we will have zone based
slab reclaim at some point which will make that easier.
The default for the min_slab_ratio is 5%
Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
[akpm@osdl.org: cleanups]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the atomic counter for slab_reclaim_pages and replace the counter
and NR_SLAB with two ZVC counter that account for unreclaimable and
reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.
Change the check in vmscan.c to refer to to NR_SLAB_RECLAIMABLE. The
intend seems to be to check for slab pages that could be freed.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*_pages is a better description of the role of the variable.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In many places we will need to use the same combination of flags. Specify
a single GFP_THISNODE definition for ease of use in gfp.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This
flag is essential if a kernel component requires memory to be located on a
certain node. It will be needed for alloc_pages_node() to force allocation
on the indicated node and for alloc_pages() to force allocation on the
current node.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Let's try to keep mm/ comments more useful and up to date. This is a start.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lock_page needs the caller to have a reference on the page->mapping inode
due to sync_page, ergo set_page_dirty_lock is obviously buggy according to
its comments.
Solve it by introducing a new lock_page_nosync which does not do a sync_page.
akpm: unpleasant solution to an unpleasant problem. If it goes wrong it could
cause great slowdowns while the lock_page() caller waits for kblockd to
perform the unplug. And if a filesystem has special sync_page() requirements
(none presently do), permanent hangs are possible.
otoh, set_page_dirty_lock() is usually (always?) called against userspace
pages. They are always up-to-date, so there shouldn't be any pending read I/O
against these pages.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch splits alloc_percpu() up into two phases. Likewise for
free_percpu(). This allows clients to limit initial allocations to online
cpu's, and to populate or depopulate per-cpu data at run time as needed:
struct my_struct *obj;
/* initial allocation for online cpu's */
obj = percpu_alloc(sizeof(struct my_struct), GFP_KERNEL);
...
/* populate per-cpu data for cpu coming online */
ptr = percpu_populate(obj, sizeof(struct my_struct), GFP_KERNEL, cpu);
...
/* access per-cpu object */
ptr = percpu_ptr(obj, smp_processor_id());
...
/* depopulate per-cpu data for cpu going offline */
percpu_depopulate(obj, cpu);
...
/* final removal */
percpu_free(obj);
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a notifer chain to the out of memory killer. If one of the registered
callbacks could release some memory, do not kill the process but return and
retry the allocation that forced the oom killer to run.
The purpose of the notifier is to add a safety net in the presence of
memory ballooners. If the resource manager inflated the balloon to a size
where memory allocations can not be satisfied anymore, it is better to
deflate the balloon a bit instead of killing processes.
The implementation for the s390 ballooner is included.
[akpm@osdl.org: cleanups]
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I wonder why we need this bitmask indexing into zone->node_zonelists[]?
We always start with the highest zone and then include all lower zones
if we build zonelists.
Are there really cases where we need allocation from ZONE_DMA or
ZONE_HIGHMEM but not ZONE_NORMAL? It seems that the current implementation
of highest_zone() makes that already impossible.
If we go linear on the index then gfp_zone() == highest_zone() and a lot
of definitions fall by the wayside.
We can now revert back to the use of gfp_zone() in mempolicy.c ;-)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After we have done this we can now do some typing cleanup.
The memory policy layer keeps a policy_zone that specifies
the zone that gets memory policies applied. This variable
can now be of type enum zone_type.
The check_highest_zone function and the build_zonelists funnctionm must
then also take a enum zone_type parameter.
Plus there are a number of loops over zones that also should use
zone_type.
We run into some troubles at some points with functions that need a
zone_type variable to become -1. Fix that up.
[pj@sgi.com: fix set_mempolicy() crash]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a check in zonelist_policy that compares pieces of the bitmap
obtained from a gfp mask via GFP_ZONETYPES with a zone number in function
zonelist_policy().
The bitmap is an ORed mask of __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM.
The policy_zone is a zone number with the possible values of ZONE_DMA,
ZONE_DMA32, ZONE_HIGHMEM and ZONE_NORMAL. These are two different domains
of values.
For some reason seemed to work before the zone reduction patchset (It
definitely works on SGI boxes since we just have one zone and the check
cannot fail).
With the zone reduction patchset this check definitely fails on systems
with two zones if the system actually has memory in both zones.
This is because ZONE_NORMAL is selected using no __GFP flag at
all and thus gfp_zone(gfpmask) == 0. ZONE_DMA is selected when __GFP_DMA
is set. __GFP_DMA is 0x01. So gfp_zone(gfpmask) == 1.
policy_zone is set to ZONE_NORMAL (==1) if ZONE_NORMAL and ZONE_DMA are
populated.
For ZONE_NORMAL gfp_zone(<no _GFP_DMA>) yields 0 which is <
policy_zone(ZONE_NORMAL) and so policy is not applied to regular memory
allocations!
Instead gfp_zone(__GFP_DMA) == 1 which results in policy being applied
to DMA allocations!
What we realy want in that place is to establish the highest allowable
zone for a given gfp_mask. If the highest zone is higher or equal to the
policy_zone then memory policies need to be applied. We have such
a highest_zone() function in page_alloc.c.
So move the highest_zone() function from mm/page_alloc.c into
include/linux/gfp.h. On the way we simplify the function and use the new
zone_type that was also introduced with the zone reduction patchset plus we
also specify the right type for the gfp flags parameter.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
eventcounters: Do not display counters for zones that are not available on an
arch
Do not define or display counters for the DMA32 and the HIGHMEM zone if such
zones were not configured.
[akpm@osdl.org: s390 fix]
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make ZONE_HIGHMEM optional
- ifdef out code and definitions related to CONFIG_HIGHMEM
- __GFP_HIGHMEM falls back to normal allocations if there is no
ZONE_HIGHMEM
- GFP_ZONEMASK becomes 0x01 if there is no DMA32 and no HIGHMEM
zone.
[jdike@addtoit.com: build fix]
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make ZONE_DMA32 optional
- Add #ifdefs around ZONE_DMA32 specific code and definitions.
- Add CONFIG_ZONE_DMA32 config option and use that for x86_64
that alone needs this zone.
- Remove the use of CONFIG_DMA_IS_DMA32 and CONFIG_DMA_IS_NORMAL
for ia64 and fix up the way per node ZVCs are calculated.
- Fall back to prior GFP_ZONEMASK of 0x03 if there is no
DMA32 zone.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use enum for zones and reformat zones dependent information
Add comments explaning the use of zones and add a zones_t type for zone
numbers.
Line up information that will be #ifdefd by the following patches.
[akpm@osdl.org: comment cleanups]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move totalhigh_pages and nr_free_highpages() into highmem.c/.h
Move the totalhigh_pages definition into highmem.c/.h. Move the
nr_free_highpages function into highmem.c
[yoichi_yuasa@tripeaks.co.jp: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It fixes various coding style issues, specially when spaces are useless. For
example '*' go next to the function name.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__init in headers is pretty useless because the compiler doesn't check it, and
they get out of sync relatively frequently. So if you see an __init in a
header file, it's quite unreliable and you need to check the definition
anyway.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the following needlessly global functions static:
- slab.c: kmem_find_general_cachep()
- swap.c: __page_cache_release()
- vmalloc.c: __vmalloc_node()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that we can detect writers of shared mappings, throttle them. Avoids OOM
by surprise.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tracking of dirty pages in shared writeable mmap()s.
The idea is simple: write protect clean shared writeable pages, catch the
write-fault, make writeable and set dirty. On page write-back clean all the
PTE dirty bits and write protect them once again.
The implementation is a tad harder, mainly because the default
backing_dev_info capabilities were too loosely maintained. Hence it is not
enough to test the backing_dev_info for cap_account_dirty.
The current heuristic is as follows, a VMA is eligible when:
- its shared writeable
(vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
- it is not a 'special' mapping
(vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
- the backing_dev_info is cap_account_dirty
mapping_cap_account_dirty(vma->vm_file->f_mapping)
- f_op->mmap() didn't change the default page protection
Page from remap_pfn_range() are explicitly excluded because their COW
semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
because they don't have a backing store anyway.
mprotect() is taught about the new behaviour as well. However it overrides
the last condition.
Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
It can be called on any page, but is currently only implemented for mapped
pages, if the page is found the be of a VMA that accounts dirty pages it will
also wrprotect the PTE.
Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
under ->private_lock. This seems to be safe, since ->private_lock is used to
serialize access to the buffers, not the page itself. This is needed because
clear_page_dirty() will call into page_mkclean() and would thereby violate
locking order.
[dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM. Use this
in the lightweight, inline refcounting functions; PageLRU and PageActive
checks in vmscan, because they're pretty well confined to vmscan. And in
page allocate/free fastpaths which can be the hottest parts of the kernel
for kbuilds.
Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with
side-effects, and should not be used outside core mm code.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Give non-highmem architectures access to the kmap API for the purposes of
overriding (this is what the attached patch does).
The proposal is that we should now require all architectures with coherence
issues to manage data coherence via the kmap/kunmap API. Thus driver
writers never have to write code like
kmap(page)
modify data in page
flush_kernel_dcache_page(page)
kunmap(page)
instead, kmap/kunmap will manage the coherence and driver (and filesystem)
writers don't need to worry about how to flush between kmap and kunmap.
For most architectures, the page only needs to be flushed if it was
actually written to *and* there are user mappings of it, so the best
implementation looks to be: clear the page dirty pte bit in the kernel page
tables on kmap and on kunmap, check page->mappings for user maps, and then
the dirty bit, and only flush if it both has user mappings and is dirty.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_cpu_var()/per_cpu()/__get_cpu_var() arguments must be simple
identifiers. Otherwise the arch dependent implementations might break.
This patch enforces the correct usage of the macros by producing a syntax
error if the variable is not a simple identifier.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current time_before/time_after macros will fail typechecks
when passed u64 values (as returned by get_jiffies_64()). On 64bit
systems, this will just result in a warning about mismatching types
without explicit casts, but since unsigned long and u64
(unsigned long long) are of same size, it will still work.
On 32bit systems, a long is 32bits, so the value from get_jiffies_64()
will be truncated by the cast and thus lose all the precision gained by
64bit jiffies.
Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds the per thread cookie field to the task struct and the PDA.
Also it makes sure that the PDA value gets the new cookie value at context
switch, and that a new task gets a new cookie at task creation time.
Signed-off-by: Arjan van Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
The EDD code would scan the command line as a fixed array, without
taking account of either whitespace, null-termination, the old
command-line protocol, late overrides early, or the fact that the
command line may not be reachable from INITSEG.
This should fix those problems, and enable us to use a longer command
line.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Apparently IA64 needs it, but i386/x86-64 don't anymore
since gcc 2.95 support was dropped. Nobody else on linux-arch
requested keeping it generically
Cc: tony.luck@intel.com
Cc: kaos@sgi.com
Signed-off-by: Andi Kleen <ak@suse.de>
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily. This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time). However for very frequent FPU users... you
take an extra trap every context switch.
The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch. If the app indeed uses the FPU, the
trap is avoided. (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).
After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again). The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.
[akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.
AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.
+From: Jeremy Fitzhardinge <jeremy@goop.org>
KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.
Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup. Worse, any subsequent .fixup code gets intermingled with
the code its supposed to be fixing (which is also in .fixup). It's
surprising this didn't cause more havok.
The fix is to use .pushsection/.popsection, so this stuff nests
properly. A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.
+From: Chuck Ebbert <76306.1226@compuserve.com>
Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.
Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
For NUMA optimization and some other algorithms it is useful to have a fast
to get the current CPU and node numbers in user space.
x86-64 added a fast way to do this in a vsyscall. This adds a generic
syscall for other architectures to make it a generic portable facility.
I expect some of them will also implement it as a faster vsyscall.
The cache is an optimization for the x86-64 vsyscall optimization. Since
what the syscall returns is an approximation anyways and user space
often wants very fast results it can be cached for some time. The norma
methods to get this information in user space are relatively slow
The vsyscall is in a better position to manage the cache because it has direct
access to a fast time stamp (jiffies). For the generic syscall optimization
it doesn't help much, but enforce a valid argument to keep programs
portable
I only added an i386 syscall entry for now. Other architectures can follow
as needed.
AK: Also added some cleanups from Andrew Morton
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds a vgetcpu vsyscall, which depending on the CPU RDTSCP
capability uses either the RDTSCP or CPUID to obtain a CPU and node
numbers and pass them to the program.
AK: Lots of changes over Vojtech's original code:
Better prototype for vgetcpu()
It's better to pass the cpu / node numbers as separate arguments
to avoid mistakes when going from SMP to NUMA.
Also add a fast time stamp based cache using a user supplied
argument to speed things more up.
Use fast method from Chuck Ebbert to retrieve node/cpu from
GDT limit instead of CPUID
Made sure RDTSCP init is always executed after node is known.
Drop printk
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
To quote Alan Cox:
The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propogated.
A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.
This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
watchdog.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
There is a potential deadlock in the driver core. It boils down to
the fact that bus_remove_device() calls klist_remove() instead of
klist_del(), thereby waiting until the reference count of the
klist_node in the bus's klist of devices drops to 0. The refcount
can't reach 0 so long as a modprobe process is trying to bind a new
driver to the device being removed, by calling __driver_attach(). The
problem is that __driver_attach() tries to acquire the device's
parent's semaphore, but the caller of bus_remove_device() is quite
likely to own that semaphore already.
It isn't sufficient just to replace klist_remove() with klist_del().
Doing so runs the risk that the device would remain on the bus's klist
of devices for some time, and so could be bound to another driver even
after it was unregistered. What's needed is a new way to distinguish
whether or not a device is registered, based on a criterion other than
whether its klist_node is linked into the bus's klist of devices. That
way driver binding can fail when the device is unregistered, even if
it is still linked into the klist.
This patch (as782) implements the solution, by adding a new bitflag to
indiate when a struct device is registered, by testing the flag before
allowing a driver to bind a device, and by changing the definition of
the device_is_registered() inline.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds the infrastructure for drivers to do a threaded probe, and
waits at init time for all currently outstanding probes to complete.
A new kernel thread will be created when the probe() function for the
driver is called, if the multithread_probe bit is set in the driver
saying it can support this kind of operation.
I have tested this with USB and PCI, and it works, and shaves off a lot
of time in the boot process, but there are issues with finding root boot
disks, and some USB drivers assume that this can never happen, so it is
currently not enabled for any bus type. Individual drivers can enable
this right now if they wish, and bus authors can selectivly turn it on
as well, once they determine that their subsystem will work properly
with it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't be crufty. Mark it __must_check too.
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Those 1500 warnings can be a bit of a pain. Add a config option to shut them
up.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We're getting a lot of crashes in the sysfs/kobject/device/bus/class code and
they're very hard to diagnose.
I'm suspecting that in some cases this is because drivers aren't checking
return values and aren't handling errors correctly. So the code blithely
blunders on and crashes later in very obscure ways.
There's just no reason to ignore errors which can and do occur. So the patch
sprinkles __must_check all over these APIs.
Causes 1,513 new warnings. Heh.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make sysfs_remove_bin_file() void. If it detects an error,
printk the file name and call dump_stack().
sysfs_hash_and_remove() now returns an error code indicating
its success or failure so that sysfs_remove_bin_file() can
know success/failure.
Convert the only driver that checked the return value of
sysfs_remove_bin_file().
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Makes it easier for devices to create and remove binary attribute files
so they don't have to call directly into sysfs. This is needed to help
with the conversion from struct class_device to struct device.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When moving class_device usage over to device, we need to handle
class_interfaces properly with devices. This patch adds that support.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This change creates a devices/virtual/CLASS_NAME tree for struct devices
that belong to a class, yet do not have a "real" struct device for a
parent. It automatically creates the directories on the fly as needed.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds two new callbacks to the class structure:
int (*dev_uevent)(struct device *dev, char **envp, int num_envp,
char *buffer, int buffer_size);
void (*dev_release)(struct device *dev);
And one pointer:
struct device_attribute * dev_attrs;
which all corrispond with the same thing as the "normal" class devices
do, yet this is for when a struct device is bound to a class.
Someday soon, struct class_device will go away, and then the other
fields in this structure can be removed too. But this is necessary in
order to get the transition to work properly.
Tested out on a network core patch that converted it to use struct
device instead of struct class_device.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is needed for the network class devices in order to be able to
convert over to use struct device.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Teach platform_bus about the new suspend_late/resume_early PM calls,
issued with IRQs off. Do we really need sysdev and friends any more,
or can janitors start switching its users over to platform_device so
we can do a minor code-ectomy?
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Remove the new suspend_prepare() phase. It doesn't seem very usable,
has never been tested, doesn't address fault cleanup, and would need
a sibling resume_complete(); plus there are no real use cases. It
could be restored later if those issues get resolved.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds a new pm_message_t event type to use when preparing to restore a
swsusp snapshot. Devices that have been initialized by Linux after resume
(rather than left in power-up-reset state) may need to be reset; this new
event type give drivers the chance to do that.
The drivers that will care about this are those which understand more hardware
states than just "on" and "reset", relying on hardware state during resume()
methods to be either the state left by the preceding suspend(), or a
power-lost reset. The best current example of this class of drivers are USB
host controller drivers, which currently do not work through swsusp when
they're statically linked.
When the swsusp freeze/thaw mechanism kicks in, a troublesome third state
could exist: one state set up by a different kernel instance, before a
snapshot image is resumed. This mechanism lets drivers prevent that state.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changes the PCI core to use the new suspend infrastructure changes.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allow devices to participate in the suspend process more intimately,
in particular, allow the final phase (with interrupts disabled) to
also be open to normal devices, not just system devices.
Also, allow classes to participate in device suspend.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NetLabel]: update docs with website information
[NetLabel]: rework the Netlink attribute handling (part 2)
[NetLabel]: rework the Netlink attribute handling (part 1)
[Netlink]: add nla_validate_nested()
[NETLINK]: add nla_for_each_nested() to the interface list
[NetLabel]: change the SELinux permissions
[NetLabel]: make the CIPSOv4 cache spinlocks bottom half safe
[NetLabel]: correct improper handling of non-NetLabel peer contexts
[TCP]: make cubic the default
[TCP]: default congestion control menu
[ATM] he: Fix __init/__devinit conflict
[NETFILTER]: Add dscp,DSCP headers to header-y
[DCCP]: Introduce dccp_probe
[DCCP]: Use constants for CCIDs
[DCCP]: Introduce constants for CCID numbers
[DCCP]: Allow default/fallback service code.
Add logic to check ARP request / reply packets used for ARP
monitor link integrity checking.
The current method simply examines the slave device to see if it
has sent and received traffic; this can be fooled by extraneous traffic.
For example, if multiple hosts running bonding are behind a common
switch, the probe traffic from the multiple instances of bonding will
update the tx/rx times on each other's slave devices.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add priv_flag to specifically identify bonding-involved devices. Needed
because IFF_MASTER is an unreliable identifier (vlan interfaces above bonding
will inherit IFF_MASTER). Misidentification of devices would cause
notifier events for other devices to be erroneously processed by bonding,
causing various havoc.
Bug discovered by Martin Papik <martin.papik@ipsec.info>; this patch is
modified from his original.
Signed-off-by: Martin Papik <martin.papik@ipsec.info>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This is version 21 of the Wireless Extensions. Changelog :
o finishes migrating the ESSID API (remove the +1)
o netdev->get_wireless_stats is no more
o long/short retry
This is a redacted version of a patch originally submitted by Jean
Tourrilhes. I removed most of the additions, in order to minimize
future support requirements for nl80211 (or other WE successor).
CC: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
They all contain the same thing. Instead, have a single generic one in
include/asm-generic, and permit an arch to override as needed.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds xt_dscp.h and xt_DSCP.h to the kernel headers which are
exported via 'make headers_install'. These are necessary for userspace
to add rules using dscp match and DSCP target.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (28 commits)
ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback
ocfs2: Remove ->unblock lockres operation
ocfs2: move downconvert worker to lockres ops
ocfs2: Remove unused dlmglue functions
ocfs2: Have the metadata lock use generic dlmglue functions
ocfs2: Add ->set_lvb callback in dlmglue
ocfs2: Add ->check_downconvert callback in dlmglue
ocfs2: Check for refreshing locks in generic unblock function
ocfs2: don't unconditionally pass LVB flags
ocfs2: combine inode and generic blocking AST functions
ocfs2: Add ->get_osb() dlmglue locking operation
ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops
ocfs2: combine inode and generic AST functions
ocfs2: Clean up lock resource refresh flags
ocfs2: Remove i_generation from inode lock names
ocfs2: Encode i_generation in the meta data lvb
ocfs2: Free up some space in the lvb
ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock()
ocfs2: manually d_move() during ocfs2_rename()
[PATCH] Allow file systems to manually d_move() inside of ->rename()
...
Some file systems want to manually d_move() the dentries involved in a
rename. We can do this by making use of the FS_ODD_RENAME flag if we just
have nfs_rename() unconditionally do the d_move(). While there, we rename
the flag to be more descriptive.
OCFS2 uses this to protect that part of the rename operation with a cluster
lock.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This has been discussed on dccp@vger and removes the necessity for applications
to supply service codes in each and every case.
If an application does not want to provide a service code, that's fine, it will
be given 0. Otherwise, service codes can be set via socket options as before.
This patch has been tested using various client/server configurations
(including listening on multiple service codes).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (50 commits)
[libata] Delete pata_it8172 driver
[PATCH] libata: improve handling of diagostic fail (and hardware that misreports it)
[PATCH] libata: fix non-uniform ports handling
Fix libata resource conflict for legacy mode
[libata] ata_piix: build fix
[PATCH] pata_amd: Check enable bits on Nvidia
[PATCH] Update SiS PATA
[libata] Add pata_jmicron driver to Kconfig, Makefile
[libata #pata-drivers] Trim trailing whitespace.
[libata] Trim trailing whitespace.
[libata] Add a bunch of PATA drivers.
Rename libata-bmdma.c to libata-sff.c.
libata: Grand renaming.
Clean up drivers/ata/Kconfig a bit.
[PATCH] CONFIG_PM=n slim: drivers/scsi/sata_sil*
[PATCH] sata_via: Add SATA support for vt8237a
[PATCH] libata: change path to libata in libata.tmpl
[PATCH] libata: s/CONFIG_SCSI_SATA/CONFIG_[S]ATA/g in pci/quirks.c
libata: Make sure drivers/ata is a separate Kconfig menu
[libata] ata_piix: add missing kfree()
...
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits)
net/ieee80211: fix more crypto-related build breakage
[PATCH] Spidernet: add ethtool -S (show statistics)
[NET] GT96100: Delete bitrotting ethernet driver
[PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM
[PATCH] Cirrus Logic ep93xx ethernet driver
r8169: the MMIO region of the 8167 stands behin BAR#1
e1000, ixgb: Remove pointless wrappers
[PATCH] Remove powerpc specific parts of 3c509 driver
[PATCH] s2io: Switch to pci_get_device
[PATCH] gt96100: move to pci_get_device API
[PATCH] ehea: bugfix for register access functions
[PATCH] e1000 disable device on PCI error
drivers/net/phy/fixed: #if 0 some incomplete code
drivers/net: const-ify ethtool_ops declarations
[PATCH] ethtool: allow const ethtool_ops
[PATCH] sky2: big endian
[PATCH] sky2: fiber support
[PATCH] sky2: tx pause bug fix
drivers/net: Trim trailing whitespace
[PATCH] ehea: IBM eHEA Ethernet Device Driver
...
Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and
drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by
commit 84fa7933a3 that just happened to be
next to unrelated changes in this update.
Some MMC hosts can only handle log2 block sizes. Unfortunately,
the MMC password support needs to be able to send non-log2 block
sizes. Provide a capability so that the MMC password support can
decide whether it should use this support or not.
The unfortunate side effect of this host limitation is that any
MMC card protected by a password which is not a log2 block size
can not be accessed on a host which only allows a log2 block size.
This change just adds the flag. The MMC password support code
needs updating to use it (if and when it is finally submitted.)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (74 commits)
NFS: unmark NFS direct I/O as experimental
NFS: add comments clarifying the use of nfs_post_op_update()
NFSv4: rpc_mkpipe creating socket inodes w/out sk buffers
NFS: Use SEEK_END instead of hardcoded value
NFSv4: When mounting with a port=0 argument, substitute port=2049
NFSv4: Poll more aggressively when handling NFS4ERR_DELAY
NFSv4: Handle the condition NFS4ERR_FILE_OPEN
NFSv4: Retry lease recovery if it failed during a synchronous operation.
NFS: Don't invalidate the symlink we just stuffed into the cache
NFS: Make read() return an ESTALE if the file has been deleted
NFSv4: It's perfectly legal for clp to be NULL here....
NFS: nfs_lookup - don't hash dentry when optimising away the lookup
SUNRPC: Fix Oops in pmap_getport_done
SUNRPC: Add refcounting to the struct rpc_xprt
SUNRPC: Clean up soft task error handling
SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors
SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status
Fix a referral error Oops
NFS: NFS_ROOT should use the new rpc_create API
NFS: Fix up compiler warnings on 64-bit platforms in client.c
...
Manually resolved conflict in net/sunrpc/xprtsock.c
In a subsequent patch, this will allow the portmapper to take a reference
to the rpc_xprt for which it is updating the port number, fixing an Oops.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we have a copy of the symlink path in the page cache, we can pass
a struct page down to the XDR routines instead of a string buffer.
Test plan:
Connectathon, all NFS versions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the LOOKUP or GETATTR in nfs_instantiate fail, nfs_instantiate will do a
d_drop before returning. But some callers already do a d_drop in the case
of an error return. Make certain we do only one d_drop in all error paths.
This issue was introduced because over time, the symlink proc API diverged
slightly from the create/mkdir/mknod proc API. To prevent other coding
mistakes of this type, change the symlink proc API to be more like
create/mkdir/mknod and move the nfs_instantiate call into the symlink proc
routines so it is used in exactly the same way for create, mkdir, mknod,
and symlink.
Test plan:
Connectathon, all versions of NFS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The two function call API for creating a new RPC client is now obsolete.
Remove it.
Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services. The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.
Test plan:
Repeated runs of Connectathon locking suite. Check network trace to ensure
correctness of NLM requests and replies.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Prepare for more generic transport endpoint handling needed by transports
that might use different forms of addressing, such as IPv6.
Introduce a single function call to replace the two-call
xprt_create_proto/rpc_create_client API. Define a new rpc_create_args
structure that allows callers to pass in remote endpoint addresses of
varying length.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Remove some unused macros related to accessing an RPC peer address
Test plan:
Compile kernel with CONFIG_NFS option enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
IPv6 addresses are big (128 bytes). Now that no RPC client consumers treat
the addr field in rpc_xprt structs as an opaque, and access it only via the
API calls, we can safely widen the field in the rpc_xprt struct to
accomodate larger addresses.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Provide an API for formatting the remote peer address for printing without
exposing its internal structure. The address could be dynamic, so we
support a function call to get the address rather than reading it straight
out of a structure.
Test-plan:
Destructive testing (unplugging the network temporarily). Probably need
to rig a server where certain services aren't running, or that returns an
error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>