Fix a number of issues with the per-MM VMA patch:
(1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
a NOMMU system with more than 2G pages. Makes no difference on a 32-bit
system.
(2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
lest it overflow.
(3) Move the allocation of the vm_area_struct slab back for fork.c.
(4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
(5) Use BUG_ON() rather than if () BUG().
(6) Make the default validate_nommu_regions() a static inline rather than a
#define.
(7) Make free_page_series()'s objection to pages with a refcount != 1 more
informative.
(8) Adjust the __put_nommu_region() banner comment to indicate that the
semaphore must be held for writing.
(9) Limit the number of warnings about munmaps of non-mmapped regions.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a build failure with generic debug pagealloc:
mm/debug-pagealloc.c: In function 'set_page_poison':
mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: In function 'clear_page_poison':
mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: In function 'page_poison':
mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: At top level:
mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages'
include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here
mm/debug-pagealloc.c: In function 'kernel_map_pages':
mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function)
by fixing
- debug_flags should be in struct page
- define DEBUG_PAGEALLOC config option for all architectures
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need full-scale adjustment to fix a TCP miscount in the next
patch, so just move it into a helper and call for that from the
other places.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove an include that isn't actually needed to prevent needless
rebuilds.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The readq/writeq really need to be static inline on the arches which
don't provide them.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (58 commits)
SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port
NSM: Fix unaligned accesses in nsm_init_private()
NFS: Simplify logic to compare socket addresses in client.c
NFS: Start PF_INET6 callback listener only if IPv6 support is available
lockd: Start PF_INET6 listener only if IPv6 support is available
SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4
SUNRPC: rpcb_register() should handle errors silently
SUNRPC: Simplify kernel RPC service registration
SUNRPC: Simplify svc_unregister()
SUNRPC: Allow callers to pass rpcb_v4_register a NULL address
SUNRPC: rpcbind actually interprets r_owner string
SUNRPC: Clean up address type casts in rpcb_v4_register()
SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers
SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services
SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets
NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks
SUNRPC: Remove @family argument from svc_create() and svc_create_pooled()
SUNRPC: Change svc_create_xprt() to take a @family argument
SUNRPC: svc_setup_socket() gets protocol family from socket
SUNRPC: Pass a family argument to svc_register()
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits)
ext4: Regularize mount options
ext4: fix locking typo in mballoc which could cause soft lockup hangs
ext4: fix typo which causes a memory leak on error path
jbd2: Update locking coments
ext4: Rename pa_linear to pa_type
ext4: add checks of block references for non-extent inodes
ext4: Check for an valid i_mode when reading the inode from disk
ext4: Use WRITE_SYNC for commits which are caused by fsync()
ext4: Add auto_da_alloc mount option
ext4: Use struct flex_groups to calculate get_orlov_stats()
ext4: Use atomic_t's in struct flex_groups
ext4: remove /proc tuning knobs
ext4: Add sysfs support
ext4: Track lifetime disk writes
ext4: Fix discard of inode prealloc space with delayed allocation.
ext4: Automatically allocate delay allocated blocks on rename
ext4: Automatically allocate delay allocated blocks on close
ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl
ext4: Simplify delalloc code by removing mpage_da_writepages()
ext4: Save stack space by removing fake buffer heads
...
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (59 commits)
ide-floppy: do not complete rq's prematurely
ide: be able to build pmac driver without IDE built-in
ide-pmac: IDE cable detection on Apple PowerBook
ide: inline SELECT_DRIVE()
ide: turn selectproc() method into dev_select() method (take 5)
MAINTAINERS: move old ide-{floppy,tape} entries to CREDITS (take 2)
ide: move data register access out of tf_{read|load}() methods (take 2)
ide: call {in|out}put_data() methods from tf_{read|load}() methods (take 2)
ide-io-std: shorten ide_{in|out}put_data()
ide: rename IDE_TFLAG_IN_[HOB_]FEATURE
ide: turn set_irq() method into write_devctl() method
ide: use ATA_HOB
ide-disk: use ATA_ERR
ide: add support for CFA specified transfer modes (take 3)
ide-iops: only clear DMA words on setting DMA mode
ide: identify data word 53 bit 1 doesn't cover words 62 and 63 (take 3)
au1xxx-ide: auide_{in|out}sw() should be static
ide-floppy: use ide_pio_bytes()
ide-{floppy,tape}: fix padding for PIO transfers
ide: remove CONFIG_BLK_DEV_IDEDOUBLER config option
...
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
The s1d13xxx chip provides two values of identification value: the
Production id (e.g 13506/13505/13806..) and a revision number 0,1,2,3).
Together these can help us to differentiate between similiar setups.
This patch adds the proper way of grabbing both those values and save them
for future reference (in order to decide what functions a card supports,
e.g acceleration).
We also move away from the concept of all s1d13xxx = s1d13806 when we
really support alot more.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: simplify s1d13xxxfb_probe()]
Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With a postfix decrement t reaches -1 on timeout which results in a
return of 0.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an accelerator constant so almost all Cirrus are recognized as
accelerators by the fbset command.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix 8bpp mode by adding handling of the Laguna chipsets to various places
and stop trashing a HDR register which probably does not exist on the
Laguna.
Fix compilation warnings about uninitialized variables also.
Finally, all 8bpp, 16bpp and 32bpp modes work on the Laguna chipset.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add additional overflow register setting for Laguna chips.
Also, simplify some code in the cirrusfb_pan_display() and
cirrusfb_blank().
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix trailing whitespace because quilt complained about it.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- the LEAP_YEAR macro is buggy - it references its arg multiple times.
Fix this by turning it into a C function.
- give it a more approriate name
- Move it to rtc.h so that other .c files can use it, instead of copying it.
Cc: dann frazier <dannf@hp.com>
Acked-by: Alessandro Zummo <alessandro.zummo@towertech.it>
Cc: stephane eranian <eranian@googlemail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
autofs_dev-ioctl.h is included by both the kernel module and user space tools
and it includes two kernel header files. Compiles work if the kernel headers
are installed but fail otherwise.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement full support for OF SPI bindings. Now the driver can manage its
own chip selects without any help from the board files and/or fsl_soc
constructors.
The "legacy" code is well isolated and could be removed as time goes by.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The main purpose of this patch is to pass 'struct spi_device' to the chip
select handling routines. This is needed so that we could implement
full-fledged OpenFirmware support for this driver.
While at it, also:
- Replace two {de,activate}_cs routines by single cs_contol().
- Don't duplicate platform data callbacks in mpc83xx_spi struct.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce new wakeup macros that allow passing an event mask to the wakeup
targets. They exactly mimic their non-_poll() counterpart, with the added
event mask passing capability. I did add only the ones currently
requested, avoiding the _nr() and _all() for the moment.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: William Lee Irwin III <wli@movementarian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset introduces wakeup hints for some of the most popular (from
epoll POV) devices, so that epoll code can avoid spurious wakeups on its
waiters.
The problem with epoll is that the callback-based wakeups do not, ATM,
carry any information about the events the wakeup is related to. So the
only choice epoll has (not being able to call f_op->poll() from inside the
callback), is to add the file* to a ready-list and resolve the real events
later on, at epoll_wait() (or its own f_op->poll()) time. This can cause
spurious wakeups, since the wake_up() itself might be for an event the
caller is not interested into.
The rate of these spurious wakeup can be pretty high in case of many
network sockets being monitored.
By allowing devices to report the events the wakeups refer to (at least
the two major classes - POLLIN/POLLOUT), we are able to spare useless
wakeups by proper handling inside the epoll's poll callback.
Epoll will have in any case to call f_op->poll() on the file* later on,
since the change to be done in order to have the full event set sent via
wakeup, is too invasive for the way our f_op->poll() system works (the
full event set is calculated inside the poll function - there are too many
of them to even start thinking the change - also poll/select would need
change too).
Epoll is changed in a way that both devices which send event hints, and
the ones that don't, are correctly handled. The former will gain some
efficiency though.
As a general rule for devices, would be to add an event mask by using
key-aware wakeup macros, when making up poll wait queues. I tested it
(together with the epoll's poll fix patch Andrew has in -mm) and wakeups
for the supported devices are correctly filtered.
Test program available here:
http://www.xmailserver.org/epoll_test.c
This patch:
Nothing revolutionary here. Just using the available "key" that our
wakeup core already support. The __wake_up_locked_key() was no brainer,
since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers
around __wake_up_common().
The __wake_up_sync() function had a body, so the choice was between
borrowing the body for __wake_up_sync_key() and calling it from
__wake_up_sync(), or make an inline and calling it from both. I chose the
former since in most archs it all resolves to "mov $0, REG; jmp ADDR".
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: William Lee Irwin III <wli@movementarian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
People started using eventfd in a semaphore-like way where before they
were using pipes.
That is, counter-based resource access. Where a "wait()" returns
immediately by decrementing the counter by one, if counter is greater than
zero. Otherwise will wait. And where a "post(count)" will add count to
the counter releasing the appropriate amount of waiters. If eventfd the
"post" (write) part is fine, while the "wait" (read) does not dequeue 1,
but the whole counter value.
The problem with eventfd is that a read() on the fd returns and wipes the
whole counter, making the use of it as semaphore a little bit more
cumbersome. You can do a read() followed by a write() of COUNTER-1, but
IMO it's pretty easy and cheap to make this work w/out extra steps. This
patch introduces a new eventfd flag that tells eventfd to only dequeue 1
from the counter, allowing simple read/write to make it behave like a
semaphore. Simple test here:
http://www.xmailserver.org/eventfd-sem.c
To be back-compatible with earlier kernels, userspace applications should
probe for the availability of this feature via
#ifdef EFD_SEMAPHORE
fd = eventfd2 (CNT, EFD_SEMAPHORE);
if (fd == -1 && errno == EINVAL)
<fallback>
#else
<fallback>
#endif
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <linux-api@vger.kernel.org>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We cover all log-levels by pr_... macros except KERN_CONT one. Add it
for convenience.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the filesystem freeze operation has been elevated to the VFS, and
is just an ioctl away, some sort of safety net for unintentionally frozen
root filesystems may be in order.
The timeout thaw originally proposed did not get merged, but perhaps
something like this would be useful in emergencies.
For example, freeze /path/to/mountpoint may freeze your root filesystem if
you forgot that you had that unmounted.
I chose 'j' as the last remaining character other than 'h' which is sort
of reserved for help (because help is generated on any unknown character).
I've tested this on a non-root fs with multiple (nested) freezers, as well
as on a system rendered unresponsive due to a frozen root fs.
[randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the following header file changes:
- remove arch ifdefs and asm/suspend.h from linux/suspend.h
- add asm/suspend.h to disk.c (for arch_prepare_suspend())
- add linux/io.h to swsusp.c (for ioremap())
- x86 32/64 bit compile fixes
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Synopsis: if shmem_writepage calls swap_writepage directly, most shmem
swap loads benefit, and a catastrophic interaction between SLUB and some
flash storage is avoided.
shmem_writepage() has always been peculiar in making no attempt to write:
it has just transferred a shmem page from file cache to swap cache, then
let that page make its way around the LRU again before being written and
freed.
The idea was that people use tmpfs because they want those pages to stay
in RAM; so although we give it an overflow to swap, we should resist
writing too soon, giving those pages a second chance before they can be
reclaimed.
That was always questionable, and I've toyed with this patch for years;
but never had a clear justification to depart from the original design.
It became more questionable in 2.6.28, when the split LRU patches classed
shmem and tmpfs pages as SwapBacked rather than as file_cache: that in
itself gives them more resistance to reclaim than normal file pages. I
prepared this patch for 2.6.29, but the merge window arrived before I'd
completed gathering statistics to justify sending it in.
Then while comparing SLQB against SLUB, running SLUB on a laptop I'd
habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping
tests five times slower than SLAB or SLQB - other machines slower too, but
nowhere near so bad. Simpler "cp -a" swapping tests showed the same.
slub_max_order=0 brings sanity to all, but heavy swapping is too far from
normal to justify such a tuning. The crucial factor on that laptop turns
out to be that I'm using an SD card for swap. What happens is this:
By default, SLUB uses order-2 pages for shmem_inode_cache (and many other
fs inodes), so creating tmpfs files under memory pressure brings lumpy
reclaim into play. One subpage of the order is chosen from the bottom of
the LRU as usual, then the other three picked out from their random
positions on the LRUs.
In a tmpfs load, many of these pages will be ones which already passed
through shmem_writepage, so already have swap allocated. And though their
offsets on swap were probably allocated sequentially, now that the pages
are picked off at random, their swap offsets are scattered.
But the flash storage on the SD card is very sensitive to having its
writes merged: once swap is written at scattered offsets, performance
falls apart. Rotating disk seeks increase too, but less disastrously.
So: stop giving shmem/tmpfs pages a second pass around the LRU, write them
out to swap as soon as their swap has been allocated.
It's surely possible to devise an artificial load which runs faster the
old way, one whose sizing is such that the tmpfs pages on their second
pass are the ones that are wanted again, and other pages not.
But I've not yet found such a load: on all machines, under the loads I've
tried, immediate swap_writepage speeds up shmem swapping: especially when
using the SLUB allocator (and more effectively than slub_max_order=0), but
also with the others; and it also reduces the variance between runs. How
much faster varies widely: a factor of five is rare, 5% is common.
One load which might have suffered: imagine a swapping shmem load in a
limited mem_cgroup on a machine with plenty of memory. Before 2.6.29 the
swapcache was not charged, and such a load would have run quickest with
the shmem swapcache never written to swap. But now swapcache is charged,
so even this load benefits from shmem_writepage directly to swap.
Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h:
it's silly because that will never get called; but refactoring shmem.c
sensibly according to CONFIG_SWAP will be a separate task.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
try_to_free_pages() is used for the direct reclaim of up to
SWAP_CLUSTER_MAX pages when watermarks are low. The caller to
alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to
be used but this is not passed to try_to_free_pages(). This can lead to
unnecessary reclaim of pages that are unusable by the caller and int the
worst case lead to allocation failure as progress was not been make where
it is needed.
This patch passes the nodemask used for alloc_pages_nodemask() to
try_to_free_pages().
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mlock() facility does not exist for NOMMU since all mappings are
effectively locked anyway, so we don't make the bits available when
they're not useful.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Enrik Berkhan <Enrik.Berkhan@ge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use debug_kmap_atomic in kmap_atomic, kmap_atomic_pfn, and
iomap_atomic_prot_pfn.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86 has debug_kmap_atomic_prot() which is error checking function for
kmap_atomic. It is usefull for the other architectures, although it needs
CONFIG_TRACE_IRQFLAGS_SUPPORT.
This patch exposes it to the other architectures.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags. There should be no functional change.
This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg. virtual_address to the
driver, which might be important in some special cases).
This is required for a subsequent fix. And will also make it easier to
merge page_mkwrite() with fault() in future.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On PowerPC we allocate large boot time hashes on node 0. This leads to an
imbalance in the free memory, for example on a 64GB box (4 x 16GB nodes):
Free memory:
Node 0: 97.03%
Node 1: 98.54%
Node 2: 98.42%
Node 3: 98.53%
If we switch to using vmalloc (like ia64 and x86-64) things are more
balanced:
Free memory:
Node 0: 97.53%
Node 1: 98.35%
Node 2: 98.33%
Node 3: 98.33%
For many HPC applications we are limited by the free available memory on
the smallest node, so even though the same amount of memory is used the
better balancing helps.
Since all 64bit NUMA capable architectures should have sufficient vmalloc
space, it makes sense to enable it via CONFIG_64BIT.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838
On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange
way from the user's point of view:
# echo 500 >/proc/sys/vm/dirty_writeback_centisecs
# cat /proc/sys/vm/dirty_writeback_centisecs
499
So, we have 5000 jiffies converted to only 499 clock ticks and reported
back.
TICK_NSEC = 999848
ACTHZ = 256039
Keeping in-kernel variable in units passed from userspace will fix issue
of course, but this probably won't be right for every sysctl.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and
s390. This patch implements it for the rest of the architectures by
filling the pages with poison byte patterns after free_pages() and
verifying the poison patterns before alloc_pages().
This generic one cannot detect invalid page accesses immediately but
invalid read access may cause invalid dereference by poisoned memory and
invalid write access can be detected after a long delay.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I notice there are many places doing copy_from_user() which follows
kmalloc():
dst = kmalloc(len, GFP_KERNEL);
if (!dst)
return -ENOMEM;
if (copy_from_user(dst, src, len)) {
kfree(dst);
return -EFAULT
}
memdup_user() is a wrapper of the above code. With this new function, we
don't have to write 'len' twice, which can lead to typos/mistakes. It
also produces smaller code and kernel text.
A quick grep shows 250+ places where memdup_user() *may* be used. I'll
prepare a patchset to do this conversion.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a helper function account_page_dirtied(). Use that from two
callsites. reiser4 adds a function which adds a third callsite.
Signed-off-by: Edward Shishkin<edward.shishkin@gmail.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: cleanup
In almost cases, for_each_zone() is used with populated_zone(). It's
because almost function doesn't need memoryless node information.
Therefore, for_each_populated_zone() can help to make code simplify.
This patch has no functional change.
[akpm@linux-foundation.org: small cleanup]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct tty_operations::proc_fops took it's place and there is one less
create_proc_read_entry() user now!
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Used for gradual switch of TTY drivers from using ->read_proc which helps
with gradual switch from ->read_proc for the whole tree.
As side effect, fix possible race condition when ->data initialized after
PDE is hooked into proc tree.
->proc_fops takes precedence over ->read_proc.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 29a814d2ee (vfs: add hooks for
ext4's delayed allocation support) exported the following functions
mpage_bio_submit()
__mpage_writepage()
for the benefit of ext4's delayed allocation support. Since commit
a1d6cc563b (ext4: Rework the
ext4_da_writepages() function), these functions are not used by the
ext4 driver anymore. However, the now unnecessary exports still
remain, and this patch removes those. Moreover, these two functions
can become static again.
The issue was spotted by namespacecheck.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't pull it in sched.h; very few files actually need it and those
can include directly. sched.h itself only needs forward declaration
of struct fs_struct;
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* all changes of current->fs are done under task_lock and write_lock of
old fs->lock
* refcount is not atomic anymore (same protection)
* its decrements are done when removing reference from current; at the
same time we decide whether to free it.
* put_fs_struct() is gone
* new field - ->in_exec. Set by check_unsafe_exec() if we are trying to do
execve() and only subthreads share fs_struct. Cleared when finishing exec
(success and failure alike). Makes CLONE_FS fail with -EAGAIN if set.
* check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
is in progress.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pure code move; two new helper functions for nfsd and daemonize
(unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
the same code as used to be in callers). unshare_fs_struct()
exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
copy_fs_struct() and exit_fs() don't need exports anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Since SELECT_DRIVE() has boiled down to a mere dev_select() method call, it now
makes sense to just inline it...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Turn selectproc() method into dev_select() method by teaching it to write to the
device register and moving it from 'struct ide_port_ops' to 'struct ide_tp_ops'.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: benh@kernel.crashing.org
Cc: petkovbb@gmail.com
[bart: add ->dev_select to at91_ide.c and tx4939.c (__BIG_ENDIAN case)]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The feature register has never been readable -- when its location is read, one
gets the error register value; hence rename IDE_TFLAG_IN_[HOB_]FEATURE into
IDE_TFLAG_IN_[HOB_]ERROR and introduce the 'hob_error' field into the 'struct
ide_taskfile' (despite the error register not really depending on the HOB bit).
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Turn set_irq() method with its software reset hack into write_devctl() method
(for just writing a value into the device control register) at last...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Fix ide_init_sg_cmd() setup for non-fs requests.
* Convert ide_pc_intr() to use ide_pio_bytes() for floppy media.
* Remove no longer needed ide_io_buffers() and sg/sg_cnt fields
from struct ide_atapi_pc.
* Remove partial completions; kill idefloppy_update_buffers(), as a
result.
* Add some more debugging statements.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
struct ide_atapi_pc is often allocated on the stack and size of ->pc_buf
size is 256 bytes. However since only ide_floppy_create_read_capacity_cmd()
and idetape_create_inquiry_cmd() require such size allocate buffers for
these pc-s explicitely and decrease ->pc_buf size to 64 bytes.
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Move ide_map_sg() calls out from ide_build_sglist()
to ide_dma_prepare().
* Pass command to ide_destroy_dmatable().
* Rename ide_build_sglist() to ide_dma_map_sg()
and ide_destroy_dmatable() to ide_dma_unmap_sg().
There should be no functional changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add (an optional) ->dma_check method for checking if DMA can be
used for a given command and fail DMA setup in ide_dma_prepare()
if necessary.
* Convert alim15x3 and trm290 host drivers to use ->dma_check.
* Rename ali15x3_dma_setup() to ali_dma_check() while at it.
There should be no functional changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ide_dma_prepare() helper.
* Convert ide_issue_pc() and do_rw_taskfile() to use it.
* Make ide_build_sglist() static.
There should be no functional changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
All custom ->dma_timeout implementations call the generic one thus it is
possible to have only an optional method for resetting DMA engine instead:
* Add ->dma_clear method and convert hpt366, pdc202xx_old and sl82c105
host drivers to use it.
* Always use ide_dma_timeout() in ide_dma_timeout_retry() and remove
->dma_timeout method.
* Make ide_dma_timeout() static.
There should be no functional changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
We now support arbitrary number of bytes per-IRQ also for fs requests
so remove ide_cd_check_transfer_size() and IDE_AFLAG_LIMIT_NFRAMES.
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Export ide_pio_bytes().
* Add ->last_xfer_len field to struct ide_cmd.
* Add ide_cd_error_cmd() helper to ide-cd.
* Convert ide-cd to use scatterlists also for PIO transfers (fs requests
only for now) and get rid of partial completions (except when the error
happens -- which is still subject to change later because looking at
ATAPI spec it seems that the device is free to error the whole transfer
with setting the Error bit only on the last transfer chunk).
* Update ide_cd_{prepare_rw,restore_request,do_request}() accordingly.
* Inline ide_cd_restore_request() into cdrom_start_rw().
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
wireless: remove duplicated .ndo_set_mac_address
netfilter: xtables: fix IPv6 dependency in the cluster match
tg3: Add GRO support.
niu: Add GRO support.
ucc_geth: Fix use-after-of_node_put() in ucc_geth_probe().
gianfar: Fix use-after-of_node_put() in gfar_of_init().
kernel: remove HIPQUAD()
netpoll: store local and remote ip in net-endian
netfilter: fix endian bug in conntrack printks
dmascc: fix incomplete conversion to network_device_ops
gso: Fix support for linear packets
skbuff.h: fix missing kernel-doc
ni5010: convert to net_device_ops
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
hwmon: (fschmd) Add support for the FSC Hades IC
hwmon: (fschmd) Add support for the FSC Syleus IC
i2c-i801: Instantiate FSC hardware montioring chips
dmi: Let dmi_walk() users pass private data
hwmon: Define a standard interface for chassis intrusion detection
Move the pcf8591 driver to hwmon
hwmon: (w83627ehf) Only expose in6 or temp3 on the W83667HG
hwmon: (w83627ehf) Add support for W83667HG
hwmon: (w83627ehf) Invert fan pin variables logic
hwmon: (hdaps) Fix Thinkpad X41 axis inversion
hwmon: (hdaps) Allow inversion of separate axis
hwmon: (ds1621) Clean up documentation
hwmon: (ds1621) Avoid unneeded register access
hwmon: (ds1621) Clean up register access
hwmon: (ds1621) Reorder code statements
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
Revert "proc: revert /proc/uptime to ->read_proc hook"
proc 2/2: remove struct proc_dir_entry::owner
proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc
proc: fix sparse warnings in pagemap_read()
proc: move fs/proc/inode-alloc.txt comment into a source file
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PCI PM: Make pci_prepare_to_sleep() disable wake-up if needed
radeonfb: Use __pci_complete_power_transition()
PCI PM: Introduce __pci_[start|complete]_power_transition() (rev. 2)
PCI PM: Restore config spaces of all devices during early resume
PCI PM: Make pci_set_power_state() handle devices with no PM support
PCI PM: Put devices into low power states during late suspend (rev. 2)
PCI PM: Move pci_restore_standard_config to pci-driver.c
PCI PM: Use pci_set_power_state during early resume
PCI PM: Consistently use variable name "error" for pm call return values
kexec: Change kexec jump code ordering
PM: Change hibernation code ordering
PM: Change suspend code ordering
PM: Rework handling of interrupts during suspend-resume
PM: Introduce functions for suspending and resuming device interrupts
Fix this build error when REISERFS_FS_POSIX_ACL is not set:
fs/reiserfs/inode.c: In function 'reiserfs_new_inode':
fs/reiserfs/inode.c:1919: warning: passing argument 1 of 'reiserfs_inherit_default_acl' from incompatible pointer type
fs/reiserfs/inode.c:1919: warning: passing argument 2 of 'reiserfs_inherit_default_acl' from incompatible pointer type
fs/reiserfs/inode.c:1919: warning: passing argument 3 of 'reiserfs_inherit_default_acl' from incompatible pointer type
fs/reiserfs/inode.c:1919: error: too many arguments to function 'reiserfs_inherit_default_acl'
due to a missing transaction-handle argument in the non-acl
compatibility function.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (53 commits)
drm: detect hdmi monitor by hdmi identifier (v3)
drm: drm_fops.c unlock missing on error path
drm: reorder struct drm_ioctl_desc to save space on 64 bit builds
radeon: add some new pci ids
drm: read EDID extensions from monitor
drm: Use a little stash on the stack to avoid kmalloc in most DRM ioctls.
drm/radeon: add regs required for occlusion queries support
drm/i915: check the return value from the copy from user
drm/radeon: fix logic in r600_page_table_init() to match ati_gart
drm/radeon: r600 ptes are 64-bit, cleanup cleanup function.
drm/radeon: don't call irq changes on r600 suspend/resume
drm/radeon: fix r600 writeback across suspend/resume
drm/radeon: fix r600 writeback setup.
drm: fix warnings about new mappings in info code.
drm/radeon: NULL noise: drivers/gpu/drm/radeon/radeon_*.c
drm/radeon: fix r600 pci mapping calls.
drm/radeon: r6xx/r7xx: fix possible oops in r600_page_table_cleanup()
radeon: call the correct idle function, logic got inverted.
drm/radeon: RS600: fix interrupt handling
drm/r600: fix rptr address along lines of previous fixes to radeon.
...
* 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
dma-debug: make memory range checks more consistent
dma-debug: warn of unmapping an invalid dma address
dma-debug: fix dma_debug_add_bus() definition for !CONFIG_DMA_API_DEBUG
dma-debug/x86: register pci bus for dma-debug leak detection
dma-debug: add a check dma memory leaks
dma-debug: add checks for kernel text and rodata
dma-debug: print stacktrace of mapping path on unmap error
dma-debug: Documentation update
dma-debug: x86 architecture bindings
dma-debug: add function to dump dma mappings
dma-debug: add checks for sync_single_sg_*
dma-debug: add checks for sync_single_range_*
dma-debug: add checks for sync_single_*
dma-debug: add checking for [alloc|free]_coherent
dma-debug: add add checking for map/unmap_sg
dma-debug: add checking for map/unmap_page/single
dma-debug: add core checking functions
dma-debug: add debugfs interface
dma-debug: add kernel command line parameters
dma-debug: add initialization code
...
Fix trivial conflicts due to whitespace changes in arch/x86/kernel/pci-nommu.c
The radeonfb driver needs to program the device's PMCSR directly due
to some quirky hardware it has to handle (see
http://bugzilla.kernel.org/show_bug.cgi?id=12846 for details) and
after doing that it needs to call the platform (usually ACPI) to
finish the power transition of the device. Currently it uses
pci_set_power_state() for this purpose, however making a specific
assumption about the internal behavior of this function, which has
changed recently so that this assumption is no longer satisfied.
For this reason, introduce __pci_complete_power_transition() that may
be called by the radeonfb driver to complete the power transition of
the device. For symmetry, introduce __pci_start_power_transition().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce helper functions allowing us to prevent device drivers from
getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume. These functions make it
possible to keep timer interrupts enabled while the "late" suspend and
"early" resume callbacks provided by device drivers are being
executed. In turn, this allows device drivers' "late" suspend and
"early" resume callbacks to sleep, execute ACPI callbacks etc.
The functions introduced here will be used to rework the handling of
interrupts during suspend (hibernation) and resume. Namely,
interrupts will only be disabled on the CPU right before suspending
sysdevs, while device drivers will be prevented from receiving
interrupts, with the help of the new helper function, before their
"late" suspend callbacks run (and analogously during resume).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
At the moment, dmi_walk() lacks flexibility, users can't pass data to
the callback function. Add a pointer for private data to make this
function more flexible.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Roland Dreier <rolandd@cisco.com>
This patch is a simple s/p_._//g to the reiserfs code. This is the
fifth in a series of patches to rip out some of the awful variable
naming in reiserfs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a simple s/p_s_tb/tb/g to the reiserfs code. This is the
fourth in a series of patches to rip out some of the awful variable
naming in reiserfs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a simple s/p_s_inode/inode/g to the reiserfs code. This
is the third in a series of patches to rip out some of the awful
variable naming in reiserfs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a simple s/p_s_bh/bh/g to the reiserfs code. This is the
second in a series of patches to rip out some of the awful variable
naming in reiserfs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a simple s/p_s_sb/sb/g to the reiserfs code. This is the
first in a series of patches to rip out some of the awful variable
naming in reiserfs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch strips trailing whitespace from the reiserfs code.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some time ago, some changes were made to make security inode attributes
be atomically written during inode creation. ReiserFS fell behind in
this area, but with the reworking of the xattr code, it's now fairly
easy to add.
The following patch adds the ability for security attributes to be added
automatically during inode creation.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current reiserfs xattr implementation open codes reiserfs_readdir
and frees the path before calling the filldir function. Typically, the
filldir function is something that modifies the file system, such as a
chown or an inode deletion that also require reading of an inode
associated with each direntry. Since the file system is modified, the
path retained becomes invalid for the next run. In addition, it runs
backwards in attempt to minimize activity.
This is clearly suboptimal from a code cleanliness perspective as well
as performance-wise.
This patch implements a generic reiserfs_for_each_xattr that uses the
generic readdir and a specific filldir routine that simply populates an
array of dentries and then performs a specific operation on them. When
all files have been operated on, it then calls the operation on the
directory itself.
The result is a noticable code reduction and better performance.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Deadlocks are possible in the xattr code between the journal lock and the
xattr sems.
This patch implements journalling for xattr operations. The benefit is
twofold:
* It gets rid of the deadlock possibility by always ensuring that xattr
write operations are initiated inside a transaction.
* It corrects the problem where xattr backing files aren't considered any
differently than normal files, despite the fact they are metadata.
I discussed the added journal load with Chris Mason, and we decided that
since xattrs (versus other journal activity) is fairly rare, the introduction
of larger transactions to support journaled xattrs wouldn't be too big a deal.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig had asked me quite some time ago to port the reiserfs
xattrs to the generic xattr interface.
This patch replaces the reiserfs-specific xattr handling code with the
generic struct xattr_handler.
However, since reiserfs doesn't split the prefix and name when accessing
xattrs, it can't leverage generic_{set,get,list,remove}xattr without
needlessly reconstructing the name on the back end.
Update 7/26/07: Added missing dput() to deletion path.
Update 8/30/07: Added missing mark_inode_dirty when i_mode is used to
represent an ACL and no previous ACL existed.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The per-inode locking can be made more fine-grained to surround just the
interaction with the filesystem itself. This really only applies to
protecting reads during a write, since concurrent writes are barred with
inode->i_mutex at the vfs level.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the switch to using inode->i_mutex locking during lookups/creation
in the xattr root, the per-super xattr lock is no longer needed.
This patch removes it.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current reiserfs xattr implementation will not clean up old xattr
files if files are deleted when REISERFS_FS_XATTR is unset. This
results in inaccessible lost files, wasting space.
This patch compiles in basic xattr knowledge, such as how to delete them
and change ownership for quota tracking. If the file system has never
used xattrs, then the operation is quite fast: it returns immediately
when it sees there is no .reiserfs_priv directory.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a number of helper functions for marking a reiserfs inode
private that were leftover from reiserfs did its own thing wrt to
private inodes. S_PRIVATE has been in the kernel for some time, so this
patch removes the helpers and uses IS_PRIVATE instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Although reiserfs can currently handle severe errors such as journal failure,
it cannot handle less severe errors like metadata i/o failure. The following
patch adds a reiserfs_error() function akin to the one in ext3.
Subsequent patches will use this new error handler to handle errors more
gracefully in general.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch kills off reiserfs_journal_abort as it is never called, and
combines __reiserfs_journal_abort_{soft,hard} into one function called
reiserfs_abort_journal, which performs the same work. It is silent
as opposed to the old version, since the message was always issued
after a regular 'abort' message.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ReiserFS panics can be somewhat inconsistent.
In some cases:
* a unique identifier may be associated with it
* the function name may be included
* the device may be printed separately
This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
uniqueness2type and type2uniquness issue a warning when the value is
unknown. When called from reiserfs_warning, this causes a re-entrancy
problem and deadlocks on the error buffer lock.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ReiserFS warnings can be somewhat inconsistent.
In some cases:
* a unique identifier may be associated with it
* the function name may be included
* the device may be printed separately
This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes leaf_paste_entries more consistent with respect to the
other leaf operations. Using buffer_info instead of buffer_head
directly allows us to get a superblock pointer for use in error
handling.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes up the reiserfs code such that transaction ids are
always unsigned ints. In places they can currently be signed ints or
unsigned longs.
The former just causes an annoying clm-2200 warning and may join a
transaction when it should wait.
The latter is just for correctness since the disk format uses a 32-bit
transaction id. There aren't any runtime problems that result from it
not wrapping at the correct location since the value is truncated
correctly even on big endian systems. The 0 value might make it to
disk, but the mount-time checks will bump it to 10 itself.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>