Commit graph

2661 commits

Author SHA1 Message Date
David Teigland
9229f01349 [GFS2] Cast 64 bit printk args to unsigned long long.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-24 09:21:30 -04:00
Steven Whitehouse
90cdd2083a [GFS2] Flag up issue in selinux code
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-22 10:36:25 -04:00
Ryan O'Hara
639b6d79b8 [GFS2] selinux support
This adds support to GFS2 for selinux extended attributes. There is a
known bug in gfs2_ea_get() which is believed to be independant of this
patch. Further patches will follow once that bug is fixed in order to
make GFS2 use as much of the generic eattr infrastructure as possible.

Signed-off-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-22 10:08:35 -04:00
David Teigland
d2f222e631 [GFS2] setup lock_dlm kobject earlier
Setup the lock_dlm kobject before setting up the dlm lockspace instead
of after.  We want to use the sysfs files to detect the mount without
having to wait for the dlm setup which can take a while.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-19 08:24:02 -04:00
Steven Whitehouse
320dd101e2 [GFS2] glock debugging and inode cache changes
This adds some extra debugging to glock.c and changes
inode.c's deallocation code to call the debugging code
at a suitable moment. I'm chasing down a particular bug
to do with deallocation at the moment and the code can
go again once the bug is fixed.

Also this includes the first part of some changes to unify
the Linux struct inode and GFS2's struct gfs2_inode. This
transformation will happen in small parts over the next short
period.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 16:25:27 -04:00
Steven Whitehouse
3a8a9a1034 [GFS2] Update copyright date to 2006
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 15:09:15 -04:00
Steven Whitehouse
bd8968010a [GFS2] Remove semaphore.h from C files
We no longer use semaphores, everything has been converted to
mutex or rwsem, so we don't need to include this header any more.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 14:54:58 -04:00
Steven Whitehouse
1b50259bc3 [GFS2] Drop log lock on I/O error & tidy up
This patch drops the log spinlock when an I/O error occurs
to avoid any possible problems in case of blocking or
recursion in the I/O error routine. It also has a few
cosmetic changes to tidy up various other files.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 14:10:52 -04:00
Steven Whitehouse
02f211f4d0 [GFS2] Remove bits.c from the Makefile
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 14:03:43 -04:00
Steven Whitehouse
3efd7534a8 [GFS2] Make newly moved functions static
The functions moved from bits.c can now be made static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 14:02:52 -04:00
Steven Whitehouse
88c8ab1fcb [GFS2] Merge bits.[ch] into rgrp.c
Since they are small and will be inlined by the complier,
it makes sense to merge the contents of bits.[ch] into
rgrp.c

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 13:52:39 -04:00
Steven Whitehouse
64c14ea73b [GFS2] Fix ref count bug that used to bite us on umount
The ref count of certain glock's got elevated too far during unlink
which caused umount to fail. This fixes it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-16 13:37:11 -04:00
Steven Whitehouse
b9cb981310 [GFS2] Fix attributes setting logic
The attributes logic for immutable was wrong so that there was
not way to remove this attribute once set. This fixes the
bug.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-12 17:07:56 -04:00
Steven Whitehouse
9801f6461e [GFS2] Remove incorrect initialisation of gh_owner
The gh_owner field shouldn't be set or reset outside the glock code.
These were left over from when recursive locking was allowed. It
isn't any more, so they are not needed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-12 14:06:02 -04:00
Steven Whitehouse
e90c01e148 [GFS2] Reverse block order in build_height
The original code ordered the blocks allocated in the build_height
routine backwards causing excessive disk seeks during a read of the
metadata. This patch reverses the order to try and reduce disk seeks.

Example: A five level metadata tree, I = Inode, P = Pointers, D = Data

You need to read the blocks in the order:

I P5 P4 P3 P2 P1 D

in order to read a single data block. The new code now orders the blocks
in this way. The old code used to order them as:

I P1 P2 P3 P4 P5 D

requiring two extra seeks on average. Note that for files which are
grown by gradual extension rather than by truncate or by llseek/write
at a large offset, this doesn't apply. In the case of writing to a
file linearly, this routine will only be called upon to extend the
height of the tree by one block at a time, so the ordering is
determined by when its called rather than by the internals of the
routine itself. Optimising that part of the ordering is a much
harder problem.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-12 12:09:15 -04:00
Steven Whitehouse
7d63b54a65 Merge branch 'master' 2006-05-12 10:48:52 -04:00
Linus Torvalds
e515f048c4 Merge git://oss.sgi.com:8090/xfs-2.6
* git://oss.sgi.com:8090/xfs-2.6:
  [XFS] Fix a possible metadata buffer (AGFL) refcount leak when fixing an
  [XFS] Fix a project quota space accounting leak on rename.
  [XFS] Fix a possible forced shutdown due to mishandling write barriers
2006-05-08 17:41:05 -07:00
Trond Myklebust
75dff55af9 [PATCH] fs/locks.c: Fix lease_init
It is insane to be giving lease_init() the task of freeing the lock it is
supposed to initialise, given that the lock is not guaranteed to be
allocated on the stack. This causes lockups in fcntl_setlease().
Problem diagnosed by Daniel Hokka Zakrisson <daniel@hozac.com>

Also fix a slab leak in __setlease() due to an uninitialised return value.
Problem diagnosed by Björn Steinbrink.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-08 08:07:17 -07:00
Nathan Scott
e63a369001 [XFS] Fix a possible metadata buffer (AGFL) refcount leak when fixing an
AG freelist.

SGI-PV: 952681
SGI-Modid: xfs-linux-melb:xfs-kern:25902a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-05-08 19:51:58 +10:00
Nathan Scott
b1ecdda931 [XFS] Fix a project quota space accounting leak on rename.
SGI-PV: 951636
SGI-Modid: xfs-linux-melb:xfs-kern:25811a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-05-08 19:51:42 +10:00
Nathan Scott
d08d389d5a [XFS] Fix a possible forced shutdown due to mishandling write barriers
with remount,ro.

SGI-PV: 951944
SGI-Modid: xfs-linux-melb:xfs-kern:25742a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-05-08 19:51:28 +10:00
Steven Whitehouse
fd88de569b [GFS2] Readpages support
This adds readpages support (and also corrects a small bug in
the readpage error path at the same time). Hopefully this will
improve performance by allowing GFS to submit larger lumps of
I/O at a time.

In order to simplify the setting of BH_Boundary, it currently gets
set when we hit the end of a indirect pointer block. There is
always a boundary at this point with the current allocation code.
It doesn't get all the boundaries right though, so there is still
room for improvement in this.

See comments in fs/gfs2/ops_address.c for further information about
readpages with GFS2.

Signed-off-by: Steven Whitehouse
2006-05-05 16:59:11 -04:00
Robert S Peterson
5bb76af1e0 [GFS2] Set d_ops for root inode
Well, I managed to track down the bug in gfs2 that was causing
my grief.  Below is a patch for the problem.  Please incorporate
as you see fit.  Or should I say: as you see git.

The problem was basically that you never set d_ops for the root
inode, so the wrong hash algorithm was being used.  But only for
the root directory.  Turns out that if I used subdirectories, it
used the proper hash and my files were found just fine.

Signed-off-by: Robert S Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-05 16:29:50 -04:00
Jens Axboe
98232d504d [PATCH] compat_sys_vmsplice: one-off in UIO_MAXIOV check
nr_segs may not be > UIO_MAXIOV, however it may be equal to. This makes
the behaviour identical to the real sys_vmsplice(). The other foov
syscalls also agree that this is the way to go.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-04 09:13:49 +02:00
Jens Axboe
a0548871ed [PATCH] splice: redo page lookup if add_to_page_cache() returns -EEXIST
This can happen quite easily, if several processes are trying to splice
the same file at the same time. It's not a failure, it just means someone
raced with us in allocating this file page. So just dump the allocated
page and relookup the original.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-04 06:55:12 +02:00
Jens Axboe
76ad4d1110 [PATCH] splice: rename remaining info variables to pipe
Same thing was done in fs/pipe.c and most of fs/splice.c, but we had
a few missing still.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-04 06:55:12 +02:00
Jens Axboe
1432873af7 [PATCH] splice: LRU fixups
Nick says that the current construct isn't safe. This goes back to the
original, but sets PIPE_BUF_FLAG_LRU on user pages as well as they all
seem to be on the LRU in the first place.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-04 06:55:12 +02:00
Jens Axboe
bfc4ee39fd [PATCH] splice: fix unlocking of page on error ->prepare_write()
Looking at generic_file_buffered_write(), we need to unlock_page() if
prepare write fails and it isn't due to racing with truncate().

Also trim the size if ->prepare_write() fails, if we have to.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-04 06:55:12 +02:00
Mingming Cao
5dea5176e5 [PATCH] ext3: multile block allocate little endian fixes
Some places in ext3 multiple block allocation code (in 2.6.17-rc3) don't
handle the little endian well.  This was resulting in *wrong* block numbers
being assigned to in-memory block variables and then stored on disk
eventually.  The following patch has been verified to fix an ext3
filesystem failure when run ltp test on a 64 bit machine.

Signed-off-by; Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-03 20:05:41 -07:00
David Teigland
97a35d1e5f [DLM] fix grant_after_purge softlockup
In dlm_grant_after_purge() we were holding a hash table read_lock while
calling put_rsb() which potentially removes the rsb from the hash table,
taking the same lock in write.  Fix this by flagging rsb's ahead of time
that have been purged.  Then iteratively read_lock the hash table, find a
flagged rsb, unlock, process rsb.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-02 13:34:03 -04:00
Steven Whitehouse
d2d7b8a2a7 [GFS2] Fix bug in writepage()
As pointed out by Wendy Cheng, the logic in GFS2's writepage() function
wasn't quite right with respect to invalidating pages when a file has been
truncated. This patch fixes that.

CC: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-02 12:09:42 -04:00
Jens Axboe
330ab71619 [PATCH] vmsplice: restrict stealing a little more
Apply the same rules as the anon pipe pages, only allow stealing
if no one else is using the page.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-02 15:29:57 +02:00
Jens Axboe
a893b99be7 [PATCH] splice: fix page LRU accounting
Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
to know whether we need to fiddle with page LRU state after stealing it,
however for some origins we just don't know if the page is on the LRU
list or not.

So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
instead.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-02 15:03:27 +02:00
Jens Axboe
7591489a8f [PATCH] vmsplice: fix badly placed end paranthesis
We need to use the minium of {len, PAGE_SIZE-off}, not {len, PAGE_SIZE}-off.
The latter doesn't make any sense, and could cause us to attempt negative
length transfers...

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-02 12:57:18 +02:00
Linus Torvalds
9817d207dc Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] vmsplice: allow user to pass in gift pages
  [PATCH] pipe: enable atomic copying of pipe data to/from user space
  [PATCH] splice: call handle_ra_miss() on failure to lookup page
  [PATCH] Add ->splice_read/splice_write to def_blk_fops
  [PATCH] pipe: introduce ->pin() buffer operation
  [PATCH] splice: fix bugs in pipe_to_file()
  [PATCH] splice: fix bugs with stealing regular pipe pages
2006-05-01 18:33:40 -07:00
Andi Kleen
d261020229 [PATCH] x86_64: Add compat_sys_vmsplice and use it in x86-64
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:43 -07:00
Jens Axboe
7afa6fd037 [PATCH] vmsplice: allow user to pass in gift pages
If SPLICE_F_GIFT is set, the user is basically giving this pages away to
the kernel. That means we can steal them for eg page cache uses instead
of copying it.

The data must be properly page aligned and also a multiple of the page size
in length.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 20:02:33 +02:00
Jens Axboe
f6762b7ad8 [PATCH] pipe: enable atomic copying of pipe data to/from user space
The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
instead.

lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, SMP kernel, and SMP kernel patched.

Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds

Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds

Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 20:02:05 +02:00
Jens Axboe
e27dedd84c [PATCH] splice: call handle_ra_miss() on failure to lookup page
Notify the readahead logic of the missing page. Suggested by
Oleg Nesterov.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:59:54 +02:00
Jens Axboe
7f9c51f0d9 [PATCH] Add ->splice_read/splice_write to def_blk_fops
It can use the generic handlers.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:59:32 +02:00
Jens Axboe
f84d751994 [PATCH] pipe: introduce ->pin() buffer operation
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.

Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share a most of the pipe buffer ops between pipe.c and splice.c

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:59:03 +02:00
Jens Axboe
0568b409c7 [PATCH] splice: fix bugs in pipe_to_file()
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.

- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.

And as a cleanup on that:

- Make the bottom fall-through logic a little less convoluted. Also make
  the steal path hold an extra reference to the page, so we don't have
  to differentiate between stolen and non-stolen at the end.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:50:48 +02:00
Jens Axboe
46e678c96b [PATCH] splice: fix bugs with stealing regular pipe pages
- Check that page has suitable count for stealing in the regular pipes.
- pipe_to_file() assumes that the page is locked on succesful steal, so
  do that in the pipe steal hook
- Missing unlock_page() in add_to_page_cache() failure.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-30 16:36:32 +02:00
Steven Whitehouse
56409abbf8 [GFS2] Remove some unused code
Remove some of the unused code flagged up by Adrian Bunk.

Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse
2006-04-28 11:48:45 -04:00
Adrian Bunk
08bc2dbc73 [GFS2] [-mm patch] fs/gfs2/: possible cleanups
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 unused functions
- remove the following global function that was both unused and
  unimplemented:
  - super.c: gfs2_do_upgrade()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:59:12 -04:00
David Teigland
c56b39cd2c [DLM] PATCH 3/3 dlm: show recover state
Expose the current recovery state in sysfs to help in debugging.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:51:53 -04:00
David Teigland
1c032c0311 [DLM] PATCH 2/3 dlm: lowcomms close
When a node is removed from a lockspace configuration, close our
connection to it, clearing any remaining messages for it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:50:41 -04:00
David Teigland
ae118962b9 [DLM] PATCH 1/3 dlm: force free user lockspace
Lockspaces created from user space should be forcibly freed without
requiring any further user space interaction.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:48:59 -04:00
Steven Whitehouse
363275216c [GFS2] Reordering in deallocation to avoid recursive locking
Despite my earlier careful search, there was a recursive lock left
in the deallocation code. This removes it. It also should speed up
deallocation be reducing the number of locking operations which take
place by using two "try lock" operations on the two locks involved in
inode deallocation which allows us to grab the locks out of order
(compared with NFS which grabs the inode lock first and the iopen
lock later). It is ok for us to fail while doing this since if it
does fail it means that someone else is still using the inode and
thus it wouldn't be possible to deallocate anyway.

This fixes the bug reported to me by Rob Kenna.

Cc: Rob Kenna <rkenna@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:46:21 -04:00
Andreas Schwab
2833c28aa0 [PATCH] powerpc: Wire up *at syscalls
Wire up *at syscalls.

This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-28 21:04:59 +10:00