Commit graph

29027 commits

Author SHA1 Message Date
Josef Bacik
6f1fed7753 Btrfs: don't lookup csums for prealloc extents
The tree logging stuff was looking up csums to copy over for prealloc
extents which is just work we don't need to be doing.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:41 -04:00
Josef Bacik
e6138876ad Btrfs: cache extent state when writing out dirty metadata pages
Everytime we write out dirty pages we search for an offset in the tree,
convert the bits in the state, and then when we wait we search for the
offset again and clear the bits.  So for every dirty range in the io tree we
are doing 4 rb searches, which is suboptimal.  With this patch we are only
doing 2 searches for every cycle (modulo weird things happening).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:41 -04:00
Josef Bacik
ce19533256 Btrfs: do not hold the file extent leaf locked when adding extent item
For some reason we unlock everything except the leaf we are on, set the path
blocking and then add the extent item for the extent we just finished
writing.  I can't for the life of me figure out why we would want to do
this, and the history doesn't really indicate that there was a real reason
for it, so just remove it.  This will reduce our tree lock contention on
heavy writes.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:40 -04:00
Josef Bacik
de0022b9da Btrfs: do not async metadata csumming in certain situations
There are a coule scenarios where farming metadata csumming off to an async
thread doesn't help.  The first is if our processor supports crc32c, in
which case the csumming will be fast and so the overhead of the async model
is not worth the cost.  The other case is for our tree log.  We will be
making that stuff dirty and writing it out and waiting for it immediately.
Even with software crc32c this gives me a ~15% increase in speed with O_SYNC
workloads.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:40 -04:00
Zach Brown
221b831835 btrfs: fix min csum item size warnings in 32bit
commit 7ca4be45a0 limited csum items to
PAGE_CACHE_SIZE.  It used min() with incompatible types in 32bit which
generates warnings:

fs/btrfs/file-item.c: In function ‘btrfs_csum_file_blocks’:
fs/btrfs/file-item.c:717: warning: comparison of distinct pointer types lacks a cast

This uses min_t(u32,) to fix the warnings.  u32 seemed reasonable
because btrfs_root->leafsize is u32 and PAGE_CACHE_SIZE is unsigned
long.

Signed-off-by: Zach Brown <zab@zabbo.net>
2012-10-09 09:15:39 -04:00
Josef Bacik
67b0fd63d5 Btrfs: run delayed refs first when out of space
Running delayed refs is faster than running delalloc, so lets do that first
to try and reclaim space.  This makes my fs_mark test about 20% faster.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:39 -04:00
Miao Xie
354aa0fb6d Btrfs: fix orphan transaction on the freezed filesystem
With the following debug patch:

 static int btrfs_freeze(struct super_block *sb)
 {
+ 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	struct btrfs_transaction *trans;
+
+	spin_lock(&fs_info->trans_lock);
+	trans = fs_info->running_transaction;
+	if (trans) {
+		printk("Transid %llu, use_count %d, num_writer %d\n",
+			trans->transid, atomic_read(&trans->use_count),
+			atomic_read(&trans->num_writers));
+	}
+	spin_unlock(&fs_info->trans_lock);
 	return 0;
 }

I found there was a orphan transaction after the freeze operation was done.

It is because the transaction may not be committed when the transaction handle
end even though it is the last handle of the current transaction. This design
avoid committing the transaction frequently, but also introduce the above
problem.

So I add btrfs_attach_transaction() which can catch the current transaction
and commit it. If there is no transaction, it will return ENOENT, and do not
anything.

This function also can be used to instead of btrfs_join_transaction_freeze()
because it don't increase the writer counter and don't start a new transaction,
so it also can fix the deadlock between sync and freeze.

Besides that, it is used to instead of btrfs_join_transaction() in
transaction_kthread(), because if there is no transaction, the transaction
kthread needn't anything.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-09 09:15:39 -04:00
Miao Xie
a698d0755a Btrfs: add a type field for the transaction handle
This patch add a type field into the transaction handle structure,
in this way, we needn't implement various end-transaction functions
and can make the code more simple and readable.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-09 09:15:38 -04:00
Miao Xie
e8830e606f Btrfs: fix memory leak in start_transaction()
This patch fixes memory leak of the transaction handle which happened
when starting transaction failed on a freezed fs.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-09 09:15:38 -04:00
Mark Fasheh
d24bec3ae5 btrfs: extended inode ref iteration
The iterate_irefs in backref.c is used to build path components from inode
refs. This patch adds code to iterate extended refs as well.

I had modify the callback function signature to abstract out some of the
differences between ref structures. iref_to_path() also needed similar
changes.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
2012-10-09 09:15:01 -04:00
Mark Fasheh
f186373fef btrfs: extended inode refs
This patch adds basic support for extended inode refs. This includes support
for link and unlink of the refs, which basically gets us support for rename
as well.

Inode creation does not need changing - extended refs are only added after
the ref array is full.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
2012-10-09 09:14:45 -04:00
Linus Torvalds
9e2d8656f5 Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches from Andrew Morton:
 "A few misc things and very nearly all of the MM tree.  A tremendous
  amount of stuff (again), including a significant rbtree library
  rework."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits)
  sparc64: Support transparent huge pages.
  mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd().
  mm: Add and use update_mmu_cache_pmd() in transparent huge page code.
  sparc64: Document PGD and PMD layout.
  sparc64: Eliminate PTE table memory wastage.
  sparc64: Halve the size of PTE tables
  sparc64: Only support 4MB huge pages and 8KB base pages.
  memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
  mm: memcg: clean up mm_match_cgroup() signature
  mm: document PageHuge somewhat
  mm: use %pK for /proc/vmallocinfo
  mm, thp: fix mlock statistics
  mm, thp: fix mapped pages avoiding unevictable list on mlock
  memory-hotplug: update memory block's state and notify userspace
  memory-hotplug: preparation to notify memory block's state at memory hot remove
  mm: avoid section mismatch warning for memblock_type_name
  make GFP_NOTRACK definition unconditional
  cma: decrease cc.nr_migratepages after reclaiming pagelist
  CMA: migrate mlocked pages
  kpageflags: fix wrong KPF_THP on non-huge compound pages
  ...
2012-10-09 16:23:15 +09:00
Naoya Horiguchi
7a71932d56 kpageflags: fix wrong KPF_THP on non-huge compound pages
KPF_THP can be set on non-huge compound pages (like slab pages or pages
allocated by drivers with __GFP_COMP) because PageTransCompound only
checks PG_head and PG_tail.  Obviously this is a bug and breaks user space
applications which look for thp via /proc/kpageflags.

This patch rules out setting KPF_THP wrongly by additionally checking
PageLRU on the head pages.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:00 +09:00
Yan Hong
cd8ed2a45a fs/fs-writeback.c: remove unneccesary parameter of __writeback_single_inode()
The parameter 'wb' is never used in this function.

Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:00 +09:00
Michel Lespinasse
38a76013ad mm: avoid taking rmap locks in move_ptes()
During mremap(), the destination VMA is generally placed after the
original vma in rmap traversal order: in move_vma(), we always have
new_pgoff >= vma->vm_pgoff, and as a result new_vma->vm_pgoff >=
vma->vm_pgoff unless vma_merge() merged the new vma with an adjacent one.

When the destination VMA is placed after the original in rmap traversal
order, we can avoid taking the rmap locks in move_ptes().

Essentially, this reintroduces the optimization that had been disabled in
"mm anon rmap: remove anon_vma_moveto_tail".  The difference is that we
don't try to impose the rmap traversal order; instead we just rely on
things being in the desired order in the common case and fall back to
taking locks in the uncommon case.  Also we skip the i_mmap_mutex in
addition to the anon_vma lock: in both cases, the vmas are traversed in
increasing vm_pgoff order with ties resolved in tree insertion order.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:42 +09:00
Michel Lespinasse
6b2dbba8b6 mm: replace vma prio_tree with an interval tree
Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.

Once the interval tree functions are available, using them as a
replacement to the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:39 +09:00
Michel Lespinasse
bf7ad8eeab rbtree: move some implementation details from rbtree.h to rbtree.c
rbtree users must use the documented APIs to manipulate the tree
structure.  Low-level helpers to manipulate node colors and parenthood are
not part of that API, so move them to lib/rbtree.c

[dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field]
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Michel Lespinasse
ea5272f5c9 rbtree: fix incorrect rbtree node insertion in fs/proc/proc_sysctl.c
The recently added code to use rbtrees in sysctl did not follow the proper
rbtree interface on insertion - it was calling rb_link_node() which
inserts a new node into the binary tree, but missed the call to
rb_insert_color() which properly balances the rbtree and establishes all
expected rbtree invariants.

I found out about this only because faulty commit also used
rb_init_node(), which I am removing within this patchset.  But I think
it's an easy mistake to make, and it makes me wonder if we should change
the rbtree API so that insertions would be done with a single rb_insert()
call (even if its implementation could still inline the rb_link_node()
part and call a private __rb_insert_color function to do the rebalancing).

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Michel Lespinasse
4c199a93a2 rbtree: empty nodes have no color
Empty nodes have no color.  We can make use of this property to simplify
the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
we can get rid of the rb_init_node function which had been introduced by
commit 88d19cf379 ("timers: Add rb_init_node() to allow for stack
allocated rb nodes") to avoid some issue with the empty node's color not
being initialized.

I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
doing there, though.  axboe introduced them in commit 10fd48f237
("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
see it, the 'empty node' abstraction is only used by rbtree users to
flag nodes that they haven't inserted in any rbtree, so asking the
predecessor or successor of such nodes doesn't make any sense.

One final rb_init_node() caller was recently added in sysctl code to
implement faster sysctl name lookups.  This code doesn't make use of
RB_EMPTY_NODE at all, and from what I could see it only called
rb_init_node() under the mistaken assumption that such initialization was
required before node insertion.

[sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Davidlohr Bueso
01dc52ebdf oom: remove deprecated oom_adj
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:24 +09:00
Konstantin Khlebnikov
314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Konstantin Khlebnikov
0103bd16fb mm: prepare VM_DONTDUMP for using in drivers
Rename VM_NODUMP into VM_DONTDUMP: this name matches other negative flags:
VM_DONTEXPAND, VM_DONTCOPY.  Currently this flag used only for
sys_madvise.  The next patch will use it for replacing the outdated flag
VM_RESERVED.

Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL
(VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:18 +09:00
Konstantin Khlebnikov
0b173bc4da mm: kill vma flag VM_CAN_NONLINEAR
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:17 +09:00
Linus Torvalds
8711798772 Merge branch 'linux-next' of git://git.open-osd.org/linux-open-osd
Pull exofs update from Boaz Harrosh:
 "Just three one liners"

* 'linux-next' of git://git.open-osd.org/linux-open-osd:
  pnfs_osd_xdr: Remove unused #include from pnfs_osd_xdr.h
  ore: signedness bug in _sp2d_min_pg()
  exofs: check for allocation failure in uri_store()
2012-10-09 15:54:27 +09:00
Al Viro
121977187c Fix F_DUPFD_CLOEXEC breakage
Fix a braino in F_DUPFD_CLOEXEC; f_dupfd() expects flags for alloc_fd(),
get_unused_fd() etc and there clone-on-exec if O_CLOEXEC, not
FD_CLOEXEC.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 15:52:31 +09:00
Jan Schmidt
5a1d7843ca btrfs: improved readablity for add_inode_ref
Moved part of the code into a sub function and replaced most of the gotos
by ifs, hoping that it will be easier to read now.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
2012-10-08 20:09:02 -04:00
Josef Bacik
0aa4a17d82 Btrfs: handle not finding the extent exactly when logging changed extents
I started hitting warnings when running xfstest 68 in a loop because there
were EM's that were not lined up properly with the physical extents.  This
is ok, if we do something like punch a hole or write to a preallocated space
or something like that we can have an EM that doesn't cover the entire
physical extent.  So fix the tree logging stuff to cope with this case so we
don't just commit the transaction.  With this patch I no longer see the
warnings from the tree logging code.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-08 20:09:02 -04:00
David Sterba
005d6427ac btrfs: move transaction aborts to the point of failure
Call btrfs_abort_transaction as early as possible when an error
condition is detected, that way the line number reported is useful
and we're not clueless anymore which error path led to the abort.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-10-08 20:09:02 -04:00
Miao Xie
8732d44f80 Btrfs: fix the missing error information in create_pending_snapshot()
The macro btrfs_abort_transaction() can get the line number of the code
where the problem happens, so we should invoke it in the place that the
error occurs, or we will lose the line number.

Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-08 20:07:33 -04:00
Liu Bo
aa42ffd918 Btrfs: fix off-by-one in file clone
Btrfs uses inclusive range end for lock_extent(), unlock_extent() and
related functions, so we made off-by-one errors in file clone.

This fixes it and also fixes some style problems.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-08 20:07:32 -04:00
Peng Tao
af283885b7 pnfsblock: cleanup nfs4_blkdev_get
It is not needed at all and it is messing with return values...

Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:40 -04:00
Peng Tao
1fd937bd75 NFS41: send real read size in layoutget
For buffer read, use offst-to-isize.

For direct read, use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:34 -04:00
Peng Tao
6296556f0b NFS41: send real write size in layoutget
For buffer write, block layout client scan inode mapping to find
next hole and use offset-to-hole as layoutget length. Object
layout client uses offset-to-isize as layoutget length.

For direct write, both block layout and object layout use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:22 -04:00
Peng Tao
35754bc00e NFS: track direct IO left bytes
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:31:55 -04:00
Oleg Nesterov
d5bbd43d5f exec: make de_thread() killable
Change de_thread() to use KILLABLE rather than UNINTERRUPTIBLE while
waiting for other threads.  The only complication is that we should
clear ->group_exit_task and ->notify_count before we return, and we
should do this under tasklist_lock.  -EAGAIN is used to match the
initial signal_group_exit() check/return, it doesn't really matter.

This fixes the (unlikely) race with coredump.  de_thread() checks
signal_group_exit() before it starts to kill the subthreads, but this
can't help if another CLONE_VM (but non CLONE_THREAD) task starts the
coredumping after de_thread() unlocks ->siglock.  In this case the
killed sub-thread can block in exit_mm() waiting for coredump_finish(),
execing thread waits for that sub-thead, and the coredumping thread
waits for execing thread.  Deadlock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 06:53:20 +09:00
Jeff Layton
72bd481f86 cifs: reinstate the forcegid option
Apparently this was lost when we converted to the standard option
parser in 8830d7e07a

Cc: Sachin Prabhu <sprabhu@redhat.com>
Cc: stable@vger.kernel.org # v3.4+
Reported-by: Gregory Lee Bartholomew <gregory.lee.bartholomew@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-07 20:05:47 -05:00
Frediano Ziglio
fd3ba42c76 Convert properly UTF-8 to UTF-16
wchar_t is currently 16bit so converting a utf8 encoded characters not
in plane 0 (>= 0x10000) to wchar_t (that is calling char2uni) lead to a
-EINVAL return. This patch detect utf8 in cifs_strtoUTF16 and add special
code calling utf8s_to_utf16s.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-07 20:04:53 -05:00
Steve French
b7a10626c8 [CIFS] WARN_ON_ONCE if kernel_sendmsg() returns -ENOSPC
kernel_sendmsg() is less likely to return -ENOSPC and it might be
a bug to do so. However, in the past there might have been cases
where a -ENOSPC was returned from a low level driver.

Add a WARN_ON_ONCE() to ensure that it is safe to assume that -ENOSPC
is no longer returned. This -ENOSPC specific handling will be removed
once we are sure it is no longer returned.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-07 20:00:47 -05:00
Linus Torvalds
7035cdf36d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "The bulk of this pull is a series from Alex that refactors and cleans
  up the RBD code to lay the groundwork for supporting the new image
  format and evolving feature set.  There are also some cleanups in
  libceph, and for ceph there's fixed validation of file striping
  layouts and a bugfix in the code handling a shrinking MDS cluster."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: avoid 32-bit page index overflow
  ceph: return EIO on invalid layout on GET_DATALOC ioctl
  rbd: BUG on invalid layout
  ceph: propagate layout error on osd request creation
  libceph: check for invalid mapping
  ceph: convert to use le32_add_cpu()
  ceph: Fix oops when handling mdsmap that decreases max_mds
  rbd: update remaining header fields for v2
  rbd: get snapshot name for a v2 image
  rbd: get the snapshot context for a v2 image
  rbd: get image features for a v2 image
  rbd: get the object prefix for a v2 rbd image
  rbd: add code to get the size of a v2 rbd image
  rbd: lay out header probe infrastructure
  rbd: encapsulate code that gets snapshot info
  rbd: add an rbd features field
  rbd: don't use index in __rbd_add_snap_dev()
  rbd: kill create_snap sysfs entry
  rbd: define rbd_dev_image_id()
  rbd: define some new format constants
  ...
2012-10-08 06:38:18 +09:00
Linus Torvalds
6432f21284 The big new feature added this time is supporting online resizing
using the meta_bg feature.  This allows us to resize file systems
 which are greater than 16TB.  In addition, the speed of online
 resizing has been improved in general.
 
 We also fix a number of races, some of which could lead to deadlocks,
 in ext4's Asynchronous I/O and online defrag support, thanks to good
 work by Dmitry Monakhov.
 
 There are also a large number of more minor bug fixes and cleanups
 from a number of other ext4 contributors, quite of few of which have
 submitted fixes for the first time.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQbxMXAAoJENNvdpvBGATwlg4QAJZ4mHNSL2eaaxjRtTbL1pAz
 +FVXpJ3lhw1lSfE9hJGqPVE8EfU2fWjIqxEI7dgh95Tukc5pUnPAQ2/hBz8ZA0qq
 o0AFMk3mRnvCEh6HsZfumsV83eqpR3k/zEy4uFH+KtxBskPe2sEKy3B7qOxvgdKW
 Gh8B2WqF2BpIj9WIT1P9G6xsxZW64EMHTbWcgRhuoRD7bakDNnwQ3kElz/TJQU5q
 bM/5wE7pqKwU2J1L0Ho0mxDi0f/BbXeJdA9k1tQy2KM1pZwHtpj4Ls0qmfoi49GE
 KyZqQOXlFbAz/9tidPDceY5KoRRQm1MwZ+1MimQX1P+40cs/w3pNu3yiibcaXIru
 UZ63AQMCj5JHMcFNVi20sVCwjU/ibNtEO75cfDD4bzPgHJvfCj73EbHTLl21nbTu
 izIMffhJEHmRnmRXiiortYVuI4b19oIfnXg7eclrJoUWSuGwKKsJOc5nMjDqidG4
 B7Gq4TD89sGkIYzx+50E+ll2ispcBN0BQnGqp4k2BzgDyEHhuFYk7VuVQvJgCGTi
 eobzQJj7JUXPWxyemcAVkQTtUq4vVbkm/IwS+/GA9b9Z80X8hR8x6EVHUW5lX3qC
 YHoBSCU4XKZXXWqzx0fIVCXyKKFiBzM+OXcgHOKH90vK8k6kPmPODhNCxvV3pITU
 jfl9q+X1dY4SpybZjLt5
 =iYeV
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The big new feature added this time is supporting online resizing
  using the meta_bg feature.  This allows us to resize file systems
  which are greater than 16TB.  In addition, the speed of online
  resizing has been improved in general.

  We also fix a number of races, some of which could lead to deadlocks,
  in ext4's Asynchronous I/O and online defrag support, thanks to good
  work by Dmitry Monakhov.

  There are also a large number of more minor bug fixes and cleanups
  from a number of other ext4 contributors, quite of few of which have
  submitted fixes for the first time."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
  ext4: fix ext4_flush_completed_IO wait semantics
  ext4: fix mtime update in nodelalloc mode
  ext4: fix ext_remove_space for punch_hole case
  ext4: punch_hole should wait for DIO writers
  ext4: serialize truncate with owerwrite DIO workers
  ext4: endless truncate due to nonlocked dio readers
  ext4: serialize unlocked dio reads with truncate
  ext4: serialize dio nonlocked reads with defrag workers
  ext4: completed_io locking cleanup
  ext4: fix unwritten counter leakage
  ext4: give i_aiodio_unwritten a more appropriate name
  ext4: ext4_inode_info diet
  ext4: convert to use leXX_add_cpu()
  ext4: ext4_bread usage audit
  fs: reserve fallocate flag codepoint
  ext4: remove redundant offset check in mext_check_arguments()
  ext4: don't clear orphan list on ro mount with errors
  jbd2: fix assertion failure in commit code due to lacking transaction credits
  ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
  ext4: enable FITRIM ioctl on bigalloc file system
  ...
2012-10-08 06:36:39 +09:00
Linus Torvalds
7f60ba388f 1. We no longer ad-hoc to the function tracer "high level" infrastructure
and no longer use its debugfs knobs. The change slightly touches
    kernel/trace directory, but it got the needed ack from Steven Rostedt:
    http://lkml.org/lkml/2012/8/21/688
 2. Added maintainers entry;
 3. A bunch of fixes, nothing special.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQbjoFAAoJEGgI9fZJve1bZgUP/A/ZwFGfdnochDgRhK5p7ljY
 baRZpgSh2B+BxIDTEPLfVh6HbOivmYJ8WF0unD9kKTzCCS71ZUMiLB25G/bV4lnZ
 fawAOhGfOLG3rXmldxf6nJllHr9JpoSmVEHypvjFNcbjYZ04zhe7jM+YsaWmBw68
 eHXQkOSdfPPpKXZ2B0Eef/EoGWhORW0kTD7xFlorsxYAkksSheY0PC0nYgCFhvCZ
 168y9pi4T4lucr4s44x8AJ/r/5BQ1jEQAY/A2qUE/iBfRFP4XyE1Oao4OHtVDYdU
 KjVPA1VYmwkKSfnkiVFrpb/94IyrKslblgR8nX0kK3L/ccFYjQix4nd9jR+n857s
 xfAuj9nfhUO6fI5qoaVSOBufxKyPp1S7X8INEAJ7WQ0c9VoMv00biK9M77ifDGZg
 ll/Ecq1CADtcbOnQXf6qwGwRKmpR+qgPkIzpNXcuGMuM4AEPwtckOhCyXFr37Txk
 6ZoGM8IIaBJ0yXxHkfpUA7l9ZF0gXR+qHMQCwpUS8tIMx35On+IbybEaKbniKEi1
 AURgQ7ZimVYAHPi0Y0L00+EKI3IPVQJvCFH7SG+wUfLWcbEtNbTv3MAer5o3DANJ
 GMnWBwNw9ClTydWKI0GMNmnWpFukWhd4OXleyl2+q4qRJi3HhNacrok3s/2r+CnT
 QRg8i/0SDvxGuXazrTZT
 =1HAE
 -----END PGP SIGNATURE-----

Merge tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore

Pull pstore changes from Anton Vorontsov:

 1) We no longer ad-hoc to the function tracer "high level"
    infrastructure and no longer use its debugfs knobs.  The change
    slightly touches kernel/trace directory, but it got the needed ack
    from Steven Rostedt:

      http://lkml.org/lkml/2012/8/21/688

 2) Added maintainers entry;

 3) A bunch of fixes, nothing special.

* tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore:
  pstore: Avoid recursive spinlocks in the oops_in_progress case
  pstore/ftrace: Convert to its own enable/disable debugfs knob
  pstore/ram: Add missing platform_device_unregister
  MAINTAINERS: Add pstore maintainers
  pstore/ram: Mark ramoops_pstore_write_buf() as notrace
  pstore/ram: Fix printk format warning
  pstore/ram: Fix possible NULL dereference
2012-10-07 17:30:50 +09:00
Trond Myklebust
19c54abab7 NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
Split it into two functions, one which checks if layoutgets are blocked,
and one which checks if the layout stateid has expired.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-05 16:56:58 -07:00
Wei Yongjun
c99b6841d7 omfs: convert to use beXX_add_cpu()
Convert cpu_to_beXX(beXX_to_cpu(E1) + E2) to use beXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:31 +09:00
Sachin Kamat
9fb8844210 fs/proc/root.c: use NULL instead of 0 for pointer
This cleanup also fixes the following sparse warning:

  fs/proc/root.c:64:45: warning: Using plain integer as NULL pointer

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:19 +09:00
Prasad Joshi
ab4a1f2470 proc_sysctl.c: use BUG_ON instead of BUG
The use of if (!head) BUG(); can be replaced with the single line
BUG_ON(!head).

Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:18 +09:00
yan
17baa2a2c4 proc: use kzalloc instead of kmalloc and memset
Part of the memory will be written twice after this change, but that
should be negligible.

[akpm@linux-foundation.org: fix __proc_create() coding-style issues, remove unneeded zero-initialisations]
Signed-off-by: yan <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:18 +09:00
yan
0e06936057 proc: no need to initialize proc_inode->fd in proc_get_inode()
proc_get_inode() obtains the inode via a call to iget_locked().
iget_locked() calls alloc_inode() which will call proc_alloc_inode() which
clears proc_inode.fd, so there is no need to clear this field in
proc_get_inode().

If iget_locked() instead found the inode via find_inode_fast(), that inode
will not have I_NEW set so this change has no effect.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:18 +09:00
yan
620727506d proc: return -ENOMEM when inode allocation failed
If proc_get_inode() returns NULL then presumably it encountered memory
exhaustion.  proc_lookup_de() should return -ENOMEM in this case, not
-EINVAL.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:17 +09:00
Denys Vlasenko
2aa362c49c coredump: extend core dump note section to contain file names of mapped files
This note has the following format:

long count     -- how many files are mapped
long page_size -- units for file_ofs
array of [COUNT] elements of
   long start
   long end
   long file_ofs
followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:17 +09:00
Denys Vlasenko
49ae4d4b11 coredump: add a new elf note with siginfo of the signal
Existing PRSTATUS note contains only si_signo, si_code, si_errno fields
from the siginfo of the signal which caused core to be dumped.

There are tools which try to analyze crashes for possible security
implications, and they want to use, among other data, si_addr field from
the SIGSEGV.

This patch adds a new elf note, NT_SIGINFO, which contains the complete
siginfo_t of the signal which killed the process.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:16 +09:00
Denys Vlasenko
5ab1c309b3 coredump: pass siginfo_t* to do_coredump() and below, not merely signr
This is a preparatory patch for the introduction of NT_SIGINFO elf note.

With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
do_coredump() and put it into coredump_params.  It will be used by the
next patch.  Most changes are simple s/signr/siginfo->si_signo/.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:16 +09:00
Oleg Nesterov
0f4cfb2e4e coredump: use SUID_DUMPABLE_ENABLED rather than hardcoded 1
Cosmetic. Change setup_new_exec() and task_dumpable() to use
SUID_DUMPABLE_ENABLED for /bin/grep.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:16 +09:00
Oleg Nesterov
12a2b4b224 coredump: add support for %d=__get_dumpable() in core name
Some coredump handlers want to create a core file in a way compatible with
standard behavior.  Standard behavior with fs.suid_dumpable = 2 is to
create core file with uid=gid=0.  However, there was no way for coredump
handler to know that the process being dumped was suid'ed.

This patch adds the new %d specifier for format_corename() which simply
reports __get_dumpable(mm->flags), this is compatible with
/proc/sys/fs/suid_dumpable we already have.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=787135

Developed during a discussion with Denys Vlasenko.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Alex Kelly <alex.page.kelly@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Jiri Moskovcak <jmoskovc@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:15 +09:00
Alex Kelly
179899fd5d coredump: update coredump-related headers
Create a new header file, fs/coredump.h, which contains functions only
used by the new coredump.c.  It also moves do_coredump to the
include/linux/coredump.h header file, for consistency.

Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:15 +09:00
Alex Kelly
046d662f48 coredump: make core dump functionality optional
Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump.  This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.

CONFIG_COREDUMP also disables coredump-related sysctls, except for
suid_dumpable and related functions, which are necessary for ptrace.

[akpm@linux-foundation.org: fix binfmt_aout.c build]
Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:15 +09:00
Namjae Jeon
14864655c0 fat: simplify writeback_inode()
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:12 +09:00
Namjae Jeon
126ac0518c fat: no need to reset EOF in ent_put for FAT32
#define FAT_ENT_EOF(EOF_FAT32)

there is no need to reset value of 'new' for FAT32 as the values is
already correct

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:12 +09:00
Cruz Julian Bishop
441dff34aa fs/fat: fix checkpatch issues in fatent.c
1: Stop any lines going over 80 characters
2: Remove a blank line before EXPORT_SYMBOL_GPL

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:11 +09:00
Cruz Julian Bishop
f08b4988f2 fs/fat: fix all other checkpatch issues in dir.c
1: Import linux/uaccess.h instead of asm.uaccess.h
2: Stop any lines going over 80 characters
3: Stopped setting any variables in if statements
4: Stopped splitting quoted strings
5: Removed unneeded parentheses

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:11 +09:00
Cruz Julian Bishop
3f36f6100a fs/fat: fix some small checkpatch issues in dir.c
Simply remove the spacing between function definitions and
EXPORT_SYMBOL_GPL calls, which were previously generating warnings.

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:11 +09:00
Cruz Julian Bishop
c90518290e fs/fat: fix two checkpatch issues in cache.c
This does the following:
	1: Splits the arguments of a function call to stop it
		from exceeding 80 characters
	2: Re-indents the arguments of another function call
		to prevent the splitting of a quoted string.

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:10 +09:00
Cruz Julian Bishop
4a3aeb13b7 fs/fat: chang indentation of some comments in fat.h
The comments were not lined up properly, so I just re-indented them.

This also fixes a stupid checkpatch issue unknowingly

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:10 +09:00
Cruz Julian Bishop
d5a4a3867f fs/fat: fix some checkpatch issues in fat.h
Mainly fix spacing issues such as "foo * bar" and "foo= bar"

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:10 +09:00
Cruz Julian Bishop
85cb9bf535 fs/fat: fix a checkpatch issue in namei_msdos.c
Add a space before an equals sign/operator in line 410.

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:09 +09:00
Steven J. Magnani
7669e8fb09 fat (exportfs): fix dentry reconnection
Maintain an index of directory inodes by starting cluster, so that
fat_get_parent() can return the proper cached inode rather than inventing
one that cannot be traced back to the filesystem root.

Add a new msdos/vfat binary mount option "nfs" so that FAT filesystems
that are _not_ exported via NFS are not saddled with maintenance of an
index they will never use.

Finally, simplify NFS file handle generation and lookups.  An
ext2-congruent implementation is adequate for FAT needs.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:09 +09:00
Steven J. Magnani
21b6633d51 fat (exportfs): move NFS support code
Under memory pressure, the system may evict dentries from cache.  When the
FAT driver receives a NFS request involving an evicted dentry, it is
unable to reconnect it to the filesystem root.  This causes the request to
fail, often with ENOENT.

This is partially due to ineffectiveness of the current FAT NFS
implementation, and partially due to an unimplemented fh_to_parent method.
 The latter can cause file accesses to fail on shares exported with
subtree_check.

This patch set provides the FAT driver with the ability to
reconnect dentries.  NFS file handle generation and lookups are simplified
and made congruent with ext2.

Testing has involved a memory-starved virtual machine running 3.5-rc5 that
exports a ~2 GB vfat filesystem containing a kernel tree (~770 MB, ~40000
files, 9 levels).  Both 'cp -r' and 'ls -lR' operations were performed
from a client, some overlapping, some consecutive.  Exports with
'subtree_check' and 'no_subtree_check' have been tested.

Note that while this patch set improves FAT's NFS support, it does not
eliminate ESTALE errors completely.

The following should be considered for NFS clients who are sensitive to ESTALE:

* Mounting with lookupcache=none
  Unfortunately this can degrade performance severely, particularly for deep
  filesystems.

* Incorporating VFS patches to retry ESTALE failures on the client-side,
  such as https://lkml.org/lkml/2012/6/29/381

* Handling ESTALE errors in client application code

This patch:

Move NFS-related code into its own C file.  No functional changes.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:09 +09:00
Namjae Jeon
4b63709861 fat: use accessor function for msdos_dir_entry 'start'
Use accessor function for msdos_dir_entry 'start'

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:08 +09:00
Wei Yongjun
32daab969c hpfs: convert to use leXX_add_cpu()
Convert cpu_to_leXX(leXX_to_cpu(E1) + E2) to use leXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:08 +09:00
Alan Cox
6eec482f47 binfmt_elf: Uninitialized variable
load_elf_interp() has interp_map_addr carefully described as
"uninitialized_var" and marked so as to avoid a warning.  However if you
trace the code it is passed into load_elf_interp and then this value is
checked against NULL.

As this return value isn't used this is actually safe but it freaks
various analysis tools that see un-initialized memory addresses being read
before their value is ever defined.

Set it to NULL as a matter of programming good taste if nothing else

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:00 +09:00
Paton J. Lewis
03a7beb55b epoll: support for disabling items, and a self-test app
Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
item.  If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the epoll item in a multi-threaded environment.  Also added a new
test_epoll self- test app to both demonstrate the need for this feature
and test it.

Signed-off-by: Paton J. Lewis <palewis@adobe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Holland <pholland@adobe.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:00 +09:00
Fengguang Wu
125c4c706b idr: rename MAX_LEVEL to MAX_IDR_LEVEL
To avoid name conflicts:

  drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

While at it, also make the other names more consistent and add
parentheses.

[akpm@linux-foundation.org: repair fallout]
[sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:04:56 +09:00
Dmitry Monakhov
c278531d39 ext4: fix ext4_flush_completed_IO wait semantics
BUG #1) All places where we call ext4_flush_completed_IO are broken
    because buffered io and DIO/AIO goes through three stages
    1) submitted io,
    2) completed io (in i_completed_io_list) conversion pended
    3) finished  io (conversion done)
    And by calling ext4_flush_completed_IO we will flush only
    requests which were in (2) stage, which is wrong because:
     1) punch_hole and truncate _must_ wait for all outstanding unwritten io
      regardless to it's state.
     2) fsync and nolock_dio_read should also wait because there is
        a time window between end_page_writeback() and ext4_add_complete_io()
        As result integrity fsync is broken in case of buffered write
        to fallocated region:
        fsync                                      blkdev_completion
	 ->filemap_write_and_wait_range
                                                   ->ext4_end_bio
                                                     ->end_page_writeback
          <-- filemap_write_and_wait_range return
	 ->ext4_flush_completed_IO
   	 sees empty i_completed_io_list but pended
   	 conversion still exist
                                                     ->ext4_add_complete_io

BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series

This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
   wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
   prevent endless wait

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2012-10-05 11:31:55 -04:00
Trond Myklebust
22aaf71495 NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
Clamp the layout barrier sequence id to the current sequence id
minus the maximum number of outstanding layoutget requests.

Also ensure that we correctly initialise lo->plh_barrier if there are
no layout segments associated to this layout header.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:57:48 -07:00
Trond Myklebust
0f35ad6f68 NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:28:17 -07:00
Linus Torvalds
e1cc485262 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 & udf fixes from Jan Kara:
 "Shortlog pretty much says it all.

  The interesting bits are UDF support for direct IO and ext3 fix for a
  long standing oops in data=journal mode."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: Fix assertion failure in commit code due to lacking transaction credits
  UDF: Add support for O_DIRECT
  ext3: Replace 0 with NULL for pointer in super.c file
  udf: add writepages support for udf
  ext3: don't clear orphan list on ro mount with errors
  reiserfs: Make reiserfs_xattr_handlers static
2012-10-04 09:14:01 -07:00
David Sterba
7e97b8daf6 btrfs: allow setting NOCOW for a zero sized file via ioctl
Hi,

the patch si simple, but it has user visible impact and I'm not quite sure how
to resolve it.

In short, $subj says it, chattr -C supports it and we want to use it.

The conditions that acutally allow to change the NOCOW flag are clear. What if
I try to set the flag on a file that is not empty? Options:

1) whole ioctl will fail, EINVAL
2.1) ioctl will succeed, the NOCOW flag will be silently removed, but the file
     will stay COW-ed and checksummed
2.2) ioctl will succeed, flag will not be removed and a syslog message will
     warn that the COW flag has not been changed
2.2.1) dtto, no syslog message

Man page of chattr states that

 "If it is set on a file which already has data blocks, it is undefined when
 the blocks assigned to the file will be fully stable."

Yes, it's undefined and with current implementation it'll never happen. So from
this end, the user cannot expect anything. I'm trying to find a reasonable
behaviour, so that a command like 'chattr -R -aijS +C' to tweak a broad set of
flags in a deep directory does not fail unnecessarily and does not pollute the
log.

My personal preference is 2.2.1, but my dev's oppinion is skewed, not counting
the fact that I know the code and otherwise would look there before consulting
the documentation.

The patch implements 2.2.1.

david

-------------8<-------------------
From: David Sterba <dsterba@suse.cz>

It's safe to turn off checksums for a zero sized file.

http://thread.gmane.org/gmane.comp.file-systems.btrfs/18030

"We cannot switch on NODATASUM for a file that already has extents that
are checksummed. The invariant here is that either all the extents or
none are checksummed.

Theoretically it's possible to add/remove all checksums from a given
file, but it's a potentially longtime operation, the file has to be in
some intermediate state where the checksums partially exist but have to
be ignored (for the csum->nocsum) until the file is fully converted,
this brings more special cases to extent handling, it has to survive
power failure and remain consistent, and probably needs to be restarted
after next mount."

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-10-04 09:40:00 -04:00
Josef Bacik
c3308f84c1 Btrfs: fix punch hole when no extent exists
I saw the warning in btrfs_drop_extent_cache where our end is less than our
start while running xfstests 68 in a loop.  This is because we
unconditionally do drop_end = min(end, extent_end) in
__btrfs_drop_extents(), even though we may not have found an extent in the
range we were looking to drop.  So keep track of wether or not we found
something, and if we didn't just use our end.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:40:00 -04:00
Josef Bacik
926ced123b Btrfs: don't do anything in our ->freeze_fs and ->unfreeze_fs
We do not need to do anything special to freeze or unfreeze, it's all taken
care of by the generic work, and what we currently have is wrong anyway
since we shouldn't be returnning to userspace with mutexes held anyway.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:40:00 -04:00
Josef Bacik
892951a92e Btrfs: remove unused write cache pages hook
The btree inode has it's own write cache pages so we can remove this write
cache pages hook as it's not used.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:59 -04:00
Josef Bacik
b5bae2612a Btrfs: fix race when getting the eb out of page->private
We can race when checking wether PagePrivate is set on a page and we
actually have an eb saved in the pages private pointer.  We could have
easily written out this page and released it in the time that we did the
pagevec lookup and actually got around to looking at this page.  So use
mapping->private_lock to ensure we get a consistent view of the
page->private pointer.  This is inline with the alloc and releasepage paths
which use private_lock when manipulating page->private.  Thanks,

Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:59 -04:00
Josef Bacik
ff44c6e36d Btrfs: do not hold the write_lock on the extent tree while logging
Dave Sterba pointed out a sleeping while atomic bug while doing fsync.  This
is because I'm an idiot and didn't realize that rwlock's were spin locks, so
we've been holding this thing while doing allocations and such which is not
good.  This patch fixes this by dropping the write lock before we do
anything heavy and re-acquire it when it is done.  We also need to take a
ref on the em's in case their corresponding pages are evicted and mark them
as being logged so that releasepage does not remove them and doesn't remove
them from our local list.  Thanks,

Reported-by: Dave Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:58 -04:00
Josef Bacik
98114659e0 Btrfs: fix race with freeze and free space inodes
So we start our freeze, somebody comes in and does an fsync() on a file
where we have to commit a transaction for whatever reason, and we will
deadlock because the freeze is waiting on FS_FREEZE people to stop writing
to the file system, but the transaction is waiting for its free space inodes
to be written out, which are in turn waiting on sb_start_intwrite while
trying to write the file extents.  To fix this we'll just skip the
sb_start_intwrite() if we TRANS_JOIN_NOLOCK since we're being waited on by a
transaction commit so we're safe wrt to freeze and this will keep us from
deadlocking.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:58 -04:00
Liu Bo
6bbe3a9c80 Btrfs: kill obsolete arguments in btrfs_wait_ordered_extents
nocow_only is now an obsolete argument.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-04 09:39:57 -04:00
Liu Bo
2e90cf858f Btrfs: cleanup fs_info->hashers
fs_info->hashers is now an obsolete one.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-04 09:39:57 -04:00
Liu Bo
ab26e9d6c8 Btrfs: cleanup for duplicated code in find_free_extent
There is already an 'add free space' phrase in front of this one, we
needn't to redo it.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-04 09:39:57 -04:00
Josef Bacik
60376ce4a8 Btrfs: fix race in sync and freeze again
I screwed this up, there is a race between checking if there is a running
transaction and actually starting a transaction in sync where we could race
with a freezer and get ourselves into trouble.  To fix this we need to make
a new join type to only do the try lock on the freeze stuff.  If it fails
we'll return EPERM and just return from sync.  This fixes a hang Liu Bo
reported when running xfstest 68 in a loop.  Thanks,

Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:56 -04:00
David Sterba
b3ae244e71 btrfs: return EPERM upon rmdir on a subvolume
A subvolume cannot be deleted via rmdir, but the error code ENOTEMPTY
is confusing. Return EPERM instead, as this is not permitted.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-10-04 09:39:56 -04:00
Wei Yongjun
ebb3dad435 Btrfs: using for_each_set_bit_from to simplify the code
Using for_each_set_bit_from() to simplify the code.

spatch with a semantic match is used to found this.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
2012-10-04 09:39:55 -04:00
Anand Jain
1bcea35597 Btrfs: write_buf is now callable outside send.c
Developing service cmds needs it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
2012-10-04 09:39:55 -04:00
Tsutomu Itoh
b4f359ab06 Btrfs: remove unnecessary code in btree_get_extent()
Unnecessary lookup_extent_mapping() is removed because an error is
returned to the caller.
This patch was made based on the advice from Stefan Behrens, thanks.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-10-04 09:39:54 -04:00
Tsutomu Itoh
0433f20d43 Btrfs: cleanup of error processing in btree_get_extent()
This patch simplifies a little complex error processing in
btree_get_extent().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-10-04 09:39:54 -04:00
Andy Adamson
5f65753033 NFSv4 set open access operation call flag in nfs4_init_opendata_res
nfs4_open_recover_helper zeros the nfs4_opendata result structures, removing
the result access_request information which leads to an XDR decode error.

Move the setting of the result access_request field to nfs4_init_opendata_res
which sets all the other required nfs4_opendata result fields and is shared
between the open and recover open paths.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-03 17:10:28 -07:00
Dan Carpenter
74b217d0d3 ore: signedness bug in _sp2d_min_pg()
This for loop doesn't work correctly when "p" is unsigned.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-10-03 13:51:51 -07:00
Trond Myklebust
8544a9dc18 NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
CONFIG_EXPERIMENTAL is deprecated and, regardless of that, this code
is being enabled in most newer distributions.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-03 10:54:50 -07:00
Alex Elder
6285bc2312 ceph: avoid 32-bit page index overflow
A pgoff_t is defined (by default) to have type (unsigned long).  On
architectures such as i686 that's a 32-bit type.  The ceph address
space code was attempting to produce 64 bit offsets by shifting a
page's index by PAGE_CACHE_SHIFT, but the result was not what was
desired because the shift occurred before the result got promoted
to 64 bits.

Fix this by converting all uses of page->index used in this way to
use the page_offset() macro, which ensures the 64-bit result has the
intended value.

This fixes http://tracker.newdream.net/issues/3112

Reported-by:  Mohamed Pakkeer <pakkeer.mohideen@realimage.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-03 10:51:18 -05:00
Sage Weil
457712a0bc ceph: return EIO on invalid layout on GET_DATALOC ioctl
If the user calls GET_DATALOC on a file with an invalid (e.g.,
zeroed) layout, return EIO to userland.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-03 10:51:17 -05:00
Linus Torvalds
eb0ad9c06d JFS TRIM support and some minor fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQbEYbAAoJEDaohF61QIxkd4QP/Akm+fi7I8vHFc7jdckxmt+X
 BS0i1SpMvxH29JhSx2ZaamNdgSW8hc3pVP2XkoICUMIDOBcx564y1mRRnOXZ1mdw
 NAwPmx4gP5OMtHOOJS0X2z7cIRnyFvhtpPtrbU2P5aOmdpEPpLMKilFoOOrWfsn0
 bhNtFQOIZnYetG9isdNxi/P3NWctCDkL2x+PvKjUp081zDkJvSWjn6RB5QGaGKVF
 BOXXSK/1LYvRSJehYXekLObhlCCHUV8jMDsZ8dzJxiHaOdSOKPbjCGG1D8CJIYV2
 7lOaZvjs7muQR4QME2CBQ6VDPW2KUPapAKRuBCglMXwP+Ym+aU/MpXYCE3nydgeQ
 kVR5jvwQcB2hRj8ALyCL79i6jM1DoN3rbaU2FHdU9enHmmEuG4424D5pwHHyUlRJ
 zb52HPGpilAHIyXHAhAl2EOvlFA60Sx1dUKiAkgc+mFKcsZaGUJw3XWSp99trSa2
 f7ZHzrmQaq0tcoYDiH00wZZfCHPlwuXxmFkRrC721xjYNOaMZKAn8n7Xi42Ap30Z
 7IzWhwNOOZuxx9CmEZDXY+6UAx28aRXsuiKwOeUVLyXcAdOx/9DxjoUjUbgNZqf0
 wu6z3kXqkD3ZnkOiKcV1lE1oBtdgZtY2S92s6SBdduxojZqC9U8RYU9ogssePo5R
 pK69xihOXuAetSGmopPO
 =9O7B
 -----END PGP SIGNATURE-----

Merge tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy

Pull JFS update from Dave Kleikamp:
 "JFS TRIM support and some minor fixes"

* tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy:
  jfs: Fix do_div precision in commit b40c2e66
  JFS: use list_move instead of list_del/list_add
  jfs: Remove obsolete email address
  fs/jfs: TRIM support for JFS Filesystem
2012-10-03 08:48:21 -07:00
Linus Torvalds
88265322c1 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - Integrity: add local fs integrity verification to detect offline
     attacks
   - Integrity: add digital signature verification
   - Simple stacking of Yama with other LSMs (per LSS discussions)
   - IBM vTPM support on ppc64
   - Add new driver for Infineon I2C TIS TPM
   - Smack: add rule revocation for subject labels"

Fixed conflicts with the user namespace support in kernel/auditsc.c and
security/integrity/ima/ima_policy.c.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
  Documentation: Update git repository URL for Smack userland tools
  ima: change flags container data type
  Smack: setprocattr memory leak fix
  Smack: implement revoking all rules for a subject label
  Smack: remove task_wait() hook.
  ima: audit log hashes
  ima: generic IMA action flag handling
  ima: rename ima_must_appraise_or_measure
  audit: export audit_log_task_info
  tpm: fix tpm_acpi sparse warning on different address spaces
  samples/seccomp: fix 31 bit build on s390
  ima: digital signature verification support
  ima: add support for different security.ima data types
  ima: add ima_inode_setxattr/removexattr function and calls
  ima: add inode_post_setattr call
  ima: replace iint spinblock with rwlock/read_lock
  ima: allocating iint improvements
  ima: add appraise action keywords and default rules
  ima: integrity appraisal extension
  vfs: move ima_file_free before releasing the file
  ...
2012-10-02 21:38:48 -07:00
Linus Torvalds
782c3fb22b No big changes for 3.7 in UBIFS:
* Error reporting and debug printing improvements
 * Power cut emulation fixes
 * Minor cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQaumRAAoJECmIfjd9wqK016UQAImRrzhPoegy2Pr7j/CpzUST
 wJru3QlgdHgjukWer/s4CGTTjmTLMgLAllUnJLO3yn+Po4o67RJxI4SjhMf5lwDG
 TSmB/ttQpRmYxaaS2EOC6Ow+jdMM2MhAsPrD8nAfLvOm/XduuSyquPY/xFa6U4rJ
 oE4w23b5t2BZI/UJmsCaYs6Rj1h8w1aZUukii+J9BN8DGygn/vJuI9EDNKdtQmxe
 vmoT2uWKPyL859kY/+lONH048NWMkEB3BhNmuAsFqGY/tdQRdmeyhgWEKNjbmc/M
 DXZe7ovy5umUyw6iDpfhEhmOBdV9xorDySsmKNP1/q60rUGD6R/tynC4y1q1IueB
 c6fbry6xpwKIccMS/w7x9cbZxMnG++iQoj71pr2FV9Bg5Xd7XduM/XjzI5rMHuqv
 EnmcwT/LIu5Lw1vMU8pfW2yIdTkotNR/2aca1bs0f4WtS3AhngdqmNbFmddcpY37
 qvTYDrSMOW/IaegiDQzucVXNhIs8ufIULZzmj9CtmgeyvNVJboTJRik+HJDWFqeC
 04TaM8k+Yjp/isbzTmWEyBG3G06cNpMwLtT5OKcpRVQ1ZYu62AbnR+Dbep6n0qt+
 gBORW8TTY32GudeSiHLKCrOsrkx3zNSli6T4Sw9YD5e29Dee62KRTgjKVplezQfB
 Djh70vUyX2tHdYlW88gf
 =gM4M
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifs

Pull ubifs changes from Artem Bityutskiy:
 "No big changes for 3.7 in UBIFS:
   - Error reporting and debug printing improvements
   - Power cut emulation fixes
   - Minor cleanups"

Fix trivial conflict in fs/ubifs/debug.c due to the user namespace
changes.

* tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifs:
  UBIFS: print less
  UBIFS: use pr_ helper instead of printk
  UBIFS: comply with coding style
  UBIFS: use __aligned() attribute
  UBIFS: remove __DATE__ and __TIME__
  UBIFS: fix power cut emulation for mtdram
  UBIFS: improve scanning debug output
  UBIFS: always print full error reports
  UBIFS: print PID in debug messages
2012-10-02 20:47:48 -07:00
Linus Torvalds
60c7b4df82 xfs: update for 3.7-rc1
Several enhancements and cleanups:
 
 - make inode32 and inode64 remountable options
 - SEEK_HOLE/SEEK_DATA enhancements
 - cleanup struct declarations in xfs_mount.h
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQayQwAAoJENaLyazVq6ZO2H0P/RiWpYlVLP0ZNbxUOvL33WFy
 xOHwNkhRG23Z0sgDfrUJjDibzOuSmf/XothsChGPE4Mm8JLCjZKzxR77ySP+pbC5
 sylrciXl5E2tY3wtW8oh2kQORLRneqA6qXNCrOtYklqwSmB0qx7bf9OdjL/ALiuC
 SvTAn/PhyVZqGJgVZyEAc63CsbSzK2XmLgyfeQeA7JbSptbg5fDkFYrbNE+zr+yW
 S1umMsuZW+zwDjzRq0lGkLcYOH3fD0JpMrfqvtQsk3+SIDlq1HSaObD63dhOJVUV
 LUpOqgXCPEC3nOL3Dh+i8gv5OTRLJW0P2zYuAY++kqYnuRsYU18tcfdd0loTGeHV
 LulurSIMBJV19Qx1K3C3E8KPwbwNwNlvmpEbgXy00Qet7LTPbofcUXDi3mcV6ozQ
 SMGQh42VGV4S8wEjAp93wLva7xLf6enV/X6Stuzy/ec8HN8K2tYMyYyGvSxZbjmq
 9JTxCulDvbr7EnSDqCkH1inlQ4cXvBSzOi2trDpQbKq0cuYbqlzgL2sloBv+y+CB
 4fTWEeeisQ05VhHU8Yp3bsVtiyUjGGcbOgvqM+NHmmh1HuHUmiBnXh7k3RfaLfBi
 E4RprFT0Yrpo16mt3Fn0GjkhY87DYHBGIehFOlu13zKPc5q+IBTwIjSugxoa23EL
 uXTtFFHMxR4pUV/An5Wf
 =G6gd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "Several enhancements and cleanups:

   - make inode32 and inode64 remountable options
   - SEEK_HOLE/SEEK_DATA enhancements
   - cleanup struct declarations in xfs_mount.h"

* tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfs:
  xfs: Make inode32 a remountable option
  xfs: add inode64->inode32 transition into xfs_set_inode32()
  xfs: Fix mp->m_maxagi update during inode64 remount
  xfs: reduce code duplication handling inode32/64 options
  xfs: make inode64 as the default allocation mode
  xfs: Fix m_agirotor reset during AG selection
  Make inode64 a remountable option
  xfs: stop the sync worker before xfs_unmountfs
  xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents
  xfs: xfs_seek_data() refinement with unwritten extents check up from page cache
  xfs: Introduce a helper routine to probe data or hole offset from page cache
  xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole()
  xfs: fix race while discarding buffers [V4]
  xfs: check for possible overflow in xfs_ioc_trim
  xfs: unlock the AGI buffer when looping in xfs_dialloc
  xfs: kill struct declarations in xfs_mount.h
  xfs: fix uninitialised variable in xfs_rtbuf_get()
2012-10-02 20:42:58 -07:00