Currently the way we do freezing is by passing sb->s_bdev to freeze_bdev and then
letting it do all the work. But freezing is more of an fs thing, and doesn't
really have much to do with the bdev at all; all the work gets done with the
super. In btrfs we do not populate s_bdev, since we can have multiple bdevs
for one fs and setting s_bdev makes removing devices from a pool kind of tricky.
This means that freezing a btrfs filesystem fails, which leads to corruption
with things like tux-on-ice that use the fsfreeze mechanism. So instead of
populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs
freezing stuff into freeze_super and thaw_super. These just take the
super_block that we're freezing and do the appropriate work. It's basically
just copied and pasted from freeze_bdev. I've then converted freeze_bdev over to
use the new super helpers. I've tested this with ext4 and btrfs and verified
everything continues to work the same as before.
The only new gotcha is that multiple calls to the fsfreeze ioctl will return EBUSY if
the fs is already frozen. I thought this was a better solution than adding a
freeze counter to the super_block, but if everybody hates this idea I'm open to
suggestions. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We need list_for_each_entry_safe() here. The original didn't even
have restart logic, so if you raced with umount() it blew up.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
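The point of the _safe iterator is that the successor is saved before the current entry may be unlinked. This is a userspace sketch with a hand-rolled circular list modelled on struct list_head; it is not <linux/list.h>, and it skips container_of() by storing the value directly in the node.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct node {
	struct node *prev, *next;
	int value;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;   /* a plain loop would now follow NULL */
}

/* "Safe" walk: tmp holds the successor before pos can be unlinked. */
static int remove_matching(struct node *head, int value)
{
	int removed = 0;
	struct node *pos, *tmp;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		if (pos->value == value) {
			list_del(pos);
			removed++;
		}
	}
	return removed;
}

static int list_count(const struct node *head)
{
	int n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```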
At the same time we can kill s_need_restart and the local mutex in there.
__put_super() is made public for a while; it will be gone later.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We used to remove from s_list and s_instances at the same
time. So let's *not* do the former and skip superblocks
that have empty s_instances in the loops over s_list.
The next step, of course, will be to get rid of the rescan logic
in those loops.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
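A sketch of the loop rule above: superblocks that are still on s_list but already off s_instances are treated as dying and skipped. The array and the in_instances flag are hypothetical simplifications of the real list walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: s_list is an array here, and "s_instances is
 * empty" is reduced to a boolean per superblock. */
struct demo_sb {
	const char *name;
	bool in_instances;  /* false once removed from s_instances */
};

/* Count the superblocks a loop over s_list would act on. */
size_t count_live_supers(const struct demo_sb *list, size_t n)
{
	size_t live = 0;

	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!list[i].in_instances)
			continue;   /* dying superblock: skip it */
		live++;
	}
	return live;
}
```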
Make sure that s_umount is acquired *before* we drop the final
active reference; we still have the fast path (atomic_dec_unless)
and we have gotten rid of the window between the moment when
s_active hits zero and s_umount is acquired. Which simplifies
the living hell out of grab_super() and inotify pin_to_kill()
stuff.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
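A C11 userspace sketch of the ordering described above: a lock-free fast path that decrements only when this is not the last active reference, and a slow path that takes the lock before dropping the final one. The struct, the spinlock standing in for the s_umount semaphore, and dec_unless() (similar in spirit to the atomic_dec_unless fast path mentioned above) are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_sb {
	atomic_int  s_active;   /* active references */
	atomic_flag s_umount;   /* stand-in for the s_umount semaphore */
	bool        shut_down;  /* set when the last reference is gone */
};

/* Decrement *v unless it equals 'unless'; true if it decremented. */
static bool dec_unless(atomic_int *v, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

void demo_deactivate_super(struct demo_sb *sb)
{
	/* Fast path: not the last reference, no lock needed. */
	if (dec_unless(&sb->s_active, 1))
		return;

	/* Slow path: lock first, then drop the final reference, so no one
	 * sees s_active == 0 without the lock already held. */
	while (atomic_flag_test_and_set(&sb->s_umount))
		;   /* spin */
	if (atomic_fetch_sub(&sb->s_active, 1) == 1)
		sb->shut_down = true;   /* stand-in for the real shutdown */
	atomic_flag_clear(&sb->s_umount);
}
```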
First of all, get_sb_nodev() grabs an anon dev minor and we
never free it in ecryptfs ->kill_sb(). Moreover, on one
of the failure exits in ecryptfs_get_sb() we leak things -
it happens before we set ->s_root and ->put_super() won't
be called in that case. Solution: kill ->put_super(), do
all that stuff in ->kill_sb(). And use kill_anon_sb() instead
of generic_shutdown_super() to deal with anon dev leak.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We set the "it's dead, don't mount on it" flag _and_ do not remove it if
we turn the damn thing negative and leave it around. And if it goes
positive afterwards, well...
Fortunately, there's only one place where that needs to be caught:
only d_delete() can turn the sucker negative without immediately freeing
it; all other places that can lead to ->d_iput() call are followed by
unconditionally freeing struct dentry in question. So the fix is obvious:
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16014
Reported-by: Adam Tkac <vonsch@gmail.com>
Tested-by: Adam Tkac <vonsch@gmail.com>
Cc: <stable@kernel.org> [2.6.34.x]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (92 commits)
powerpc: Remove unused 'protect4gb' boot parameter
powerpc: Build-in e1000e for pseries & ppc64_defconfig
powerpc/pseries: Make request_ras_irqs() available to other pseries code
powerpc/numa: Use ibm,architecture-vec-5 to detect form 1 affinity
powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim
powerpc: Use smt_snooze_delay=-1 to always busy loop
powerpc: Remove check of ibm,smt-snooze-delay OF property
powerpc/kdump: Fix race in kdump shutdown
powerpc/kexec: Fix race in kexec shutdown
powerpc/kexec: Speedup kexec hash PTE tear down
powerpc/pseries: Add hcall to read 4 ptes at a time in real mode
powerpc: Use more accurate limit for first segment memory allocations
powerpc/kdump: Use chip->shutdown to disable IRQs
powerpc/kdump: CPUs assume the context of the oopsing CPU
powerpc/crashdump: Do not fail on NULL pointer dereferencing
powerpc/eeh: Fix oops when probing in early boot
powerpc/pci: Check devices status property when scanning OF tree
powerpc/vio: Switch VIO Bus PM to use generic helpers
powerpc: Avoid bad relocations in iSeries code
powerpc: Use common cpu_die (fixes SMP+SUSPEND build)
...
* 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (207 commits)
drm/radeon/kms/pm/r600: select the mid clock mode for single head low profile
drm/radeon: fix power supply kconfig interaction.
drm/radeon/kms: record object that have been list reserved
drm/radeon: AGP memory is only I/O if the aperture can be mapped by the CPU.
drm/radeon/kms: don't default display priority to high on rs4xx
drm/edid: fix typo in 1600x1200@75 mode
drm/nouveau: fix i2c-related init table handlers
drm/nouveau: support init table i2c device identifier 0x81
drm/nouveau: ensure we've parsed i2c table entry for INIT_*I2C* handlers
drm/nouveau: display error message for any failed init table opcode
drm/nouveau: fix init table handlers to return proper error codes
drm/nv50: support fractional feedback divider on newer chips
drm/nv50: fix monitor detection on certain chipsets
drm/nv50: store full dcb i2c entry from vbios
drm/nv50: fix suspend/resume with DP outputs
drm/nv50: output calculated crtc pll when debugging on
drm/nouveau: dump pll limits entries when debugging is on
drm/nouveau: bios parser fixes for eDP boards
drm/nouveau: fix a nouveau_bo dereference after it's been destroyed
drm/nv40: remove some completed ctxprog TODOs
...
* 'dbg-early-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
ehci-dbgp: Add kernel debugger support for the usb debug port
earlyprintk,vga,kdb: Fix \b and \r for earlyprintk=vga with kdb
kgdboc: Add ekgdboc for early use of the kernel debugger
x86,early dr regs,kgdb: Allow kernel debugger early dr register access
x86,kgdb: Implement early hardware breakpoint debugging
x86, kgdb, init: Add early and late debug states
x86, kgdb: early trap init for early debug
* 'kdb-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: (25 commits)
kdb,debug_core: Allow the debug core to receive a panic notification
MAINTAINERS: update kgdb, kdb, and debug_core info
debug_core,kdb: Allow the debug core to process a recursive debug entry
printk,kdb: capture printk() when in kdb shell
kgdboc,kdb: Allow kdb to work on a non open console port
kgdb: Add the ability to schedule a breakpoint via a tasklet
mips,kgdb: kdb low level trap catch and stack trace
powerpc,kgdb: Introduce low level trap catching
x86,kgdb: Add low level debug hook
kgdb: remove post_primary_code references
kgdb,docs: Update the kgdb docs to include kdb
kgdboc,keyboard: Keyboard driver for kdb with kgdb
kgdb: gdb "monitor" -> kdb passthrough
sparc,sunzilog: Add console polling support for sunzilog serial driver
sh,sh-sci: Use NO_POLL_CHAR in the SCIF polled console code
kgdb,8250,pl011: Return immediately from console poll
kgdb: core changes to support kdb
kdb: core for kgdb back end (2 of 2)
kdb: core for kgdb back end (1 of 2)
kgdb,blackfin: Add in kgdb_arch_set_pc for blackfin
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (31 commits)
dquot: Detect partial write error to quota file in write_blk() and add printk_ratelimit for quota error messages
ocfs2: Fix lock inversion in quotas during umount
ocfs2: Use __dquot_transfer to avoid lock inversion
ocfs2: Fix NULL pointer deref when writing local dquot
ocfs2: Fix estimate of credits needed for quota allocation
ocfs2: Fix quota locking
ocfs2: Avoid unnecessary block mapping when refreshing quota info
ocfs2: Do not map blocks from local quota file on each write
quota: Refactor dquot_transfer code so that OCFS2 can pass in its references
quota: unify quota init condition in setattr
quota: remove sb_has_quota_active in get/set_info
quota: unify ->set_dqblk
quota: unify ->get_dqblk
ext3: make barrier options consistent with ext4
quota: Make quota stat accounting lockless.
suppress warning: "quotatypes" defined but not used
ext3: Fix waiting on transaction during fsync
jbd: Provide function to check whether transaction will issue data barrier
ufs: add ufs specific ->setattr call
BKL: Remove BKL from ext2 filesystem
...
* 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (113 commits)
omap4: Add support for i2c init
omap: Fix i2c platform init code for omap4
OMAP2 clock: fix recursive spinlock attempt when CONFIG_CPU_FREQ=y
OMAP powerdomain, hwmod, omap_device: add some credits
OMAP4 powerdomain: Support LOWPOWERSTATECHANGE for powerdomains
OMAP3 clock: add support for setting the divider for sys_clkout2 using clk_set_rate
OMAP4 powerdomain: Fix pwrsts flags for ALWAYS ON domains
OMAP: timers: Fix clock source names for OMAP4
OMAP4 clock: Support clk_set_parent
OMAP4: PRCM: Add offset defines for all CM registers
OMAP4: PRCM: Add offset defines for all PRM registers
OMAP4: PRCM: Remove duplicate definition of base addresses
OMAP4: PRM: Remove MPU internal code name and apply PRCM naming convention
OMAP4: CM: Remove non-functional registers in ES1.0
OMAP: hwmod: Replace WARN by pr_warning for clockdomain check
OMAP: hwmod: Rename hwmod name for the MPU
OMAP: hwmod: Do not exit the iteration if one clock init failed
OMAP: hwmod: Replace WARN by pr_warning if clock lookup failed
OMAP: hwmod: Remove IS_ERR check with omap_clk_get_by_name return value
OMAP: hwmod: Fix wrong pointer iteration in oh->slaves
...
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-nforce2: Remove redundant error messages on ACPI conflict
i2c: Use <linux/io.h> instead of <asm/io.h>
i2c-algo-pca: Fix coding style issues
i2c-dev: Fix all coding style issues
i2c-core: Fix some coding style issues
i2c-gpio: Move initialization code to subsys_initcall()
i2c-parport: Make template structure const
i2c-dev: Remove unnecessary casts
at24: Fall back to byte or word reads if needed
i2c-stub: Expose the default functionality flags
i2c/scx200_acb: Make PCI device ids constant
i2c-i801: Fix all checkpatch warnings
i2c-i801: All newer devices have all the optional features
i2c-i801: Let the user disable selected driver features
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (25 commits)
serial: Tidy REMOTE_DEBUG
serial: isicomm: handle running out of slots
serial: bfin_sport_uart: Use resource size to fix off-by-one error
tty: fix obsolete comment on tty_insert_flip_string_fixed_flag
serial: Add driver for the Altera UART
serial: Add driver for the Altera JTAG UART
serial: timbuart: make sure last byte is sent when port is closed
serial: two branches the same in timbuart_set_mctrl()
serial: uartlite: move from byte accesses to word accesses
tty: n_gsm: depends on NET
tty: n_gsm line discipline
serial: TTY: new ldiscs for staging
serial: bfin_sport_uart: drop redundant cpu depends
serial: bfin_sport_uart: drop the experimental markings
serial: bfin_sport_uart: pull in bfin_sport.h for SPORT defines
serial: bfin_sport_uart: only enable SPORT TX if data is to be sent
serial: bfin_sport_uart: drop useless status masks
serial: bfin_sport_uart: zero sport_uart_port if allocated dynamically
serial: bfin_sport_uart: protect changes to uart_port
serial: bfin_sport_uart: add support for CTS/RTS via GPIOs
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (38 commits)
net: Expose all network devices in a namespace in sysfs
hotplug: netns aware uevent_helper
kobj: Send hotplug events in the proper namespace.
netlink: Implement netlink_broadcast_filtered
net/sysfs: Fix the bitrot in network device kobject namespace support
netns: Teach network device kobjects which namespace they are in.
kobject: Send hotplug events in all network namespaces
driver-core: fix Typo in drivers/base/core.c for CONFIG_MODULE
pci: check caps from sysfs file open to read device dependent config space
sysfs: add struct file* to bin_attr callbacks
sysfs: Remove usage of S_BIAS to avoid merge conflict with the vfs tree
sysfs: Don't use enums in inline function declaration.
sysfs-namespaces: add a high-level Documentation file
sysfs: Comment sysfs directory tagging logic
driver core: Implement ns directory support for device classes.
sysfs: Implement sysfs_delete_link
sysfs: Add support for tagged directories with untagged members.
sysfs: Implement sysfs tagged directory support.
kobj: Add basic infrastructure for dealing with namespaces.
sysfs: Remove double free sysfs_get_sb
...
This patch changes quota_tree.c:write_blk() to detect errors caused by a partial
write to the quota file, and adds a macro to rate-limit printed quota error
messages so we won't fill up dmesg with a corrupted quota file.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
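A userspace sketch of both changes: a short write is reported as an I/O error rather than accepted, and error messages are capped. The write hook type, the fake hooks and the counter-only limiter are hypothetical; the kernel's printk_ratelimit() is time-based, so the cap here is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical limiter: print at most DEMO_BURST messages in total. */
#define DEMO_BURST 3
static int demo_msgs_printed;

static void demo_quota_error(const char *msg)
{
	if (demo_msgs_printed >= DEMO_BURST)
		return;                 /* suppressed */
	demo_msgs_printed++;
	fprintf(stderr, "quota error: %s\n", msg);
}

/* Stand-in for the fs quota write hook: returns bytes written. */
typedef ssize_t (*demo_write_fn)(const char *buf, size_t len);

static ssize_t demo_full_write(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

static ssize_t demo_short_write(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)(len / 2);      /* simulate a partial write */
}

/* Sketch of write_blk(): anything but a full block is an error. */
int demo_write_blk(demo_write_fn do_write, const char *buf, size_t blksize)
{
	ssize_t ret = do_write(buf, blksize);

	if (ret != (ssize_t)blksize) {
		demo_quota_error("quota write failed");
		return ret < 0 ? (int)ret : -EIO;
	}
	return 0;
}
```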
We cannot cancel delayed work from ocfs2_local_free_info because that is called
with dqonoff_mutex held and the work it cancels requires dqonoff_mutex to
finish. Cancel the work before acquiring dqonoff_mutex.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
dquot_transfer() acquires its own references to dquots via dqget(). Thus it waits
for dq_lock, which creates a lock inversion because dq_lock ranks above
transaction start, but the transaction is already started in ocfs2_setattr(). Fix
the problem by passing our own references directly to __dquot_transfer.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
commit_dqblk() can write quota info to the global file. That is actually a bad
thing to do because if we are just modifying the local quota file, we are not
prepared (do not hold proper locks, do not have transaction credits) to do
a modification of the global quota file. So do not use commit_dqblk() and
instead call our writing function directly.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
We were missing the reservation of a journal credit for modification of the quota
file inode when creating a new dquot structure in the global quota file.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
OCFS2 had three issues with quota locking:
a) When reading a dquot from the global quota file, we started a transaction
while holding dqio_mutex, which is prone to deadlocks because other paths
take them the other way around.
b) During ocfs2_sync_dquot we were not protected against concurrent writers
on the same node. Because we first copy data to a local buffer, a race
could result in old data being written to the global quota file,
thus causing quota inconsistency after a crash.
c) ip_alloc_sem of quota files was acquired while a transaction was started
in ocfs2_quota_write, which can deadlock because when extending quota files
we first get ip_alloc_sem and then start a transaction.
We fix problem a) by pulling all necessary code into ocfs2_acquire_dquot
and ocfs2_release_dquot. Thus we no longer depend on generic dquot_acquire
to do the locking and can force proper lock ordering.
Problems b) and c) are fixed by locking i_mutex and ip_alloc_sem of the
global quota file in ocfs2_lock_global_qf and removing ip_alloc_sem from
ocfs2_quota_read and ocfs2_quota_write.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The position of global quota file info does not change. So we do not have
to do logical -> physical block translation every time we reread it from
disk. Thus we can also avoid taking ip_alloc_sem.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
There is no need to map the offset of the local dquot structure to an on-disk block
on each quota write. It is enough to map it just once and store the physical
block number in the in-memory quota structure. Moreover, this simplifies locking
as we do not have to take ip_alloc_sem in the quota write path.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently, __dquot_transfer() acquires its own references to the dquot structures
that will be put into the inode. But for OCFS2, this creates a lock inversion
between dq_lock (waited on in dqget) and transaction start (started in
ocfs2_setattr). Currently, deadlock is impossible because dq_lock is acquired
only during dquot_acquire and dquot_release and we already hold a reference to
dquot structures in ocfs2_setattr so neither of these functions can be called
while we call dquot_transfer. But this is rather subtle and it is hard to teach
lockdep about it. So provide __dquot_transfer function that can be passed dquot
references directly. OCFS2 can then pass acquired dquot references directly to
__dquot_transfer with proper locking.
Signed-off-by: Jan Kara <jack@suse.cz>
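A heavily simplified sketch of the two calling conventions: one variant acquires its own reference (the step that, in OCFS2, would run after a transaction is already started), the other accepts a reference the caller already holds and only swaps it in. All names are hypothetical, a dquot is reduced to a reference count, and only one quota type is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dquot {
	int refs;
};

struct demo_inode {
	struct demo_dquot *dquot;   /* one quota type only, for brevity */
};

/* Caller-supplied reference: just swap it in and hand back the old
 * one for the caller to drop with whatever locking it needs. */
struct demo_dquot *demo_transfer_ref(struct demo_inode *inode,
				     struct demo_dquot *new_ref)
{
	struct demo_dquot *old = inode->dquot;

	inode->dquot = new_ref;     /* inode now owns the caller's ref */
	return old;
}

/* Self-acquiring variant: takes its own reference first. */
struct demo_dquot *demo_transfer(struct demo_inode *inode,
				 struct demo_dquot *target)
{
	assert(target != NULL);
	target->refs++;             /* stand-in for dqget() */
	return demo_transfer_ref(inode, target);
}
```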
Quota must be initialized if a change of size or uid/gid is requested.
But initialization is performed in two different places:
in the case of i_size, the filesystem is responsible for dquot init,
but in the case of uid/gid, init is called internally in
dquot_transfer().
This ambiguity makes the code harder to understand.
Let's move this logic into one common helper function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
The methods already do these checks, so remove them in the quotactl
implementation to allow non-VFS quota implementations to also support
these calls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Pass the larger struct fs_disk_quota to the ->set_dqblk operation so
that the Q_SETQUOTA and Q_XSETQUOTA operations can be implemented
with a single filesystem operation and we can retire the ->set_xquota
operation. The additional information (RT-subvolume accounting and
warn counts) is left zero for the VFS quota implementation.
Add new fieldmask values for setting the number of blocks and inodes,
which is required for the VFS quota but wasn't needed for XFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Pass the larger struct fs_disk_quota to the ->get_dqblk operation so
that the Q_GETQUOTA and Q_XGETQUOTA operations can be implemented
with a single filesystem operation and we can retire the ->get_xquota
operation. The additional information (RT-subvolume accounting and
warn counts) is left zero for the VFS quota implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
ext4 was updated to accept barrier/nobarrier mount options
in addition to the older barrier=0/1. The barrier story
is complex enough that we should help people by at least making
the options the same, even if the defaults are different.
This patch allows the barrier/nobarrier mount options for ext3,
while keeping nobarrier the default.
It also unconditionally displays barrier status in show_options,
and prints a message at mount time if barriers are not enabled,
just as ext4 does.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
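A sketch of accepting both option spellings with nobarrier as the ext3 default. The parser is hypothetical: real ext3 uses the kernel's match_token() tables and treats barrier=<n> as any integer, while this sketch only accepts barrier=0 and barrier=1.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct demo_opts {
	bool barrier;
};

/* ext3 keeps nobarrier as its default. */
void demo_default_opts(struct demo_opts *o)
{
	o->barrier = false;
}

/* Accepts barrier, nobarrier, barrier=0 and barrier=1.
 * Returns 0 on success, -1 for anything unrecognised. */
int demo_parse_barrier(struct demo_opts *o, const char *opt)
{
	assert(o != NULL && opt != NULL);
	if (strcmp(opt, "barrier") == 0 || strcmp(opt, "barrier=1") == 0)
		o->barrier = true;
	else if (strcmp(opt, "nobarrier") == 0 || strcmp(opt, "barrier=0") == 0)
		o->barrier = false;
	else
		return -1;
	return 0;
}
```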