ea3d7209ca
Currently, page faults and hole punching are completely unsynchronized. This can result in page fault faulting in a page into a range that we are punching after truncate_pagecache_range() has been called and thus we can end up with a page mapped to disk blocks that will be shortly freed. Filesystem corruption will shortly follow. Note that the same race is avoided for truncate by checking page fault offset against i_size but there isn't similar mechanism available for punching holes.

Fix the problem by creating new rw semaphore i_mmap_sem in inode and grab it for writing over truncate, hole punching, and other functions removing blocks from extent tree and for read over page faults. We cannot easily use i_data_sem for this since that ranks below transaction start and we need something ranking above it so that it can be held over the whole truncate / hole punching operation. Also remove various workarounds we had in the code to reduce race window when page fault could have created pages with stale mapping information.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
45 lines
1.3 KiB
C
/*
 * linux/fs/ext4/truncate.h
 *
 * Common inline functions needed for truncate support
 */

/*
 * Truncate blocks that were not used by write. We have to truncate the
 * pagecache as well so that corresponding buffers get properly unmapped.
 */
static inline void ext4_truncate_failed_write(struct inode *inode)
{
	down_write(&EXT4_I(inode)->i_mmap_sem);
	truncate_inode_pages(inode->i_mapping, inode->i_size);
	ext4_truncate(inode);
	up_write(&EXT4_I(inode)->i_mmap_sem);
}

/*
 * Work out how many blocks we need to proceed with the next chunk of a
 * truncate transaction.
 */
static inline unsigned long ext4_blocks_for_truncate(struct inode *inode)
{
	ext4_lblk_t needed;

	needed = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9);

	/* Give ourselves just enough room to cope with inodes in which
	 * i_blocks is corrupt: we've seen disk corruptions in the past
	 * which resulted in random data in an inode which looked enough
	 * like a regular file for ext4 to try to delete it. Things
	 * will go a bit crazy if that happens, but at least we should
	 * try not to panic the whole kernel. */
	if (needed < 2)
		needed = 2;

	/* But we need to bound the transaction so we don't overflow the
	 * journal. */
	if (needed > EXT4_MAX_TRANS_DATA)
		needed = EXT4_MAX_TRANS_DATA;

	return EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + needed;
}
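The arithmetic in ext4_blocks_for_truncate() can be tried out in userspace with the sketch below. It is an illustration only: `SKETCH_MAX_TRANS_DATA` and `SKETCH_DATA_TRANS_BLOCKS` are placeholder values chosen for the example, not the kernel's definitions (the real `EXT4_DATA_TRANS_BLOCKS()` depends on the superblock), and `sketch_blocks_for_truncate()` is a hypothetical name. The shift by `blocksize_bits - 9` converts `i_blocks`, which counts 512-byte units, into filesystem blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder constants for illustration; NOT the kernel's values. */
#define SKETCH_MAX_TRANS_DATA     64u
#define SKETCH_DATA_TRANS_BLOCKS  8u

/*
 * Userspace model of the clamp: convert 512-byte units to filesystem
 * blocks, force the result into [2, SKETCH_MAX_TRANS_DATA], then add
 * the fixed per-transaction overhead.
 */
static unsigned long sketch_blocks_for_truncate(uint64_t i_blocks,
						unsigned int blocksize_bits)
{
	/* 32-bit on purpose, mirroring the ext4_lblk_t variable above. */
	uint32_t needed = (uint32_t)(i_blocks >> (blocksize_bits - 9));

	if (needed < 2)
		needed = 2;                      /* floor: tolerate a bogus i_blocks */
	if (needed > SKETCH_MAX_TRANS_DATA)
		needed = SKETCH_MAX_TRANS_DATA;  /* ceiling: bound the transaction */

	return SKETCH_DATA_TRANS_BLOCKS + needed;
}
```

With 4 KiB blocks (`blocksize_bits` of 12) and these placeholder constants, an `i_blocks` of 80 gives 10 blocks plus the overhead of 8, so 18; an `i_blocks` of 0 is raised to the floor and gives 10; an `i_blocks` of 8000 is capped and gives 72.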