linux-hardened

Author	SHA1	Message	Date
Johannes Thumshirn	4bb3c2e2b5	btrfs: use btrfs_crc32c{,_final}() in for free space cache The CRC checksum in the free space cache is not dependant on the super block's csum_type field but always a CRC32C. So use btrfs_crc32c() and btrfs_crc32c_final() instead of btrfs_csum_data() and btrfs_csum_final() for computing these checksums. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:35:00 +02:00
Johannes Thumshirn	65019df8c3	btrfs: resurrect btrfs_crc32c() Commit `9678c54388` ("btrfs: Remove custom crc32c init code") removed the btrfs_crc32c() function, because it was a duplicate of the crc32c() library function we already have in the kernel. Resurrect it as a shim wrapper over crc32c() to make following transformations of the checksumming code in btrfs easier. Also provide a btrfs_crc32_final() to ease following transformations. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:35:00 +02:00
Johannes Thumshirn	5852c8b961	btrfs: use btrfs_csum_data() instead of directly calling crc32c btrfsic_test_for_metadata() directly calls the crc32c() library function for calculating the CRC32C checksum, but then uses btrfs_csum_final() to invert the result. To ease further refactoring and development around checksumming in BTRFS convert to calling btrfs_csum_data(), which is a wrapper around crc32c(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:35:00 +02:00
Qu Wenruo	a94d1d0cb3	btrfs: Flush before reflinking any extent to prevent NOCOW write falling back to COW without data reservation [BUG] The following script can cause unexpected fsync failure: #!/bin/bash dev=/dev/test/test mnt=/mnt/btrfs mkfs.btrfs -f $dev -b 512M > /dev/null mount $dev $mnt -o nospace_cache # Prealloc one extent xfs_io -f -c "falloc 8k 64m" $mnt/file1 # Fill the remaining data space xfs_io -f -c "pwrite 0 -b 4k 512M" $mnt/padding sync # Write into the prealloc extent xfs_io -c "pwrite 1m 16m" $mnt/file1 # Reflink then fsync, fsync would fail due to ENOSPC xfs_io -c "reflink $mnt/file1 8k 0 4k" -c "fsync" $mnt/file1 umount $dev The fsync fails with ENOSPC, and the last page of the buffered write is lost. [CAUSE] This is caused by: - Btrfs' back reference only has extent level granularity So write into shared extent must be COWed even only part of the extent is shared. So for above script we have: - fallocate Create a preallocated extent where we can do NOCOW write. - fill all the remaining data and unallocated space - buffered write into preallocated space As we have not enough space available for data and the extent is not shared (yet) we fall into NOCOW mode. - reflink Now part of the large preallocated extent is shared, later write into that extent must be COWed. - fsync triggers writeback But now the extent is shared and therefore we must fallback into COW mode, which fails with ENOSPC since there's not enough space to allocate data extents. [WORKAROUND] The workaround is to ensure any buffered write in the related extents (not just the reflink source range) get flushed before reflink/dedupe, so that NOCOW writes succeed that happened before reflinking succeed. The workaround is expensive, we could do it better by only flushing NOCOW range, but that needs extra accounting for NOCOW range. For now, fix the possible data loss first. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:35:00 +02:00
Nikolay Borisov	5f791ec31f	btrfs: Return EAGAIN if we can't start no snpashot write in check_can_nocow The first thing code does in check_can_nocow is trying to block concurrent snapshots. If this fails (due to snpashot already being in progress) the function returns ENOSPC which makes no sense. Instead return EAGAIN. Despite this return value not being propagated to callers it's good practice to return the closest in terms of semantics error code. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:59 +02:00
Nikolay Borisov	0b6f5d408b	btrfs: Add comments on locking of several device-related fields Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:59 +02:00
Nikolay Borisov	bd80d94efb	btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range In case no cached_state argument is passed to btrfs_lock_and_flush_ordered_range use one locally in the function. This optimises the case when an ordered extent is found since the unlock function will be able to unlock that state directly without searching for it again. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:59 +02:00
Nikolay Borisov	23d31bd476	btrfs: Use newly introduced btrfs_lock_and_flush_ordered_range There several functions which open code btrfs_lock_and_flush_ordered_range, just replace them with a call to the function. No functional changes. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:59 +02:00
Nikolay Borisov	ffa87214c1	btrfs: add new helper btrfs_lock_and_flush_ordered_range There is a certain idiom used in multiple places in btrfs' codebase, dealing with flushing an ordered range. Factor this in a separate function that can be reused. Future patches will replace the existing code with that function. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:59 +02:00
Qu Wenruo	1200b51f57	btrfs: remove the incorrect comment on RO fs when btrfs_run_delalloc_range() fails At the context of btrfs_run_delalloc_range(), we haven't started/joined a transaction, thus even something went wrong, we can't and won't abort transaction, thus no way to make the fs RO. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:59 +02:00
Qu Wenruo	480b9b4d84	btrfs: extent-tree: Add trace events for space info numbers update Add trace event for update_bytes_pinned() and update_bytes_may_use() to detect underflow better. The output would be something like (only showing data part): ## Buffered write start, 16K total ## 2255.954 xfs_io/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=4096 2257.169 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=4096 diff=4096 2257.346 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=8192 diff=4096 2257.542 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=12288 diff=4096 ## Delalloc start ## 3727.853 kworker/u8:3-e/700 btrfs:update_bytes_may_use:(nil)U: type=DATA old=16384 diff=-16384 ## Space cache update ## 3733.132 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=65536 3733.169 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=65536 diff=-65536 3739.868 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=65536 3739.891 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=65536 diff=-65536 These two trace events will allow bcc tool to probe btrfs_space_info changes and detect underflow with more details (e.g. backtrace for each update). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:58 +02:00
Qu Wenruo	0185f364cb	btrfs: extent-tree: Add lockdep assert when updating space info Just add a safe net for btrfs_space_info member updating. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:58 +02:00
David Sterba	cff8267228	btrfs: read number of data stripes from map only once There are several places that call nr_data_stripes, but this value does not change. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:58 +02:00
David Sterba	72ad813157	btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:58 +02:00
David Sterba	158da513b1	btrfs: refactor helper for bg flags to name conversion The helper lacks the btrfs_ prefix and the parameter is the raw blockgroup type, so none of the callers has to do the flags -> index conversion. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:58 +02:00
David Sterba	e3ecdb3fde	btrfs: factor out devs_max setting in __btrfs_alloc_chunk Merge the repeated code before the if-else block. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:57 +02:00
David Sterba	8c3e3582a4	btrfs: use u8 for raid_array members The raid_attr table is now 7 * 56 = 392 bytes long, consisting of just small numbers so we don't have to use ints. New size is 7 * 32 = 224, saving 3 cachelines. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:57 +02:00
David Sterba	946c9256c6	btrfs: factor out helper for counting data stripes Factor the sequence of ifs to a helper, the 'data stripes' here means the number of stripes without redundancy and parity. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:57 +02:00
David Sterba	44b28adafd	btrfs: use raid_attr table for btrfs_bg_type_to_factor The factor is the number of copies. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:57 +02:00
David Sterba	6079e12cdb	btrfs: use raid_attr table to find profiles for integrity lowering Replace open coded list of the profiles by selecting them from the raid_attr table. The criteria are now more explicit, we need profiles that have more than 1 copy of the data or can reconstruct the data with a missing device. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:57 +02:00
David Sterba	081db89b13	btrfs: use raid_attr to get allowed profiles for balance conversion Iterate over the table and gather all allowed profiles for a given number of devices, instead of open coding. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:56 +02:00
David Sterba	fc9a2ac77c	btrfs: use raid_attr in btrfs_chunk_max_errors The number of tolerated failures is stored in the raid_attr table, use it. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:56 +02:00
David Sterba	9fa02ac75b	btrfs: use raid_attr table in get_profile_num_devs The dev_max constraints are defined in the raid_attr table, use it instead of open-coding it. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:56 +02:00
David Sterba	c8bf1b6703	btrfs: remove mapping tree structures indirection fs_info::mapping_tree is the physical<->logical mapping tree and uses the same underlying structure as extents, but is embedded to another structure. There are no other members and this indirection is useless. No functional change. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:56 +02:00
David Sterba	49cc180ca9	btrfs: raid56: allow the exact minimum number of devices for balance convert The minimum number of devices for RAID5 is 2, though this is only a bit expensive RAID1, and for RAID6 it's 3, which is a triple copy that works only 3 devices. mkfs.btrfs allows that and mounting such filesystem also works, so the conversion via balance filters is inconsistent with the others and we should not prevent it. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:56 +02:00
David Sterba	0ee5f8ae08	btrfs: fix minimum number of chunk errors for DUP The list of profiles in btrfs_chunk_max_errors lists DUP as a profile DUP able to tolerate 1 device missing. Though this profile is special with 2 copies, it still needs the device, unlike the others. Looking at the history of changes, thre's no clear reason why DUP is there, functions were refactored and blocks of code merged to one helper. `d20983b40e` Btrfs: fix writing data into the seed filesystem - factor code to a helper `de11cc12df` Btrfs: don't pre-allocate btrfs bio - unrelated change, DUP still in the list with max errors 1 `a236aed14c` Btrfs: Deal with failed writes in mirrored configurations - introduced the max errors, leaves DUP and RAID1 in the same group Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:55 +02:00
Liu Bo	be9b8dfa9c	Btrfs: remove unused variables in __btrfs_unlink_inode This code was first introduced in `5f39d397df` ("Btrfs: Create extent_buffer interface for large blocksizes") and the function was named btrfs_unlink_trans. It later got renamed to __btrfs_unlink_inode and finally commit `16cdcec736` ("btrfs: implement delayed inode items operation") changed the way inodes are deleted and obviated the need for those two members. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> [ replace changelog by Nikolay's version ] Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:55 +02:00
Goldwyn Rodrigues	cebf05ca65	btrfs: Remove unused variable mode in btrfs_mount This is a leftover from `312c89fbca` ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()"), the mode was used for opening devices that's not done here anymore. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:55 +02:00
Su Yue	8f63a84051	btrfs: switch order of unlocks of space_info and bg in do_trimming() In function do_trimming(), block_group->lock should be unlocked first, as the locks should be released in the reverse order. This does not cause problems but should follow the best practices. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:55 +02:00
Qu Wenruo	4c094c33c9	btrfs: tree-checker: Check if the file extent end overflows Under certain conditions, we could have strange file extent item in log tree like: item 18 key (69599 108 397312) itemoff 15208 itemsize 53 extent data disk bytenr 0 nr 0 extent data offset 0 nr 18446744073709547520 ram 18446744073709547520 The num_bytes + ram_bytes overflow 64 bit type. For num_bytes part, we can detect such overflow along with file offset (key->offset), as file_offset + num_bytes should never go beyond u64. For ram_bytes part, it's about the decompressed size of the extent, not directly related to the size. In theory it is OK to have a large value, and put extra limitation on RAM bytes may cause unexpected false alerts. So in tree-checker, we only check if the file offset and num bytes overflow. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:55 +02:00
Nikolay Borisov	2ed95d2d59	btrfs: Remove redundant assignment of tgt_device->commit_total_bytes This is already done in btrfs_init_dev_replace_tgtdev which is the first phase of device replace, called before doing scrub. During that time exclusive lock is held. Additionally btrfs_fs_device::commit_total_bytes is always set based on the size of the underlying block device which shouldn't change once set. This makes the 2nd assignment of the variable in the finishing phase redundant. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:55 +02:00
Nikolay Borisov	f232ab04f6	btrfs: Explicitly reserve space for devreplace item Part of device replace involves writing an item to the device root containing information about pending replace operations. Currently space for this item is not being explicitly reserved so this works thanks to presence of global reserve. While not fatal it's not a good practice. Let's be explicit about space requirement of device replace and reserve space when starting the transaction. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:54 +02:00
Nikolay Borisov	fa19452a40	btrfs: Streamline replace sem unlock in btrfs_dev_replace_start There are only 2 branches which goto leave label with need_unlock set to true. Essentially need_unlock is used as a substitute for directly calling up_write. Since the branches needing this are only 2 and their context is not that big it's more clear to just call up_write where required. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:54 +02:00
Nikolay Borisov	e1e0eb43ce	btrfs: Ensure btrfs_init_dev_replace_tgtdev sees up to date values btrfs_init_dev_replace_tgtdev reads certain values from the source device (such as commit_total_bytes) which are updated during transaction commit. Currently this function is called before committing any pending transaction, leading to possibly reading outdated values. Fix this by moving the function below the transaction commit, at this point the EXCL_OP bit it set hence once transaction is complete the total size of the device cannot be changed (it's usually changed by resize/remove ops which are blocked). Fixes: `9e271ae27e` ("Btrfs: kernel operation should come after user input has been verified") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:54 +02:00
Nikolay Borisov	419684b2c2	btrfs: dev-replace: Remove impossible WARN_ON This WARN_ON can never trigger because src_device cannot be null. btrfs_find_device_by_devspec always returns either an error or a valid pointer to the device. Just remove it. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:54 +02:00
Nikolay Borisov	b0d9e1ea17	btrfs: Reduce critical section in btrfs_init_dev_replace_tgtdev There is no point in holding btrfs_fs_devices::device_list_mutex while initialising fields of the not-yet-published device. Instead, hold the mutex only when the newly initialised device is being published. I think holding device_list_mutex here is redundant altogether, because at this point BTRFS_FS_EXCL_OP is set which prevents device removal/addition/balance/resize to occur. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:54 +02:00
Nikolay Borisov	ddb9378469	btrfs: Don't opencode sync_blockdev in btrfs_init_dev_replace_tgtdev Using sync_blockdev makes it plain obvious what's happening. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:53 +02:00
David Sterba	5911c8fe05	btrfs: fiemap: preallocate ulists for btrfs_check_shared btrfs_check_shared looks up parents of a given extent and uses ulists for that. These are allocated and freed repeatedly. Preallocation in the caller will avoid the overhead and also allow us to use the GFP_KERNEL as it is happens before the extent locks are taken. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:53 +02:00
David Sterba	9b4e675a99	btrfs: detect fast implementation of crc32c on all architectures Currently, there's only check for fast crc32c implementation on X86, based on the CPU flags. This is used to decide if checksumming should be offloaded to worker threads or can be calculated by the caller. As there are more architectures that implement a faster version of crc32c (ARM, SPARC, s390, MIPS, PowerPC), also there are specialized hw cards. The detection is based on driver name, all generic C implementations contain 'generic', while the specialized versions do not. Alternatively the priority could be used, but this is not currently provided by the crypto API. The flag is set per-filesystem at mount time and used for the offloading decisions. Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:53 +02:00
Qu Wenruo	78192442d3	btrfs: extent-tree: Refactor add_pinned_bytes() to add\|sub_pinned_bytes() Instead of using @sign to determine whether we're adding or subtracting. Even it only has 3 callers, it's still (and in fact already caused problem in the past) confusing to use. Refactor add_pinned_bytes() to add_pinned_bytes() and sub_pinned_bytes() to explicitly show what we're doing. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:53 +02:00
Linus Torvalds	01305db842	XArray updates for 5.2-rc6 Account XArray nodes for the page cache to the appropriate cgroup (Johannes Weiner) Fix idr_get_next() when called under the RCU lock (Matthew Wilcox) Add a test for xa_insert() (Matthew Wilcox) -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAl0WuKsACgkQDpNsjXcp gj73zgf9Eb477PuwYZpFBA9ZxI5v/6WyqbaWXKdqEhotARgIUuv1CfVnkt1IJE6P Z3QCRABZ3pIKHgIErJN53B7AdvdONUO4Xf9VFBqmxeWE7F9L3sROOpXc8IrR26kV hITQn8mwgacNQ8mLtQmcSFaCVC2E7yVNBhVd5zmcA6jNIAFsOJcP06KLJTe94OXe AB9TJvswxpzAEX8emHQ/a1SFBNZWJ7b53hBcu8CJn8CuGDxmo1/+qqoRyNY+WrDO OohFk2u1j6Esfc6j0k+Akt8mEFyfU2oxFfv5MjL0KYEyrHoU84eZljFGgf7rQqGj fqH9RO8J8eoj4D/3XaLL5QYRLIxRaw== =AXZy -----END PGP SIGNATURE----- Merge tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax Pull XArray fixes from Matthew Wilcox: - Account XArray nodes for the page cache to the appropriate cgroup (Johannes Weiner) - Fix idr_get_next() when called under the RCU lock (Matthew Wilcox) - Add a test for xa_insert() (Matthew Wilcox) * tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax: XArray tests: Add check_insert idr: Fix idr_get_next race with idr_remove mm: fix page cache convergence regression	2019-06-29 17:14:57 +08:00
Linus Torvalds	0839c53762	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL mm, swap: fix THP swap out fork,memcg: alloc_thread_stack_node needs to set tsk->stack MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warning mm/page_idle.c: fix oops because end_pfn is larger than max_pfn initramfs: fix populate_initrd_image() section mismatch mm/oom_kill.c: fix uninitialized oc->constraint mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails signal: remove the wrong signal_pending() check in restore_user_sigmask() fs/binfmt_flat.c: make load_flat_shared_library() work mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask fs/proc/array.c: allow reporting eip/esp for all coredumping threads mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual address	2019-06-29 17:11:01 +08:00
Linus Torvalds	c949c30b26	Two more NFS client fixes for Linux 5.2 Stable bugfixes: - SUNRPC: Fix up calculation of client message length # 5.1+ - NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O # 4.8+ -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl0Wf3EACgkQ18tUv7Cl QOs2ORAA5/CXFa471jUldOsHejxfFoddFBkuqf8qZ1AF3TZdFuITAsq+xydxfO5U hYzUUlOTKedEi+ISYLFs1tjU/nYRQJv7fFZxVwq6uDZ53Z/doiMLAIR67Eq7EcTY KBWA9zdldnBzb0S87+hkbmaNPR5pjqxBzLEfMmOQEAAh5pSGf5YSeUNTXLGj4wBd iXf25o1VSjUmNpSHaA3KsrqTJ4mJ7+i/17Iny1c4xRgZbJtoTm44DpceHCheJpbl DymRSgjSr0vFjJbufcKkbF2OPp1ZsnkDiKyJmZzgPOa3+TMGzisU5yiASoac6D+j gs426yEz9rvR/TMZtFS05nfu2clKuS8foLGwZelJ7XjQSXJgObCb4xf97jLIOWNb J+BWwsTmUIQS+fMUQDA+rlbyepJ+skVZpbjmUy+/Uy52oqtnYK6uTD469NdmxBwr 7z2pnCUjJFTqo6BHeCQgR5XlSt1MGDByamcVAWONS+9zJttRhfUOjq0PIOLSrsBK 5zRzJxtBoYLwP5py3zKAeV9RcvDNSgh5U6P0hhFRtHfqMUmtGeA58nNND2S6Qm3/ vAB7WZL0aVSvc3zpz7qdctitMESQNspCkMooAp/EoIime3YkqKCS+AgED9jKLhJR /5eqtr6tehh6A4dshzSlDF7cFrKyUd+ulS0IN8vt1V2TYgOQmbY= =FATk -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull two more NFS client fixes from Anna Schumaker: "These are both stable fixes. One to calculate the correct client message length in the case of partial transmissions. And the other to set the proper TCP timeout for flexfiles" * tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O SUNRPC: Fix up calculation of client message length	2019-06-29 17:02:22 +08:00
Linus Torvalds	43251dbd6a	A small fix for a potential -rc1 regression from Jeff. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl0WLe4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzizPrB/4tNUS8J9mW9Zd3xLAzZmwjq+WAfCV8 wp3IjBHCgvn9SmTYOJtozjTLJVlmeGNVyrCaWbtzQ2YLKvyBTCUF4kg9EG7FMX9a ixzlHb2+Wu46LYWiA7jhUnoKNMMl1swm01BOvfmGprSwV70BAEF0i2/D7WHikolX rgcwGb58vUMmXQ1VGfIO9Pox2a8jaZNj82BZnDniMDxetZ5sRsZXGy43s14zC6Lt YnwDT70Y7+Pr9SwHMA5bnZ8kCtQpr0qAHmDVhEd965Io1XZ+2/EHF5IwqK0xGg+e KUQdRyhMWjIGG34SWMt5tbT+9Lzeju4CAka9NPSJ1tRtFnk1AvpILbnB =TpFR -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client Pull ceph fix from Ilya Dryomov: "A small fix for a potential -rc1 regression from Jeff" * tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client: ceph: fix ceph_mdsc_build_path to not stop on first component	2019-06-29 17:01:02 +08:00
Linus Torvalds	9dda12b6fa	for-linus-20190628 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0WM9YQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmkyEADVPjXlIZETBpAl/oK/StNc1NMdfgBiWaX7 kQHbFu3V4soDpvR8iQvMVyFc7dUpwo9lmgxIOcZSfdCf/ciJ/G4trhH4UljXfRsj 2vdKV3rZXragrclN0zGtW90sBBYxSilaezzRQbnnXjEgGaHFkeJJR3xW00UMoGrm GDO2gSQdhDKqhJtKjiCASkyN9uWMkcLFdsGErPgA6e4S3NTbaLKaY/xFUCcMF7aX N1aYkIfdyl38QUU/N+5WLgiJYHkiZNqcrJ+a5aECioqqiNh9ST+UR1jCgo7tlt4h b3Gb5mxP0CPUuTh3VQD8GHCaPzDsxUIxThJkz5aih3M9NEQmm5Du0GDChaDuMoUR zyFT/Yl4JfeO93mlpxGUyC5WyFCQdj0QOBuyxInCchvJC5kbpRflMuKt+xRYlSqg 331njdykyKkgutagLzzTME38RPUbttZVmbc6K422PXKkYW+FOlS352FZpl5qxDOu 5+ihOXOLvO09VXu6kcC5UH4Yi6nuGYDS95oIZhJ0OODx10xnKSE4ZozlPXAEreAR NVJN7vbHVqLnphuplRK9Kh0VngdIhLkeTsUxaTnX6UQSioHPDJPqPP5nfSu9Xkyo e+2UAXkfVjnw45jAu8Mrsu0KhabCB5Pde8Jk+kmqPcuWXQEN5OHqeA09vtvKj81J lIagz1NZxw== =WzXj -----END PGP SIGNATURE----- Merge tag 'for-linus-20190628' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Just two small fixes. One from Paolo, fixing a silly mistake in BFQ. The other one is from me, ensuring that we have ->file cleared in the io_uring request a bit earlier. That avoids a use-before-free, if we encounter an error before ->file is assigned" * tag 'for-linus-20190628' of git://git.kernel.dk/linux-block: block, bfq: fix operator in BFQQ_TOTALLY_SEEKY io_uring: ensure req->file is cleared on allocation	2019-06-29 16:58:35 +08:00
Oleg Nesterov	97abc889ee	signal: remove the wrong signal_pending() check in restore_user_sigmask() This is the minimal fix for stable, I'll send cleanups later. Commit `854a6ed568` ("signal: Add restore_user_sigmask()") introduced the visible change which breaks user-space: a signal temporary unblocked by set_user_sigmask() can be delivered even if the caller returns success or timeout. Change restore_user_sigmask() to accept the additional "interrupted" argument which should be used instead of signal_pending() check, and update the callers. Eric said: : For clarity. I don't think this is required by posix, or fundamentally to : remove the races in select. It is what linux has always done and we have : applications who care so I agree this fix is needed. : : Further in any case where the semantic change that this patch rolls back : (aka where allowing a signal to be delivered and the select like call to : complete) would be advantage we can do as well if not better by using : signalfd. : : Michael is there any chance we can get this guarantee of the linux : implementation of pselect and friends clearly documented. The guarantee : that if the system call completes successfully we are guaranteed that no : signal that is unblocked by using sigmask will be delivered? Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com Fixes: `854a6ed568` ("signal: Add restore_user_sigmask()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Eric Wong <e@80x24.org> Tested-by: Eric Wong <e@80x24.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jason Baron <jbaron@akamai.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Laight <David.Laight@ACULAB.COM> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-06-29 16:43:45 +08:00
Jann Horn	867bfa4a5f	fs/binfmt_flat.c: make load_flat_shared_library() work load_flat_shared_library() is broken: It only calls load_flat_file() if prepare_binprm() returns zero, but prepare_binprm() returns the number of bytes read - so this only happens if the file is empty. Instead, call into load_flat_file() if the number of bytes read is non-negative. (Even if the number of bytes is zero - in that case, load_flat_file() will see nullbytes and return a nice -ENOEXEC.) In addition, remove the code related to bprm creds and stop using prepare_binprm() - this code is loading a library, not a main executable, and it only actually uses the members "buf", "file" and "filename" of the linux_binprm struct. Instead, call kernel_read() directly. Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com Fixes: `287980e49f` ("remove lots of IS_ERR_VALUE abuses") Signed-off-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-06-29 16:43:45 +08:00
John Ogness	cb8f381f16	fs/proc/array.c: allow reporting eip/esp for all coredumping threads `0a1eb2d474` ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp and `fd7d56270b` ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") reintroduced the feature to fix a regression with userspace core dump handlers (such as minicoredumper). Because PF_DUMPCORE is only set for the primary thread, this didn't fix the original problem for secondary threads. Allow reporting the eip/esp for all threads by checking for PF_EXITING as well. This is set for all the other threads when they are killed. coredump_wait() waits for all the tasks to become inactive before proceeding to invoke a core dumper. Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de Fixes: `fd7d56270b` ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reported-by: Jan Luebbe <jlu@pengutronix.de> Tested-by: Jan Luebbe <jlu@pengutronix.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-06-29 16:43:44 +08:00
Trond Myklebust	68f461593f	NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O Fix a typo where we're confusing the default TCP retrans value (NFS_DEF_TCP_RETRANS) for the default TCP timeout value. Fixes: `15d03055cf` ("pNFS/flexfiles: Set reasonable default ...") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-06-28 11:48:52 -04:00
Linus Torvalds	7a702b4e82	for-linus-20190627 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAl0UnRoACgkQjrBW1T7s sS1T0w/+PFooDZNaKJkhJCGm0XyRDYmmuivEX9ydUR1x9/doRbDZTqfjsQBJLoVK PulxDiuFbQWXzhBJFEMuU6YBR2fjFqUGsXz5qAXPB0zaahWcSY/0Y8VCU/PKq7A6 3oJPl/lYwYkLTYUKsnN08hByosUA7WeRQRAxbSFWdCTlUfIw72mDhprMGjJIVAlu snLA5lUoy7hyoFdXR5qNhYAcX8sASmi01hXhdnsKMOv4z2Vb5NoQsgqL1W8tAnsf BdJKL82Qd7vWQahlbOtur46aeJAL2ukGSTskuA2jOQqsKxmpos+hWq36gToq7usa XgPii0Rz7/2s6ZvhmxV5kmzqHylT9giU1DxWybSVo9IZBsU2i1o9DV+yBY50tr45 s0bmpSA/u4DP2uT8oRvh47LbDqiQFA8dyVWQKE25smSdjekuZHTO0tgXf8mwC2CW hDci4z+ONOyqIQyFrhP7UaKuSK6tAAUbYKtXUIN6rnuq1FjuTA2+wtIlOPZuDZQ2 yrsSUefh4/sFMBSAgoGTg9f+PiCejBMKcxoqhU2/27mvkiInAyDPfoc4oGQcinOy OVX3B0A8B88l26sDkWdv15d92E1GKzZLj8h66TlYwDpN+seevftKtpblZ9fJWsSf 0NejoMV/GcA/KsAp1sxqWwouRob8H6pbGXWb97DYRA2IVyjK3q4= =APjA -----END PGP SIGNATURE----- Merge tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull pidfd fixes from Christian Brauner: "Userspace tools and libraries such as strace or glibc need a cheap and reliable way to tell whether CLONE_PIDFD is supported. The easiest way is to pass an invalid fd value in the return argument, perform the syscall and verify the value in the return argument has been changed to a valid fd. However, if CLONE_PIDFD is specified we currently check if pidfd == 0 and return EINVAL if not. The check for pidfd == 0 was originally added to enable us to abuse the return argument for passing additional flags along with CLONE_PIDFD in the future. However, extending legacy clone this way would be a terrible idea and with clone3 on the horizon and the ability to reuse CLONE_DETACHED with CLONE_PIDFD there's no real need for this clutch. So remove the pidfd == 0 check and help userspace out. Also, accordig to Al, anon_inode_getfd() should only be used past the point of no failure and ksys_close() should not be used at all since it is far too easy to get wrong. Al's motto being "basically, once it's in descriptor table, it's out of your control". So Al's patch switches back to what we already had in v1 of the original patchset and uses a anon_inode_getfile() + put_user() + fd_install() sequence in the success path and a fput() + put_unused_fd() in the failure path. The other two changes should be trivial" * tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: proc: remove useless d_is_dir() check copy_process(): don't use ksys_close() on cleanups samples: make pidfd-metadata fail gracefully on older kernels fork: don't check parent_tidptr with CLONE_PIDFD	2019-06-28 08:41:18 +08:00

1 2 3 4 5 ...

59005 commits