linux-hardened/drivers
NeilBrown 18a25da843 dm: ensure bio submission follows a depth-first tree walk
A dm device can, in general, represent a tree of targets, each of which
handles a sub-range of the range of blocks handled by the parent.

The bio sequencing managed by generic_make_request() requires that bios
are generated and handled in a depth-first manner.  Each call to a
make_request_fn() may submit bios to a single member device, and may
submit bios for a reduced region of the same device as the
make_request_fn.

In particular, any bios submitted to member devices must be expected to
be processed in order, so a later one must never wait for an earlier
one.

This ordering is usually achieved by using bio_split() to reduce a bio
to a size that can be completely handled by one target, and resubmitting
the remainder to the originating device. bio_queue_split() shows the
canonical approach.

dm doesn't follow this approach, largely because it has needed to split
bios since long before bio_split() was available.  It currently can
submit bios to separate targets within the one dm_make_request() call.
Dependencies between these targets, as can happen with dm-snap, can
cause deadlocks if either bios gets stuck behind the other in the queues
managed by generic_make_request().  This requires the 'rescue'
functionality provided by dm_offload_{start,end}.

Some of this requirement can be removed by changing the order of bio
submission to follow the canonical approach.  That is, if dm finds that
it needs to split a bio, the remainder should be sent to
generic_make_request() rather than being handled immediately.  This
delays the handling until the first part is completely processed, so the
deadlock problems do not occur.

__split_and_process_bio() can be called both from dm_make_request() and
from dm_wq_work().  When called from dm_wq_work() the current approach
is perfectly satisfactory as each bio will be processed immediately.
When called from dm_make_request(), current->bio_list will be non-NULL,
and in this case it is best to create a separate "clone" bio for the
remainder.

When we use bio_clone_bioset() to split off the front part of a bio
and chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated
bio is used as the head to be processed immediately, and the original
bio gets "bio_advance()"d and sent to generic_make_request() as the
remainder.  Otherwise, if the newly allocated bio is used as the
remainder, and if it then needs to be split again, then the next
bio_clone_bioset() call will be made while holding a reference a bio
(result of the first clone) from the same bioset.  This can potentially
exhaust the bioset mempool and result in a memory allocation deadlock.

Note that there is no race caused by reassigning cio.io->bio after already
calling __map_bio().  This bio will only be dereferenced again after
dec_pending() has found io->io_count to be zero, and this cannot happen
before the dec_pending() call at the end of __split_and_process_bio().

To provide the clone bio when splitting, we use q->bio_split.  This
was previously being freed by bio-based dm to avoid having excess
rescuer threads.  As bio_split bio sets no longer create rescuer
threads, there is little cost and much gain from restoring the
q->bio_split bio set.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 12:15:57 -05:00
..
accessibility
acpi Merge branch 'acpi-ec' into acpi 2017-11-30 13:37:29 +01:00
amba A couple of dma-mapping updates: 2017-11-14 16:54:12 -08:00
android Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
ata Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2017-11-15 14:11:41 -08:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-29 13:10:25 -08:00
auxdisplay auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM 2017-11-27 12:36:45 -08:00
base treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts 2017-11-21 16:35:54 -08:00
bcma Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-11-15 11:56:19 -08:00
block Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-12-01 08:05:45 -05:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-04 09:26:51 +09:00
bus ARM: SoC driver updates for v4.15 2017-11-16 16:05:01 -08:00
cdrom Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
char Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-26 14:11:54 -08:00
clk We have two changes to the core framework this time around. The first being a 2017-11-17 20:04:24 -08:00
clocksource - final batch of "non trivial" timer conversions (multi-tree dependencies, 2017-11-23 16:29:05 +01:00
connector
cpufreq cpufreq: mediatek: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE 2017-11-22 00:00:14 +01:00
cpuidle powerpc updates for 4.15 2017-11-16 12:47:46 -08:00
crypto powerpc updates for 4.15 2017-11-16 12:47:46 -08:00
dax device-dax: implement ->split() to catch invalid munmap attempts 2017-11-29 18:40:42 -08:00
dca
devfreq Merge branches 'pm-devfreq' and 'pm-tools' 2017-11-13 01:41:39 +01:00
dio
dma dmaengine updates for 4.15-rc1 2017-11-14 16:49:31 -08:00
dma-buf Tracing updates for 4.15: 2017-11-17 14:58:01 -08:00
edac Modules updates for v4.15 2017-11-15 13:46:33 -08:00
eisa
extcon USB/PHY patches for 4.15-rc1 2017-11-13 21:14:07 -08:00
firewire Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
firmware drivers/firmware: psci: Convert timers to use timer_setup() 2017-11-21 15:46:44 -08:00
fmc
fpga Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
fsi
gpio This is the bulk of pin control changes for the v4.15 2017-11-16 10:57:11 -08:00
gpu Merge branch 'drm-fixes-4.15' of git://people.freedesktop.org/~agd5f/linux into drm-fixes 2017-12-01 09:15:57 +10:00
hid treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
hsi HSI changes for the v4.15 series 2017-11-15 13:35:43 -08:00
hv Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
hwmon hwmon: (jc42) optionally try to disable the SMBUS timeout 2017-11-30 13:12:44 -08:00
hwspinlock hwspinlock update for v4.15 2017-11-17 20:16:20 -08:00
hwtracing Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
i2c i2c: i2c-boardinfo: fix memory leaks on devinfo 2017-11-27 19:14:29 +01:00
ide Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide 2017-11-19 08:04:41 -10:00
idle Merge branch 'pm-cpuidle' 2017-11-13 01:34:14 +01:00
iio treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
infiniband IB/core: disable memory registration of filesystem-dax vmas 2017-11-29 18:40:42 -08:00
input treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts 2017-11-21 16:35:54 -08:00
iommu treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
ipack
irqchip Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-26 14:39:20 -08:00
isdn treewide: setup_timer() -> timer_setup() (2 field) 2017-11-21 15:57:09 -08:00
leds LED updates for 4.15rc1 2017-11-14 18:09:31 -08:00
lightnvm lightnvm: Convert timers to use timer_setup() 2017-11-21 15:46:44 -08:00
macintosh Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
mailbox Change to POLL api and fixes for FlexRM and OMAP driver 2017-11-15 13:39:18 -08:00
mcb
md dm: ensure bio submission follows a depth-first tree walk 2017-12-13 12:15:57 -05:00
media v4l2: disable filesystem-dax mapping support 2017-11-29 18:40:42 -08:00
memory ARM: SoC driver updates for v4.15 2017-11-16 16:05:01 -08:00
memstick treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
message Modules updates for v4.15 2017-11-15 13:46:33 -08:00
mfd treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
misc Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2017-12-03 10:48:24 -05:00
mmc MMC core: 2017-12-01 08:14:22 -05:00
mtd Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
mux
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-29 13:10:25 -08:00
nfc treewide: setup_timer() -> timer_setup() (2 field) 2017-11-21 15:57:09 -08:00
ntb treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
nubus m68k updates for 4.15 2017-11-13 12:10:24 -08:00
nvdimm libnvdimm for 4.15 2017-11-17 09:51:57 -08:00
nvme nvme-pci: fix NULL pointer dereference in nvme_free_host_mem() 2017-11-28 08:49:26 -08:00
nvmem Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
of DeviceTree fixes for 4.15: 2017-11-20 21:38:41 -10:00
opp
oprofile
parisc
parport Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
pci Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-26 14:11:54 -08:00
pcmcia drivers/pcmcia/sa1111_badge4.c: avoid unused function warning 2017-11-17 16:10:04 -08:00
perf arm64 updates for 4.15 2017-11-15 10:56:56 -08:00
phy USB/PHY patches for 4.15-rc1 2017-11-13 21:14:07 -08:00
pinctrl This is the bulk of pin control changes for the v4.15 2017-11-16 10:57:11 -08:00
platform Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-25 08:37:16 -10:00
pnp
power power supply and reset changes for the v4.15 series 2017-11-15 13:37:15 -08:00
powercap
pps treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
ps3
ptp xen: features and fixes for v4.15-rc1 2017-11-16 13:06:27 -08:00
pwm pwm: Changes for v4.15-rc1 2017-11-22 21:09:18 -10:00
rapidio Merge branch 'akpm' (patches from Andrew) 2017-11-17 16:56:17 -08:00
ras Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
regulator - New Drivers 2017-11-16 09:15:57 -08:00
remoteproc remoteproc updates for v4.15 2017-11-17 20:14:10 -08:00
reset ARM: SoC driver updates for v4.15 2017-11-16 16:05:01 -08:00
rpmsg rpmsg updates for v4.15 2017-11-17 20:12:08 -08:00
rtc Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-25 08:37:16 -10:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2017-11-30 08:13:36 -08:00
sbus Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2017-11-17 20:21:44 -08:00
scsi Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-25 08:37:16 -10:00
sfi
sh A couple of dma-mapping updates: 2017-11-14 16:54:12 -08:00
sn
soc ARM: SoC driver updates for v4.15 2017-11-16 16:05:01 -08:00
spi Merge remote-tracking branches 'spi/topic/sh-msiof', 'spi/topic/slave', 'spi/topic/spreadtrum' and 'spi/topic/tegra114' into spi-next 2017-11-10 21:33:51 +00:00
spmi
ssb
staging Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
target Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-25 08:37:16 -10:00
tc
tee
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2017-11-17 14:31:27 -08:00
thunderbolt Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
tty treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts 2017-11-21 16:35:54 -08:00
uio
usb treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
uwb treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
vfio VFIO Updates for Linux v4.15 2017-11-14 16:47:47 -08:00
vhost Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-11-17 12:08:18 -08:00
video fbdev changes for v4.15: 2017-11-20 21:50:24 -10:00
virt
virtio virtio_balloon: fix deadlock on OOM 2017-11-14 23:57:38 +02:00
vlynq
vme Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
w1 Char/Misc patches for 4.15-rc1 2017-11-16 09:10:59 -08:00
watchdog treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
xen treewide: Switch DEFINE_TIMER callbacks to struct timer_list * 2017-11-21 15:57:05 -08:00
zorro
Kconfig Merge branches 'pm-cpufreq-sched' and 'pm-opp' 2017-11-13 01:40:52 +01:00
Makefile Merge branches 'pm-cpufreq-sched' and 'pm-opp' 2017-11-13 01:40:52 +01:00