linux-hardened/fs/xfs
Fengguang Wu 1f7decf6d9 writeback: remove pages_skipped accounting in __block_write_full_page()
Miklos Szeredi <miklos@szeredi.hu> and me identified a writeback bug:

> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.

It can be produced by the following test scheme:

# cat bin/test-writeback.sh
grep nr_dirty /proc/vmstat
echo 1 > /proc/sys/fs/inode_debug
dd if=/dev/zero of=/var/x bs=1K count=204800&
while true; do grep nr_dirty /proc/vmstat; sleep 1; done

# bin/test-writeback.sh
nr_dirty 19207
nr_dirty 19207
nr_dirty 30924
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
nr_dirty 47150
nr_dirty 47141
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47205
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47215
nr_dirty 47216
nr_dirty 47216
nr_dirty 47216
nr_dirty 47154
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47134
nr_dirty 47134
nr_dirty 47135
nr_dirty 47135
nr_dirty 47135
nr_dirty 46097 <== -1038
nr_dirty 46098
nr_dirty 46098
nr_dirty 46098
[...]
nr_dirty 46091
nr_dirty 46092
nr_dirty 46092
nr_dirty 45069 <== -1023
nr_dirty 45056
nr_dirty 45056
nr_dirty 45056
[...]
nr_dirty 37822
nr_dirty 36799 <== -1023
[...]
nr_dirty 36781
nr_dirty 35758 <== -1023
[...]
nr_dirty 34708
nr_dirty 33672 <== -1024
[...]
nr_dirty 33692
nr_dirty 32669 <== -1023

% ls -li /var/x
847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x

% dmesg|grep 847824  # generated by a debug printk
[  529.263184] redirtied inode 847824 line 548
[  564.250872] redirtied inode 847824 line 548
[  594.272797] redirtied inode 847824 line 548
[  629.231330] redirtied inode 847824 line 548
[  659.224674] redirtied inode 847824 line 548
[  689.219890] redirtied inode 847824 line 548
[  724.226655] redirtied inode 847824 line 548
[  759.198568] redirtied inode 847824 line 548

# line 548 in fs/fs-writeback.c:
543                 if (wbc->pages_skipped != pages_skipped) {
544                         /*
545                          * writeback is not making progress due to locked
546                          * buffers.  Skip this inode for now.
547                          */
548                         redirty_tail(inode);
549                 }

More debug efforts show that __block_write_full_page()
never has the chance to call submit_bh() for that big dirty file:
the buffer head is *clean*. So basicly no page io is issued by
__block_write_full_page(), hence pages_skipped goes up.

Also the comment in generic_sync_sb_inodes():

544                         /*
545                          * writeback is not making progress due to locked
546                          * buffers.  Skip this inode for now.
547                          */

and the comment in __block_write_full_page():

1713                 /*
1714                  * The page was marked dirty, but the buffers were
1715                  * clean.  Someone wrote them back by hand with
1716                  * ll_rw_block/submit_bh.  A rare case.
1717                  */

do not quite agree with each other. The page writeback should be skipped for
'locked buffer', but here it is 'clean buffer'!

This patch fixes this bug. Though I'm not sure why __block_write_full_page()
is called only to do nothing and who actually issued the writeback for us.

This is the two possible new behaviors after the patch:

1) pretty nice: wait 30s and write ALL:)
2) not so good:
	- during the dd: ~16M
	- after 30s:      ~4M
	- after 5s:       ~4M
	- after 5s:     ~176M

The next patch will fix case (2).

Cc: David Chinner <dgc@sgi.com>
Cc: Ken Chen <kenchen@google.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:02 -07:00
..
linux-2.6 writeback: remove pages_skipped accounting in __block_write_full_page() 2007-10-17 08:43:02 -07:00
quota [XFS] fix nasty quota hashtable allocation bug 2007-09-05 14:51:04 +10:00
support [XFS] fix ASSERT and ASSERT_ALWAYS 2007-09-05 14:49:30 +10:00
Kbuild kbuild/xfs: introduce fs/xfs/Kbuild 2006-01-09 20:48:03 +01:00
Kconfig [PATCH] BLOCK: Make it possible to disable the block layer [try #6] 2006-09-30 20:52:31 +02:00
Makefile
Makefile-linux-2.6 [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs.h [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_acl.c [XFS] Remove unused header files for MAC and CAP checking functionality. 2007-02-10 18:37:28 +11:00
xfs_acl.h [XFS] Resolve a namespace collision on remaining vtypes for FreeBSD 2006-06-09 17:07:12 +10:00
xfs_ag.h [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_alloc.c [XFS] Clean up function name handling in tracing code 2007-07-14 15:41:24 +10:00
xfs_alloc.h [XFS] Lazy Superblock Counters 2007-07-14 15:28:50 +10:00
xfs_alloc_btree.c [XFS] Lazy Superblock Counters 2007-07-14 15:28:50 +10:00
xfs_alloc_btree.h [XFS] Remove unused arguments from the XFS_BTREE_*_ADDR macros. 2007-02-10 18:37:33 +11:00
xfs_arch.h [XFS] Merge in trivial changes, sync up headers with userspace 2006-01-12 10:29:53 +11:00
xfs_attr.c [XFS] The last argument "lsn" of xfs_trans_commit() is always called with 2007-05-08 13:48:42 +10:00
xfs_attr.h [XFS] Add EA list callbacks for xfs kernel use. Cleanup some namespace 2006-09-28 11:01:37 +10:00
xfs_attr_leaf.c [XFS] The last argument "lsn" of xfs_trans_commit() is always called with 2007-05-08 13:48:42 +10:00
xfs_attr_leaf.h [XFS] Add EA list callbacks for xfs kernel use. Cleanup some namespace 2006-09-28 11:01:37 +10:00
xfs_attr_sf.h [XFS] endianess annotations for xfs_attr_shortform_t 2006-03-17 17:29:25 +11:00
xfs_behavior.c [XFS] remove bhv_lookup, _range version works aswell and has more useful 2006-09-28 10:58:52 +10:00
xfs_behavior.h [XFS] remove bhv_lookup, _range version works aswell and has more useful 2006-09-28 10:58:52 +10:00
xfs_bit.c [XFS] Kill off xfs_count_bits 2007-07-14 15:36:43 +10:00
xfs_bit.h [XFS] Kill off xfs_count_bits 2007-07-14 15:36:43 +10:00
xfs_bmap.c [XFS] Clean up function name handling in tracing code 2007-07-14 15:41:24 +10:00
xfs_bmap.h [XFS] Clean up function name handling in tracing code 2007-07-14 15:41:24 +10:00
xfs_bmap_btree.c [XFS] Clean up function name handling in tracing code 2007-07-14 15:41:24 +10:00
xfs_bmap_btree.h [XFS] Remove a bunch of unused functions from XFS. 2007-02-10 18:37:40 +11:00
xfs_btree.c [XFS] endianess annotations for xfs_bmbt_key Trivial as there are no 2006-09-28 10:58:17 +10:00
xfs_btree.h [XFS] Simplify XFS min/max macros. 2007-07-14 15:36:53 +10:00
xfs_buf_item.c [XFS] Kill off xfs_count_bits 2007-07-14 15:36:43 +10:00
xfs_buf_item.h Revert "[XFS] Avoid replaying inode buffer initialisation log items if on-disk version is newer." 2007-10-01 07:59:03 -07:00
xfs_clnt.h [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_da_btree.c [XFS] fix sparse shadowed variable warnings 2007-09-05 14:50:26 +10:00
xfs_da_btree.h [XFS] Remove a bunch of unused functions from XFS. 2007-02-10 18:37:40 +11:00
xfs_dfrag.c [XFS] propogate return codes from flush routines 2007-05-08 13:49:27 +10:00
xfs_dfrag.h [XFS] Add parameters to xfs_bmapi() and xfs_bunmapi() to have them report 2006-06-09 14:48:12 +10:00
xfs_dinode.h [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_dir2.c [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2.h [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_dir2_block.c [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_block.h [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_data.c [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_data.h [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_leaf.c [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_leaf.h [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_node.c [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_node.h [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_sf.c [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_sf.h [XFS] Reduce shouting by removing unnecessary macros from dir2 code. 2007-07-14 15:37:02 +10:00
xfs_dir2_trace.c [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_dir2_trace.h [XFS] Update license/copyright notices to match the prefered SGI 2005-11-02 14:58:39 +11:00
xfs_dmapi.h [XFS] Remove KERNEL_VERSION macros from xfs_dmapi.h 2006-11-11 18:05:06 +11:00
xfs_dmops.c [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_error.c [XFS] reducing the number of random number functions. 2007-05-08 13:49:03 +10:00
xfs_error.h [XFS] Remove a bunch of unused functions from XFS. 2007-02-10 18:37:40 +11:00
xfs_extfree_item.c [XFS] Keep stack usage down for 4k stacks by using noinline. 2007-02-10 18:34:56 +11:00
xfs_extfree_item.h [XFS] cleanup the field types of some item format structures 2006-09-28 10:55:43 +10:00
xfs_filestream.c [XFS] fix filestreams on 32-bit boxes 2007-09-20 19:40:19 +10:00
xfs_filestream.h [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_fs.h [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_fsops.c [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_fsops.h [XFS] Write log dummy record when freezing filesystem 2006-01-11 15:30:08 +11:00
xfs_ialloc.c [XFS] Lazy Superblock Counters 2007-07-14 15:28:50 +10:00
xfs_ialloc.h [XFS] Lazy Superblock Counters 2007-07-14 15:28:50 +10:00
xfs_ialloc_btree.c [XFS] use NULL for pointer initialisation instead of zero-cast-to-ptr 2006-09-28 10:58:40 +10:00
xfs_ialloc_btree.h [XFS] Remove unused arguments from the XFS_BTREE_*_ADDR macros. 2007-02-10 18:37:33 +11:00
xfs_iget.c [XFS] Add lockdep support for XFS 2007-05-08 13:50:19 +10:00
xfs_imap.h [XFS] Update license/copyright notices to match the prefered SGI 2005-11-02 14:58:39 +11:00
xfs_inode.c [XFS] Clean up function name handling in tracing code 2007-07-14 15:41:24 +10:00
xfs_inode.h [XFS] Fix lockdep annotations for xfs_lock_inodes 2007-07-14 18:09:42 +10:00
xfs_inode_item.c [XFS] Keep stack usage down for 4k stacks by using noinline. 2007-02-10 18:34:56 +11:00
xfs_inode_item.h [XFS] cleanup the field types of some item format structures 2006-09-28 10:55:43 +10:00
xfs_inum.h [XFS] Update license/copyright notices to match the prefered SGI 2005-11-02 14:58:39 +11:00
xfs_iocore.c [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. 2007-05-08 13:49:46 +10:00
xfs_iomap.c [XFS] Cleanup inode extent size hint extraction 2007-07-14 15:35:36 +10:00
xfs_iomap.h [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. 2007-05-08 13:49:46 +10:00
xfs_itable.c [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode 2007-07-14 15:42:50 +10:00
xfs_itable.h [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode 2007-07-14 15:42:50 +10:00
xfs_log.c [XFS] Fix sparse NULL vs 0 warnings 2007-09-05 14:47:33 +10:00
xfs_log.h [XFS] Remove several macros that are no longer used anywhere 2006-09-28 11:02:57 +10:00
xfs_log_priv.h [XFS] Fixes the leak in reservation space because we weren't ungranting 2006-09-28 11:04:16 +10:00
xfs_log_recover.c Revert "[XFS] Avoid replaying inode buffer initialisation log items if on-disk version is newer." 2007-10-01 07:59:03 -07:00
xfs_log_recover.h [XFS] Update license/copyright notices to match the prefered SGI 2005-11-02 14:58:39 +11:00
xfs_mount.c [XFS] Use do_div() on 64 bit types. 2007-07-14 15:36:08 +10:00
xfs_mount.h [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_mru_cache.c [XFS] On-demand reaping of the MRU cache 2007-09-17 16:42:02 +10:00
xfs_mru_cache.h [XFS] On-demand reaping of the MRU cache 2007-09-17 16:42:02 +10:00
xfs_qmops.c [XFS] The last argument "lsn" of xfs_trans_commit() is always called with 2007-05-08 13:48:42 +10:00
xfs_quota.h [XFS] Fix uquota and oquota enforcement problems. 2007-05-08 13:49:33 +10:00
xfs_refcache.h [XFS] Cleanup cosmetic differences between source trees. 2005-11-03 16:14:31 +11:00
xfs_rename.c [XFS] The last argument "lsn" of xfs_trans_commit() is always called with 2007-05-08 13:48:42 +10:00
xfs_rtalloc.c [XFS] Don't grow filesystems past the size they can index. 2007-07-14 15:21:29 +10:00
xfs_rtalloc.h [XFS] Remove a bunch of unused functions from XFS. 2007-02-10 18:37:40 +11:00
xfs_rw.c [XFS] The last argument "lsn" of xfs_trans_commit() is always called with 2007-05-08 13:48:42 +10:00
xfs_rw.h [XFS] Cleanup inode extent size hint extraction 2007-07-14 15:35:36 +10:00
xfs_sb.h [XFS] Lazy Superblock Counters 2007-07-14 15:28:50 +10:00
xfs_trans.c [XFS] Apply transaction delta counts atomically to incore counters 2007-07-14 15:32:09 +10:00
xfs_trans.h [XFS] Lazy Superblock Counters 2007-07-14 15:28:50 +10:00
xfs_trans_ail.c [XFS] Workaround log space issue by increasing XFS_TRANS_PUSH_AIL_RESTARTS 2007-02-10 18:35:52 +11:00
xfs_trans_buf.c Revert "[XFS] Avoid replaying inode buffer initialisation log items if on-disk version is newer." 2007-10-01 07:59:03 -07:00
xfs_trans_extfree.c [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_trans_inode.c [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_trans_item.c [XFS] Portability changes: remove prdev, stick to one diagnostic 2006-06-09 15:29:40 +10:00
xfs_trans_priv.h [XFS] Add lock annotations to xfs_trans_update_ail and 2006-09-28 11:04:07 +10:00
xfs_trans_space.h [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_types.h [XFS] Update license/copyright notices to match the prefered SGI 2005-11-02 14:58:39 +11:00
xfs_utils.c [XFS] propogate return codes from flush routines 2007-05-08 13:49:27 +10:00
xfs_utils.h [XFS] Resolve a namespace collision on remaining vtypes for FreeBSD 2006-06-09 17:07:12 +10:00
xfs_vfsops.c [XFS] Concurrent Multi-File Data Streams 2007-07-14 15:40:53 +10:00
xfs_vnodeops.c [XFS] Ensure file size updates have been completed before writing inode to disk. 2007-09-18 20:12:51 +10:00