summaryrefslogtreecommitdiff
path: root/fs/f2fs/segment.c
AgeCommit message (Collapse)Author
2026-01-30f2fs: fix incomplete block usage in compact SSA summariesDaeho Jeong
In a previous commit, a bug was introduced where compact SSA summaries failed to utilize the entire block space in non-4KB block size configurations, leading to inefficient space management. This patch fixes the calculation logic to ensure that compact SSA summaries can fully occupy the block regardless of the block size. Reported-by: Chris Mason <clm@meta.com> Fixes: e48e16f3e37f ("f2fs: support non-4KB block size without packed_ssa feature") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17f2fs: support non-4KB block size without packed_ssa featureDaeho Jeong
Currently, F2FS requires the packed_ssa feature to be enabled when utilizing non-4KB block sizes (e.g., 16KB). This restriction limits the flexibility of filesystem formatting options. This patch allows F2FS to support non-4KB block sizes even when the packed_ssa feature is disabled. It adjusts the SSA calculation logic to correctly handle summary entries in larger blocks without the packed layout. Cc: stable@kernel.org Fixes: 7ee8bc3942f2 ("f2fs: revert summary entry count from 2048 to 512 in 16kb block support") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17f2fs: make FAULT_DISCARD obsoleteChao Yu
__blkdev_issue_discard() in __submit_discard_cmd() will never fail, so let's make FAULT_DISCARD fault injection obsolete. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-07f2fs: sysfs: introduce inject_lock_timeoutChao Yu
This patch adds a new sysfs node in /sys/fs/f2fs/<disk>/inject_lock_timeout, it relies on CONFIG_F2FS_FAULT_INJECTION kernel config. It can be used to simulate different type of timeout in lock duration. ========== =============================== Flag_Value Flag_Description ========== =============================== 0x00000000 No timeout (default) 0x00000001 Simulate running time 0x00000002 Simulate IO type sleep time 0x00000003 Simulate Non-IO type sleep time 0x00000004 Simulate runnable time ========== =============================== Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-07f2fs: rename FAULT_TIMEOUT to FAULT_ATOMIC_TIMEOUTChao Yu
No logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-07f2fs: trace elapsed time for gc_lock lockChao Yu
Use f2fs_{down,up}_write_trace for gc_lock to trace lock elapsed time. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-07f2fs: trace elapsed time for cp_rwsem lockChao Yu
Use f2fs_{down,up}_read_trace for cp_rwsem to trace lock elapsed time. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-01f2fs: return immediately after submitting the specified folio in ↵Yongpeng Yang
__submit_merged_write_cond f2fs_folio_wait_writeback ensures the folio write is submitted to the block layer via __submit_merged_write_cond, then waits for the folio to complete. Other I/O submissions are irrelevant to f2fs_folio_wait_writeback. Thus, if the folio write bio is already submitted, the function can return immediately. This patch adds a writeback parameter to __submit_merged_write_cond(), which signals an immediate return after submitting the target folio, and waitting writeback can use this parameter. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: ignore discard return valueChaitanya Kulkarni
__blkdev_issue_discard() always returns 0, making the error assignment in __submit_discard_cmd() dead code. Initialize err to 0 and remove the error assignment from the __blkdev_issue_discard() call to err. Move fault injection code into already present if branch where err is set to -EIO. This preserves the fault injection behavior while removing dead error handling. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: change default schedule timeout valueChao Yu
This patch changes default schedule timeout value from 20ms to 1ms, in order to give caller more chances to check whether IO or non-IO congestion condition has already been mitigable. In addition, default interval of periodical discard submission is kept to 20ms. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: introduce f2fs_schedule_timeout()Chao Yu
In f2fs retry logic, we will call f2fs_io_schedule_timeout() to sleep as uninterruptible state (waiting for IO) for a while, however, in several paths below, we are not blocked by IO: - f2fs_write_single_data_page() return -EAGAIN due to racing on cp_rwsem. - f2fs_flush_device_cache() failed to submit preflush command. - __issue_discard_cmd_range() sleeps periodically in between two in batch discard submissions. So, in order to reveal state of task more accurate, let's introduce f2fs_schedule_timeout() and call it in above paths in where we are waiting for non-IO reasons. Then we can get real reason of uninterruptible sleep for a thread in tracepoint, perfetto, etc. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: use memalloc_retry_wait() as much as possibleChao Yu
memalloc_retry_wait() is recommended in memory allocation retry logic, use it as much as possible. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: revert summary entry count from 2048 to 512 in 16kb block supportDaeho Jeong
The recent increase in the number of Segment Summary Area (SSA) entries from 512 to 2048 was an unintentional change in logic of 16kb block support. This commit corrects the issue. To better utilize the space available from the erroneous 2048-entry calculation, we are implementing a solution to share the currently unused SSA space with neighboring segments. This enhances overall SSA utilization without impacting the established 8MB segment size. Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: fix age extent cache insertion skip on counter overflowXiaole He
The age extent cache uses last_blocks (derived from allocated_data_blocks) to determine data age. However, there's a conflict between the deletion marker (last_blocks=0) and legitimate last_blocks=0 cases when allocated_data_blocks overflows to 0 after reaching ULLONG_MAX. In this case, valid extents are incorrectly skipped due to the "if (!tei->last_blocks)" check in __update_extent_tree_range(). This patch fixes the issue by: 1. Reserving ULLONG_MAX as an invalid/deletion marker 2. Limiting allocated_data_blocks to range [0, ULLONG_MAX-1] 3. Using F2FS_EXTENT_AGE_INVALID for deletion scenarios 4. Adjusting overflow age calculation from ULLONG_MAX to (ULLONG_MAX-1) Reproducer (using a patched kernel with allocated_data_blocks initialized to ULLONG_MAX - 3 for quick testing): Step 1: Mount and check initial state # dd if=/dev/zero of=/tmp/test.img bs=1M count=100 # mkfs.f2fs -f /tmp/test.img # mkdir -p /mnt/f2fs_test # mount -t f2fs -o loop,age_extent_cache /tmp/test.img /mnt/f2fs_test # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551612 # ULLONG_MAX - 3 Inner Struct Count: tree: 1(0), node: 0 Step 2: Create files and write data to trigger overflow # touch /mnt/f2fs_test/{1,2,3,4}.txt; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551613 # ULLONG_MAX - 2 Inner Struct Count: tree: 5(0), node: 1 # dd if=/dev/urandom of=/mnt/f2fs_test/1.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551614 # ULLONG_MAX - 1 Inner Struct Count: tree: 5(0), node: 2 # dd if=/dev/urandom of=/mnt/f2fs_test/2.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551615 # ULLONG_MAX Inner Struct Count: tree: 5(0), node: 3 # dd if=/dev/urandom of=/mnt/f2fs_test/3.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 0 # Counter overflowed! Inner Struct Count: tree: 5(0), node: 4 Step 3: Trigger the bug - next write should create node but gets skipped # dd if=/dev/urandom of=/mnt/f2fs_test/4.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 1 Inner Struct Count: tree: 5(0), node: 4 Expected: node: 5 (new extent node for 4.txt) Actual: node: 4 (extent insertion was incorrectly skipped due to last_blocks = allocated_data_blocks = 0 in __get_new_block_age) After this fix, the extent node is correctly inserted and node count becomes 5 as expected. Fixes: 71644dff4811 ("f2fs: add block_age-based extent cache") Cc: stable@kernel.org Signed-off-by: Xiaole He <hexiaole1994@126.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-02f2fs: allocate HOT_DATA for IPU writesJaegeuk Kim
Let's split IPU writes in hot data area to improve the GC efficiency. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-29f2fs: Use allocate_section_policy to control write priority in multi-devices ↵Liao Yuanhong
setups Introduces two new sys nodes: allocate_section_hint and allocate_section_policy. The allocate_section_hint identifies the boundary between devices, measured in sections; it defaults to the end of the device for single storage setups, and the end of the first device for multiple storage setups. The allocate_section_policy determines the write strategy, with a default value of 0 for normal sequential write strategy. A value of 1 prioritizes writes before the allocate_section_hint, while a value of 2 prioritizes writes after it. This strategy addresses the issue where, despite F2FS supporting multiple devices, SOC vendors lack multi-devices support (currently only supporting zoned devices). As a workaround, multiple storage devices are mapped to a single dm device. Both this workaround and the F2FS multi-devices solution may require prioritizing writing to certain devices, such as a device with better performance or when switching is needed due to performance degradation near a device's end. For scenarios with more than two devices, sort them at mount time to utilize this feature. When using this feature with a single storage device, it has almost no impact. However, for configurations where multiple storage devices are mapped to the same dm device using F2FS, utilizing this feature can provide some optimization benefits. Therefore, I believe it should not be limited to just multi-devices usage. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-11f2fs: add error checking in do_write_page()mason.zhang
Otherwise, the filesystem may unaware of potential file corruption. Signed-off-by: mason.zhang <masonzhang.linuxer@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_submit_merged_write_cond()Matthew Wilcox (Oracle)
Most callers pass NULL, and the one that passes a page already has a folio. Also convert __submit_merged_write_cond() to take a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to ADDRS_PER_PAGE()Matthew Wilcox (Oracle)
All callers now have a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to IS_DNODE()Matthew Wilcox (Oracle)
All callers now have a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to is_cold_node()Matthew Wilcox (Oracle)
All callers now have a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Add fio->folioMatthew Wilcox (Oracle)
Put fio->page insto a union with fio->folio. This lets us remove a lot of folio->page and page->folio conversions. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to fill_node_footer_blkaddr()Matthew Wilcox (Oracle)
The only caller has a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_inode_chksum_set()Matthew Wilcox (Oracle)
All callers have a folio so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22f2fs: Pass a folio to f2fs_allocate_data_block()Matthew Wilcox (Oracle)
Most callers pass NULL, and the one which passes a page already has a folio, so we can pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-09f2fs: introduce is_cur{seg,sec}()Chao Yu
There are redundant codes in IS_CUR{SEG,SEC}() macros, let's introduce inline is_cur{seg,sec}() functions, and use a loop in it for cleanup. Meanwhile, it enhances expansibility, as it doesn't need to change is_cur{seg,sec}() when we add a new log header. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-09f2fs: use kfree() instead of kvfree() to free some memoryJiazi Li
options in f2fs_fill_super is alloc by kstrdup: options = kstrdup((const char *)data, GFP_KERNEL) sit_bitmap[_mir], nat_bitmap[_mir] are alloc by kmemdup: sit_i->sit_bitmap = kmemdup(src_bitmap, sit_bitmap_size, GFP_KERNEL); sit_i->sit_bitmap_mir = kmemdup(src_bitmap, sit_bitmap_size, GFP_KERNEL); nm_i->nat_bitmap = kmemdup(version_bitmap, nm_i->bitmap_size, GFP_KERNEL); nm_i->nat_bitmap_mir = kmemdup(version_bitmap, nm_i->bitmap_size, GFP_KERNEL); write_io is alloc by f2fs_kmalloc: sbi->write_io[i] = f2fs_kmalloc(sbi, array_size(n, sizeof(struct f2fs_bio_info)) Use kfree is more efficient. Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com> Signed-off-by: peixuan.qiu <peixuan.qiu@transsion.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-01f2fs: fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()Chao Yu
As syzbot reported as below: F2FS-fs (loop9): inject invalid blkaddr in f2fs_is_valid_blkaddr of do_write_page+0x277/0xb10 fs/f2fs/segment.c:3956 ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:3957! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 10538 Comm: syz-executor Not tainted 6.16.0-rc3-next-20250627-syzkaller #0 PREEMPT(full) Call Trace: <TASK> f2fs_outplace_write_data+0x11a/0x220 fs/f2fs/segment.c:4017 f2fs_do_write_data_page+0x12ea/0x1a40 fs/f2fs/data.c:2752 f2fs_write_single_data_page+0xa68/0x1680 fs/f2fs/data.c:2851 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3282 [inline] f2fs_write_data_pages+0x195b/0x3000 fs/f2fs/data.c:3309 do_writepages+0x32b/0x550 mm/page-writeback.c:2636 filemap_fdatawrite_wbc mm/filemap.c:386 [inline] __filemap_fdatawrite_range mm/filemap.c:419 [inline] __filemap_fdatawrite mm/filemap.c:425 [inline] filemap_fdatawrite+0x199/0x240 mm/filemap.c:430 f2fs_sync_dirty_inodes+0x31f/0x830 fs/f2fs/checkpoint.c:1108 block_operations fs/f2fs/checkpoint.c:1247 [inline] f2fs_write_checkpoint+0x95a/0x1df0 fs/f2fs/checkpoint.c:1638 kill_f2fs_super+0x2c3/0x6c0 fs/f2fs/super.c:5081 deactivate_locked_super+0xb9/0x130 fs/super.c:474 cleanup_mnt+0x425/0x4c0 fs/namespace.c:1417 task_work_run+0x1d4/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:114 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline] do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f If we inject block address fault, it may trigger kernel panic, we need to use f2fs_is_valid_blkaddr_raw() instead of f2fs_is_valid_blkaddr() in do_write_page() to avoid such issue. Fixes: 70b6e8500431 ("f2fs: do sanity check on fio.new_blkaddr in do_write_page()") Reported-by: syzbot+9201a61c060513d4be38@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68639520.a70a0220.3b7e22.17e6.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-23f2fs: do sanity check on fio.new_blkaddr in do_write_page()Chao Yu
F2FS-fs (dm-55): access invalid blkaddr:972878540 Call trace: dump_backtrace+0xec/0x128 show_stack+0x18/0x28 dump_stack_lvl+0x40/0x88 dump_stack+0x18/0x24 __f2fs_is_valid_blkaddr+0x360/0x3b4 f2fs_is_valid_blkaddr+0x10/0x20 f2fs_get_node_info+0x21c/0x60c __write_node_page+0x15c/0x734 f2fs_sync_node_pages+0x4f8/0x700 f2fs_write_checkpoint+0x4a8/0x99c __checkpoint_and_complete_reqs+0x7c/0x20c issue_checkpoint_thread+0x4c/0xd8 kthread+0x11c/0x1b0 ret_from_fork+0x10/0x20 If f2fs_allocate_data_block() fails, we may update nat.blkaddr w/ uninitialized fio.new_blkaddr. - __write_node_folio - f2fs_do_write_node_page - do_write_page - f2fs_allocate_data_block : once it fails, it may not allocate new blkaddr - set_node_addr : update w/ uninitialized fio.new_blkaddr variable I've checked all error paths in f2fs_allocate_data_block(), it should be tagged w/ CP_ERROR_FLAG. In addition, f2fs_allocate_data_block() succeeds, fio.new_blkaddr should be valid. Let's add f2fs_bug_on() to check above two conditions to detect any potential bugs. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-23f2fs: make sure zoned device GC to use FG_GC in shortage of free sectionDaeho Jeong
We already use FG_GC when we have free sections under gc_boost_zoned_gc_percent. So, let's make it consistent. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: fix to skip f2fs_balance_fs() if checkpoint is disabledChao Yu
Syzbot reports a f2fs bug as below: INFO: task syz-executor328:5856 blocked for more than 144 seconds. Not tainted 6.15.0-rc6-syzkaller-00208-g3c21441eeffc #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor328 state:D stack:24392 pid:5856 tgid:5832 ppid:5826 task_flags:0x400040 flags:0x00004006 Call Trace: <TASK> context_switch kernel/sched/core.c:5382 [inline] __schedule+0x168f/0x4c70 kernel/sched/core.c:6767 __schedule_loop kernel/sched/core.c:6845 [inline] schedule+0x165/0x360 kernel/sched/core.c:6860 io_schedule+0x81/0xe0 kernel/sched/core.c:7742 f2fs_balance_fs+0x4b4/0x780 fs/f2fs/segment.c:444 f2fs_map_blocks+0x3af1/0x43b0 fs/f2fs/data.c:1791 f2fs_expand_inode_data+0x653/0xaf0 fs/f2fs/file.c:1872 f2fs_fallocate+0x4f5/0x990 fs/f2fs/file.c:1975 vfs_fallocate+0x6a0/0x830 fs/open.c:338 ioctl_preallocate fs/ioctl.c:290 [inline] file_ioctl fs/ioctl.c:-1 [inline] do_vfs_ioctl+0x1b8f/0x1eb0 fs/ioctl.c:885 __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0x82/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is after commit 84b5bb8bf0f6 ("f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable"), we will get chance to allow f2fs_is_checkpoint_ready() to return true once below conditions are all true: 1. checkpoint is disabled 2. there are not enough free segments 3. there are enough free blocks Then it will cause f2fs_balance_fs() to trigger foreground GC. void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need) ... if (!f2fs_is_checkpoint_ready(sbi)) return; And the testcase mounts f2fs image w/ gc_merge,checkpoint=disable, so deadloop will happen through below race condition: - f2fs_do_shutdown - vfs_fallocate - gc_thread_func - file_start_write - __sb_start_write(SB_FREEZE_WRITE) - f2fs_fallocate - f2fs_expand_inode_data - f2fs_map_blocks - f2fs_balance_fs - prepare_to_wait - wake_up(gc_wait_queue_head) - io_schedule - bdev_freeze - freeze_super - sb->s_writers.frozen = SB_FREEZE_WRITE; - sb_wait_write(sb, SB_FREEZE_WRITE); - if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE) continue; : cause deadloop This patch fix to add check condition in f2fs_balance_fs(), so that if checkpoint is disabled, we will just skip trigger foreground GC to avoid such deadloop issue. Meanwhile let's remove f2fs_is_checkpoint_ready() check condition in f2fs_balance_fs(), since it's redundant, due to the main logic in the function is to check: a) whether checkpoint is disabled b) there is enough free segments f2fs_balance_fs() still has all logics after f2fs_is_checkpoint_ready()'s removal. Reported-by: syzbot+aa5bb5f6860e08a60450@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/682d743a.a00a0220.29bc26.0289.GAE@google.com Fixes: 84b5bb8bf0f6 ("f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable") Cc: Qi Han <hanqi@vivo.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28f2fs: add ckpt_valid_blocks to the section entryyohan.joung
when performing buffered writes in a large section, overhead is incurred due to the iteration through ckpt_valid_blocks within the section. when SEGS_PER_SEC is 128, this overhead accounts for 20% within the f2fs_write_single_data_page routine. as the size of the section increases, the overhead also grows. to handle this problem ckpt_valid_blocks is added within the section entries. Test insmod null_blk.ko nr_devices=1 completion_nsec=1 submit_queues=8 hw_queue_depth=64 max_sectors=512 bs=4096 memory_backed=1 make_f2fs /dev/block/nullb0 make_f2fs -s 128 /dev/block/nullb0 fio --bs=512k --size=1536M --rw=write --name=1 --filename=/mnt/test_dir/seq_write --ioengine=io_uring --iodepth=64 --end_fsync=1 before SEGS_PER_SEC 1 2556MiB/s SEGS_PER_SEC 128 2145MiB/s after SEGS_PER_SEC 1 2556MiB/s SEGS_PER_SEC 128 2556MiB/s Signed-off-by: yohan.joung <yohan.joung@sk.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06f2fs: support FAULT_TIMEOUTChao Yu
Support to inject a timeout fault into function, currently it only support to inject timeout to commit_atomic_write flow to reproduce inconsistent bug, like the bug fixed by commit f098aeba04c9 ("f2fs: fix to avoid atomicity corruption of atomic file"). By default, the new type fault will inject 1000ms timeout, and the timeout process can be interrupted by SIGKILL. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06f2fs: fix to bail out in get_new_segment()Chao Yu
------------[ cut here ]------------ WARNING: CPU: 3 PID: 579 at fs/f2fs/segment.c:2832 new_curseg+0x5e8/0x6dc pc : new_curseg+0x5e8/0x6dc Call trace: new_curseg+0x5e8/0x6dc f2fs_allocate_data_block+0xa54/0xe28 do_write_page+0x6c/0x194 f2fs_do_write_node_page+0x38/0x78 __write_node_page+0x248/0x6d4 f2fs_sync_node_pages+0x524/0x72c f2fs_write_checkpoint+0x4bc/0x9b0 __checkpoint_and_complete_reqs+0x80/0x244 issue_checkpoint_thread+0x8c/0xec kthread+0x114/0x1bc ret_from_fork+0x10/0x20 get_new_segment() detects inconsistent status in between free_segmap and free_secmap, let's record such error into super block, and bail out get_new_segment() instead of continue using the segment. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: zone: fix to calculate first_zoned_segno correctlyChao Yu
A zoned device can has both conventional zones and sequential zones, so we should not treat first segment of zoned device as first_zoned_segno, instead, we need to check zone type for each zone during traversing zoned device to find first_zoned_segno. Otherwise, for below case, first_zoned_segno will be 0, which could be wrong. create_null_blk 512 2 1024 1024 mkfs.f2fs -m /dev/nullb0 Testcase: export SCRIPTS_PATH=/share/git/scripts test multiple devices w/ zoned device for ((i=0;i<8;i++)) do { zonesize=$((2<<$i)) conzone=$((4096/$zonesize)) seqzone=$((4096/$zonesize)) $SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone mkfs.f2fs -f -m /dev/vdb -c /dev/nullb0 mount /dev/vdb /mnt/f2fs touch /mnt/f2fs/file f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2)) stat /mnt/f2fs/file df cat /proc/fs/f2fs/vdb/segment_info umount /mnt/f2fs $SCRIPTS_PATH/nullblk_remove.sh 0 } done test single zoned device for ((i=0;i<8;i++)) do { zonesize=$((2<<$i)) conzone=$((4096/$zonesize)) seqzone=$((4096/$zonesize)) $SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone mkfs.f2fs -f -m /dev/nullb0 mount /dev/nullb0 /mnt/f2fs touch /mnt/f2fs/file f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2)) stat /mnt/f2fs/file df cat /proc/fs/f2fs/nullb0/segment_info umount /mnt/f2fs $SCRIPTS_PATH/nullblk_remove.sh 0 } done Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Convert fsync_node_entry->page to folioMatthew Wilcox (Oracle)
Convert all callers to set/get a folio instead of a page. Removes five calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Convert dnode_of_data->node_page to node_folioMatthew Wilcox (Oracle)
All assignments to this struct member are conversions from a folio so convert it to be a folio and convert all users. At the same time, convert data_blkaddr() to take a folio as all callers now have a folio. Remove eight calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in f2fs_wait_on_block_writeback()Matthew Wilcox (Oracle)
Fetch a folio from the pagecache and use it. Removes two calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in change_curseg()Matthew Wilcox (Oracle)
Get a folio and use it. Saves a call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Add f2fs_get_sum_folio()Matthew Wilcox (Oracle)
Convert f2fs_get_sum_page() to f2fs_get_sum_folio() and add a f2fs_get_sum_page() wrapper. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Convert f2fs_get_meta_page_retry() to f2fs_get_meta_folio_retry()Matthew Wilcox (Oracle)
Also convert get_current_nat_page() to get_current_nat_folio(). Removes three hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in read_normal_summaries()Matthew Wilcox (Oracle)
Get a folio instead of a page. Saves a hidden call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in read_compacted_summaries()Matthew Wilcox (Oracle)
Get a folio instead of a page. Saves two hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in build_sit_entries()Matthew Wilcox (Oracle)
Convert get_current_sit_page() to get_current_sit_folio() and then use the folio in build_sit_entries(). Saves a hidden call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in write_compacted_summaries()Matthew Wilcox (Oracle)
Grab a folio instead of a page. Saves four hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in write_current_sum_page()Matthew Wilcox (Oracle)
Grab a folio instead of a page. Saves two hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in f2fs_update_meta_page()Matthew Wilcox (Oracle)
Grab a folio instead of a page. Saves two hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Convert get_next_sit_page() to get_next_sit_folio()Matthew Wilcox (Oracle)
Grab a folio instead of a page. Also convert seg_info_to_sit_page() to seg_info_to_sit_folio() and use a folio in f2fs_flush_sit_entries(). Saves a couple of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Pass a folio to f2fs_submit_merged_ipu_write()Matthew Wilcox (Oracle)
The only caller which passes a page already has a folio, so pass it in. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Introduce fio_inode()Matthew Wilcox (Oracle)
This helper returns the inode associated with the f2fs_io_info. That's a relatively common thing to want, mildly awkward to get and provides one place to change if we decide to record it directly, or change fio->page to fio->folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [Jaegeuk Kim: fix wrong fio_inode conversion] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>