Excessive file fragmentation on parallel file creation #496

Open · bolausson opened this issue Aug 15, 2024 · 2 comments

@bolausson

Hi there,

It looks like parallel file creation with IOR (4.0.0) causes unnecessary file fragmentation.

Here is an example and a comparison with FIO (serialised and parallelised file creation).
Even with parallel file creation, FIO does a very good job of keeping fragmentation to a minimum.

Is there any chance of improving this?

1. dd single file (just for reference)
bo@x440-01 $ dd if=/dev/random of=test-dd bs=1M count=1024 oflag=sync
bo@x440-01 $ filefrag test-dd 
test-dd: 1 extent found
2. IOR single process
bo@x440-01 $ mpirun -np 1 ior -a POSIX -w -F -e -g -t 1m -b 1g -k -o /scratch/bolausson/hdd/ior/test-plain.ior
bo@x440-01 $ filefrag test-plain.ior.00000000 
test-plain.ior.00000000: 1 extent found
3. IOR 10 processes
bo@x440-01 $ mpirun -np 10 ior -a POSIX -w -F -e -g -t 1m -b 1g -k -o /scratch/bolausson/hdd/ior/test-plain.ior
bo@x440-01 $ for i in test-plain.ior.0000000* ; do filefrag ${i} ; done
test-plain.ior.00000000: 116 extents found
test-plain.ior.00000001: 115 extents found
test-plain.ior.00000002: 90 extents found
test-plain.ior.00000003: 95 extents found
test-plain.ior.00000004: 91 extents found
test-plain.ior.00000005: 116 extents found
test-plain.ior.00000006: 97 extents found
test-plain.ior.00000007: 118 extents found
test-plain.ior.00000008: 107 extents found
test-plain.ior.00000009: 111 extents found
4. FIO single process
bo@x440-01 $ fio --name fio-serial --numjobs=1 --create_serialize=1 --ioengine=sync --size=1g --blocksize=1M --group_reporting=1 --rw=write --directory=/scratch/bolausson/hdd/ior
bo@x440-01 $ filefrag fio-serial.0.0
fio-serial.0.0: 1 extent found
5. FIO 10 processes, create_serialize=1 (default behaviour)
fio --name fio-serial-multi --numjobs=10 --create_serialize=1 --ioengine=sync --size=1g --blocksize=1M --group_reporting=1 --rw=write --directory=/scratch/bolausson/hdd/ior
bo@x440-01 $ for i in fio-serial-multi.* ; do filefrag ${i} ; done
fio-serial-multi.0.0: 1 extent found
fio-serial-multi.1.0: 1 extent found
fio-serial-multi.2.0: 1 extent found
fio-serial-multi.3.0: 1 extent found
fio-serial-multi.4.0: 1 extent found
fio-serial-multi.5.0: 1 extent found
fio-serial-multi.6.0: 1 extent found
fio-serial-multi.7.0: 1 extent found
fio-serial-multi.8.0: 1 extent found
fio-serial-multi.9.0: 1 extent found
6. FIO 10 processes, create_serialize=0 (parallel file creation)
bo@x440-01 $ fio --name fio-parallel-multi --numjobs=10 --create_serialize=0 --ioengine=sync --size=1g --blocksize=1M --group_reporting=1 --rw=write --directory=/scratch/bolausson/hdd/ior
bo@x440-01 $ for i in fio-parallel-multi.* ; do filefrag ${i} ; done
fio-parallel-multi.0.0: 1 extent found
fio-parallel-multi.1.0: 1 extent found
fio-parallel-multi.2.0: 1 extent found
fio-parallel-multi.3.0: 2 extents found
fio-parallel-multi.4.0: 1 extent found
fio-parallel-multi.5.0: 1 extent found
fio-parallel-multi.6.0: 1 extent found
fio-parallel-multi.7.0: 1 extent found
fio-parallel-multi.8.0: 1 extent found
fio-parallel-multi.9.0: 2 extents found
@glennklockwood (Contributor)

File fragmentation is not a concept known to POSIX, so what you’re seeing is caused by your specific file system. On which one are you running this test?

@bolausson (Author)

Oh yes, sorry, I thought I had mentioned the filesystem. It is Lustre.
Here is some more information a colleague gathered:

The fio benchmark preallocates files, as shown in this snippet from an fio strace:

1599909 13:46:38.047201 openat(AT_FDCWD, "/hdd/ior-16m_dne2-nostriping/fio.blktracesingle/fiojob.0.0", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 6

1599909 13:46:38.047689 fallocate(6, 0, 0, 34359738368) = 0

In the IOR cases there is no preallocation; the files are written sequentially, although each write is preceded by an lseek() to the very offset at which the data would have been appended anyway. Snippet of an strace of an IOR process writing one file:

1599661 12:17:18.744826 lseek(18, 4714397696, SEEK_SET) = 4714397696

1599661 12:17:18.744877 write(18, "\272\263\262e\0\0\0\0\10\0\0\0\0\0\0\0\272\263\262e\0\0\0\0\30\0\0\0\0\0\0\0\272\263\262e\0\0\0\0(\0\0\0\0\0\0\0\272\263\262e\0\0\0\08\0\0\0\0\0\0\0\272\263\262e\0\0\0\0H\0\0\0\0\0\0\0\272\263\262e\0\0\0\0X\0\0\0\0\0\0\0\272\263\262e\0\0\0\0h\0\0\0\0\0\0\0\272\263\262e\0\0\0\0x\0\0\0\0\0\0\0"..., 16777216) = 16777216

1599661 12:17:18.761087 lseek(18, 4731174912, SEEK_SET) = 4731174912

1599661 12:17:18.761133 write(18, "\272\263\262e\0\0\0\0\10\0\0\0\0\0\0\0\272\263\262e\0\0\0\0\30\0\0\0\0\0\0\0\272\263\262e\0\0\0\0(\0\0\0\0\0\0\0\272\263\262e\0\0\0\08\0\0\0\0\0\0\0\272\263\262e\0\0\0\0H\0\0\0\0\0\0\0\272\263\262e\0\0\0\0X\0\0\0\0\0\0\0\272\263\262e\0\0\0\0h\0\0\0\0\0\0\0\272\263\262e\0\0\0\0x\0\0\0\0\0\0\0"..., 16777216) = 16777216

1599661 12:17:18.777467 lseek(18, 4747952128, SEEK_SET) = 4747952128

1599661 12:17:18.777513 write(18, "\272\263\262e\0\0\0\0\10\0\0\0\0\0\0\0\272\263\262e\0\0\0\0\30\0\0\0\0\0\0\0\272\263\262e\0\0\0\0(\0\0\0\0\0\0\0\272\263\262e\0\0\0\08\0\0\0\0\0\0\0\272\263\262e\0\0\0\0H\0\0\0\0\0\0\0\272\263\262e\0\0\0\0X\0\0\0\0\0\0\0\272\263\262e\0\0\0\0h\0\0\0\0\0\0\0\272\263\262e\0\0\0\0x\0\0\0\0\0\0\0"..., 16777216) = 16777216

1599661 12:17:18.793840 lseek(18, 4764729344, SEEK_SET) = 4764729344

1599661 12:17:18.793887 write(18, "\272\263\262e\0\0\0\0\10\0\0\0\0\0\0\0\272\263\262e\0\0\0\0\30\0\0\0\0\0\0\0\272\263\262e\0\0\0\0(\0\0\0\0\0\0\0\272\263\262e\0\0\0\08\0\0\0\0\0\0\0\272\263\262e\0\0\0\0H\0\0\0\0\0\0\0\272\263\262e\0\0\0\0X\0\0\0\0\0\0\0\272\263\262e\0\0\0\0h\0\0\0\0\0\0\0\272\263\262e\0\0\0\0x\0\0\0\0\0\0\0"..., 16777216) = 16777216

The offset+lengths are sequential with no gaps.
