Sparse file

From ArchWiki

According to Wikipedia, a sparse file is a type of file that attempts to use file system space more efficiently when blocks allocated to a file are mostly empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual "empty" space which makes up the block, using less disk space. The full block size is written to disk as the actual size only when the block contains "real" (non-empty) data.

When reading sparse files, the file system transparently converts metadata representing empty blocks into "real" blocks filled with zero bytes at runtime. The application is unaware of this conversion.
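
The transparent conversion can be observed from the shell. The following is a minimal sketch, assuming GNU coreutils and diffutils are installed; the temporary file is only for illustration:

```shell
# Create a 1 MiB file that consists of a single hole (no data is written).
hole=$(mktemp)
truncate -s 1M "$hole"

# Reading the hole yields zero bytes: compare the first 1 MiB with /dev/zero.
if cmp -s -n 1048576 "$hole" /dev/zero; then
    result="reads as zeros"
else
    result="contains non-zero data"
fi
echo "$result"
rm -f "$hole"
```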

Most modern file systems support sparse files, including most Unix variants and NTFS, but notably not Apple's HFS+. Sparse files are commonly used for disk images (not to be confused with sparse images), database snapshots, log files and in scientific applications.

The advantage of sparse files is that storage is only allocated when actually needed: disk space is saved, and large files can be created even if there is insufficient free space on the file system.

Disadvantages are that sparse files may become fragmented; file system free space reports may be misleading; filling up file systems containing sparse files can have unexpected effects; and copying a sparse file with a program that does not explicitly support them may copy the entire file, including the empty blocks that are not explicitly stored on disk, which negates the benefit of sparseness.

Creating sparse files

The truncate utility can create sparse files. This command creates a 512 MiB sparse file:

$ truncate -s 512M file.img

The dd utility can also be used, for example:

$ dd if=/dev/zero of=file.img bs=1 count=0 seek=512M

Sparse files have different apparent file sizes (the maximum size to which they may expand) and actual file sizes (how much space is allocated for data on disk). To check a file's apparent size, just run:

$ du -h --apparent-size file.img
512M    file.img

and, to check the actual size of a file on disk:

$ du -h file.img
0       file.img

Although the apparent size of the file is 512 MiB, it occupies no space on disk. Blocks are allocated only when data is written to them, so the actual size grows with the amount of data stored in the file.
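
The difference between the two sizes can also be read with stat. This is a sketch assuming GNU coreutils; the 64 MiB size and the 4 MiB offset are arbitrary values chosen for illustration:

```shell
img=$(mktemp)
truncate -s 64M "$img"

# Write one 4 KiB block of data at offset 4 MiB without truncating the file.
dd if=/dev/urandom of="$img" bs=4K count=1 seek=1024 conv=notrunc,fsync status=none

# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
apparent=$(stat -c %s "$img")
allocated=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"
```

Only the blocks that hold the written data (plus any file system overhead) should be counted as allocated; the exact figure depends on the file system.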

Making existing files sparse

The fallocate utility can make existing files sparse on supported file systems. Its -d (--dig-holes) option detects blocks of zeros and deallocates them in place:

$ fallocate -d copy.img
$ du -h copy.img
0       copy.img
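
A self-contained way to try this is to start from a file that really contains zeros on disk. The sketch below assumes GNU coreutils and util-linux; whether any blocks are freed depends on the file system supporting hole punching:

```shell
img=$(mktemp)
# Write 8 MiB of literal zero bytes, so that blocks are really allocated.
dd if=/dev/zero of="$img" bs=1M count=8 conv=fsync status=none
before=$(stat -c %b "$img")

# Deallocate the zero-filled ranges; this fails where hole punching is unsupported.
fallocate -d "$img" || echo "hole punching is not supported here" >&2
after=$(stat -c %b "$img")

size=$(stat -c %s "$img")
echo "apparent size: $size bytes, blocks before: $before, after: $after"
```

The apparent size stays the same; only the number of allocated blocks is expected to drop.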

Making existing files non-sparse

The following command creates a non-sparse copy of a (sparse) file:

$ cp file.img copy.img --sparse=never
$ du -h copy.img 
512M    copy.img

Creating a filesystem in a sparse file

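A sparse file can serve as a disk image: it is formatted with a file system and mounted through a loop device, and it consumes disk space only as data is stored in it. For example, to format the sparse file created above with ext4: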

$ mkfs.ext4 file.img

Checking the sizes again shows the effect of the file system:

$ du -h --apparent-size file.img
512M    file.img
$ du -h file.img
456K     file.img

Formatting has increased the actual size, because file system metadata was written, but has left the apparent size unchanged. The image can now be mounted; the --mkdir option creates the mount point if it does not exist:

# mount --mkdir -o loop file.img mountpoint

The file system is now mounted at mountpoint and can store almost 512 MiB of data.

Mounting a file at boot

To mount a sparse image automatically at boot, add an entry to your fstab:

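/path/to/file.img  /path/to/mountpoint  ext4  loop,defaults  0  0

Note: Include the loop option. Recent versions of mount can set up a loop device automatically for regular files, but stating the option explicitly avoids relying on that behavior.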

Detecting sparse files

Since sparse files occupy fewer blocks than their apparent size would require, they can be detected by comparing the two sizes. This method is not bulletproof: file system compression, extended attributes, internal fragmentation, indirect blocks and similar effects can also make the two sizes differ. Still, the standard way to check is:

$ ls -ls sparse-file.bin

If the file size in bytes is greater than the allocated size shown in the first column (counted in blocks, 1 KiB each by default with GNU ls), the file is sparse. The same can be achieved with du by comparing:

$ du sparse-file.bin
$ du --apparent-size sparse-file.bin

A step further is to print the sparseness value with find:

$ find sparse-file.bin -printf '%S\t%p\n'

A sparse file has a sparseness value of less than one, whereas normal files have a value of exactly one or slightly above. The above command can easily be extended to list sparse files under a given path:

$ find path/ -type f -printf '%S\t%p\n' | gawk '$1 < 1.0 {print}' | cut -f '2-'
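
The same comparison can be wrapped in a small shell function for use in scripts. This is a sketch assuming GNU stat; is_sparse is a hypothetical helper name, not an existing utility:

```shell
# Return success if the file occupies fewer bytes on disk than its apparent size.
is_sparse() {
    local apparent allocated
    apparent=$(stat -c %s -- "$1")
    allocated=$(( $(stat -c %b -- "$1") * $(stat -c %B -- "$1") ))
    [ "$allocated" -lt "$apparent" ]
}

dir=$(mktemp -d)
truncate -s 16M "$dir/sparse.bin"
dd if=/dev/urandom of="$dir/dense.bin" bs=1M count=1 conv=fsync status=none

for f in "$dir"/*.bin; do
    if is_sparse "$f"; then echo "sparse: $f"; else echo "not sparse: $f"; fi
done
```

The caveats mentioned above apply here as well: on file systems with compression or inline data, the function may report files as sparse that contain no holes.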

Copying a sparse file

Copying with cp

By default, cp detects sparse source files, so it suffices to run:

$ cp file.img new_file.img

Then new_file.img will be sparse. However, cp also has a --sparse=when option, where when is auto (the default), always or never. This is especially useful if a sparse file has somehow become non-sparse (i.e. the empty blocks have been written out to disk in full). Disk space can be recovered with:

$ cp --sparse=always new_file.img recovered_file.img
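
The effect of the different modes can be compared side by side. The following sketch assumes GNU coreutils; the file names and sizes are arbitrary:

```shell
dir=$(mktemp -d)
# A 16 MiB image holding 4 KiB of data; the rest is a hole.
truncate -s 16M "$dir/file.img"
dd if=/dev/urandom of="$dir/file.img" bs=4K count=1 conv=notrunc,fsync status=none

cp --sparse=never  "$dir/file.img" "$dir/full.img"    # request a non-sparse copy
cp --sparse=always "$dir/full.img" "$dir/sparse.img"  # turn runs of zeros into holes
sync "$dir/full.img" "$dir/sparse.img"

# All three files have identical contents; only the allocation may differ.
cmp "$dir/file.img" "$dir/full.img" && cmp "$dir/full.img" "$dir/sparse.img"
du -k "$dir"/*.img
```

On a file system that supports sparse files, du is expected to report a much smaller size for sparse.img than for full.img.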

Archiving with tar

By default, tar stores the holes of a sparse file as zeros, so the archive is as large as the apparent size of the file:

$ du -h file.img
33M     file.img
$ tar -cf file.tar file.img
$ du -h file.tar
513M    file.tar

Although the sparse file occupies only 33 MiB on disk, the archive takes up its entire apparent size. The --sparse (-S) option, when used with the --create (-c) operation, makes tar test all files for sparseness while archiving. If tar finds a file to be sparse, it uses a sparse representation of the file in the archive. This is useful when archiving files that are likely to contain many nulls, such as dbm files, and can dramatically decrease the space needed to store the archive.

$ tar -Scf file.tar file.img
$ du -h file.tar
12K     file.tar
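
The comparison can be reproduced with a scratch file. This is a sketch assuming GNU tar; the directory layout and the 16 MiB size are chosen only for illustration:

```shell
dir=$(mktemp -d)
truncate -s 16M "$dir/file.img"

# Archive the same file without and with sparse handling.
tar -C "$dir" -cf  "$dir/plain.tar"  file.img
tar -C "$dir" -Scf "$dir/sparse.tar" file.img
plain=$(stat -c %s "$dir/plain.tar")
sparse=$(stat -c %s "$dir/sparse.tar")
echo "plain: $plain bytes, sparse: $sparse bytes"

# Extracting the sparse archive restores the full apparent size.
mkdir "$dir/out"
tar -C "$dir/out" -xf "$dir/sparse.tar"
restored=$(stat -c %s "$dir/out/file.img")
echo "restored: $restored bytes"
```

No extra option is needed when extracting: the sparse representation is recorded in the archive itself.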

Resizing a sparse file

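The examples in this section assume that file.img contains an ext4 file system, as in Creating a filesystem in a sparse file, and that it is mounted at folder. First, populate it with a few small test files: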

$ for f in {1..5}; do touch folder/file${f}; done
$ ls folder/
file1  file2  file3  file4  file5

Then add some content to one of the files:

$ echo "This is a test to see if it works..." >> folder/file1
$ cat folder/file1
This is a test to see if it works...

Growing a file

To grow the image, unmount it and extend the file:

# umount folder
$ dd if=/dev/zero of=file.img bs=1 count=0 seek=1G
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.2978e-05 s, 0.0 kB/s

This increases the apparent size of the file to 1 GiB and leaves its contents intact. Next, check the file system and grow it to fill the file:

$ e2fsck -f file.img
$ resize2fs file.img
resize2fs 1.47.1 (20-May-2024)
Resizing the filesystem on file.img to 262144 (4k) blocks.
The filesystem on file.img is now 262144 (4k) blocks long.

Then remount it:

# mount -o loop file.img folder

Checking its size gives us:

# du -h --apparent-size file.img
1.0G    file.img
# du -h file.img
564K     file.img

Finally, check that the file system reports the new size and that the contents are intact:

# df -h folder
Filesystem            Size  Used Avail Use% Mounted on
/tmp/file.img         1.0G   33M  992M   4% /tmp/folder
# ls folder
file1  file2  file3  file4  file5
# cat folder/file1
This is a test to see if it works...
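
Extending the file itself does not depend on a file system inside it. The following sketch, assuming GNU coreutils, grows a plain sparse file with truncate and checks that data written earlier is preserved; the marker string is arbitrary:

```shell
img=$(mktemp)
truncate -s 1M "$img"
# Store a marker at the start of the file without truncating it.
printf 'marker' | dd of="$img" conv=notrunc,fsync status=none
data_before=$(head -c 6 "$img")

# Extending the file only moves its end; existing data is untouched.
truncate -s 8M "$img"
data_after=$(head -c 6 "$img")
size=$(stat -c %s "$img")
echo "size: $size bytes, data: $data_after"
```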

Tools

  • sparse-fio: dd-like program to work with files that are sparsely filled with non-zero data
https://github.com/anyc/sparse-fio || sparse-fio-git (AUR)
  • sparseutils: utilities to work with sparsely-populated files, provides mksparse and sparsemap, can be installed with pip
https://pypi.org/project/sparseutils/ || not packaged

See also