dd
dd is a core utility whose primary purpose is to copy a file and optionally convert it during the copy process.
Similarly to cp, by default dd makes a bit-to-bit copy of the file, but with lower-level I/O flow control features.
For more information, see dd(1) or the full documentation.
To monitor the progress of the operation, add the status=progress option to the command.
Installation
dd is part of the GNU coreutils. For other utilities in the package, please refer to Core utilities.
Disk cloning and restore
The dd command is a simple, yet versatile and powerful tool. It can be used to copy from source to destination, block-by-block, regardless of their filesystem types or operating systems. A convenient method is to use dd from a live environment, as in a Live CD.
Warning: Double-check the input file (if=) and output file (of=) and do not reverse them! Always ensure that the destination drive or partition (of=) is of equal or greater size than the source (if=).
Cloning a partition
From physical disk /dev/sda, partition 1, to physical disk /dev/sdb, partition 1:
# dd if=/dev/sda1 of=/dev/sdb1 bs=64K conv=noerror,sync status=progress
Warning: If the output file of= (sdb1 in the example) does not exist, dd will create a file with this name and will start filling up your root file system.
Cloning an entire hard disk
From physical disk /dev/sda to physical disk /dev/sdb:
# dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync status=progress
This will clone the entire drive, including the partition table, bootloader, all partitions, UUIDs, and data.
- bs= sets the block size. Defaults to 512 bytes, which is the "classic" block size for hard drives since the early 1980s, but is not the most convenient. Use a bigger value, 64K or 128K. Also, please read the note below, because there is more to this than just "block sizes"; it also influences how read errors propagate. See [1] and [2] for details and to figure out the best bs value for your use case.
- noerror instructs dd to continue operation, ignoring all read errors. Default behavior for dd is to halt at any error.
- sync fills input blocks with zeroes at the end of the block if there were any read errors somewhere in the block, so data offsets stay in sync (see below for a detailed explanation of read error behavior with sync if you suspect possible read errors).
- status=progress shows periodic transfer statistics which can be used to estimate when the operation may be complete.
The dd utility technically has an "input block size" (IBS) and an "output block size" (OBS). When you set bs, you effectively set both IBS and OBS. Normally, if your block size is, say, 1 MiB, dd will read 1024×1024 bytes and write as many bytes. But if a read error occurs, things will go wrong. Many people seem to think that dd will "fill up read errors with zeroes" if you use the noerror,sync options, but this is not what happens. dd will, according to documentation, fill up the OBS to IBS size after completing its read, which means adding zeroes at the end of the block. This means, for a disk, that effectively the whole 1 MiB would become messed up because of a single 512 byte read error in the beginning of the read: 12ERROR89 would become 128900000 instead of 120000089.
If you are positive that your disk does not contain any errors, you could proceed using a larger block size, which will increase the speed of your copying several fold. For example, changing bs from 512 to 64K changed copying speed from 35 MB/s to 120 MB/s on a simple Celeron 2.7 GHz system. But keep in mind that read errors on the source disk will end up as block errors on the destination disk, i.e. a single 512-byte read error will mess up the whole 64 KiB output block.
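The end-of-block padding described above can be observed without a failing disk. The following is a minimal sketch, assuming GNU dd; a 3-byte read from a pipe stands in for a short input block, and the scratch file created with mktemp is purely illustrative:

```shell
# Sketch: conv=sync pads a short input block with trailing NUL bytes
# up to the input block size (8 bytes here).
tmp=$(mktemp)                          # hypothetical scratch file
printf 'abc' | dd of="$tmp" bs=8 conv=sync status=none
padded_size=$(wc -c < "$tmp")          # size of the padded output in bytes
od -An -c "$tmp"                       # dump the bytes: data first, padding last
echo "output size: $padded_size bytes"
rm -f "$tmp"
```

The three data bytes stay at the start of the block and the zeroes are appended after them, which is the same mechanism that shifts good data forward within a block after a read error.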
- To regain unique UUIDs of an ext2/3/4 filesystem, use tune2fs /dev/sdXY -U random on every partition. For swap partitions, use mkswap -U random /dev/sdXY instead.
- If you are cloning a GPT disk, you can use sgdisk to randomize the disk and partition GUIDs and regain their uniqueness.
- Partition table changes from dd are not registered by the kernel. To notify of changes without rebooting, use a utility like partprobe (part of GNU Parted).
Backing up the partition table
See fdisk#Backup and restore partition table or gdisk#Backup and restore partition table.
Create disk image
Boot from a live medium and make sure no partitions are mounted from the source hard drive.
Then mount the external hard drive and back up the drive:
# dd if=/dev/sda conv=sync,noerror bs=64M status=progress | lz4 -z > /path/to/backup.img.lz4
If necessary (e.g. when the resulting files will be stored on a FAT32 file system), split the disk image into multiple parts (see also split(1)):
# dd if=/dev/sda conv=sync,noerror bs=64M status=progress | lz4 -z | split -a3 -b2G - /path/to/backup.img.lz4
If there is not enough disk space locally, you may send the image through ssh:
# dd if=/dev/sda conv=sync,noerror bs=64M status=progress | lz4 -z | ssh user@local dd of=backup.img.lz4
Finally, save the extra information about the drive geometry that is needed to interpret the partition table stored within the image. The most important item is the sector size.
# fdisk -l /dev/sda > /path/to/list_fdisk.info
- You may wish to use a block size (bs=) that is equal to the amount of cache on the HDD you are backing up. For example, bs=512M works for a 512 MiB cache. The 64 MiB mentioned in this article is better than the default bs=512 bytes, but it may run faster with a larger bs=.
- The above examples use lz4(1) which uses multiple cores for compression, but you can exchange it for any other compression program.
Restore system
To restore your system:
# lz4 -dc /path/to/backup.img.lz4 | dd of=/dev/sda status=progress
When the image has been split, use the following instead:
# cat /path/to/backup.img.lz4* | lz4 -dc | dd of=/dev/sda status=progress
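The backup, split and restore pipeline can be rehearsed on a scratch file before touching a real drive. This is a sketch under stated assumptions: gzip is swapped in for lz4 only because it is more widely installed, a 300000-byte file of random data stands in for /dev/sda, and the part size is shrunk to 64K so that several parts are produced. All paths are hypothetical.

```shell
# Sketch: round-trip a stand-in "disk" through compress + split, then restore.
work=$(mktemp -d)                                  # hypothetical scratch directory
head -c 300000 /dev/urandom > "$work/disk.img"     # stand-in for the source drive

# Back up: read, compress, split into 64 KiB parts with 3-letter suffixes.
dd if="$work/disk.img" bs=64K status=none | gzip -c \
  | split -a3 -b64K - "$work/backup.img.gz"

# Restore: the glob expands in sorted order (aaa, aab, ...), so cat
# reassembles the parts in the order split wrote them.
cat "$work"/backup.img.gz* | gzip -dc \
  | dd of="$work/restored.img" bs=64K status=none

if cmp -s "$work/disk.img" "$work/restored.img"; then result=match; else result=differ; fi
echo "restored image: $result"
rm -rf "$work"
```

Comparing the restored image against the source is a cheap check that the parts were reassembled in the right order.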
Backup and restore MBR
Before making changes to a disk, you may want to back up the partition table and partition scheme of the drive. You can also use a backup to copy the same partition layout to numerous drives.
The MBR is stored in the first 512 bytes of the disk. It consists of 4 parts:
- The first 440 bytes contain the bootstrap code (boot loader).
- The next 6 bytes contain the disk signature.
- The next 64 bytes contain the partition table (4 entries of 16 bytes each, one entry for each primary partition).
- The last 2 bytes contain a boot signature.
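The sizes in the list above give the byte offsets 0, 440, 446 and 510 for the four parts. As a minimal sketch, the following reads individual regions back from a scratch 512-byte image instead of a real disk; the image path comes from mktemp, and the 0x55 0xAA value written as the boot signature is the conventional MBR signature, which the list above does not itself state:

```shell
# Sketch: address the MBR regions by byte offset on a scratch image.
img=$(mktemp)                                      # hypothetical scratch image
dd if=/dev/zero of="$img" bs=512 count=1 status=none

# Place the conventional 0x55 0xAA signature at offset 510 (octal escapes).
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc status=none

# Read the regions back: partition table at 446 (64 bytes), signature at 510 (2 bytes).
table_bytes=$(dd if="$img" bs=1 skip=446 count=64 status=none | wc -c)
boot_sig=$(dd if="$img" bs=1 skip=510 count=2 status=none | od -An -tx1 | tr -d ' \n')
echo "partition table bytes: $table_bytes, boot signature: $boot_sig"
rm -f "$img"
```

With bs=1, skip= and seek= are plain byte offsets, which keeps the arithmetic identical to the layout in the list.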
To save the MBR as mbr_file.img:
# dd if=/dev/sdX of=/path/to/mbr_file.img bs=512 count=1
You can also extract the MBR from a full dd disk image:
# dd if=/path/to/disk.img of=/path/to/mbr_file.img bs=512 count=1
To restore (be careful, this destroys the existing partition table and with it access to all data on the disk):
# dd if=/path/to/mbr_file.img of=/dev/sdX bs=512 count=1
If you only want to restore the boot loader, but not the primary partition table entries, just restore the first 440 bytes of the MBR:
# dd if=/path/to/mbr_file.img of=/dev/sdX bs=440 count=1
To restore only the partition table, one must use:
# dd if=/path/to/mbr_file.img of=/dev/sdX bs=1 skip=446 seek=446 count=64
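A partition-table-only copy has to offset into both files: skip= moves the read position in the backup and seek= moves the write position in the target. The sketch below checks this on two hypothetical 512-byte scratch files, one filled with B (the backup) and one with T (the target). conv=notrunc is needed here only because the target is a regular file, which dd would otherwise truncate.

```shell
# Sketch: copy only bytes 446..509 from a backup into a target.
work=$(mktemp -d)                                        # hypothetical scratch directory
head -c 512 /dev/zero | tr '\0' 'B' > "$work/mbr_file.img"   # stand-in backup
head -c 512 /dev/zero | tr '\0' 'T' > "$work/target.img"     # stand-in disk

dd if="$work/mbr_file.img" of="$work/target.img" \
   bs=1 skip=446 seek=446 count=64 conv=notrunc status=none

# Count the B bytes in the whole target and in the partition table region only.
b_total=$(tr -cd 'B' < "$work/target.img" | wc -c)
b_region=$(dd if="$work/target.img" bs=1 skip=446 count=64 status=none | tr -cd 'B' | wc -c)
size=$(wc -c < "$work/target.img")
echo "B bytes: $b_total total, $b_region in region; size: $size"
rm -rf "$work"
```

If seek= were omitted, the 64 bytes would land at offset 0 of the target and overwrite the bootstrap code area instead.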
Remove bootloader
To erase the MBR bootstrap code (may be useful if you have to do a full reinstall of another operating system), only the first 440 bytes need to be zeroed:
# dd if=/dev/zero of=/dev/sdX bs=440 count=1
As some readers might have already realised, the dd(1) core utility has a different command-line syntax compared to other utilities. Moreover, while it supports some unique features not found in other commodity utilities, several of its default behaviours are either less than ideal or potentially error-prone in specific scenarios. For that reason, users may want to use alternatives that are better in some respects in lieu of the dd core utility.
That said, it is still worth noting that since dd is a core utility, which is installed by default on Arch and many other systems, it is preferable to some alternatives or more specialised utilities if it is inconvenient to install a new package on your system.
To cover the two aspects addressed above, this section summarises the features of the dd(1) core utility that are rarely found in other commodity utilities, in a form that resembles the pacman/Rosetta article but with fewer examples, focusing on the features of dd (as denoted by "i.e." or a To-clause in the "Tip:" box under each subsection), either in practice or pseudocode.
For more alternatives, see coreutils#dd alternatives.
Patching a binary file, block-by-block in-place
It is not an uncommon practice to use dd as a feature-limited binary file patcher in an automated shell script as it supports:
- seeking the output file by a given offset before writing.
- writing to an output file without truncating it (by adding the conv=notrunc option).
Here is an example to modify the timestamp part of the first member in a cpio(5) § Portable ASCII Format archive, which starts at the 49th byte of the file (or with an offset of 0x30 if you prefer hex notation):
$ touch a-randomly-chosen-file
$ bsdtar -cf example-modify-ts.cpio --format odc -- a-randomly-chosen-file
$ printf '%011o' "$(date -d "2019-12-21 00:00:00" +%s)" | dd conv=notrunc of=example-modify-ts.cpio seek=48 oflag=seek_bytes
The seek_bytes output flag is added above so that dd seeks the output by an offset in bytes instead of blocks before starting to write(2) to it. An alternative, such as a shell with built-in seeking capability, may be preferable when:
- the input file of dd is a pipe connected to a program that uses the splice(2) system call, and the user wants to avoid unnecessary userspace I/O in dd for better performance,
- or, frequent fork(2) calls in a shell script loop should be avoided to reduce the performance penalty.
$ zsh
$ local +xr openToWriteFD
$ zmodload zsh/system
$ sysopen -wu openToWriteFD example-modify-ts.cpio
$ sysseek -u $openToWriteFD 48
$ printf '%011o' "$(date -d "2019-12-21 00:00:00" +%s)" >&${openToWriteFD}
...
$ : finally close the fd {openToWriteFD}>&-
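As a smaller illustration of the same in-place patching technique, the sketch below overwrites two bytes in the middle of a ten-byte scratch file. It assumes GNU dd (for oflag=seek_bytes); the file name comes from mktemp and the contents are arbitrary.

```shell
# Sketch: patch 2 bytes at byte offset 3 without truncating the file.
tmp=$(mktemp)                          # hypothetical scratch file
printf '0123456789' > "$tmp"

# conv=notrunc keeps the bytes after the patch; seek_bytes makes seek= a byte offset.
printf 'XY' | dd of="$tmp" conv=notrunc seek=3 oflag=seek_bytes status=none

patched=$(cat "$tmp")
echo "$patched"                        # → 012XY56789
rm -f "$tmp"
```

Without conv=notrunc, dd would truncate the file at the seek offset before writing, so everything after the patched bytes would be lost.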
Printing volume label of a VFAT filesystem image
To read the filesystem volume label of a VFAT image file, which should have a total length of 11 bytes, padded with ASCII spaces, at an offset of 0x047:
$ truncate -s 36M empty-hole.img
$ mkfs.fat -F 32 -n LabelMe empty-hole.img
$ dd iflag=skip_bytes,count_bytes count=11 skip=$((0x047)) if=empty-hole.img | sed -e 's% *$%%'
- The former, skip_bytes, instructs dd to seek (or skip if the input is not seekable) on the input file by an offset in bytes instead of a number of blocks before starting to read(2) from it.
- The latter, count_bytes, lets the user specify the total amount of data to copy from the input file in bytes, instead of as a number of blocks. This can be confusing, since a copy made with this option is still subject to partial read(2); think of it as a fractional value of the input blockcount to better understand this behaviour.
$ socat -u -,seek=$((0x047)),readbytes=11 - < empty-hole.img | sed -e 's% *$%%'
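The two input flags can also be tried on a plain file when mkfs.fat is not at hand. In this sketch, assuming GNU dd, a hypothetical scratch file holds a 6-byte header, an 11-byte space-padded field and a trailer; the layout is invented for illustration and is not the VFAT one.

```shell
# Sketch: read an 11-byte, space-padded field at byte offset 6.
tmp=$(mktemp)                                  # hypothetical scratch file
printf 'HEADERLABELME    TAIL' > "$tmp"        # 6-byte header, 11-byte field, trailer

# skip_bytes: skip= is a byte offset; count_bytes: count= is a byte length.
label=$(dd if="$tmp" iflag=skip_bytes,count_bytes skip=6 count=11 status=none \
        | sed -e 's% *$%%')                    # strip the trailing padding spaces
echo "$label"                                  # → LABELME
rm -f "$tmp"
```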
Sponge between piped commands
$TMPDIR (or /tmp if it is unset) at first.
In the following example, to avoid an unnecessarily long-lasting TCP connection on the input end if the output end blocks longer than expected, one may put dd between the two commands with an output block size that is certainly larger than the input while still reasonably smaller than available memory:
$ curl -qgsfL http://example.org/mirrors/ftp.archlinux.org/mirrored.md5deep | dd ibs=128k obs=200M | poor-mirroring-script-that-perform-mirroring-on-input-paths-line-by-line-wo-buffer-entire-list-first
Transferring data with size limitation
It is a general practice to use dd in a data streaming shell script to limit the total length of data that a piped command may consume. For example, to inspect a ustar header block (tar(5) § POSIX ustar Archives) using a shell script function in a streaming manner:
The B suffix in the argument to the count option is a feature introduced in GNU coreutils v9.1 that has the same effect as the count_bytes input flag. It is easily confused with forms like count=256k, which tell dd to copy 262144 input blocks instead of bytes.
hexdump-field() {
  set -o pipefail
  printf '%s[%d]:\n' $1 $2
  dd count=${2}B status=none | hexdump -e $2'/1 "%3.2x"' -e '" | " '$2'/1 "%_p" "\n"'
}
inspect-tar-header-block() {
  local -a hdrstack=(
    name 100
    mode 8
    uid 8
    gid 8
    size 12
    mtime 12
    checksum 8
    typeflag 1
    linkname 100
    magic 6
    version 2
    uname 32
    gname 32
    devmajor 8
    devminor 8
    prefix 155
    pad 12
  )
  set - ${hdrstack[@]}
  while test $# -gt 0; do
    hexdump-field $1 $2 || return
    shift 2
  done
}
$ bsdtar -cf - /dev/tty /dev/null 2>&- | dd count=1 skip=1 status=none | inspect-tar-header-block
- a shell with built-in seeking capability (as already mentioned as an alternative in a previous subsection),
- or, a Bourne-like shell (e.g. bash), with the help of xxd(1) § s for a one-off lseek(2) on a shell-opened file descriptor,
ls -l /proc/self/fd in bash at first):
$ bash
$ exec 9<dummy-but-rather-large.img
$ xxd -g 0 -l 0 -s $((0x47ffff)) <&9
$ pv -qSs 104857601200 <&9 | program-that-process-load-of-data-but-does-not-limit-read-length-as-desired-nor-support-offset-read
$ exec 9<&-
count=0 and skip options as an example in coreutils test suite.
Writing a bootable disk image to block device, optionally display progress information
See USB flash installation medium#Using basic command line utilities for examples using commodity utilities, including dd, which is potentially the least suited to that case.
Troubleshooting
Partial read: copied data is smaller than requested
Files created with dd can end up with a smaller size than requested if a full input block is not available at the moment of reading, as per documentation:
- In addition, if no data-transforming conv operand [i.e. option as in this wiki article] is specified, input is copied to the output as soon as it is read, even if it is smaller than the block size.
On Linux, the underlying read(2) system call may return early (i.e. a partial read) when reading from a pipe(7), or when reading a device file like /dev/urandom and /dev/random (e.g. due to a hardcoded limitation of the underlying kernel device driver). This makes the total size of the copied data smaller than expected when bs is used in conjunction with the count=n option, where n limits the maximum number of (potentially partial) input blocks to copy to the output.
It is possible, but not guaranteed, that dd will warn you about such kind of issue:
dd: warning: partial read (X bytes); suggest iflag=fullblock
The solution is to do as the warning says and add the iflag=fullblock option, in addition to the input file option, to the dd command. For example, to create a new file filled with random data with a total length of 40 mebibytes:
$ dd if=/dev/urandom of=new-file-filled-by-urandom.bin bs=40M count=1 iflag=fullblock
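The effect of iflag=fullblock can be reproduced with a deliberately slow pipe writer. The following is a sketch assuming GNU dd; slow_writer is a hypothetical helper that emits 4 bytes, pauses for a second, then emits 4 more. The byte count without fullblock depends on timing, so it is typically 4 but is not guaranteed.

```shell
# Sketch: compare a one-block copy with and without iflag=fullblock.
slow_writer() { printf 'aaaa'; sleep 1; printf 'bbbb'; }   # hypothetical slow producer

# Without fullblock, the first (short) read counts as the one requested block.
partial=$(slow_writer 2>/dev/null | dd bs=8 count=1 status=none | wc -c)

# With fullblock, dd keeps reading until the 8-byte block is full (or EOF).
full=$(slow_writer | dd bs=8 count=1 iflag=fullblock status=none | wc -c)

echo "without fullblock: $partial bytes; with fullblock: $full bytes"
```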
When the count=n option is specified, it is suggested, and strongly recommended when wiping a portion of a device or file, to add the iflag=fullblock option to the dd command.
When reading from a pipe, an alternative to iflag=fullblock is to limit bs to the PIPE_BUF constant value as defined in linux/limits.h to make the pipe(7) I/O atomic. For example, to prepare a text file filled with a random alphanumeric string with a total length of 5 mebibytes:
$ LC_ALL=C tr -dc '[:alnum:]' </dev/urandom | dd of=passtext-5m.txt bs=4k count=1280
Since the output file is not a pipe, one may prefer to use the ibs and obs options to set the block size separately for the (input) pipe and the (output) on-disk file. For example, to set a more efficient block size for the output file:
$ LC_ALL=C tr -dc '[:alnum:]' </dev/urandom | dd of=passtext-5m.txt ibs=4k obs=64k count=1280
PIPE_BUF constant may already be optimal.
Total transferred bytes count readout is wrong
The total transferred bytes count readout may be greater than the actual amount if an error is encountered while writing to the output (i.e. a partial write, caused by e.g. SIGPIPE, a faulty medium, or an accidentally disconnected target network block device). In the following proof of concept, the second dd obviously will not read more than 512200 bytes, but the first dd instance still reports an inaccurate count of 512400 bytes:
$ yes 'x' | dd bs=4096 count=512400B | dd ibs=1 count=512200 status=none >/dev/null
125+1 records in
125+1 records out
512400 bytes (512 kB, 500 KiB) copied, 10.7137 s, 47.8 kB/s
When resuming an interrupted transfer like the above PoC, it is recommended to only rely on the readout of number of whole output blocks already copied, as denoted by the number before "+" sign.
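Resuming from the whole-block count can be sketched on scratch files as follows. The interrupted state is simulated with head, and done_blocks=3 stands for the number before the "+" in an assumed readout of "3+1 records out"; all file names are hypothetical.

```shell
# Sketch: resume a copy from the last whole output block.
work=$(mktemp -d)                                   # hypothetical scratch directory
head -c 40960 /dev/urandom > "$work/src.bin"        # 10 blocks of 4096 bytes

# Simulate an interrupted transfer: 3 whole blocks plus a partial fourth.
head -c 13000 "$work/src.bin" > "$work/dst.bin"
done_blocks=3                                       # assumed whole-block readout

# skip= and seek= both count blocks of bs bytes; notrunc keeps the copied prefix.
dd if="$work/src.bin" of="$work/dst.bin" bs=4096 \
   skip="$done_blocks" seek="$done_blocks" conv=notrunc status=none

if cmp -s "$work/src.bin" "$work/dst.bin"; then resumed=match; else resumed=differ; fi
echo "resumed copy: $resumed"
rm -rf "$work"
```

The partially written fourth block is simply rewritten, which is why only the whole-block count needs to be trusted.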
iflag=fullblock option, which means partial I/O occurred more than once. In that case, to resume your transfer reliably, it is suggested to:
- use ddrescue to rerun the transfer instead, to deal with partial reads on a potentially faulty medium more flexibly.
- use dd_rescue(1) to rerun the transfer with direct I/O, in case of writing to nbd with a poor network connectivity.
- avoid writing to faulty medium.