dd

From ArchWiki

dd is a core utility whose primary purpose is to copy a file and optionally convert it during the copy process.

Similarly to cp, by default dd makes a bit-to-bit copy of the file, but with lower-level I/O flow control features.

For more information, see dd(1) or the full documentation.

Tip: By default, dd outputs nothing until the task has finished. To monitor the progress of the operation, add the status=progress option to the command.
Warning: One should be extremely cautious using dd, as with any command of this kind it can destroy data irreversibly.
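A harmless first invocation to try both points at once: nothing is written to disk, and transfer statistics appear on stderr.

```shell
# Copy 100 MiB of zeroes to nowhere while printing periodic
# transfer statistics on stderr.
dd if=/dev/zero of=/dev/null bs=1M count=100 status=progress
```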

Installation

dd is part of the GNU coreutils. For other utilities in the package, please refer to Core utilities.

Disk cloning and restore

The dd command is a simple, yet versatile and powerful tool. It can be used to copy from source to destination, block-by-block, regardless of their filesystem types or operating systems. A convenient method is to use dd from a live environment, as in a Live CD.

Warning: As with any command of this type, you should be very cautious when using it; it can destroy data. Remember the order of input file (if=) and output file (of=) and do not reverse them! Always ensure that the destination drive or partition (of=) is of equal or greater size than the source (if=).

Cloning a partition

From physical disk /dev/sda, partition 1, to physical disk /dev/sdb, partition 1:

# dd if=/dev/sda1 of=/dev/sdb1 bs=64K conv=noerror,sync status=progress
Note: Be careful: if the output partition of= (sdb1 in the example) does not exist, dd will create a file with that name and start filling up your root file system.

Cloning an entire hard disk

From physical disk /dev/sda to physical disk /dev/sdb:

# dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync status=progress

This will clone the entire drive, including the partition table, bootloader, all partitions, UUIDs, and data.

  • bs= sets the block size. Defaults to 512 bytes, which is the "classic" block size for hard drives since the early 1980s, but is not the most convenient. Use a bigger value, like 64K or 128K. Also, please read the note below, because there is more to this than just "block sizes": it also influences how read errors propagate. See [1] and [2] for details and to figure out the best bs value for your use case.
  • noerror instructs dd to continue operation, ignoring all read errors. Default behavior for dd is to halt at any error.
  • sync fills input blocks with zeroes at the end of the block if there were any read errors somewhere in the block, so data offsets stay in sync (see below for detailed explanation of read error behavior with sync if you are suspecting possible read errors).
  • status=progress shows periodic transfer statistics which can be used to estimate when the operation may be complete.
Note: The block size you specify influences how read errors are handled. Read below. For data recovery, use ddrescue.

The dd utility technically has an "input block size" (IBS) and an "output block size" (OBS). When you set bs, you effectively set both IBS and OBS. Normally, if your block size is, say, 1 MiB, dd will read 1024×1024 bytes and write as many bytes. But if a read error occurs, things will go wrong. Many people seem to think that dd will "fill up read errors with zeroes" if you use the noerror,sync options, but this is not what happens. dd will, according to documentation, fill up the OBS to IBS size after completing its read, which means adding zeroes at the end of the block. This means, for a disk, that effectively the whole 1 MiB would become messed up because of a single 512 byte read error in the beginning of the read: 12ERROR89 would become 128900000 instead of 120000089.
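This padding behaviour can be reproduced without a faulty disk: a short (partial) input block piped into dd with conv=sync is zero-filled up to the full input block size.

```shell
# Three input bytes arrive as one partial 8-byte block;
# conv=sync pads that block with NUL bytes up to ibs, so
# 8 bytes come out the other end.
printf 'abc' | dd ibs=8 conv=sync status=none | wc -c   # prints 8
```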

If you are positive that your disk does not contain any errors, you could proceed using a larger block size, which will increase the speed of your copying several fold. For example, changing bs from 512 to 64K changed copying speed from 35 MB/s to 120 MB/s on a simple Celeron 2.7 GHz system. But keep in mind that read errors on the source disk will end up as block errors on the destination disk, i.e. a single 512-byte read error will mess up the whole 64 KiB output block.

Note:
  • To regain unique UUIDs of an ext2/3/4 filesystem, use tune2fs /dev/sdXY -U random on every partition. For swap partitions, use mkswap -U random /dev/sdXY instead.
  • If you are cloning a GPT disk, you can use sgdisk to randomize the disk and partition GUIDs and regain their uniqueness.
  • Partition table changes from dd are not registered by the kernel. To notify of changes without rebooting, use a utility like partprobe (part of GNU Parted).

Backing up the partition table

See fdisk#Backup and restore partition table or gdisk#Backup and restore partition table.

Create disk image

Boot from a live medium and make sure no partitions are mounted from the source hard drive.

Then mount the external hard drive and backup the drive:

# dd if=/dev/sda conv=sync,noerror bs=64M status=progress | lz4 -z  > /path/to/backup.img.lz4

If necessary (e.g. when the resulting files will be stored on a FAT32 file system), split the disk image into multiple parts (see also split(1)):

# dd if=/dev/sda conv=sync,noerror bs=64M status=progress | lz4 -z | split -a3 -b2G - /path/to/backup.img.lz4

If there is not enough disk space locally, you may send the image through ssh:

# dd if=/dev/sda conv=sync,noerror bs=64M status=progress | lz4 -z | ssh user@local dd of=backup.img.lz4

Finally, save extra information about the drive geometry necessary to interpret the partition table stored within the image, the most important of which is the sector size.

# fdisk -l /dev/sda > /path/to/list_fdisk.info
Tip:
  • You may wish to use a block size (bs=) that is equal to the amount of cache on the HDD you are backing up. For example, bs=512M works for a 512 MiB cache. The 64 MiB mentioned in this article is better than the default bs=512 bytes, but it may run faster with a larger bs=.
  • The above examples use lz4(1) which uses multiple cores for compression, but you can exchange it for any other compression program.

Restore system

To restore your system:

# lz4 -d /path/to/backup.img.lz4 | dd of=/dev/sda status=progress

When the image has been split, use the following instead:

# cat /path/to/backup.img.lz4* | lz4 -d | dd of=/dev/sda status=progress

Backup and restore MBR

Before making changes to a disk, you may want to backup the partition table and partition scheme of the drive. You can also use a backup to copy the same partition layout to numerous drives.

The MBR is stored in the first 512 bytes of the disk. It consists of 4 parts:

  1. The first 440 bytes contain the bootstrap code (boot loader).
  2. The next 6 bytes contain the disk signature.
  3. The next 64 bytes contain the partition table (4 entries of 16 bytes each, one entry for each primary partition).
  4. The last 2 bytes contain a boot signature.
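These offsets can be exercised safely on a scratch file instead of a real disk (the file name fake-mbr.img is chosen for illustration):

```shell
# Build a dummy 512-byte "MBR" and poke at the regions listed above.
dd if=/dev/zero of=fake-mbr.img bs=512 count=1 status=none
# Write the 2-byte boot signature 0x55 0xAA at offset 510.
printf '\125\252' | dd of=fake-mbr.img bs=1 seek=510 conv=notrunc status=none
# The partition table occupies bytes 446-509 (440 + 6 = 446).
dd if=fake-mbr.img bs=1 skip=446 count=64 status=none | wc -c      # prints 64
dd if=fake-mbr.img bs=1 skip=510 count=2 status=none | od -An -tx1 # prints 55 aa
```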

To save the MBR as mbr_file.img:

# dd if=/dev/sdX of=/path/to/mbr_file.img bs=512 count=1

You can also extract the MBR from a full dd disk image:

# dd if=/path/to/disk.img of=/path/to/mbr_file.img bs=512 count=1

To restore (be careful, this destroys the existing partition table and with it access to all data on the disk):

# dd if=/path/to/mbr_file.img of=/dev/sdX bs=512 count=1
Warning: Restoring the MBR with a mismatching partition table will make your data unreadable and nearly impossible to recover. If you simply need to reinstall the bootloader, see their respective pages as they also employ the DOS compatibility region: GRUB or Syslinux.

If you only want to restore the boot loader, but not the primary partition table entries, just restore the first 440 bytes of the MBR:

# dd if=/path/to/mbr_file.img of=/dev/sdX bs=440 count=1

To restore only the partition table, one must use:

# dd if=/path/to/mbr_file.img of=/dev/sdX bs=1 skip=446 seek=446 count=64

Remove bootloader

To erase the MBR bootstrap code (may be useful if you have to do a full reinstall of another operating system), only the first 440 bytes need to be zeroed:

# dd if=/dev/zero of=/dev/sdX bs=440 count=1

Additional scenarios beyond disk operations


As some readers might have already realised, the dd(1) core utility has a command-line syntax unlike that of most other utilities. Moreover, while it supports some unique features not found in other common utilities, several of its default behaviours are suboptimal or potentially error-prone in specific scenarios. For these reasons, users may prefer alternatives that handle particular tasks better than dd.

That said, it is still worth noting that since dd is a core utility, installed by default on Arch and many other systems, it is preferable to more specialised alternatives when installing a new package is inconvenient.

To cover the two aspects addressed above, this section summarises the features of the dd(1) core utility that are rarely found in other common utilities, in a form resembling the pacman/Rosetta article but with fewer examples, examining each feature either in practice or in pseudocode.

Note: To keep this section concise, the comparison of alternative utilities only covers those found in the official repositories, focusing on cases where dd has a clear advantage over them; where necessary, the advantage is explained in detail.
For more alternatives, see coreutils#dd alternatives.

Patching a binary file, block-by-block in-place

It is not uncommon to use dd as a feature-limited binary file patcher in automated shell scripts, as it supports:

  • seeking the output file by a given offset before writing.
  • writing to an output file without truncating it (by adding the conv=notrunc option).

Here is an example that modifies the timestamp field of the first member of a cpio(5) § Portable ASCII Format archive, which starts at the 49th byte of the file (offset 0x30 if you prefer hex notation):

$ touch a-randomly-chosen-file
$ bsdtar -cf example-modify-ts.cpio --format odc -- a-randomly-chosen-file
$ printf '%011o' "$(date -d "2019-12-21 00:00:00" +%s)" | dd conv=notrunc of=example-modify-ts.cpio seek=48 oflag=seek_bytes
Note: The currently undocumented seek_bytes output flag used above makes dd seek the output by an offset in bytes instead of blocks before starting to write(2).
Tip: To print a byte stream from hex notation given on the command line, use basenc(1) § base16 and/or printf(1).
Tip: For this feature combination (write(2) at an offset, without truncation), instead of dd one may consider a shell that can call lseek(2) on a shell-opened file descriptor, if:
  • the input of dd is a pipe connected to a program that uses the splice(2) system call, and you want to avoid dd's unnecessary userspace I/O for better performance;
  • or you want to avoid frequent fork(2) calls in a shell script loop, to reduce the performance penalty.
In that case, let the shell open the file descriptor first, seek on it, and then hand the descriptor to the utility that uses splice(2) (or to a shell builtin that does not fork, as in the following zshmodules(1) § sysseek example):
$ zsh
$ typeset openToWriteFD
$ zmodload zsh/system
$ sysopen -wu openToWriteFD example-modify-ts.cpio
$ sysseek -u $openToWriteFD 48
$ printf '%011o' "$(date -d "2019-12-21 00:00:00" +%s)" >&${openToWriteFD}
...
$ exec {openToWriteFD}>&-
Warning: Avoid this approach unless you are certain that the program doing the offset write really uses splice(2), which implies it performs no seeking or truncation of its own on the output. Some programs seek or truncate their input/output file descriptors even when no command-line flag asks for it; this will invalidate your shell's lseek(2) call or unexpectedly truncate the opened file descriptor.

Printing volume label of a VFAT filesystem image

Tip: For this specific scenario, a more practical choice is file.

To read the filesystem volume label of a VFAT image file, an 11-byte field padded with ASCII spaces at offset 0x047:

$ truncate -s 36M empty-hole.img
$ mkfs.fat -F 32 -n LabelMe empty-hole.img
$ dd iflag=skip_bytes,count_bytes count=11 skip=$((0x047)) if=empty-hole.img | sed -e 's% *$%%'
Note: Both input flags are currently undocumented:
  • The former, skip_bytes, instructs dd to seek (or skip, if the input is not seekable) on the input file by an offset in bytes instead of blocks before starting to read(2) from it.
  • The latter, count_bytes, lets the user specify the total quantity of data to copy from the input file in bytes instead of blocks. This can be confusing, since the copy is still subject to partial read(2)s; think of it as a fractional input block count.
Tip: To transfer data from an input (at an offset) to an output within a given length in shell scripting, one may also consider curl(1) § r, a common alternative that uses range notation instead.
Note: curl does not support seeking/skipping when the input file is a device or pipe. Another alternative, socat(1), does support these operations on the input file (including block devices, excluding pipes and character devices), but is less commonly available than curl:
$ socat -u -,seek=$((0x047)),readbytes=11 - < empty-hole.img | sed -e 's% *$%%'
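The byte-granular skip_bytes and count_bytes flags work on any file, not just filesystem images. A minimal self-contained demonstration (the file name demo.bin is chosen for illustration):

```shell
# Skip the 6-byte header, then copy exactly 7 bytes,
# regardless of the (large) block size in use.
printf 'headerPAYLOADtrailer' > demo.bin
dd if=demo.bin iflag=skip_bytes,count_bytes bs=4k skip=6 count=7 status=none   # → PAYLOAD
```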

Sponge between piped commands

Tip: As the title already suggests, a practical choice for this specific scenario is sponge(1), which supports atomic writes by first writing to $TMPDIR (or /tmp if it is unset).

In the following example, to avoid an unnecessarily long-lasting TCP connection on the input end when the output end blocks longer than expected, one may put dd between the two commands, with an output block size decidedly larger than the input's yet still reasonably smaller than the available memory:

$ curl -qgsfL http://example.org/mirrors/ftp.archlinux.org/mirrored.md5deep | dd ibs=128k obs=200M | poor-mirroring-script-that-perform-mirroring-on-input-paths-line-by-line-wo-buffer-entire-list-first
Warning: This should never be considered a generic alternative to sponge(1), as dd truncates the output file before the copy operation starts.
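The re-blocking itself is easy to observe: dd passes the stream through unmodified while turning many small reads into fewer large writes.

```shell
# 100000 bytes in, 100000 bytes out: dd only changes the
# block granularity, never the data.
yes | head -c 100000 | dd ibs=512 obs=64k status=none | wc -c   # prints 100000
```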

Transferring data with a size limit

It is common practice to use dd in a data-streaming shell script to limit the total amount of data a piped command may consume. For example, to inspect an ustar header block (tar(5) § POSIX ustar Archives) with shell functions, in a streaming manner:

Note: The B suffix in the argument to the count option is a feature newly introduced in GNU coreutils v9.1, with the same effect as the count_bytes input flag. It is easily confused with forms like count=256k, which tell dd to copy 262144 input blocks, not bytes.
hexdump-field() {
  set -o pipefail
  printf '%s[%d]:\n' $1 $2
  dd count=${2}B status=none | hexdump -e $2'/1 "%3.2x"' -e '" | " '$2'/1 "%_p" "\n"'
}

inspect-tar-header-block() {
  local -a hdrstack=(
    name 100
    mode 8
    uid 8
    gid 8
    size 12
    mtime 12
    checksum 8
    typeflag 1
    linkname 100
    magic 6
    version 2
    uname 32
    gname 32
    devmajor 8
    devminor 8
    prefix 155
    pad 12
  )
  set -- "${hdrstack[@]}"
  while test $# -gt 0; do
    hexdump-field $1 $2 || return
    shift 2
  done
}
$ bsdtar -cf - /dev/tty /dev/null 2>&- | dd count=1 skip=1 status=none | inspect-tar-header-block
Tip: To stream data from input to output within a given length, an alternative is pv(1) § S, which supports the splice(2) system call.
Note: Another candidate is head(1) § c, though implementations other than GNU coreutils and glibc may consume more data than requested, causing data misalignment in a streaming shell script.
Tip: In addition to the feature combination above, if the input file must first be lseek(2)'d to a specific offset before streaming, and the output end of dd is a pipe connected to a program that uses splice(2), one may instead combine a shell-seeked file descriptor with pv(1) § S (mentioned above), as in the following bash example (assuming the file descriptor has not already been allocated by the shell; check with ls -l /proc/self/fd first):
$ bash
$ exec 9<dummy-but-rather-large.img
$ xxd -g 0 -l 0 -s $((0x47ffff)) <&9
$ pv -qSs 104857601200 <&9 | program-that-process-load-of-data-but-does-not-limit-read-length-as-desired-nor-support-offset-read
$ exec 9<&-
Note: Though incompatible with POSIX and some non-GNU implementations, it is feasible to replace the use of xxd(1) § s in the above example with dd, combining count=0 with the skip option, as done in an example in the coreutils test suite.
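That dd-based replacement can be sketched on a regular file (the file name seekme.txt is chosen for illustration): with count=0, GNU dd seeks the shared open file description without copying anything, so the next reader resumes from the new offset.

```shell
printf 'abcdefgh' > seekme.txt
exec 9< seekme.txt
# lseek(2) fd 9 forward by 3 bytes; count=0 copies nothing.
dd bs=1 skip=3 count=0 <&9 status=none
cat <&9   # → defgh
exec 9<&-
```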

Writing a bootable disk image to a block device, optionally displaying progress information

See USB flash installation medium#Using basic command line utilities for examples of common utilities for this use case, including dd, which is potentially the least suited of them.

Tip: To write the contents of a file to a block device (with a progress indicator), a suggested alternative is dd_rescue(1) § W, which can avoid unnecessary writes when overwriting an old version of an image on the device with a newer one.

Troubleshooting

Partial read: copied data is smaller than requested

Files created with dd can end up smaller than requested if a full input block is not available at the moment of the read, as per the documentation:

In addition, if no data-transforming conv operand [i.e. option as in this wiki article] is specified, input is copied to the output as soon as it is read, even if it is smaller than the block size.

On Linux, the underlying read(2) system call may return early (i.e. a partial read) when reading from a pipe(7), or when reading a device file such as /dev/urandom or /dev/random (e.g. due to hardcoded limitations of the underlying kernel device driver). This makes the total size of the copied data smaller than expected when bs is used in conjunction with the count=n option, where n limits the maximum number of (potentially partial) input blocks to copy to the output.

It is possible, but not guaranteed, that dd will warn you about such kind of issue:

dd: warning: partial read (X bytes); suggest iflag=fullblock

The solution is to do as the warning says: add the iflag=fullblock option to the dd command. For example, to create a new file filled with random data, 40 MiB in total:

$ dd if=/dev/urandom of=new-file-filled-by-urandom.bin bs=40M count=1 iflag=fullblock
Note: When copying a fixed-length portion from a pipe or a special device file as mentioned above (i.e. with count=n specified), it is strongly recommended to add the iflag=fullblock option to the dd command; this is especially important when wiping a portion of a device or file.
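The difference is easy to reproduce with a deliberately slow pipe: without iflag=fullblock the first read(2) may return after only two bytes, while with it dd keeps reading until the block is complete (or EOF is reached).

```shell
# iflag=fullblock accumulates reads until the 4-byte block is
# full, so exactly 4 bytes are copied despite the slow writer.
{ printf 'ab'; sleep 0.3; printf 'cd'; } | dd bs=4 count=1 iflag=fullblock status=none | wc -c   # prints 4
```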

When reading from a pipe, an alternative to iflag=fullblock is to limit bs to the PIPE_BUF constant defined in linux/limits.h, which makes the pipe(7) I/O atomic. For example, to prepare a text file filled with random alphanumeric characters, 5 MiB in total:

$ LC_ALL=C tr -dc '[:alnum:]' </dev/urandom | dd of=passtext-5m.txt bs=4k count=1280

Since the output file is not a pipe, one may prefer the ibs and obs options to set the block sizes separately for the (input) pipe and the (output) on-disk file. For example, to set a more efficient block size for the output file:

$ LC_ALL=C tr -dc '[:alnum:]' </dev/urandom | dd of=passtext-5m.txt ibs=4k obs=64k count=1280
Tip: In some circumstances, simply keeping the output block size equal to the input block size, with a value bounded by the PIPE_BUF constant, may already be optimal.

Total transferred bytes count readout is wrong

The total transferred bytes readout may be greater than the actual amount if an error is encountered while writing to the output (i.e. a partial write, caused by e.g. SIGPIPE, a faulty medium, or an accidentally disconnected network block device), as in the following proof of concept, where the second dd obviously cannot read more than 512200 bytes, yet the first dd instance still reports an inaccurate count of 512400 bytes:

$ yes 'x' | dd bs=4096 count=512400B | dd ibs=1 count=512200 status=none >/dev/null
125+1 records in
125+1 records out
512400 bytes (512 kB, 500 KiB) copied, 10.7137 s, 47.8 kB/s

When resuming an interrupted transfer like the above PoC, it is recommended to rely only on the readout of the number of whole output blocks already copied, i.e. the number before the "+" sign.
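A resume based on that whole-block count can be sketched as follows (file names chosen for illustration): skip= and seek= jump over the blocks already copied, and conv=notrunc preserves the data already written.

```shell
# Create a sample image and simulate a transfer that stopped
# after 25 whole 4 KiB blocks.
dd if=/dev/urandom of=source.img bs=4k count=50 status=none
dd if=source.img of=dest.img bs=4k count=25 status=none
# Resume at the first block not yet copied.
N=25
dd if=source.img of=dest.img bs=4k skip=$N seek=$N conv=notrunc status=none
cmp source.img dest.img && echo identical   # → identical
```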

Note: If the partial-block count is greater than 1 even after adding the iflag=fullblock option, partial I/O occurred more than once. In that case, to resume your transfer reliably, it is suggested to:
  • use ddrescue to rerun the transfer instead, to handle partial reads on a potentially faulty medium more flexibly;
  • use dd_rescue(1) to rerun the transfer with direct I/O, when writing to an NBD device over poor network connectivity;
  • avoid writing to a faulty medium.

See also