.\" See file COPYING in distribution for details.
.TH MDMON 8 "" v3.1
.SH NAME
mdmon \- monitor MD external metadata arrays

.SH SYNOPSIS

.BI mdmon " CONTAINER [NEWROOT]"

.SH OVERVIEW
The 2.6.27 kernel introduces support for external metadata arrays.
External metadata implies that user space handles all updates to the metadata.
The kernel's responsibility is to notify user space when a "metadata event"
occurs, such as a disk failure or a clean-to-dirty transition. In
important cases the kernel waits for user space to act on these notifications.

.SH DESCRIPTION
.SS Metadata updates:
To service metadata update requests a daemon,
.IR mdmon ,
is introduced.
.I Mdmon
polls the sysfs namespace for changes in the
.BR array_state ,
.BR sync_action ,
and per-disk
.B state
attributes. When it detects a change it calls a per-metadata-type
handler to modify the metadata. The following actions
are taken:
.RS
.TP
.B array_state \- inactive
Clear the dirty bit for the volume and let the array be stopped.
.TP
.B array_state \- write pending
Set the dirty bit for the array and then set
.B array_state
to
.BR active .
Writes
are blocked until user space writes
.BR active .
.TP
.B array_state \- active\-idle
The safe mode timer has expired, so set
.B array_state
to
.B clean
to block writes to the array.
.TP
.B array_state \- clean
Clear the dirty bit for the volume.
.TP
.B array_state \- read\-only
This is the initial state of every array.
.I mdmon
takes one of three actions:
.RS
.TP
1/
Transition the array to read\-auto, keeping the dirty bit clear, if the metadata
handler determines that the array does not need resyncing or other modification.
.TP
2/
Transition the array to active if the metadata handler determines that a resync or
some other manipulation is necessary.
.TP
3/
Leave the array read\-only if the volume is marked to not be monitored; for
example, the metadata version has been set to "external:\-dev/md127" instead of
"external:/dev/md127".
.RE
.TP
.B sync_action \- resync\-to\-idle
Notify the metadata handler that a resync may have completed. If a resync
process is idled before it completes, this event allows the metadata handler to
checkpoint the resync.
.TP
.B sync_action \- recover\-to\-idle
A spare may have completed rebuilding, so tell the metadata handler about the
state of each disk. This is the metadata handler's opportunity to clear
any "out-of-sync" bits and clear the volume's degraded status. If a recovery
process is idled before it completes, this event allows the metadata handler to
checkpoint the recovery.
.TP
.B <disk>/state \- faulty
A disk failure kicks off a series of events. First, notify the metadata
handler that a disk has failed, and then notify the kernel that it can unblock
writes that were dependent on this disk. After unblocking the kernel, this disk
is set to be removed+ from the member array. Finally, the disk is marked failed
in all other member arrays in the container.
.IP
+ Note: this behavior differs slightly from native MD arrays, where
removal is reserved for a
.B mdadm \-\-remove
event. In the external metadata case the container holds the final
reference on a block device and a
.B mdadm \-\-remove <container> <victim>
call is still required.
.RE
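.P
As a rough illustration of the polling described above, the following Python
sketch reads
.B array_state
for one array and looks up the reaction listed in this section. It is not how
.I mdmon
itself is implemented. The sysfs path /sys/block/<md>/md/array_state and the
state spellings write\-pending and readonly are assumptions taken from the
kernel's md sysfs interface, and the names ACTIONS, read_array_state and
describe_action are hypothetical.
.P
.nf
```python
from pathlib import Path

# Reactions described in this section, keyed by the array_state value.
# The key spellings are assumed to match what sysfs reports.
ACTIONS = {
    "inactive": "clear the dirty bit and let the array be stopped",
    "write-pending": "set the dirty bit, then write active to array_state",
    "active-idle": "write clean to array_state",
    "clean": "clear the dirty bit",
    "readonly": "choose read-auto, active, or leave read-only",
}


def read_array_state(md_name, sysfs_root="/sys"):
    """Return the current array_state string for one md device."""
    path = Path(sysfs_root) / "block" / md_name / "md" / "array_state"
    return path.read_text().strip()


def describe_action(state):
    """Map an array_state value to the reaction listed in this page."""
    return ACTIONS.get(state, "no metadata action listed")
```
.fi
.P
For example, describe_action(read_array_state("md126")) reports the reaction
for md126, where md126 is a placeholder device name.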

.SS Containers:
.P
External metadata formats, like DDF, differ from the native MD metadata
formats in that they define a set of disks and a series of sub-arrays
within those disks. MD metadata, in comparison, defines a 1:1
relationship between a set of block devices and a RAID array. For
example, to create 2 arrays at different RAID levels on a single
set of disks, MD metadata requires that the disks be partitioned and that
each array be created with a subset of those partitions. The
supported external formats perform this disk carving internally.
.P
Container devices simply hold references to all member disks and allow
tools like
.I mdmon
to determine which active arrays belong to which
container. Some array management commands, like disk removal and disk
add, are now only valid at the container level. Attempts to perform
these actions on member arrays are blocked with error messages like:
.IP
"mdadm: Cannot remove disks from a \'member\' array, perform this
operation on the parent container"
.P
Containers are identified in /proc/mdstat with a metadata version string
"external:<metadata name>". Member devices are identified by
"external:/<container device>/<member index>", or "external:\-<container
device>/<member index>" if the array is to remain readonly.
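.P
The three forms of the version string can be told apart by the character that
follows "external:". The Python sketch below shows one way to do this. The
function name parse_metadata_version and the keys of the returned dictionary
are hypothetical, and the example strings in the usage note are illustrative
rather than taken from a real system.
.P
.nf
```python
def parse_metadata_version(version):
    """Split an external metadata version string into its parts.

    Returns None for strings that do not start with "external:".
    """
    prefix = "external:"
    if not version.startswith(prefix):
        return None
    rest = version[len(prefix):]
    if rest.startswith("/") or rest.startswith("-"):
        # Member array: "/<container>/<index>" or "-<container>/<index>".
        # A leading "-" marks an array that is to remain read-only.
        container, _, index = rest[1:].partition("/")
        return {
            "kind": "member",
            "container": container,
            "index": index,
            "monitored": rest[0] == "/",
        }
    # Container: the remainder names the metadata format.
    return {"kind": "container", "format": rest}
```
.fi
.P
Under these assumptions, "external:/md127/0" is reported as member 0 of
container md127, and the same string with "\-" in place of the first "/" is
reported with monitored set to False.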

.SH OPTIONS
.TP
CONTAINER
The
.B container
device to monitor. It can be a full path like /dev/md/container, a simple md
device name like md127, or /proc/mdstat, which tells
.I mdmon
to scan for containers and launch an
.I mdmon
instance for each one found.
.TP
[NEWROOT]
In order to support an external metadata RAID array as the rootfs,
.I mdmon
needs to be started in the initramfs environment. Once the initramfs
environment mounts the final rootfs,
.I mdmon
needs to be restarted in the new namespace. When NEWROOT is specified,
.I mdmon
will terminate any
.I mdmon
instances that are running in the current namespace,
.IR chroot (2)
to NEWROOT, and continue monitoring the container.
.PP
Note that
.I mdmon
is automatically started by
.I mdadm
when needed and so does not need to be considered when working with
RAID arrays. The only time it is run other than by
.I mdadm
is when the boot scripts need to restart it after mounting the new
root filesystem.
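.P
The /proc/mdstat scan described for CONTAINER can be approximated as follows.
This Python sketch is not
.IR mdmon 's
own code. It assumes that each device stanza in /proc/mdstat begins with
"<name> :" and that the metadata version appears as a whitespace-separated
field starting with "external:". The function name find_containers is
hypothetical.
.P
.nf
```python
def find_containers(mdstat_text):
    """Return md device names whose metadata version marks a container."""
    containers = []
    current = None
    for line in mdstat_text.splitlines():
        fields = line.split()
        # A device stanza is assumed to start with "<name> : ...".
        if len(fields) >= 2 and fields[1] == ":" and fields[0].startswith("md"):
            current = fields[0]
            continue
        for field in fields:
            if field.startswith("external:") and current is not None:
                name = field[len("external:"):]
                # Members start with "/" or "-"; anything else is a container.
                if name and name[0] not in "/-":
                    containers.append(current)
    return containers
```
.fi
.P
A caller would pass the contents of /proc/mdstat and could then start one
monitor per returned name.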

.SH SEE ALSO
.IR mdadm (8),
.IR md (4).