If the metadata handler can not find its platform support components
then there is no way for it to verify that the raid configuration will
be supported by the option-rom. Provide a generic method for metadata
handlers to warn the user that the array they are about to create may
not work as intended with a given platform.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Let handlers specifiy their own defaults, specifically needed for the
imsm-raid5 case where mdadm defaults to 'ls' and imsm to 'la'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Metadata formats like imsm work in concert with platform firmware and
hardware, so provide a way for mdadm to display this info to the user.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If, when creating an array, a signal target device is given which
is a container, then allow the metadata handler to choose which
devices to use.
This is currently only supported for DDF.
Signed-off-by: NeilBrown <neilb@suse.de>
Now that names in /dev are usually created (eventually) by udev,
it isn't really safe to rely in finding a name in /dev to pass to
mdmon to identify which array to monitor.
And it isn't really necessary to have a name in /dev.
So just pass the symbolic name, e.g. md127 or md123.
Change util.c to pass that name, and change mdmon to process the
name sensibly.
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible for the mapfile to become wrong, and that gets
very confusing. So validate entries before returning them.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than appending the md minor number, we now append a small
sequence number to make sure name in /dev/md/ that aren't LOCAL are
unique. As the map file is locked while we do this, we are sure
of no losing any races.
Signed-off-by: NeilBrown <neilb@suse.de>
Try to treat members of containers much like other arrays for
assembly.
We still look through the list of devices for a match (it will be
the container), then find the relevant 'info' and try to assemble
the array.
Signed-off-by: NeilBrown <neilb@suse.de>
Factor out, from Incremental_container, the code for assembling an
array based on information extracted from a container. We will
shortly use this from Assemble too.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This only applies if udev isn't installed or is disabled
by MDADM_NO_UDEV
We try to remove partitions too.
We find names to remove by looking in /var/run/mdadm/map
Signed-off-by: NeilBrown <neilb@suse.de>
When a 'container' gets started, we need udev to notice, but the
kernel has no way of knowing that a KOBJ_CHANGE event is needed. So
send one directly via the 'uevent' sysfs attribute.
Also, uevents don't get generated when md arrays are stopped (prior to
2.6.28) so send 'change' events then too.
Signed-off-by: NeilBrown <neilb@suse.de>
When mdadm.conf is automatically generated, we might not know a
suitable /dev/name. But we do know the uuid of the container.
So allow that as an option.
Signed-off-by: NeilBrown <neilb@suse.de>
Change the "env_check_mdmon" function to be more generic, accepting
and environment variable name, as soon we will have a new use for it.
Signed-off-by: NeilBrown <neilb@suse.de>
We will shortly be feeding more information into the process of
creating array devices, so delay the creation. Still open them
early if the device already exists.
This involves making sure the autof flag is in the right place
so that it can be found at creation time.
Also, Assemble, Build, and Create now always close 'mdfd'.
Signed-off-by: NeilBrown <neilb@suse.de>
This reflect that fact that more often than not it is creating things
in /dev, and allows for a new open_mddev which does just that.
Signed-off-by: NeilBrown <neilb@suse.de>
Given an mdadm.conf like the following allow /dev/imsm and /dev/md/r1 to be
created by "mdadm -As".
DEVICES partitions
ARRAY /dev/imsm metadata=imsm auto=md UUID=b98f5dbe-aa859e7b-0e369b89-a80986d4
ARRAY /dev/md/r1 container=/dev/imsm member=0 auto=mdp UUID=3538e39c-b397c2e9-1aa031f9-2bc0eca4
spares=1
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Previously it was possible to set the WRITEMOSTLY flag when
adding a device to an array, but not to clear the flag when re-adding.
This is now possible with --readwrite.
Signed-off-by: NeilBrown <neilb@suse.de>
The uuid returned for an imsm spare device will never match the uuid of an
active disk. So make mdadm interpret a uuid of all f's as "match any".
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When arrays do not startup correctly it would be nice to know why. Need
to move the dprintf definition to mdadm.h
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Showing e.g.
near=1, far=2
for the 'far2' layout of raid10 is confusing even though there is a
sense in which is it correct.
Make it less confusing by only printing whichever number is not 1.
If both are 1, make that clear too (i.e. no redundancy).
When adding a device to an array, we check that it is large enough.
Currently the check makes sure there is also room for a reasonably
sized bitmap. But if the array doesn't have a bitmap, then this test
might be too restrictive.
So when adding, only insist there is enough space for the current
bitmap.
When Creating, still require room for the standard sized bitmap.
This resolved Debian Bug 500309
For now, this means that the lack of a homehost doesn't always prevent
assembly.
Soon we will allow assembly anyway, but have different messages if
homehost isn't supported.
mdadm -I /dev/part-of-container
should add that to a container, creating if it needed,
and then try to assemble any arrays in the container.
Signed-off-by: NeilBrown <neilb@suse.de>
When we assemble an array, there are three different approaches
depending on whether metadata is internal or external, and on
kernel version.
Move all this to a common helper instead of duplicating in 3 places.
Signed-off-by: NeilBrown <neilb@suse.de>
The variety of approaches to 'add_disk' are factored out into
a separate function, and Incremental mode benefits by being
closer to supporting the assembly of containers.
Also remove the adding-to-array-data-structure out of sysfs_add_disk
and into add_disk.
And add some tests for --incremental mode to make sure we don't break it.
Signed-off-by: NeilBrown <neilb@suse.de>
Make --detail on a container more useful by suppressing irrelevant
detail and adding useful detail like a list of member arrays.
Ditto for members of a container: report the name of the container
array.
Signed-off-by: NeilBrown <neilb@suse.de>
For use in distro shutdown scripts with a RAID root file system.
Returns immediately if the array is 'readonly', or not an externally
managed array. It is up to the distro's scripts to make sure no new
writes hit the device after this returns 'true'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The action we are waiting for may not be complete until the monitor has
had a chance to take action on the result.
The following script can now remove the device on the first attempt,
versus a few attempts with the original Wait():
#!/bin/bash
#export MDADM_NO_MDMON=1
export IMSM_DEVNAME_AS_SERIAL=1
./mdadm -Ss
./mdadm --zero-superblock /dev/loop[0-3]
echo 2 > /proc/sys/dev/raid/speed_limit_max
./mdadm --create /dev/imsm /dev/loop[0-3] -n 4 -e imsm -a md
./mdadm --create /dev/md/r1 /dev/loop[0-3] -n 4 -l 5 --force -a mdp
./mdadm --fail /dev/md/r1 /dev/loop3
./mdadm --wait /dev/md/r1
x=0
while ! ./mdadm --remove /dev/imsm /dev/loop3 > /dev/null 2>&1
do
x=$((x+1))
done
echo "removed after $x attempts"
./mdadm --add /dev/imsm /dev/loop3
Include 2 small cleanups:
* remove the almost open coded fd2devnum() in Wait() by introducing a
new utility routine stat2devnum()
* teach connect_monitor() to parse the container device from a subarray
string
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If the metadata_version is
-mdXXX/whatever
rather than
/mdXXX/whatever
then the array is readonly and should be left alone by mdmon.
Signed-off-by: NeilBrown <neilb@suse.de>
We are about to change the syntax of the version string
for 'subarray's. So factor out the test into a single function.
Signed-off-by: NeilBrown <neilb@suse.de>
When we first start an array, it might be good to start recovery
straight away. That requires setting the array to 'dirty', but
only the metadata handler can know if that is required or not.
So have a third possible 'consistent' option to set_array_state.
Either 'no' or 'yes' or 'you choose'.
Return value indicates what was chosen.
'1' (no) should be chosen unless there is a good reason.
Signed-off-by: NeilBrown <neilb@suse.de>
Transition readauto arrays to active before failing drives.
Hmm... why do we keep reblocking / renotifying in the readonly case?
Need to bottom out on this, but not right now.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The cmd_filter patch merged for 2.6.27 broke retrieving the serial
number via an ioctl to /dev/sgN. In debugging this I found that other
utilities like sdparm simply run the ioctl on /dev/sdX. So just convert
to that for protection in numbers, but scream on the mailing list for
the inconvenience grr...
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Using buffered IO risks non-atomic updates to parts of the
device that we don't actually want to write to. This isn't in
general safe.
So switch to O_DIRECT for all that IO and make sure we have
properly aligned buffers.
The returned value was never used, and we don't really want
this return path anyway as writing to a pipe could conceivably
block, and the monitor must not block.
This really should be done in mdadm, not mdmon.
We ensure the device won't be suddenly commited as a hot-spare
using O_EXCL, then check the 'holders' sysfs directory
to make sure it is only in use once.
When loading the metadata for a subarray (super_by_fd), we set
->subarray to be the name read from md/metadata_version so that
getinfo_super can return info about the correct array.
With this we can differentiate between a container and
an array within the container by looking at ->subarray[0].
Only one superswitch should be externally visible for each
general type. Others which handle different flavours
(e.g. container/data-array) should be internal only.
Code in manager can now just call queue_metadata_update with a
(freeable) buf holding the update, and it will get passed to the
monitor and written out.
'container_member' isn't really a well defined concept.
Each metadata might enumerate members differently, so just
let each format /mdX/YYYY as appropriate.
I want the metadata handler to have more control over the 'version',
particularly for arrays which are members of containers.
So discard st->text_version and instead use info->text_version
which getinfo_super can initialise.
mark_dirty is just a special case of mark_clean - with sync_pos == 0.
mark_sync is not required. We don't modify the metadata when sync
finishes. Only when the array becomes non-writeable at which point we
use mark_clean to record how far the resync progressed.
From: Dan Williams <dan.j.williams@intel.com>
Each md_message encapsulates a single command. A command includes an 'action'
member which describes what if any data comes after the action. Communication
with the monitor involves updating the active_cmd pointer and then writing to
mgr_pipe. Pass/fail status is returned via mon_pipe.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
Added curr_state as a parameter to set_disk. Handlers look at this to
record components failures, and set global 'degraded' or 'failed'
status.
When reading the state as faulty:
1/ mark the disk failed in the metadata
2/ write '-blocked' to the rdev state to allow the kernel's failure
mechanism to advance
3/ the kernel will take away the drive's role in remove_and_add_spares()
4/ once the disk no longer has a role writing 'remove' to the rdev state
will get the disk out of array.
There is a window after writing '-blocked' where the kernel will return
-EBUSY to remove requests. We rely on the fact that the disk will
continue to show faulty so we lazily wait until the kernel is ready to
remove the disk. If the manager thread needs to get the disk out of the
way it can ping the monitor and wait, just like the replace_array()
case.
[buglet fix: swap the parameters of attr_match in read_dev_state]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
1/ Block attempts to add/remove devices from container members
2/ Forward add/remove requests to containers
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
Metadata handlers set mdinfo.resync_start depending on the state of the
array. By default mdadm assumes the array is dirty and needs a full
resync.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
The following now work:
--examine
--examine --brief
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Create a ddf array by naming the device /dev/ddf* or
specifying metadata 'ddf'.
If ddf is specified with no level, assume a container (indeed,
anything else would be wrong).
**Need to use text_Version to set external metadata...
More ddf support
Load a ddf container. Now
--examine /dev/ddf
works.
super-ddf: fix compile warning
From: Dan Williams <dan.j.williams@intel.com>
super-ddf.c:723: format %lu expects type long unsigned int, but argument 3 has type unsigned int
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The current model for creating arrays involves writing
a superblock to each device in the array.
With containers (as with DDF), that model doesn't work.
Every device in the container may need to be updated
for an array made from just some the devices in a container.
So instead of calling write_init_super for each device,
we call it once for the array and have it iterate over
all the devices in the array.
To help with this, ->add_to_super now passes in an 'fd' and name for
the device. These get saved for use by write_init_super. So
add_to_super takes ownership of the fd, and write_init_super will
close it.
This information is stored in the new 'info' field of supertype.
As part of this, write_init_super now removes any old traces of raid
metadata rather than doing this in common code.
klibc still misses a lot functionality to let mdadm link against,
this small step helps to get to the real trouble.. :)
Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: Neil Brown <neilb@suse.de>
From: Doug Ledford <dledford@redhat.com>
This one fixes a bug where once manage mode is set, the -a short option
is no longer parsed correctly (true of grow mode as well). This happens
because when you switch the short opts to the bitmap_auto version, it
specifies that the argument must follow a, yet the loop expects to get
an undecorated option and parse it as the disk dev instead of trying to
parse optarg. So, create a new short opt array that is used for manage
and grow that doesn't list a as having an argument.
udev likes to get information about a device as key=value pairs so it
can create disk/by-id links etc. So add --export flag which causes
the output of --detail to easily parsable.
From: Kay Sievers <kay.sievers@novell.com>
Depending on the size of the array we reserve space for up to 128K
of bitmap, and we use it where possible.
When hot-adding to a version 1.0 we can still only use the 3K at the
end though - need a sysfs interface to improve that.
If a small chunksize is requested on Create, we don't auto-enlarge
the reserved space - this still needs to be fixed.
If not 'ftw' is available, still allow openning of devices by dev number.
More recent version of uclibc support nftw, so add support to check
for that.
From: Luca Berra <bluca@vodka.it>
glibc 2.4 is pedantic on ignoring return values from fprintf, fwrite and
write, so now we check the rval and actually do something with it.
in the Grow.c case i only print a warning, since i don't think we can do
anything in case we fail invalidating those superblocks (is should never
happen, but then...)
Signed-off-by: Neil Brown <neilb@suse.de>
glibc says of rand
Do not use this function in applications intended to be
portable when good randomness is needed.
Go figure...
Signed-off-by: Neil Brown <neilb@suse.de>
pass CFLAGS to mdassemble build, enabling -Wall -Werror showed some
issues also fixed by the patch.
From: Luca Berra <bluca@vodka.it>
Signed-off-by: Neil Brown <neilb@suse.de>
This can be used to bootstrape homehost tagging.
If no arrays are found that are tagged, we look for any array
and tag it.
Signed-off-by: Neil Brown <neilb@suse.de>
i.e. if assembling with --name or --super-minor, then if we find two
different arrays with the same apparent identity, and one was built
for 'this' host, then prefer that one instead of giving up in disgust.
Signed-off-by: Neil Brown <neilb@suse.de>
We make sure all devices can are consistent before doing any --update
This saves us from updating some but not all of an array, and then
aborting.
It also means we can backtrack on out decisions, which we might want to
do later.
Signed-off-by: Neil Brown <neilb@suse.de>
When an array is created, if the homehost is know,
the superblock gets it, either in the uuid, (via sha1)
or in the name field.
Signed-off-by: Neil Brown <neilb@suse.de>
So when you say auto=md or auto=part in mdadm.conf, it give a preference
for type of array, but standard name will override.
But --auto=md is more insistant.
FIXME I'm not at all happy about handling of names that already exist.
I don't think that should be removed if the device is active.
Signed-off-by: Neil Brown <neilb@suse.de>
A pending patch to the kernel causes bitmap file updates
to not go through the page cache, so O_DIRECT is needed to
ensure that we read current data.
Signed-off-by: Neil Brown <neilb@suse.de>
To support resizing an array without a spare, mdadm now understands
--backup-file=
which should point to a file for storing a backup of critical data.
This can be given to --grow which will create the file, or
--assemble which will restore from the file if needed.
Signed-off-by: Neil Brown <neilb@suse.de>
- report Intent Bitmap in --detail
- report internal bitmap in --examine
- pass' --force through to --grow --bitmap
- support v.large arrays in --grow --bitmap
Signed-off-by: Neil Brown <neilb@suse.de>
clean up 'long long' usage for size of array, so that
with v-1 superblocks a raid1 larger than 2TB is possible.
Signed-off-by: Neil Brown <neilb@suse.de>
From: ross@jose.lug.udel.edu (Ross Vandegrift)
Hi Neil,
While adding the text message mode, I saw a FIXME asking for syslog
support in monitor mode.
This patch adds exactly that.
Signed-off-by: Neil Brown <neilb@suse.de>
Support "--build"ing arrays with bitmaps.
hot-removal of bitmaps
--re-add of drives recently removed.
assorted extra tests
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Currently this includes
--write-behind to set level of write-behind supported
--write-mostly to flag devices as write-mostly.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
This includes:
adding --metadata= option to choose metadata format
adding metadata= word to config file.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>