This allows the info for a single array to be extracted,
so we don't have to write it into st->subarray.
For consistency, implement container_content for super0 and super1,
to just return the mdinfo for the single array.
Signed-off-by: NeilBrown <neilb@suse.de>
If a device - typically in a mirrored set - is assembled independently
of the other devices, and then attempted to be brought back into the
set it could contain inconsistent data. It should not be included.
So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device. If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.
This patch only fixes --assemble, not --incremental
Signed-off-by: NeilBrown <neilb@suse.de>
To accurately detect when an array has been split and is now being
recombined, we need to track which other devices each thinks is
working.
We should never include a device in an array if it thinks that the
primary device has failed.
This patch just allows get_info_super to return a list of devices
and whether they are thought to be working or not.
Signed-off-by: NeilBrown <neilb@suse.de>
If an --update is requested by the relevant metadata doesn't
understand it, print a useful message rather than silently ignoring
the issue.
Signed-off-by: NeilBrown <neilb@suse.de>
To support incorpating a new bare device into a collection of arrays -
one partition each - mdadm needs a modest understanding of partition
tables.
The main needs to be able to recognise a partition table on one device
and copy it onto another.
This will be done using pseudo metadata types 'mbr' and 'gpt'.
Signed-off-by: NeilBrown <neilb@suse.de>
When we find a device that was recently part of the array but is now
out of date (based on the event count) we might want to add it back in
(like --re-add) if the likely cause was a connection problem or we
might not if the likely cause was device failure.
So make this a policy issue: if action=re-add or better, try to re-add
any device that looks like it might be part of the array.
This applies:
when we assemble the array: old devices will be evicted by the
kernel and need to be re-added.
when we assemble the array during --incr for the same reason.
when we find a device that could be added to a running array.
This doesn't affect arrays with external metadata at all.
For such arrays:
When the container is assembled, the most recent instance of each
device is included without reference to whether it is too old or not.
Then the metadata handler must which slices of which devices to
include in which array and with what state. So the
->container_content should probably check the policy and compare the
sequence numbers/event counts.
When a device is added (--add) to a container with active arrays
we only add as a 'spare'. --re-add doesn't seem to be an option.
When a device is added with -I ->container_content gets another
chance to assess things again. So again it should check the policy.
Signed-off-by: NeilBrown <neilb@suse.de>
commit 1ff9833928
broke the checking of metadata types via the 'auto' line.
Be moving 'load_super" before "conf_test_metadata" we left
tst->sb set even if conf_test_metadata fails, so the device will
actually be accepted and used.
So if we decide to reject the device, free the superblock so it is
clear that it is rejected.
Signed-off-by: NeilBrown <neilb@suse.de>
Found during testing:
- cannot check metadata for homehost before loading metadata.
- As 1.x metadata can has a state 'rebuilding' between
'spare' and 'ok', we need to include that in our calculations.
Signed-off-by: NeilBrown <neilb@suse.de>
If we find we cannot add the requested bitmap file when
assembling the array, then make sure to clean up properly
and don't leave a half-configured array.
Signed-off-by: NeilBrown <neilb@suse.de>
If --assemble is given a container and some other devices to assemble
an array from, it complains with an error because that doesn't make
sense.
However it currently also complains if the list of devices was extract
from the config file rather than being given on the command line.
That is not appropriate.
So add an '&& inargv' test to ensure that we are really complaining
about the right thing.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
1.x metadata allows a device to be a member of the array while it
is still recoverying. So it is a working member, but is not
completely in-sync.
mdadm/assemble does not understand this distinction and assumes that a
work member is fully in-sync for the purpose of determining if there
are enough in-sync devices for the array to be functional.
So collect the 'recovery_start' value from the metadata and use it in
assemble when determining how useful a given device is.
Reported-by: Mikael Abrahamsson <swmike@swm.pp.se>
Signed-off-by: NeilBrown <neilb@suse.de>
Once load_super has succeeded, it should continue to succeed. However
devices can disappear etc so it is prudent to always check the return
status of load_super.
Signed-off-by: NeilBrown <neilb@suse.de>
Previously such things did not exist: ACTIVE and SYNC were either both
set or both clear. Recent changes with reshape means that a device
can be ACTIVE but not yet fully in-sync, so they need to be handled
and included in the array as active devices.
Signed-off-by: NeilBrown <neilb@suse.de>
When disks have conflicting container memberships (same container ids
but incompatible member arrays) --update=uuid can be used to move
offenders to a new container id by changing 'orig_family_num'.
Note that this only supports random updates of the uuid as the actual
uuid is synthesized. We also need to communicate the new
'orig_family_num' value to all disks involved in the update. A new
field 'update_private' is added to struct mdinfo to allow this
information to be transmitted.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If an array name contains a "hostname:" prefix, then
--assemble will tend to leave it there, while --incremental
will strip it off (when chosing a device name during auto-assembly).
Make this more consistent: strip the name off if we decide that
the name will be treated as 'local'. Leave it on if it will be
treated as 'foreign'.
Signed-off-by: NeilBrown <neilb@suse.de>
If mdadm.conf contains
HOMEHOST <ignore>
or commandline contains
--homehost=<ignore>
then the check that array metadata mentions the given homehost is
replace by a check that the name recorded in the metadata is not
already used by some other array mentioned in mdadm.conf.
This allows more arrays to use their native name rather than having
an _NN suffix added.
This should only be used during boot time if all arrays required for
normal boot are listed in mdadm.conf.
If auto-assembly is used to find all array during boot, then the
HOMEHOST feature should be used to ensure there is no room for
confusion in choosing array names, and so it should not be set
to <ignore>.
Signed-off-by: NeilBrown <neilb@suse.de>
For container= and member= to be effective in an mdadm.conf line
they must both be present. So when checking for their absence we
need container != NULL || member != NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
The line 'auto' in mdadm.conf can be used to disable assembly
of specific metadata types, or of all arrays.
This does not affect assembly of arrays listed in mdadm.conf
or on command line.
auto -all
will disable all auto-assembly.
auto -ddf
will cause mdadm to ignore ddf arrays that are not explicitly
mentioned, and auto assemble anything else it finds.
Signed-off-by: NeilBrown <neilb@suse.de>
If an array is created with --homehost=any, then --assemble and
--incremental will treat it as being local to 'this' host, no matter
what the name of this host is.
This is useful for array that will be given unique names and be
moved between machines.
This needs to be documented.
Signed-off-by: NeilBrown <neilb@suse.de>
When building container members with -IR, we need to ensure that
devices added to an active array preserve the 'in_sync' status so they
don't needlessly get rebuilt.
So allow sysfs_add_disk to do this (only works in kernels since
2.6.30) and pass the relevant flag down.
Signed-off-by: NeilBrown <neilb@suse.de>
wait not only for the name to appear, but for it to refer to the
correct device.
Sometimes old symlinks left lying around can be confusing.
Signed-off-by: NeilBrown <neilb@suse.de>
If we are assembling an array in a container and it isn't complete
enough to start yet, then
- don't start mdmon
- don't say the array is started
- don't wait for the device to appear in /dev
Signed-off-by: NeilBrown <neilb@suse.de>
Use mddev_busy() as GET_ARRAY_INFO can succeed on 'clear' arrays.
Ran into this after an encountering a case where mdadm -Ss ended in
segfault (missing check for NULL return from map_by_devnum() in
sles11:Manage.c). So, tried to stop the array by hand with echo clear >
md/array_state, after which I could not reassemble since GET_ARRAY_INFO
was succeeding.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reshape with large chunk size can require a large stripe_cache.
We make this work when starting the reshape but not when
restarting at assemble time. So fix that.
Signed-off-by: NeilBrown <neilb@suse.de>
This is only significant for --assemble --force where some old
devices might be included into the array. If anything looks like
it isn't clean, the kernel will not allow a degraded array to be started.
Signed-off-by: NeilBrown <neilb@suse.de>
If any superblocks in a confused array had an event count of 0,
"mdadm -Af" would not update the event counts to assemble the array.
I don't remember why that text is there, and it has caused at least
one situation to be difficult to recover from. So remove the
test. --force means --force!
Signed-off-by: NeilBrown <neilb@suse.de>
1/ when we choose not to use a device, must set ->used to 2, not 1.
2/ When we give up on a member, clear st and content.
Signed-off-by: NeilBrown <neilb@suse.de>
We probably should pass a flag down saying 'this is auto-assembly',
but for now, if there is no identity information set, it must
be auto-assemble.
Signed-off-by: NeilBrown <neilb@suse.de>
Try to treat members of containers much like other arrays for
assembly.
We still look through the list of devices for a match (it will be
the container), then find the relevant 'info' and try to assemble
the array.
Signed-off-by: NeilBrown <neilb@suse.de>
Attempting to open(O_EXCL) each candidate device usually filters out all
busy raid components. However, containers do not behave like components
and will return container_content that may describe active member
arrays.
This patch just adds a function that will be used to check if a
container member is busy. It will be used shortly.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Factor out, from Incremental_container, the code for assembling an
array based on information extracted from a container. We will
shortly use this from Assemble too.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In preparation for handling the container case where we may need to handle
a list of potential member arrays.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When a 'container' gets started, we need udev to notice, but the
kernel has no way of knowing that a KOBJ_CHANGE event is needed. So
send one directly via the 'uevent' sysfs attribute.
Also, uevents don't get generated when md arrays are stopped (prior to
2.6.28) so send 'change' events then too.
Signed-off-by: NeilBrown <neilb@suse.de>
We previously only updated /var/run/mdadm/map when starting an
array with --incremental. However we now make more use of
that file (to pass the dev name to udev) so always update it.
Signed-off-by: NeilBrown <neilb@suse.de>
This delays the create_mddev call even further in the case where
an array device name is given for --assemble. It is now delayed
until the 'name' of the array is also available.
We will shortly be feeding more information into the process of
creating array devices, so delay the creation. Still open them
early if the device already exists.
This involves making sure the autof flag is in the right place
so that it can be found at creation time.
Also, Assemble, Build, and Create now always close 'mdfd'.
Signed-off-by: NeilBrown <neilb@suse.de>
This reflect that fact that more often than not it is creating things
in /dev, and allows for a new open_mddev which does just that.
Signed-off-by: NeilBrown <neilb@suse.de>
Given an mdadm.conf like the following allow /dev/imsm and /dev/md/r1 to be
created by "mdadm -As".
DEVICES partitions
ARRAY /dev/imsm metadata=imsm auto=md UUID=b98f5dbe-aa859e7b-0e369b89-a80986d4
ARRAY /dev/md/r1 container=/dev/imsm member=0 auto=mdp UUID=3538e39c-b397c2e9-1aa031f9-2bc0eca4
spares=1
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Raid disk and disk number information is not relevant at the container
level, especially for imsm. So arrange for getinfo_super_imsm() to
always publish devices as spares and report the number of spares at
Assemble() time.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Made the mistake of recompiling the F9 mdadm rpm which has a patch to
remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots
of errors:
config.c:568: warning: ignoring return value of asprintf
Assemble.c:411: warning: ignoring return value of asprintf
Assemble.c:413: warning: ignoring return value of asprintf
super0.c:549: warning: ignoring return value of posix_memalign
super0.c:742: warning: ignoring return value of posix_memalign
super0.c:812: warning: ignoring return value of posix_memalign
super1.c:692: warning: ignoring return value of posix_memalign
super1.c:1039: warning: ignoring return value of posix_memalign
super1.c:1155: warning: ignoring return value of posix_memalign
super-ddf.c:508: warning: ignoring return value of posix_memalign
super-ddf.c:645: warning: ignoring return value of posix_memalign
super-ddf.c:696: warning: ignoring return value of posix_memalign
super-ddf.c:715: warning: ignoring return value of posix_memalign
super-ddf.c:1476: warning: ignoring return value of posix_memalign
super-ddf.c:1603: warning: ignoring return value of posix_memalign
super-ddf.c:1614: warning: ignoring return value of posix_memalign
super-ddf.c:1842: warning: ignoring return value of posix_memalign
super-ddf.c:2013: warning: ignoring return value of posix_memalign
super-ddf.c:2140: warning: ignoring return value of write
super-ddf.c:2143: warning: ignoring return value of write
super-ddf.c:2147: warning: ignoring return value of write
super-ddf.c:2150: warning: ignoring return value of write
super-ddf.c:2162: warning: ignoring return value of write
super-ddf.c:2169: warning: ignoring return value of write
super-ddf.c:2172: warning: ignoring return value of write
super-ddf.c:2176: warning: ignoring return value of write
super-ddf.c:2181: warning: ignoring return value of write
super-ddf.c:2686: warning: ignoring return value of posix_memalign
super-ddf.c:2690: warning: ignoring return value of write
super-ddf.c:3070: warning: ignoring return value of posix_memalign
super-ddf.c:3254: warning: ignoring return value of posix_memalign
bitmap.c:128: warning: ignoring return value of posix_memalign
mdmon.c:94: warning: ignoring return value of write
mdmon.c:221: warning: ignoring return value of pipe
mdmon.c:327: warning: ignoring return value of write
mdmon.c:330: warning: ignoring return value of chdir
mdmon.c:335: warning: ignoring return value of dup
monitor.c:415: warning: rv may be used uninitialized in this function
...some of these like the write() ones are not so trivial so save those
fixes for the next patch.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
For now, this means that the lack of a homehost doesn't always prevent
assembly.
Soon we will allow assembly anyway, but have different messages if
homehost isn't supported.
When we assemble an array, there are three different approaches
depending on whether metadata is internal or external, and on
kernel version.
Move all this to a common helper instead of duplicating in 3 places.
Signed-off-by: NeilBrown <neilb@suse.de>
The variety of approaches to 'add_disk' are factored out into
a separate function, and Incremental mode benefits by being
closer to supporting the assembly of containers.
Also remove the adding-to-array-data-structure out of sysfs_add_disk
and into add_disk.
And add some tests for --incremental mode to make sure we don't break it.
Signed-off-by: NeilBrown <neilb@suse.de>
Since we made free_super a superswitch call, we need to be careful
that st is non NULL before calling st->ss->free_super(st).
Also when updating byteorder there is a chance of a similar NULL
deref.
I want the metadata handler to have more control over the 'version',
particularly for arrays which are members of containers.
So discard st->text_version and instead use info->text_version
which getinfo_super can initialise.
If you have stacked arrays, then
mdadm -As --homehost=fred
should work but doesn't. It gets into an infinite loop!
So write some tests, and fix the bugs.
If two drives in a raid5 disappear at the same time, then "-Af"
will add them both in rather than just one and forcing the array
to 'clean'. This is slightly safer in some cases.
If an auto-assembly attempt failes because the array cannot be
opened or because the array has already been created, then we
get into an infinite loop.
Reported-by: Dan Pascu <dan@ag-projects.com>
Fixes-debian-bug: 396582
If they are for a partition and a whole device (common case)
they old message doesn't really cover the situation. So add
the "overlap" option to the text.
Also detect whether the device list was in mdadm.conf and
act accordingly.
Make -assemble a bit more resilient to finding strange
information in superblocks.
Don't claim newly added spares are InSync!! (don't know why that
code was ever in there)
As version-0.90 superblock don't record the superblock
offset, it is possible for overlapping partitions,
or a partition that starts on a 64K boundary in the whole device
to result in mis-detection - one partition or device might
be detected where the other was intended.
To avoid this awkward possibility, we reject assembly attempts
which seem to have two devices that are different but have the
same version-0.90 superblock.
To avoid this problem altogether, switch to version-1 metadata.
Signed-off-by: Neil Brown <neilb@suse.de>
From: Luca Berra <bluca@vodka.it>
glibc 2.4 is pedantic on ignoring return values from fprintf, fwrite and
write, so now we check the rval and actually do something with it.
in the Grow.c case i only print a warning, since i don't think we can do
anything in case we fail invalidating those superblocks (is should never
happen, but then...)
Signed-off-by: Neil Brown <neilb@suse.de>
This can be used to bootstrape homehost tagging.
If no arrays are found that are tagged, we look for any array
and tag it.
Signed-off-by: Neil Brown <neilb@suse.de>
This cannot be used yet, but it is working towards auto-assembly.
When auto-assembling an array, we make a name in /dev/md/
giving a number (from the peferred minor) or name (from set-name).
Signed-off-by: Neil Brown <neilb@suse.de>
i.e. if assembling with --name or --super-minor, then if we find two
different arrays with the same apparent identity, and one was built
for 'this' host, then prefer that one instead of giving up in disgust.
Signed-off-by: Neil Brown <neilb@suse.de>
We make sure all devices can are consistent before doing any --update
This saves us from updating some but not all of an array, and then
aborting.
It also means we can backtrack on out decisions, which we might want to
do later.
Signed-off-by: Neil Brown <neilb@suse.de>
Use to avoid starting arrays if there are
fewer devices available than last time the array was started.
This is only needed with --scan, as with --scan, that behaviour
is the default.
Signed-off-by: Neil Brown <neilb@suse.de>
To support resizing an array without a spare, mdadm now understands
--backup-file=
which should point to a file for storing a backup of critical data.
This can be given to --grow which will create the file, or
--assemble which will restore from the file if needed.
Signed-off-by: Neil Brown <neilb@suse.de>
As spared don't have a position in the raid array with verion-1 superblocks,
we need to handle them a bit differently.
Signed-off-by: Neil Brown <neilb@suse.de>
This includes:
adding --metadata= option to choose metadata format
adding metadata= word to config file.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>