For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.
Signed-off-by: NeilBrown <neilb@suse.de>
By default Incremental places all imsm spares in separate container
with uuid=0:0:0:0. (patch giving spares uuid_zero needed)
When we find enough members to start an array
we are able to determine domain so we search spare container
for suitable spares and move them to the container that
is currently assembled.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
mdinfo read with sysfs_read do not contain information about the space
needed to store data of all volumes created in that container, so that
spare can be used as replacement for existing subarrays in the future.
Signed-off-by: NeilBrown <neilb@suse.de>
If lost disk was the only one that belonged to particular domain, array
won't match with that domain any longer. We can achieve this by moving
domain check below the 'target' test.
Signed-off-by: NeilBrown <neilb@suse.de>
spare-same-slot allows re-adding of missing array member with disk
re-inserted into the same slot where previous member was plugged in.
If in the meantime another spare has been used for recovery, same slot
cookie should be ignored.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Container degradation here is defined as the number of failed disks in
mostly degraded sub-array. This number is used as value for
array.failed_disks and used in comparison to find best match.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We need to refuse to assemble an arrays with bad blocks.
Initially there was condition in container_content function
that returns error value in the case when metadata store information
about bad blocks.
When the container_content function is called from functions NOT connected
with assemble (Kill_subarray, Detail) we get faulty error return value.
Patch introduces new flag in array.status - MD_SB_BBM_ERRORS. It is set
in container_content when bad blocks are detected and can be checked by
container_content caller.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
return value should remain the same as result of Manage_Subdevs (last
significant operation). Right now it is inverted what results in
error status for successful operation.
Signed-off-by: NeilBrown <neilb@suse.de>
After incremental has added spare, monitor should be woken up in order
to see if anything has changed. If mdmon is not waken up, recovery do not
start.
Signed-off-by: NeilBrown <neilb@suse.de>
This is useful with 1.1 and 1.2 metadata to update the metadata if
the device size has changed.
The same functionality can be achieved by writing to the device size
in sysfs after re-adding normally, but in some cases this might be
easier.
Signed-off-by: NeilBrown <neilb@suse.de>
counterpart of 417f346ee0 for incremental.
If md device has metadata_version="none" super_by_fd() matches
supertype=super0.
Call of load_container() dereferences null, so we have to forbid it.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If a devices - typically in a mirrored set - is assembled
independently of the other devices, and then attempted to be brought
back into the set, it could contain inconsistent data. It should not
be included.
So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device. If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.
This patches fixes --incremental, --assemble was done in an earlier
patch.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than calling sysfs_read whenever we want data from sysfs, call
it once at the start will all the requests of interest, then just use
that,
Make sure we free it properly too.
Signed-off-by: NeilBrown <neilb@suse.de>
When "mdadm -I" is given a device with no metadata, mdadm tries to add
it as a 'spare' somewhere based on policy.
This patch changes the behaviour in two ways:
1/ If the device is at a 'path' where a previous device was removed
from an array or container, then we preferentially add the spare to
that array or container.
2/ Previously only 'bare' devices were considered for adding as
spares. Now if action=spare-same-slot is active, we will add
non-bare devices, but *only* if the path was previously in use
for some array, and the device will only be added to that array.
Based on code
From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The current act_spare tests only test if it is allowed for some
metadata.
As we check each array or partitioning type, we need to double-check
that sparing is allowed for that array or partitioning type.
Signed-off-by: NeilBrown <neilb@suse.de>
We only want to try partition_try_spare if array_try_spare failed.
If it succeeded, there is nothing more to try.
Signed-off-by: NeilBrown <neilb@suse.de>
Instead of open coding (and using horrible gotos), make this
a separate function.
Also fix the check for end of device - SEEK_END doesn't work on
block devices.
Signed-off-by: NeilBrown <neilb@suse.de>
If the disk is taken out from its port this port information is
lost. Only udev rule can provide us with this information, and then we
have to store it somehow. This patch adds writing 'cookie' file in
/dev/.mdadm/failed-slots directory in form of file named with value of
f<path-id> containing the metadata type and uuid of the array (or
container) that the device was a member of. The uuid is in exactly
the same format as in the mapfile.
FAILED_SLOTS_DIR constant has been added to hold the location of
cookie files.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When we -I -R a device in a container, we must first fail it
from each member array before we can remove it from the container.
Signed-off-by: NeilBrown <neilb@suse.de>
<path-id> allows to identify the port to which given device is plugged
in. In case of hot-removal, udev can pass this information for future
use (eg. write this name as 'cookie' allowing to detect the fact of
reinserting device to the same port).
--path <path-id> parameter has been added to device removal handle
(and char *path has been added to IncrementalRemove() to pass this
value) in order to pass path-id to this handler.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Remove the _t pointer typedef and remove the _s suffix for the
structure,
These things do not help readability.
Signed-off-by: NeilBrown <neilb@suse.de>
We more clearly separate out -I on a container, and use
load_container in that case and load_super only for true members.
This removes another use of loaded_container.
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the info for a single array to be extracted,
so we don't have to write it into st->subarray.
For consistency, implement container_content for super0 and super1,
to just return the mdinfo for the single array.
Signed-off-by: NeilBrown <neilb@suse.de>
If the first device found has a much smaller event count than a
subsequent device, that device will not be entered in the 'avail'
array properly.
Signed-off-by: NeilBrown <neilb@suse.de>
To accurately detect when an array has been split and is now being
recombined, we need to track which other devices each thinks is
working.
We should never include a device in an array if it thinks that the
primary device has failed.
This patch just allows get_info_super to return a list of devices
and whether they are thought to be working or not.
Signed-off-by: NeilBrown <neilb@suse.de>
If a device is bare and policy suggests that it can be used as a spare
for virtual 'partitions' array, find an appropriate partition table
and write it to the device.
Signed-off-by: NeilBrown <neilb@suse.de>
Adding a spare to a group of partitioned devices is quite different
from adding one to an array. So detect which option is worth trying
based on policy and then try one or the other - or possibly both - as
appropriate.
Signed-off-by: NeilBrown <neilb@suse.de>
To support incorpating a new bare device into a collection of arrays -
one partition each - mdadm needs a modest understanding of partition
tables.
The main needs to be able to recognise a partition table on one device
and copy it onto another.
This will be done using pseudo metadata types 'mbr' and 'gpt'.
Signed-off-by: NeilBrown <neilb@suse.de>
If policy allows act_spare or act_force_spare, -I will add a
bare device as a spare to an appropriate array.
We don't support adding non-bare devices as spares yet.
Signed-off-by: NeilBrown <neilb@suse.de>
When we find a device that was recently part of the array but is now
out of date (based on the event count) we might want to add it back in
(like --re-add) if the likely cause was a connection problem or we
might not if the likely cause was device failure.
So make this a policy issue: if action=re-add or better, try to re-add
any device that looks like it might be part of the array.
This applies:
when we assemble the array: old devices will be evicted by the
kernel and need to be re-added.
when we assemble the array during --incr for the same reason.
when we find a device that could be added to a running array.
This doesn't affect arrays with external metadata at all.
For such arrays:
When the container is assembled, the most recent instance of each
device is included without reference to whether it is too old or not.
Then the metadata handler must which slices of which devices to
include in which array and with what state. So the
->container_content should probably check the policy and compare the
sequence numbers/event counts.
When a device is added (--add) to a container with active arrays
we only add as a 'spare'. --re-add doesn't seem to be an option.
When a device is added with -I ->container_content gets another
chance to assess things again. So again it should check the policy.
Signed-off-by: NeilBrown <neilb@suse.de>
Commit 3a6ec29ad5 stopped us from adding apparently-working devices
to an active array with --incremental as there is a good chance that they
are actually old/failed devices.
Unfortunately it also stopped spares from being added to an active
array, which is wrong. This patch refines the test to be more
careful.
Reported-by: <fibreraid@gmail.com>
Analysed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Commit 97b4d0e9 "Incremental: honor an 'enough' flag from external
handlers" introduced a regression in that it changed the error return
code for successful invocations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Ignacy Kasperowicz <ignacy.kasperowicz@intel.com>
GET_ARRAY_INFO always succeeds on an inactive container, so we need to
be a bit more diligent about adding a disk to an active container.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
--test can be given in Manage mode.
This can be used when there is an attempt to fail or remove 'faulty',
'failed' or 'detached' devices, or to re-add 'missing' devices.
If no devices were failed, removed, or re-added, then mdadm will
exit with status '2'.
Signed-off-by: NeilBrown <neilb@suse.de>
Adding devices to active arrays in --incremental is a bit dubious.
Normally the array won't be activated until all expected devices are
present, so this situation would mean that the given device is not
expected, so is probably failed. In that case it should only be added
by explicit sysadmin request.
However if --run was given, then quite possibly the array was
assembled earlier when not complete, so it is less clear whether it is
wrong to add this device or not. In that case add it as that is
generally safest.
It would be nice to allow policy for this to be explicitly given by
sysadmin.
Signed-off-by: NeilBrown <neilb@suse.de>
This can be used for hot-unplug. When a device has been remove,
udev can call
mdadm --incremental --fail sda
and mdadm will find the array holding sda and remove sda from
the array.
Based on code from Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
...i.e. GET_DEVS == (GET_DEVS|SKIP_GONE_DEVS)
A null pointer dereference in Incremental.c can be triggered by
replugging a disk while the old name is in use. When mdadm -I is called
on the new disk we fail the call to sysfs_read(). I audited all the
locations that use GET_DEVS and it appears they can tolerate missing a
drive. So just make SKIP_GONE_DEVS the default behaviour.
Also fix up remaining unchecked usages of the sysfs_read() return value.
Reported-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This reverts commit fdb482f99b.
Now that containers can report state for ->container_enough we can
automatically determine when the array can be started, and no longer
need the --no-degraded hammer.
Conflicts:
Incremental.c
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This is needed for imsm where:
1/ we want to report raid_disks as zero to allow mdadm -As to
incorporate all spares
2/ we can't determine stale disks by looking at the event counts.
3/ we can't see per-subarray expectations with the info returned from
the container level ->getinfo_super()
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If an array name contains a "hostname:" prefix, then
--assemble will tend to leave it there, while --incremental
will strip it off (when chosing a device name during auto-assembly).
Make this more consistent: strip the name off if we decide that
the name will be treated as 'local'. Leave it on if it will be
treated as 'foreign'.
Signed-off-by: NeilBrown <neilb@suse.de>
If mdadm.conf contains
HOMEHOST <ignore>
or commandline contains
--homehost=<ignore>
then the check that array metadata mentions the given homehost is
replace by a check that the name recorded in the metadata is not
already used by some other array mentioned in mdadm.conf.
This allows more arrays to use their native name rather than having
an _NN suffix added.
This should only be used during boot time if all arrays required for
normal boot are listed in mdadm.conf.
If auto-assembly is used to find all array during boot, then the
HOMEHOST feature should be used to ensure there is no room for
confusion in choosing array names, and so it should not be set
to <ignore>.
Signed-off-by: NeilBrown <neilb@suse.de>
The line 'auto' in mdadm.conf can be used to disable assembly
of specific metadata types, or of all arrays.
This does not affect assembly of arrays listed in mdadm.conf
or on command line.
auto -all
will disable all auto-assembly.
auto -ddf
will cause mdadm to ignore ddf arrays that are not explicitly
mentioned, and auto assemble anything else it finds.
Signed-off-by: NeilBrown <neilb@suse.de>
Sometimes we want to ensure particular arrays are never
assembled automatically. This might include an array made of
devices that are shared between hosts.
To support this, allow ARRAY lines in mdadm.conf to use the word
"ignore" rather than a device name. Arrays which match such lines
are never automatically assembled (though they can still be assembled
by explicitly giving identification information on the mdadm command
line.
Signed-off-by: NeilBrown <neilb@suse.de>
If an array is created with --homehost=any, then --assemble and
--incremental will treat it as being local to 'this' host, no matter
what the name of this host is.
This is useful for array that will be given unique names and be
moved between machines.
This needs to be documented.
Signed-off-by: NeilBrown <neilb@suse.de>
wait not only for the name to appear, but for it to refer to the
correct device.
Sometimes old symlinks left lying around can be confusing.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ if homehost matches, then we need to set trustworthy to 'LOCAL'
2/ if we decide to set trustworthy to 'METADATA' because we have to
use the metadata version name, do that *after* we have checked if
we are going to assemble within a container, as inside the
container there could be different sources of names to use.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently Incremental_container is being called after adding each disk.
In the imsm case where spares are not tracked in the raid_disks field we
can use --no-degraded to block premature assembly.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Just like the Assemble case, default to the text_version of the
container if another name is not specified.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
It is possible for some arrays to be created e.g. by initrd, and so
not get mentioned in /var/run/mdadm/map.
As "-I" depends on things being listed in 'map', we create it by
scanning all devices if it doesn't exist.
Signed-off-by: NeilBrown <neilb@suse.de>
If udev hasn't created the array yet, we might still want to
open it. So open directly by major:minor.
Also, of array in map file doesn't appear to exist, do use
the name associated with it.
Signed-off-by: NeilBrown <neilb@suse.de>
Factor out, from Incremental_container, the code for assembling an
array based on information extracted from a container. We will
shortly use this from Assemble too.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When a 'container' gets started, we need udev to notice, but the
kernel has no way of knowing that a KOBJ_CHANGE event is needed. So
send one directly via the 'uevent' sysfs attribute.
Also, uevents don't get generated when md arrays are stopped (prior to
2.6.28) so send 'change' events then too.
Signed-off-by: NeilBrown <neilb@suse.de>
When mdadm.conf is automatically generated, we might not know a
suitable /dev/name. But we do know the uuid of the container.
So allow that as an option.
Signed-off-by: NeilBrown <neilb@suse.de>
i.e. in mdadm.conf you can have a line like
ARRAY uuid=whatever
and it will use auto-name-generation to give a name to the array at
assemble-time. The is different from blind auto-assembly in that the
array will be treated as 'local'.
This reflect that fact that more often than not it is creating things
in /dev, and allows for a new open_mddev which does just that.
Signed-off-by: NeilBrown <neilb@suse.de>
It doesn't really make sense for the --auto setting to ever over-ride
the setting on an ARRAY line. That could cause failure if the
ARRAY line has a 'standard' now. So revert to the array line having
precedence over command line, then CREATE line last.
Signed-off-by: NeilBrown <neilb@suse.de>
If a foreign (i.e. not known to be local) array is discovered
by --incremental assembly, we now assemble it. However we ignore
any name information in the array so as not to potentially create
a name that conflict with a 'local' array.
Also, foreign arrays are always assembled 'read-auto' to avoid writing
anything until the array is actually used.
Signed-off-by: NeilBrown <neilb@suse.de>
If incremental assembly finds an array mentioned in mdadm.conf,
with a 'standard partitioned' name like /dev/md_d0 or /dev/md/d0,
it will not create a partitioned array like it should.
This is because it mishandled the 'devnum' returned by
is_standard.
That is a devnum that does not have the partition-or-not encoded
into it. So we need to check the actual return value of
is_standard and encode the partition-or-not info into the devnum.
Also fix a couple of comments.
Signed-off-by: NeilBrown <neilb@suse.de>
RAID10 is the only raid level that uses the avail char array pointer
during the enough() operation, so it was the only one that saw this.
The code in incremental assumes unconditionally that count_active will
allocate the avail char array, that it might be used by enough, and that
it will need to be freed afterward. Once you make count_active actually
do that, then the oops goes away.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Given an mdadm.conf like the following allow /dev/imsm and /dev/md/r1 to be
created by "mdadm -As".
DEVICES partitions
ARRAY /dev/imsm metadata=imsm auto=md UUID=b98f5dbe-aa859e7b-0e369b89-a80986d4
ARRAY /dev/md/r1 container=/dev/imsm member=0 auto=mdp UUID=3538e39c-b397c2e9-1aa031f9-2bc0eca4
spares=1
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When -I get a new device for a container and tries to incrementally
assemble the container array, it calls sysfs_set_array to create the
array without first checking if it already exists. This produces
unpleasant error messages.
So check first.
Signed-off-by: NeilBrown <neilb@suse.de>
For now, this means that the lack of a homehost doesn't always prevent
assembly.
Soon we will allow assembly anyway, but have different messages if
homehost isn't supported.
mdadm -I /dev/part-of-container
should add that to a container, creating if it needed,
and then try to assemble any arrays in the container.
Signed-off-by: NeilBrown <neilb@suse.de>
When we assemble an array, there are three different approaches
depending on whether metadata is internal or external, and on
kernel version.
Move all this to a common helper instead of duplicating in 3 places.
Signed-off-by: NeilBrown <neilb@suse.de>
The variety of approaches to 'add_disk' are factored out into
a separate function, and Incremental mode benefits by being
closer to supporting the assembly of containers.
Also remove the adding-to-array-data-structure out of sysfs_add_disk
and into add_disk.
And add some tests for --incremental mode to make sure we don't break it.
Signed-off-by: NeilBrown <neilb@suse.de>
'container_member' isn't really a well defined concept.
Each metadata might enumerate members differently, so just
let each format /mdX/YYYY as appropriate.
Some kernel versions don't put a space between 'active' and '(auto-read-only)'
in /proc/mdstat. This causes a parsing problem leaving 'level' set to
NULL which causes a crash.
So synthesise a space there if it is missing, and check for 'level' to
be NULL and don't de-ref if it is.