When metadata is managed externally - probably as a container - we
need to examine that metadata to see which devices are spares.
So call the getinfo_super_disks method and use the info it returns.
Signed-off-by: NeilBrown <neilb@suse.de>
Choosing a spare from a container is more complicated than
from a native array. So separate out choose_spare to make it easier
to use an alternate implementation.
Signed-off-by: NeilBrown <neilb@suse.de>
When growing the number of raid disks the reshape process will promote
container-spares to subarray-spares (later the kernel promotes them to
subarray-members in raid5_start_reshape()). The automatic spare
promotion that mdmon performs upon seeing a degraded array must be
disabled until the reshape process has been initiated. Otherwise, mdmon
may start a rebuild before the reshape parameters can be specified.
In the external case we arrange for the monitor to be blocked, and
turn off the safemode delay.
Mdmon is updated to check sync_action is not frozen before initiating
recovery.
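
A minimal sketch of the sync_action check described above. The helper
name may_start_recovery() and passing the sysfs value in as a string are
assumptions made for illustration; this is not mdmon's actual code.

```c
#include <string.h>

/* Hypothetical helper: decide whether spare activation may proceed,
 * given the text read from the array's sync_action attribute.
 * sysfs values normally end in a newline, so only the prefix is compared. */
int may_start_recovery(const char *sync_action)
{
	if (sync_action == NULL)
		return 0;	/* state unknown: be conservative */
	return strncmp(sync_action, "frozen", 6) != 0;
}
```

With this shape the monitor would simply skip spare activation for any
array where the helper returns 0.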
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In the native metadata case Grow_reshape() and the kernel validate what
reshapes are possible / supported and the kernel handles all the metadata
updates. In the external case the metadata format may have specific
constraints above this baseline. External formats also introduce the
constraint of only permitting some reshapes at container scope versus subarray
scope. For example, imsm changes to 'raiddisks' must be applied to all arrays
in the container.
This operation assumes that its 'st' parameter has been obtained from
super_by_fd() (such that st->subarray is up to date), and that a snapshot of
the metadata has been loaded from the container.
Why a new method, versus extending an existing one?
->validate_geometry: this routine assumes it is being called from Create(),
adding reshape complicates the cases that this routine needs to handle. Where
we find that checks can be shared between the two cases, those routines are
refactored into common code internal to the metadata handler, i.e. no need to
provide a unified external interface. ->validate_geometry() also does not
expect to update the metadata.
->update_super: this is meant to update single fields at Assembly() and only at
the container scope. Reshape potentially wants to update multiple fields at
either container or subarray scope.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Precludes needing to deduce this information later, like in Detail.c and
soon in Grow.c.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Support metadata-specific level, layout and chunksize defaults. Kill an
unneeded superswitch method ahead of adding more for the reshape case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If the user does not specify a layout, don't skip asking about retaining
the non-standard raid6 layout which may be implicitly changed.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Incremental assembly works on such an array because the kernel sees the
disk as in-sync and that the array is reshaping. Teach Assemble() the
same assumptions.
This is only needed on kernels that do not initialize ->recovery_offset
when activating spares for reshape.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Going through the Grow api found some local routines that could be
marked static.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm --readwrite <subarray> will clear the external readonly flag ('-'
to '/'), but only for redundant arrays. Allow raid0 arrays as well so
the user has a simple helper to control this flag.
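
The '-' to '/' flip can be sketched as follows. The function name and the
assumption that the string looks like "external:-md127/0" (readonly) or
"external:/md127/0" (writable) are illustrative only.

```c
#include <string.h>

/* Hypothetical helper: clear the external readonly flag in a
 * metadata_version string, assumed to have the form
 * "external:-<container>/<index>" or "external:/<container>/<index>".
 * Returns 0 if the string now names a writable subarray, -1 otherwise. */
int clear_external_readonly(char *metadata_version)
{
	static const char prefix[] = "external:";
	size_t n = sizeof(prefix) - 1;

	if (strncmp(metadata_version, prefix, n) != 0)
		return -1;	/* not external metadata */
	if (metadata_version[n] == '-')
		metadata_version[n] = '/';
	return metadata_version[n] == '/' ? 0 : -1;
}
```

Nothing in the flip depends on the raid level, which is why raid0 can be
allowed too.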
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares. In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.
When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level. Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.
Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication. If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version. Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation. This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.
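
The decision above can be sketched as below. The function name, the
integer version encoding and the min_version parameter are all
assumptions for illustration, not the values or interfaces mdadm uses.

```c
/* Hypothetical inputs:
 *   mdmon_running - a monitor is already active for the container
 *   mdmon_version - version it reported via ping (0 if none reported)
 *   min_version   - first release assumed to honor the freeze flag */
int mdmon_honors_freeze(int mdmon_running, int mdmon_version, int min_version)
{
	if (!mdmon_running)
		return 1;	/* a freshly started mdmon matches this mdadm */
	return mdmon_version >= min_version;
}
```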
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
...before introducing another open-coded instance of this conversion.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When checking that a container matches the required uuid,
we need to call 'getinfo_super' before we have a 'content'
to test.
Reported-by: "Czarnowska, Anna" <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The platform (metadata) domain lets the metadata handlers differentiate
disk domains based on the controller that each disk belongs to.
A platform domain is a sub-domain inside a user-specified domain
from the mdadm.conf configuration file, inheriting all parameters from it.
The metadata domain name is used by the disk domain matching functions.
Disks with the same metadata domain name belong to the same metadata
domain.
A new metadata handler method is added that retrieves the platform domain
string based on the disk path:
const char *(*get_disk_controller_domain)(const char *path);
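
A toy sketch of how such a method could be used for matching. The
handler below is not imsm's; the substrings it looks for and the domain
names it returns are invented for the example.

```c
#include <string.h>

/* Same shape as the new handler method. */
typedef const char *(*controller_domain_fn)(const char *path);

/* Toy handler: classify a disk by a substring of its path. */
const char *toy_controller_domain(const char *path)
{
	if (path && strstr(path, "-sas-"))
		return "toy_sas";
	if (path && strstr(path, "-ata-"))
		return "toy_ahci";
	return NULL;	/* no platform domain known for this path */
}

/* Two disks share a metadata domain when the handler names the same one. */
int same_metadata_domain(controller_domain_fn fn, const char *a, const char *b)
{
	const char *da = fn(a);
	const char *db = fn(b);

	return da && db && strcmp(da, db) == 0;
}
```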
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When trying to move a spare, move to the container of a degraded
array, not to the array itself.
And don't try to move from a subarray, only from a native or container
array.
And don't move from a container which contains degraded subarrays.
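
The three rules can be sketched with a simplified array record. The
struct and both function names are hypothetical and much smaller than
what Monitor actually tracks.

```c
#include <stddef.h>

/* Simplified, hypothetical view of a monitored array. */
struct arr {
	struct arr *parent;	/* container, or NULL for native/container */
	int degraded_members;	/* container only: number of degraded subarrays */
};

/* A spare goes to the container of a degraded subarray,
 * or to the array itself when it has no container. */
struct arr *spare_destination(struct arr *degraded)
{
	return degraded->parent ? degraded->parent : degraded;
}

/* Only native arrays and containers without degraded subarrays
 * may give up a spare. */
int may_donate_spare(const struct arr *a)
{
	if (a->parent)
		return 0;	/* never take from a subarray */
	return a->degraded_members == 0;
}
```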
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than only migrating between arrays with the same spare_group,
we now migrate based on domains set in the policy.
In order for spare_group to continue to work, we treat it as a domain
of the destination array, and a domain of any device we might remove
from a source array.
Signed-off-by: NeilBrown <neilb@suse.de>
Checking compatibility between arrays for spare migration is going to
become a little more complicated, so split it out into a separate
function.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than passing mailaddr, mailfrom, cmd, dosyslog around in
argument lists, create a structure to hold them all.
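
Roughly the shape this takes, as a sketch. The struct name, field types
and the helper are assumptions; only the four member names come from the
text above.

```c
#include <stddef.h>

/* Hypothetical bundle of the four alert settings. */
struct alert_info {
	const char *mailaddr;
	const char *mailfrom;
	const char *cmd;
	int dosyslog;
};

/* Callees now take one pointer instead of four arguments.
 * This example helper counts the configured alert channels. */
int alert_channels(const struct alert_info *info)
{
	return (info->mailaddr != NULL) + (info->cmd != NULL) +
	       (info->dosyslog != 0);
}
```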
Signed-off-by: NeilBrown <neilb@suse.de>
If getinfo_super is called on a container supertype we only get information
on the first disk. As a parameter it takes a reference to a preallocated
mdinfo structure. Amending getinfo_super to return the full list of disks
would require amending all existing calls and subsequently freeing the memory
allocated for the mdinfo list.
The function container_content, which returns an mdinfo list, is written
specifically for assembly and performs actions not needed just to fill
mdinfo. It also does not include spares, so it is unsuitable.
As an alternative, a new function getinfo_super_disks is created
to obtain information about the state of all disks in the array.
The existing function sysfs_free is used to free the memory
allocated by getinfo_super_disks.
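
A sketch of how a caller might walk and release such a list. The struct
is a cut-down stand-in for mdinfo, and treating a negative raid_disk as
"spare" is an assumption of this example, not a statement about the
real structure.

```c
#include <stdlib.h>

/* Cut-down stand-in for mdinfo; the real structure has many more fields. */
struct disk_info {
	int raid_disk;		/* assumed: negative means spare */
	struct disk_info *next;
};

/* Count the spares in a per-disk list. */
int count_spares(const struct disk_info *list)
{
	int n = 0;

	for (; list; list = list->next)
		if (list->raid_disk < 0)
			n++;
	return n;
}

/* Stand-in for sysfs_free(): release every node of the list. */
void free_disk_list(struct disk_info *list)
{
	while (list) {
		struct disk_info *next = list->next;

		free(list);
		list = next;
	}
}
```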
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Each container has a list of its subarrays. Each subarray
has a back link to its parent container.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Monitor() has become way too big. Break it up into multiple smaller
functions that are all called from the main loop.
Signed-off-by: NeilBrown <neilb@suse.de>
For subarrays, record the devid of the parent.
For other arrays, record the metadata type.
This will be used in a subsequent patch to link related arrays
together and allow spare migration between containers.
Signed-off-by: NeilBrown <neilb@suse.de>
utime is not correct for external metadata so we must
not risk the observed time ever matching the old time.
Reported-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The --no-sharing option disables moving spares between arrays/containers.
Without the option, spares are moved if needed according to config rules.
We only allow one process started with the --scan option to move spares.
If such a process is running and another instance of Monitor
is started without --scan, we issue a warning but allow it
to continue.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
mse can be NULL when the array was not in mdstat when we read it
but existed in statelist and was recreated after reading mdstat.
In this case we set err as we can't get a full update on this array
this time.
If the same array is given twice on the command line it appears twice
in statelist. The first one will mark mse->devnum=INT_MAX
so the second one can't find mse. We set err on the second one as
it's not needed. Otherwise, if the array became degraded, we would look
for spares twice for the same array.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When "mdadm -I" is given a device with no metadata, mdadm tries to add
it as a 'spare' somewhere based on policy.
This patch changes the behaviour in two ways:
1/ If the device is at a 'path' where a previous device was removed
from an array or container, then we preferentially add the spare to
that array or container.
2/ Previously only 'bare' devices were considered for adding as
spares. Now if action=spare-same-slot is active, we will add
non-bare devices, but *only* if the path was previously in use
for some array, and the device will only be added to that array.
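
The eligibility rule in point 2 can be sketched as a predicate. The
function name and its three inputs are hypothetical; how each is
computed is left out.

```c
/* Hypothetical predicate:
 *   is_bare          - device carries no recognisable data
 *   same_slot_active - action=spare-same-slot applies to this device
 *   path_was_in_use  - a device at this path was previously removed
 *                      from some array or container */
int may_add_as_spare(int is_bare, int same_slot_active, int path_was_in_use)
{
	if (is_bare)
		return 1;	/* as before: policy decides where it goes */
	return same_slot_active && path_was_in_use;
}
```

When a non-bare device passes this test, the only valid target is the
array that previously used the path.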
Based on code
From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The current act_spare tests only test if it is allowed for some
metadata.
As we check each array or partitioning type, we need to double-check
that sparing is allowed for that array or partitioning type.
Signed-off-by: NeilBrown <neilb@suse.de>
We only want to try partition_try_spare if array_try_spare failed.
If it succeeded, there is nothing more to try.
Signed-off-by: NeilBrown <neilb@suse.de>
Instead of open coding (and using horrible gotos), make this
a separate function.
Also fix the check for end of device - SEEK_END doesn't work on
block devices.
Signed-off-by: NeilBrown <neilb@suse.de>
If a disk is removed from its port, the port information is
lost. Only a udev rule can provide us with this information, and then we
have to store it somehow. This patch adds writing a 'cookie' file in
the /dev/.mdadm/failed-slots directory, in the form of a file named with
the value of f<path-id>, containing the metadata type and uuid of the
array (or container) that the device was a member of. The uuid is in
exactly the same format as in the mapfile.
FAILED_SLOTS_DIR constant has been added to hold the location of
cookie files.
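
A sketch of building the cookie name and body. The helper names are
hypothetical, the file name is taken from the caller as-is, and the body
layout (metadata type, a space, then four colon-separated 32-bit hex
words) is an assumption about the mapfile uuid format.

```c
#include <stdio.h>

#define FAILED_SLOTS_DIR "/dev/.mdadm/failed-slots"

/* Build the full cookie path for a given file name.
 * Returns 0 if it fit in buf, -1 otherwise. */
int cookie_path(char *buf, size_t len, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", FAILED_SLOTS_DIR, name);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Format the cookie body: metadata type followed by the uuid. */
int cookie_body(char *buf, size_t len, const char *metadata,
		const unsigned int uuid[4])
{
	int n = snprintf(buf, len, "%s %08x:%08x:%08x:%08x\n", metadata,
			 uuid[0], uuid[1], uuid[2], uuid[3]);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```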
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When we -I -R a device in a container, we must first fail it
from each member array before we can remove it from the container.
Signed-off-by: NeilBrown <neilb@suse.de>
ID_FS_TYPE for IMSM members is set to isw_raid_member, so they must also
be handled in udev.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
<path-id> identifies the port into which a given device is plugged.
On hot-removal, udev can pass this information on for future
use (e.g. writing this name as a 'cookie', which allows detecting that a
device has been reinserted into the same port).
A --path <path-id> parameter has been added to the device removal handler
(and char *path has been added to IncrementalRemove() to pass this
value) in order to pass the path-id to this handler.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We are about to lose the loaded_container field, and we don't really
need to use it to protect the calculation of container_enough.
So drop the test.
Signed-off-by: NeilBrown <neilb@suse.de>
Separate the load_container call from the load_super call,
and use different validity tests as appropriate.
Add some general code tidying and a bit of indent change to make
the structure a little clearer.
Signed-off-by: NeilBrown <neilb@suse.de>
This is somewhat inconsistent with the last member of a
container getting special handling.
Just simplify it so the code seems to make sense and, importantly,
is easy to follow.
Signed-off-by: NeilBrown <neilb@suse.de>