Currently "mdadm --run /dev/mdX" does not handle external metadata
properly: mdmon won't be started, etc.
So use the code from "mdadm -IRs" instead - that already does all
the right things.
Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This call to validate_geometry is really rather gratuitous.
It is purely about the fact that super0 cannot use more than 4TB.
So just make it an explicit test - less confusing that way.
With this, validate_geometry is only called from Create, which
makes it easier to reason about.
Also validate_geometry is now never passed NULL for the 'chunk'
parameter, so we can remove those annoying tests for NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
If we stop too soon after reshape starts (probably only during
testing), we can get confused by the status of the reshape.
If that might be happening - sleep a bit longer.
Also allow for reshape going unusually slowly (again, probably only
during testing).
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible for 'sync_completed' to be further ahead than
we deduced from 'reshape_position'. However we cannot read it while
the array is frozen, so it is hard to know.
Once the array is unfrozen, check, and if 'sync_completed' is ahead of
'sync_max', push 'sync_max' well ahead of 'sync_completed' so it
will all synchronise up properly.
Signed-off-by: NeilBrown <neilb@suse.de>
To be able to revert a reshape of a raid4/5/6 array which is changing
the number of devices, the reshape must have been stopped on a multiple
of the old and new stripe sizes.
The kernel only enforces the new stripe size multiple.
So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".
Signed-off-by: NeilBrown <neilb@suse.de>
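As a rough illustration of the alignment requirement for revert-reshape
described above (this is not the code from the patch): assuming a stripe
size is the chunk size times the number of data devices, and positions are
counted in sectors, a position that is a multiple of both stripe sizes is a
multiple of their least common multiple. The helper names are hypothetical.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
static unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
	while (b != 0) {
		unsigned long long t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical helper: round 'pos' (in sectors) down to the nearest
 * point that is a multiple of both the old and the new stripe size.
 * Assumption: stripe size = chunk_sectors * data_disks for each layout.
 */
static unsigned long long align_to_both_stripes(unsigned long long pos,
						unsigned long long old_stripe,
						unsigned long long new_stripe)
{
	unsigned long long lcm;

	assert(old_stripe > 0 && new_stripe > 0);
	lcm = old_stripe / gcd_ull(old_stripe, new_stripe) * new_stripe;
	return pos - pos % lcm;
}
```

Rounding down is only one choice; a reshape that moves towards lower
addresses would presumably want to round the other way.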
Manage_runstop has an open-coded version of use_udev() which is no
longer correct. So make it use use_udev() explicitly.
Signed-off-by: NeilBrown <neilb@suse.de>
A RAID10 array can have 'sets' of devices which are reported by
--detail.
They can now be collectively failed or removed.
Signed-off-by: NeilBrown <neilb@suse.de>
When stopping an mdmon array, a reshape might be being aborted,
which inhibits O_EXCL. So if that is possible, call flush_mdmon
to make sure mdmon isn't still busy.
Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
There are a number of fields which should not
be left uninitialised. e.g. attempt_re_add can get
confused if ->writemostly is not set correctly.
Signed-off-by: NeilBrown <neilb@suse.de>
When asked to incrementally-remove a device, try marking the array
read-auto first. That will delay recording the failure in the
metadata until it is really relevant.
This way, if the devices are just unplugged when the array is not
really in use, the metadata will remain clean.
If marking the device as faulty fails because it is EBUSY, that
implies that the array would be failed without the device. As the
device has (presumably) gone, that means the array is dead. So try
to stop it. If that fails because it is in use, send a uevent to
report that it is gone. Hopefully whoever mounted it will now let go.
This means that if you plug in some devices and they are
auto-assembled, then unplugging them will auto-deassemble relatively
cleanly.
To be complete, we really need the kernel to disassemble the array
after the last close somehow. Maybe if a REMOVE has failed and a STOP
has failed and nothing else much has happened, it could safely stop
the array on last close.
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
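A minimal sketch of how an old numeric devnum could be mapped to the new
string form. The function names are hypothetical, and the encoding of
negative numbers as -1-N (so that -1 means md_d0) is an assumption, not
something stated in this message.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define DEVNM_LEN 32	/* the "32char string" mentioned above */

/*
 * Hypothetical conversion from a numeric devnum to a devnm string.
 * Assumption: negative values encode md_d%d as -1-N.
 */
static void devnum_to_devnm(int devnum, char devnm[DEVNM_LEN])
{
	assert(devnm != NULL);
	if (devnum >= 0)
		snprintf(devnm, DEVNM_LEN, "md%d", devnum);
	else
		snprintf(devnm, DEVNM_LEN, "md_d%d", -1 - devnum);
}

/*
 * Hypothetical test for the md_d%d form.  A plain prefix test is not
 * enough, because a name such as "md_data" also starts with "md_d".
 */
static int devnm_is_md_d(const char *devnm)
{
	const char *p;

	if (strncmp(devnm, "md_d", 4) != 0)
		return 0;
	p = devnm + 4;
	if (*p == '\0')
		return 0;
	while (isdigit((unsigned char)*p))
		p++;
	return *p == '\0';
}
```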
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm /dev/mdXX --re-add faulty
will identify any faulty devices in the array, remove them, and
--re-add them.
Signed-off-by: NeilBrown <neilb@suse.de>
A recent change to improve error messages for subdev management broke
all use cases where device names like %d:%d were used.
Re-arrange the code again so we use dev_open first - which understands
those names - and then only try 'stat' if that failed.
The important thing is to base the 'Cannot find' message on the result
of 'stat', not on the result of 'open'.
Signed-off-by: NeilBrown <neilb@suse.de>
As dev_open uses O_DIRECT it will fail on directories and such.
So we never get to report that it isn't a block device.
So do a 'stat' earlier and if it is a block device, report the
error there.
Signed-off-by: NeilBrown <neilb@suse.de>
--replace can be used to replace a device without completely failing
it. Once the replacement completes the device will be failed.
--with can indicate which of several spares to use.
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...
The size is in K unless a suffix: K M G is given.
The suffix 's' means sectors.
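The suffix rules above could be sketched as follows. This is an
illustration, not the parser mdadm uses: the function name is hypothetical,
K/M/G are assumed to be binary multiples, a sector is assumed to be 512
bytes, and overflow is not checked.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical parser: convert "1024", "512M", "2G" or "4096s" to
 * 512-byte sectors.  Returns 0 for malformed input (a real
 * implementation would want a distinct error value).
 */
static unsigned long long size_to_sectors(const char *str)
{
	char *end;
	unsigned long long n;

	assert(str != NULL);
	n = strtoull(str, &end, 10);
	if (end == str)
		return 0;		/* no digits at all */

	switch (*end) {
	case '\0':			/* no suffix: size is in K */
		return n * 2;
	case 'K':
		n *= 2;
		break;
	case 'M':
		n *= 2 * 1024ULL;
		break;
	case 'G':
		n *= 2 * 1024ULL * 1024ULL;
		break;
	case 's':			/* already in sectors */
		break;
	default:
		return 0;		/* unknown suffix */
	}
	end++;				/* step over the suffix */
	return *end == '\0' ? n : 0;	/* reject trailing junk */
}
```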
Signed-off-by: NeilBrown <neilb@suse.de>
This is currently only useful for 1.x metadata and will allow an
explicit --data-offset request on the command line.
Signed-off-by: NeilBrown <neilb@suse.de>
We must only remove from a container if the device isn't a
member of any member array.
To check we look at the 'holders' directory in sysfs.
We currently skip that check if ->devname is "detached", however
that can never be true since the change that introduced
add_detached().
Also sysfs_unique_holder returns status in 'errno' which isn't
entirely safe as e.g. closedir() is probably allowed to clear it.
So make sysfs_unique_holder return an unambiguous value, and use
it to decide what to report.
Signed-off-by: NeilBrown <neilb@suse.de>
'external' arrays don't support --re-add yet so old metadata is of no
value, and 'ddf' gets confusing in mdmon if old metadata is found.
So for now, zero out any old metadata found before adding a spare to
an externally-managed array.
Reported-by: Albert Pauw <albert.pauw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This patch fixes the following "make everything" compilation error:
Manage.c: In function ‘Manage_add’:
Manage.c:538: error: ‘dev_st’ may be used uninitialized in this function
make: *** [mdadm.Os] Error 1
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This makes Manage_subdevs smaller, and makes the error-path handling
for Manage_add much cleaner and probably less buggy.
Signed-off-by: NeilBrown <neilb@suse.de>
The indent level is way too deep here, and this is a well defined
task, so split it out to a separate function.
Signed-off-by: NeilBrown <neilb@suse.de>
'st' is used to examine the metadata on the device being added
to see if a 're-add' is possible. However it is loaded long before
the 're-add' attempt is made.
So move the 'load_super' closer to where it is used - allowing us to
discard a number of 'free_super' calls - and rename it to 'dev_st'
to emphasize that it relates to the current device.
Signed-off-by: NeilBrown <neilb@suse.de>
We currently have a rather hard-to-follow loop to iterate
through all the matches for 'missing' or 'faulty' or 'detached'.
Simplify it by creating a list of possible devices for each
of those and splicing the new list into the device list.
This removes the need for 'jnext' and 'next' and various other hacks.
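A simplified sketch of the splice, for illustration only. The structure
and function names are hypothetical and much smaller than the real device
list entries; the point is just that a keyword entry is replaced in place
by a list of concrete devices, so the main loop needs no special cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down device list entry. */
struct dev_entry {
	const char *devname;
	struct dev_entry *next;
};

/*
 * Replace the entry '*pos' (e.g. the keyword "faulty") with the list
 * 'expansion' of concrete devices.  'pos' is the address of the pointer
 * that links to the keyword entry.  Returns the address of the link
 * that follows the spliced-in entries, so the caller can carry on from
 * there.  The removed keyword entry is unlinked but not freed.
 */
static struct dev_entry **splice_devices(struct dev_entry **pos,
					 struct dev_entry *expansion)
{
	struct dev_entry *keyword = *pos;
	struct dev_entry **tail = pos;

	assert(keyword != NULL);
	*pos = expansion;		/* link the new entries in (may be NULL) */
	while (*tail)
		tail = &(*tail)->next;	/* find the end of the expansion */
	*tail = keyword->next;		/* re-attach the rest of the list */
	keyword->next = NULL;
	return tail;
}
```

If the expansion is empty, the keyword entry simply drops out of the list.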
Signed-off-by: NeilBrown <neilb@suse.de>
If we change some functions to accept 'verbose', where <0 means to be
quiet, in place of 'quiet', then we will be able to merge
'quiet' and 'verbose' together for simplicity.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
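The wrappers could look roughly like this. It is a sketch rather than the
code added by the patch: the message text and the exit status are
assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Report the failure and exit; never returns.  Exit status is assumed. */
static void alloc_failed(size_t len)
{
	fprintf(stderr, "memory allocation of %zu bytes failed - aborting\n", len);
	exit(EXIT_FAILURE);
}

static void *xmalloc(size_t len)
{
	void *rv = malloc(len);

	if (rv == NULL && len != 0)	/* malloc(0) may legitimately return NULL */
		alloc_failed(len);
	return rv;
}

/* Zeroed allocation: replaces a "malloc; memset" sequence. */
static void *xcalloc(size_t num, size_t size)
{
	void *rv = calloc(num, size);

	if (rv == NULL && num != 0 && size != 0)
		alloc_failed(num * size);
	return rv;
}

static void *xrealloc(void *ptr, size_t len)
{
	void *rv = realloc(ptr, len);

	if (rv == NULL && len != 0)
		alloc_failed(len);
	return rv;
}

static char *xstrdup(const char *str)
{
	size_t len;
	char *rv;

	assert(str != NULL);
	len = strlen(str) + 1;		/* include the terminating NUL */
	rv = xmalloc(len);
	memcpy(rv, str, len);
	return rv;
}
```

Callers can then use the result directly, with no test for NULL.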
Signed-off-by: NeilBrown <neilb@suse.de>
The restriction that --add was not allowed on a device which
looked like a recent member of an array was overly harsh.
The real requirement was to avoid using --add when the array had
failed, and the device being added might contain necessary
information which can only be incorporated by stopping and
re-assembling with --force.
So change the test to reflect the need.
Reported-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When we scan for devices using GET_DISK_INFO we currently
limit to 1024. But some arrays can have more than this.
So raise it to 4096 and make the constant a #define.
Signed-off-by: NeilBrown <neilb@suse.de>
If the kernel supports it, freeze recovery over multiple adds,
so that they can all be added to the array at the same time and
be recovered in parallel.
Signed-off-by: NeilBrown <neilb@suse.de>
If both "legs" of a RAID1 (or equivalent in RAID10) fail, then one
of them becomes available again, it may be appropriate to re-add the
failed device(s).
So remove the restriction that an array must have 'enough' devices
before being re-added, and if there is nowhere to read a superblock
from for matching, then assume the kernel will do necessary checks.
Signed-off-by: NeilBrown <neilb@suse.de>
If you create a two drive raid1 array with one device writemostly, then
fail the readwrite drive, when you add a new device, it will get the
writemostly bit copied out of the remaining device's superblock into
its own. You can then remove the new drive and readd it as readwrite,
which will work for the readd, but it leaves the stale WriteMostly1 bit
in devflags resulting in the device going back to writemostly on the
next assembly.
The fix is to make sure that A) when we readd a device and we might have
filled the st->sb info from a running device instead of the device being
readded, then clear/set the WriteMostly1 bit in the super1 struct in
addition to setting the disk state (ditto for super0, but slightly
different mechanism) and B) when adding a clean device to an array (when
we most certainly did copy the superblock info from an existing device),
then clear any writemostly bits.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
0.90 arrays can only use up to 4TB per device. So when a larger
device is added, complain a bit. Still allow it if --force is given
as there could be a valid use.
Signed-off-by: NeilBrown <neilb@suse.de>
The loop over all member devices in enough_fd could easily stop
before it had found all devices. This would cause --re-add to
fail incorrectly.
So change the loop to be based on the reported number of devices
in the array - with a safe-guard limit of 1024.
Change some other loops to be more careful too.
Reported-by: "Schmidt, Annemarie" <Annemarie.Schmidt@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If using an old kernel we should still check if a re-add might be
intended, so we can refuse and require a '--zero' first if it is not
possible.
Signed-off-by: NeilBrown <neilb@suse.de>
Stopping an md array requires that there is no other user of it.
However with udev and udisks and such there can be transient other
users of md devices which can interfere with stopping the array.
If there are transient users, we really want "mdadm --stop" to wait a
little while and retry.
However if the array is genuinely in-use (e.g. mounted), then we
don't want to wait at all - we want to fail immediately.
So before trying to stop, re-open device with O_EXCL. If this fails
then the device is probably in use, so give up.
If it succeeds, but a subsequent STOP_ARRAY fails, then it is possibly
a transient failure, so try again for a few seconds.
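The control flow could be sketched like this. The two operations are
abstracted as callbacks (hypothetical names) so the logic can be shown
without a real md device; the retry count and delay are left to the
caller, and treating only EBUSY as transient is an assumption.

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical operations on an array. */
struct stop_ops {
	int (*open_excl)(void *ctx);	/* 0 if an O_EXCL open succeeded */
	int (*stop)(void *ctx);		/* 0 on success, -1 with errno set */
};

static int stop_array_patiently(const struct stop_ops *ops, void *ctx,
				int max_tries, long delay_ms)
{
	struct timespec delay;
	int tries;

	assert(ops != NULL && max_tries > 0);
	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (delay_ms % 1000) * 1000000L;

	if (ops->open_excl(ctx) != 0)
		return -1;		/* genuinely in use: fail at once */

	for (tries = 1; ; tries++) {
		if (ops->stop(ctx) == 0)
			return 0;
		if (errno != EBUSY || tries >= max_tries)
			return -1;	/* hard error, or out of patience */
		nanosleep(&delay, NULL);	/* possibly transient: retry */
	}
}

/* Stand-in for a real array, used only to exercise the loop. */
struct fake_array {
	int in_use;		/* non-zero: exclusive open fails */
	int busy_left;		/* stop attempts that will return EBUSY */
	int stop_calls;		/* how often stop was attempted */
};

static int fake_open_excl(void *ctx)
{
	struct fake_array *a = ctx;

	if (a->in_use) {
		errno = EBUSY;
		return -1;
	}
	return 0;
}

static int fake_stop(void *ctx)
{
	struct fake_array *a = ctx;

	a->stop_calls++;
	if (a->busy_left > 0) {
		a->busy_left--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}

static const struct stop_ops fake_ops = { fake_open_excl, fake_stop };
```

With the fake array, a mounted device is rejected before any stop attempt,
while a device with a transient user is stopped after a few retries.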
Signed-off-by: NeilBrown <neilb@suse.de>
sync_metadata() requires st->sb to be loaded, otherwise an exception is
generated. This makes expansion fail, because spares cannot be added.
The metadata update now uses the tst pointer instead of st, which is
better than loading the anchor for st as I proposed previously.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Issue observed:
Stopping arrays using the "mdadm -Ss" command sporadically does not succeed.
Cause:
Writing "inactive" to the array state does not succeed because the array
is busy (accessed by udev, blkid etc.)
Resolution:
If writing 'inactive' fails, wait and retry (because it is possibly
a transient failure)
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When devnum2devname() is used to supply the input for ping_monitor(),
the returned string pointer should be passed to free() to release the
memory. This is not done in several places. This use case should have
a function of its own to avoid the memory leak.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When I separated the 'native metadata' case more cleanly from the
"external metadata" case for adding a drive, I left some 'external'
code in the 'native' case, and didn't copy it to the 'external' case.
When - in the external case - we add to super, we must check for
mdmon first, so we know whether to do the metadata update ourselves
or not, then afterwards call either flush_metadata_updates (to send
to mdmon) or sync_metadata (to do it directly).
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
add_to_super could use information from the current superblock (ddf
does), so add_to_super for external metadata should be called with
the O_EXCL lock held on the container to ensure the update is complete
before any other process tries to make any changes (like adding
another device to array).
Signed-off-by: NeilBrown <neilb@suse.de>
If an --add is requested and a re-add looks promising but fails or
cannot possibly succeed, then don't try the add. This avoids
inadvertently turning devices into spares when an array is failed but
the devices seem to actually work.
Signed-off-by: NeilBrown <neilb@suse.de>
Loading a container may fail if e.g. one of the disks in the container
has been detached but udev has not noticed the change.
Adding to such an array will fail because reading the superblock
from one of the disks in the array fails.
The current message is a bit confusing.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If a request to remove all 'failed' or 'detached' devices chooses to
remove the first device, it will not actually try the removal and will
skip any following devices.
This fixes it.
Reported-by: Rémi Rérolle <rrerolle@lacie.com>
Tested-by: Rémi Rérolle <rrerolle@lacie.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Now that write_init_super doesn't close fds any more, we need
to call free_super before the ADD_NEW_DISK ioctl.
Also call free_super before some error returns, for cleanliness.
Signed-off-by: NeilBrown <neilb@suse.de>
We previously closed all 'fds' associated with an array in
write_init_super .. sometimes, and sometimes at bad times.
This isn't neat and free_super is a better place to close them.
So make sure free_super always closes the fds that the metadata
manager kept hold of, and stop closing them in write_init_super.
Also add a few more calls to free_super to make sure they really do
get closed.
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When the user wants to add spares to a container with only raid0 arrays,
it is not possible to update the metadata because mdmon is not running.
To allow for this, mdadm updates the metadata directly in such a case.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This is useful with 1.1 and 1.2 metadata to update the metadata if
the device size has changed.
The same functionality can be achieved by writing to the device size
in sysfs after re-adding normally, but in some cases this might be
easier.
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm --readwrite <subarray> will clear the external readonly flag ('-'
to '/'), but only for redundant arrays. Allow raid0 arrays as well so
the user has a simple helper to control this flag.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Remove the _t pointer typedef and remove the _s suffix for the
structure.
These things do not help readability.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than hiding this in the 'st', return it explicitly.
In the one case we still need it, copy it into st where needed.
This will disappear in a future patch.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than hiding this arg in the 'st' structure, pass it explicitly.
This is a first step to getting rid of 'subarray' from 'supertype'.
The strcpy in open_subarray should have better error checking, but it
will disappear soon so there is little point.
Signed-off-by: NeilBrown <neilb@suse.de>
To accurately detect when an array has been split and is now being
recombined, we need to track which other devices each device thinks
are working.
We should never include a device in an array if it thinks that the
primary device has failed.
This patch just allows get_info_super to return a list of devices
and whether they are thought to be working or not.
Signed-off-by: NeilBrown <neilb@suse.de>
If udev is not in use, we create devices in /dev when assembling
arrays and remove them when stopping the array.
However it may not always be correct to remove the device. If
the array was started with kernel auto-detect, then mdadm didn't
create anything and so shouldn't remove anything.
We don't record whether we created things, so just don't remove
anything with a 'standard' name. Only remove symlinks to the
standard name as we almost certainly created those.
Reported-by: Petre Rodan <petre.rodan@avira.com>
Signed-off-by: NeilBrown <neilb@suse.de>
One: a single character typo ('of' instead of 'or' in an error printout)
Two: Audited usage of the tfd file descriptor. Make sure that the tfd file
is always closed after usage, and that the tfd variable is reset to -1
if we are going to continue in our loop (not necessary if we know we
will return from our function without going through the dv loop again).
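The close-and-reset idiom from the second point, as a minimal sketch. The
helper name is hypothetical; the patch itself may simply open-code the two
statements at each site.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Close '*fd' if it is open and mark it as closed, so that a later
 * pass through the loop cannot close or reuse a stale descriptor.
 */
static void close_and_reset(int *fd)
{
	assert(fd != NULL);
	if (*fd >= 0)
		close(*fd);
	*fd = -1;
}
```

Calling it a second time on the same variable is harmless, which is the
property the loop relies on.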
Signed-off-by: Doug Ledford <dledford@redhat.com>
--test can be given in Manage mode.
This can be used when there is an attempt to fail or remove 'faulty',
'failed' or 'detached' devices, or to re-add 'missing' devices.
If no devices were failed, removed, or re-added, then mdadm will
exit with status '2'.
Signed-off-by: NeilBrown <neilb@suse.de>
If the device name "missing" is given for --re-add, then mdadm will
attempt to find any device which should be a member of the array but
currently isn't and will --re-add it to the array.
This can be useful if a device disappeared due to a cabling problem,
and was then re-connected.
The appropriate sequence would be
mdadm /dev/mdX --fail detached
mdadm /dev/mdX --remove detached
mdadm /dev/mdX --re-add missing
Signed-off-by: NeilBrown <neilb@suse.de>
When using 0.90 metadata, devices can be renumbered when
earlier devices are removed.
So when iterating all devices looking for 'failed' or 'detached'
devices, we need to re-check the same slot we checked last time
to see if maybe it has a different device now.
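A toy model of the problem, for illustration only: the array of ints
stands in for the device slots, and removing one entry shifts the later
ones down, as described for 0.90 metadata. The function names are
hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Removing slot 'i' renumbers every later entry down by one. */
static void remove_slot(int *state, int *count, int i)
{
	assert(i >= 0 && i < *count);
	memmove(&state[i], &state[i + 1],
		(size_t)(*count - i - 1) * sizeof(state[0]));
	(*count)--;
}

/* Remove every entry equal to 'faulty'; returns how many were removed. */
static int remove_all_faulty(int *state, int *count, int faulty)
{
	int removed = 0;
	int i = 0;

	while (i < *count) {
		if (state[i] == faulty) {
			remove_slot(state, count, i);
			removed++;
			/* do not advance: slot i now holds a different device */
		} else {
			i++;
		}
	}
	return removed;
}
```

A loop that always advanced would skip the second of two adjacent faulty
devices.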
Reported-by: Jim Paris <jim@jtan.com>
Resolves-Debian-Bug: 587550
Signed-off-by: NeilBrown <neilb@suse.de>
This can be used for hot-unplug. When a device has been removed,
udev can call
mdadm --incremental --fail sda
and mdadm will find the array holding sda and remove sda from
the array.
Based on code from Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Allow kernel names like "sda" and "hdb1" to be used to
fail/remove devices from an array.
This is useful as after a device has been removed it can be difficult
to get the major/minor number.
Signed-off-by: NeilBrown <neilb@suse.de>
Allow the name of the array stored in the metadata to be updated. In
some cases the metadata format may not be able to support this rename
without modifying the UUID. In these cases the request will be blocked.
Otherwise we allow the rename to take place, even for active arrays.
This assumes that the user understands the difference between the kernel
node name, the device node symlink name, and the metadata specific name.
Anticipating further need to modify subarrays in-place, introduce the
->update_subarray() superswitch method. A future potential use
case is setting storage pool (spare-group) identifiers.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
mdadm prevents creation when device names are duplicated on the command
line, but leaves the partially created array intact. Detect this case
in the error code from add_to_super() and cleanup the partially created
array. The imsm handler is updated to report this conflict in
add_to_super_imsm_volume().
Note that since neither mdmon, nor userspace for that matter, ever saw an
active array we only need to perform a subset of the cleanup actions.
So call ioctl(STOP_ARRAY) directly and arrange for Create() to cleanup
the map file rather than calling Manage_runstop().
Reported-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If /dev is static, a name may remain there after the
device has been detached from the system.
Using 'mdadm' to remove such a device from the array
should still work (even though "mdadm --remove detached"
might be preferred).
So when processing a device for '-r', don't insist on
being able to open the device.
Signed-off-by: NeilBrown <neilb@suse.de>
Minimal changes needed to permit reassembling partially recovered
external metadata arrays. The biggest logical change is that
->container_content() can now surface partially rebuilt members rather
than omitting them from the disk list.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If a device is marked as faulty, then a re-add will cause it to be
added as a faulty drive, which is not what is wanted.
So just refuse to try to re-add a device which is marked 'faulty'.
Signed-off-by: NeilBrown <neilb@suse.de>
As --add can destroy important data on a disk, and
--re-add is not supposed to, it is wrong to silently
try --add if --re-add fails.
So print a message and abort instead.
Signed-off-by: NeilBrown <neilb@suse.de>
As mdadm is normally a short-lived program it isn't always necessary
to free memory that was allocated, as the 'exit()' call will
automatically free everything. But it is more obviously correct if
the 'free' is there.
So this patch adds a few calls to 'free'.
Signed-off-by: NeilBrown <neilb@suse.de>