It's not necessary to check for 0xffffffff, which is a valid
sequential number.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When stopping an mdmon array, at reshape might be being aborted
which inhibets O_EXCL. So if that is possible, call flush_mdmon
to make sure mdmon isn't still busy.
Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Transition from "degraded" to "recovery" made in OROM is slightly different
than the same transision in mdadm. Missing disk is not removed from list of
raid devices, but just from map. Therefore mdadm should not end migration
basing on existence of list of missing disks but should rely on count of
failed disks.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Tested-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Mdmon does not update component_size now. It is wrong because in case
of size's expansion component_size is changed by mdadm but mdmon does not
reread its new value and uses a wrong, old one. As a result the metadata
is incorrect during size's expansion. It contains no information that
resync is in progress (there is no checkpoint too). The metadata is
as if resync has already been finished but it has not.
Component_size will be set to match information in sysfs. This value
will be updated by manager thread in manage_member() function.
Now mdmon uses the correct, current value of component_size and the
correct metadata (containing information about resync and checkpoint)
is written.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
There are a number of fields which should not
be left uninitialised. e.g. attempt_re_add can get
confused if ->writemostly is not set correctly.
Signed-off-by: NeilBrown <neilb@suse.de>
Here, "large" means components are 100G or more. It is
usually beneficial to have write-intent bitmaps on such arrays.
They can be suppressed with --bitmap=none
Signed-off-by: NeilBrown <neilb@suse.de>
When asked to incrementally-remove a device, try marking the array
read-auto first. That will delay recording the failure in the
metadata until it is really relevant.
This way, if the device are just unplugged when the array is not
really in use, the metadata will remain clean.
If marking the default as faulty fails because it is EBUSY, that
implies that the array would be failed without the device. As the
device has (presumably gone) - that means the array is dead. So try
to stop it. If that fails because it is in use, send a uevent to
report that it is gone. Hopefully whoever mounted it will now let go.
This means that if you plug in some devices and they are
auto-assembled, then unplugging them will auto-deassemble relatively
cleanly.
To be complete, we really need the kernel to disassemble the array
after the last close somehow. Maybe if a REMOVE has failed and a STOP
has failed and nothing else much has happened, it could safely stop
the array on last close.
Signed-off-by: NeilBrown <neilb@suse.de>
Without calling load_container at this point, the
info structure may be missing some important information.
In particular, information about secondary DDF RAID levels
may be wrong if information is only read from a single disk.
If this fails, fall back to the previous code.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
If a match is found in compare_super_ddf, check the other SB
for local DDF information (VD config records, physical disk data)
which is not available in the current superblock, and add it
if needed.
This is important for the mdmon - when disks are added to a
auto read-only array, they must be present in the DDF structure
in order to guarantee consistent writeback of metadata to all
disks.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Besides container GUID, also check seqnum, physical and virtual
disk numbers, and check match between local and global sections.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When writing back the DDF structure, make sure that on each disk
we write the configs that include this disk even if a secondary
RAID level is present. Otherwise the secondary RAID will not be
read correctly any more when we open the device next time.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Check for supportable secondary RAID configurations.
There is currently only one: RAID 10, if the stripe
sizes and Basic volume sizes are all equal.
With this patch, mdadm will not try to start unsupported
secondary RAID level configurations any more.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When searching for container elements, loop over the known phys
disks rather than the elements of the current configuration.
This patch changes nothing in the logic or return value of the code.
It just prepares extended logic for handling RAID10.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Store VD config for other BVDs in the other_bvds array.
This allows handling secondary RAID levels in container_content_ddf.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The VD config structures of different BVDs in the same SVD may be
different. This pointer stores the other BVDs.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Cleanly increase the seq number when the DDF structures are
written, instead of always setting it back to 1.
Also, make sure that the sequential number of all headers and
VD conf records is the same.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Some RAID BIOSes apparently use hard-coded LBA offsets (presumably
from the end of the disk) for the primary and secondary DDF
structure, ignoring the values given in the DDF anchor. This is
broken BIOS behavior, but it will cause any changes made by MD
(e.g. setting the init_state flag after a full initialization)
to be "forgotten" after the next reboot.
This patch fixes this by using the exiting LBA locations if
available. Verified that this fixes MD+LSI Mega Software RAID
BIOS.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
So far, mdadm only saved the header of the secondary structure.
With this patch, the full secondary DDF structure is saved
consistently, too. Some vendor DDF implementations need it.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
As 'layout' doesn't map neatly from RAID4 to RAID5, we need to
set it correctly for RAID4.
Also, when no reshape is needed we should set re->level to the final
desired level.
Signed-off-by: NeilBrown <neilb@suse.de>
Right now, the rules that run blkid on raid arrays are executed after
the assembly rules. This means incremental assembly will always fail
when raid arrays are again physical components of raid arrays.
Instead of simply reversing the order, split the rules up into two files,
one dealing with array properties and one dealing with assembly.
Signed-off-by: NeilBrown <neilb@suse.de>
* $tempnode is deprecated, use $devnode
* blkid -o udev output is deprecated, use IMPORT{builtin}="blkid" instead
Signed-off-by: NeilBrown <neilb@suse.de>
the code that was exposed on anything else than dietlibc and klibc
is entirely glibc specific and broke the build on musl libc.
Signed-off-by: John Spencer <maillist-mdadm@barfooze.de>
Signed-off-by: NeilBrown <neilb@suse.de>
this is a GLIBC specific feature and should not be used.
according to its manpage:
"The call canonicalize_file_name(path) is equivalent
to the call realpath(path, NULL)."
thus, we use realpath so it works everywhere.
Signed-off-by: John Spencer <maillist-mdadm@barfooze.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Commit 1e2b276535 (Report error in --update
string is not recognised) broke homehost updating functionality because it
depended on each string comparison being done even after we already found
a match. Make it work again by restructuring code.
Reported-by: (and original version by) Justin Maggard <jmaggard10@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Now that we use O_DIRECT for all device IO, BLKFLSBUF is not needed to
ensure we get current data, and it can impose a cost if any flush-out
is needed. So remove it.
To be safe, add O_DIRECT to one place where it isn't currently used:
when reading a bitmap.
Signed-off-by: NeilBrown <neilb@suse.de>
If externally menaged metadata is in use, array.major_version will
be zero, so the test here to consider using get_component_size()
is wrong. So if sra is present, use the major_version from there.
Signed-off-by: NeilBrown <neilb@suse.de>
While not strictly necessary for systemd, it is cleaner to avoid
forking when running from a management daemon. So add a --foreground
option to mdmon.
Signed-off-by: NeilBrown <neilb@suse.de>
If launching mdmon via systemctl fails, we fall back to the old method
of fork/exec. This allows for having mdmon launched via systemctl
which avoids problems with it getting killed by systemd due to it
ending up in the parent's cgroup (udev).
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We still allow --offroot to be given - for compatibility with scripts
- but ignore it.
The whole point of --offroot is to get systemd to not auto-kill mdmon,
and we always want that.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
action=re-add will only re-add a recently removed device if a
bitmap is present.
Otherwise a force-space is needed.
Signed-off-by: NeilBrown <neilb@suse.de>
map_dev can be slow, and doesn't really provide a better result
than just creating a temporary device.
So discard it and use mknod/open/unlink to open a major:minor device.
Signed-off-by: NeilBrown <neilb@suse.de>
find_intel_devices() has take a little while to run as it scans
some directory tree, and the result isn't likely to change
often.
So cache the value and only discard it after 10 seconds.
Signed-off-by: NeilBrown <neilb@suse.de>
map_dev can be slow so it is best to not call it when
not necessary.
The final test in "find_free_devnum" is not relevant when
udev is being used, so remove the test in that case.
Signed-off-by: NeilBrown <neilb@suse.de>
When auto-assembling we might find an array which appear in
mdadm.conf.
This can happen if the array (based on UUID) doesn't match what is
in mdadm.conf.
For consistency we should avoid auto-assembling such an array just as
we avoid regular-assembling of the array.
Reported-by: Ross Boylan <ross@biostat.ucsf.edu>
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm /dev/mdXX --re-add faulty
will identify any faulty devices in the array, remove them, and
--re-add them.
Signed-off-by: NeilBrown <neilb@suse.de>
A recent change to improve error messages for subdev management broken
all use cases were device names like %d:%d were used.
Re-arrange the code again so we use dev_open first - which understands
those names - and then only try 'stat' if that failed.
The important thing is to base the 'Cannot find' message on the result
of 'stat', not on the result of 'open'.
Signed-off-by: NeilBrown <neilb@suse.de>
It isn't enough to simply not assemble arrays found to be called
<ignore>, as the final stage of auto-assemble doesn't check for names
in mdadm.conf.
So add a check to Assemble, similar to the check in Incremental()
Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NeilBrown <neilb@suse.de>
We currently complain if mdadm.conf contains multiple
definitions for the same name. Unfortunately this stops
multiple arrays from being <ignored>d.
So exclude "<ignore>" from the duplicate-names test.
Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NeilBrown <neilb@suse.de>
If a resync is delayed, then e->percent will be negative but not
RESYNC_NONE. In that case we still want to wait.
Reported-by: Ross Boylan <ross@biostat.ucsf.edu>
Signed-off-by: NeilBrown <neilb@suse.de>
commit 1f9b0e2845
Grow - be careful about 'delayed' reshapes.
Introduced a bug where a list of devices longer than 1
would cause an infinite loop. Oops.
Signed-off-by: NeilBrown <neilb@suse.de>
'test' is really a bash script more than an 'sh' script, so
don't say "run 'sh ./test'", just say "run './test'".
Reported-by: Gilles Espinasse <g.esp@free.fr>
Signed-off-by: NeilBrown <neilb@suse.de>
It fixes the following uninitialized variables compilation-time error:
WARN - Grow.c: In function ‘reshape_array’:
WARN - Grow.c:2413:21: error: ‘min_space_after’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
WARN - Grow.c:2376:39: note: ‘min_space_after’ was declared here
WARN - Grow.c:2414:22: error: ‘min_space_before’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
WARN - Grow.c:2376:21: note: ‘min_space_before’ was declared here
WARN - cc1: all warnings being treated as errors
WARN - make: *** [Grow.o] Error 1
It occurs during compilation of mdadm on Fedora 17.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>