1.x metadata allows a device to be a member of the array while it
is still recoverying. So it is a working member, but is not
completely in-sync.
mdadm/assemble does not understand this distinction and assumes that a
work member is fully in-sync for the purpose of determining if there
are enough in-sync devices for the array to be functional.
So collect the 'recovery_start' value from the metadata and use it in
assemble when determining how useful a given device is.
Reported-by: Mikael Abrahamsson <swmike@swm.pp.se>
Signed-off-by: NeilBrown <neilb@suse.de>
Replace occurrences of ~0ULL to make it clear we are talking about maximal
resync/recovery position.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We don't need to sprinkle reads of this attribute all over the place,
just once at the entry of read_and_act(). Also, the mdinfo structure
for the array already has a 'resync_start' member, so just reuse that.
Finally, rename get_resync_start() to read_resync_start to make it
consistent with the other sysfs accessors in monitor.c.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
->validate_geometry is called to validate overall parameters,
and to validate each individual device.
If it ever fails, it needs to report the reason, as common code
cannot possible know.
Signed-off-by: NeilBrown <neilb@suse.de>
Previously such things did not exist: ACTIVE and SYNC were either both
set or both clear. Recent changes with reshape means that a device
can be ACTIVE but not yet fully in-sync, so they need to be handled
and included in the array as active devices.
Signed-off-by: NeilBrown <neilb@suse.de>
The full fix would be to support updating ddf metadata, but this minimal
fix just prevents the superblock from being zeroed when someone
inadvertently passes an unsupported --update option during assembly.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A previous patch moved move the '--examine --brief' reporting of
member arrays to before their containers. This breaks "mdadm -As"
assembly. So put them back, but still fix the problem addressed by
previous patch.
Signed-off-by: NeilBrown <neilb@suse.de>
The family_number field can change. The option-rom will change the
family number when it starts a rebuild process (flags a container for
rebuild). This was not seen previously as mdadm would usually start the
rebuild process, preserving the family number.
This is the mechanism that helps to prevent a prodigal array member from
being returned to its original system and cause a rebuild to go in the
wrong direction. With the change we will end up with a container that
will fail to assemble unless the device with the incompatible family
number is left out of the assembly.
So, take several actions:
1/ Convert uuid generation to use orig_family_num, being careful to
preserve the existing uuid in the case where orig_family_num is not
set (i.e. previous mdadm created imsm arrays)
2/ Set orig_family_num at Create. For arrays created by mdadm prior to
this release orig_family_num will be zero, so set it to family_num at
the first metadata write.
3/ Add checks for orig_family_num to compare_super_imsm
4/ Update the family number when initiating rebuild
5/ The option-rom mixes some random data into the family number, add
this functionality to the mdadm implementation.
Reported-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The map file needs to be updated after adding the first member array to
an Intel metadata container. The uuid for an imsm container uses the
->family_num field of the metadata. This field is static, but is only
set after the first member array has been created. Prior to this all
devices are free floating spares and do not have any information that
can identify specific container membership. At Create() time we take
the uninitialized uuid from ->get_info_super() prior to updating the
metadata. So the current result is:
# mdadm --create /dev/md/imsm /dev/sd[b-e] -n 4 -e imsm
# mdadm --create /dev/md/vol0 /dev/md/imsm -n 4 -l 0
# cat /var/run/mdadm/map
md126 /md127/0 3e03aee2:78c3c593:1e8ecaf0:eefb53ed /dev/md/vol0
md127 imsm 53d6f8b1:7a783f24:f30483c5:705c48c7 /dev/md/imsm
# mdadm -Ebs
ARRAY metadata=imsm UUID=589d2d2c:4221a54d:acb63c06:c3907f52
ARRAY /dev/md/vol0 container=589d2d2c:4221a54d:acb63c06:c3907f52
member=0 UUID=57b89b63:5cd0eae1:17dd26b3:51cc78d4
So, before we write out the new metadata check to see if the member
array uuid has changed as a result of this addition. If it has, update
its uuid in the map file and flag its parent container for updating. In
support of updating the container uuid the semantics of
->write_init_super are changed to clear any metadata specific member
array cursors (e.g. ddf_super.currentconf or intel_super.current_vol)
such that a subsequent call to ->getinfo_super returns container
information.
Reported-by: Ignacy Kasperowicz <ignacy.kasperowicz@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When performing an "-Ebs -e <metadata type>" we segfault because the
superblock has been freed too early. We also leak memory for 'ddf' and
'imsm' because, unlike super[01], we do not implicitly free when
->load_super is called on an already loaded supertype.
So, fix up imsm and ddf to match type 0 and 1 ->load_super() semantics,
and update Examine to not free the superblock until all usages have been
exhausted.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
RebuildMap wants to poll through mdstat and retrieve a (kernel name,
uuid, user name) tuple for each array. Teach imsm and ddf to honor
st->sub_array at ->load_super() time to set their internal subarray
pointers to the value specified in st->subarray, or return an error if
st->subarray specifies an invalid array.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With --verbose, --examine --brief prints dev= information after
the personality has done its bit.
But with containers, the member array are printed in between.
So in super-ddf and super-intel, move printing of the member
arrays to before printing of the container. This avoids
confusion.
Signed-off-by: NeilBrown <neilb@suse.de>
Because ---examine --brief, or --detail --brief are
often used to create mdadm.conf, and because people don't want to
have to update their mdadm.conf unnecessarily, we don't want to
include information that might change.
And now that level changing is supported, that is almost everything
but UUID.
So move some more fields into the "Only print with --verbose" class.
Signed-off-by: NeilBrown <neilb@suse.de>
imsm arrays round down the effective array size to the closest 1
megabyte boundary so teach get_info_super_imsm and sysfs_set_array to
set 'md/array_size' if available (and make sure ddf uses the default
size).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Simple patch to silence some compile warnings that only show up on
64bit arches.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The 'work_disks' number should be the number that is expected, not the
number found so far. This is needed for Incremental assembly to
start the array at the right time.
Signed-off-by: NeilBrown <neilb@suse.de>
The variable 'i' was being used as a loop variable, and also
for something else inside the loop. So make the larger loop have a
more meaningful name.
Signed-off-by: NeilBrown <neilb@suse.de>
In some cases we should only print an error message if
'devname' is defined. In fact we were only returning
the error at all in that case!!
Signed-off-by: NeilBrown <neilb@suse.de>
If mdmon sees a device added to a container, it should assume it is
a new spare. It could be a part of the array that just hadn't been
assembled yet. So check first.
Signed-off-by: NeilBrown <neilb@suse.de>
If we haven't got hold of all the devices yet, we need to be
ready to skip over some while gathering content information.
Signed-off-by: NeilBrown <neilb@suse.de>
DDF raid6 layouts are subtly different from the standard 'md' layouts.
From 2.6.30 the kernel knows about these.
Teach mdadm about them, and also allow 'ddf' to set an appropriate default.
Signed-off-by: NeilBrown <neilb@suse.de>
All operations that rely on loading from an existing container (like
--add) will fail after a disk has been removed. Provide an option to
skip missing / offline disks rather than abort. We attempt to do this
in the load_super_{imsm,ddf}_all cases when mdmon is running i.e. we
already have a consitent version of the metadata running in the system.
Otherwise, we fail as normal and let the administrator fix up the
container.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Metadata formats like imsm work in concert with platform firmware and
hardware, so provide a way for mdadm to display this info to the user.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If, when creating an array, a signal target device is given which
is a container, then allow the metadata handler to choose which
devices to use.
This is currently only supported for DDF.
Signed-off-by: NeilBrown <neilb@suse.de>
When we create our own ddf array, store the homehost in the vendor
information so it can be so to ensure 'LOCAL' name choices.
Signed-off-by: NeilBrown <neilb@suse.de>
Both super-ddf and super-intel ignore memory allocation failures during
->activate_spare. Fix these up by cancelling the activation.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Made the mistake of recompiling the F9 mdadm rpm which has a patch to
remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots
of errors:
config.c:568: warning: ignoring return value of asprintf
Assemble.c:411: warning: ignoring return value of asprintf
Assemble.c:413: warning: ignoring return value of asprintf
super0.c:549: warning: ignoring return value of posix_memalign
super0.c:742: warning: ignoring return value of posix_memalign
super0.c:812: warning: ignoring return value of posix_memalign
super1.c:692: warning: ignoring return value of posix_memalign
super1.c:1039: warning: ignoring return value of posix_memalign
super1.c:1155: warning: ignoring return value of posix_memalign
super-ddf.c:508: warning: ignoring return value of posix_memalign
super-ddf.c:645: warning: ignoring return value of posix_memalign
super-ddf.c:696: warning: ignoring return value of posix_memalign
super-ddf.c:715: warning: ignoring return value of posix_memalign
super-ddf.c:1476: warning: ignoring return value of posix_memalign
super-ddf.c:1603: warning: ignoring return value of posix_memalign
super-ddf.c:1614: warning: ignoring return value of posix_memalign
super-ddf.c:1842: warning: ignoring return value of posix_memalign
super-ddf.c:2013: warning: ignoring return value of posix_memalign
super-ddf.c:2140: warning: ignoring return value of write
super-ddf.c:2143: warning: ignoring return value of write
super-ddf.c:2147: warning: ignoring return value of write
super-ddf.c:2150: warning: ignoring return value of write
super-ddf.c:2162: warning: ignoring return value of write
super-ddf.c:2169: warning: ignoring return value of write
super-ddf.c:2172: warning: ignoring return value of write
super-ddf.c:2176: warning: ignoring return value of write
super-ddf.c:2181: warning: ignoring return value of write
super-ddf.c:2686: warning: ignoring return value of posix_memalign
super-ddf.c:2690: warning: ignoring return value of write
super-ddf.c:3070: warning: ignoring return value of posix_memalign
super-ddf.c:3254: warning: ignoring return value of posix_memalign
bitmap.c:128: warning: ignoring return value of posix_memalign
mdmon.c:94: warning: ignoring return value of write
mdmon.c:221: warning: ignoring return value of pipe
mdmon.c:327: warning: ignoring return value of write
mdmon.c:330: warning: ignoring return value of chdir
mdmon.c:335: warning: ignoring return value of dup
monitor.c:415: warning: rv may be used uninitialized in this function
...some of these like the write() ones are not so trivial so save those
fixes for the next patch.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
md/resync_start reports different terminal values depending on kernel
configuration (~0UL versus ~0ULL). Make detection of the
resync-complete state more robust by comparing against array size.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
mdadm -I /dev/part-of-container
should add that to a container, creating if it needed,
and then try to assemble any arrays in the container.
Signed-off-by: NeilBrown <neilb@suse.de>
When we assemble an array, there are three different approaches
depending on whether metadata is internal or external, and on
kernel version.
Move all this to a common helper instead of duplicating in 3 places.
Signed-off-by: NeilBrown <neilb@suse.de>
When we first start an array, it might be good to start recovery
straight away. That requires setting the array to 'dirty', but
only the metadata handler can know if that is required or not.
So have a third possible 'consistent' option to set_array_state.
Either 'no' or 'yes' or 'you choose'.
Return value indicates what was chosen.
'1' (no) should be chosen unless there is a good reason.
Signed-off-by: NeilBrown <neilb@suse.de>
Using buffered IO risks non-atomic updates to parts of the
device that we don't actually want to write to. This isn't in
general safe.
So switch to O_DIRECT for all that IO and make sure we have
properly aligned buffers.
1/ track if there are any actual updates pending, and only
write metadata when we have changed something.
2/ when writing null virtual-configs, write full blocks,
not just the first 4 bytes. This will allow O_DIRECT
writes in a subsequent patch.
When loading the metadata for a subarray (super_by_fd), we set
->subarray to be the name read from md/metadata_version so that
getinfo_super can return info about the correct array.
With this we can differentiate between a container and
an array within the container by looking at ->subarray[0].
Only one superswitch should be externally visible for each
general type. Others which handle different flavours
(e.g. container/data-array) should be internal only.
Code in manager can now just call queue_metadata_update with a
(freeable) buf holding the update, and it will get passed to the
monitor and written out.
'container_member' isn't really a well defined concept.
Each metadata might enumerate members differently, so just
let each format /mdX/YYYY as appropriate.
I want the metadata handler to have more control over the 'version',
particularly for arrays which are members of containers.
So discard st->text_version and instead use info->text_version
which getinfo_super can initialise.
mark_dirty is just a special case of mark_clean - with sync_pos == 0.
mark_sync is not required. We don't modify the metadata when sync
finishes. Only when the array becomes non-writeable at which point we
use mark_clean to record how far the resync progressed.
From: Dan Williams <dan.j.williams@intel.com>
Added curr_state as a parameter to set_disk. Handlers look at this to
record components failures, and set global 'degraded' or 'failed'
status.
When reading the state as faulty:
1/ mark the disk failed in the metadata
2/ write '-blocked' to the rdev state to allow the kernel's failure
mechanism to advance
3/ the kernel will take away the drive's role in remove_and_add_spares()
4/ once the disk no longer has a role writing 'remove' to the rdev state
will get the disk out of array.
There is a window after writing '-blocked' where the kernel will return
-EBUSY to remove requests. We rely on the fact that the disk will
continue to show faulty so we lazily wait until the kernel is ready to
remove the disk. If the manager thread needs to get the disk out of the
way it can ping the monitor and wait, just like the replace_array()
case.
[buglet fix: swap the parameters of attr_match in read_dev_state]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>