Since the introduction of replacement devices, we reserve
to places in the "disks" array for each raid disk.
That means we should allocate to twice "max_disk" as the array
could have that many raid_disks (though that would limit the
number of replacements).
A couple of other places need to use "max_disks*2" instead of
"max_disks" to co-ordinate with this.
Reported-by: Or Sagi <ors@reduxio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
dm devices which only have a single underlying md device
will respond to md ioctls as though they were that md device.
This can confuse mdadm and lead it to violating its segments.
So add tests for NULL where appropriate. You might not get exactly
the right answer when you "mdadm -D" a dm device, but at least it won't
crash now.
Reported-by: Willy Weisz <Willy.Weisz@univie.ac.at>
Resolves: https://bugzilla.novell.com/show_bug.cgi?id=887821
Signed-off-by: NeilBrown <neilb@suse.de>
Have mdadm --Detail --brief --verbose print the list of devices in
alphabetical order.
This is useful for debugging purposes. E.g. the test script
10ddf-create compares the output of two mdadm -Dbv calls which
may be different if the order is not deterministic.
(I confess: I use a modified "test" script that always runs
"mdadm --verbose" rather than "mdadm --quiet", otherwise this
wouldn't happen in 10ddf-create).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Array can be inactive when e.g. -I is in the process of assembling them.
This change allows --detail to report limited information about
these arrays.
Signed-off-by: NeilBrown <neilb@suse.de>
This pair of options should give a --brief listing including devices=
information. But recent changes to flag passing broke this.
So fix it.
Signed-off-by: NeilBrown <neilb@suse.de>
Without calling load_container at this point, the
info structure may be missing some important information.
In particular, information about secondary DDF RAID levels
may be wrong if information is only read from a single disk.
If this fails, fall back to the previous code.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
If externally menaged metadata is in use, array.major_version will
be zero, so the test here to consider using get_component_size()
is wrong. So if sra is present, use the major_version from there.
Signed-off-by: NeilBrown <neilb@suse.de>
--detail needs to be read to report 2 devices in each slot,
and --examine need to report if the device is the original or
the replacement.
Signed-off-by: NeilBrown <neilb@suse.de>
If a device is faulty, then that is all there is too it.
Even if it isn't 'removed' yet, it shouldn't be reported as 'spare'
or 'rebuilding'.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ When printing the "name=" entry for --brief output,
enclose name in quotes if it contains spaces etc.
Quotes are already supported for reading mdadm.conf
2/ When a name is used as a device name, translate spaces
and tabs to '_', as well as the current translation of
'/' to '-'.
Signed-off-by: NeilBrown <neilb@suse.de>
Variable 'err' is initially set to 1, so changing its value with
'|=' won't set it to 0 even if the operation is successful.
Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If disk has been removed, 'st' and 'info' can be NULL. It causes segfault.
'st' and 'info' should be checked against being NULL before being used.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If we change some functions to accept 'verbose', where <0 means to be
quiet, in place of 'quiet', then we will be able to merge
'quiet' and 'verbose' together for simplicity.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
Signed-off-by: NeilBrown <neilb@suse.de>
->percent sometimes stores negative values recording states
like 'pending' or 'delayed'.
The value '-2' means both 'delayed' and in Monitor, 'unknown'.
Also, '-1' has a meaning but not #define.
So change the #defines to be prefixed with "RESYNC_", instead
of "PROCESS_", add new "_NONE" and "_UNKNOWN", and use correct
value in each location.
Signed-off-by: NeilBrown <neilb@suse.de>
A RAID10 can be though of as having 2 sets of devices
(if there are 2 copies and an even number of devices in total).
With this patch "mdadm --detail" shows which 'set' each device
belongs to - set-A or set-B.
If there are more than 3 copies, there can be more than 3 sets.
If the number of copies does not evenly divide the number of devices,
there are not distinct 'sets' so none are reported.
Signed-off-by: NeilBrown <neilb@suse.de>
Both --detail and --monitor can report the names of member
devices on an array, and do so by searching /dev and finding
the shortest name that matches.
If
--prefer=foo
is given, they will instead prefer a name that contain /foo/.
So
mdadm --detail /dev/md0 --prefer=by-path
will list the component devices via their /dev/disk/by-path/xxx
names.
Signed-off-by: NeilBrown <neilb@suse.de>
It can easily be calculated from 'avail' and 'raid_disks', and we
will soon have a case where we don't have it easily available to pass
in.
Signed-off-by: NeilBrown <neilb@suse.de>
Initially there is no proper translation mdstat's DELAYED/PENDING processes
to "--detail" output.
For example, if we have recover=DELAYED in mdstat, "--detail"
shows "State: recovering" and "Rebuild Status = 0%".
It was incorrect in case of process waiting on checkpoint different
than 0%. In fact rebuild status is differnt than 0% and user is misled.
The patch fix the problem. Current "--detail" command shows
in the exampe: "State: recovering (DELAYED)" and no information
about precentage.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Since info->delta_disks is signed it doesn't need to be special-cased.
This allowed my 9->8 reshape to display correctly instead of as 8->7
Signed-off-by: NeilBrown <neilb@suse.de>
uuid_match_any is replaced by uuid_zero for imsm spares.
Function fixup_container_spare_uuid not needed as it gives
unwanted uuid to spares.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Precludes needing to deduce this information later, like in Detail.c and
soon in Grow.c.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Now that we can ask container_content for a specific subarray,
we don't need to pass the subarray name to load_super, and have it
secretly modify the returned state.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than hiding this in the 'st', return it explicitly.
In the one case we still need it, copy it into st where needed.
This will disappear in a future patch.
Signed-off-by: NeilBrown <neilb@suse.de>
To accurately detect when an array has been split and is now being
recombined, we need to track which other devices each thinks is
working.
We should never include a device in an array if it thinks that the
primary device has failed.
This patch just allows get_info_super to return a list of devices
and whether they are thought to be working or not.
Signed-off-by: NeilBrown <neilb@suse.de>
Detail: report reshape and check as well as resync and recovery
Wait: if the resync is pending or delayed, wait for that too.
Signed-off-by: NeilBrown <neilb@suse.de>
The loop for loading it was hard to follow, so restructure that
and avoid a theoretical use-before-set error.
Also there was a second 'info' structure which hid the first and was
pointless.
Signed-off-by: NeilBrown <neilb@suse.de>
When left-shifting we must be sure that the value being
shifted is large enough to not lose bits.
The 'chunkssize' in CreateBitmap is only 'long' so it
can overflow. So cast to 'long long' first.
Also fix a similar issue in Detail even though it isn't currently
being compiled.
Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Tomasz Chmielewski <mangoo@wpkg.org>
We already have a call to 'enough' in Detail which is the check for
"do we have enough devices". We just need to calculate the required
data a bit earlier, then use the same 'enough' call to possibly
print FAILED.
This is motivated by Debian bug 495755.
The other request in that bug is not practical.
It would be very nice if output of `mdadm' is more clear in case of a
broken array.
Currently the only hint you get from `mdadm' that your array is broken
is this:
# mdadm -A /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
mdadm: /dev/md0 assembled from 1 drive and 3 spares - not enough to start the
array.
It could say something like `Your array is broken, you can't use it anymore'
It is not valid to report that array as 'broken' if the user hasn't
listed all the devices, which could be the case here.
Resolves-Debian-Bug: 495755
Signed-off-by: NeilBrown <neilb@suse.de>
As mdadm is normally a short-lived program it isn't always necessary
to free memory that was allocated, as the 'exit()' call will
automatically free everything. But it is more obviously correct if
the 'free' is there.
So this patch add a few calls to 'free'
Signed-off-by: NeilBrown <neilb@suse.de>