In repair mode, raid6check will rewrite one single stripe
by regenerating the data (or parity) of two raid devices that
are specified via the command line.
If you need to rewrite just one slot, pick any other slot
at random.
Note that the repair option will change data on the disks
directly, so both the md layer above as well as any layers
above md (such as filesystems) may be accessing the stripe
data from cached buffers. Either instruct the kernels
to drop the caches or reassemble the raid after repair.
Signed-off-by: NeilBrown <neilb@suse.de>
We currently have rather hard-to-follow loop to iterate
through all the matches for 'missing' or 'faulty' or 'detached'.
Simplify it by creating a list of possible devices for each
of those and splicing the new list into the device list.
This removes the need for 'jnext' and 'next' and various other hacks.
Signed-off-by: NeilBrown <neilb@suse.de>
Both are impossible, and '1' allows size to be unsigned,
which is neater.
Also #define MAX_SIZE to be '1' to make it all more explicit.
Signed-off-by: NeilBrown <neilb@suse.de>
The value of 'verbose' is sometimes mixed into 'brief', particularly
for Examine.
This is messy and confusing. So keep them separate.
'brief' still gets assumed when 'scan' is set, unless we are very
verbose.
Signed-off-by: NeilBrown <neilb@suse.de>
If we change some functions to accept 'verbose', where <0 means to be
quiet, in place of 'quiet', then we will be able to merge
'quiet' and 'verbose' together for simplicity.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than passing a long list of little flags etc to various
functions we will soon pass a small collection of structures.
This first step combines a collection of variables local to
'main' into a single structure.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
Signed-off-by: NeilBrown <neilb@suse.de>
As we don't allow '-K' for '--zero-super' there is no point
using it internally. Just define a 'KillOpt' like with
other options.
Signed-off-by: NeilBrown <neilb@suse.de>
-t aka --takeover should not be setting container_name.
It sets it to NULL which causes failure when you try
mdmon --all --takeover
Signed-off-by: NeilBrown <neilb@suse.de>
The tests here were specific to 0.90 metadata and didn't
work properly for 1.x metadata, where a device's "number"
doesn't change.
By checking if this is a new array we can avoid some
corner cases. Then we test mostly based on state and
not based on 'number' at all.
Signed-off-by: NeilBrown <neilb@suse.de>
->percent sometimes stores negative values recording states
like 'pending' or 'delayed'.
The value '-2' means both 'delayed' and in Monitor, 'unknown'.
Also, '-1' has a meaning but not #define.
So change the #defines to be prefixed with "RESYNC_", instead
of "PROCESS_", add new "_NONE" and "_UNKNOWN", and use correct
value in each location.
Signed-off-by: NeilBrown <neilb@suse.de>
A RAID10 can be though of as having 2 sets of devices
(if there are 2 copies and an even number of devices in total).
With this patch "mdadm --detail" shows which 'set' each device
belongs to - set-A or set-B.
If there are more than 3 copies, there can be more than 3 sets.
If the number of copies does not evenly divide the number of devices,
there are not distinct 'sets' so none are reported.
Signed-off-by: NeilBrown <neilb@suse.de>
To match the add-bitmap tests, here is a set of tests checking the
removal of bitmaps.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This set of tests verifies that it is possible to add an internal
bitmap to an existing array, and that the device can be written to
after the bitmap is added. This should catch cases such as the one
fixed by 4474ca42e2577563a919fd3ed782e2ec55bf11a2
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
b8e91a32cd was applied incorrectly.
It changed the name of the variable set when specifying --no-error,
without changing the places checking it.
Set it back as it was to make --no-error work correctly again.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We do not check the return value of sysfs_get_ll() now. It is wrong.
If reading of the sysfs "degraded" key does not succeed,
the "new_degraded" variable will not be initiated
and accidentally it can have the value of "degraded" variable.
In that case the change of degradation will not be checked.
It happens if mdadm is compiled with gcc's "-fstack-protector" option
when one tries to stop a volume under reshape (e.g. OLCE).
Reshape seems to be finished then (metadata is in normal/clean state)
but it is not finished, it is broken and data are corrupted.
Now we always check the return value of sysfs_get_ll().
Even if reading of the sysfs "degraded" key does not succeed
(rv == -1) the change of degradation will be checked.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In function write_init_super1():
If "rv = store_super1(st, di->fd)" return error and the di is the last.
Then the di = NULL && rv > 0, so exec:
if (rv)
fprintf(stderr, Name ": Failed to write metadata to%s\n",
di->devname);
will be segmentation fault.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>