The order of devices used for the syndrome calculation is not
the same as the order of data in the array.
The D block immediately after Q is first, then they continue
cyclically in raid-disk order, skipping over the P disk if it is seen.
This gets the 'check' right for all layouts other than DDF, which is
quite different.
I haven't confirmed that this doesn't break repair.
Signed-off-by: NeilBrown <neilb@suse.de>
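The cyclic ordering described above can be sketched roughly as follows; syndrome_disk_order() is a hypothetical helper for illustration, not mdadm's actual code:

```c
/* Hypothetical sketch: fill 'order' with the data-disk indices in the
 * order used for the syndrome calculation.  The disk immediately after
 * Q comes first, then we continue cyclically in raid-disk order,
 * skipping P when we meet it.  'ndisks' counts all devices, P and Q
 * included.  Returns the number of data disks found. */
static int syndrome_disk_order(int ndisks, int p, int q, int *order)
{
	int count = 0;
	for (int i = 1; i < ndisks; i++) {
		int d = (q + i) % ndisks;
		if (d == p)
			continue;	/* parity disk holds no D block */
		order[count++] = d;
	}
	return count;
}
```

With 5 disks, P at slot 3 and Q at slot 4, this yields the data order 0, 1, 2 — matching "start immediately after Q, skip P".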
revert-inplace would sometimes find that the original reshape had
finished.
So slow down the reshaping during --stop (which needs to be a little
bit fast so that stop doesn't time out waiting) and don't wait quite
so long before stopping.
Signed-off-by: NeilBrown <neilb@suse.de>
This checks that raid6check finds no errors in a newly created array
with each of the different layouts.
(it doesn't...)
Signed-off-by: NeilBrown <neilb@suse.de>
If --save-logs is given, we already save all logs to --logdir.
If not, we should still save erroneous logs to --logdir.
Signed-off-by: NeilBrown <neilb@suse.com>
Some actions only appear in /proc/mdstat after a little delay,
so check in sync_action as well.
This applies when checking for recovery etc, and when waiting for idle.
Signed-off-by: NeilBrown <neilb@suse.de>
If the array is reshaping to more devices, then stopping
during that initial critical section is a bad idea.
So check for it and wait a bit.
Should probably handle the final critical section of a reduction
too.
A same-size reshape should be handled correctly already.
Signed-off-by: NeilBrown <neilb@suse.de>
A race can allow 'completed' to read as 2^63-1, which takes
a long time to count up to.
So guard against that possibility.
Signed-off-by: NeilBrown <neilb@suse.com>
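The guard could look something like this sketch, assuming the racy reading is the 2^63-1 sentinel; completed_is_valid() is a hypothetical name:

```c
/* Hypothetical sketch: a racing update can make the 'completed'
 * reading momentarily appear as the sentinel value 2^63-1, which
 * would take "a long time to count up to".  Treat such a sample
 * (or anything past the end of the device) as not yet known rather
 * than as a real position to wait for. */
static int completed_is_valid(unsigned long long completed,
			      unsigned long long array_size)
{
	if (completed == (1ULL << 63) - 1)
		return 0;	/* racy sentinel: ignore this sample */
	if (completed > array_size)
		return 0;	/* beyond the device: also untrustworthy */
	return 1;
}
```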
If Wait() finds the array resync is 'frozen', then wait
a little while to avoid races, but don't wait forever.
Signed-off-by: NeilBrown <neilb@suse.com>
If a read fills the whole buffer, then we possibly
missed something at the end, and we definitely shouldn't
put a '\0' beyond the end, so just return an error.
This should never happen anyway.
Signed-off-by: NeilBrown <neilb@suse.com>
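The shape of the fix is roughly this; read_attr() here is an illustrative sketch, not the actual function being patched:

```c
#include <unistd.h>

/* Hypothetical sketch: read a sysfs-style attribute into 'buf'.
 * If the read fills the whole buffer we may have truncated the
 * value, and must not write the '\0' terminator past the end,
 * so return an error instead. */
static int read_attr(int fd, char *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n < 0)
		return -1;	/* read error */
	if ((size_t)n == len)
		return -1;	/* buffer full: possibly truncated */
	buf[n] = '\0';		/* safe: n < len */
	return (int)n;
}
```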
A 'devnm' never starts with '/', so this test is pointless.
The code should use the passed-in devname unless it is clearly
not usable. So fix it to do that.
Signed-off-by: NeilBrown <neilb@suse.de>
These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.
Signed-off-by: NeilBrown <neilb@suse.de>
'recover' etc. doesn't appear in /proc/mdstat immediately.
The "sync" thread must be started first.
But 'sync_action' shows it as soon as MD_RECOVERY_NEEDED is set
in the kernel. So look there too.
Now maybe I can get rid of some of those silly 'sleep' calls.
Signed-off-by: NeilBrown <neilb@suse.de>
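A minimal sketch of "look there too", assuming a sysfs sync_action path is available; action_pending() is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: decide whether an action such as "recover"
 * is pending by reading the sysfs 'sync_action' attribute, which
 * reflects MD_RECOVERY_NEEDED as soon as it is set, before the
 * sync thread has started and before /proc/mdstat shows anything. */
static int action_pending(const char *sync_action_path, const char *action)
{
	char buf[32];
	FILE *f = fopen(sync_action_path, "r");
	int found;

	if (!f)
		return 0;
	found = fgets(buf, sizeof(buf), f) &&
		strncmp(buf, action, strlen(action)) == 0;
	fclose(f);
	return found;
}
```

A caller would check both /proc/mdstat and, say, /sys/block/md0/md/sync_action, and treat a match in either as "running".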
If an array is being reshaped using backup space on a 'spare' device,
then
mdadm --grow --continue
won't find it, as by the time it runs nothing looks like a spare
any more.  The spare has been added to the array, but has no data yet.
So allow reshape_prepare_fdlist to find a newly-incorporated spare and
report this so it can be used.
Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When the array is stopped during a critical section, we sometimes
erase the backup, which is bad.
This happens when 'completed' is zero.
This can happen easily when 'stop' freezes reshape.
So try to be more careful and check 'reshape_position'.
Signed-off-by: NeilBrown <neilb@suse.de>
Apologies if this is the wrong mailing list for this patch.
This is a very small patch for the manual page for the mdadm utility.
Thanks,
Andrew
Signed-off-by: NeilBrown <neilb@suse.de>
Function add_new_arrays() expects get_md_name() to return a
pointer to a devname, but get_md_name() may also return NULL. So
check the pointer before using it in add_new_arrays().
Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
Sometimes these can get left around, and udev can be looking
at them at awkward times so they don't disappear.
So be forceful.
Signed-off-by: NeilBrown <neilb@suse.de>
Some old kernels set 'completed' to '0' too soon.
But modern kernels don't.
And when 'mdadm --stop' freezes and resumes the grow,
'completed' goes back to zero briefly, which can confuse this
logic.
So only think '0' might be wrong from an old kernel when
the reshape has gone idle.
Signed-off-by: NeilBrown <neilb@suse.de>
EBUSY can be returned if something has recently happened
to cause md to want to check if recovery is needed, but hasn't
had a chance yet.
This can easily happen in testing.
So retry a few times in that case.
Signed-off-by: NeilBrown <neilb@suse.de>
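The retry can be sketched like this; retry_on_ebusy() and the demo op are hypothetical, for illustration only:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical sketch: retry an operation a few times when it fails
 * with EBUSY, which can happen transiently while md decides whether
 * recovery is needed.  'op' returns 0 on success, -1 with errno set. */
static int retry_on_ebusy(int (*op)(void *), void *arg, int tries)
{
	int ret;

	do {
		ret = op(arg);
		if (ret == 0 || errno != EBUSY)
			return ret;	/* success, or a real error */
		usleep(100000);		/* 100ms between attempts */
	} while (--tries > 0);
	return ret;			/* still EBUSY after all retries */
}

/* Demo op for illustration: fails with EBUSY twice, then succeeds. */
static int fail_twice(void *arg)
{
	int *calls = arg;

	if ((*calls)++ < 2) {
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```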
1/ Use the correct data-offset for cmp - that has changed.
2/ Flushbufs on the block device before reading, to avoid cache issues.
Signed-off-by: NeilBrown <neilb@suse.de>
I don't really know why this is needed, but there is a delay
between the reshape finishing and the level/etc changing.
So add some sleeps.
Signed-off-by: NeilBrown <neilb@suse.de>
The current sleep/wait doesn't seem long enough,
particularly when two arrays are being reshaped in the one
container.
So wait a bit more...
Signed-off-by: NeilBrown <neilb@suse.de>
We might be calling set_new_data_offset() for RAID10, where it is
a hard requirement, or for RAID5, where it is optional.
In the latter case, a message about metadata versions is not helpful.
Signed-off-by: NeilBrown <neilb@suse.de>
avail_size1 requires ->sb, so we must only call it if ->sb
was loaded.
If ->sb wasn't loaded, then we are only proceeding on the basis that
the kernel might be able to work something out - we don't need to
do any tests on size.
Reported-by: Christoffer Hammarström <christoffer.hammarstrom@linuxgods.com>
Signed-off-by: NeilBrown <neilb@suse.de>
URL: https://bugs.debian.org/784874
This can report non-zero if there was nothing to do,
and that isn't really an error.
If the array doesn't get started, something else
will complain.
Signed-off-by: NeilBrown <neilb@suse.de>
This is very much a corner case, but the self-tests tripped on it,
and it makes sense not to trust the uuid when it is being changed.
Signed-off-by: NeilBrown <neilb@suse.de>
Since commit 30bee0201, the anchor is updated from the active
DDF header. This requires fixing the header type before the
anchor is written.
The LSI Software RAID code will reject DDF metadata with the wrong
anchor type and will erase all metadata when it encounters
such a broken anchor. Thus starting Linux md once on a system
with an LSI RAID BIOS may cause the metadata to get destroyed.
Signed-off-by: NeilBrown <neilb@suse.de>
"--wait" will return non-zero status if it didn't need to wait.
This is not a reason to fail a test.
So ignore the return status from those commands.
Signed-off-by: NeilBrown <neilb@suse.de>
'active_disks' does not count spares, so if the array is rebuilding,
this will not necessarily find all devices, and so may report an array
as failed when it isn't.
Counting up to nr_disks is better.
Signed-off-by: NeilBrown <neilb@suse.de>