Add a new unit test similar to 009imsm-create-fail-rebuild.
With the previous patches, it actually succeeds on my system.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
In order to track kernel state changes, the monitor needs to
notice changes in sysfs. If the changes are transient, and the
monitor is busy writing metadata, it can happen that the changes
are missed. This will cause the metadata to be inconsistent with
the real state of the array.
I can reproduce this in a test scenario with a DDF container and
two subarrays, where I set a disk to "failed" and then add a global
hot-spare. On a typical MD test setup with loop devices, I can
reliably reproduce a failure where the metadata shows degraded members
although the kernel finished the recovery successfully.
This patch fixes the problem by applying two changes. First, when
a metadata update is queued, wait until it is certain that the monitor
has actually applied this metadata (the for loop is needed to
avoid failures completely in my test case). Second, after triggering the
recovery, set prev_state of the changed array to "recover", in case
the monitor misses the transient "recover" state.
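The first change above amounts to a bounded retry loop around the queued update. A minimal sketch of that idea, with entirely hypothetical names (pending_updates, monitor_apply_one, wait_for_monitor) standing in for mdadm's actual manager/monitor interface:

```c
#include <stdbool.h>

/* Illustrative only: a counter of queued-but-unapplied metadata updates. */
static int pending_updates;

static void queue_metadata_update(void)
{
	pending_updates++;
}

/* Stand-in for the monitor thread applying one queued update. */
static void monitor_apply_one(void)
{
	if (pending_updates > 0)
		pending_updates--;
}

/* Retry loop: check, give the monitor a chance to run, check again.
 * The real code would sleep briefly between checks rather than
 * driving the monitor directly. */
static bool wait_for_monitor(int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (pending_updates == 0)
			return true;
		monitor_apply_one();	/* in mdmon, the monitor thread does this */
	}
	return pending_updates == 0;
}
```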
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Correctly print out the wake reason if it was a signal. Previously the
code could print misleading select events (the pselect(2) man page says
the fd sets become undefined in case of error).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
read_and_act() currently prints a debug message only very late.
Print the status seen by mdmon right away, to track mdmon's
actions more closely. Add a time stamp to observe long delays
between read_and_act() calls, e.g. those caused by metadata writes.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Add more verbose debugging in ddf_set_disk() to understand failures
better.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Try to determine the problem if load_ddf_header() fails. This may be
useful for diagnosing compatibility problems with fake RAID BIOSes.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
I needed this for tracking a bug with wrong offsets after array
creation.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
In particular, include the refnum. This makes it a little easier
for humans to track what happened to which disk.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Move the check for good drives into the dl loop; otherwise dl
may be NULL and mdmon may crash.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
For RAID10, 'sync' numbers go up to the array size rather than the
component size. is_resync_complete() needs to allow for this.
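The fix is just a question of which ceiling the sync position is compared against. A sketch, under the simplifying (illustrative) assumption that the array-wide count spans all raid_disks components; the function name and signature are hypothetical, not mdadm's actual is_resync_complete():

```c
/* For most levels the "sync" position counts sectors per component;
 * for RAID10 it counts across the whole array, so the completion
 * threshold must scale accordingly. */
static int resync_complete(int level, unsigned long long sync_pos,
			   unsigned long long component_size,
			   int raid_disks)
{
	unsigned long long target = component_size;

	if (level == 10)
		target = component_size * raid_disks;	/* array-wide count */
	return sync_pos >= target;
}
```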
Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Coverity discovered a possible double close(fd2) in Grow.c. Avoid it by
invalidating fd2 after the first close.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Currently the extra space to leave before the data in the array
is calculated in two separate places, and they can be inconsistent.
Instead, do it all in validate_geometry. This records the
'data_offset' chosen, which all other devices then use.
'write_init_super' now just uses the value rather than doing all the
calculations again.
This results in more consistent numbers.
Also, load_super sets st->data_offset so that it is used by "--add",
so the new device has a data offset matching a pre-existing device.
Signed-off-by: NeilBrown <neilb@suse.de>
_avail_space1() is called from both avail_space1() and validate_geometry1()
and does slightly different things in each case.
The partial code sharing doesn't really help. In particular, the
responsibility for setting the size of the array is currently
confused.
So duplicate the code into the two locations - one where 'super' is
always NULL (validate_geometry1) and one where it is never NULL
(avail_space1), and simplify.
No behaviour change - just code re-organisation.
Signed-off-by: NeilBrown <neilb@suse.de>
This call to validate_geometry is really rather gratuitous.
It is purely about the fact that super0 cannot use more than 4TB.
So just make it an explicit test - less confusing that way.
With this, validate_geometry is only called from Create, which
makes it easier to reason about.
Also validate_geometry is now never passed NULL for the 'chunk'
parameter, so we can remove those annoying tests for NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
Metadata updates for secondary RAID (RAID10) need to cover
all BVDs. Compare with code in write_init_super_ddf().
Signed-off-by: NeilBrown <neilb@suse.de>
If we will need to change the array level when a reshape completes, a copy
of mdadm waits in the background.
Currently this copy holds the device (/dev/mdX) open. This prevents
the array from being stopped.
So close the file descriptor and re-open after the reshape completes.
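The shape of that fix can be sketched as follows; the function name and the placeholder wait are illustrative, not mdadm's actual code:

```c
#include <fcntl.h>
#include <unistd.h>

/* Release /dev/mdX while waiting in the background, then reopen it by
 * name once the reshape is done. */
static int release_and_reopen(int fd, const char *devname)
{
	close(fd);	/* let "mdadm --stop" succeed during the wait */

	/* ... background wait for reshape completion goes here ... */

	return open(devname, O_RDONLY);	/* re-acquire the device */
}
```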
Signed-off-by: NeilBrown <neilb@suse.de>
The "data_size" is with respect to "data_offset". When the kernel
changes "data_offset" it modifies "data_size" to match - see
md_finish_reshape() in the kernel.
So when mdadm replaces the old data_offset with the new one, it
must update data_size correspondingly.
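The arithmetic is simply that data_offset + data_size should stay fixed when the offset moves. A sketch (function name illustrative):

```c
/* Moving data_offset by some delta moves data_size by the opposite
 * delta, so that data_offset + data_size is preserved. */
static unsigned long long
adjusted_data_size(unsigned long long data_size,
		   unsigned long long old_offset,
		   unsigned long long new_offset)
{
	return data_size + old_offset - new_offset;
}
```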
Signed-off-by: NeilBrown <neilb@suse.de>
Let the first created array be RAID5 rather than RAID0. This makes
the test harder than before, because everything after the first
Create has to be done indirectly through mdmon.
Signed-off-by: NeilBrown <neilb@suse.de>
Fix a bug that caused the wrong conf record to be used to derive
data offset and size on secondary RAID (RAID10).
Signed-off-by: NeilBrown <neilb@suse.de>
Factor out the single-disk write from __write_init_super_ddf into a new
function _write_super_to_disk. Use this function in store_super_ddf.
Signed-off-by: NeilBrown <neilb@suse.de>
Increase seq number only when there's actually a metadata change.
This is better than increasing it at every write.
This also fixes another endianness bug.
Signed-off-by: NeilBrown <neilb@suse.de>
This was waiting on "reshape_position", which doesn't
get update events.
Before sysfs_wait was introduced, the code to wait didn't
actually wait, so it spun.
With sysfs_wait, it would wait forever.
Change to wait on "sync_completed", which does get events.
Signed-off-by: NeilBrown <neilb@suse.de>
* Aligned the spelling of “RAID” to use capital letters in all places.
* Aligned the spelling of the RAID level names (LINEAR, RAID1, …) to use capital
  letters in all places, except for the string “faulty” in places where something
  other than the RAID level was meant.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
If we stop too soon after reshape starts (probably only during
testing), we can get confused by the status of the reshape.
If that might be happening - sleep a bit longer.
Also allow for reshape going unusually slowly (again, probably only
during testing).
Signed-off-by: NeilBrown <neilb@suse.de>
Having a fixed time for a wait is clumsy and can make us
wait much too long.
So use mdstat_wait and keep the mdstat_fd open.
This requires an 'mdstat_close' so it doesn't stay open
forever.
Signed-off-by: NeilBrown <neilb@suse.de>
We only need 'hold' if we want to mdstat_wait for a change.
These two callers don't care about a change, so they shouldn't
use the 'hold' flag.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently we compare fields between primary and secondary
superblocks, before we check if the primary is even valid.
This is a bit backwards, so reverse it.
Signed-off-by: NeilBrown <neilb@suse.de>
The "indirect" code path for adding VDs was not working correctly
for secondary RAID levels. The "other BVDs" were not transmitted
to mdmon, so mdmon wouldn't build up correct information, and
RAID creation would fail when mdmon was already running on the container.
This patch fixes this.
Signed-off-by: NeilBrown <neilb@suse.de>