This is a bit of a hack and the code need to be made more
general. But this adds the special case of a RAID0 being
reshaped which looks like a RAID4 but doesn't need as many
devices.
Signed-off-by: NeilBrown <neilb@suse.de>
If we open with O_EXCL before checking that the device is one that
we really want, then that could cause some other process to think
the device is busy when it isn't really.
This particularly affects running "mdadm -A devname" in parallel for
different arrays. One might be looking at a device that it won't
end up using while another trys and fails to look at a device that
it needs.
So delay the O_EXCL until after all identity checks.
Multiple "mdadm -As" will still have races, but that is fundamentally
racy anyway.
Signed-off-by: NeilBrown <neilb@suse.de>
When added disk is disk added by expansion and this is last disk added
to array, assemble_container_content() will not even try to run such array.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If we have to --force assembly during reshape, we need to
check by the 'before' and 'after' cases to make sure there
are enough devices.
Reported-by: Richard Herd <2001oddity@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
It can easily be calculated from 'avail' and 'raid_disks', and we
will soon have a case where we don't have it easily available to pass
in.
Signed-off-by: NeilBrown <neilb@suse.de>
This fields doesn't work any more as ->getinfo_super clears the info
structure at an awkward time. So get rid of it and do it differently.
The issue is that the metadata handler cannot tell if the uuid it has
was randomly generated or explicitly requested, except on the first
call.
And we don't want to accept explicit requests for IMSM.
So when it was auto-generated, make it look distinctive by having the
same int copied in all 4 positions. If someone requests a uuid like
that, I guess they get away with it.
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The problem occurs when array under migration is assembled incrementally.
st->update_tail is not initialized in function
assemble_container_content() and during reshape
the checkpoint information in metadata is not being updated.
The value of st->update_tail is now initialized in function
assemble_container_content() and during reshape the checkpoint
information in metadata is being updated correctly on all disks.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
container_content retrieves volume information from disks in the
container. For unsupported volumes the function was not returning
mdinfo. When all volumes were unsupported the function was returning
NULL pointer to block actions on the volumes. Therefore, such volumes
were not activated in Incremental and Assembly. As side effect they
also could not be deleted using kill-subarray since "kill" function
requires to obtain a valid mdinfo from container_content.
This patch fixes the kill-subarray problem by allowing to obtain
mdinfo of all volumes types including unsupported and introducing new
array.status flags.
There are following changes:
1. Added MD_SB_BLOCK_VOLUME for blocking an array, other arrays in the
container can be activated.
2. Added MD_SB_BLOCK_CONTAINER_RESHAPE block container wide reshapes
(like changing disk numbers in arrays).
3. IMSM container_content handler is to load mdinfo for all volumes
and set both blocking flags in array.state field in mdinfo of
unsupported volumes. In case of some errors, all volumes can be
affected. Only blocked array is not activated (also reshaped as
result). The container wide reshapes are also blocked since by
metadata definition they require modifications of both arrays.
4. Incremental_container and Assemble functions check array.state and
do not activate volumes with blocking bits set.
5. assemble_container_content is changed to check container wide reshapes
before activating reshapes of assembled containers.
6. Grow_reshape and Grow_continue_command checks blocking bits
before starting reshapes or continueing (-G --continue) reshapes.
7. kill-subarray ignores array.state info and can remove requested array.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
So far there were 2 reshape continuation cases:
1. array is started /e.g. reshape was already invoked during initrd
start-up stage using "--freeze-reshape" option/
2. array is not started yet /"normal" assembling array under reshape case/
This patch narrows continuation cases in to single one. To do this
array should be started /set readonly in to array_state/ before calling
Grow_continue() function.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reshape can be run for monitored arrays only /external metadata case/.
Before reshape can be executed, make sure that just starter array/container
is monitored. If not, run mdmon for it.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When container is assembled while reshape is active on one of its member
whole container can be required to be blocked from monitoring.
For such purpose field recovery blocked is added to mdinfo structure.
When metadata handler finds active reshape in container it should set
recovery_blocked field to disable whole container monitoring during
reshape.
For arrays that doesn't use containers, recovery_blocked field
has the same value as reshape_active field e.g. super0/1.
In fact,recovery is blocked during reshape for such arrays.
For ddf, metadata handler doesn't set reshape_active field,
so recovery_blocked is not set also.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
During initrd phase continuing reshape will cause file system context
lost. This blocks ability to control reshape using checkpoints.
To avoid this, during initrd phase assemble has to be executed with
'--freeze-reshape' option. This causes that mdadm restores reshape
critical section only.
Reshape can be continued later after system full boot.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reshape backup should be able to be restored during reshape continuation
also. To reuse already existing code it is moved to function.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
For fdlist pointer allocated in assemble_container_content() function,
free() is never called. This patch fixes this memory leak.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The 'best' array only has 'bestcnt' entries allocated, so 'i' should
always be "< bestcnt", not "<= bestcnt".
Reported-by: "Lawrence, Joe" <Joe.Lawrence@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Patch introduces support for reshape process restart for external metadata
using metadata specific data handling methods.
It introduces recover_backup() function that restores array to stable state
It is equivalent to Grow_restart() functionality for native metadata.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some code currently clears 'info' before calling getinfo_super,
some code doesn't.
To be consistent, change it so no caller ever clears 'info',
but ever getinfo_super function must clear it.
Note that ->raid_disk may be meaningful if that 'map' is passed
non-NULL. In that case it is copied out before the structure
is zeroed.
Signed-off-by: NeilBrown <neilb@suse.de>
When array is in reshape state raid_disks field contains final disks number.
To know how many disks were added, disk.raid_disk index has to be compared
against old disk number computed using delta_disks.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If a degraded dirty array has some superblocks which are clean and
others that are dirty, and the dirty ones are newer by precisely '1'
in the event count, then the current code to force the array to be
clean will not work.
We need to make sure to find a superblock with most recent event count
and force that one to be 'clean'.
Reported-by: A J Wyborny <ajwyborny@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When for ping_monitor() input devnum2devname() is used,
received string pointer should be passed to free() for memory release.
It is not made in several places. This use case should have function
to avoid memory leak.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Currently whole array geometry is set in sysfs_set_array(),
so none of disks (even for expansion) should fail during sysfs_add_disk()
Due to this expansion counter should be used for reshaped array when
disk slot is bigger than number of disks in array.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When array under reshape is assembled it has to be disabled from
monitoring as soon as possible. It can occur that this is i.e second
array in container and mdmon is loaded already.
Lack of blocking monitoring can cause change array state to active,
and reshape continuation will be not possible.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
As containers can now grow, we need to use both Grow_restart (to
replay any backup-file) and Grow_continue when assembling the content
of a container.
Note that we don't pass a backup-file when doing incremental assembly.
If such is needed in that case, the assembly will fail.
To restart such arrays, explicit assembly is required.
Signed-off-by: NeilBrown <neilb@suse.de>
assemble_container_content() cannot close mdfd handle, as it could be
required by reshape continuation.
mdfd handle is closed outside this function, when it is not longer
necessary.
Call to Grow_continue is added for reshape continuation after
assembly.
In the nearest future, simple condition:
if (content->reshape_active)
before Grow_continue() call will be replaced by check function
for support container operation /reshape/.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
During expansion there is more working disks that array can have.
Disks with set raid_disk (not a spare disk) during reshape should be counted
to allow array state transition to read_only state.
Array reconfiguration to new geometry should be done before reshape will
be started.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When the metadata doesn't identify which array a spare belongs to
we normally require an explicit domain match to connect a spare
with an array.
However when the spare is explicitly listed in argv, it should be
safe to include as long as there is no domain conflict.
Signed-off-by: NeilBrown <neilb@suse.de>
Sometime we will need to know the difference between no domains found
and domains didn't match.
So allow domain_test to return different values and fix up all callers
to maintain current behaviour.
Signed-off-by: NeilBrown <neilb@suse.de>
If we find a device that has not superblock, we currently fail
unless in auto_assem mode.
However we really should only fail if the device was explicitly listed
in the arg list. So add a test for that.
Signed-off-by: NeilBrown <neilb@suse.de>
When there are any arrays in config file the spares with
domain not matching any array are not assembled because
auto assembly is not attempted.
Addition of ARRAY line with uuid=0:0:0:0 in config will work
with modified condition for gathering spares.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If we find spares but no members of given array
we create container with just spares.
This allows auto assemble to pick up all lose imsm spares when there
is no config file.
When there is a valid config file and any array is assembled from it
we don't try auto assembly so we will not assemble spares that don't
match any array.
To remedy this we must add
ARRAY metadata=imsm UUID=00000000:00000000:00000000:00000000
to config file.
This container will include all remaining spares.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Policy must be read on all disks identified as array members
to get array's domains list.
Currently it is only read on first array member in auto assembly mode.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Imsm spare will only be taken if it matches domain of
identified members of currently assembled array.
This implies that:
- spare with null domain will match first array assembled.
- if array has null domain then no spare will match
If we allow spares to set st they may block assembly of subarrays.
This is because in auto-assembly tmpdev->used=0 for a spare not matching
any array. If we find such spare before container and set st, the content
will not get assembled.
We allow uuid_zero match any uuid in assembly as unsuitable spares will
be rejected on domain check.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We need to refuse to assemble an arrays with bad blocks.
Initially there was condition in container_content function
that returns error value in the case when metadata store information
about bad blocks.
When the container_content function is called from functions NOT connected
with assemble (Kill_subarray, Detail) we get faulty error return value.
Patch introduces new flag in array.status - MD_SB_BBM_ERRORS. It is set
in container_content when bad blocks are detected and can be checked by
container_content caller.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Though not having the proper backup file can cause data corruption, it
is not enough to justify not being able to start the array at all.
So allow "--invalid-backup" to be specified which says "just continue
even if a backup cannot be restored".
Signed-off-by: NeilBrown <neilb@suse.de>
An attempt to invoke super_by_fd() on device that has
metadata_version="none" always matches super0 (as test_version is "").
In Assemble() it results in segfault when load_container is invoked
(=null for super0).
As of now load_container is only started if it points to valid pointer.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We shouldn't call remove_partitions until we have made a really firm
decision to include the device into the array.
Signed-off-by: NeilBrown <neilb@suse.de>
Incremental assembly works on such an array because the kernel sees the
disk as in-sync and that the array is reshaping. Teach Assemble() the
same assumptions.
This is only needed on kernels that do not initialize ->recovery_offset
when activating spares for reshape.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When checking that a container matches the required uuid,
we need to call 'getinfo_super' before we have a 'content'
to test.
Reported-by: "Czarnowska, Anna" <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Separate the load_container call from the load_super call,
and use different validity tests as appropriate.
Add some general code tidying and a bit of indent change to make
structure a little clearer.
Signed-off-by: NeilBrown <neilb@suse.de>
This is somewhat inconsistent with the last member of a
container getting special handling.
Just simplify it so the code seems to make sense and important
is easy to follow.
Signed-of-by: NeilBrown <neilb@suse.de>