Implement kill_subarray, for mdmon running and not running.
The way Kill_subarray() is implemented, this requires that the
DDF layer uses "currentconf" to remember the last subarray
queried with container_content(), and use it as the one to kill.
I don't like this much but IMSM does it the same way.
Signed-off-by: NeilBrown <neilb@suse.de>
commit d682f344 inserted this call to "Kill" in write_init_super_ddf:
"Matching the functionality already in super0 and super1, when
we first create a container, remove any other recognisable metadata to
ensure it doesn't cause confusion."
But we should do this only at first container creation, not when
subarrays are created later.
Signed-off-by: NeilBrown <neilb@suse.de>
The kernel docs state that meta data is never written in states
clear, inactive, suspended, readonly, and read_auto.
Why should this be different for containers?
We need to write metadata when the array is disabled, though.
Tested with the DDF (10*) and IMSM (9*) tests, works.
Signed-off-by: NeilBrown <neilb@suse.de>
This patch adds RAID10 support to the DDF test script.
It actually passes!
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
With secondary RAID level, disks may belong to other BVDs in
a given conf record. This needs to be taken into account
when fixing the vlist.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
For secondary RAID, we need to check which BVD needs to be updated.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
For debugging DDF structure changes, it is important to be able
to identify VCs by their GUIDs.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
If there are already disks in the container, reserve the same amount
of workspace as the first disk. Fill in the primary/secondary/
workspace LBA values.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The used slots in the phys disk table aren't necessarily the
first ones. Rather, unused entries are represented by entries
where the GUID is all 0xff.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Use get_pd_index_from_refnum() in get_extents to determine
matching VD. This will ensure RAID 10 (secondary RAID level)
support, too.
This also fixes a bug in the previous get_extents() code (missing
__be16_to_cpu for conf.prim_elmnt_count).
DDF test case (10ddf-create) verified.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Remove the lba_offset field from struct vcl. This field acted as
a "cache" for the address of the lba_offset field in the vd_config
structure. This isn't useful any more if there are multiple
vd_configs in a vcl.
This patch also adds __cpu_to_be64 in two places where it has been
quite obviously forgotten (ddf_set_disk, ddf_activate_spare).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Make validate_geometry_ddf() use the same logic to check supported
RAID levels that init_super_ddf_bvd() uses.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Properly initialize the data structures of the other BVDs
in Create().
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Instead of allocating the other_bvds array and every element
separately, allocate all in a single chunk. Also, move allocation
in a subroutine as it's used in several places.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Support for RAID 10 makes it necessary to rewrite the algorithm
for deriving DDF layout from MD layout. The functions level_to_prl
and layout_to_rlq are combined in a single function that takes
md layout parameters and converts them to DDF.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
layout_ddf2md() is a new RAID layout conversion routine.
It obsoletes the previous separate routines for obtaining
md level and layout (map_num1, rlq_to_layout).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The DDF code was assuming that the VD slots 0..populated_vdes
were used and the rest was unused. Remove this assumption and
deal with empty slots instead.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Implement logic to derive the status of a secondary RAID
from its members. Use it in ddf_set_disk.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Moved code to determine RAID status to a separate function
get_bvd_status(). I need this to account for secondary RAID level.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
If secondary RAID level is taken into account, translation between
the md RAID member (raid_disk) and the index of a physical disk
in a BVD becomes more complex.
Also, take into account that the member list can have unused entries
(this is independent of secondary RAID level).
Adapt usage of find_vdcr() accordingly
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
This patch implements the previously unsupported case where
store_super_ddf is called with a non-empty superblock.
For DDF, writing meta data to just one disk makes no sense.
We would run the risk of writing inconsistent meta data
to the devices. So just call __write_init_super_ddf and
write to all devices, including the one passed by the caller.
This patch assumes that the device to store the superblock on
has already been added to the DDF structure. Otherwise, an
error message will be emitted.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The DDF spec mandates that the "open flag" be set to non-0 before
writing a configuration, and reset to 0 when finished to indicate
success.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When the primary header can't be read, use the secondary header
as fallback.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm will normally not include a device into an array if that device
reports that the "best" device has failed, as this normally implies
some sort of inconsistency.
However when --force is given it means that the given drives really
should be assembled if at all possible so in that case the test should
be avoided.
The particular case where this was a problem was a RAID5 were all
devices had the same event count but three of them reported that the
first two had failed.
As they all had the same event count the first was taken as the 'best'
and that caused the later ones to be excluded. Listing one of the
later ones first allowed the array to be assembled. So in this case
the test clearly just got in the way and did nothing useful.
Reported-by: "Marek Jaros" <mjaros1@nbox.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
--stop now tries to wait for a reshape to be at just the right spot.
However for a reducing reshape, mdadm will be running in the
background watching, and might adjust sync_max and mess things up.
So teach "progress_reshape" to notice when "sync_max" is modified, and
leave it alone.
Signed-off-by: NeilBrown <neilb@suse.de>
progress_reshape() may not set reshape_completed if the reshape is
interrupted, so we need to initialize it to the current value before
hand, so the value used afterwards is credible.
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible for 'sync_completed' to be further ahead than
we deduced from 'reshape_position'. However we cannot read it while
the array is frozen, so it is hard to know.
Once that array is unfrozen, check and if sync_completed is ahead of
'sync_max', push 'sync_max' well ahead if 'sync_completed' so it
will all synchronise up properly.
Signed-off-by: NeilBrown <neilb@suse.de>
Recent rearrangement of library code broke 'raid6check' and this
wasn't noticed because 'make everything' doesn't build it.
So fix the breakage and have 'make everything' built it.
Signed-off-by: NeilBrown <neilb@suse.de>
If the restarted reshape needs a backup file and we don't have one,
that should be reported before we try to start the array.
Also we shouldn't say the "Cannot grow" but "cannot complete".
Signed-off-by: NeilBrown <neilb@suse.de>
If "container=" is present, then we are going to assemble from the
given container where that container is made of those devices or not.
So in this case the "devices=" is purely documentation and is best
ignored.
As part of this, move the test on the "container=" value when that
start with "/" up before the device is opened. There sooner we test
things, the better.
Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When "containers" appears on the "DEVICES" line (which is does by
default), use names from the mdadm map file instead of kernel names,
when possible.
This mean that the name will be more likely to appear in mdadm.conf
and so more likely to match "container=" tags.
Signed-off-by: NeilBrown <neilb@suse.de>
If the container metadata doesn't know how many device to expect (as
is the case with IMSM), don't fail an --assemble which over-specifies
the number of devices.
Signed-off-by: NeilBrown <neilb@suse.de>
There is only one called to find_free_devnum and it is in mdopen.c
The removes a dependency between util.c and config.c which allows
us to now drop config.o from mdmon.
Signed-off-by: NeilBrown <neilb@suse.de>
As they are uses for mdstat as well as mdadm.conf, they don't really
belong in conf.c
This removes a dependency between mdmon and conf.c
Signed-off-by: NeilBrown <neilb@suse.de>