If there isn't, we currently write the second copy at some
random location :-)
Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some fake RAID BIOSes (in particular, LSI ones) change the
VD GUID at every boot. These GUIDs are not suitable for
identifying an array. Luckily the header GUID appears to
remain constant.
We construct a pseudo-UUID from the header GUID and those
properties of the subarray that we expect to remain constant.
These are only the array name and index; everything else might
change, e.g. during grow.
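For illustration, a minimal sketch of such a construction (buffer
sizes per the DDF spec: 24-byte GUIDs, 16-byte VD names; sha1_buffer
is mdadm's bundled SHA-1; the function name is hypothetical):

    /* Hash the constant header GUID together with the subarray
     * name and index to get a stable pseudo-UUID. */
    static void pseudo_uuid(const char *header_guid, const char *name,
                            unsigned int index, int uuid[4])
    {
            char buf[24 + 16 + sizeof(unsigned int)];
            char sha[20];

            memcpy(buf, header_guid, 24);
            memcpy(buf + 24, name, 16);
            memcpy(buf + 40, &index, sizeof(index));
            sha1_buffer(buf, sizeof(buf), sha);
            memcpy(uuid, sha, 16);  /* md UUIDs are 16 bytes */
    }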
Don't do this for all non-MD arrays, only for those known
to use varying volume GUIDs.
This patch obsoletes my previous patch "DDF: new algorithm
for subarray UUID"
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
That is the same value that IMSM uses. The current default of 200ms
seems to have been copied from the native MD metadata. That value
appears to be much too low for DDF, given that writing the DDF
metadata can easily mean writing several MB of data to disk.
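A sketch of the change (safe_mode_delay is the struct mdinfo field;
the 4-second value follows the reference to IMSM above; placement is
illustrative):

    info->safe_mode_delay = 4000;   /* ms: 4s like IMSM, was 200 */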
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Set safe_mode_delay to a value greater than 0; otherwise all
assembled container subarrays will have safe_mode_delay=0. That would
break the assumption that metadata becomes clean after running
mdadm --wait-clean.
Use the same value as in getinfo_super_ddf_bvd. It would be cleaner
to call that directly from container_content_ddf, but I need to check
possible side effects first.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Have mdadm -E --export print the number of RAID devices,
like other metadata formats do. Anaconda (the RHEL/CentOS installer)
depends on it.
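A hypothetical example of the resulting output (the MD_DEVICES line
is the new one; the variable name is assumed to match the other
formats):

    $ mdadm -E --export /dev/sda
    MD_LEVEL=container
    MD_DEVICES=5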
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
At this point 'di' and 'rv' both have the same value. gcc doesn't
realise that and a human reader might not either.
'rv' makes more sense too, so use that.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible that mdadm creates a new subarray containing failed
devices. This may happen if a device has failed, but the metadata
containing that information hasn't been written out yet.
This code tests for this situation, and handles it in the monitor.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The recent change to skip over invalid conf entries was bad because
it could leave garbage on the disk.
But we don't want to write each entry separately, as the writes are
O_DIRECT and therefore synchronous, which takes far too long.
So allocate a large buffer (probably the one used to read the config
records), fill it, and then write it all at once.
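Schematically (conf_valid(), conf_record() and the sizes are
illustrative):

    unsigned int i;
    char *buf = conf_buf;   /* large, sector-aligned buffer */

    for (i = 0; i < n_conf; i++) {
            char *slot = buf + i * conf_size;
            if (conf_valid(i))
                    memcpy(slot, conf_record(i), conf_size);
            else    /* overwrite invalid slots: no stale garbage */
                    memset(slot, 0xff, conf_size);
    }
    /* one synchronous O_DIRECT write instead of n_conf small ones */
    if (write(fd, buf, n_conf * conf_size) !=
        (ssize_t)(n_conf * conf_size))
            return 0;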
Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
We should skip known failed disks when allocating space for
new arrays. This fixes the problem with 10ddf-fail-spare.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Commit c7079c84 arranged for DDF to forget about any device
that is failed and not still marked as part of any array.
However such devices could still be part of the container and this
removal and updating of 'pdnum' can result in multiple devices having
the same pdnum. This in turn easily leads to confusion and
corruption.
So only discard pd entries for devices which are failed, not listed in
any virtual device, and for which we don't have a handle on the
device.
pd entries will not get removed until a new device is added after
the device has been removed from the container, either by
"mdadm --remove" or by assembling without the failed devices.
Reported-by: Albert Pauw <albert.pauw@gmail.com>
Analysed-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Recent commit 273989b93a
skipped writing some large blocks of 0xFF, but didn't seek
over the space, so subsequent data was written wrongly.
When we don't write, we need to seek.
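The fix is essentially (a sketch):

    /* When a region is skipped instead of written, the file offset
     * must still advance, or everything after it lands in the
     * wrong place. */
    if (skip_region) {
            if (lseek64(fd, len, SEEK_CUR) < 0)
                    return 0;
    } else if (write(fd, buf, len) != (ssize_t)len)
            return 0;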
Signed-off-by: NeilBrown <neilb@suse.de>
With the previous patch, mdmon will provide the layout property for us.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Adds more verbose debugging in ddf_set_disk, to understand failures
better.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Try to determine the problem when load_ddf_header fails. This may be
useful for determining compatibility problems with Fake RAID BIOSes.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
I needed this for tracking a bug with wrong offsets after array
creation.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
In particular, include refnum for better tracking. This makes
it a little easier for humans to track what happened to which disk.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Move the check for good drives into the dl loop - otherwise dl
may be NULL and mdmon may crash.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
This call to validate_geometry is really rather gratuitous.
It is purely about the fact that super0 cannot use more than 4TB.
So just make it an explicit test - less confusing that way.
With this, validate_geometry is only called from Create, which
makes it easier to reason about.
Also validate_geometry is now never passed NULL for the 'chunk'
parameter, so we can remove those annoying tests for NULL.
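The explicit test looks roughly like this (a sketch; size is in KiB
as elsewhere in mdadm):

    /* super0 (0.90 metadata) cannot address components of 4TB or
     * more, so reject the combination directly. */
    if (strcmp(st->ss->name, "0.90") == 0 &&
        size >= 4ULL * 1024 * 1024 * 1024) {    /* 4TB in KiB */
            pr_err("0.90 metadata cannot handle components larger than 4TB\n");
            return 1;
    }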
Signed-off-by: NeilBrown <neilb@suse.de>
Metadata updates for secondary RAID (RAID10) need to cover
all BVDs. Compare with code in write_init_super_ddf().
Signed-off-by: NeilBrown <neilb@suse.de>
Fix a bug that caused the wrong conf record to be used to derive
data offset and size on secondary RAID (RAID10).
Signed-off-by: NeilBrown <neilb@suse.de>
Factor out the single-disk code from __write_init_super_ddf into a
new function, _write_super_to_disk. Use this function in
store_super_ddf.
Signed-off-by: NeilBrown <neilb@suse.de>
Increase seq number only when there's actually a metadata change.
This is better than increasing it at every write.
This also fixes another endianness bug.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently we compare fields between primary and secondary
superblocks, before we check if the primary is even valid.
This is a bit backwards, so reverse it.
Signed-off-by: NeilBrown <neilb@suse.de>
The "indirect" code path for adding VDs was not working correctly
for secondary RAID level. The "other BVDs" were not transmitted
to mdmon. Thus mdmon wouldn't build up correct information, and
RAID creation would fail when mdmon was already running on the container.
This patch fixes this.
Signed-off-by: NeilBrown <neilb@suse.de>
The value returned in disk.raid_disk may be wrong.
The old code was using raiddisk, which is only valid with auto
layout. This leads to errors when arrays are created with
specified disks and mdmon is already running, like this:
mdadm -CR /dev/md/container -n5 $d1 $d2 $d3 $d4 $d5
mdadm -CR /dev/md/r5 -n5 -l5 /dev/md/container -z 5000
mdadm -CR /dev/md/r1 -n2 -l1 $d1 $d2
=> resulting array will use wrong disks
This patch fixes that.
Signed-off-by: NeilBrown <neilb@suse.de>
Use refnum rather than raiddisk for identifying the physical disk.
raiddisk should only be used for auto-layout.
Signed-off-by: NeilBrown <neilb@suse.de>
Implement kill_subarray, both with and without mdmon running.
The way Kill_subarray() is implemented, this requires that the
DDF layer uses "currentconf" to remember the last subarray
queried with container_content(), and uses it as the one to kill.
I don't like this much but IMSM does it the same way.
Signed-off-by: NeilBrown <neilb@suse.de>
Commit d682f344 inserted this call to "Kill" in write_init_super_ddf:
"Matching the functionality already in super0 and super1, when
we first create a container, remove any other recognisable metadata to
ensure it doesn't cause confusion."
But we should do this only at first container creation, not when
subarrays are created later.
Signed-off-by: NeilBrown <neilb@suse.de>
With secondary RAID level, disks may belong to other BVDs in
a given conf record. This needs to be taken into account
when fixing the vlist.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
For secondary RAID, we need to check which BVD needs to be updated.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
For debugging DDF structure changes, it is important to be able
to identify VCs by their GUIDs.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
If there are already disks in the container, reserve the same amount
of workspace as the first disk. Fill in the primary/secondary/
workspace LBA values.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The used slots in the phys disk table aren't necessarily the
first ones. Rather, unused entries are represented by entries
where the GUID is all 0xff.
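Code walking the table therefore has to test each slot, along these
lines (DDF GUIDs are 24 bytes):

    /* An unused phys-disk slot has its GUID set to all-0xff; used
     * slots may appear anywhere in the table. */
    static int pd_slot_used(const char *guid)
    {
            int i;

            for (i = 0; i < 24; i++)
                    if ((unsigned char)guid[i] != 0xff)
                            return 1;
            return 0;
    }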
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Use get_pd_index_from_refnum() in get_extents to determine the
matching VD. This will ensure RAID 10 (secondary RAID level)
support, too.
This also fixes a bug in the previous get_extents() code (missing
__be16_to_cpu for conf.prim_elmnt_count).
DDF test case (10ddf-create) verified.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Remove the lba_offset field from struct vcl. This field acted as
a "cache" for the address of the lba_offset field in the vd_config
structure. This isn't useful any more if there are multiple
vd_configs in a vcl.
This patch also adds __cpu_to_be64 in two places where it has been
quite obviously forgotten (ddf_set_disk, ddf_activate_spare).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Make validate_geometry_ddf() use the same logic to check supported
RAID levels that init_super_ddf_bvd() uses.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Properly initialize the data structures of the other BVDs
in Create().
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Instead of allocating the other_bvds array and every element
separately, allocate everything in a single chunk. Also, move the
allocation into a subroutine, as it's used in several places.
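A sketch of the combined allocation (names illustrative):

    /* One malloc covers the pointer array and all elements, so a
     * single free() releases everything. */
    static struct vd_config **alloc_other_bvds(unsigned int n,
                                               size_t conflen)
    {
            unsigned int i;
            struct vd_config **v = malloc(n * sizeof(*v) + n * conflen);

            if (!v)
                    return NULL;
            for (i = 0; i < n; i++)
                    v[i] = (struct vd_config *)
                            ((char *)(v + n) + i * conflen);
            return v;
    }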
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Support for RAID 10 makes it necessary to rewrite the algorithm
for deriving DDF layout from MD layout. The functions level_to_prl
and layout_to_rlq are combined into a single function that takes
md layout parameters and converts them to DDF.
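A sketch of the conversion for a few cases (constant names as in
super-ddf.c, values per the DDF spec):

    /* Map md (level, layout, raiddisks) to DDF primary RAID level
     * (PRL), level qualifier (RLQ) and, for RAID10, secondary RAID
     * level (SRL). */
    switch (level) {
    case 0:
            *prl = DDF_RAID0;
            *rlq = DDF_RAID0_SIMPLE;
            break;
    case 1:
            *prl = DDF_RAID1;
            *rlq = raiddisks == 2 ? DDF_RAID1_SIMPLE : DDF_RAID1_MULTI;
            break;
    case 10:
            /* striped mirrors: RAID1 BVDs combined by striping */
            *prl = DDF_RAID1;
            *srl = DDF_2STRIPED;
            break;
    /* ... RAID4/5/6 map md layouts to the N/C RESTART/CONTINUE
     * qualifiers ... */
    }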
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
layout_ddf2md() is a new RAID layout conversion routine.
It obsoletes the previous separate routines for obtaining
md level and layout (map_num1, rlq_to_layout).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The DDF code was assuming that the VD slots 0..populated_vdes
were used and the rest were unused. Remove this assumption and
deal with empty slots instead.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Implement logic to derive the status of a secondary RAID
from its members. Use it in ddf_set_disk.
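The derivation is roughly (state names illustrative; the real code
uses the DDF state values):

    /* An SVD such as RAID10 is failed if any BVD is failed,
     * degraded if any BVD is degraded, and optimal only if all
     * BVDs are optimal. */
    enum { OPTIMAL, DEGRADED, FAILED } state = OPTIMAL;
    unsigned int i;

    for (i = 0; i < n_bvds; i++) {
            if (bvd_state[i] == FAILED)
                    return FAILED;          /* worst case wins */
            if (bvd_state[i] == DEGRADED)
                    state = DEGRADED;
    }
    return state;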
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Moved the code to determine RAID status to a separate function,
get_bvd_status(). I need this to account for secondary RAID level.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
If secondary RAID level is taken into account, translation between
the md RAID member (raid_disk) and the index of a physical disk
in a BVD becomes more complex.
Also, take into account that the member list can have unused entries
(this is independent of secondary RAID level).
Adapt usage of find_vdcr() accordingly.
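For RAID10 the translation is, schematically (prim_elmnt_count is
the number of members per BVD):

    /* md numbers members consecutively across the whole SVD;
     * within DDF each BVD has its own member list. */
    bvd_index    = raid_disk / prim_elmnt_count;  /* which BVD */
    member_index = raid_disk % prim_elmnt_count;  /* slot in it */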
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
This patch implements the previously unsupported case where
store_super_ddf is called with a non-empty superblock.
For DDF, writing metadata to just one disk makes no sense.
We would run the risk of writing inconsistent metadata
to the devices. So just call __write_init_super_ddf and
write to all devices, including the one passed by the caller.
This patch assumes that the device to store the superblock on
has already been added to the DDF structure. Otherwise, an
error message will be emitted.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The DDF spec mandates that the "open flag" be set to non-0 before
writing a configuration, and reset to 0 when finished to indicate
success.
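Schematically (openflag as in mdadm's struct ddf_header; the helper
names are illustrative):

    hdr->openflag = 1;              /* config write in progress */
    write_header(fd, hdr);
    write_config_records(fd, ddf);
    hdr->openflag = 0;              /* finished: indicates success */
    write_header(fd, hdr);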
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When the primary header can't be read, use the secondary header
as fallback.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the metadata on a device to be saved and later restored.
This can be useful before experimenting on an array that is misbehaving.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Fix a bug in the previous patch
"DDF: compare_super_ddf: merge local info of other superblock"
Just discovered this bug in my last patch set - unfortunately, just after
you committed it.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The 10ddf-create test case fails sporadically because wrong metadata
is written, making the array appear inconsistent when it is
restarted. Add code to aid in debugging this.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Commit c1ea5a98 caused brief_detail_super_ddf() to be called
for subarrays. But the UUID printed was always that of the
container. This is wrong and actually worse than printing no UUID
at all, and causes the DDF test case (10ddf-create) to fail.
This patch adds code to determine the MD UUID of a subarray correctly.
The hard part is figuring out which subarray the function is being
called for; that logic was moved into a separate function.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
It's not necessary to check for 0xffffffff, which is a valid
sequence number.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
If a match is found in compare_super_ddf, check the other SB
for local DDF information (VD config records, physical disk data)
which is not available in the current superblock, and add it
if needed.
This is important for mdmon - when disks are added to an
auto-read-only array, they must be present in the DDF structure
in order to guarantee consistent writeback of metadata to all
disks.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Besides the container GUID, also check the seqnum, the physical and
virtual disk numbers, and check for a match between the local and
global sections.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When writing back the DDF structure, make sure that on each disk
we write the configs that include this disk even if a secondary
RAID level is present. Otherwise the secondary RAID will not be
read correctly any more when we open the device next time.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Check for supportable secondary RAID configurations.
There is currently only one: RAID 10, if the stripe
sizes and Basic volume sizes are all equal.
With this patch, mdadm will not try to start unsupported
secondary RAID level configurations any more.
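The check amounts to something like this (a sketch; constants per
the DDF spec):

    /* Only striped RAID1 BVDs - what md calls RAID10 - are
     * supported as a secondary level. */
    static int supported_svd(const struct vd_config *conf)
    {
            return conf->prl == DDF_RAID1 &&
                   conf->srl == DDF_2STRIPED;
            /* plus: all BVDs must have equal stripe and volume
             * sizes */
    }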
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When searching for container elements, loop over the known phys
disks rather than the elements of the current configuration.
This patch changes nothing in the logic or return value of the code.
It just prepares extended logic for handling RAID10.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Store VD config for other BVDs in the other_bvds array.
This allows handling secondary RAID levels in container_content_ddf.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The VD config structures of different BVDs in the same SVD may be
different. This pointer stores the other BVDs.
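Schematically, the addition (struct vcl as in super-ddf.c, abridged):

    struct vcl {
            /* ... */
            struct vd_config **other_bvds; /* other BVDs of the SVD */
            struct vd_config conf;         /* the BVD this disk is in */
    };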
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Cleanly increase the seq number when the DDF structures are
written, instead of always setting it back to 1.
Also, make sure that the sequence number of all headers and
VD conf records is the same.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Some RAID BIOSes apparently use hard-coded LBA offsets (presumably
from the end of the disk) for the primary and secondary DDF
structure, ignoring the values given in the DDF anchor. This is
broken BIOS behavior, but it will cause any changes made by MD
(e.g. setting the init_state flag after a full initialization)
to be "forgotten" after the next reboot.
This patch fixes this by using the existing LBA locations if
available. Verified that this fixes interoperation between MD and
the LSI Mega Software RAID BIOS.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
So far, mdadm only saved the header of the secondary structure.
With this patch, the full secondary DDF structure is saved
consistently, too. Some vendor DDF implementations need it.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or positive for md%d devices
and negative for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32-char string),
e.g.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...
The size is in K unless a suffix (K, M, or G) is given.
The suffix 's' means sectors.
Signed-off-by: NeilBrown <neilb@suse.de>
This is currently only useful for 1.x metadata and will allow an
explicit --data-offset request on the command line.
Signed-off-by: NeilBrown <neilb@suse.de>
When adding a spare to a DDF there is some confusion about the
'level' of the container. It is reported by the kernel as an
unknown level (-1000000).
I don't know why this broke, but until I figure out why and fix it,
this hack gets us going again.
Signed-off-by: NeilBrown <neilb@suse.de>
If a DDF has two arrays sharing devices and one device fails, then
as soon as the spare is used to recover one of the arrays it isn't
a spare any more and so is not chosen for the other array.
Work around this for now by allowing a non-spare to be used if it has
enough space.
Reported-by: Albert Pauw <albert.pauw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>