Raid disk and disk number information is not relevant at the container
level, especially for imsm. So arrange for getinfo_super_imsm() to
always publish devices as spares and report the number of spares at
Assemble() time.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The uuid returned for an imsm spare device will never match the uuid of an
active disk. So make mdadm interpret a uuid of all f's as "match any".
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Both super-ddf and super-intel ignore memory allocation failures during
->activate_spare. Fix these up by cancelling the activation.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
load_imsm_disk() currently notices if spares missed their activation
update, but we allow a stale failed disk back in to the array because its
serial number is clobbered in the most up-to-date disk.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If a drive is removed while mdmon is not running we need a way to
identify what is missing and mark that disk as failed in the metadata.
At ->load_super() time create a list of missing disks defined as a disk
that is marked in-sync yet does not appear in super->disks.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Spotted a thinko... raid devices are dynamically sized, disks are not.
The space for disks is always mpb->num_disks * sizeof(struct imsm_disk).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When the array is shutdown, or when mdadm --wait-clean is called, any
active resync process will be idled allowing mdmon to record the current
resync position.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
md/resync_start reports different terminal values depending on kernel
configuration (~0UL versus ~0ULL). Make detection of the
resync-complete state more robust by comparing against array size.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
On ich6r the option-rom appears to reserve only 432 sectors rather than
the 418+4096 of newer implementations. For compatibility trust the
metadata in these cases.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
1/ near-2 indeed matches how the Windows driver lays out the data
2/ update imsm_check_degraded to check for rebuilding disks in the
raid10 case
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
zero-initialize the serial buffer to handle cases where the response is
less than MAX_RAID_SERIAL_LEN.
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
For now, this means that the lack of a homehost doesn't always prevent
assembly.
Soon we will allow assembly anyway, but have different messages if
homehost isn't supported.
mdadm -I /dev/part-of-container
should add that to a container, creating if it needed,
and then try to assemble any arrays in the container.
Signed-off-by: NeilBrown <neilb@suse.de>
When we assemble an array, there are three different approaches
depending on whether metadata is internal or external, and on
kernel version.
Move all this to a common helper instead of duplicating in 3 places.
Signed-off-by: NeilBrown <neilb@suse.de>
The variety of approaches to 'add_disk' are factored out into
a separate function, and Incremental mode benefits by being
closer to supporting the assembly of containers.
Also remove the adding-to-array-data-structure out of sysfs_add_disk
and into add_disk.
And add some tests for --incremental mode to make sure we don't break it.
Signed-off-by: NeilBrown <neilb@suse.de>
Allow the following sequence to rebuild the array
mdadm --fail /dev/md/r1 /dev/disk
mdadm --remove /dev/imsm /dev/disk
mdadm --add /dev/imsm /dev/disk
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* allows container_content() to pick up the safemode_delay
* removes some duplicate code
* fixes an endian bug setting info->array.chunk_size
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Trim trailing and leading whitespace
* Allow unterminated serial numbers up to MAX_RAID_SERIAL_LEN
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The secondary map is used to reflect the migration state of the array
i.e. from dev->vol.map[1] to dev->vol.map[0]. Ensure a rebuilding /
initializing array is marked in the second map, while normal status is
reflected in the first map. Also mark rebuilding drives with
IMSM_ORD_REBUILD.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* fix breakage from last merge (infinite loop in imsm_process_update())
* add ability to delete by index
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
IMSM_ORD_REBUILD is the 'insync' flag in MD terms. USABLE is a flag to
opt-in disks for use with the Windows driver.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Save some unnecessary calls to get_imsm_map() by teaching
get_imsm_disk_idx() to retrieve the map.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This allows spares to be associated with any family while not allowing
disks from different families to be assembled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We used to leave SPARE_DISK unset to indicate it was available to be
assimilated into other arrays. Now we explicitly check the size.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Truncate the first character of the serial number
* Set 'scsi_id' to all f's
* Expect to find disk entries with unmatchable serial numbers, i.e.
expect get_imsm_disk() to return NULL in some situations
* Allow discrepencies between mpb->num_disks and len(super->disks)
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ensure that the mpb buffer is large enough to hold the extra imsm_map's
of migrating arrays and dynamically created raid devices.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
For example, this allows one to still say mdadm -A /dev/sd[b-e] even
though /dev/sde has replaced /dev/sdd. Otherwise mdadm will say:
mdadm: superblock on /dev/sdd doesn't match others - assembly aborted
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Adding a device updates the container and then mdmon takes action upon
noticing a change in devices. This reuses the container version of
add_to_super to create a new record for the device.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When we have determined that a disk is no longer of any value, remove
it from the data structure. This is now safe because the manager
will back off while any metadata update is pending in the monitor.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When we first start an array, it might be good to start recovery
straight away. That requires setting the array to 'dirty', but
only the metadata handler can know if that is required or not.
So have a third possible 'consistent' option to set_array_state.
Either 'no' or 'yes' or 'you choose'.
Return value indicates what was chosen.
'1' (no) should be chosen unless there is a good reason.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ Do not assemble !in_sync or failed devices in container_content.
2/ Prevent activation of failed or configured devices in activate_spare.
3/ Be sure to avoid dirty degraded if the array was shutdown cleanly.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
imsm_dev dynamically grows, so dev_idx needs to be moved up in the
definition to avoid getting clobbered.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With dev->vol.map and mpb->disk entries entering and leaving the parameter
block write_super_imsm needs to update the size before writeback.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Disks that are not in-sync or failed are not assembled into member
arrays by mdadm. Teach mdmon to resolve this situation by checking for
spares at start. imsm_activate_spare() is updated to prefer devices
that can be re-added versus new spares.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The option-rom and the Matrix driver mark resyncs/rebuilds with the
migrate state bits. Update sizeof_imsm_dev to allow allocation of
imsm_dev entries large enough to grow if migr_state is later set.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This poses a small problem for the case of handling multiple raid1 arrays
across separate disk pairs i.e. 2 mirrors on 4 disks. The option-ROM will
configure this as two containers. We may need the capability for one
container to ask for an unused spare in another container. For now spares
will just maintain the affinity established at assemble time.
To support this configuration spare devices must be allowed to be assembled
into the container even though the metadata indicates the disk belongs to a
different family.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
removes the need to lookup the disk by index in a few cases and is a
preparation step for tracking spares outside the current anchor.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This is the initial defensive implementation of bad block management
support. It simply precludes assembly if there are entries in the bad
block logs. This is sufficient for now as the conditions that lead to
an entry in the bad block log would cause the array to be failed by MD
(as of 2.6.27).
[dan.j.williams@intel.com: general cleanups]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Maintaining a single global buffer is unwieldly when extending/rewriting
sections of the metadata. Parse the metadata into component data
structures upon reading and coalesce to a coherent buffer before
writing.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The cmd_filter patch merged for 2.6.27 broke retrieving the serial
number via an ioctl to /dev/sgN. In debugging this I found that other
utilities like sdparm simply run the ioctl on /dev/sdX. So just convert
to that for protection in numbers, but scream on the mailing list for
the inconvenience grr...
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Removes the need for the call to ->set_array_state when sync_action
transitions from 'recover' to 'idle'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Neil rightly points out that imsm_activate_spare may skip valid free space
on a spare, fix this up.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Following the lead of 75ede16d. This incidentally fixes creation of a second
array by gating call to getinfo_super_imsm_volume with a valid ->current_vol.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Using buffered IO risks non-atomic updates to parts of the
device that we don't actually want to write to. This isn't in
general safe.
So switch to O_DIRECT for all that IO and make sure we have
properly aligned buffers.
When loading the metadata for a subarray (super_by_fd), we set
->subarray to be the name read from md/metadata_version so that
getinfo_super can return info about the correct array.
With this we can differentiate between a container and
an array within the container by looking at ->subarray[0].
Only one superswitch should be externally visible for each
general type. Others which handle different flavours
(e.g. container/data-array) should be internal only.
for development only as console output can block leading to monitor deadlocks
in low mem situations
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A spare device by definition will have raid_disk set to -1, but when
assembling the container we want this disk to by included.
Fixes a SIGSEGV when doing:
mdadm -A /dev/imsm -e imsm /dev/sd[b-e]
...where /dev/sde is marked as a global spare device
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
It is used later by container_content_imsm to determine set the
text_version of the member arrays.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Code in manager can now just call queue_metadata_update with a
(freeable) buf holding the update, and it will get passed to the
monitor and written out.
I want the metadata handler to have more control over the 'version',
particularly for arrays which are members of containers.
So discard st->text_version and instead use info->text_version
which getinfo_super can initialise.
mark_dirty is just a special case of mark_clean - with sync_pos == 0.
mark_sync is not required. We don't modify the metadata when sync
finishes. Only when the array becomes non-writeable at which point we
use mark_clean to record how far the resync progressed.
From: Dan Williams <dan.j.williams@intel.com>
Added curr_state as a parameter to set_disk. Handlers look at this to
record components failures, and set global 'degraded' or 'failed'
status.
When reading the state as faulty:
1/ mark the disk failed in the metadata
2/ write '-blocked' to the rdev state to allow the kernel's failure
mechanism to advance
3/ the kernel will take away the drive's role in remove_and_add_spares()
4/ once the disk no longer has a role writing 'remove' to the rdev state
will get the disk out of array.
There is a window after writing '-blocked' where the kernel will return
-EBUSY to remove requests. We rely on the fact that the disk will
continue to show faulty so we lazily wait until the kernel is ready to
remove the disk. If the manager thread needs to get the disk out of the
way it can ping the monitor and wait, just like the replace_array()
case.
[buglet fix: swap the parameters of attr_match in read_dev_state]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
Metadata handlers set mdinfo.resync_start depending on the state of the
array. By default mdadm assumes the array is dirty and needs a full
resync.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
The following now work:
--examine
--examine --brief
Signed-off-by: Dan Williams <dan.j.williams@intel.com>