If a device hasn't been found yet we can still tell whether it is
expected to be working, and we must do so to make sure
'working_disks' is correct.
Signed-off-by: NeilBrown <neilb@suse.de>
If a device name ends with ':number', e.g.
/dev/sda0:1234
then assume the RAID data starts that many sectors from the start of
the device.
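A minimal sketch of how such a suffix could be split off, not the code
from the patch itself. The helper name split_dev_offset() is hypothetical,
and treating a malformed suffix as part of the name is an assumption:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split a trailing ":number" off a device name (hypothetical helper).
 * On success name[] is truncated to the bare device path and *offset
 * holds the number of sectors. Returns 1 if a suffix was stripped,
 * 0 if the name has no valid numeric suffix (name[] is left untouched
 * and *offset is set to 0).
 */
int split_dev_offset(char *name, unsigned long long *offset)
{
	char *colon = strrchr(name, ':');
	char *end;
	unsigned long long v;

	*offset = 0;
	if (colon == NULL)
		return 0;
	/* strtoull() accepts signs and leading blanks; insist on a digit */
	if (colon[1] < '0' || colon[1] > '9')
		return 0;

	v = strtoull(colon + 1, &end, 10);
	if (*end != '\0')
		return 0;	/* trailing junk: assume it is part of the name */

	*colon = '\0';
	*offset = v;
	return 1;
}
```

With the example above, "/dev/sda0:1234" would yield the path "/dev/sda0"
and an offset of 1234 sectors.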
Signed-off-by: NeilBrown <neilb@suse.de>
If we don't do this, then the unlink from /dev might happen
after the next step in the test creates something in /dev,
and device names seem to go missing.
Signed-off-by: NeilBrown <neilb@suse.de>
Hi Neil,
please find attached a patch, to mdadm-3.2 base, including
a standalone version of the raid-6 check.
This is basically a re-working (and hopefully improvement)
of the already implemented check in "restripe.c".
I split the check function into "collect" and "stats",
so that the second one could be easily replaced.
The API is also simplified.
The command line options are reduced, since the only level
is raid-6, but the ":offset" option is included.
The output reports the block/stripe rotation, P/Q errors
and the possible HDD (or unknown).
BTW, the patch applies also to the already patched "restripe.c",
including the last ":offset" patch (which is not yet in git).
Another item is that due to "sysfs.c" linking (see below) the
"Makefile" needed some changes, I hope this is not a problem.
Next steps (a TODO list, if you like) would be:
1) Add the "sysfs.c" code in order to retrieve the HDDs info
from the MD device. It is already linked, together with the
whole (mdadm) universe, since it seems it cannot live alone.
I'll need some advice or a hint on how to use it. I checked
"sysfs.c", but before I dig deep into it it may be better to
have some advice (maybe just one function call will do it).
2) Add the suspend lo/hi control. Fellow John Robinson was
suggesting to look into "Grow.c", which I did, but I guess
the same story as 1) is valid: better to have some hint on
where to look before wasting time.
3) Add a repair option (future). This should have different
levels, like "all", "disk", "stripe". That is, fix everything
(more or less like "repair"), fix only if a disk is clearly
having problems, fix each stripe which has clearly a problem
(but maybe different stripes may belong to different HDDs).
So, for points 1) and 2) it would be nice to have some more
detail on where to look. Point 3) we will discuss later.
Thanks, please consider for inclusion,
bye,
pg
Signed-off-by: NeilBrown <neilb@suse.de>
RstScuV and RstScuO variable names are supported.
First try reading from RstScuV, when it fails try RstScuO.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Tested-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When adding spare disks to raid0, the spare metadata is not written.
This is due to sync_metadata() exiting early when the updates_pending flag is empty.
When mdmon is absent, tell sync_metadata() to flush changes to disks.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
sync_metadata() requires st->sb to be loaded, otherwise an exception is
generated. This makes expansion fail, because spares cannot be added.
The metadata update now uses the tst pointer instead of st; this is better than
loading the anchor for st as I proposed previously.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Issue observed:
Sporadically, stopping arrays using the "mdadm -Ss" command does not succeed.
Cause:
Writing "inactive" to the array state does not succeed - the array is busy
(accessed by udev, blkid etc.)
Resolution:
If writing 'inactive' fails, wait and retry (because it is possibly
a transient failure)
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When devnum2devname() is used to produce the input for ping_monitor(),
the returned string pointer should be passed to free() to release the memory.
This is not done in several places. This use case should have its own function
to avoid the memory leak.
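A sketch of the wrapper idea, assuming the two stand-in functions below
behave like devnum2devname() (returns a malloc()ed string) and
ping_monitor() (returns 0 on success). The stubs and the wrapper name
ping_monitor_by_devnum() are hypothetical, not mdadm's actual code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for devnum2devname(): returns a malloc()ed name, or NULL. */
char *stub_devnum2devname(int devnum)
{
	char buf[32];
	char *name;

	if (devnum < 0)
		return NULL;
	snprintf(buf, sizeof(buf), "md%d", devnum);
	name = malloc(strlen(buf) + 1);
	if (name)
		strcpy(name, buf);
	return name;
}

/* Stand-in for ping_monitor(): 0 on success, -1 on failure. */
int stub_ping_monitor(const char *devname)
{
	return strncmp(devname, "md", 2) == 0 ? 0 : -1;
}

/*
 * Resolve the name, ping the monitor, and always release the name
 * before returning, so no caller has to remember the free().
 */
int ping_monitor_by_devnum(int devnum)
{
	char *name = stub_devnum2devname(devnum);
	int rv;

	if (name == NULL)
		return -1;
	rv = stub_ping_monitor(name);
	free(name);
	return rv;
}
```

Keeping the allocation and the free() in one function means each call
site shrinks to a single call and cannot leak the name.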
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When I separated the 'native metadata' case more cleanly from the
"external metadata" case for adding a drive, I left some 'external'
code in the 'native' case, and didn't copy it to the 'external' case.
When - in the external case - we add to super, we must check for
mdmon first, so we know whether to do the metadata update ourselves
or not, then afterwards call either flush_metadata_updates (to send
to mdmon) or sync_metadata (to do it directly).
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Stopping an md array requires that there is no other user of it.
However with udev and udisks and such there can be transient other
users of md devices which can interfere with stopping the array.
If there are transient users, we really want "mdadm --stop" to wait a
little while and retry.
However if the array is genuinely in-use (e.g. mounted), then we
don't want to wait at all - we want to fail immediately.
So before trying to stop, re-open device with O_EXCL. If this fails
then the device is probably in use, so give up.
If it succeeds, but a subsequent STOP_ARRAY fails, then it is possibly
a transient failure, so try again for a few seconds.
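The policy can be sketched as below. This is only an illustration: the
stop_ops callbacks, stop_with_retry() and the in-memory fake are all
hypothetical names, standing in for the O_EXCL re-open, the STOP_ARRAY
ioctl and a short sleep. The number of attempts is left to the caller.

```c
/* Operations the caller supplies; all names here are hypothetical. */
struct stop_ops {
	int (*open_excl)(void *ctx);	/* 0 if an O_EXCL re-open succeeded */
	int (*stop)(void *ctx);		/* 0 if STOP_ARRAY succeeded */
	void (*delay)(void *ctx);	/* short sleep between attempts */
};

/* Returns 0 if the array was stopped, -1 otherwise. */
int stop_with_retry(const struct stop_ops *ops, void *ctx, int max_tries)
{
	int i;

	/* Exclusive open failed: probably genuinely in use, fail at once. */
	if (ops->open_excl(ctx) != 0)
		return -1;

	/* We hold the device exclusively, so a failure is likely transient. */
	for (i = 0; i < max_tries; i++) {
		if (ops->stop(ctx) == 0)
			return 0;
		if (i + 1 < max_tries)
			ops->delay(ctx);
	}
	return -1;
}

/* In-memory fake, used only to exercise the policy above. */
struct fake_array {
	int in_use;			/* non-zero: exclusive open fails */
	int transient_failures;		/* stop attempts that fail first */
	int stop_calls;			/* how often stop was attempted */
};

int fake_open_excl(void *ctx)
{
	struct fake_array *a = ctx;

	return a->in_use ? -1 : 0;
}

int fake_stop(void *ctx)
{
	struct fake_array *a = ctx;

	a->stop_calls++;
	if (a->transient_failures > 0) {
		a->transient_failures--;
		return -1;
	}
	return 0;
}

void fake_delay(void *ctx)
{
	(void)ctx;	/* a real implementation would sleep briefly here */
}

const struct stop_ops fake_ops = { fake_open_excl, fake_stop, fake_delay };
```

A mounted array fails on the first check without any stop attempt, while
an array that is only briefly held by udev is stopped after a few tries.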
Signed-off-by: NeilBrown <neilb@suse.de>
When we allowed a devlist to accompany some --grow modes - but not
--bitmap - we made --bitmap always fail, instead of failing only if a device
was given to add.
As 'devs_found' includes the md device, we need to compare against
'1'.
Signed-off-by: NeilBrown <neilb@suse.de>
Suggested by Rory Jaffe <rsjaffe@gmail.com> to make the danger
of shrinking, and the recommended avoidance technique, more explicit.
Signed-off-by: NeilBrown <neilb@suse.de>
This is needed to remove devices from mdmon's knowledge when the
device is removed from the md container.
Now that ddf has a remove_from_super we don't need the code
that allows some personalities not to implement this.
Signed-off-by: NeilBrown <neilb@suse.de>
Manager thread shall pass the information to monitor thread (mdmon)
that some devices are removed from container. Otherwise, monitor
(mdmon) might use such devices (spares) to rebuild the array that has
gone degraded.
This problem happens for imsm containers, since a list of the
container disks is maintained in intel_super structure. When an array
goes degraded, the list is searched to find a spare disk to start the
rebuild. Without this fix the rebuild could be started on a spare
device that was a member of the container, but has been removed from
it.
New super type function handler has been introduced to prepare
metadata format specific information about removed devices.
int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo)
The message prepared in remove_from_super is later processed by
process_update handler in monitor thread.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If a RAID1 is meant to have more than 2 devices and, while it doesn't
have that many, it still has more than 1, then according to the
DDF spec it is "partially optimal" rather than "degraded".
So make that so.
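The rule can be written as a small classifier. This is a sketch of the
rule as described here, not the ddf code itself. The enum and function
names are hypothetical, and the "optimal" and "failed" cases are the
obvious completions rather than something stated above:

```c
/* Hypothetical names for the virtual disk states involved. */
enum vd_state {
	VD_OPTIMAL,
	VD_PARTIALLY_OPTIMAL,
	VD_DEGRADED,
	VD_FAILED
};

/*
 * Classify a RAID1 from the number of devices it is meant to have
 * (raid_disks) and the number currently working.
 */
enum vd_state raid1_state(int raid_disks, int working)
{
	if (working >= raid_disks)
		return VD_OPTIMAL;
	if (working <= 0)
		return VD_FAILED;
	/* Devices are missing, but more than one copy is still present. */
	if (raid_disks > 2 && working > 1)
		return VD_PARTIALLY_OPTIMAL;
	return VD_DEGRADED;
}
```

So a 3-device RAID1 with 2 working devices still has redundancy and is
reported differently from one that is down to a single copy.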
Signed-off-by: NeilBrown <neilb@suse.de>
The DDF spec requires we have a phys disk record for every physically
attached device. But it isn't clear what that means in the case
of soft raid in a general purpose Linux computer.
So remove phys disk records for any failed device that is not
active in any array.
Signed-off-by: NeilBrown <neilb@suse.de>
This is a bit fragile, but DDF has weird rules that we aren't really
set up to handle properly.
When we add a device to a degraded array it must be a spare, so
mark it as Rebuilding.
Signed-off-by: NeilBrown <neilb@suse.de>
Spare assignment requires full knowledge of array state. A pending
update might modify that state (such as a pending spare assignment)
so don't try while there are updates pending.
Signed-off-by: NeilBrown <neilb@suse.de>
add_to_super could use information from the current superblock (ddf
does), so add_to_super for external metadata should be called with
the O_EXCL lock held on the container to ensure the update is complete
before any other process tries to make any changes (like adding
another device to array).
Signed-off-by: NeilBrown <neilb@suse.de>
During normal test execution, the backup file is deleted after the test completes.
If a test is interrupted/broken, the backup file can remain for the next run.
When a backup file exists before a unit test run, suites 12 and 13 fail.
To avoid this, remove the backup file before grow is executed.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This is needed to remove devices from mdmon's knowledge when the
device is removed from the md container.
Now that ddf has a remove_from_super we don't need the code
that allows some personalities not to implement this.
Signed-off-by: NeilBrown <neilb@suse.de>
We weren't setting ->vcnum at all when an array was added. This
meant that a subsequent device failure could be assigned to the
wrong array.
Reported-by: Albert Pauw <albert.pauw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When set_disk is called, we need to check if the disk has changed or
recently appeared, and update everything properly if it has.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ ignore devices with "state_fd < 0" as these have been removed.
2/ Set update 'length' properly and clear 'space'.
Signed-off-by: NeilBrown <neilb@suse.de>
If a single-disk RAID0 or RAID1 array is created, the user may preserve the data on
the disk. If the given array size covers all partitions on the disk, all data will be
available on the created array. If the array size is too small (does not cover
all partitions), the data will not be accessible.
This patch introduces a warning message during array creation if the given size
is too small. The user may interrupt the creation process to avoid data loss.
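The check behind such a warning could look like the sketch below. The
struct and function names are hypothetical, sizes are assumed to be in
sectors, and how the partition list is obtained is left out:

```c
#include <stddef.h>

/* One partition on the disk, in sectors (hypothetical layout). */
struct part_extent {
	unsigned long long start;
	unsigned long long length;
};

/*
 * Return 1 if every partition ends at or before array_sectors,
 * 0 if at least one partition would extend past the end of the array.
 */
int array_covers_partitions(unsigned long long array_sectors,
			    const struct part_extent *parts, size_t nparts)
{
	size_t i;

	for (i = 0; i < nparts; i++)
		if (parts[i].start + parts[i].length > array_sectors)
			return 0;
	return 1;
}
```

A caller would print the warning and ask for confirmation when this
returns 0.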
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We use the PCI device id exposed by the AHCI and ISCU drivers (SAS controller)
to find the OROM version table.
In this way there is no need to maintain AHCI and ISCU device id lists
in mdadm. The consequence is that the OROM properties can be found by mdadm
when the AHCI or SAS drivers are loaded in the system.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
As last_checkpoint is a counter in per-disk units, checkpoints
should be stored in the same manner.
Restoring from a checkpoint should recalculate the checkpoint into an
array position (reshape_progress).
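As an illustration of the unit conversion only: the sketch assumes a
striped layout in which the array position is the per-disk position
multiplied by the number of data disks. That relation and both function
names are assumptions, not taken from the patch:

```c
/*
 * Convert a per-disk checkpoint into an array position.
 * Assumes array position = per-disk position * data disks.
 * Returns 0 if data_disks is not positive.
 */
unsigned long long checkpoint_to_array_pos(unsigned long long per_disk,
					   int data_disks)
{
	if (data_disks <= 0)
		return 0;
	return per_disk * (unsigned long long)data_disks;
}

/*
 * Inverse conversion, rounding down to a whole per-disk unit.
 * Returns 0 if data_disks is not positive.
 */
unsigned long long array_pos_to_checkpoint(unsigned long long array_pos,
					   int data_disks)
{
	if (data_disks <= 0)
		return 0;
	return array_pos / (unsigned long long)data_disks;
}
```

Under that assumption, a per-disk checkpoint of 1000 on an array with
3 data disks corresponds to array position 3000.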
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
last_checkpoint is a variable that tracks the sync_complete sysfs entry.
sync_complete is a per-disk counter, so initialization when starting from a checkpoint
has to keep this in mind and convert the reshape position properly.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When a reshape is restarted and the active array in mdmon is being initialized,
mdmon has to know the last checkpoint, otherwise the reshape will be restarted
from position '0'.
When a reshaped array is assembled, mdadm stores reshape_position in sysfs
and runs mdmon. Initialize last_checkpoint in the active array structure
to the value present in sysfs when the reshaped array starts.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Unfreeze the array on success only.
rv is initialized from the restart variable, so we have 2 cases.
1. Regular reshape start:
rv == restart == 0
This means that a real error (returned by reshape) can leave the container frozen.
If the array is not touched by reshape it can be unfrozen.
2. During reshape restart, even an untouched array under reshape is left unfrozen.
If reshape is started, do not unfreeze the array on error either.
This allows the user to take array repair action
(mdmon will not change array state).
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If an --add is requested and a re-add looks promising but fails or
cannot possibly succeed, then don't try the add. This avoids
inadvertently turning devices into spares when an array is failed but
the devices seem to actually work.
Signed-off-by: NeilBrown <neilb@suse.de>
We were setting ->container_member twice in ddf get_info.
Once to currentconf->vcnum,
once to atoi(st->subarray).
Both should be the same.
For consistency with super-intel, use the first.
Signed-off-by: NeilBrown <neilb@suse.de>
If adding a bitmap fails with EBUSY, then it is because the array is
currently resyncing/recovering/reshaping.
As this is non-obvious, give a message explaining the fact.
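A sketch of mapping the error to an explanation. The function name is
hypothetical and the message wording is illustrative, not the exact
string printed by mdadm:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Return an explanatory hint for a failed bitmap add, or NULL if the
 * error has no special meaning here (hypothetical helper).
 */
const char *bitmap_add_hint(int err)
{
	if (err == EBUSY)
		return "cannot add bitmap while the array is "
		       "resyncing, recovering or reshaping";
	return NULL;
}
```

A caller would print the hint when one is returned, and otherwise fall
back to the generic strerror() text.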
Signed-off-by: NeilBrown <neilb@suse.de>
We shouldn't call remove_partitions until we have made a really firm
decision to include the device into the array.
Signed-off-by: NeilBrown <neilb@suse.de>
As chunk_size in mdstat_ent is never set, we shouldn't copy
it into a->info.array.
In fact, it is safest to get rid of the field altogether.
Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>