We currently copy the anchor to both primary and secondary
blocks.
This assumes that the anchor is up to date, but it might not be.
We should trust the 'active' block and copy from there.
Signed-off-by: NeilBrown <neilb@suse.de>
- we weren't updating this timestamp at all
- the 'vd_config' seqnum was updated on every write of the metadata,
which is excessive. Just update it when there is a change.
Signed-off-by: NeilBrown <neilb@suse.de>
As we are range-checking 'cd', there is a chance that it is not
in-range. In that case we should include all array indexes with 'cd'
inside the range-tested branch.
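A minimal sketch of the pattern, assuming a hypothetical table with two arrays indexed by 'cd' (slot_table, read_slot and MAX_SLOTS are illustrative names, not the real code):

```c
#define MAX_SLOTS 4	/* hypothetical table size */

struct slot_table {
	int state[MAX_SLOTS];
	int role[MAX_SLOTS];
};

/* Returns 1 and fills *state and *role when cd is in range, 0 otherwise.
 * The outputs are left untouched when cd is out of range. */
static int read_slot(const struct slot_table *t, int cd, int *state, int *role)
{
	if (cd >= 0 && cd < MAX_SLOTS) {
		/* Every access indexed by cd lives inside the checked branch. */
		*state = t->state[cd];
		*role = t->role[cd];
		return 1;
	}
	return 0;
}
```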
Signed-off-by: NeilBrown <neilb@suse.de>
The phys disks table should list all disks, but if the metadata
is corrupt, it might not even list the disk it was read from.
So check for and report any known disks that aren't listed.
Signed-off-by: NeilBrown <neilb@suse.de>
The "used_pdes" value counts the number of physdisk entries that
are in use.
It may not be the last one in use as there may be unused slots in
the middle.
So when we are iterating over phys disks, we need to use max_pdes
and skip unused entries.
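A sketch of such a loop, using simplified stand-in types. The assumption here is that an unused slot is marked by a refnum of 0xFFFFFFFF; phys_table, phys_entry and find_phys are illustrative names:

```c
#include <stdint.h>

#define UNUSED_REFNUM 0xFFFFFFFFu	/* assumed marker for an empty slot */

struct phys_entry {
	uint32_t refnum;
};

struct phys_table {
	unsigned used_pdes;	/* number of slots in use */
	unsigned max_pdes;	/* total number of slots */
	struct phys_entry entries[8];
};

/* Return the slot index holding refnum, or -1 if it is not listed. */
static int find_phys(const struct phys_table *pt, uint32_t refnum)
{
	unsigned i;

	for (i = 0; i < pt->max_pdes; i++) {
		if (pt->entries[i].refnum == UNUSED_REFNUM)
			continue;	/* hole in the middle of the table */
		if (pt->entries[i].refnum == refnum)
			return (int)i;
	}
	return -1;
}
```

With two entries in use at slots 0 and 2, a loop bounded by used_pdes would stop before reaching slot 2.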
Signed-off-by: NeilBrown <neilb@suse.de>
This is a better match for reshape_array() and means that
"mdadm --grow --continue" will run in the foreground, which
makes more sense.
Signed-off-by: NeilBrown <neilb@suse.de>
If "--assemble" or "--incremental" is started by udev, then
monitoring the reshape in the background won't work.
So try asking systemd to start a grow-continue.
If that fails, just do it the old way.
Signed-off-by: NeilBrown <neilb@suse.de>
A subsequent patch will allow the background part of "mdadm --grow" to
be run from systemd. This can require the passing of a backup file
name.
To do this, store that name as a symlink in /run/mdadm (or MAP_DIR)
and look for it when appropriate.
It might be useful to also store the name across reboot, but that
would be a different patch. We would need to use the uuid to identify
it, and store it in stable storage.
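A rough sketch of the idea, not the actual implementation. The link name "backup_file-<devnm>" and the helper names are assumptions made for illustration; the directory is passed in so that /run/mdadm or MAP_DIR can be used:

```c
#define _XOPEN_SOURCE 700	/* for symlink() and readlink() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<map_dir>/backup_file-<devnm>"; the link name is an assumption. */
static int backup_link_path(char *buf, size_t len,
			    const char *map_dir, const char *devnm)
{
	int n = snprintf(buf, len, "%s/backup_file-%s", map_dir, devnm);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Record the backup file name as the target of a symlink. */
static int store_backup_name(const char *map_dir, const char *devnm,
			     const char *backup)
{
	char path[512];

	if (strlen(backup) == 0 ||
	    backup_link_path(path, sizeof(path), map_dir, devnm) < 0)
		return -1;
	unlink(path);	/* replace any stale entry */
	return symlink(backup, path);
}

/* Read the name back; returns 0 on success, -1 if no link is present. */
static int load_backup_name(const char *map_dir, const char *devnm,
			    char *out, size_t len)
{
	char path[512];
	ssize_t n;

	if (len == 0 ||
	    backup_link_path(path, sizeof(path), map_dir, devnm) < 0)
		return -1;
	n = readlink(path, out, len - 1);
	if (n < 0)
		return -1;
	out[n] = '\0';	/* readlink() does not terminate the string */
	return 0;
}
```

The target of the symlink does not need to exist, so the name can be stored before the backup file is created.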
Signed-off-by: NeilBrown <neilb@suse.de>
For large arrays (component size > 100GB) if write-intent bitmap is not
enabled, then it is set by default to "internal", even if the metadata
format does not support internal bitmaps, which causes Create to fail.
This patch adds checking if add_internal_bitmap is set in the
superswitch before setting bitmap_file to "internal".
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This modifies locking in Create to eliminate a situation where
--incremental can assemble a device between write_init_super() and
add_disk(), which causes Create to fail.
It sporadically occurs e.g. when metadata is written on a device,
causing a udev change event which triggers mdadm --incremental.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
1/ Add systemd shutdown script to ensure DDF and IMSM are
clean before we actually shut down
2/ Get udev to tell systemd to run the mdmon@mdXXX.service
units when a member array appears.
If we boot off a member array (with dracut at least),
the mdmon started in the initramfs will lose track of
/sys etc, so we need to restart it.
systemd will try to forget about it too (but not actually
kill it because we said not to do this).
Having udev tell it to start it will allow a new mdmon to
run which can see /sys, and systemd will know about it.
3/ Always use --offroot and --takeover when starting mdmon with
systemd
--offroot is needed else shutdown will hang.
--takeover is needed in case an mdmon was started earlier
(e.g. in initramfs).
Neither hurts if it isn't actually needed.
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible that one device has seen some reconfig but the other
hasn't. In that case they are still the "same" DDF, even though
one might be older. Such age will be detected by 'seq' differences.
If A is new and B is old, then it is important that
mdadm -I B
mdadm -I A
doesn't get confused because A has the same uuid as B, but compare_super fails.
So: if the seq numbers are different, then just accept as two
different superblocks.
If they are the same, then look to copy data from new to old.
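The decision can be sketched as follows. This is only an illustration with made-up types and result names (sb_summary, cmp_result, compare_sb), not the real compare_super interface or its return codes:

```c
#include <stdint.h>
#include <string.h>

enum cmp_result {
	CMP_UNRELATED,	/* different identity */
	CMP_SEPARATE,	/* same identity, different seq: keep both as they are */
	CMP_MERGE	/* same identity and seq: look to copy data across */
};

struct sb_summary {
	unsigned char guid[24];
	uint32_t seq;
};

static enum cmp_result compare_sb(const struct sb_summary *first,
				  const struct sb_summary *second)
{
	if (memcmp(first->guid, second->guid, sizeof(first->guid)) != 0)
		return CMP_UNRELATED;
	if (first->seq != second->seq)
		return CMP_SEPARATE;
	return CMP_MERGE;
}
```

The result is the same whichever of A and B is seen first, so the order of the two "mdadm -I" calls does not matter.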
Signed-off-by: NeilBrown <neilb@suse.de>
Testing 'c' and then using 'vdc' assumes that the two are in sync,
but sometimes they aren't.
Testing 'vdc' is safer.
This avoids a crash in some cases when failing/removing/adding devices
to a DDF.
Signed-off-by: NeilBrown <neilb@suse.de>
It is conceivable that ->pdnum could be -1, though only if
the metadata is corrupt.
We should be careful not to use it if it is.
Also remove an assignment for pdnum to ->container_member.
This is never used and cannot possibly mean anything.
Signed-off-by: NeilBrown <neilb@suse.de>
This patch cleans up the code a bit by moving
the second repair mode, that is the manual
repair, to a separate function.
Signed off: piergiorgio.sartor@nexgo.de
Signed-off-by: NeilBrown <neilb@suse.de>
This patch cleans up the code a bit by moving
the autorepair part into a separate function.
Signed off: piergiorgio.sartor@nexgo.de
Signed-off-by: NeilBrown <neilb@suse.de>
The stripe locking mechanism must be atomic between
the check and the potential autorepair.
For this reason, the autorepair code needs to be just
after the check and both parts (check and autorepair)
must be executed under stripe lock.
Of course, the manual repair can operate as before.
This patch reorganizes the code and provides the single,
atomic, stripe lock.
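The intended ordering can be sketched like this. The callback table and the demo_* helpers are hypothetical and only exist to make the sketch self-contained; the real code locks stripes differently:

```c
struct stripe_ops {
	void (*lock)(void *ctx);
	void (*unlock)(void *ctx);
	int  (*check)(void *ctx);	/* returns nonzero on mismatch */
	void (*autorepair)(void *ctx);
};

/* Check one stripe and, if requested, repair it while the same lock is
 * still held. Returns the result of the check. */
static int check_stripe(const struct stripe_ops *ops, void *ctx, int repair)
{
	int bad;

	ops->lock(ctx);
	bad = ops->check(ctx);
	if (bad && repair)
		ops->autorepair(ctx);
	ops->unlock(ctx);
	return bad;
}

/* Toy in-memory stripe used to exercise the ordering. */
struct demo_stripe {
	int locked;
	int corrupt;
	int repaired_under_lock;
};

static void demo_lock(void *c) { ((struct demo_stripe *)c)->locked = 1; }
static void demo_unlock(void *c) { ((struct demo_stripe *)c)->locked = 0; }
static int demo_check(void *c) { return ((struct demo_stripe *)c)->corrupt; }
static void demo_repair(void *c)
{
	struct demo_stripe *s = (struct demo_stripe *)c;

	s->repaired_under_lock = s->locked;
	s->corrupt = 0;
}

static const struct stripe_ops demo_ops = {
	demo_lock, demo_unlock, demo_check, demo_repair
};
```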
It should be confirmed that this new locking is not
too demanding.
In case it is, some other solutions will be required
(suggestions welcome).
Signed off: piergiorgio.sartor@nexgo.de
Signed-off-by: NeilBrown <neilb@suse.de>
Change how sudden-degraded devices should appear.
We don't record failure, we record that the device isn't there.
Signed-off-by: NeilBrown <neilb@suse.de>
Also don't treat two devices with different seq numbers as completely
unrelated.
This allows split-brain detection to work properly for ddf.
Signed-off-by: NeilBrown <neilb@suse.de>
Having RAMFS_MAGIC defined as 0x858458f6 causes problems when trying
to compare it directly against statfs.f_type being cast from long to
unsigned long.
This hack is extremely ugly, but it should at least do the right thing
for every situation.
Thanks to Arnd Bergmann for suggesting the fix.
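A sketch of the masking idea, under the assumption that f_type is a signed long that may have been sign-extended from 32 bits. is_ramfs_type and RAMFS_MAGIC_SKETCH are illustrative names and the committed code may differ in detail:

```c
#define RAMFS_MAGIC_SKETCH 0x858458f6UL

/* Compare only the low 32 bits so that sign extension of a signed
 * f_type cannot break the match. */
static int is_ramfs_type(long f_type)
{
	return ((unsigned long)f_type & 0xFFFFFFFFUL) ==
	       (RAMFS_MAGIC_SKETCH & 0xFFFFFFFFUL);
}
```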
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If we assemble a newly-degraded array, the missing devices must be marked
as 'failed' so we don't expect them in future.
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm does not wait long enough when mdmon is started by systemd.
It causes various problems with behaviour of a RAID volume with external metadata.
For example: mdmon does not update a value of checkpoint during migration
and second RAID5 volume is read-only after reboot done during
container reshape (both problems occur with IMSM metadata).
If a type of process start-up is changed to 'forking', systemctl will
wait until mdmon (parent) process exits after calling fork.
This way mdmon will always be fully initialized after start_mdmon
and these problems will not occur.
In this case it is recommended to add a path to PIDFile, so that systemd
does not have to guess a PID of the mdmon process.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This means that
st->ss->getinfo_super(st, content, NULL);
clean = content->array.state & 1;
will get an up-to-date value for 'clean'. This fix allows
tests/03r5assem-failed
to work.
Signed-off-by: NeilBrown <neilb@suse.de>
When we return in error, we need to call ->free_super(tst) and free(tst).
Sometimes we didn't.
Also the final ->free_super(tst) should be followed by free(tst)
but wasn't.
Move that final free forward in the code a bit as we will want to use
the tst there in the next patch.
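A sketch of the cleanup pattern with a single exit path. The struct and helpers are simplified stand-ins (supertype_sketch, release_st, probe_device), not the real supertype; the live_handles counter only makes the pairing observable:

```c
#include <stdlib.h>

struct supertype_sketch {
	void *sb;	/* loaded metadata, if any */
	void (*free_super)(struct supertype_sketch *st);
};

static void sketch_free_super(struct supertype_sketch *st)
{
	free(st->sb);
	st->sb = NULL;
}

/* Release both the metadata and the handle itself. */
static void release_st(struct supertype_sketch *st)
{
	if (!st)
		return;
	st->free_super(st);
	free(st);
}

/* Probe with a temporary handle; every exit path releases it. */
static int probe_device(int load_ok, int *live_handles)
{
	struct supertype_sketch *tst = calloc(1, sizeof(*tst));
	int rv = -1;

	if (!tst)
		return -1;
	(*live_handles)++;
	tst->free_super = sketch_free_super;
	if (!load_ok)
		goto out;	/* error path still goes through cleanup */
	tst->sb = malloc(16);
	if (!tst->sb)
		goto out;
	rv = 0;
out:
	release_st(tst);
	(*live_handles)--;
	return rv;
}
```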
Signed-off-by: NeilBrown <neilb@suse.de>
The given 'st' might not be best. Making this interface change
will allow load_devices to return a better 'st'.
Signed-off-by: NeilBrown <neilb@suse.de>
This patch will remove some legacy code.
It is part of the verbosity "cleanup".
In any case, if information about the P
and Q parity mismatches is required, it
should go inside the code handling page
size blocks, not full stripe size.
Signed-off-by: NeilBrown <neilb@suse.de>
It could be better to make sure the
data reaches the disks, so open the
drives with O_SYNC flag.
Signed off: piergiorgio.sartor@nexgo.de
Signed-off-by: NeilBrown <neilb@suse.de>
In the transition to 4K page processing,
the Q parity generation had a wrong offset
in the buffer.
This patch fixes this.
Signed off: piergiorgio.sartor@nexgo.de
Signed-off-by: NeilBrown <neilb@suse.de>
This patch makes the position on the disk
where an error is found a bit clearer.
Signed off: piergiorgio.sartor@nexgo.de
Signed-off-by: NeilBrown <neilb@suse.de>
This patch removes some printouts, which
are not really useful here.
These could be re-added later, in case a
verbosity parameter is provided.
Signed off: piergiorgio.sartor@nexgo.de
Signed-off-by: NeilBrown <neilb@suse.de>
raid6check currently performs checks and repair on a whole chunk at a
time. This is often not ideal as corruption can happen with smaller
granularity.
This patch changes raid6check to use a page-size (4K) granularity.
We still process a chunk at a time, but within each chunk we process a
page at a time.
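A simplified sketch of the loop structure. It only checks P (plain XOR) parity and leaves out Q; the function names and the bitmask result are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SKETCH 4096u	/* page size assumed by this sketch */

/* XOR the data blocks over one page and compare with P.
 * Returns 1 on mismatch, 0 if the page is consistent. */
static int page_p_mismatch(const uint8_t *const *data, int ndata,
			   const uint8_t *p, size_t off)
{
	size_t i;
	int d;

	for (i = 0; i < PAGE_SKETCH; i++) {
		uint8_t x = 0;

		for (d = 0; d < ndata; d++)
			x ^= data[d][off + i];
		if (x != p[off + i])
			return 1;
	}
	return 0;
}

/* Walk one chunk a page at a time; bit n of the result is set for each
 * bad page n (only the first 32 pages are reported). */
static unsigned check_chunk(const uint8_t *const *data, int ndata,
			    const uint8_t *p, size_t chunk_size)
{
	unsigned bad = 0;
	size_t npages = chunk_size / PAGE_SKETCH;
	size_t pg;

	for (pg = 0; pg < npages && pg < 32; pg++)
		if (page_p_mismatch(data, ndata, p, pg * PAGE_SKETCH))
			bad |= 1u << pg;
	return bad;
}
```

A single flipped byte then marks one 4K page as bad rather than the whole chunk.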
Signed-off-by: NeilBrown <neilb@suse.de>
Redirecting output to /dev/null is unnecessary and hides any error
messages there might be. So leave as defaults which are none,
journal, inherit.
Signed-off-by: NeilBrown <neilb@suse.de>
As mdmon doesn't inherit environment from mdadm when it is started
by systemd, it cannot inherit IMSM_NO_PLATFORM.
But if an imsm array is assembled then mdmon really should handle it
whether there is a platform present or not.
So always set this var.
Signed-off-by: NeilBrown <neilb@suse.de>
When run with --foreground mdmon has no need to notify any
parent, so it shouldn't even try, let alone complain when it fails.
Also close an end of a pipe which is no longer used.
Signed-off-by: NeilBrown <neilb@suse.de>
'missing' devices are in a different list so when collecting the
serial numbers of all devices we need to check both lists.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ when unfreezing, make sure the array is frozen first.
If it isn't we might end up interrupting a reshape.
2/ When the child finishes, don't call abort_reshape() as that
will interrupt the reshape. Just set suspend_* etc
explicitly.
Signed-off-by: NeilBrown <neilb@suse.de>
When we call "getinfo_super", we report the working/failed status
of the particular device, and also (via the 'map') the working/failed
status of every other device that this metadata is aware of.
It is important that the way we calculate "working or failed" is
consistent.
As it is, getinfo_super_ddf() will report a spare as "working", but
every other device will see it as "failed", which leads to failure to
assemble arrays with spares.
For getinfo_super_ddf (i.e. for the container), a device is assumed
"working" unless flagged as DDF_Failed.
For getinfo_super_ddf_bvd (for a member array), a device is assumed
"failed" unless DDF_Online is set, and DDF_Failed is not set.
Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
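The two rules for getinfo_super_ddf and getinfo_super_ddf_bvd can be sketched as one predicate per view, each used both for the device itself and for every entry in the map. The flag values below are assumptions for the sketch, and byte order is ignored:

```c
#include <stdint.h>

/* Flag values are assumptions made for this sketch. */
#define DDF_ONLINE_SKETCH 0x0001u
#define DDF_FAILED_SKETCH 0x0002u

/* Container view: working unless explicitly failed. */
static int container_dev_working(uint16_t state)
{
	return !(state & DDF_FAILED_SKETCH);
}

/* Member-array view: working only if online and not failed. */
static int member_dev_working(uint16_t state)
{
	return (state & DDF_ONLINE_SKETCH) && !(state & DDF_FAILED_SKETCH);
}
```

A spare with neither flag set counts as working in the container view, whichever device reports it.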
When auto-assembling we loop until we get no successes.
If a device is found that looks like it is part of an already-existing
container, but we subsequently fail to add that device, then the fact
that the container is running looks like a success. This can result
in infinite looping.
So if a container was already partially assembled, and is still only
partially assembled after we try to add devices, then don't treat that
as success.
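A toy model of the termination rule, not the real assembly code. The container is reduced to two counters and all names (container_sketch, assemble_pass, auto_assemble) are made up for illustration:

```c
struct container_sketch {
	int present;	/* member devices currently attached */
	int expected;	/* member devices the container wants */
};

/* One pass: attach what we can, and report success only on real progress. */
static int assemble_pass(struct container_sketch *c, int *addable)
{
	int before = c->present;

	while (*addable > 0 && c->present < c->expected) {
		c->present++;
		(*addable)--;
	}
	if (before > 0 && c->present < c->expected)
		return 0;	/* was partial, still partial: not a success */
	return c->present > before;
}

/* Loop until a pass reports no success; returns the number of passes. */
static int auto_assemble(struct container_sketch *c, int addable,
			 int max_passes)
{
	int passes = 0;

	while (passes < max_passes) {
		passes++;
		if (!assemble_pass(c, &addable))
			break;
	}
	return passes;
}
```

A container that stays partially assembled ends the loop after one pass instead of being counted as a success each time round.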
Signed-off-by: NeilBrown <neilb@suse.de>
See commit 357ac10678
which made a similar change for super-intel, and really should have
fixed DDF at the same time.
Signed-off-by: NeilBrown <neilb@suse.de>
According to:
commit b451aa4846
Fix handling for "auto" line in mdadm.conf
a NULL path isn't really acceptable and the devname should be used instead.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>