md signals reshape completion (whole area or parts) by setting
sync_completed to 0. This causes in set_array_state() to rollback
metadata changes (super-intel.c:4977. To avoid this do not allow for
set last_checkpoint to 0 if reshape is finished.
This was also root cause of my previous fix for finalization reshape
that I agreed earlier is not necessary,
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When raid0 array is takeovered to raid4 for reshape it should be possible to detect
that array for reshape is monitored now for metadata update.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
For level migration support it is necessary to allow mdmon to react for level changes.
It has to have ability to change configuration of active array,
and for array level change to raid0 finish array monitoring.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We need to allow metadata to handle progress of reshape,
completion, and abort-before-start.
Include all those in ->set_array_state()
Signed-off-by: NeilBrown <neilb@suse.de>
When mdadm starts a reshape, it might add some devices to the array
first. mdmon needs to notice the reshape starting and check for any
new devices. If there are any they need to be provided to be
monitored.
Signed-off-by: NeilBrown <neilb@suse.de>
Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its device
and do not close it before exit(). The period between open and release of
handle is too long and md is not able stop device. Releasing handle before
exit does not block md.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When sync_action is idle mdmon takes the latest value of md/resync_start
or md/<dev>/recovery_start to record the resync/rebuild checkpoint in
the metadata. However, now that mdmon is reading sync_completed there
is no longer a need to wait for, or force an idle event to take a
checkpoint.
Simply update the forward progress of ->last_checkpoint at every wakeup
event and force it to be recorded at least every 1/16th array-size
interval. It may be recorded more frequently if a ->set_array_state()
event occurs.
This also cleans up some confusion in handling the dual-rebuild case.
If more than one spare has been activated the kernel starts the rebuild
at the lowest recovery offset, so we do not need to worry about
min_recovery_start().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The kernel updates and notifies md/sync_completed when it is time to
take a checkpoint. When this occurs (at 1/16 array size intervals)
write 'idle' to md/sync_action to have the current recovery position
updated in recovery_start and resync_start.
Requires the metadata handler to reset ->last_checkpoint when it has
determined that recovery has ended.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that we don't "mdadm --takeover" until /var/run is writable
there is no need to continually try to create files in there.
So only create these files at startup and fail if they cannot be
made. This means that to start an array with externally managed
metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be
writable. To 'takeover' from a previous mdmon instance, /var/run
must be writable.
This means we don't need to worry about SIGHUP (which was once used to
tell us it was time to create .pid) and SIGALRM.
Signed-off-by: NeilBrown <neilb@suse.de>
Replace occurrences of ~0ULL to make it clear we are talking about maximal
resync/recovery position.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We don't need to sprinkle reads of this attribute all over the place,
just once at the entry of read_and_act(). Also, the mdinfo structure
for the array already has a 'resync_start' member, so just reuse that.
Finally, rename get_resync_start() to read_resync_start to make it
consistent with the other sysfs accessors in monitor.c.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From 2.6.30, /proc/mounts and various /sys files will
probably always returns 'readable' to select, so we will need
to wait on POLLPRI to get the 'new data is available' signal.
When using select, this corresponds to an 'exception', so
adjust calls to select accordingly.
In one case we sometimes wait on a socket and sometime on
/proc/mounts, so we need to test which.
Signed-off-by: NeilBrown <neilb@suse.de>
Starting with 2.6.30 the md/resync_start attribute will no longer return
a non-sensical number when resync is complete, instead it now returns
'none'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
mdmon may miss events because it re-reads state after read_and_act. The
additional read is used to determine dirty status before allowing a
sigterm to proceed. Since read_and_act is in the best position to
determine 'dirty' status and its return value is not used, modify it to
return true if the array is dirty.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that names in /dev are usually created (eventually) by udev,
it isn't really safe to rely in finding a name in /dev to pass to
mdmon to identify which array to monitor.
And it isn't really necessary to have a name in /dev.
So just pass the symbolic name, e.g. md127 or md123.
Change util.c to pass that name, and change mdmon to process the
name sensibly.
Signed-off-by: NeilBrown <neilb@suse.de>
We generally don't want mdmon to be terminated, but if a SIGTERM gets
through try to leave the monitored arrays in a clean state, block
attempts to mark the array dirty, and stop servicing the socket.
When we are killed by sigterm don't remove the pidfile let that be
cleaned up by the next monitor.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Made the mistake of recompiling the F9 mdadm rpm which has a patch to
remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots
of errors:
config.c:568: warning: ignoring return value of asprintf
Assemble.c:411: warning: ignoring return value of asprintf
Assemble.c:413: warning: ignoring return value of asprintf
super0.c:549: warning: ignoring return value of posix_memalign
super0.c:742: warning: ignoring return value of posix_memalign
super0.c:812: warning: ignoring return value of posix_memalign
super1.c:692: warning: ignoring return value of posix_memalign
super1.c:1039: warning: ignoring return value of posix_memalign
super1.c:1155: warning: ignoring return value of posix_memalign
super-ddf.c:508: warning: ignoring return value of posix_memalign
super-ddf.c:645: warning: ignoring return value of posix_memalign
super-ddf.c:696: warning: ignoring return value of posix_memalign
super-ddf.c:715: warning: ignoring return value of posix_memalign
super-ddf.c:1476: warning: ignoring return value of posix_memalign
super-ddf.c:1603: warning: ignoring return value of posix_memalign
super-ddf.c:1614: warning: ignoring return value of posix_memalign
super-ddf.c:1842: warning: ignoring return value of posix_memalign
super-ddf.c:2013: warning: ignoring return value of posix_memalign
super-ddf.c:2140: warning: ignoring return value of write
super-ddf.c:2143: warning: ignoring return value of write
super-ddf.c:2147: warning: ignoring return value of write
super-ddf.c:2150: warning: ignoring return value of write
super-ddf.c:2162: warning: ignoring return value of write
super-ddf.c:2169: warning: ignoring return value of write
super-ddf.c:2172: warning: ignoring return value of write
super-ddf.c:2176: warning: ignoring return value of write
super-ddf.c:2181: warning: ignoring return value of write
super-ddf.c:2686: warning: ignoring return value of posix_memalign
super-ddf.c:2690: warning: ignoring return value of write
super-ddf.c:3070: warning: ignoring return value of posix_memalign
super-ddf.c:3254: warning: ignoring return value of posix_memalign
bitmap.c:128: warning: ignoring return value of posix_memalign
mdmon.c:94: warning: ignoring return value of write
mdmon.c:221: warning: ignoring return value of pipe
mdmon.c:327: warning: ignoring return value of write
mdmon.c:330: warning: ignoring return value of chdir
mdmon.c:335: warning: ignoring return value of dup
monitor.c:415: warning: rv may be used uninitialized in this function
...some of these like the write() ones are not so trivial so save those
fixes for the next patch.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
For use in distro shutdown scripts with a RAID root file system.
Returns immediately if the array is 'readonly', or not an externally
managed array. It is up to the distro's scripts to make sure no new
writes hit the device after this returns 'true'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If the metadata_version is
-mdXXX/whatever
rather than
/mdXXX/whatever
then the array is readonly and should be left alone by mdmon.
Signed-off-by: NeilBrown <neilb@suse.de>
When we first start an array, it might be good to start recovery
straight away. That requires setting the array to 'dirty', but
only the metadata handler can know if that is required or not.
So have a third possible 'consistent' option to set_array_state.
Either 'no' or 'yes' or 'you choose'.
Return value indicates what was chosen.
'1' (no) should be chosen unless there is a good reason.
Signed-off-by: NeilBrown <neilb@suse.de>
Transition readauto arrays to active before failing drives.
Hmm... why do we keep reblocking / renotifying in the readonly case?
Need to bottom out on this, but not right now.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Disks that are not in-sync or failed are not assembled into member
arrays by mdadm. Teach mdmon to resolve this situation by checking for
spares at start. imsm_activate_spare() is updated to prefer devices
that can be re-added versus new spares.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Removes the need for the call to ->set_array_state when sync_action
transitions from 'recover' to 'idle'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If the array is shutdown as soon as resync finishes, we might not
notice the resync finish. So on array shutdown, check for current
resync pos.
Signed-off-by: Neil Brown <neilb@suse.de>
When a 'ping' (empty message) is sent to mdmon, we wait for
'monitor' to do a full loop to make sure it has caught up
with anything that needs doing.
This allows synchronisation between mdadm and mdmon.
Maybe monitor should signal managemon rather than managemon polling...
Signed-off-by: Neil Brown <neilb@suse.de>
The returned value was never used, and we don't really want
this return path anyway as writing to a pipe could conceivably
block, and the monitor must not block.
This really should be done in mdadm, not mdmon.
We ensure the device won't be suddenly commited as a hot-spare
using O_EXCL, then check the 'holders' sysfs directory
to make sure it is only in use once.
for development only as console output can block leading to monitor deadlocks
in low mem situations
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Code in manager can now just call queue_metadata_update with a
(freeable) buf holding the update, and it will get passed to the
monitor and written out.
"sync_complete" just tracks the current resync/recover/check/whatever pass.
"resync_start" tracks which parts of the array are known to be in-sync
(modulo active writes). So it is what we need to use to update the metadata.
Also we cannot call it when the array has stopped, as the value is no longer
available then. We must call it when the resync completes.
Possibly also call it preiodically if the array is quiescent.
mark_dirty is just a special case of mark_clean - with sync_pos == 0.
mark_sync is not required. We don't modify the metadata when sync
finishes. Only when the array becomes non-writeable at which point we
use mark_clean to record how far the resync progressed.