mdadm

Commit Graph

Author	SHA1	Message	Date
NeilBrown	879982efa9	Don't lie to systemd about mdadm's status. Telling systemd that mdadm was started from the initrd is often a lie and never necessary. Now that the reshape monitoring thread handles SIGTERM gracefully it is OK for system to kill and mdadm that it finds running. mdmon still have a bit of a question mark over it so I won't remove the '@' from there just yet. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 14:04:07 +10:00
NeilBrown	84d11e6c6a	Grow: exit background thread cleanly on SIGTERM. If the mdadm thread that monitors a reshape gets SIGTERM it should exit cleanly and clear the 'suspended' region of the array. However it mustn't clear 'sync_max' as that would allow the reshape to continue unmonitored. If the thread ever does get killed, the array should really be shutdown soon after if possible. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 13:58:10 +10:00
Martin Wilck	d81cc6a72e	tests/env-ddf-template: helper for new unit test I forgot to check in this helper script, similar to the one for IMSM. It is needed by tests/10ddf-create-fail-rebuild. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 16:48:57 +10:00
Martin Wilck	08b0ef8e50	tests/10ddf-create-fail-rebuild: new unit test for DDF This test adds a new unit test similar to 009imsm-create-fail-rebuild. With the previous patches, it actually succeeds on my system. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:04:07 +10:00
Martin Wilck	6ca1e6eccb	mdmon: manage_member: fix race condition during slow meta data writes In order to track kernel state changes, the monitor needs to notice changes in sysfs. If the changes are transient, and the monitor is busy writing meta data, it can happen that the changes are missed. This will cause the meta data to be inconsistent with the real state of the array. I can reproduce this in a test scenario with a DDF container and two subarrays, where I set a disk to "failed" and then add a global hot-spare. On a typical MD test setup with loop devices, I can reliably reproduce a failure where the metadata show degraded members although the kernel finished the recovery successfully. This patch fixes this problem by applying two changes. First, when a metadata update is queued, wait until it is certain that the monitor actually applied these meta data (the for loop is actually needed to avoid failures completely in my test case). Second, after triggering the recovery, set prev_state of the changed array to "recover", in case the monitor misses the transient "recover" state. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:00:46 +10:00
Martin Wilck	30b83120ed	mdmon: manage_member: debug messages for array state Add debug messages to watch the manager's steps. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:00:32 +10:00
Martin Wilck	c371936051	mdmon: wait_and_act: fix debug message for SIGUSR1 Correctly print out wake reason if it was a signal. Previous code would print misleading select events (pselect(2) man page says the fdsets become undefined in case of error). Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:59:40 +10:00
Martin Wilck	39da26ecf5	monitor: read_and_act: log status when called read_and_act() currently prints a debug message only very late. Print the status seen by mdmon right away, to track mdmon's actions more closely. Add a time stamp to observe long delays between read_and_act calls, e.g. caused by meta data writes. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:57:20 +10:00
Martin Wilck	ce6844b99c	DDF: ddf_set_disk: add some debug messages Adds more verbose debugging in ddf_set_disk, to understand failures better. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	0e5fa86239	DDF: load_ddf_header: more error logging Try to determine problem if load_ddf_header fails. May be useful for determining compatibility problems with Fake RAID BIOSes. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	0847945b8e	DDF: ddf_process_update: log offsets for conf changes I needed this for tracking a bug with wrong offsets after array creation. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	2a645ee220	DDF: log disk status changes more nicely In particular, include refnum for better tracking. This makes it a little easier for humans to track what happened to which disk. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	6f56dbb970	DDF: ddf_activate_spare: bugfix for `62ff3c40` Move the check for good drives in the dl loop - otherwise dl may be NULL and mdmon may crash. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
NeilBrown	71d68ff62f	Fix is_resync_complete for RAID10 For RAID10, 'sync' numbers go up to the array size rather than the component size. is_resync_complete() needs to allow for this. Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 09:22:18 +10:00
Jes Sorensen	364a48c992	Avoid double close() Coverity discovered a possible double close(fd2) in Grow.c. Avoided by invalidating fd2 after the first close. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 08:51:16 +10:00
NeilBrown	23bf42cc79	super1: simplify setting of array size. Currently the extra space to leave before the data in the array is calculated in two separate places, and they can be inconsistent. Instead, do it all in validate_geometry. This records the 'data_offset' chosen which all other devices then use. 'write_init_super' now just uses the value rather than doing all the calculations again. This results in more consistent numbers. Also, load_super sets st->data_offset so that it is used by "--add", so the new device has a data offset matching a pre-existing device. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 17:05:47 +10:00
NeilBrown	641da74591	super1: separate to version of _avail_space1(). _avail_space1() is calls from both avail_space1() and validate_geometry1() and does slightly different things. The partial code sharing doesn't really help. In particularly the responsibility for setting the size of the array is currently confused. So duplicate the code into the two locations - one where 'super' is always NULL (validate_geometry1) and one where it is never NULL (avail_space1), and simplify. No behaviour change - just code re-organisation. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 15:59:03 +10:00
NeilBrown	7ccc4cc4fc	Manage: remove call to validate_geometry. This call to validate_geometry is really rather gratuitous. It is purely about the fact that super0 cannot use more than 4TB. So just make it an explicit test - less confusing that way. With this, validate_geometry is only called from Create, which makes it easier to reason about. Also validate_geometry is now never passed NULL for the 'chunk' parameter, so we can remove those annoying tests for NULL. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 13:45:22 +10:00
mwilck@arcor.de	0c78849f2b	DDF: ddf_activate_spare: fix metadata update for SVDs Metadata updates for secondary RAID (RAID10) need to cover all BVDs. Compare with code in write_init_super_ddf(). Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 10:57:14 +10:00
mwilck@arcor.de	62ff3c40c1	DDF: ddf_activate_spare: only activate good drives Do not try to activate drives marked missing or failed. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 10:57:13 +10:00
mwilck@arcor.de	7733b91d37	DDF: ddf_activate_spare: Add RAID10 code The check for degraded array is a bit more complex for RAID10. Fixing it. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 10:57:13 +10:00
mwilck@arcor.de	84e32e1977	DDF: find_vdcr: fix minor bug in debug message This code could find disk -1. Fixed. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 10:57:13 +10:00
NeilBrown	4b1679dd39	Change version to 3.3-rc2 Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-25 17:55:18 +10:00
NeilBrown	482383022d	Add test for --replace handling. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 15:32:31 +10:00
NeilBrown	51425978e5	Manage: fix typo in error for "--with" handling Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 15:32:26 +10:00
NeilBrown	7d7092ec6c	Improve revert tests 1/ perform revert-grow on more metadata versions 2/ add revert-inplace. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 12:23:04 +10:00
NeilBrown	2bf62891c1	super0/1: fix typo in error messages. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 12:22:58 +10:00
NeilBrown	3377ee4248	Grow: don't hold array open while waiting for reshape. If we will need to change array level when a reshape completes, a copy of mdadm waits in the background. Currently this copy hold the device (/dev/mdX) open. This prevents the array from being stopped. So close the file descriptor and re-open after the reshape completes. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 12:21:10 +10:00
NeilBrown	419e018284	super1: update data_size when performing "revert-reshape". The "data_size" is with respect to "data_offset". When the kernel changes "data_offset" it modifies "data_size" to match - see md_finish_reshape() in the kernel. So when mdadm switches the data_offset for the new data_offset, it must update data_size correspondingly. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 10:21:27 +10:00
NeilBrown	4441541f1f	super-ddf: allow mdassemble to compile. Just add/move some #ifdefs and move some code. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-23 14:00:56 +10:00
mwilck@arcor.de	a8173e4349	DDF: convert big-endian __u16 to be16 type Last step of endian-safe recoding. This requires also bit operations. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-23 13:53:34 +10:00
mwilck@arcor.de	9d0c6b7071	DDF: convert big-endian __u64 to be64 type Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-23 13:53:32 +10:00
mwilck@arcor.de	60931cf94a	DDF: convert big endian to be32 type Part 2 of endianness-safe conversion Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-23 13:49:41 +10:00
mwilck@arcor.de	4d1bdc1840	DDF: add endian-safe typedefs This adds typedefs for big-endian numbers. This will hopefully reduce the number of endianness bugs I make. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-23 13:49:11 +10:00
mwilck@arcor.de	5c41684539	tests/10ddf-geometry: new unit test Test various RAID geometries, creation and deletion of subarrays Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	840ad583e0	test: increase number of devices to 13 extended DDF/RAID10 tests need 6 disks for DDF. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	abbc450fc2	tests/10ddf-create: create RAID5 first Let the first created array be RAID5 rather than RAID0. This makes the test harder than before, because everything after the first Create has do be done indirectly through mdmon. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	fbf0c2a7ac	DDF: getinfo_super_ddf_bvd: fix offset calculation for SVDs Fix a bug that caused the wrong conf record to be used to derive data offset and size on secondary RAID (RAID10). Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	6a350d82b9	DDF: kill_subarray_ddf: fix case without mdmon running When mdmon wasn't runnning, meta data wasn't committed to disk. Fixed. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	2aba583f28	DDF: err_bad_md_layout: fix return value This function must use -1 to indicate failure. Fix it. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	9bf3870442	DDF: factor out writing super block to single disk Factor out single disk from __write_init_super_ddf to a new function _write_super_to_disk. Use this function in store_super_ddf. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	8e9387ac9f	DDF: make "null_aligned" a static buffer Use a static buffer for this "zero page". This makes it easier to factor out the header writing code. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
mwilck@arcor.de	35c3606df7	DDF: increase seq number in ddf_set_updates_pending Increase seq number only when there's actually a metadata change. This is better then increasing it at every write. This also fixes another endianness bug. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-22 16:56:32 +10:00
NeilBrown	57666a41b2	Merge commit '956a13fb850321bed8568dfa8692c0c323538d7c'	2013-07-15 11:39:50 +10:00
NeilBrown	6fd2a36f9b	test: allow resync/reshape etc to go faster. Whenever we "check wait" - make the resync process go at full speed. Also allow script to adjust it manually. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-11 13:24:40 +10:00
NeilBrown	ca36d70735	Grow: pass INVALID_SECTORS to reshape_array, not 0. '0' means 'make it 0', which isn't what we want here. We want 'leave it unchanged'. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-11 12:42:12 +10:00
NeilBrown	85ca499c6b	IMSM: fix wait_for_reshape_imsm This was waiting on "reshape_position" which doesn't get update events. Before sysfs_wait was introduced, the code to wait didn't wait at all, so it spun. With sysfs_wait, it would wait forever. Change to wait in sync_completed which does get events. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-11 12:26:15 +10:00
Christoph Anton Mitterer	956a13fb85	align spelling of “RAID” and RAID levels * Aligned the spelling of “RAID” to use captial letters in all places. * Aligned the spelling of the RAID level names (LINEAR, RAID1, …) to use capital letters in all places, except for the string “faulty” in places where not the RAID level was meant. Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>	2013-07-10 23:32:22 +02:00
NeilBrown	3afaff930c	Stop: fix up synchronising end of reshape to good boundary. If we stop too soon after reshape starts (probably only during testing), we can get confused by the status of the reshape. If that might be happening - sleep a bit longer. Also allow for reshape going unusually slowly (again, probably only during testing). Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-10 16:28:25 +10:00
NeilBrown	a7a0d8a116	Grow: use mdstat_wait to wait for delayed reshape. Having a fix time for a wait is clumsy and can make us wait much too long. So use mdstat_wait and keep the mdstat_fd open. This requires an 'mdstat_close' so it doesn't stay open forever. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-10 11:10:54 +10:00
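
Commit 364a48c992 above describes avoiding a double close(fd2) by invalidating the descriptor after the first close. As a rough illustration of that pattern only, the sketch below uses a hypothetical helper named close_once(); it is not taken from Grow.c, and the actual patch may differ.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the mdadm sources): close the descriptor
 * that *fd refers to, then mark it invalid so that a second call on the
 * same variable does nothing instead of closing an unrelated descriptor
 * that may have reused the same number in the meantime.
 */
static int close_once(int *fd)
{
	int rv = 0;

	if (*fd >= 0) {
		rv = close(*fd);
		*fd = -1;	/* invalidate: later calls become no-ops */
	}
	return rv;
}
```

A caller would hold the descriptor in an int variable, pass its address on every cleanup path, and could then safely reach the same cleanup code more than once.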


2797 Commits
