mdadm

Commit Graph

Author	SHA1	Message	Date
NeilBrown	8430855db5	Add ANNOUNCE-3.2.6 from different branch just for completeness... Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-26 15:28:43 +10:00
NeilBrown	305ea8b8c3	Add raid6check to .gitignore Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-26 15:26:54 +10:00
NeilBrown	d3786cdcd0	Change "mdadm --run" to use the same code as "mdadm -IRs". Current "mdadm --run /dev/mdX" will not handle external metadata properly. mdmon won't be started etc. So use the code from "mdadm -IRs" instead - that already does all the right things. Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-26 15:24:53 +10:00
NeilBrown	4c0ea7b0d9	super1: fix setting of data_offset for 1.0 metadata. commit `23bf42cc79` super1: simplify setting of array size. removed the setting for sb->data_offset for 1.0 metadata for some reason, and messed up the size calculation for 1.0 metadata too. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-14 17:16:35 +10:00
NeilBrown	632dc30ccf	Fix bug with adding to 0.90 array commit `7ccc4cc4fc` Manage: remove call to validate_geometry. used entirely the wrong number for "4TB" !! Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-14 15:20:02 +10:00
mwilck@arcor.de	5daa35ac1a	DDF: ddf_open_new: check device status for new subarray It is possible that mdadm creates a new subarray containing failed devices. This may happen if a device has failed, but the meta data containing that information hasn't been written out yet. This code tests for this situation, and handles it in the monitor. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-08 10:33:21 +10:00
mwilck@arcor.de	82c8e664cc	tests/10ddf-fail-create-race: test handling of fail/create race If a disk fails and simultaneously a new array is created, a race condition may arise because the meta data on disk doesn't reflect the disk failure yet. This is a test for that case. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-08 09:56:13 +10:00
mwilck@arcor.de	2bcf1873d0	tests/10ddf-fail-spare: more sophisticated result checks This test can succeed two ways, depending on timing. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-08 09:55:33 +10:00
mwilck@arcor.de	e7f9c72024	tests/10ddf-fail-two-spares: new unit test This is one more unit test for failure/recovery, this time with double redundancy, which isn't covered by the other tests. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-08 09:37:12 +10:00
NeilBrown	2538ba2abf	Create: fix warning about pre-existing filesystems. An ext[234] filesystem larger than 2TB was being reported with a negative size - which looks odd. So fix it to use suitably large and unsigned values. Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-08 09:16:43 +10:00
NeilBrown	3921e41a9d	DDF: Write new conf entries with a single write. The recent change to skip over invalid conf entries was bad because it could leave garbage on the disk. But we don't want to write each entry separately, as the writes are O_DIRECT and so synchronous, so it takes way too long. So allocate a large buffer (probably the one used to read the config records) and fill that then write it all at once. Reported-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-07 17:09:24 +10:00
mwilck@arcor.de	7d8a70bf23	test: allow LVM volumes or RAM disks as test devices Allow other device types for testing; this allows testing on a larger variety of devices. Option --dev=[loop|lvm|ram] selects loop device (default), lvm, and ram disk, respectively. To use RAM disks with DDF, the kernel parameter ramdisk_size=65536 must be used. For LVM, use --volgroup=<vg> to specify the name of the volume group in which the test LVs will be created. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-06 09:56:21 +10:00
mwilck@arcor.de	60056e1c3d	DDF: get_extents: don't allocate space on failed disks We should skip known failed disks when allocating space for new arrays. This fixes the problem with 10ddf-fail-spare. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-06 09:54:47 +10:00
mwilck@arcor.de	6b924b1e9f	tests/10ddf-fail-spare: new unit test This is Albert Pauw's latest test. Note that this FAILS. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-06 09:54:42 +10:00
mwilck@arcor.de	635ac286c3	tests/10ddf-fail-twice: remove hard-coded assumptions This test has some randomness because it is not always deterministic which of the two arrays gets the spare and which remains degraded. Handle it. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-06 09:52:58 +10:00
mwilck@arcor.de	ee7d0c09cd	tests/env-ddf-template: some helper functions helper functions to determine the list of devices in an array, etc. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-06 09:52:58 +10:00
NeilBrown	ebf916c526	Makefile: check that 'run' directory exists. mdadm defaults to using /run/mdadm. However not all distros provide /run yet. This can confuse people who build their own mdadm. So have "make" complain if the given directory doesn't exist. This will make it harder to build an mdadm which doesn't work. Reported-by: Albert Pauw <albert.pauw@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 16:42:12 +10:00
NeilBrown	e49a8a8026	mdmon: don't use 'ghost' values from an inactive array. It is possible for mdmon to see (in /proc/mdstat) an array in 'inactive' state when "mdadm -S" has written "inactive" to "array_state". In this state values such as "raid_disk" are not meaningful and so should be ignored by manage_member(). Reported-by: "Dorau, Lukasz" <lukasz.dorau@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 15:40:16 +10:00
NeilBrown	92939eb291	DDF: fix removal of failed devices. Commit `c7079c84` arranged for DDF to forget about any device that is failed and not still marked as part of any array. However such devices could still be part of the container and this removal and updating of 'pdnum' can result in multiple devices having the same pdnum. This in turn easily leads to confusion and corruption. So only discard pd entries for devices which are failed, not listed in any virtual device, and for which we don't have a handle on the device. pd entries will not get removed until a new device is added after the device has been removed from the container, either by "mdadm --remove" or by assembling without the failed devices. Reported-by: Albert Pauw <albert.pauw@gmail.com> Analysed-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 15:10:05 +10:00
NeilBrown	9540cc244d	test: ensure testing uses correct mdmon When testing we want to run mdmon directly, not use systemctl to get systemd to run it. So allow an environment variable to make that choice. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 14:55:13 +10:00
NeilBrown	4389b648bb	managemon: fix typo affecting incremental assembly. This clearly should be 'st2'. As it is the 'raid_disk' value being tested is completely meaningless in the context of the new device. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 14:25:15 +10:00
NeilBrown	ce45c81986	DDF: fix writing metadata updates. Recent commit `273989b93a` skipped writing some large blocks of 0xFF, but didn't seek over the space, so subsequent data was written wrongly. When we don't write, we need to seek. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 14:21:10 +10:00
mwilck@arcor.de	57631ca2ff	tests/10ddf-fail-twice: New unit test This is the test by Albert Pauw. Fail 2 disks, and add one. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 12:04:56 +10:00
mwilck@arcor.de	9591a2de77	DDF: no need for GET_LAYOUT any more With the previous patch, mdmon will provide the layout property for us. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 11:31:56 +10:00
mwilck@arcor.de	0c5d6054e4	mdmon: always get layout from sysfs commit `71d68ff62` uses the array layout. It needs to be initialized. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 11:31:56 +10:00
NeilBrown	a9c1584757	mdmon: don't lie to systemd. Now that mdmon responds fairly well to SIGTERM, stop lying to systemd about being started on the initrd. Note that if mdmon is rerun (--takeover) for some reason, and systemd chooses to kill processes before remounting / readonly, then the unmount will hang. If systemd ever lets us tell it that we don't want to be killed until root is readonly, then we should do that. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 15:59:24 +10:00
NeilBrown	2ef219630b	mdmon: clear safe_mode_delay on shutdown When we receive a signal, set the safemode delay to v.small so that we can get clean arrays and exit quickly Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 15:54:24 +10:00
NeilBrown	273989b93a	DDF: differentiate between new metadata and metadata updates. When writing an update, we don't need to overwrite lots of empty fields. This makes updates somewhat faster. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 15:33:19 +10:00
NeilBrown	cc83a81972	DDF: use some #defines instead of bare constants. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 15:21:24 +10:00
NeilBrown	bc17158dcc	Introduce devid2kname - slightly different to devid2devnm. The purpose of devid2devnm is to return a kernel name of an md device, whether that device is a whole device or a partition, we want the whole device. md4, never md4p2. In one place I was using devid2devnm where I really wanted the partition if there was one ... and wasn't really interested in it being an md device. So introduce a new 'devid2kname' for that case. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 14:32:04 +10:00
NeilBrown	879982efa9	Don't lie to systemd about mdadm's status. Telling systemd that mdadm was started from the initrd is often a lie and never necessary. Now that the reshape monitoring thread handles SIGTERM gracefully it is OK for systemd to kill any mdadm that it finds running. mdmon still has a bit of a question mark over it so I won't remove the '@' from there just yet. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 14:04:07 +10:00
NeilBrown	84d11e6c6a	Grow: exit background thread cleanly on SIGTERM. If the mdadm thread that monitors a reshape gets SIGTERM it should exit cleanly and clear the 'suspended' region of the array. However it mustn't clear 'sync_max' as that would allow the reshape to continue unmonitored. If the thread ever does get killed, the array should really be shutdown soon after if possible. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 13:58:10 +10:00
Martin Wilck	d81cc6a72e	tests/env-ddf-template: helper for new unit test I forgot to check in this helper script, similar to the one for IMSM. It is needed by tests/10ddf-create-fail-rebuild. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 16:48:57 +10:00
Martin Wilck	08b0ef8e50	tests/10ddf-create-fail-rebuild: new unit test for DDF This test adds a new unit test similar to 009imsm-create-fail-rebuild. With the previous patches, it actually succeeds on my system. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:04:07 +10:00
Martin Wilck	6ca1e6eccb	mdmon: manage_member: fix race condition during slow meta data writes In order to track kernel state changes, the monitor needs to notice changes in sysfs. If the changes are transient, and the monitor is busy writing meta data, it can happen that the changes are missed. This will cause the meta data to be inconsistent with the real state of the array. I can reproduce this in a test scenario with a DDF container and two subarrays, where I set a disk to "failed" and then add a global hot-spare. On a typical MD test setup with loop devices, I can reliably reproduce a failure where the metadata show degraded members although the kernel finished the recovery successfully. This patch fixes this problem by applying two changes. First, when a metadata update is queued, wait until it is certain that the monitor actually applied these meta data (the for loop is actually needed to avoid failures completely in my test case). Second, after triggering the recovery, set prev_state of the changed array to "recover", in case the monitor misses the transient "recover" state. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:00:46 +10:00
Martin Wilck	30b83120ed	mdmon: manage_member: debug messages for array state Add debug messages to watch the manager's steps. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:00:32 +10:00
Martin Wilck	c371936051	mdmon: wait_and_act: fix debug message for SIGUSR1 Correctly print out wake reason if it was a signal. Previous code would print misleading select events (pselect(2) man page says the fdsets become undefined in case of error). Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:59:40 +10:00
Martin Wilck	39da26ecf5	monitor: read_and_act: log status when called read_and_act() currently prints a debug message only very late. Print the status seen by mdmon right away, to track mdmon's actions more closely. Add a time stamp to observe long delays between read_and_act calls, e.g. caused by meta data writes. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:57:20 +10:00
Martin Wilck	ce6844b99c	DDF: ddf_set_disk: add some debug messages Adds more verbose debugging in ddf_set_disk, to understand failures better. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	0e5fa86239	DDF: load_ddf_header: more error logging Try to determine problem if load_ddf_header fails. May be useful for determining compatibility problems with Fake RAID BIOSes. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	0847945b8e	DDF: ddf_process_update: log offsets for conf changes I needed this for tracking a bug with wrong offsets after array creation. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	2a645ee220	DDF: log disk status changes more nicely In particular, include refnum for better tracking. This makes it a little easier for humans to track what happened to which disk. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
Martin Wilck	6f56dbb970	DDF: ddf_activate_spare: bugfix for `62ff3c40` Move the check for good drives in the dl loop - otherwise dl may be NULL and mdmon may crash. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 12:47:44 +10:00
NeilBrown	71d68ff62f	Fix is_resync_complete for RAID10 For RAID10, 'sync' numbers go up to the array size rather than the component size. is_resync_complete() needs to allow for this. Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 09:22:18 +10:00
Jes Sorensen	364a48c992	Avoid double close() Coverity discovered a possible double close(fd2) in Grow.c. Avoided by invalidating fd2 after the first close. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 08:51:16 +10:00
NeilBrown	23bf42cc79	super1: simplify setting of array size. Currently the extra space to leave before the data in the array is calculated in two separate places, and they can be inconsistent. Instead, do it all in validate_geometry. This records the 'data_offset' chosen which all other devices then use. 'write_init_super' now just uses the value rather than doing all the calculations again. This results in more consistent numbers. Also, load_super sets st->data_offset so that it is used by "--add", so the new device has a data offset matching a pre-existing device. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 17:05:47 +10:00
NeilBrown	641da74591	super1: separate two versions of _avail_space1(). _avail_space1() is called from both avail_space1() and validate_geometry1() and does slightly different things. The partial code sharing doesn't really help. In particular the responsibility for setting the size of the array is currently confused. So duplicate the code into the two locations - one where 'super' is always NULL (validate_geometry1) and one where it is never NULL (avail_space1), and simplify. No behaviour change - just code re-organisation. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 15:59:03 +10:00
NeilBrown	7ccc4cc4fc	Manage: remove call to validate_geometry. This call to validate_geometry is really rather gratuitous. It is purely about the fact that super0 cannot use more than 4TB. So just make it an explicit test - less confusing that way. With this, validate_geometry is only called from Create, which makes it easier to reason about. Also validate_geometry is now never passed NULL for the 'chunk' parameter, so we can remove those annoying tests for NULL. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 13:45:22 +10:00
mwilck@arcor.de	0c78849f2b	DDF: ddf_activate_spare: fix metadata update for SVDs Metadata updates for secondary RAID (RAID10) need to cover all BVDs. Compare with code in write_init_super_ddf(). Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 10:57:14 +10:00
mwilck@arcor.de	62ff3c40c1	DDF: ddf_activate_spare: only activate good drives Do not try to activate drives marked missing or failed. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 10:57:13 +10:00
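
Commits `7ccc4cc4fc` and `632dc30ccf` above concern an explicit test for the 4TB limit of 0.90 metadata, and the second one fixes a wrong value for "4TB". The following is a minimal sketch of one way to make such a constant hard to get wrong; it is not mdadm's actual code. The names `SUPER0_MAX_BYTES` and `too_large_for_super0` are hypothetical, and the sketch assumes that "4TB" means 4 × 2^40 bytes and that sizes are given in bytes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed limit for 0.90 metadata: 4TB, taken here as 4 * 2^40 bytes.
 * The ULL suffix on the first factor forces unsigned 64-bit arithmetic;
 * the same product written with plain int operands would overflow.
 */
#define SUPER0_MAX_BYTES (4ULL * 1024 * 1024 * 1024 * 1024)

/* Compile-time check that the constant has the intended value. */
static_assert(SUPER0_MAX_BYTES == 4398046511104ULL,
              "SUPER0_MAX_BYTES must be 4 * 2^40");

/* Hypothetical helper: true if a device of this size exceeds the limit. */
static bool too_large_for_super0(unsigned long long size_bytes)
{
    return size_bytes > SUPER0_MAX_BYTES;
}
```

A caller could evaluate `too_large_for_super0(size)` before adding a device to a 0.90 array and refuse the device when it returns true. Spelling the limit out as a product and pinning it with a compile-time assertion keeps a mistyped constant from going unnoticed.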

2777 Commits