mdadm

Commit Graph

Author	SHA1	Message	Date
NeilBrown	0cf5ef67bb	ddf: fix up detection of failed/missing devices. If a device hasn't been found yet we can still tell if it is expected to be working, and we must do so to make sure 'working_disks' is correct. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-22 10:32:09 +11:00
Piergiorgio Sartor	6f38d7ae10	restripe: allow test code to have an offset on each device. If a device name ends with :number, e.g. /dev/sda0:1234, then assume the RAID data starts that many sectors from the start of the device. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-22 10:09:38 +11:00
NeilBrown	019ca1e1da	test: call "udevadm settle" after stopping array. If we don't do this, then the unlink from /dev might happen after the next step in the test creates something in /dev, and device names seem to go missing. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-22 10:09:30 +11:00
Piergiorgio Sartor	979afcb82b	RAID-6 check standalone Hi Neil, please find attached a patch, against the mdadm-3.2 base, including a standalone version of the RAID-6 check. This is basically a re-working (and hopefully an improvement) of the check already implemented in "restripe.c". I split the check function into "collect" and "stats", so that the second one can be easily replaced. The API is also simplified. The command-line options are reduced, since the only level is RAID-6, but the ":offset" option is included. The output reports the block/stripe rotation, P/Q errors and the possible HDD (or unknown). BTW, the patch also applies to the already patched "restripe.c", including the last ":offset" patch (which is not yet in git). Another item is that, due to "sysfs.c" linking (see below), the "Makefile" needed some changes; I hope this is not a problem. Next steps (a TODO list, if you like) would be: 1) Add the "sysfs.c" code in order to retrieve the HDD info from the MD device. It is already linked, together with the whole (mdadm) universe, since it seems it cannot stand alone. I'll need some advice or a hint on how to use it. I checked "sysfs.c", but before I dig deep into it, it may be better to have some advice (maybe just one function call will do it). 2) Add the suspend lo/hi control. Fellow John Robinson suggested looking into "Grow.c", which I did, but I guess the same story as 1) applies: better to have some hint on where to look before wasting time. 3) Add a repair option (future). This should have different levels, like "all", "disk", "stripe". That is: fix everything (more or less like "repair"), fix only if a disk clearly has problems, or fix each stripe that clearly has a problem (but different stripes may belong to different HDDs). So, for points 1) and 2) it would be nice to have some more detail on where to look. Point 3) we will discuss later. Thanks, please consider for inclusion, bye, pg Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-21 13:52:44 +11:00
Labun, Marcin	8a0bf4f378	platform_intel: support EFI SCU OEM variable RstScuV and RstScuO variable names are supported. First try reading from RstScuV, when it fails try RstScuO. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Tested-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-20 15:47:33 +11:00
Adam Kwolek	ceaf0ee19e	imsm: FIX: indicate that metadata has to be written When adding spare disks to raid0, spare metadata is not written. This is due to the exit from sync_metadata() on an empty updates_pending flag. When mdmon is absent, signal sync_metadata() to flush changes to disks. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-20 15:47:31 +11:00
Adam Kwolek	c0f8269d57	FIX: Add spare throws exception (v2) sync_metadata() requires st->sb to be loaded, otherwise an exception is generated. This makes expansion fail, because spares cannot be added. The metadata update uses tst instead of the st pointer, which is better than loading the anchor for st as I proposed previously. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-20 15:47:17 +11:00
Krzysztof Wojcik	1ae42d9d99	Retry writing 'inactive' state when stopping array Issue observed: Sporadically, stopping arrays using the "mdadm -Ss" command does not succeed. Cause: Writing "inactive" to the array state does not succeed because the array is busy (accessed by udev, blkid etc.). Resolution: If writing 'inactive' fails, wait and retry (because it is possibly a transient failure). Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-18 12:42:17 +11:00
Adam Kwolek	983fff45a1	FIX: ping_monitor() usage causes memory leaks When devnum2devname() is used to produce the input for ping_monitor(), the returned string pointer should be passed to free() to release the memory. This is not done in several places. This use case should have its own function to avoid the memory leak. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-18 12:32:16 +11:00
NeilBrown	d6221e667f	Manage: fix the mess I made in earlier patch. When I separated the 'native metadata' case more cleanly from the "external metadata" case for adding a drive, I left some 'external' code in the 'native' case, and didn't copy it to the 'external' case. When - in the external case - we add to super, we must check for mdmon first, so we know whether to do the metadata update ourselves or not, then afterwards call either flush_metadata_updates (to send to mdmon) or sync_metadata (to do it directly). Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-18 12:31:45 +11:00
NeilBrown	eb0af52689	--stop: separate 'is busy' test from 'did it stop properly'. Stopping an md array requires that there is no other user of it. However with udev and udisks and such there can be transient other users of md devices which can interfere with stopping the array. If there is a transient user, we really want "mdadm --stop" to wait a little while and retry. However if the array is genuinely in use (e.g. mounted), then we don't want to wait at all - we want to fail immediately. So before trying to stop, re-open the device with O_EXCL. If this fails then the device is probably in use, so give up. If it succeeds, but a subsequent STOP_ARRAY fails, then it is possibly a transient failure, so try again for a few seconds. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-17 13:35:10 +11:00
NeilBrown	1d8862cf61	Fix regression when using 'grow' to add a bitmap. When we allowed a devlist to accompany some --grow modes - but not --bitmap - we made --bitmap always fail, instead of failing only if a device was given to add. As 'devs_found' includes the md device, we need to compare against '1'. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 16:31:20 +11:00
NeilBrown	88b496c269	Merge branch 'master' into devel-3.2 Conflicts: Manage.c managemon.c super-ddf.c super-intel.c	2011-03-15 15:35:04 +11:00
NeilBrown	666bba9b50	mdadm.man: added encouragement to shrink filesystem before array. Suggested by Rory Jaffe <rsjaffe@gmail.com> to make the danger of shrinking, and the recommended avoidance technique, more explicit. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 15:24:03 +11:00
NeilBrown	b0edee6efb	ddf: implement remove_from_super This is needed to remove devices from mdmon's knowledge when the device is removed from the md container. Now that ddf has a remove_from_super we don't need the code that allows some personalities not to implement this. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 15:10:32 +11:00
Labun, Marcin	258c3e921b	IMSM: Fix problem in mdmon monitor of using removed disk in imsm container. The manager thread shall pass information to the monitor thread (mdmon) that some devices have been removed from the container. Otherwise, the monitor (mdmon) might use such devices (spares) to rebuild an array that has gone degraded. This problem happens for imsm containers, since a list of the container disks is maintained in the intel_super structure. When an array goes degraded, the list is searched to find a spare disk to start the rebuild. Without this fix the rebuild could be started on a spare device that was a member of the container but has been removed from it. A new super type function handler has been introduced to prepare metadata-format-specific information about removed devices: int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo) The message prepared in remove_from_super is later processed by the process_update handler in the monitor thread. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 15:09:31 +11:00
NeilBrown	33b0edd78a	DDF Allow a RAID1 to be 'partially optimal'. If a RAID1 is meant to have more than 2 devices and, while it doesn't have that many, it still has more than 1, then according to the DDF spec it is "partially optimal" rather than "degraded". So make that so. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 15:09:24 +11:00
NeilBrown	c7079c8441	ddf: remove failed devices that are no longer in use. The DDF spec requires we have a phys disk record for every physically attached device. But it isn't clear what that means in the case of soft raid in a general purpose Linux computer. So remove phys disk records for any failed device that is not active in any array. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 15:02:49 +11:00
NeilBrown	8401644c3a	ddf: set Rebuilding flag when adding devices to a degraded array This is a bit fragile, but DDF has weird rules that we aren't really set up to handle properly. When we add a device to a degraded array it must be a spare, so mark it as Rebuilding. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 14:57:46 +11:00
NeilBrown	e5cc7d469f	ddf: use correct loop variable in activate_spare Using 'i' when you mean 'j' just shows how silly it is to use variables named 'i' and 'j'. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 14:54:46 +11:00
NeilBrown	77632af906	ddf: Don't consider 'dl' entries with state_fd < 0 These have been marked as invalid (recently failed) so don't trust the major/minor associated with them. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 14:53:00 +11:00
NeilBrown	0c4f6e378b	managemon: Don't do spare assignment while any updates are pending. Spare assignment requires full knowledge of array state. A pending update might modify that state (such as a pending spare assignment) so don't try while there are updates pending. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 14:51:12 +11:00
NeilBrown	02c39ab1d5	Manage/external: for external metadata, add_to_super needs lock on container. add_to_super could use information from the current superblock (ddf does), so add_to_super for external metadata should be called with the O_EXCL lock held on the container to ensure the update is complete before any other process tries to make any changes (like adding another device to array). Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 14:48:20 +11:00
Adam Kwolek	e26c926209	imsm: FIX: existing backup file fails unit tests During normal test execution, the backup file is deleted after the test runs. If a test is interrupted/broken, the backup file can remain for the next run. When the backup file exists before a unit test run, suites 12 and 13 fail. To avoid this, remove the backup file before grow is executed. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 08:46:54 +11:00
NeilBrown	4dd968cc54	ddf: implement remove_from_super This is needed to remove devices from mdmon's knowledge when the device is removed from the md container. Now that ddf has a remove_from_super we don't need the code that allows some personalities not to implement this. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:56:16 +11:00
NeilBrown	f50ae22e45	ddf: zero space_list in ddf_activate_spare. Currently ->space_list is uninitialised here, which is obviously bad. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:54:21 +11:00
NeilBrown	de6a92199e	Merge branch 'master' into devel-3.2	2011-03-14 18:49:57 +11:00
NeilBrown	1502a43a08	ddf: set vcnum correctly when creating a new virtual device in conflist We weren't setting ->vcnum at all when an array was added. This meant that a subsequent device failure could be assigned to the wrong array. Reported-by: Albert Pauw <albert.pauw@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:47:47 +11:00
NeilBrown	e1316fab98	ddf: teach set_disk to cope with new or changed devices. When set_disk is called, we need to check if the disk has changed or recently appeared, and update everything properly if it has. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:45:26 +11:00
NeilBrown	8a38cb04de	ddf: free_super should free add_list as well. It is possible there is data and even an open file descriptor on 'add_list' - so it must be freed too. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:32:38 +11:00
NeilBrown	7590d5623b	ddf: minor activate_super fixes. 1/ ignore devices with "state_fd < 0" as these have been removed. 2/ Set update 'length' properly and clear 'space'. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:30:34 +11:00
NeilBrown	e40512fddb	monitor: close recovery_fd when closing state_fd These should be open or closed together. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:24:01 +11:00
Krzysztof Wojcik	53ed6ac36e	Warn the user about too small array size If a single-disk RAID0 or RAID1 array is created, the user may preserve the data on the disk. If the given array size covers all partitions on the disk, all data will be available on the created array. If the array size is too small (does not cover all partitions), the data will not be accessible. This patch introduces a warning message during array creation if the given size is too small. The user may interrupt the creation process to avoid data loss. Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:21:21 +11:00
Labun, Marcin	9c747fa0ff	platform_intel: find OROM based on Intel AHCI and SAS driver device id We use the PCI device id exposed by the AHCI and ISCU drivers (SAS controller) to find the OROM version table. This way there is no need to maintain an AHCI and ISCU device id list in mdadm. The consequence is that the OROM properties can be found by mdadm when the AHCI or SAS drivers are loaded in the system. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:18:46 +11:00
Adam Kwolek	6289d1e079	imsm: FIX: Store checkpoint in per disk units Since last_checkpoint is a counter in per-disk units, checkpoints should be stored the same way. Restoring from a checkpoint should recalculate the checkpoint into an array position (reshape_progress). Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:17:53 +11:00
Adam Kwolek	0d51bfa20e	FIX: Last_checkpoint has to be initialized in per disk units last_checkpoint is a variable that tracks the sync_complete sysfs entry. sync_complete is a per-disk counter, so initialization when starting from a checkpoint has to take this into account and convert the reshape position properly. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:17:52 +11:00
Adam Kwolek	138477db4b	FIX: Last checkpoint is not initialized on reshape restart When a reshape is restarted and the active array in mdmon is being initialized, mdmon has to know the last checkpoint, otherwise the reshape will be restarted from position '0'. When a reshaped array is assembled, mdadm stores reshape_position in sysfs and runs mdmon. Initialize last_checkpoint in the active array structure to the value present in sysfs when a reshaped array starts. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:12:57 +11:00
Adam Kwolek	bcc9e9edd0	FIX: Unfreeze array on success only Unfreeze array on success only. rv is initialized from the restart variable, so we have 2 cases. 1. Regular reshape start: rv == restart == 0. This means that a real error (returned by reshape) can cause the container to be left frozen. If the array is not touched by the reshape, it can be unfrozen. 2. During reshape restart, even an untouched array under reshape is left unfrozen. If the reshape is started, do not unfreeze the array on error either. This allows the user to take repair action on the array (mdmon will not change the array state). Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:12:57 +11:00
NeilBrown	18cb44962d	ddf: Failed should suppress Online and others. So the notes say, so make it so. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 18:14:43 +11:00
NeilBrown	ca6529edf6	Merge branch 'master' into devel-3.2 Conflicts: Grow.c Manage.c managemon.c mdadm.8.in util.c	2011-03-10 17:37:04 +11:00
NeilBrown	d6508f0cfb	Manage: be more careful about --add attempts. If an --add is requested and a re-add looks promising but fails or cannot possibly succeed, then don't try the add. This avoids inadvertently turning devices into spares when an array is failed but the devices seem to actually work. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:25:40 +11:00
NeilBrown	37e430d163	ddf: remove duplicate container_member setting. We were setting ->container_member twice in ddf get_info. Once to currentconf->vcnum, once to atoi(st->subarray). Both should be the same. For consistency with super-intel, use the first. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:24:44 +11:00
NeilBrown	01afe516ce	Fix warning about host-endian bitmaps. Hostendian bitmaps should be warned about on all arch's. And fix a speeling mistake. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:22:25 +11:00
NeilBrown	0d924c9a93	Grow: give useful message when adding bitmap gives EBUSY. If adding a bitmap fails with EBUSY, then it is because the array is currently resyncing/recovering/reshaping. As this is non-obvious, give a message explaining the fact. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:21:57 +11:00
NeilBrown	1f9476aaf8	Assemble: add --update=no-bitmap This allows an array with a corrupt internal bitmap to be assembled without the bitmap. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:21:43 +11:00
NeilBrown	321e575910	Assemble: call remove_partitions later. We shouldn't call remove_partitions until we have made a really firm decision to include the device into the array. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:21:19 +11:00
NeilBrown	515afde355	mdmon: don't copy an invalid chunk_size As chunk_size in mdstat_ent is never set, we shouldn't copy it into a->info.array. In fact, it is safest to get rid of the field altogether. Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:21:03 +11:00
NeilBrown	002a3de3d4	ddf: fail creation of new subarray with same name as old. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:20:25 +11:00
NeilBrown	e42eb35545	Create: report failure if array cannot be started. We weren't checking the result of writing 'active' to array_state Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:20:07 +11:00
NeilBrown	48b1fc9ddb	Grow: disallow placing backup file on array being reshaped. The tests here aren't perfect, but they could catch some cases. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:19:20 +11:00
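Commit 6f38d7ae10 lets the restripe test code accept a device argument of the form NAME:SECTORS, where the number gives the sector at which RAID data starts. The C sketch below shows one way such an argument could be split. The helper name `split_dev_offset` is hypothetical, and splitting at the last colon and requiring a purely numeric suffix are assumptions made here, not details stated in the commit message.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not mdadm's actual code): split "NAME:SECTORS"
 * into a device path and a start offset in sectors. A name without a
 * purely numeric ":suffix" is copied unchanged and gets offset 0. */
static unsigned long long split_dev_offset(const char *arg, char *path,
                                           size_t pathlen)
{
    const char *colon = strrchr(arg, ':');
    unsigned long long sectors = 0;
    size_t n = strlen(arg);

    if (colon && colon[1] != '\0') {
        const char *p = colon + 1;
        while (*p && isdigit((unsigned char)*p))
            p++;
        if (*p == '\0') {              /* suffix is all digits */
            sectors = strtoull(colon + 1, NULL, 10);
            n = (size_t)(colon - arg); /* path ends before the ':' */
        }
    }
    if (pathlen > 0) {
        if (n >= pathlen)
            n = pathlen - 1;           /* truncate to fit the buffer */
        memcpy(path, arg, n);
        path[n] = '\0';
    }
    return sectors;
}
```

With this sketch, `split_dev_offset("/dev/sda0:1234", buf, sizeof buf)` yields the path `/dev/sda0` and an offset of 1234 sectors; assuming 512-byte sectors, the caller would seek to `1234 * 512` bytes before reading stripe data.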
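Commits eb0af52689 and 1ae42d9d99 describe the stop policy: an exclusive open separates a genuinely busy array (fail at once) from a transient user such as udev or blkid (retry for a few seconds). The following is a minimal C sketch of that control flow only. The `stop_ops` hooks are hypothetical stand-ins for `open(dev, O_RDONLY | O_EXCL)` and the `STOP_ARRAY` ioctl, abstracted so the policy can be exercised without a real md device; the `fake_*` hooks merely simulate one.

```c
#include <stdbool.h>

enum stop_result { STOP_OK, STOP_IN_USE, STOP_GAVE_UP };

/* Hypothetical hooks; a real tool would wrap open(..., O_EXCL),
 * ioctl(fd, STOP_ARRAY, NULL) and a short sleep. */
struct stop_ops {
    bool (*open_excl)(void *ctx);   /* false => device is held open */
    bool (*stop_array)(void *ctx);  /* false => STOP_ARRAY failed */
    void (*pause)(void *ctx);       /* wait between retries */
    void *ctx;
};

enum stop_result stop_with_retry(const struct stop_ops *ops, int max_tries)
{
    /* Exclusive open fails if the array is genuinely in use
     * (e.g. mounted): give up immediately instead of waiting. */
    if (!ops->open_excl(ops->ctx))
        return STOP_IN_USE;

    /* The exclusive open succeeded, so a STOP_ARRAY failure is
     * probably transient: retry a bounded number of times. */
    for (int i = 0; i < max_tries; i++) {
        if (ops->stop_array(ops->ctx))
            return STOP_OK;
        if (i + 1 < max_tries)
            ops->pause(ops->ctx);
    }
    return STOP_GAVE_UP;
}

/* Example hooks simulating an array that a transient user holds
 * for the first `busy_for` STOP_ARRAY attempts. */
struct fake_dev { bool mounted; int busy_for; int attempts; int pauses; };

static bool fake_open_excl(void *ctx)
{
    return !((struct fake_dev *)ctx)->mounted;
}

static bool fake_stop(void *ctx)
{
    struct fake_dev *d = ctx;
    return d->attempts++ >= d->busy_for;
}

static void fake_pause(void *ctx)
{
    ((struct fake_dev *)ctx)->pauses++;
}
```

Keeping the system calls behind hooks is a choice made for this sketch only; the retry count and pause length are placeholders, not values taken from mdadm.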
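Commit 33b0edd78a classifies an N-way RAID1 that is short of members but still holds more than one copy as "partially optimal" rather than "degraded". The sketch below encodes that rule as a small C function. The enum and function names are illustrative, not mdadm's, and treating zero working members as failed is an assumption about the remaining DDF states rather than something the commit message spells out.

```c
/* Illustrative virtual-disk states for a DDF RAID1 (names hypothetical). */
enum vd_state { VD_OPTIMAL, VD_PARTIALLY_OPTIMAL, VD_DEGRADED, VD_FAILED };

/* raid_disks: members the mirror is meant to have;
 * working:    members currently present and in sync. */
static enum vd_state raid1_state(int raid_disks, int working)
{
    if (working >= raid_disks)
        return VD_OPTIMAL;
    if (working <= 0)
        return VD_FAILED;           /* assumption: no copy left at all */
    /* Short of the intended count but still holding more than one
     * copy; this can only happen when raid_disks > 2. */
    if (working > 1)
        return VD_PARTIALLY_OPTIMAL;
    return VD_DEGRADED;             /* exactly one copy left */
}
```

Under this sketch a 3-way mirror with two working members is partially optimal, while a 2-way or 3-way mirror with a single working member is degraded.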

2034 Commits (all branches)