mdadm

Commit Graph

Author	SHA1	Message	Date
Jes Sorensen	e5eb6857cd	Monitor/check_array: Use working_disks from sysfs sysfs now provides working_disks information, so let's use it too. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 17:15:14 -04:00
Jes Sorensen	b98943a4f8	Monitor/check_array: Get nr_disks, active_disks and spare_disks from sysfs This leaves working_disks and utime missing before we can eliminate check_array()'s call to md_get_array_info() Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 17:06:09 -04:00
Jes Sorensen	12a9d21f4e	Monitor/check_array: Get array_disks from sysfs Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 16:58:55 -04:00
Jes Sorensen	b8e5713c74	Monitor/check_array: Get 'failed_disks' from sysfs Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 16:54:19 -04:00
Jes Sorensen	48bc2ade86	Monitor/check_array: Obtain RAID level from sysfs Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 16:52:44 -04:00
Jes Sorensen	aed5f5c34c	Monitor/check_array: Read sysfs entry earlier This will allow us to pull additional info from sysfs, such as level and device info. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 16:51:41 -04:00
Jes Sorensen	826522f0dc	Monitor/check_array: Declare mdinfo instance globally We can pull in more information from sysfs earlier, so move sra to the top. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 16:41:06 -04:00
Jes Sorensen	13e5d8455c	Monitor/check_array: Reduce duplicated error handling Avoid closing fd in multiple places, and duplicating the error message for when a device disappeared. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 16:38:06 -04:00
Jes Sorensen	1830e74b4c	Monitor/check_array: Centralize exit path Improve exit handling to make it easier to share error handling and free sysfs entries later. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 16:25:23 -04:00
Alexey Obitotskiy	4b57ecf6ce	Add sector size as spare selection criterion Add sector size as new spare selection criterion. Assume that 0 means there is no requirement for the sector size in the array. Skip disks with unsuitable sector size when looking for a spare to move across containers. Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com> Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 14:18:38 -04:00
Alexey Obitotskiy	fbfdcb06dc	Allow more spare selection criteria Disks can be moved across containers in order to be used as a spare drive for rebuild. At the moment the only requirement checked for such a disk is its size (whether it matches donor expectations). In order to introduce more criteria, rename the corresponding superswitch method to a more generic name and move the function parameter into a structure. This change is a big edit but it doesn't introduce any changes in code logic; it just updates function naming and parameters. Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com> Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-09 14:18:36 -04:00
Jes Sorensen	f27904a53b	Monitor: Code is 80 characters per line Fix up some lines that are too long for no reason, and some that have silly line breaks. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-08 17:52:10 -04:00
Jes Sorensen	b9a0309c7f	Monitor: Use md_array_active() instead of manually fiddling in sysfs This removes a pile of clutter that can easily be handled with a simple check of array_state. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-08 17:41:00 -04:00
Zhilong Liu	9e04ac1c43	mdadm/util: unify stat checking blkdev into function Declare function stat_is_blkdev() to consolidate the repeated stat() checks for block devices. It returns 'true/1' when devname is a block device and 'false/0' when it isn't. devname is a required parameter; rdev is optional: if the dev_t *rdev pointer is valid, the device number is stored in *rdev, and if it is NULL it is ignored. Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-05 11:05:32 -04:00
Jes Sorensen	32141c1765	Retire mdassemble mdassemble doesn't handle container based arrays, no support for sysfs, etc. It has not been actively maintained for years, so time to send it off to retirement. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-04-11 12:54:26 -04:00
Jes Sorensen	dae131379f	sysfs: Make sysfs_init() return an error code Rather than have the caller inspect the returned content, return an error code from sysfs_init(). In addition make all callers actually check it. Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>	2017-03-30 16:52:37 -04:00
Jes Sorensen	d97572f5a5	util: Introduce md_get_disk_info() This removes all the inline ioctl calls for GET_DISK_INFO, allowing us to switch to sysfs in one place, and improves type checking. Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>	2017-03-29 15:23:50 -04:00
Jes Sorensen	9cd39f0155	util: Introduce md_get_array_info() Remove most direct ioctl calls for GET_ARRAY_INFO, except for one, which will be addressed in the next patch. This is the start of the effort to clean up the use of ioctl calls and introduce a more structured API, which will use sysfs and fall back to ioctl for backup. Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>	2017-03-29 14:35:41 -04:00
Zhilong Liu	75dd32a185	mdadm/Monitor: Fix NULL pointer dereference when stat2devnm returns NULL Wait(): stat2devnm() returns NULL for non-block devices. Check that the pointer is valid before dereferencing it. This can happen when --wait is given a non-block device, such as the 'f' (regular file) and 'd' (directory) file types, causing a core dump, for example: ./mdadm --wait /dev/md/ Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>	2017-03-27 18:24:19 -04:00
Tomasz Majchrzak	52209d6ee1	Monitor: release /proc/mdstat fd when no arrays present If the md kernel module is reloaded, /proc/mdstat cannot be accessed ("cat: /proc/mdstat: No such file or directory"). The reason is that the mdadm monitor still holds a file descriptor to the previous /proc/mdstat instance. This leads to a really confusing outcome for the following operations: mdadm seems to run without errors, but some udev rules don't get executed and the new array doesn't work. Add a check that lseek succeeded, as it fails if the md kernel module has been unloaded, and close the file descriptor in that case. However, the monitor doesn't always do this before the next operation takes place. To prevent that, the monitor always releases the /proc/mdstat descriptor when there are no arrays to be monitored, in case a driver unload happens in the meantime. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>	2016-07-21 11:37:17 -04:00
Jes Sorensen	26c62b8e76	Monitor: Use sysfs_free() to free object returned by sysfs_read() We should always use sysfs_free() to release sysfs_* allocated objects. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>	2016-06-10 14:56:23 -04:00
Xiao Ni	1d13b59960	Fix some type comparison problems As `26714713cd` said, 32 bit signed timestamps will overflow in the year 2038. It already changed the utime and ctime in struct mdu_array_info_s from int to unsigned int. So we need to change the values that are compared with them to unsigned int too. Signed-off-by : Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>	2016-02-08 10:49:22 -05:00
NeilBrown	d3f6cf4f9b	Monitor: don't Wait forever on a 'frozen' array. If Wait() finds the array resync is 'frozen', then wait a little while to avoid races, but don't wait forever. Signed-off-by: NeilBrown <neilb@suse.com>	2015-07-06 13:26:41 +10:00
Sergey Vidishev	1e08717f0b	mdadm: monitor: fix nullptr dereference when get_md_name() returns NULL Function add_new_arrays() expects get_md_name() to return a pointer to devname, but get_md_name() may also return NULL. So check the pointer before using it in add_new_arrays(). Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru> Signed-off-by: NeilBrown <neilb@suse.de>	2015-05-20 13:16:09 +10:00
NeilBrown	04e27c2084	Monitor: use the "space protocol" for "Wrong-Level". "Wrong-Level" is a reason, not a component device, so it should start with a space to indicate this to alert(). Signed-off-by: NeilBrown <neilb@suse.de>	2015-04-08 09:18:55 +10:00
NeilBrown	b033913a3c	Monitor: Obey "space protocol" when writing to syslog. "alert" treats the "disc" arg differently if it starts with a space. At least it does for sending email. It doesn't for writing to syslog. Make this consistent and obey the 'space protocol' when writing to syslog. Signed-off-by: NeilBrown <neilb@suse.de>	2015-04-08 09:17:17 +10:00
NeilBrown	7a862a020f	Don't break long strings onto multiple lines. It is best to keep strings all together so that they are easier to search for in the source code. If a string is so long that it looks ugly on one line, then maybe it should be broken into multiple lines for display too. Only strings which contain a newline can be broken into multiple lines: "It is OK to\n" "break this string\n" Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-12 13:46:53 +11:00
Pawel Baldysiak	d56dd607ba	Change way of printing name of a process Sometimes mdadm prints messages with the wrong name "mdmon", and vice versa. This patch solves the problem by changing the method of determining the process name. Now "Name" is set as a const at the start of the program; previously it was hardcoded as a #define. Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-12 12:11:01 +11:00
Artur Paszkiewicz	19d3ea0f0b	Monitor: fix for regression with container devices This patch fixes 2 problems introduced by commit 9a518d8: not closing a file descriptor and ignoring container devices. Array state is always "inactive" for containers, so we make sure that the device is not a container by reading also the "level" sysfs entry. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-11 15:27:57 +11:00
NeilBrown	9a518d81fe	Monitor: don't open md array that doesn't exist. Opening a block-special-device for an array that doesn't exist causes that array to be instantiated (as an empty array). Races at array shutdown can cause the array to spontaneously re-appear if some daemon notices a 'change' event and goes to investigate. Teach "mdadm --monitor" to avoid this race by checking the "array_state" before opening the device. Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-11-25 11:44:29 +11:00
NeilBrown	73ff073271	Monitor: Stop monitoring devices that have disappeared. If we are only monitoring a device because we found it in /proc/mdstat, and it has been gone for 5 checks, forget about it completely. Signed-off-by: NeilBrown <neilb@suse.de>	2014-08-14 15:36:09 +10:00
NeilBrown	efc67e8e9f	New function: sysfs_wait We have several places that wait for activity on a sysfs file. Combine most of these into a single 'sysfs_wait' function. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-01 13:28:13 +10:00
NeilBrown	1011e8344a	Remove lots of unnecessary white space. Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-) Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-19 12:31:45 +10:00
NeilBrown	276be5147e	Wait: also wait if an action is about to start. If a sync/recover action is about to start but hasn't actually begun yet, /proc/mdstat won't show it, but md/sync_action will (it checks MD_RECOVERY_NEEDED). So when /proc/mdstat seems to say nothing is happening, double check with md/sync_action. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-01 10:23:40 +10:00
NeilBrown	4dd2df0966	Discard devnum in favour of devnm We widely use a "devnum" which is 0 or +ve for md%d devices and -ve for md_d%d devices. But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32char string). eg. md0 md_d2 md_home Signed-off-by: NeilBrown <neilb@suse.de>	2013-02-21 17:05:23 +11:00
NeilBrown	639c3c103a	Allow --wait to wait for delayed resync. If a resync is delayed, then e->percent will be negative but not RESYNC_NONE. In that case we still want to wait. Reported-by: Ross Boylan <ross@biostat.ucsf.edu> Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-22 08:58:54 +11:00
NeilBrown	f1661bd71b	Monitor: don't complain about non-monitorable arrays in mdadm.conf If we are asked to monitor a RAID0 or Linear - which cannot be monitored - we complain with "Device Disappeared .... Wrong-Level". However if the RAID0 or Linear is being requested because it is in mdadm.conf then the message is inappropriate and confusing. So track which arrays are added from the config file, and suppress that message in that case. Reported-by: "Johnson Yan" <johnson_yan@usish.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-10-24 13:09:09 +11:00
NeilBrown	95c5020544	Change Monitor to take a struct context Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-09 17:20:19 +10:00
NeilBrown	503975b9d5	Remove scattered checks for malloc success. malloc should never fail, and if it does it is unlikely that anything else useful can be done. Best approach is to abort and let some super-daemon restart. So define xmalloc, xcalloc, xrealloc, xstrdup which don't fail but just print a message and exit. Then use those removing all the tests for failure. Also replace all "malloc;memset" sequences with 'xcalloc'. Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-09 17:14:16 +10:00
NeilBrown	e7b84f9d50	Introduce pr_err for printing error messages. 'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": ' cont_err() is also available. Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-09 17:14:16 +10:00
NeilBrown	721b662b5b	Monitor: fix reporting for Fail vs FailSpare etc. The tests here were specific to 0.90 metadata and didn't work properly for 1.x metadata, where a device's "number" doesn't change. By checking if this is a new array we can avoid some corner cases. Then we test mostly based on state and not based on 'number' at all. Signed-off-by: NeilBrown <neilb@suse.de>	2012-06-04 12:57:52 +10:00
NeilBrown	0f760384eb	Monitor: Report NewArray when an array that disappeared reappears. Signed-off-by: NeilBrown <neilb@suse.de>	2012-06-04 12:52:36 +10:00
NeilBrown	9dad51d418	Monitor: fix inconsistencies in values for ->percent ->percent sometimes stores negative values recording states like 'pending' or 'delayed'. The value '-2' means both 'delayed' and, in Monitor, 'unknown'. Also, '-1' has a meaning but no #define. So change the #defines to be prefixed with "RESYNC_" instead of "PROCESS_", add new "_NONE" and "_UNKNOWN", and use the correct value in each location. Signed-off-by: NeilBrown <neilb@suse.de>	2012-06-04 12:31:40 +10:00
NeilBrown	b0599bda13	Monitor: Allow correct monitoring of more member devices. Having "MaxDisks == 384" is not good. Discard it in favour of MAX_DISKS which is 4096 Signed-off-by: NeilBrown <neilb@suse.de>	2012-06-04 09:30:56 +10:00
NeilBrown	c2ecf5f61a	Add --prefer option for --detail and --monitor Both --detail and --monitor can report the names of member devices on an array, and do so by searching /dev and finding the shortest name that matches. If --prefer=foo is given, they will instead prefer a name that contains /foo/. So mdadm --detail /dev/md0 --prefer=by-path will list the component devices via their /dev/disk/by-path/xxx names. Signed-off-by: NeilBrown <neilb@suse.de>	2012-04-18 11:00:07 +10:00
Jes Sorensen	0011874f7e	Use MDMON_DIR for pid files created in Monitor.c Other parts of mdadm/mdmon place .pid/.sock files in MDMON_DIR. This makes Monitor.c consistent with the rest. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-02-23 09:05:16 +11:00
Lukasz Dorau	8453f8d0df	fix: Monitor sometimes crashes The "char cnt [40]" buffer is sometimes too small to hold the whole message; in such a case the monitor crashes. The buffer must be larger to be able to hold the whole message. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-01-12 10:40:00 +11:00
Sergey B Kirpichev	d97a5e6050	Report raid level type to syslog on RebuildFinished event. Thus, for RAID1/RAID10 this can be filtered out in logcheck. Relates-to: Debian bug 599821 Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-07 08:41:57 +11:00
Jes Sorensen	b657208c50	Monitor(): free allocated memory on exit Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-11-02 10:48:53 +11:00
NeilBrown	9e6d929127	Check all member devices in enough_fd The loop over all member devices in enough_fd could easily stop before it had found all devices. This would cause --re-add to fail incorrectly. So change the loop to be based on the reported number of devices in the device - with a safe-guard limit of 1024. Change some other loops to be more careful too. Reported-by: "Schmidt, Annemarie" <Annemarie.Schmidt@stratus.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-05-23 17:21:35 +10:00

164 Commits
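
Commit 9e04ac1c43 spells out the contract of stat_is_blkdev(): return 1 for a block device and 0 otherwise, require devname, and fill the optional dev_t *rdev only when it is non-NULL. The C below is a minimal sketch of that contract using POSIX stat() and S_ISBLK(); the error message wording is an assumption and the real mdadm function may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Sketch of the stat_is_blkdev() contract described in commit 9e04ac1c43.
 * Returns 1 if devname is a block device, 0 otherwise. On success, and
 * only if rdev is non-NULL, *rdev receives the device number; on failure
 * *rdev is left untouched. Message wording is an assumption.
 */
int stat_is_blkdev(const char *devname, dev_t *rdev)
{
	struct stat stb;

	assert(devname != NULL);	/* devname is a required parameter */

	if (stat(devname, &stb) != 0) {
		fprintf(stderr, "stat failed for %s: %s\n",
			devname, strerror(errno));
		return 0;
	}
	if (!S_ISBLK(stb.st_mode)) {
		fprintf(stderr, "%s is not a block device.\n", devname);
		return 0;
	}
	if (rdev)			/* rdev is optional */
		*rdev = stb.st_rdev;
	return 1;
}
```

A caller that only needs the yes/no answer passes NULL for rdev; one that needs the device number passes the address of a dev_t.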
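
Commit 52209d6ee1 checks whether lseek() on the cached /proc/mdstat descriptor succeeds, closes the descriptor if it does not, and releases it whenever no arrays are monitored. The helpers below are a hypothetical sketch of that pattern; the function names and the path parameter (added so the logic can be exercised on any file) are assumptions, not mdadm's API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a descriptor positioned at the start of
 * path (normally "/proc/mdstat"). If the cached descriptor can no longer
 * be rewound, as happens after the md module is unloaded, close it and
 * open the file again. Returns -1 if the file cannot be opened.
 */
static int mdstat_rewind_or_reopen(int *fdp, const char *path)
{
	assert(fdp != NULL && path != NULL);

	if (*fdp >= 0) {
		if (lseek(*fdp, 0, SEEK_SET) == 0)
			return *fdp;	/* cached descriptor still usable */
		close(*fdp);		/* stale descriptor: drop it */
		*fdp = -1;
	}
	*fdp = open(path, O_RDONLY);
	return *fdp;
}

/*
 * Hypothetical helper: release the cached descriptor, e.g. when no arrays
 * are being monitored, so the monitor does not keep a stale /proc/mdstat
 * instance open across a driver reload.
 */
static void mdstat_release(int *fdp)
{
	assert(fdp != NULL);

	if (*fdp >= 0) {
		close(*fdp);
		*fdp = -1;
	}
}
```

In a monitor loop, mdstat_release() would be called whenever the list of monitored arrays becomes empty, and mdstat_rewind_or_reopen() before each read.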
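
Commits 9a518d81fe and 19d3ea0f0b avoid opening /dev/mdX (which would instantiate an empty array) by first reading the array's sysfs "array_state", and also read "level" because containers always report "inactive". The sketch below assumes the kernel's /sys/block/<devnm>/md/ attribute layout and treats "clear" and "inactive" as "no running array". The helper names are hypothetical and this is not how md_array_active() is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read /sys/block/<devnm>/md/<attr> into buf and
 * strip the trailing newline. Reading sysfs does not instantiate the
 * array, unlike opening the block device. Returns 0 on success, -1 if
 * the attribute cannot be read.
 */
static int read_md_attr(const char *devnm, const char *attr,
			char *buf, size_t len)
{
	char path[256];
	FILE *f;
	size_t n;

	assert(devnm != NULL && attr != NULL && buf != NULL && len > 0);

	snprintf(path, sizeof(path), "/sys/block/%s/md/%s", devnm, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, len - 1, f);
	fclose(f);
	buf[n] = '\0';
	buf[strcspn(buf, "\n")] = '\0';
	return n > 0 ? 0 : -1;
}

/* "clear" and "inactive" mean there is no running array behind the node. */
static int array_state_is_active(const char *state)
{
	return strcmp(state, "clear") != 0 && strcmp(state, "inactive") != 0;
}

/* Containers always report "inactive", so consult "level" as well. */
static int safe_to_open(const char *state, const char *level)
{
	if (level != NULL && strcmp(level, "container") == 0)
		return 1;
	return array_state_is_active(state);
}
```

A monitor following this pattern would skip open() when read_md_attr() fails or when safe_to_open() returns 0.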
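
Several commits refine when "mdadm --wait" keeps waiting. 639c3c103a waits on any negative ->percent other than RESYNC_NONE. 276be5147e double-checks md/sync_action when /proc/mdstat shows nothing. d3f6cf4f9b stops waiting forever on a 'frozen' array. The sketch below combines those rules into one decision function. The numeric sentinel values are illustrative placeholders (check mdadm.h for the real ones), RESYNC_UNKNOWN is left out, the short grace period d3f6cf4f9b adds before giving up on 'frozen' is omitted, and should_keep_waiting() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sentinels for ->percent using the RESYNC_ names; values are illustrative. */
enum {
	RESYNC_NONE = -1,	/* nothing happening */
	RESYNC_DELAYED = -2,	/* resync delayed, e.g. sharing a device */
	RESYNC_PENDING = -3,	/* resync about to start */
};

/*
 * Hypothetical --wait decision:
 *  - anything other than RESYNC_NONE (running, delayed, pending) means
 *    keep waiting;
 *  - otherwise consult md/sync_action, which can report an action that
 *    /proc/mdstat does not show yet;
 *  - "idle" and "frozen" both end the wait.
 */
static int should_keep_waiting(int percent, const char *sync_action)
{
	/* percent is either a progress value 0..100 or one of the sentinels */
	assert(percent >= RESYNC_PENDING && percent <= 100);

	if (percent != RESYNC_NONE)
		return 1;
	if (sync_action == NULL)
		return 0;
	return strcmp(sync_action, "idle") != 0 &&
	       strcmp(sync_action, "frozen") != 0;
}
```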
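
Commit efc67e8e9f folds several "wait for activity on a sysfs file" loops into a single sysfs_wait function. Linux sysfs signals a change on a notifying attribute as POLLPRI|POLLERR, and only after the current contents have been read. The function below is a hypothetical sketch of that pattern using poll(). The name sysfs_wait_sketch() and the 64-byte read buffer are assumptions, and mdadm's own sysfs_wait may use select() and a different signature.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical sketch: wait up to timeout_ms for a change notification on
 * an open sysfs attribute such as md/sync_action. The attribute is re-read
 * first, because sysfs only reports changes that happen after a read.
 * Returns >0 if a change was signalled, 0 on timeout, -1 on error.
 */
static int sysfs_wait_sketch(int fd, int timeout_ms)
{
	char buf[64];
	struct pollfd pfd;

	assert(fd >= 0);

	/* Consume the current value so the next change is reported. */
	if (lseek(fd, 0, SEEK_SET) < 0 || read(fd, buf, sizeof(buf)) < 0)
		return -1;

	pfd.fd = fd;
	pfd.events = POLLPRI;	/* POLLERR is always reported by poll() */
	pfd.revents = 0;
	return poll(&pfd, 1, timeout_ms);
}
```

A caller would open the attribute once with open(path, O_RDONLY), call sysfs_wait_sketch() in a loop, and re-read the value after each positive return.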
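
Commits e7b84f9d50 and 503975b9d5 introduce pr_err() as shorthand for printing an error prefixed with the program name, and xmalloc/xcalloc/xstrdup wrappers that print a message and exit instead of returning NULL. The C below is a minimal sketch of both ideas. The value of Name, the portable C99 variadic form of pr_err(), and the use of EXIT_FAILURE are assumptions; xrealloc is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder program name; commit d56dd607ba makes "Name" a const. */
static const char *Name = "mdadm";

/* Print "Name: message" to stderr (portable C99 variadic macro). */
#define pr_err(...) \
	(fprintf(stderr, "%s: ", Name), fprintf(stderr, __VA_ARGS__))

/* Allocate or exit: callers never need to check for NULL. */
static void *xmalloc(size_t len)
{
	void *p = malloc(len);

	if (p == NULL && len != 0) {
		pr_err("out of memory allocating %zu bytes\n", len);
		exit(EXIT_FAILURE);
	}
	return p;
}

/* Zero-filled allocation, replacing "malloc; memset" sequences. */
static void *xcalloc(size_t nmemb, size_t size)
{
	void *p = calloc(nmemb, size);

	if (p == NULL && nmemb != 0 && size != 0) {
		pr_err("out of memory allocating %zu x %zu bytes\n",
		       nmemb, size);
		exit(EXIT_FAILURE);
	}
	return p;
}

/* Duplicate a string via xmalloc, so it cannot return NULL either. */
static char *xstrdup(const char *s)
{
	size_t len;
	char *copy;

	assert(s != NULL);
	len = strlen(s) + 1;
	copy = xmalloc(len);
	memcpy(copy, s, len);
	return copy;
}
```

The design trades graceful recovery for simpler call sites, on the reasoning given in 503975b9d5 that little useful can be done after malloc fails.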