Bug 54712 - RFE: Simplify watchdog configuration on Servers with IPMI compatible hardware
Status: NEW
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: systemd-bugs
QA Contact: systemd-bugs
Reported: 2012-09-10 07:56 UTC by Charles Rose
Modified: 2015-02-17 00:38 UTC

Description Charles Rose 2012-09-10 07:56:53 UTC
Watchdog hardware on servers can typically be configured in three ways:
1. Configured via module parameters
The OpenIPMI project contains a startup script that loads IPMI kernel modules during startup controlled by /etc/sysconfig/ipmi. This script can optionally load ipmi_watchdog.ko as well.
    /etc/sysconfig/ipmi
        IPMI_WATCHDOG=yes
        IPMI_WATCHDOG_OPTIONS="timeout=300 action=reset nowayout=0"

2. Configured pre-boot
IPMI watchdog hardware supports out-of-band (pre-OS) configuration. This is useful where the system admin wants to configure the watchdog from a pre-OS configuration utility (for example, to use factory-set defaults) or remotely, with tools like bmc-watchdog(8), for hundreds of systems.

3. Configured via a watchdog daemon
Systemd's RuntimeWatchdogSec, bmc-watchdog(8) or watchdog(5)

In scenarios #1 and #2, the timeout value is already set in the watchdog device (the timer is set to Stopped). But systemd does not currently probe/use this.

For such scenarios, it would be beneficial if systemd could first get the current timeout value (WDIOC_GETTIMEOUT) and, only if it is not set, set it to RuntimeWatchdogSec. This would ensure that timeout values set via other mechanisms remain in effect, and the system admin would not have to duplicate the timeout values in /etc/systemd/system.conf (especially for a large number of systems, remotely).
Comment 1 Lennart Poettering 2012-09-11 13:26:43 UTC
(In reply to comment #0)
> Watchdog hardware on servers can typically be configured in three ways:
> 1. Configured via module parameters
> The OpenIPMI project contains a startup script that loads IPMI kernel modules
> during startup controlled by /etc/sysconfig/ipmi. This script can optionally
> load ipmi_watchdog.ko as well.
>     /etc/sysconfig/ipmi
>     IPMI_WATCHDOG=yes
>     IPMI_WATCHDOG_OPTIONS="timeout=300 action=reset nowayout=0"

Which code will ping the hw in this case?

Having init scripts that load kernel modules is something we really should try to avoid these days. Modules should be auto-loading depending on hw showing up. Which means the ipmi watchdog module should just be loaded like any other module if IPMI is available, and that makes configuration with a configuration file hard...

I am pretty sure IPMI watchdogs should probably be configured like any other, so I'd prefer if this IPMI-specific config would go away one day...

> 2. Configured pre-boot
> IPMI Watchdog hardware support out-of-band configuration (pre-OS). This is
> useful where the system admin wants to configure watchdog on systems from a
> pre-os configuration utility (like use factory set defaults) or remotely with
> tools like bmc-watchdog(8) for hundreds of systems.

Which code is supposed to ping the hw in this case?
 
> 3. Configured via a watchdog daemon
> Systemd's RuntimeWatchdogSec, bmc-watchdog(8) or watchdog(5)
> 
> In scenarios #1 and #2, the timeout value is already set in the watchdog device
> (the timer is set to Stopped). But systemd does not currently probe/use this.
> 
> For such scenarios, it would be beneficial if systemd can first get the current
> timeout value (WDIOC_GETTIMEOUT), and if not set, only then set it to
> RuntimeWatchdogSec. This would ensure that timeout values set via other
> mechanisms still hold good and the system admin does not have to duplicate the
> timeout values in /etc/systemd/system.conf (especially for large number of
> systems, remotely).

RuntimeWatchdogSec= has two purposes: configure the hw to some interval, and make systemd ping the hw at the right frequency. By default both are off. If you set the time setting then both are turned on. IIUC you want us to do the latter but not the former in IPMI setups, right?

This has multiple problems. One is that right now we carefully made sure that people can choose any watchdog sw implementation they wish, but if we automatically detect a pre-initialized watchdog config and then make use of it, we'd take possession when the user doesn't necessarily want us to. Also, this would require us to open the watchdog device first to see what is configured, and if nothing is, close it right away again. However, that is problematic since some drivers (non-IPMI...) don't allow us to close the watchdog device without triggering an immediate reboot. Hence automatically discovering a pre-initialized setting is problematic...
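
To make the "two purposes" distinction concrete, here is a small Python model of the two behaviours: what RuntimeWatchdogSec= does today (both on or both off) versus what the request amounts to (ping only, leave the hw interval alone). This is only an illustrative sketch, not systemd code: WatchdogPlan, plan_current() and plan_reuse_preset() are made-up names, and pinging at half the interval is an assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatchdogPlan:
    set_hw_timeout: Optional[int]  # seconds to program into the device, None = leave it alone
    ping_every: Optional[float]    # seconds between keep-alive pings, None = do not ping


def plan_current(runtime_watchdog_sec: int) -> WatchdogPlan:
    """Current behaviour: unset (0) turns both off, any value turns both on."""
    if runtime_watchdog_sec <= 0:
        return WatchdogPlan(set_hw_timeout=None, ping_every=None)
    # Assumption for illustration: ping at half the configured interval.
    return WatchdogPlan(set_hw_timeout=runtime_watchdog_sec,
                        ping_every=runtime_watchdog_sec / 2)


def plan_reuse_preset(preset_sec: int) -> WatchdogPlan:
    """Requested behaviour: keep the pre-set hw interval, only do the pinging."""
    return WatchdogPlan(set_hw_timeout=None, ping_every=preset_sec / 2)
```

With a pre-set interval of 300s, plan_reuse_preset(300) leaves the hw untouched and only schedules the pings, which is the "latter but not the former" case.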
Comment 2 Charles Rose 2012-09-13 08:15:33 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > Watchdog hardware on servers can typically be configured in three ways:
> > 1. Configured via module parameters
...
> >     IPMI_WATCHDOG_OPTIONS="timeout=300 action=reset nowayout=0"
> 
> Which code will ping the hw in this case?

In all cases here, systemd would be the code to ping the hw watchdog. The values passed to modprobe act more like defaults.

> 
> Having init scripts that load kernel modules is something we really should try
> to avoid these days. Modules should be auto-loading depending on hw showing up.

I agree. We are attempting autoload of IPMI:
    https://patchwork.kernel.org/patch/1243021/

> Which means the ipmi watchdog module should just be loaded like any other
> module if IPMI is available, and that makes configuration with a configuration
> file hard...
> 
> I am pretty sure IPMI watchdogs should probably be configured like any other,
> so I'd prefer if this IPMI-specific config would go away one day...

Yes. The ideal case would be one where no config files are needed to load ipmi or ipmi_watchdog functionality and it all happens automatically. The only configurable option at that point would be RuntimeWatchdogSec, carrying just the timeout.

There are however some options unique to IPMI, like the 'action' parameter, which can set the timeout action (reboot, shutdown, etc.). Defaults are probably good for most cases.

> 
> > 2. Configured pre-boot
> > IPMI Watchdog hardware support out-of-band configuration (pre-OS). This is
> > useful where the system admin wants to configure watchdog on systems from a
> > pre-os configuration utility (like use factory set defaults) or remotely with
> > tools like bmc-watchdog(8) for hundreds of systems.
> 
> Which code is supposed to ping the hw in this case?

systemd.

The user would set the timeout to 300s like this:
   # bmc-watchdog --set -i 300

This only sets the timeout; it does not start the timer. It is done via /dev/ipmi0 (not /dev/watchdog), so it does not go through the watchdog API and hence does not imply starting the timer.

The timer is started only on the first open() (from systemd). This is the desired behaviour.

We want the timeout, and any other values the user cares about, to be set via tools like bmc-watchdog/ipmitool/etc., while systemd does the open() (and thereby starts the timer).

> 
> > 3. Configured via a watchdog daemon
> > Systemd's RuntimeWatchdogSec, bmc-watchdog(8) or watchdog(5)
> > 
...
> RuntimeWatchdogSec= has two purposes: configure the hw to some interval, and
> make systemd ping the hw in the right frequency. By default both are off. If
> you set the time setting then both are turned on. IIUC you want us to do the
> latter but not the former, right in IPMI setups? This has multiple problems,
> one of them being that right now we carefully made sure that people can choose
> any watchdog sw implementation they wish, but if we shall automatically detect
> a pre-initialized watchdog config and then make use of that we'd take
> possession when the user doesn't necessarily want us to. Also, this would
> require us to open the watchdog device first, to see what is configured, and if
> nothing is close it right-away again. However, that is problematic since some
> drivers (non IPMI...) don't allow us to close the watchdog device without
> triggering an immediate reboot. Hence automatically discovering a
> pre-initialized setting is problematic...

I agree.

Your proposal on the mail thread of an 'auto' option sounds like a reasonable compromise.
  
   Auto: I want watchdog functionality but am not sure what the timeout is or should be, so take it from the driver.
   Flow: open(); GETTIMEOUT; if (!timeout) SETTIMEOUT

For the long term, if we can get ipmi_watchdog to autoload on hw detect, we can have users set RuntimeWatchdogSec=auto or set a timeout value.
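
The 'auto' flow above (open(); GETTIMEOUT; if (!timeout) SETTIMEOUT) can be sketched in Python against a stand-in device. This is a rough illustration under stated assumptions, not the systemd implementation: FakeWatchdog and resolve_auto_timeout() are hypothetical names, and a real device would be driven through ioctls on /dev/watchdog rather than method calls.

```python
class FakeWatchdog:
    """Hypothetical stand-in for a watchdog device, for illustration only."""

    def __init__(self, timeout=0):
        self.timeout = timeout   # 0 models "no timeout configured yet"
        self.running = False

    def open(self):
        # As described in this comment, the first open() starts the timer.
        self.running = True

    def get_timeout(self):
        return self.timeout

    def set_timeout(self, seconds):
        self.timeout = seconds


def resolve_auto_timeout(dev, fallback_sec):
    """open(); GETTIMEOUT; if (!timeout) SETTIMEOUT. Returns the timeout in effect."""
    dev.open()
    current = dev.get_timeout()
    if not current:
        # Nothing was pre-configured, so fall back to the supplied value.
        dev.set_timeout(fallback_sec)
        current = dev.get_timeout()
    return current
```

A device pre-set to 300s (for example via bmc-watchdog) keeps its 300s; a device with nothing configured ends up with the fallback value.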
Comment 3 Lennart Poettering 2013-09-10 14:56:10 UTC
So, if I got this right, there are multiple things missing here:

a) systemd doesn't really support watchdog devices that are loaded as kmods, right now. It assumes that all watchdog devices are just there, i.e. built into the kernel as platform devices. iirc this is usually not the case for ipmi watchdogs on RHEL, right?

The way this should be implemented is probably that systemd starts to use udev to watch for watchdog devices and then makes use of all watchdog devices that have a specific udev tag set.

This is probably a more complex patch.

b) if the special watchdog mode to reuse the defaults is used, then we need to invoke WDIOC_GETTIMEOUT to get the default.

This is probably a much easier task, just a few changes in src/shared/watchdog.c's update_timeout().

Both a) and b) together should make the IPMI watchdog stuff work.

