Add system shutdown timestamp #3111

Draft
wants to merge 1 commit into
base: master

Conversation

SuperQ
Member

@SuperQ SuperQ commented Sep 5, 2024

Add a metric for the scheduled shutdown time from systemd.


Signed-off-by: Ben Kochie <superq@gmail.com>
Contributor

anarcat commented Sep 5, 2024

haha excellent. i should really have just dumped my code over the fence last night, as I ended up with something very similar:

modified   collector/systemd_linux.go
@@ -130,6 +130,9 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
 	socketRefusedConnectionsDesc := prometheus.NewDesc(
 		prometheus.BuildFQName(namespace, subsystem, "socket_refused_connections_total"),
 		"Total number of refused socket connections", []string{"name"}, nil)
+	scheduledShutdownTime := prometheus.NewDesc(
+		prometheus.BuildFQName(namespace, subsystem, "scheduled_shutdown_timestamp_seconds"),
+		"time of the next scheduled reboot, or zero", []string{"kind"}, nil)
 	systemdVersionDesc := prometheus.NewDesc(
 		prometheus.BuildFQName(namespace, subsystem, "version"),
 		"Detected systemd version", []string{"version"}, nil)
@@ -170,6 +173,7 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
 		systemdVersionDesc:            systemdVersionDesc,
 		systemdUnitIncludePattern:     systemdUnitIncludePattern,
 		systemdUnitExcludePattern:     systemdUnitExcludePattern,
+		scheduledShutdownTime:         scheduledShutdownTime,
 		logger:                        logger,
 	}, nil
 }
@@ -194,6 +198,13 @@ func (c *systemdCollector) Update(ch chan<- prometheus.Metric) error {
 		systemdVersion,
 		systemdVersionFull,
 	)
+	shutdownTimestamp, shutdownKind := c.getShutdownTime(conn)
+	ch <- prometheus.MustNewConstMetric(
+		c.scheduledShutdownTime,
+		prometheus.GaugeValue,
+		shutdownTimestamp,
+		shutdownKind,
+	)
 
 	allUnits, err := c.getAllUnits(conn)
 	if err != nil {
@@ -506,3 +517,20 @@ func (c *systemdCollector) getSystemdVersion(conn *dbus.Conn) (float64, string)
 	}
 	return v, version
 }
+
+func (c *systemdCollector) getShutdownTime(conn *dbus.Conn) (float64, string) {
+	timestamp, err := conn.GetManagerProperty("ScheduledShutdown")
+	if err != nil {
+		level.Debug(c.logger).Log("msg", "Unable to get scheduled shutdown time, defaulting to 0")
+		return 0, ""
+	}
+	version = strings.TrimPrefix(strings.TrimSuffix(version, `"`), `"`)
+	level.Debug(c.logger).Log("msg", "Got systemd version", "version", version)
+	parsedVersion := systemdVersionRE.FindString(version)
+	v, err := strconv.ParseFloat(parsedVersion, 64)
+	if err != nil {
+		level.Debug(c.logger).Log("msg", "Got invalid systemd version", "version", version)
+		return 0, ""
+	}
+	return v, version
+}

be warned: there's leftover copy-paste from the version fetch function above. :)

i even filed coreos/go-systemd#447 upstream to figure out how to wrestle that thing out of that byzantine API. :)

thank you so much for working on this!

Contributor

@anarcat anarcat left a comment


great start, stuck like i was at "wtf is this dbus interface gaah" :)

func (c *systemdCollector) collectScheduledShutdownMetrics(conn *dbus.Conn, ch chan<- prometheus.Metric) error {
var shutdownTimeUsec uint64

timestampValue, err := conn.GetServicePropertyContext(context.TODO(), "org.freedesktop.login1", "ScheduledShutdown")

just so you know, i'm not sure this returns a single integer. if it behaves like the commandline tool, it returns a pair: a string and an unsigned 64-bit integer (busctl prints the (st) type signature first). with a pending reboot:

root@perdulce:~# busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "reboot" 1725545703588789

without:

anarcat@angela:~$ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "" 18446744073709551615

notice how the timestamp is in microseconds, not seconds, and how completely out of whack it is when there's no scheduled shutdown. not sure what's going on there.
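To make the shape of that output concrete, here is a minimal sketch in Go that splits one busctl line into the kind string and the raw microsecond timestamp. It uses only the standard library and does not talk to D-Bus; the function name parseScheduledShutdown is hypothetical, and it assumes the kind never contains whitespace.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseScheduledShutdown parses one line of `busctl get-property` output for
// the ScheduledShutdown property, e.g. `(st) "reboot" 1725545703588789`.
// It returns the shutdown kind and the raw timestamp in microseconds.
// Hypothetical helper: assumes the kind contains no whitespace.
func parseScheduledShutdown(line string) (kind string, usec uint64, err error) {
	fields := strings.Fields(line)
	if len(fields) != 3 || fields[0] != "(st)" {
		return "", 0, fmt.Errorf("unexpected ScheduledShutdown output: %q", line)
	}
	// The kind is printed as a double-quoted string, possibly empty.
	kind, err = strconv.Unquote(fields[1])
	if err != nil {
		return "", 0, fmt.Errorf("invalid kind %q: %w", fields[1], err)
	}
	// The timestamp is an unsigned 64-bit integer; 2^64-1 must fit.
	usec, err = strconv.ParseUint(fields[2], 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("invalid timestamp %q: %w", fields[2], err)
	}
	return kind, usec, nil
}

func main() {
	for _, line := range []string{
		`(st) "reboot" 1725545703588789`,
		`(st) "" 18446744073709551615`,
	} {
		kind, usec, err := parseScheduledShutdown(line)
		fmt.Printf("kind=%q usec=%d err=%v\n", kind, usec, err)
	}
}
```

Parsing into a uint64 rather than an int64 matters here, because the "nothing scheduled" value does not fit in a signed 64-bit integer.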

the script i wrote in #3110 (comment) does this somewhat properly, and outputs the following metrics, with the first example:

# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero
# TYPE node_shutdown_scheduled_timestamp_seconds gauge
node_shutdown_scheduled_timestamp_seconds{kind=reboot} 1725545703.588789

with the second, it does that:

# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero
# TYPE node_shutdown_scheduled_timestamp_seconds gauge
node_shutdown_scheduled_timestamp_seconds 0
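For reference, a rough Go sketch of how those two samples could be rendered. This is not the script from #3110; usecToSeconds and sampleLine are hypothetical names. One thing it does differently from the paste above: label values are double-quoted, as the Prometheus text exposition format requires. Using %q for the quoting is a simplification that is fine for plain ASCII kinds like "reboot".

```go
package main

import (
	"fmt"
	"strconv"
)

const metricName = "node_shutdown_scheduled_timestamp_seconds"

// usecToSeconds converts microseconds since the UNIX epoch to float seconds.
func usecToSeconds(usec uint64) float64 {
	return float64(usec) / 1e6
}

// sampleLine renders one sample line. An empty kind produces no label,
// matching the "no shutdown scheduled" output shown above.
func sampleLine(kind string, seconds float64) string {
	value := strconv.FormatFloat(seconds, 'f', -1, 64)
	if kind == "" {
		return metricName + " " + value
	}
	// %q double-quotes the label value (assumption: ASCII-only kinds).
	return fmt.Sprintf("%s{kind=%q} %s", metricName, kind, value)
}

func main() {
	fmt.Println(sampleLine("reboot", usecToSeconds(1725545703588789)))
	fmt.Println(sampleLine("", 0))
}
```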

@@ -112,6 +113,11 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
"Whether the system is operational (see 'systemctl is-system-running')",
nil, nil,
)
systemShutdownDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "system_shutdown_timestamp"),
"Time for a scheduled shutdown (see 'systemctl status systemd-shutdownd.service')",

that command outputs:

Unit systemd-shutdownd.service could not be found.

here.

I found a systemd-shutdown(8) manual page, but it doesn't help much in understanding that component. In fact, I had a frustrating time trying to find any meaningful documentation on how that damn thing works... The org.freedesktop.login1(5) manual page does mention it though:

       ScheduledShutdown shows the value pair set with the
       ScheduleShutdown() method described above.

That's for the property we're (trying to) fetch(ing) here... That method referenced there is:

       ScheduleShutdown() schedules a shutdown operation type at time
       usec in microseconds since the UNIX epoch.  type can be one of
       "poweroff", "dry-poweroff", "reboot", "dry-reboot", "halt", and
       "dry-halt". (The "dry-" variants do not actually execute the
       shutdown action.)  CancelScheduledShutdown() cancels a scheduled
       shutdown. The output parameter cancelled is true if a shutdown
       operation was scheduled.

... which is, frankly, not that helpful.

@@ -112,6 +113,11 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
"Whether the system is operational (see 'systemctl is-system-running')",
nil, nil,
)
systemShutdownDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "system_shutdown_timestamp"),

i think that should be system_shutdown_timestamp_seconds, no?

@@ -112,6 +113,11 @@ func NewSystemdCollector(logger log.Logger) (Collector, error) {
"Whether the system is operational (see 'systemctl is-system-running')",
nil, nil,
)
systemShutdownDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "system_shutdown_timestamp"),
"Time for a scheduled shutdown (see 'systemctl status systemd-shutdownd.service')",

I'm also not sure how to represent the "no shutdown scheduled" state. in my script, i used "zero seconds" as a value for that, but the property returned somehow uses something else (which looks a lot like UINT64_MAX, AKA 2^64-1, AKA 18446744073709551615 ≈ 1.8446744074 × 10^19)

also note that on my laptop, this morning, after the device went to sleep on its own after a timeout, dbus says this:

anarcat@angela:~$ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "suspend" 0
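Putting the three observed values side by side, one possible way to decide whether a shutdown is actually pending is sketched below in Go. The rule is an assumption drawn only from the outputs pasted in this thread, not documented logind behaviour, and shutdownScheduled is a hypothetical name: an empty kind, a zero timestamp, or a timestamp of 2^64-1 are all treated as "nothing scheduled".

```go
package main

import (
	"fmt"
	"math"
)

// shutdownScheduled reports whether a (kind, usec) pair describes a pending
// shutdown. Assumption based on the outputs in this thread only:
//   ("reboot", 1725545703588789)  -> scheduled
//   ("", 18446744073709551615)    -> nothing scheduled (2^64-1 sentinel)
//   ("suspend", 0)                -> nothing scheduled (leftover kind, zero time)
func shutdownScheduled(kind string, usec uint64) bool {
	return kind != "" && usec != 0 && usec != math.MaxUint64
}

func main() {
	cases := []struct {
		kind string
		usec uint64
	}{
		{"reboot", 1725545703588789},
		{"", math.MaxUint64},
		{"suspend", 0},
	}
	for _, c := range cases {
		fmt.Printf("kind=%q usec=%d scheduled=%v\n", c.kind, c.usec, shutdownScheduled(c.kind, c.usec))
	}
}
```

With a check like this, the collector could export zero for both "nothing scheduled" variants, which matches the convention used in the script output earlier in the thread.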
