Send urgent alerts to docs-ops-critical channel #5052

Open
wants to merge 1 commit into
base: master

sean1588
Member

Description

part of: https://github.com/pulumi/devrel-team/issues/893

To help distinguish critical alerts that need immediate attention from less urgent ones, this sends production build and deploy pipeline failures, as well as any failures to update community packages, to the #docs-ops-critical channel. Let me know what you think.
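
For illustration only, here is a minimal sketch of the routing rule described above. It is not the code in this PR: the event names, the `channel_for` helper, and the non-critical `PREVIEW_BUILD_FAILURE` event are hypothetical, and only the two channel names come from this conversation.

```python
from enum import Enum


class Event(Enum):
    """Hypothetical alert types; names are assumptions for this sketch."""
    PRODUCTION_BUILD_FAILURE = "production-build-failure"
    PRODUCTION_DEPLOY_FAILURE = "production-deploy-failure"
    COMMUNITY_PACKAGE_UPDATE_FAILURE = "community-package-update-failure"
    PREVIEW_BUILD_FAILURE = "preview-build-failure"  # assumed non-urgent example


DEFAULT_CHANNEL = "docs-ops"
CRITICAL_CHANNEL = "docs-ops-critical"

# The failures the description proposes to treat as urgent.
CRITICAL_EVENTS = frozenset({
    Event.PRODUCTION_BUILD_FAILURE,
    Event.PRODUCTION_DEPLOY_FAILURE,
    Event.COMMUNITY_PACKAGE_UPDATE_FAILURE,
})


def channel_for(event: Event) -> str:
    """Return the Slack channel name an alert for `event` would be sent to."""
    return CRITICAL_CHANNEL if event in CRITICAL_EVENTS else DEFAULT_CHANNEL
```

Keeping the set of critical events in one place means the list can be reviewed and changed without touching the notification code itself.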

Your site preview for commit 8bb6296 is ready! 🎉

http://registry--origin-pr-5052-8bb6296c.s3-website.us-west-2.amazonaws.com/registry.

Contributor

@cnunciato left a comment

I'm not sure I'd feel great about this, as it implies that the messages that go into docs-ops can be ignored, which needn't be the case -- we just haven't historically treated those messages as drop-everything urgent. I'd rather see us (specifically the on-call person) treat the notifications that go into docs-ops as requiring immediate (or within-the-hour, or similar) review than create a whole new channel for "no, seriously"-level notifications.

If there's a higher level of notification urgency than can be served by the docs-ops channel, then we probably need a higher-level notification mechanism for it -- i.e., paging. Personally I don't think we're there yet, at least not with the events we're currently monitoring (e.g., build failures and the like, which do not warrant waking team members in the middle of the night, as they aren't customer-impacting), but if there were more urgent events to respond to, such as outages, a Slack channel probably wouldn't be the best choice.

I'd rather see us define the events that meet the "critical" bar first, then propose any necessary process adjustments around that, ideally by making better use of the channels we already have.

2 participants