What is an Incident

Incidents, Alerts, and Events

When Flashduty On-call receives an alert event (such as a Zabbix notification), the system automatically triggers an alert, which in turn triggers an incident. Multiple similar active alerts may be grouped into the same incident for unified assignment, notification, and handling. Simply put, an incident is a group of similar alerts: without noise reduction, each incident corresponds to exactly one alert; with noise reduction, an incident can contain multiple associated alerts. For more about alert noise reduction, read Understanding Noise Reduction.
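The relationship between alerts and incidents can be illustrated with a minimal sketch. It assumes that "similar" means "same values for a chosen set of labels"; the actual similarity rules are covered in Understanding Noise Reduction, and the function name, the `group_by` parameter, and the alert dictionary shape are hypothetical.

```python
from collections import defaultdict


def group_into_incidents(alerts, group_by=None):
    """Group alerts into incidents (illustrative only, not the product's algorithm).

    Without noise reduction (no group_by), every alert becomes its own incident.
    With noise reduction, alerts sharing the same values for the group_by
    labels are placed in the same incident.
    """
    if not group_by:
        return [[alert] for alert in alerts]
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name) for name in group_by)
        groups[key].append(alert)
    return list(groups.values())


alerts = [
    {"title": "CPU high", "labels": {"service": "order", "host": "a"}},
    {"title": "CPU high", "labels": {"service": "order", "host": "b"}},
    {"title": "Disk full", "labels": {"service": "billing", "host": "c"}},
]
print(len(group_into_incidents(alerts)))               # 3
print(len(group_into_incidents(alerts, ["service"])))  # 2
```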

Incident Severity, Status, and Progress

Severity

Level	Description
Info	Minor, service is still running normally, just a status reminder, no immediate action needed
Warning	Warning, service may have errors or problems are imminent, should intervene early to prevent escalation
Critical	Critical, widespread service errors or outages, users affected, must take immediate action

Incidents, alerts, and events all use these three severity levels. Severity values are capitalized (Info, Warning, Critical), which matters when you use the API. Severity is determined as follows:

Event Severity: Alert events from different integration sources (like Zabbix and Nightingale) have different severity enumerations. Flashduty On-call maps them to these three standard severities according to specific rules. For mapping details, refer to the specific integration documentation. To customize severity, see Alert Processing.
Alert Severity: Equals the highest severity among associated events.
Incident Severity: Equals the highest severity among associated alerts.
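
The "highest severity wins" rules above can be sketched as follows. This is only an illustration: the `Severity` enum and the function names are assumptions, and each alert is represented simply as a list of its event severities.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The three standard severities, ordered from lowest to highest."""
    Info = 1
    Warning = 2
    Critical = 3


def alert_severity(event_severities):
    """An alert takes the highest severity among its associated events."""
    return max(event_severities)


def incident_severity(alerts):
    """An incident takes the highest severity among its associated alerts.

    `alerts` is a list of alerts, each given as a list of event severities.
    """
    return max(alert_severity(events) for events in alerts)


alerts = [
    [Severity.Info, Severity.Warning],  # this alert resolves to Warning
    [Severity.Info],                    # this alert resolves to Info
]
print(incident_severity(alerts).name)  # Warning
```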

Processing Progress

Status	Description
Triggered	After an incident is triggered, progress defaults to “Triggered”. The system starts automatic assignment, sets responders, and sends notifications.
Processing	When anyone clicks Acknowledge, progress immediately changes to “Processing”. In this state, individual responders may be acknowledged or unacknowledged, but at least one person has acknowledged. If all responders unacknowledge, progress reverts to “Triggered”.
Closed	When anyone clicks Close or the incident auto-recovers, progress immediately changes to “Closed”.
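
The transitions in the table can be summarized as a small derivation. The sketch below assumes progress depends only on whether the incident is closed and on who currently acknowledges it; the function and parameter names are hypothetical.

```python
def processing_progress(closed, acknowledged_by):
    """Derive processing progress from the incident's current state.

    closed: True once someone clicked Close or the incident auto-recovered.
    acknowledged_by: set of people who currently acknowledge the incident.
    """
    if closed:
        return "Closed"
    if acknowledged_by:
        return "Processing"
    return "Triggered"


acked = set()
print(processing_progress(False, acked))  # Triggered
acked.add("alice")
print(processing_progress(False, acked))  # Processing
acked.discard("alice")                    # the only responder unacknowledges
print(processing_progress(False, acked))  # Triggered
print(processing_progress(True, acked))   # Closed
```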

Incident Status

Incident status reflects the incident’s state in the original monitoring system: “Recovered” or “Not Recovered”. It is determined entirely by the incident’s associated alerts.

Status	Description
Recovered	All alerts associated with the incident have recovered, incident auto-recovers
Not Recovered	At least one alert associated with the incident hasn’t recovered, incident remains unrecovered

When an incident auto-recovers, it is also closed automatically (its processing progress becomes Closed). Manually closing an incident, however, has no effect on its incident status.
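
As a rough sketch of how incident status follows from alert state, assuming each alert exposes a boolean "recovered" flag (a hypothetical representation):

```python
def incident_status(alerts_recovered):
    """Return the incident status given one recovered flag per associated alert.

    The incident counts as Recovered only when every associated alert has
    recovered. An empty list is rejected because the rule is defined in terms
    of associated alerts.
    """
    if not alerts_recovered:
        raise ValueError("expected at least one associated alert")
    return "Recovered" if all(alerts_recovered) else "Not Recovered"


print(incident_status([True, True]))   # Recovered
print(incident_status([True, False]))  # Not Recovered
```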

Incident Labels

Labels are a fundamental concept in Flashduty On-call. Different labels describe alert and incident information across various dimensions, and are extensively used in filtering, searching, and grouping scenarios.

Label Generation Rules

Alert labels are extracted from event messages reported by the original monitoring system. Different sources have different extraction methods, but in general we follow the principle of capturing everything relevant. For example, for Prometheus-sourced alert events, Flashduty On-call extracts the Labels and Annotations information from the payload.

Labels can only be obtained through event reporting; they cannot be manually modified or added
Auto-triggered incident labels always equal the labels of the first associated alert
Manually triggered incident labels are always empty
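
The two incident-label rules above can be expressed as a short sketch. The alert dictionary shape and the function name are assumptions made for illustration.

```python
def incident_labels(alerts, manually_triggered=False):
    """Return the labels an incident would carry under the rules above.

    Auto-triggered incidents copy the labels of their first associated alert;
    manually triggered incidents have no labels.
    """
    if manually_triggered or not alerts:
        return {}
    # Copy so later changes to the alert do not leak into the incident.
    return dict(alerts[0]["labels"])


alerts = [
    {"labels": {"service": "order", "region": "eu"}},
    {"labels": {"service": "order", "region": "us"}},
]
print(incident_labels(alerts))                           # {'service': 'order', 'region': 'eu'}
print(incident_labels([], manually_triggered=True))      # {}
```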

Flashduty On-call provides label enhancement for automatic label generation. Go to Configure Label Enhancement to learn more.

Incident Lifecycle

Trigger New Incident

Incidents can be triggered in the following ways:

Auto-trigger: Flashduty On-call receives an alert event from an integration (such as a Zabbix notification). The event automatically triggers an alert, and the alert automatically triggers an incident.
Manual trigger: Click the Create Incident button in the Flashduty On-call console and fill in the title, description, severity, and other fields to trigger a new incident.

Assignment and Notification

After a new incident triggers, Flashduty On-call sequentially matches escalation rules under the channel. After matching an escalation rule, the system assigns the incident to individuals, team members, or on-call personnel and sends notifications.

If no escalation rule matches, the incident won’t be assigned to anyone and no notifications will be sent.

You can set different escalation rules for different time periods or incident types to achieve flexible assignment. The system allows you to set multiple levels within an escalation rule. If the current level’s responders don’t acknowledge and resolve the incident within the specified time, the system automatically escalates to the next level.

You can flexibly arrange notification methods in escalation rules. Flashduty On-call supports many group chat and direct message notification channels. Direct messages are one-to-one push channels (such as voice, SMS, and email). Group chats push messages to messaging groups (such as Feishu/Lark, Dingtalk, and Slack), with additional mentions for assignees.

If you assign an incident to a schedule with no one on-call (empty schedule), the system won’t send notifications to individuals, but if you’ve configured group chat channels, messages will still be pushed to those groups.
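
To make the multi-level escalation behavior concrete, here is a minimal sketch that computes which level holds an incident after it has gone unhandled for a given number of minutes. The per-level timeout list and the function name are hypothetical, and the sketch assumes the incident simply stays at the last level once all earlier levels have timed out; the source does not say what happens after that point.

```python
def active_level(level_timeouts_minutes, minutes_unhandled):
    """Return the 0-based escalation level responsible for the incident.

    level_timeouts_minutes[i] is how long level i may hold the incident
    before it escalates to level i + 1. The last level never escalates
    further in this sketch (an assumption).
    """
    if not level_timeouts_minutes:
        raise ValueError("an escalation rule needs at least one level")
    remaining = minutes_unhandled
    for index, timeout in enumerate(level_timeouts_minutes[:-1]):
        if remaining < timeout:
            return index
        remaining -= timeout
    return len(level_timeouts_minutes) - 1


timeouts = [10, 20, 30]  # level 0 waits 10 min, level 1 waits 20 min
print(active_level(timeouts, 5))   # 0
print(active_level(timeouts, 10))  # 1
print(active_level(timeouts, 45))  # 2
```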

Acknowledge and Resolve

On-call personnel can acknowledge immediately upon receiving notification. You can acknowledge incidents via voice calls or instant messages. After acknowledgment, incident progress changes to Processing.

Flashduty On-call currently does not restrict acknowledgment to “assigned responders”. Anyone who can see the incident can acknowledge it.

Closing an incident changes progress to Closed. If alerts associated with the incident auto-recover, the incident also auto-closes. Conversely, if you manually close an incident, all associated alerts are automatically closed. This means these alerts will no longer merge new events.

Incident Timeline

Every incident has a timeline for tracing changes and actions over its history. For example, it records who was notified, at what time, through which channel, and what the notification result was.

Triggering Incidents

Trigger via Integration

Flashduty On-call supports most common monitoring systems, including Prometheus, Zabbix, Nightingale, and cloud monitoring. Go to Alert Integration for specific steps.

Flashduty On-call supports dedicated and shared integration modes:

Dedicated Integration: Deliver alerts to a channel’s dedicated integration; incidents are triggered within that channel
Shared Integration: Deliver alerts to a shared integration in the Integration Center, then configure routing rules to deliver alerts to different channels
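
Routing for a shared integration can be pictured with the following sketch. The rule format (label matches paired with a channel name), the first-match-wins behavior, and the fallback channel are all assumptions for illustration; the real routing configuration is set in the console.

```python
def route(alert_labels, rules, default_channel=None):
    """Pick a channel for an alert using ordered routing rules.

    rules is a list of (match, channel) pairs, where match maps label names
    to required values. The first rule whose labels all match wins.
    """
    for match, channel in rules:
        if all(alert_labels.get(name) == value for name, value in match.items()):
            return channel
    return default_channel


rules = [
    ({"service": "order", "env": "prod"}, "order-prod"),
    ({"service": "order"}, "order-other"),
]
print(route({"service": "order", "env": "prod"}, rules))      # order-prod
print(route({"service": "order", "env": "staging"}, rules))   # order-other
print(route({"service": "billing"}, rules, "catch-all"))      # catch-all
```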

Trigger via API

Flashduty On-call provides a custom event standard, allowing you to report alerts via standard protocol, suitable for any non-integrated monitoring system. For details, read Custom Alert Events.

To ensure system stability, Flashduty On-call enforces a rate limit of 200 QPS (queries per second) on API reporting. Reports that exceed this limit are rejected.
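
One way to stay under the limit is to throttle on the client side before sending. The token bucket below is a generic sketch, not part of any Flashduty SDK; the class name and parameters are hypothetical, and how the server signals a rejected report is not described here, so the sketch does not model it.

```python
import time


class TokenBucket:
    """Client-side limiter allowing at most `rate` sends per second on average."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable clock, useful for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        """Return True if one report may be sent now, otherwise False."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demonstration with a fake clock so the result does not depend on timing.
fake_now = [0.0]
bucket = TokenBucket(rate=200, capacity=200, clock=lambda: fake_now[0])
allowed = sum(bucket.try_acquire() for _ in range(250))
print(allowed)  # 200
```

A caller would check `try_acquire()` before each report and queue or delay the event when it returns False.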

Make sure you actively close alerts, or set an incident auto-close timeout in the channel. Too many open incidents will severely degrade console search performance, and the system may close historical incidents without notification.

Trigger via Email

Flashduty On-call provides email integration, allowing you to report alerts by sending emails, suitable for all monitoring systems supporting email alerts. For details, read Email Integration Guide.

You can set specific email prefixes for each integration. You can also contact us to set a memorable dedicated domain for your account. For example, order-service@tesla.flashcat.cloud.

Trigger via Console

Click the Create button in the console to create an incident.

Field	Required	Description
Incident Title	Yes	One sentence describing what happened
Severity	Yes	Choose Critical, Warning, or Info
Channel	Yes	Incident ownership; not needed if creating within a channel
Assignment Method	Yes	By Policy: select a channel policy for assignment; Direct: select individuals or schedules for assignment
Incident Description	No	Detailed description, supports Markdown
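
The required-field rules in the table can be checked with a small sketch like the one below. The dictionary keys (`title`, `severity`, `channel`, `assignment_method`) are hypothetical names chosen for illustration and are not the console's or API's actual field names.

```python
VALID_SEVERITIES = ("Critical", "Warning", "Info")


def validate_incident_form(form, inside_channel=False):
    """Return a list of problems; an empty list means the form looks complete.

    The channel is only required when the incident is not being created from
    within a channel. The description is optional and is not checked.
    """
    problems = []
    required = ["title", "severity", "assignment_method"]
    if not inside_channel:
        required.append("channel")
    for name in required:
        if not form.get(name):
            problems.append(f"missing required field: {name}")
    severity = form.get("severity")
    if severity and severity not in VALID_SEVERITIES:
        # Severity values are capitalized, so "critical" is not accepted.
        problems.append(f"invalid severity: {severity}")
    return problems


form = {"title": "Checkout failing", "severity": "Critical", "assignment_method": "Direct"}
print(validate_incident_form(form, inside_channel=True))  # []
print(validate_incident_form(form))  # ['missing required field: channel']
```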

Related Topics

Understanding Noise Reduction

Understand the relationship between events, alerts, incidents and noise reduction principles

Search and View Incidents

Master incident list and details page usage

Handle and Update Incidents

Learn how to acknowledge, snooze, close incidents

Escalate and Reassign Incidents

Learn how to reassign and escalate
