
Incidents, Alerts, and Events

When Flashduty On-call receives an alert event (such as a Zabbix notification), the system automatically triggers an alert, which in turn triggers an incident. Multiple similar active alerts may be grouped into the same incident for unified assignment, notification, and handling. Put simply, an incident is a group of similar alerts. Without noise reduction, each incident corresponds to exactly one alert; with noise reduction, an incident comprises multiple associated alerts. For more about alert noise reduction, read Understanding Noise Reduction.

Incident Severity, Status, and Progress

Severity

Level | Description
Info | Minor. The service is still running normally; this is just a status reminder and no immediate action is needed
Warning | Warning. The service may have errors or problems are imminent; intervene early to prevent escalation
Critical | Critical. Widespread service errors or outages are affecting users; take immediate action

Incidents, alerts, and events all use these three severity levels. Severity values are capitalized (Info, Warning, Critical), which matters when you use the API. Severity is determined as follows:
  • Event Severity: Alert events from different integration sources (like Zabbix and Nightingale) have different severity enumerations. Flashduty On-call maps them to these three standard severities according to specific rules. For mapping details, refer to the specific integration documentation. To customize severity, see Alert Processing.
  • Alert Severity: Equals the highest severity among associated events.
  • Incident Severity: Equals the highest severity among associated alerts.
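The roll-up described by these rules can be sketched in a few lines of Python. This is only an illustration: the `Severity` enum and the function names are hypothetical and are not part of any Flashduty API; only the three level names and the "highest wins" rule come from the text above.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ordering: a larger value means a more severe level.
    Info = 1
    Warning = 2
    Critical = 3


def alert_severity(event_severities):
    """An alert takes the highest severity among its associated events."""
    return max(event_severities)


def incident_severity(alerts):
    """An incident takes the highest severity among its associated alerts.

    `alerts` is a list of alerts, each given as a list of event severities.
    """
    return max(alert_severity(events) for events in alerts)


alerts = [
    [Severity.Info, Severity.Warning],  # this alert resolves to Warning
    [Severity.Info],                    # this alert resolves to Info
]
print(incident_severity(alerts).name)  # Warning
```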

Processing Progress

Status | Description
Triggered | After an incident triggers, progress defaults to “Triggered”. The system initiates automatic assignment, sets responders, and sends notifications
Processing | When anyone clicks Acknowledge, progress immediately changes to “Processing”. In this state, individual responders may be Acknowledged or Unacknowledged, but at least one person is “Acknowledged”. When all responders unacknowledge, progress reverts to “Triggered”
Closed | When anyone clicks Close or the incident auto-recovers, progress immediately changes to “Closed”
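The progress rules above amount to a small state machine. The following is a minimal sketch under stated assumptions: the class `IncidentProgress` and its methods are hypothetical names chosen for illustration, not Flashduty's implementation.

```python
class IncidentProgress:
    """Hypothetical model of the Triggered / Processing / Closed rules."""

    def __init__(self):
        self.acknowledged_by = set()  # people who currently acknowledge
        self.closed = False

    @property
    def progress(self):
        if self.closed:
            return "Closed"
        # At least one acknowledgment keeps the incident in Processing.
        if self.acknowledged_by:
            return "Processing"
        return "Triggered"

    def acknowledge(self, person):
        self.acknowledged_by.add(person)

    def unacknowledge(self, person):
        self.acknowledged_by.discard(person)

    def close(self):
        self.closed = True


incident = IncidentProgress()
incident.acknowledge("alice")
incident.acknowledge("bob")
incident.unacknowledge("alice")
print(incident.progress)  # Processing (bob still acknowledges)
incident.unacknowledge("bob")
print(incident.progress)  # Triggered (everyone unacknowledged)
```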

Incident Status

Incident status represents the incident’s state in the original monitoring system, i.e., “Recovered” or “Not Recovered”. Incident status is completely determined by its associated alerts.

Status | Description
Recovered | All alerts associated with the incident have recovered; the incident auto-recovers
Not Recovered | At least one alert associated with the incident hasn’t recovered; the incident remains unrecovered
When an incident auto-recovers, its processing progress is automatically set to Closed. Manually closing an incident, however, has no effect on its incident status.
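To make the one-way relationship between status and progress concrete, here is a small sketch. The class `IncidentState` and its fields are hypothetical. Treating an incident with no alerts as "Not Recovered" is an assumption of this sketch; the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class IncidentState:
    alerts_recovered: list          # one bool per associated alert
    progress: str = "Triggered"

    @property
    def status(self):
        # Assumption: an incident without alerts counts as Not Recovered.
        recovered = bool(self.alerts_recovered) and all(self.alerts_recovered)
        return "Recovered" if recovered else "Not Recovered"

    def on_alert_update(self, index, recovered):
        self.alerts_recovered[index] = recovered
        # Auto-recovery also closes the processing progress.
        if self.status == "Recovered":
            self.progress = "Closed"

    def close_manually(self):
        # Manual close changes progress only; status is left untouched.
        self.progress = "Closed"


incident = IncidentState(alerts_recovered=[True, False])
incident.close_manually()
print(incident.progress, "/", incident.status)  # Closed / Not Recovered
```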

Incident Labels

Labels are a fundamental concept in Flashduty On-call. Different labels describe alert and incident information across various dimensions, and are extensively used in filtering, searching, and grouping scenarios.

Label Generation Rules

Alert labels are extracted from event messages reported by the original monitoring system. Different sources have different extraction methods, but in general we follow the principle of capturing everything relevant. For example, for Prometheus-sourced alert events, Flashduty On-call extracts the Labels and Annotations information from the payload.
  • Labels can only be obtained through event reporting; they cannot be manually modified or added
  • Auto-triggered incident labels always equal the labels of the first associated alert
  • Manually triggered incident labels are always empty
Flashduty On-call provides label enhancement for automatic label generation. Go to Configure Label Enhancement to learn more.
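The two incident label rules listed above can be expressed as a short function. This is a sketch for illustration only; `incident_labels` and its parameters are hypothetical names, and label enhancement is not modeled.

```python
def incident_labels(alert_labels_in_order, manually_triggered=False):
    """Return the labels an incident would carry under the rules above.

    alert_labels_in_order: list of label dicts, one per associated alert,
    in the order the alerts were associated with the incident.
    """
    if manually_triggered or not alert_labels_in_order:
        # Manually triggered incidents always have empty labels.
        return {}
    # Auto-triggered incidents copy the labels of the first associated alert.
    return dict(alert_labels_in_order[0])


first = {"service": "order", "region": "eu-west"}
later = {"service": "order", "region": "us-east"}
print(incident_labels([first, later]))
# {'service': 'order', 'region': 'eu-west'}
```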

Incident Lifecycle

1. Trigger New Incident

Incidents can be triggered in the following ways:
  • Auto-trigger: Flashduty On-call receives an alert event from an integration (such as a Zabbix notification); the event automatically triggers an alert, which in turn triggers an incident
  • Manual trigger: Click the Create Incident button in the Flashduty On-call console and fill in the title, description, severity, and other fields to trigger a new incident
2. Assignment and Notification

After a new incident triggers, Flashduty On-call sequentially matches escalation rules under the channel. After matching an escalation rule, the system assigns the incident to individuals, team members, or on-call personnel and sends notifications.
If no escalation rule matches, the incident won’t be assigned to anyone and no notifications will be sent.
You can set different escalation rules for different time periods or incident types to achieve flexible assignment. The system allows you to set multiple levels within an escalation rule. If the responders at the current level don’t acknowledge and resolve the incident within the specified time, the system automatically escalates to the next level.
You can flexibly arrange notification methods in escalation rules. Flashduty On-call supports many group chat and direct message notification channels. Direct messages are one-to-one push channels (like voice, SMS, email). Group chats push messages to messaging groups (like Feishu/Lark, Dingtalk, Slack), with additional mentions for assignees.
If you assign an incident to a schedule with no one on-call (empty schedule), the system won’t send notifications to individuals, but if you’ve configured group chat channels, messages will still be pushed to those groups.
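The matching and escalation flow described in this step can be sketched as follows. Everything here is hypothetical: the `Level` and `EscalationRule` types, the "first matching rule wins" reading of "sequentially matches", and the per-level timeout in minutes are assumptions made for illustration, not Flashduty's data model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Level:
    responders: List[str]
    escalate_after_minutes: int  # hypothetical per-level timeout


@dataclass
class EscalationRule:
    name: str
    matches: Callable[[dict], bool]  # hypothetical match condition
    levels: List[Level]


def match_rule(rules, incident) -> Optional[EscalationRule]:
    """Check rules in order; assume the first matching rule is used."""
    for rule in rules:
        if rule.matches(incident):
            return rule
    return None  # no match: nobody is assigned and nothing is sent


def current_level(rule, minutes_unhandled):
    """Index of the level reached after the given unhandled time."""
    elapsed = 0
    for index, level in enumerate(rule.levels):
        elapsed += level.escalate_after_minutes
        if minutes_unhandled < elapsed:
            return index
    return len(rule.levels) - 1  # stay at the last level


rules = [
    EscalationRule(
        name="critical-first",
        matches=lambda incident: incident["severity"] == "Critical",
        levels=[Level(["primary"], 10), Level(["manager"], 20)],
    ),
]
rule = match_rule(rules, {"severity": "Critical"})
print(rule.name, current_level(rule, 12))  # critical-first 1
```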
3. Acknowledge and Resolve

On-call personnel can acknowledge immediately upon receiving notification. You can acknowledge incidents via voice calls or instant messages. After acknowledgment, incident progress changes to Processing.
Flashduty On-call currently doesn’t restrict incidents to only be acknowledged by “assigned responders”. Anyone who sees the incident can acknowledge it.
Closing an incident changes progress to Closed. If alerts associated with the incident auto-recover, the incident also auto-closes. Conversely, if you manually close an incident, all associated alerts are automatically closed. This means these alerts will no longer merge new events.

Incident Timeline

Every incident has a timeline for tracing changes and actions at different historical moments. For example, it records who was notified, at what time, through which channel, and with what result.

Triggering Incidents

Trigger via Integration

Flashduty On-call supports most common monitoring systems, including Prometheus, Zabbix, Nightingale, and cloud monitoring. Go to Alert Integration for specific steps.
Flashduty On-call supports dedicated and shared integration modes:
  • Dedicated Integration: Deliver alerts to a channel’s dedicated integration; incidents are triggered within that channel
  • Shared Integration: Deliver alerts to a shared integration in the Integration Center, then configure routing rules to deliver alerts to different channels

Trigger via API

Flashduty On-call provides a custom event standard, allowing you to report alerts via standard protocol, suitable for any non-integrated monitoring system. For details, read Custom Alert Events.
To ensure system stability, Flashduty On-call applies a rate limit of 200 QPS to API reporting. Reports that exceed this limit are rejected.
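One way to stay under such a limit is to pace requests on the client side. The sketch below is a generic throttle, not part of any Flashduty SDK. The class name `Throttle`, the injected `clock` and `sleep` parameters, and the idea of calling `wait()` before each report are all assumptions of this example.

```python
import time


class Throttle:
    """Spaces calls evenly so that at most `rate` calls start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        now = self.clock()
        if now < self.next_allowed:
            # Too early: pause until the next slot opens.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Hypothetical usage: leave headroom below the documented 200 QPS limit.
throttle = Throttle(rate=150)
# for event in events:
#     throttle.wait()
#     send_report(event)  # send_report is a placeholder, not a real API
```

Even spacing is deliberately conservative: it gives up burst capacity in exchange for never sending more than `rate` requests in any one-second window.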
Make sure you actively close alerts, or set an auto-close timeout for incidents in the channel. Too many incidents will severely degrade console search performance. The system may close historical incidents without notification.

Trigger via Email

Flashduty On-call provides email integration, allowing you to report alerts by sending emails, suitable for all monitoring systems supporting email alerts. For details, read Email Integration Guide.
You can set specific email prefixes for each integration. You can also contact us to set a memorable dedicated domain for your account. For example, order-service@tesla.flashcat.cloud.

Trigger via Console

Click the Create button in the console to create an incident.

Field | Required | Description
Incident Title | Yes | One sentence describing what happened
Severity | Yes | Choose Critical, Warning, or Info
Channel | Yes | Incident ownership; not needed if creating within a channel
Assignment Method | Yes | By Policy: select a channel policy for assignment. Direct: select individuals or schedules for assignment
Incident Description | No | Detailed description, supports Markdown
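The required-field rules in this table can be checked with a small validator. This is a sketch only: the dictionary keys (`title`, `severity`, `channel`, `assignment_method`) and the values `"policy"` and `"direct"` are hypothetical names invented for the example and do not reflect the console's or the API's actual field names.

```python
VALID_SEVERITIES = {"Critical", "Warning", "Info"}
VALID_ASSIGNMENT_METHODS = {"policy", "direct"}  # hypothetical identifiers


def validate_incident_form(form, inside_channel=False):
    """Return a list of problems; an empty list means the form is valid."""
    errors = []
    if not form.get("title"):
        errors.append("title is required")
    if form.get("severity") not in VALID_SEVERITIES:
        errors.append("severity must be Critical, Warning, or Info")
    # The channel is implied when the incident is created within a channel.
    if not inside_channel and not form.get("channel"):
        errors.append("channel is required")
    if form.get("assignment_method") not in VALID_ASSIGNMENT_METHODS:
        errors.append("assignment_method must be policy or direct")
    # The description is optional, so it is not checked.
    return errors


form = {"title": "Checkout latency spike", "severity": "critical",
        "assignment_method": "policy"}
print(validate_incident_form(form, inside_channel=True))
# ['severity must be Critical, Warning, or Info']
```

The lowercase `critical` is rejected on purpose, mirroring the note that severity values are capitalized.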