Flashduty Docs

What is an Incident

An incident represents an ongoing issue or a matter that needs attention. Incidents are typically triggered by alerts and are often associated with a series of similar alerts.

Incidents, Alerts, and Events


When Flashduty receives an alert event (such as a Zabbix notification), the system automatically triggers an alert, which in turn triggers an incident. Multiple similar active alerts may be grouped into a single incident for unified assignment, notification, and handling.
Simply put: an incident is a combination of similar alerts. Without noise reduction, an incident equals a single alert. Conversely, with noise reduction enabled, an incident equals multiple associated alerts. To learn more about alert noise reduction models, please read Understanding Noise Reduction.

Incident Severity, Status, and Progress


Severity

Info: Minor issues where services remain operational; serves as a status reminder with no immediate action required.
Warning: Signs of a potential or impending problem; early intervention is recommended to prevent escalation.
Critical: Severe issues causing widespread service disruption or outages affecting users; immediate intervention required.
Incidents, alerts, and events all use these three severity levels. Severity values are capitalized (Info, Warning, Critical), which matters when you use the API. Severity is determined as follows:
Event Severity: Different integrations (such as Zabbix and Nightingale) have their own severity enumerations, which Flashduty maps to these three standard levels. For the specific mappings, refer to the documentation for each integration; to customize severity levels, see Alert Processing Pipeline.
Alert Severity: Equals the highest severity level among associated events.
Incident Severity: Equals the highest severity level among associated alerts.
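
The roll-up rules above can be sketched in a few lines of Python. This is only an illustration of the "highest severity wins" rule, not Flashduty's implementation: the `Severity` enum, its numeric ordering, and the function names are assumptions made for the example.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The three standard levels; the numeric order is an assumption so max() works."""
    Info = 1
    Warning = 2
    Critical = 3


def alert_severity(event_severities):
    """An alert takes the highest severity among its associated events."""
    return max(event_severities)


def incident_severity(alerts):
    """An incident takes the highest severity among its associated alerts."""
    return max(alert_severity(events) for events in alerts)


# Two alerts grouped into one incident; each alert is a list of event severities.
alerts = [
    [Severity.Info, Severity.Warning],
    [Severity.Warning, Severity.Critical, Severity.Info],
]
print(incident_severity(alerts).name)  # Critical
```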

Progress

Pending: The default progress when an incident is triggered. The system starts automatic assignment, sets responders, and sends notifications.
In Progress: The incident enters this state when any responder acknowledges it. Individual responders may be acknowledged or unacknowledged, but at least one must be acknowledged. The incident returns to Pending if all responders un-acknowledge.
Closed: The incident enters this state when any responder closes it or when it auto-resolves.
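
As a rough illustration of the three progress states, the sketch below derives progress from the responders' acknowledgment flags. The function name and the dictionary shape are hypothetical and chosen only for this example.

```python
def incident_progress(responder_acked, closed=False):
    """Derive progress from responder states.

    responder_acked: mapping of responder name -> True if currently acknowledged.
    closed: True once someone closes the incident or it auto-resolves.
    """
    if closed:
        return "Closed"
    if any(responder_acked.values()):
        return "In Progress"  # at least one responder has acknowledged
    return "Pending"          # nobody has acknowledged, or everyone un-acknowledged


responders = {"alice": False, "bob": False}
print(incident_progress(responders))               # Pending
responders["bob"] = True
print(incident_progress(responders))               # In Progress
responders["bob"] = False
print(incident_progress(responders))               # Pending
print(incident_progress(responders, closed=True))  # Closed
```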

Status

Status reflects the state of the incident's alerts in the original monitoring system: "resolved" or "unresolved". An incident's status is determined entirely by its associated alerts.
Resolved: All associated alerts are resolved; incident automatically resolves.
Unresolved: At least one associated alert remains unresolved.
💡
Automatic incident resolution leads to automatic closure (progress status), but manually closing an incident doesn't affect its resolution status.
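
The relationship between status and progress can be sketched as follows. The `Incident` class and its method names are hypothetical; treating an incident with no associated alerts as unresolved is also an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    alert_resolved: list  # one bool per associated alert
    closed: bool = False  # progress flag, tracked separately from status

    @property
    def status(self):
        # Resolved only when every associated alert is resolved.
        # Assumption: an incident with no alerts counts as Unresolved.
        if self.alert_resolved and all(self.alert_resolved):
            return "Resolved"
        return "Unresolved"

    def resolve_alert(self, index):
        self.alert_resolved[index] = True
        if self.status == "Resolved":
            self.closed = True  # automatic resolution also closes the incident

    def close_manually(self):
        self.closed = True  # status is left untouched


auto = Incident(alert_resolved=[False, False])
auto.resolve_alert(0)
print(auto.status, auto.closed)      # Unresolved False
auto.resolve_alert(1)
print(auto.status, auto.closed)      # Resolved True

manual = Incident(alert_resolved=[False])
manual.close_manually()
print(manual.status, manual.closed)  # Unresolved True
```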

Incident Labels


Labels are a fundamental concept in Flashduty, describing alert and incident information across different dimensions, used extensively for filtering, searching, and grouping.
How are labels generated?
Alert labels are extracted from the event message bodies reported by the original alert system. The extraction method differs by source, but Flashduty always extracts as much as it can. For example, for Prometheus alerts, Flashduty extracts the Labels and Annotations information from the payload.
Labels can only be obtained through event reporting, not manual modification or addition. An automatically triggered incident's labels always equal those of its first associated alert. A manually triggered incident always has empty labels.
Flashduty provides label enrichment options for automatic label generation. Learn more at Configure Label Enrichment.
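
To make the label rules concrete, here is a minimal sketch. It assumes an alert shaped like an entry in a Prometheus Alertmanager webhook payload (with `labels` and `annotations` maps). How Flashduty actually merges the two maps is not documented here, so letting labels win on key collisions is an assumption, and both function names are hypothetical.

```python
def extract_labels(prometheus_alert):
    """Merge annotations and labels from one alert into a single label set."""
    merged = dict(prometheus_alert.get("annotations", {}))
    # Assumption: on a key collision, the value from "labels" wins.
    merged.update(prometheus_alert.get("labels", {}))
    return merged


def incident_labels(alert_label_sets, manual=False):
    """Auto-triggered incidents copy the first alert's labels; manual ones have none."""
    if manual or not alert_label_sets:
        return {}
    return dict(alert_label_sets[0])


first = extract_labels({
    "labels": {"alertname": "HighCPU", "instance": "web-1"},
    "annotations": {"summary": "CPU above 90%"},
})
second = extract_labels({"labels": {"alertname": "HighCPU", "instance": "web-2"}})

print(incident_labels([first, second])["instance"])  # web-1
print(incident_labels([first, second], manual=True))  # {}
```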

Incident Lifecycle


1. Triggering New Incidents
Automatic Trigger: When Flashduty receives an alert event from an integration (such as a Zabbix notification), it automatically triggers an alert, which in turn triggers an incident.
Manual Trigger: Manually create an incident through the Flashduty console by clicking Create Incident and filling in title, description, severity, etc.
2. Assignment and Notification
After an incident is triggered, Flashduty matches it against the escalation rules in the channel. When a rule matches, the system assigns the incident to individuals, team members, or on-call personnel and sends notifications. If no escalation rule matches, the incident is not assigned and no notifications are generated.
You can set different escalation rules for different time periods or incident types for flexible assignment. Multiple escalation levels can be set within one rule. If current level responders don't confirm and handle the incident within the specified time, the system automatically escalates to the next level.
You can flexibly arrange notification methods in escalation rules. Flashduty supports numerous group and individual notification channels. Individual channels are one-on-one (voice, SMS, email), while group channels push messages to groups (Feishu/Lark, Dingtalk, Slack) with additional responder notifications.
💡
Note: Incidents only generate notifications after assignment. No assignment means no notifications.
If you assign an incident to a schedule with no one On-Call (empty shift), no individual notifications will be sent, but group chat messages will still be delivered if configured.
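
The multi-level escalation described above can be sketched as a lookup from elapsed time to escalation level. The `Level` class, its field names, and the choice to stay on the last level once every timeout has expired are assumptions for illustration; the real behavior depends on how the escalation rule is configured.

```python
from dataclasses import dataclass


@dataclass
class Level:
    responders: list
    timeout_minutes: int  # how long to wait for acknowledgment before escalating


def current_level(levels, minutes_unacknowledged):
    """Return the index of the level that should be active after the given wait."""
    elapsed = 0
    for index, level in enumerate(levels):
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return index
    # Assumption: once every timeout has expired, stay on the last level.
    return len(levels) - 1


levels = [
    Level(["primary on-call"], 10),
    Level(["secondary on-call"], 20),
    Level(["team lead"], 30),
]
print(current_level(levels, 5))   # 0
print(current_level(levels, 10))  # 1
print(current_level(levels, 45))  # 2
```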
3. Acknowledgment and Resolution
On-Call personnel can acknowledge incidents immediately upon notification through voice calls or instant messages. After acknowledgment, incident progress changes to In Progress.
💡
Currently, Flashduty doesn't restrict incident acknowledgment to "assigned responders" only. Anyone who can see the incident can acknowledge it.
Closing the incident changes its progress to Closed. If the associated alerts auto-resolve, the incident auto-closes. Conversely, manually closing an incident auto-closes all associated alerts, which prevents them from merging with new events.

Incident Timeline


Each incident has a timeline tracking historical changes and operations. It shows when and how notifications were sent, to whom, and their results.

Triggering Incidents


Via Integration

Flashduty supports most common monitoring systems including Prometheus, Zabbix, Nightingale, and cloud monitoring. Visit Standard Alert Integration for specific steps.
💡
Flashduty supports dedicated and shared integration modes. Alerts delivered to channel-specific integrations trigger incidents in that channel.
Alternatively, deliver alerts to shared integrations in the integration center, then configure routing to different channels based on rules.

Via API

Flashduty defines a custom event standard so that any monitoring system without a dedicated integration can report alerts through a standard protocol. Read Custom Events for details.
💡
For system stability, Flashduty currently limits API reporting to 200 QPS (queries per second). Requests over the limit are rejected.
💡
Make sure you close alerts proactively or configure automatic timeout closure for incidents in your channel. Too many open incidents severely degrade console search performance, and in that case the system may close historical incidents without notification.
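
Because requests above 200 QPS are rejected, a reporting client may want to throttle itself. The sketch below is one possible client-side approach, a sliding one-second window. The `QpsLimiter` class is hypothetical, and how Flashduty measures the limit on the server side is not specified here, so treat the window logic as an assumption.

```python
import time
from collections import deque


class QpsLimiter:
    """Allow at most max_per_second calls within any sliding one-second window."""

    def __init__(self, max_per_second=200, clock=time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock   # injectable so the limiter can be tested without sleeping
        self.sent = deque()  # timestamps of requests sent within the last second

    def try_acquire(self):
        now = self.clock()
        # Forget requests that are at least one second old.
        while self.sent and now - self.sent[0] >= 1.0:
            self.sent.popleft()
        if len(self.sent) < self.max_per_second:
            self.sent.append(now)
            return True
        return False  # caller should wait or queue the event instead of sending


# Demonstration with a fake clock: 250 attempts in the same instant.
fake_now = [0.0]
limiter = QpsLimiter(200, clock=lambda: fake_now[0])
accepted = sum(limiter.try_acquire() for _ in range(250))
print(accepted)               # 200
fake_now[0] = 1.0             # one second later the window has cleared
print(limiter.try_acquire())  # True
```

A caller would check `try_acquire()` before each report and back off briefly when it returns `False`.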

Via Email

Flashduty provides an email integration for reporting alerts by email, suitable for any monitoring system that supports email notifications. Read Email Integration for details.
💡
You can set specific email prefixes for each integration. Contact us to set up a memorable custom domain for your account, like order-service@tesla.flashcat.cloud.

Via Console

Click Create in the console to initiate incident creation.
| Field | Required | Description |
| --- | --- | --- |
| Title | Yes | One-line description of what happened |
| Severity | Yes | Choose from Critical, Warning, Info |
| Channel | Yes | Incident ownership; not required if creating within a channel |
| Assignment | Yes | Rule-based: select channel escalation rules. Direct: select individuals or schedules |
| Description | No | Detailed incident description, supports Markdown |
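
The field requirements above can be expressed as a small validation sketch. The dictionary keys (`title`, `severity`, `channel`, `assignment`) and the function name are hypothetical and do not correspond to a documented Flashduty API; they only mirror the console form.

```python
ALLOWED_SEVERITIES = ("Critical", "Warning", "Info")


def validate_incident_form(form, inside_channel=False):
    """Return a list of problems with a manually created incident; empty means valid."""
    errors = []
    if not form.get("title"):
        errors.append("title is required")
    if form.get("severity") not in ALLOWED_SEVERITIES:
        errors.append("severity must be Critical, Warning, or Info")
    # Channel is not required when the incident is created from within a channel.
    if not inside_channel and not form.get("channel"):
        errors.append("channel is required")
    if not form.get("assignment"):
        errors.append("assignment is required")
    # Description is optional, so it is not checked.
    return errors


form = {"title": "Checkout API returns 500", "severity": "critical"}
print(validate_incident_form(form, inside_channel=True))
# ['severity must be Critical, Warning, or Info', 'assignment is required']
```

Note that the lowercase `critical` fails validation in this sketch, reflecting the capitalized severity values used throughout Flashduty.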
Last modified: 2024-12-11 03:02:16