Flashduty Docs

Loki

This document details how to configure Loki data source alert rules in the Monitors alert engine. Monitors supports Loki's LogQL query syntax, enabling aggregation analysis of log data and alert triggering.

Core concepts

Loki's query language, LogQL, falls into two categories:

1. Log queries: return log line content (a Stream).
2. Metric queries: count or aggregate logs, for example using the count_over_time function to return numeric values (a Vector).
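To make the distinction concrete, here are two illustrative queries (the nginx job and the timeout keyword are hypothetical examples, not part of any default setup):

```logql
# Log query: returns the matching log lines themselves (Stream)
{job="nginx"} |= "timeout"

# Metric query: returns a numeric vector suitable for threshold evaluation
count_over_time({job="nginx"} |= "timeout" [5m])
```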

1. Threshold mode

This mode is suitable for scenarios that require multi-level threshold evaluation (such as Info/Warning/Critical) on log aggregation values.

Configuration

Query statement (LogQL): write a LogQL query that returns a numeric vector (select the "Statistics" query mode).

Example: count the number of logs containing the error keyword in the mysql job over the last 5 minutes.

```logql
count_over_time({job="mysql"} |= "error" [5m])
```

Threshold conditions:

- Critical: $A > 50 (more than 50 error logs in 5 minutes)
- Warning: $A > 10 (more than 10 error logs in 5 minutes)

How it works

The engine executes the LogQL query and retrieves time series data (Vector) with labels. The engine iterates through each series, extracts the value, and compares it against the configured threshold expressions.
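The iterate-and-compare step can be sketched as follows. The result shape mirrors the "vector" result of Loki's query API; the function name, severity labels, and threshold predicates are illustrative assumptions, not the engine's actual internals:

```python
# Sketch of threshold evaluation over a Loki vector result.
# Thresholds are checked from most to least severe, and only the most
# severe matching level is reported for each series.

def evaluate_thresholds(vector_result, thresholds):
    """vector_result: list of {"metric": labels, "value": [ts, "val"]} entries.
    thresholds: list of (severity, predicate) pairs, most severe first."""
    events = []
    for series in vector_result:
        value = float(series["value"][1])  # Loki encodes the value as a string
        for severity, predicate in thresholds:
            if predicate(value):
                events.append((series["metric"], severity, value))
                break  # stop at the most severe matching level
    return events

# Example: Critical if $A > 50, Warning if $A > 10
thresholds = [("Critical", lambda v: v > 50), ("Warning", lambda v: v > 10)]
result = [
    {"metric": {"job": "mysql", "instance": "db-1"}, "value": [1700000000, "72"]},
    {"metric": {"job": "mysql", "instance": "db-2"}, "value": [1700000000, "15"]},
    {"metric": {"job": "mysql", "instance": "db-3"}, "value": [1700000000, "3"]},
]
print(evaluate_thresholds(result, thresholds))
```

Note that db-3 produces no event at all: a series below every threshold simply stays healthy.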

Recovery logic

- Automatic recovery: when the query result falls back below the threshold, the alert recovers automatically.
- Specific recovery condition: can be configured, for example $A < 5, to avoid oscillation near the threshold.
- Recovery query: supports configuring a separate LogQL query for recovery evaluation; recovery is triggered as long as the query returns data. Supports ${label_name} variable substitution.

Example: alert on error logs, and recover once matching recovery logs appear:

```logql
count_over_time({job="mysql"} |= "recovered" [5m])
```
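As a sketch of ${label_name} substitution in a recovery query (assuming the firing series carries an instance label), the variable is replaced with the label value of the alerting series before the recovery query runs, so each series is checked for recovery independently:

```logql
count_over_time({job="mysql", instance="${instance}"} |= "recovered" [5m])
```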

2. Data exists mode

This mode is suitable for scenarios where you prefer to write filter conditions directly in LogQL, or only care about "whether abnormal data exists". This mode is recommended for log anomaly detection alerts.

Configuration

Query statement (LogQL): write a LogQL query containing comparison operators, so that it returns only data meeting the conditions.

Example: directly filter for services whose error rate exceeds 5%.

```logql
count_over_time({job="ingress"} |= "error-code-500" [5m])
  / count_over_time({job="ingress"} [5m]) * 100 > 5
```

Evaluation rule: an alert is triggered as long as the LogQL query returns data.
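Because the comparison (> 5) is evaluated inside Loki, the engine-side check reduces to "is the result non-empty". A minimal sketch, with an assumed function name and the same vector result shape as above:

```python
# "Data exists" evaluation sketch: any series that comes back has already
# met the alert condition, so each returned series fires one alert at the
# single configured severity level.

def data_exists_alert(vector_result):
    """Return the label set of each series that should fire an alert."""
    return [series["metric"] for series in vector_result]

print(data_exists_alert([]))  # empty result: nothing fires
print(data_exists_alert([{"metric": {"job": "ingress"}, "value": [0, "7.2"]}]))
```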

Pros and cons

- Pros: the computation is pushed down to the Loki server, reducing data transfer.
- Cons: alert levels cannot be distinguished; only a single severity can be triggered.

Recovery logic

- Recover when data disappears: when the LogQL query result is empty (i.e., the > 5 condition is no longer met), the alert is considered recovered.
- Recovery query: supports configuring an additional query statement to assist in determining recovery status.

3. No data mode

This mode is used to monitor whether the log reporting pipeline is interrupted, or whether logs that should be continuously generated have stopped.

Configuration

Query statement (LogQL): write a query that should always return data.

Example: count the log reporting rate for all hosts.

```logql
rate({job="node-logs"} [1m])
```

Evaluation rule: if a series (uniquely identified by its labels, such as instance="host-1") existed in previous cycles but cannot be found in the current and N consecutive cycles, a "no data" alert is triggered.
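The per-series tracking described above can be sketched like this; the fingerprinting scheme, class name, and cycle bookkeeping are illustrative assumptions about how such a detector might work, not the engine's actual implementation:

```python
# "No data" detection sketch: series seen in earlier cycles are tracked by a
# label fingerprint, and an alert fires after N consecutive missed cycles.

class NoDataDetector:
    def __init__(self, max_missing_cycles):
        self.max_missing = max_missing_cycles
        self.missing = {}  # fingerprint -> consecutive cycles without data

    @staticmethod
    def fingerprint(labels):
        # Stable identity for a series, independent of label order.
        return tuple(sorted(labels.items()))

    def evaluate(self, vector_result):
        """Run one evaluation cycle; return label sets newly declared 'no data'."""
        present = {self.fingerprint(s["metric"]) for s in vector_result}
        for fp in present:
            self.missing[fp] = 0  # seen again: reset the miss counter
        alerts = []
        for fp, count in self.missing.items():
            if fp not in present:
                self.missing[fp] = count + 1
                if self.missing[fp] == self.max_missing:
                    alerts.append(dict(fp))
        return alerts

detector = NoDataDetector(max_missing_cycles=2)
host1 = {"metric": {"instance": "host-1", "job": "node-logs"}, "value": [0, "1"]}
detector.evaluate([host1])   # cycle 1: host-1 present
detector.evaluate([])        # cycle 2: first miss, no alert yet
print(detector.evaluate([])) # cycle 3: second consecutive miss -> no-data alert
```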

Typical applications

- Monitor whether collection agents such as Promtail/Fluentd have stopped working.
- Monitor whether critical business logs (such as order creation logs) have been abnormally interrupted.

4. Get original logs when alerting

You can fetch original log lines through related queries when an alert fires. It is usually not recommended to fetch many; retrieving a single line as a sample to include in the alert message is enough.

The results of related queries can be rendered in the "Note description" field, for example:

```go
{{- if eq $status "firing" }}
error log count: {{ $value | printf "%.3f" }}
{{- range $x := $relates.R1}}
Loki log time: {{(nanoTime $x.Fields.__time__ 8).Format "2006-01-02T15:04:05Z07:00"}}
Loki log line: {{$x.Fields.__log__}}
{{- end}}
{{- end}}
```

Last updated: 2026-01-09 03:04:47