SLS (Alibaba Cloud Log Service) Alert Rule Configuration

This document provides detailed instructions on configuring alert rules for Alibaba Cloud Log Service (SLS) data sources in Monitors. Monitors retrieves data through the SLS SQL query interface (GetLogsV3) and triggers alerts based on query results.

Core concepts

Query language: Uses SLS SQL syntax.

Required parameters: Each query must specify sls.project and sls.logstore parameters.

Time range: The SLS query time range is controlled by API parameters (configured via sls.timespan.value and sls.timespan.unit). You do not need to write WHERE __time__ > ... in the SQL statement.

Field handling: By default, __source__ and __time__ fields are ignored (unless explicitly specified as value fields).

1. Threshold mode

This mode is suitable for scenarios requiring threshold comparisons on aggregated values.

Configuration

Query: Write an SLS SQL aggregation query.

Example: Count error logs per host in the last 15 minutes.

Query parameters:

sls.project: (Required) Project name.

sls.logstore: (Required) Logstore name.

sls.timespan.value: (Optional) Time span value, defaults to 15.

sls.timespan.unit: (Optional) Time span unit, supports s (seconds), m (minutes), h (hours), d (days). Defaults to m.
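The two time span parameters and their defaults can be expressed as a small conversion helper. This is a minimal Python sketch, not Monitors' implementation; the function name timespan_seconds and its validation rules (positive integer values, only the four listed units) are assumptions made for illustration.

```python
# Hypothetical helper mirroring the documented defaults (value 15, unit "m").
# This is an illustration, not Monitors' source code.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timespan_seconds(params: dict) -> int:
    """Convert sls.timespan.value / sls.timespan.unit into a span in seconds."""
    value = int(params.get("sls.timespan.value", 15))  # values may be strings
    unit = params.get("sls.timespan.unit", "m")
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported sls.timespan.unit: {unit!r}")
    if value <= 0:
        raise ValueError("sls.timespan.value must be positive")
    return value * UNIT_SECONDS[unit]


print(timespan_seconds({}))  # 900 (15 minutes)
print(timespan_seconds({"sls.timespan.value": "2", "sls.timespan.unit": "h"}))  # 7200
```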

Field mapping:

Label fields: Fields used to distinguish different alert objects. In the example above, this is host. This field can be left empty, and Monitors will automatically treat all fields except value fields as label fields.

Value fields: Numeric fields used for threshold evaluation. In the example above, this is error_cnt.
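A minimal sketch of how one result row could be split into labels and values under these rules, including the default exclusion of __source__ and __time__. The function split_row is hypothetical; it assumes field values may arrive as strings and converts value fields to float before any threshold comparison.

```python
from typing import Dict, List, Optional, Tuple

# System fields excluded from automatic label inference
IGNORED_FIELDS = {"__source__", "__time__"}


def split_row(row: Dict[str, str], value_fields: List[str],
              label_fields: Optional[List[str]] = None
              ) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Split one query result row into (labels, values). Hypothetical sketch."""
    # Value fields are converted to numbers for threshold comparison; listing
    # __time__ here is how a system field can still be used as a value.
    values = {name: float(row[name]) for name in value_fields}
    if label_fields:
        labels = {name: row[name] for name in label_fields}
    else:
        # No label fields configured: every other field becomes a label,
        # except __source__ and __time__.
        labels = {k: v for k, v in row.items()
                  if k not in value_fields and k not in IGNORED_FIELDS}
    return labels, values


row = {"host": "web-1", "error_cnt": "42",
       "__source__": "10.0.0.1", "__time__": "1700000000"}
print(split_row(row, ["error_cnt"]))
# ({'host': 'web-1'}, {'error_cnt': 42.0})
```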

Threshold conditions:

Use $A.field_name to reference values.

Example: Critical: $A.error_cnt > 50, Warning: $A.error_cnt > 10.

How it works

The engine calls the SLS API with a specified time range (e.g., last 15 minutes) and executes the SQL query. After retrieving results, it groups by label fields and compares value fields against thresholds.
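The evaluation flow can be modeled in a few lines of Python. This sketch assumes the result rows have already been fetched, and it represents each threshold as a Python predicate instead of parsing the $A.error_cnt syntax; the query in the comment, including its level field, is a hypothetical illustration. Rules are checked in priority order, so a series that satisfies both conditions is reported as Critical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical query for the threshold-mode example (the "level" field is an
# assumption about the log schema):
#   level: ERROR | SELECT host, COUNT(*) AS error_cnt GROUP BY host

Labels = Tuple[Tuple[str, str], ...]
Predicate = Callable[[Dict[str, float]], bool]

# Severity rules in priority order; each predicate stands in for an
# expression such as "$A.error_cnt > 50".
RULES: List[Tuple[str, Predicate]] = [
    ("Critical", lambda a: a["error_cnt"] > 50),
    ("Warning", lambda a: a["error_cnt"] > 10),
]


def evaluate(rows: List[Dict[str, str]], label_fields: List[str],
             value_fields: List[str]) -> Dict[Labels, Optional[str]]:
    """Group rows by label values and return the first matching severity."""
    results: Dict[Labels, Optional[str]] = {}
    for row in rows:
        key = tuple((f, row[f]) for f in label_fields)  # one alert object
        a = {f: float(row[f]) for f in value_fields}    # the "$A" values
        # First matching rule wins, so Critical takes precedence over Warning
        results[key] = next((sev for sev, pred in RULES if pred(a)), None)
    return results


rows = [{"host": "web-1", "error_cnt": "73"},
        {"host": "web-2", "error_cnt": "12"},
        {"host": "web-3", "error_cnt": "4"}]
for key, severity in evaluate(rows, ["host"], ["error_cnt"]).items():
    print(dict(key), severity)
# {'host': 'web-1'} Critical
# {'host': 'web-2'} Warning
# {'host': 'web-3'} None
```

With an aggregation query each label combination normally appears once; if several rows shared the same labels, this sketch would keep only the last one.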

Recovery logic

Auto recovery: Automatically recovers when the latest query result values no longer meet any alert threshold.

Specific recovery conditions: Configure additional recovery expressions (e.g., $A.error_cnt < 5).
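The two recovery styles can be sketched as one decision function. This assumes that, when a specific recovery expression is configured, an alert recovers only once no alert threshold matches and the recovery expression is also true; should_recover and the predicate representation are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Values = Dict[str, float]
Predicate = Callable[[Values], bool]


def should_recover(latest: Values, alert_rules: List[Predicate],
                   recovery_rule: Optional[Predicate] = None) -> bool:
    """Decide whether an active alert recovers, given its latest values."""
    if any(rule(latest) for rule in alert_rules):
        return False                 # still meets some alert threshold
    if recovery_rule is None:
        return True                  # auto recovery
    return recovery_rule(latest)     # specific recovery condition must also hold


alert_rules = [lambda a: a["error_cnt"] > 50, lambda a: a["error_cnt"] > 10]
recover_below_5 = lambda a: a["error_cnt"] < 5

print(should_recover({"error_cnt": 7}, alert_rules))                   # True
print(should_recover({"error_cnt": 7}, alert_rules, recover_below_5))  # False
print(should_recover({"error_cnt": 3}, alert_rules, recover_below_5))  # True
```

Under this reading, a value of 7 keeps the alert open when $A.error_cnt < 5 is configured, which avoids repeated alert/recover cycles for values hovering around the Warning threshold of 10.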

Recovery query:

Supports configuring an independent SQL statement for recovery evaluation.

Supports ${label_name} variable substitution.

Example: Suppose the alert SQL finds that the network card with network_host="a" and interface="b" is down. The recovery SQL can then check that specific card by referencing ${network_host} and ${interface}.

The engine replaces ${network_host} and ${interface} with actual values before executing the query. If data is found, recovery is confirmed.
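The ${label_name} substitution can be sketched with a regular expression. The template below is a hypothetical recovery query for the network card example; its column names and the status = 'up' condition are assumptions about the log schema, and escaping single quotes in label values is a precaution chosen for this sketch, not documented engine behavior.

```python
import re
from typing import Dict

# Hypothetical recovery query for the network card example. The column names
# and the status = 'up' condition are assumptions about the log schema.
RECOVERY_SQL = (
    "* | SELECT network_host, interface, COUNT(*) AS cnt "
    "WHERE network_host = '${network_host}' AND interface = '${interface}' "
    "AND status = 'up' GROUP BY network_host, interface"
)

_VAR = re.compile(r"\$\{(\w+)\}")  # matches ${label_name}


def render_recovery_query(template: str, labels: Dict[str, str]) -> str:
    """Replace each ${label_name} with the alerting series' label value."""
    def repl(match):
        name = match.group(1)
        if name not in labels:
            raise KeyError(f"label {name!r} not found on the alert")
        # Double single quotes so a value cannot terminate the SQL string literal
        return labels[name].replace("'", "''")
    return _VAR.sub(repl, template)


print(render_recovery_query(RECOVERY_SQL, {"network_host": "a", "interface": "b"}))
```

Following the rule above, a non-empty result from the rendered query would confirm recovery for that card.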

2. Data exists mode

This mode is suitable for scenarios where filtering logic is written directly in SQL.

Configuration

Query: Use a HAVING clause to filter anomalous data.

Example: Query hosts with more than 50 errors.

Query parameters: Same as in threshold mode; sls.project and sls.logstore are required.

Evaluation rule: An alert is triggered as soon as the query returns data.

Pros and cons

Pros: Leverages SLS server-side computing power, reducing data transmission.

Cons: Cannot distinguish between multiple severity levels.

Recovery logic

Data disappearance means recovery: Recovery is confirmed when the query result is empty.

Recovery query: You can also configure an additional recovery query.
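A minimal sketch of the fire/recover bookkeeping for this mode. It assumes each returned row is tracked as its own alert object keyed by its label values, so an alert recovers when its row no longer appears; an empty result therefore recovers every active alert, consistent with the rule above. diff_alerts and the commented HAVING query (including its level field) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data-exists query for the example above (the "level" field is
# an assumption about the log schema):
#   level: ERROR | SELECT host, COUNT(*) AS error_cnt GROUP BY host
#                  HAVING COUNT(*) > 50

Key = Tuple[Tuple[str, str], ...]


def diff_alerts(active: Set[Key], rows: List[Dict[str, str]],
                label_fields: List[str]) -> Tuple[Set[Key], Set[Key]]:
    """Return (newly_firing, recovered) for one evaluation cycle."""
    # Every returned row is an anomaly; its label values identify the object
    current = {tuple((f, row[f]) for f in label_fields) for row in rows}
    return current - active, active - current


web1 = (("host", "web-1"),)
print(diff_alerts(set(), [{"host": "web-1", "error_cnt": "73"}], ["host"]))
# ({(('host', 'web-1'),)}, set())
print(diff_alerts({web1}, [], ["host"]))
# (set(), {(('host', 'web-1'),)})
```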

3. No data mode

This mode detects cases where data is expected but missing.

Configuration

Query: Write a query that should continuously return data.

Example: Query the heartbeat logs reported by all hosts.

Evaluation rule: If a host appeared in previous cycles but has been missing for N consecutive cycles, up to and including the current one, a "no data" alert is triggered.
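The detection rule can be sketched as a per-host counter of consecutive missing cycles. NoDataTracker is hypothetical; it assumes N counts the current cycle, and it never tracks hosts it has not seen, matching the requirement that a host must have appeared in previous cycles.

```python
from typing import Dict, Iterable, Set


class NoDataTracker:
    """Track consecutive missing cycles per host (hypothetical sketch)."""

    def __init__(self, n: int):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self.missing: Dict[str, int] = {}  # host -> consecutive missing cycles

    def observe(self, hosts_seen: Iterable[str]) -> Set[str]:
        """Record one cycle's result; return hosts in the no-data state."""
        seen = set(hosts_seen)
        for host in seen:
            self.missing[host] = 0  # reporting again resets the counter
        for host in self.missing:
            if host not in seen:
                self.missing[host] += 1
        return {h for h, c in self.missing.items() if c >= self.n}


t = NoDataTracker(n=2)
for cycle in (["a", "b"], ["a"], ["a"], ["a", "b"]):
    print(sorted(t.observe(cycle)))
# []
# []
# ['b']
# []
```

A production engine would also deduplicate notifications; this sketch simply returns every host at or past the threshold on each cycle.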

4. Advanced configuration and best practices

Power SQL

To enable the SLS enhanced SQL feature (Power SQL), add the following to the query parameters:

sls.powersql: true

Time range control

By default, data from the last 15 minutes is queried. To change the range, set the following parameters (this example queries the last 60 minutes):

sls.timespan.value: 60

sls.timespan.unit: m

Note: Do not use __time__ for filtering in SQL unless you have special requirements. The engine automatically sets the API request's from and to timestamps based on the above parameters.

Debug parameters

If you need to debug data for a specific time period, use the following parameters (for debugging only; do not configure them in production rules):

sls.from: Start timestamp (seconds).

sls.to: End timestamp (seconds).
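How the request window might be resolved from these parameters is sketched below. resolve_window is hypothetical and assumes the debug override applies only when both sls.from and sls.to are set; otherwise the window ends at the current time and covers sls.timespan.value units of sls.timespan.unit, defaulting to 15 minutes.

```python
import time
from typing import Dict, Optional, Tuple

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def resolve_window(params: Dict[str, str],
                   now: Optional[int] = None) -> Tuple[int, int]:
    """Return the (from, to) Unix timestamps in seconds for the SLS request."""
    if "sls.from" in params and "sls.to" in params:
        # Debug override: query exactly the requested period
        start, end = int(params["sls.from"]), int(params["sls.to"])
    else:
        # Normal case: a window of sls.timespan ending at the current time
        end = int(time.time()) if now is None else now
        value = int(params.get("sls.timespan.value", 15))
        unit = params.get("sls.timespan.unit", "m")
        start = end - value * UNIT_SECONDS[unit]
    if start >= end:
        raise ValueError("the start of the window must precede the end")
    return start, end


print(resolve_window({}, now=1_700_000_000))  # (1699999100, 1700000000)
print(resolve_window({"sls.from": "1699990000", "sls.to": "1699993600"}))
# (1699990000, 1699993600)
```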
