ClickHouse

Monitors supports querying ClickHouse with standard SQL syntax and triggering alerts based on the query results.

Prerequisites

Config Item	Description
Query Language	Uses ClickHouse SQL syntax
Field Processing	All field names are automatically converted to lowercase; use lowercase names when configuring fields
Type Conversion	Use `toString()`, `toFloat64()`, and similar functions to convert complex types explicitly

1. Threshold Evaluation Mode

This mode is suitable for scenarios requiring threshold comparison on aggregated values.

Configuration

Query Statement: Write an aggregate SQL query that returns value columns and, optionally, label columns.

Example: Count error logs per service over the last 5 minutes.

SELECT 
    service_name, 
    count(*) AS error_cnt 
FROM app_log 
WHERE timestamp > now() - INTERVAL 5 MINUTE AND level = 'error'
GROUP BY service_name

Field Mapping:

Label Fields: Fields used to distinguish different alert objects. In the above example, it’s service_name. This field can be left empty; Monitors will automatically treat all fields except value fields as label fields.
Value Fields: Numeric fields used for threshold evaluation. In the above example, it’s error_cnt.

Threshold Conditions:

Use $A.field_name to reference values.
Example: Critical: $A.error_cnt > 50, Warning: $A.error_cnt > 10.

How It Works

The engine executes the SQL query and retrieves the result set. It groups the rows by the “label fields”, then compares each group's “value fields” against the threshold expressions.
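The evaluation flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Monitors implementation: the `rows` data, the `THRESHOLDS` list, and the `evaluate` function are hypothetical names, and the thresholds mirror the Critical and Warning example from the configuration section.

```python
from typing import Callable

# Hypothetical result set, shaped like the output of the example query.
rows = [
    {"service_name": "checkout", "error_cnt": 72},
    {"service_name": "search", "error_cnt": 14},
    {"service_name": "auth", "error_cnt": 3},
]

# Severity thresholds, checked from most to least severe (assumed ordering).
THRESHOLDS: list[tuple[str, Callable[[float], bool]]] = [
    ("Critical", lambda v: v > 50),
    ("Warning", lambda v: v > 10),
]


def evaluate(rows, value_field, label_fields=None):
    """Return {label_tuple: severity or None} for each row in the result set."""
    results = {}
    for row in rows:
        # With no explicit label fields, every non-value column is a label.
        labels = label_fields or [k for k in row if k != value_field]
        key = tuple((name, row[name]) for name in sorted(labels))
        value = float(row[value_field])
        # The first matching threshold wins; None means no alert.
        severity = next((name for name, test in THRESHOLDS if test(value)), None)
        results[key] = severity
    return results


alerts = evaluate(rows, value_field="error_cnt")
```

Each distinct label combination becomes its own alert object, which is why the same rule can report `checkout` as Critical and `search` as Warning in one cycle.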

Recovery Logic

Strategy	Description
Auto Recovery	When the values no longer satisfy any alert threshold, a recovery event is generated automatically
Specific Recovery Condition	Configure a recovery expression (e.g., `$A.error_cnt < 5`)
Recovery Query	A separate SQL query used for recovery evaluation; supports `${label_name}` variables
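To illustrate how `${label_name}` variables in a recovery query might be filled from an alert's labels, here is a minimal Python sketch. How Monitors performs the substitution is not specified here, so treat everything below as an assumption: the `RECOVERY_SQL` text, the `render_recovery_query` helper, and the quoting of the variable inside a string literal are all hypothetical. Python's `string.Template` happens to use the same `${name}` placeholder syntax.

```python
from string import Template

# Hypothetical recovery query; ${service_name} is filled from the alert's labels.
RECOVERY_SQL = Template(
    "SELECT count(*) AS error_cnt FROM app_log "
    "WHERE timestamp > now() - INTERVAL 5 MINUTE "
    "AND level = 'error' AND service_name = '${service_name}'"
)


def render_recovery_query(labels: dict[str, str]) -> str:
    """Substitute label values into the recovery query text."""
    # Escape backslashes and single quotes so a label value cannot
    # break out of the SQL string literal.
    escaped = {
        name: value.replace("\\", "\\\\").replace("'", "\\'")
        for name, value in labels.items()
    }
    return RECOVERY_SQL.substitute(escaped)


sql = render_recovery_query({"service_name": "checkout"})
```

The rendered query is scoped to the one alert object being checked, so each firing alert can be evaluated for recovery independently.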

2. Data Exists Mode

This mode is suitable for scenarios where filter logic is written directly in SQL.

Configuration

Query Statement: Use a HAVING clause so that the query returns only the anomalous rows.

Example: Return only the services whose error count exceeds 50.

SELECT 
    service_name, 
    count(*) AS error_cnt 
FROM app_log 
WHERE timestamp > now() - INTERVAL 5 MINUTE AND level = 'error'
GROUP BY service_name
HAVING count(*) > 50

Evaluation Rules: An alert is triggered whenever the SQL query returns one or more rows.

Pros and Cons Analysis

Type	Description
Pros	Computation and filtering run inside ClickHouse, leveraging its OLAP capabilities for good performance
Cons	Cannot distinguish between multiple alert severity levels, because the threshold is fixed in the SQL

Recovery Logic

Recovery When Data Disappears: When the SQL query result is empty, the alert is considered recovered.
Recovery Query: An additional query statement can be configured to help determine recovery status.
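The trigger and recovery rules for this mode can be sketched as a set comparison between evaluation cycles. This is a simplified illustration, not the engine's code: `evaluate_data_exists` is a hypothetical helper, and tracking recovery per label combination (rather than only for a fully empty result) is an assumption.

```python
def evaluate_data_exists(rows, label_fields, firing):
    """Compare the current result set with the currently firing alerts.

    Returns (triggered, recovered, now_firing), each a set of label tuples.
    """
    current = {tuple(row[name] for name in label_fields) for row in rows}
    triggered = current - firing   # returned now, not firing before
    recovered = firing - current   # firing before, no longer returned
    return triggered, recovered, current


# Cycle 1: the HAVING query returns one service, so an alert triggers.
firing = set()
triggered, recovered, firing = evaluate_data_exists(
    [{"service_name": "checkout", "error_cnt": 72}], ["service_name"], firing
)
first_cycle = (triggered, recovered)

# Cycle 2: the query returns nothing, so the alert recovers.
triggered, recovered, firing = evaluate_data_exists([], ["service_name"], firing)
second_cycle = (triggered, recovered)
```

Because the query itself decides what counts as anomalous, the engine only needs to know which rows exist, which is also why this mode cannot assign different severities.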

3. No Data Mode

This mode detects scenarios where data is expected but missing.

Configuration

Query Statement: Write a SQL query that is expected to continuously return data.

Example: Query heartbeat reports from all probes.

SELECT probe_id, max(timestamp) AS last_seen
FROM probe_heartbeat
WHERE timestamp > now() - INTERVAL 5 MINUTE
GROUP BY probe_id

Evaluation Rules: If a probe_id appeared in previous cycles but is absent from the current cycle and from N consecutive cycles, a “No Data” alert is triggered.
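The rule above amounts to remembering which label values have been seen and counting how many cycles in a row each one has been missing. The following Python sketch shows one way to do that. The `NoDataTracker` class and its `max_missing_cycles` parameter are hypothetical, and the exact way Monitors counts the current cycle against N is not specified here, so the sketch simply fires once a label has been missing for `max_missing_cycles` consecutive cycles.

```python
class NoDataTracker:
    """Track how many consecutive cycles each known label has been missing."""

    def __init__(self, max_missing_cycles: int):
        self.max_missing_cycles = max_missing_cycles
        self.missing: dict[str, int] = {}  # label -> consecutive missing cycles

    def observe(self, seen_labels: set[str]) -> set[str]:
        """Record one evaluation cycle and return labels that are now 'No Data'."""
        for label in seen_labels:
            self.missing[label] = 0  # seen this cycle: reset the counter
        no_data = set()
        for label in self.missing:
            if label not in seen_labels:
                self.missing[label] += 1
                if self.missing[label] >= self.max_missing_cycles:
                    no_data.add(label)
        return no_data


tracker = NoDataTracker(max_missing_cycles=2)
cycle1 = tracker.observe({"probe-a", "probe-b"})  # both probes report
cycle2 = tracker.observe({"probe-a"})             # probe-b missing once
cycle3 = tracker.observe({"probe-a"})             # probe-b missing twice
```

A label that was never seen cannot trigger "No Data" in this sketch, which matches the requirement that the probe_id must have appeared in a previous cycle.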

4. Best Practices

Type Conversion

When processing complex types, ClickHouse drivers may return values in formats that the engine cannot recognize. Convert such columns explicitly in the SELECT clause:

toString(uuid)
toFloat64(avg_duration)

Time Filtering

ClickHouse query performance depends heavily on time-based partitioning. Always include a time range filter in the WHERE clause so that the query can use partitions and indexes:

timestamp > now() - INTERVAL 5 MINUTE
timestamp > toDateTime(now()) - 300

Field Case

The Monitors engine converts the column names returned by ClickHouse to lowercase. When filling in “label fields” and “value fields”, always use lowercase names.
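The effect of this lowercasing can be seen in a small Python sketch. The `normalize_row` helper and the mixed-case column aliases are hypothetical and only illustrate the described behavior, not the engine's actual code.

```python
def normalize_row(row: dict) -> dict:
    """Lowercase every column name, as the engine is described as doing."""
    return {name.lower(): value for name, value in row.items()}


# Hypothetical row whose SQL aliases used mixed case.
row = normalize_row({"Service_Name": "checkout", "Error_Cnt": 72})

value_field = "error_cnt"   # lowercase name matches after normalization
wrong_field = "Error_Cnt"   # original casing no longer matches
found = value_field in row
missed = wrong_field in row
```

A value field configured as `Error_Cnt` would therefore not be found, even though the SQL alias used that exact casing.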
