Prerequisites
| Config Item | Description |
|---|---|
| Query Language | Uses ClickHouse SQL syntax |
| Field Processing | All field names are automatically converted to lowercase, so use lowercase letters when configuring fields |
| Type Conversion | Use functions such as `toString()` and `toFloat64()` to convert complex types explicitly |
1. Threshold Evaluation Mode
This mode is suitable for scenarios that require threshold comparison on aggregated values.
Configuration
- Query Statement: Write an SQL aggregate query that returns value columns and, optionally, label columns.
  - Example: Count error logs per service over the last 5 minutes.
- Field Mapping:
  - Label Fields: Fields that distinguish different alert objects; in the example above, `service_name`. This setting can be left empty, in which case Monitors treats every field other than the value fields as a label field.
  - Value Fields: Numeric fields used for threshold evaluation; in the example above, `error_cnt`.
- Threshold Conditions:
  - Use `$A.field_name` to reference values.
  - Example: Critical: `$A.error_cnt > 50`; Warning: `$A.error_cnt > 10`.
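A minimal sketch of the query for the example above, assuming a hypothetical log table `app_logs` with `timestamp`, `service_name`, and `level` columns, where error logs have `level = 'ERROR'` (all of these names are assumptions; adjust them to your schema):

```sql
-- Hypothetical schema: app_logs(timestamp DateTime, service_name String, level String)
SELECT
    service_name,            -- label field: one alert object per service
    count() AS error_cnt     -- value field, referenced as $A.error_cnt
FROM app_logs
WHERE timestamp > now() - INTERVAL 5 MINUTE   -- last 5 minutes only
  AND level = 'ERROR'                          -- assumed marker for error logs
GROUP BY service_name
```

With this query, the Critical and Warning expressions are evaluated separately against each `service_name` row.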
How It Works
The engine runs the SQL query and retrieves the result set. It groups the rows by the label fields, then compares each group's value-field values against the threshold expressions.
Recovery Logic
| Strategy | Description |
|---|---|
| Auto Recovery | A recovery event is generated automatically once the values no longer satisfy any alert threshold |
| Specific Recovery Condition | Recovers when a configured recovery expression holds (e.g., `$A.error_cnt < 5`) |
| Recovery Query | Runs a separate SQL query to evaluate recovery; supports `${label_name}` variables |
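For the Recovery Query strategy, a sketch of a per-service recovery query, reusing the hypothetical `app_logs` table from the threshold example. It assumes `${service_name}` is replaced with the alerting object's label value as plain text, which is why it is wrapped in quotes here; verify how your Monitors version substitutes variables before relying on this.

```sql
-- ${service_name}: label value of the alert being evaluated (assumed raw-text substitution)
SELECT count() AS error_cnt
FROM app_logs
WHERE timestamp > now() - INTERVAL 5 MINUTE
  AND level = 'ERROR'
  AND service_name = '${service_name}'
```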
2. Data Exists Mode
This mode is suitable for scenarios where the filter logic is written directly in SQL.
Configuration
- Query Statement: Use a `HAVING` clause in the SQL so that the query returns only anomalous data.
  - Example: Query only the services whose error count exceeds 50.
- Evaluation Rule: An alert fires whenever the SQL query returns any data.
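A sketch of the example above, again assuming the hypothetical `app_logs` table. The `HAVING` clause keeps only services over the threshold, so any returned row triggers the alert, and an empty result means nothing is anomalous:

```sql
-- Hypothetical schema: app_logs(timestamp DateTime, service_name String, level String)
SELECT
    service_name,
    count() AS error_cnt
FROM app_logs
WHERE timestamp > now() - INTERVAL 5 MINUTE
  AND level = 'ERROR'
GROUP BY service_name
HAVING error_cnt > 50    -- ClickHouse allows SELECT aliases in HAVING
```

Because the threshold is fixed inside the SQL, a single query like this expresses only one alert level.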
Pros and Cons Analysis
| Type | Description |
|---|---|
| Pros | Computation and filtering run inside ClickHouse, taking advantage of its OLAP performance |
| Cons | Cannot distinguish multiple alert levels (e.g., Critical vs. Warning) |
Recovery Logic
- Recovery When Data Disappears: The alert recovers when the SQL query returns an empty result.
- Recovery Query: An additional query can be configured to help determine recovery status.
3. No Data Mode
This mode is used to monitor scenarios where data is expected but actually missing.
Configuration
- Query Statement: Write an SQL query that is expected to return data continuously.
  - Example: Query heartbeat reports from all probes.
- Evaluation Rule: If a `probe_id` that appeared in previous cycles cannot be found in the current cycle and N consecutive cycles, a “No Data” alert fires.
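A sketch of the heartbeat query, assuming a hypothetical `probe_heartbeats` table with `timestamp` and `probe_id` columns. A probe that sends no heartbeats in the window produces no row at all (rather than a row with a zero count), which is exactly the absence this mode tracks across cycles:

```sql
-- Hypothetical schema: probe_heartbeats(timestamp DateTime, probe_id String)
SELECT
    probe_id,                    -- identifies each monitored probe
    count() AS heartbeat_cnt     -- heartbeats seen in the window
FROM probe_heartbeats
WHERE timestamp > now() - INTERVAL 5 MINUTE
GROUP BY probe_id
```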
4. Best Practices
Type Conversion
When processing complex types, ClickHouse drivers may return formats the engine cannot recognize. Convert such values explicitly in the SELECT clause, for example `toString(uuid)` or `toFloat64(avg_duration)`.
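A sketch applying both conversions in one query, assuming a hypothetical `requests` table with a UUID column `instance_id` and a Decimal column `duration_ms` (names and types are assumptions). The output aliases are deliberately different from the source column names:

```sql
-- Hypothetical schema: requests(timestamp DateTime, instance_id UUID, duration_ms Decimal(18, 3))
SELECT
    toString(instance_id) AS instance,            -- UUID -> String for the label field
    toFloat64(avg(duration_ms)) AS avg_duration   -- explicit Float64 for the value field
FROM requests
WHERE timestamp > now() - INTERVAL 5 MINUTE
GROUP BY instance_id
```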
Time Filtering
ClickHouse query performance is highly sensitive to time partitions. Always include a time-range filter in the WHERE clause so that indexes can be used, for example `timestamp > now() - INTERVAL 5 MINUTE` or `timestamp > toDateTime(now()) - 300`.
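To illustrate why the filter matters, a sketch assuming a hypothetical MergeTree table partitioned by day and sorted by `timestamp` (the table layout is an assumption, not a requirement of Monitors). With a time predicate in `WHERE`, ClickHouse can skip partitions and index granules outside the window instead of scanning the whole table:

```sql
-- Hypothetical table layout; partition and sort keys are assumptions
CREATE TABLE app_logs
(
    timestamp    DateTime,
    service_name String,
    level        String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)   -- one partition per day
ORDER BY timestamp;                  -- primary index on time

-- Only partitions overlapping the last 5 minutes need to be read
SELECT service_name, count() AS error_cnt
FROM app_logs
WHERE timestamp > now() - INTERVAL 5 MINUTE
GROUP BY service_name;
```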
Field Case
The Monitors engine converts column names returned by ClickHouse to lowercase, so always use lowercase letters when filling in label fields and value fields.
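A sketch of one way to keep names predictable, assuming a hypothetical `service_logs` table whose service column is named `ServiceName`: aliasing columns in lowercase in the SQL makes the returned names match what you type into the field mapping.

```sql
-- Hypothetical schema: service_logs(timestamp DateTime, ServiceName String)
SELECT
    ServiceName AS service_name,   -- without the alias, the engine would see "servicename"
    count() AS error_cnt
FROM service_logs
WHERE timestamp > now() - INTERVAL 5 MINUTE
GROUP BY ServiceName
```

The label field is then `service_name` and the threshold expression can reference `$A.error_cnt`, both already lowercase.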