Alert Testing¶
You write alert rules, deploy them, and hope they work. But common issues slip through:
a for: 5m alert that never fires because the metric only breaches for 3 minutes, a gap-fill
rule that triggers false positives during scrape outages, or a compound alert where two metrics
never overlap. Sonda lets you generate the exact metric shapes to trigger (and resolve) alerts
on demand, so you can catch these problems before they hit production.
Threshold Alerts¶
The most basic alert test: verify that a HighCPU rule fires when cpu_usage > 90.
The sine generator produces a smooth wave that crosses your threshold predictably.
With amplitude=50 and offset=50, it oscillates between 0 and 100, crossing 90 for
about 12 seconds per 60-second cycle.
```yaml
name: cpu_usage
rate: 1
duration: 180s
generator:
  type: sine
  amplitude: 50.0
  period_secs: 60
  offset: 50.0
labels:
  instance: server-01
  job: node
encoder:
  type: prometheus_text
sink:
  type: stdout
```
The metric crosses 90 around tick 9, stays above until tick 21, then drops -- giving you roughly 12 seconds above threshold per cycle.
Sine wave math
The formula is: `value = offset + amplitude * sin(2 * pi * tick / period_ticks)`

With `amplitude=50` and `offset=50`, the threshold at 90 is crossed when `sin(x) > 0.8`:

- `sin(x) = 0.8` at `x = arcsin(0.8) = 0.927` radians
- The sine exceeds 0.8 from `x = 0.927` to `x = pi - 0.927 = 2.214`
- That's `1.287 / 6.283 = 20.5%` of each cycle
- With a 60-second period: ~12.3 seconds above 90 per cycle
| Tick (sec) | sin(2πt/60) | Value | Above 90? |
|---|---|---|---|
| 0 | 0.000 | 50.0 | No |
| 5 | 0.500 | 75.0 | No |
| 10 | 0.866 | 93.3 | Yes |
| 15 | 1.000 | 100.0 | Yes |
| 20 | 0.866 | 93.3 | Yes |
| 25 | 0.500 | 75.0 | No |
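The ~12-second figure can be sanity-checked with a few lines of plain Python (no Sonda involved); the parameters below mirror the sine scenario above:

```python
import math

amplitude, offset, period, threshold = 50.0, 50.0, 60, 90

# The wave exceeds the threshold while sin(x) > (threshold - offset) / amplitude
x = math.asin((threshold - offset) / amplitude)   # 0.927 rad
fraction = (math.pi - 2 * x) / (2 * math.pi)      # share of each cycle above 90
print(round(fraction * period, 1))                # -> 12.3 seconds per cycle
```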
Smooth waves are great for threshold crossings, but Prometheus for: clauses need precise timing control.
Testing for: Duration Behavior¶
Prometheus alerts with a for: clause require the condition to be true for a continuous
duration before firing. You need exact control over how long the metric stays above threshold.
Sequence generator¶
The sequence generator steps through an explicit
list of values, one per tick. Each value lasts exactly 1/rate seconds:
```yaml
name: cpu_usage
rate: 1
duration: 80s
generator:
  type: sequence
  values: [10, 10, 10, 10, 10, 95, 95, 95, 95, 95, 10, 10, 10, 10, 10, 10]
  repeat: true
labels:
  instance: server-01
  job: node
encoder:
  type: prometheus_text
sink:
  type: stdout
```
At rate: 1, ticks 5-9 are above 90 (5 seconds continuous), then the pattern repeats.
Adjust the number of above-threshold values to match your for: duration.
Matching your for: duration
For a for: 5m alert, you need 300 consecutive above-threshold values at rate: 1.
Rather than typing 300 values, use the constant generator instead (next section).
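If you do want an explicit list, it is easier to generate it than to type it. A small helper (hypothetical, not part of Sonda) that builds a pad/breach/pad values list for a given rate:

```python
def breach_sequence(baseline, breach, breach_secs, pad_secs, rate=1):
    """Build a sequence-generator values list: pad, breach, pad."""
    pad = [baseline] * int(pad_secs * rate)
    return pad + [breach] * int(breach_secs * rate) + pad

# 300 s above threshold for a `for: 5m` rule, framed by 30 s of quiet
seq = breach_sequence(10, 95, 300, 30)
print(len(seq), seq.count(95))  # -> 360 300
```

Paste the result into `values:` in the scenario file.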
Constant generator shortcut¶
For simple sustained-threshold tests, the constant generator is more practical:
```yaml
name: cpu_usage
rate: 1
duration: 360s
generator:
  type: constant
  value: 95.0
labels:
  instance: server-01
  job: node
encoder:
  type: prometheus_text
sink:
  type: stdout
```
Run this for 6 minutes to test a for: 5m alert. The value stays at 95 for the entire
duration -- the alert should fire after 5 minutes of continuous breach.
Now that you can trigger alerts, what about testing when they resolve?
Testing Alert Resolution¶
When a metric goes silent during a gap, Prometheus treats it as stale and the alert resolves. Use gap windows to control when metrics disappear.
```text
Time:  0s                  40s       60s                 100s      120s
       |-------------------|xxxxxxxxx|-------------------|xxxxxxxxx|
         emit events (40s)  gap (20s)  emit events (40s)  gap (20s)
```
Gaps occupy the tail of each cycle. With every: 60s and for: 20s, the gap runs from
second 40 to second 60 of each cycle.
```yaml
name: cpu_usage
rate: 1
duration: 300s
generator:
  type: constant
  value: 95.0
gaps:
  every: 60s
  for: 20s
labels:
  instance: server-01
  job: node
encoder:
  type: prometheus_text
sink:
  type: stdout
```
The value stays at 95 (above threshold) but goes silent for 20 seconds every 60-second cycle.
The alert enters pending state during the 40-second emit window but may not reach the for:
duration before the gap resets it.
Combine with generators
Gaps work with any generator. A sine wave with periodic gaps creates a realistic "flapping service" pattern for alert testing.
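The gap schedule reduces to modular arithmetic. A sketch of the tail-of-cycle rule described above (illustrative only, not Sonda's implementation):

```python
def in_gap(t, every=60, dur=20):
    """True when tick t (seconds) falls in the gap at the tail of each cycle."""
    return (t % every) >= (every - dur)

# Which ticks in the first two cycles are silent?
print([t for t in range(0, 120, 10) if in_gap(t)])  # -> [40, 50, 100, 110]
```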
With single-metric alerts covered, let's move on to compound alerts that depend on multiple metrics.
Multi-Metric Correlation¶
Production alerts often depend on more than one metric. Compound rules like
cpu_usage > 90 AND memory_usage_percent > 85 require both conditions to be true
simultaneously. Sonda supports this with phase_offset and clock_group in
multi-scenario YAML files.
| Field | Default | Description |
|---|---|---|
| `phase_offset` | `"0s"` | Delay before this scenario starts emitting |
| `clock_group` | (none) | Groups scenarios under a shared timing reference |
```yaml
scenarios:
  - signal_type: metrics
    name: cpu_usage
    phase_offset: "0s"
    clock_group: alert-test
    generator:
      type: sequence
      values: [20, 20, 20, 95, 95, 95, 95, 95, 20, 20]
      repeat: true
    # ...
  - signal_type: metrics
    name: memory_usage_percent
    phase_offset: "3s"
    clock_group: alert-test
    generator:
      type: sequence
      values: [40, 40, 40, 88, 88, 88, 88, 88, 40, 40]
      repeat: true
    # ...
```
Understanding the timing¶
```text
Wall time  cpu_usage (offset=0s)   memory_usage (offset=3s)
---------  ---------------------   ------------------------
t=0s       starts: 20              sleeping
t=3s       95 (above threshold)    starts: 40
t=6s       95                      88 (above threshold)
t=8s       20 (drops)              88
```
The overlap window -- where both metrics are above threshold -- runs from t=6s to t=8s
(2 seconds per cycle). For a for: 5m alert, extend the above-threshold sequences or use
constant generators with a longer duration.
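You can compute the overlap window directly from the two value lists and their offsets. A sketch in plain Python, assuming each value lasts one second at `rate: 1`:

```python
def above_at(values, threshold, t, offset, rate=1):
    """Is a repeating sequence above threshold at wall time t, given its phase offset?"""
    if t < offset:
        return False  # scenario is still sleeping
    idx = int((t - offset) * rate) % len(values)
    return values[idx] > threshold

cpu = [20, 20, 20, 95, 95, 95, 95, 95, 20, 20]
mem = [40, 40, 40, 88, 88, 88, 88, 88, 40, 40]
overlap = [t for t in range(10)
           if above_at(cpu, 90, t, 0) and above_at(mem, 85, t, 3)]
print(overlap)  # -> [6, 7]
```

Widen the overlap by shrinking the phase offset or lengthening the above-threshold runs.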
See Example Scenarios for the full multi-metric-correlation.yaml file.
With local testing covered, let's push metrics to a real backend.
Pushing to a Backend¶
The fastest path to end-to-end alert testing: push metrics into a TSDB and verify alerts fire.
POST metrics directly to VictoriaMetrics using the Prometheus text import API. No scenario file needed:
```bash
sonda metrics --name cpu_usage --rate 1 --duration 180s --value 95 \
  --label instance=server-01 --label job=node \
  --sink http_push --endpoint http://localhost:8428/api/v1/import/prometheus \
  --content-type "text/plain"
```
Or use a scenario file for generators that require YAML (sequence, csv_replay).
Use the Prometheus remote write protocol for native compatibility with any remote-write receiver. No scenario file needed:
```bash
sonda metrics --name cpu_usage --rate 1 --duration 180s --value 95 \
  --label instance=server-01 --label job=node \
  --encoder remote_write \
  --sink remote_write --endpoint http://localhost:8428/api/v1/write
```
Or use a scenario file.
Remote write requires a feature flag when building from source

Pre-built binaries and Docker images include remote-write support. When building from
source, add `--features remote-write`: `cargo build --features remote-write -p sonda`.
| Target | URL |
|---|---|
| VictoriaMetrics | http://localhost:8428/api/v1/write |
| vmagent | http://localhost:8429/api/v1/write |
| Prometheus | http://localhost:9090/api/v1/write |
| Thanos Receive | http://localhost:19291/api/v1/receive |
| Cortex/Mimir | http://localhost:9009/api/v1/push |
Verify data arrived¶
```bash
# Check that the series exists
curl "http://localhost:8428/api/v1/series?match[]={__name__='cpu_usage'}"

# Query the latest value
curl "http://localhost:8428/api/v1/query?query=cpu_usage"
```
Docker Compose stack required
These push scenarios require a running backend. Start the included stack with
`docker compose -f examples/docker-compose-victoriametrics.yml up -d`.
See Docker Deployment for details.
You can also use the pull model instead of pushing.
Scrape-Based Integration¶
If you prefer the Prometheus pull model, sonda-server exposes a scrape endpoint for each running scenario.
Start sonda-server and submit a scenario:
```bash
cargo run -p sonda-server -- --port 8080

# In another terminal:
curl -X POST -H "Content-Type: text/yaml" \
  --data-binary @examples/sine-threshold-test.yaml \
  http://localhost:8080/scenarios
```
The response includes a scenario ID. Configure Prometheus to scrape it:
```yaml
scrape_configs:
  - job_name: sonda
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /scenarios/<scenario-id>/metrics
```
Scrape path
Replace <scenario-id> with the UUID returned from POST /scenarios. Each running
scenario exposes its own metrics endpoint.
See Server API for the full API reference.
Beyond simple threshold alerts, Sonda can also test cardinality explosions and replay real incidents.
Cardinality Explosion Alerts¶
Many monitoring stacks alert when series cardinality crosses a threshold (e.g.,
count(up) > 10000). Sonda's cardinality spikes
generate a controlled burst of unique label values to verify your cardinality-limiting
rules fire correctly.
```yaml
cardinality_spikes:
  - label: pod_name
    every: 30s
    for: 10s
    cardinality: 500
    strategy: counter
    prefix: "pod-"
```
During the 10-second spike window, each tick injects a pod_name label with one of 500 unique
values (pod-0 through pod-499), producing 500 distinct series. Outside the window the label
is absent and only one series is emitted. This on/off pattern lets you verify that alerts fire
during the spike and resolve after it ends.
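The label set the counter strategy produces can be sketched in a couple of lines (an illustration based on the description above, not Sonda's code):

```python
def counter_labels(cardinality, prefix="pod-"):
    """Label values the counter strategy cycles through during a spike."""
    return [f"{prefix}{i}" for i in range(cardinality)]

labels = counter_labels(500)
print(labels[0], labels[-1], len(set(labels)))  # -> pod-0 pod-499 500
```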
Docker stack required
This example pushes to VictoriaMetrics via http_push. Start the backend first:
`docker compose -f examples/docker-compose-victoriametrics.yml up -d`
For testing with recorded production data instead of synthetic patterns, use the replay generators.
Replay Generators¶
Sequence generator¶
The sequence generator steps through an explicit list of values, perfect for hand-crafted threshold patterns:
```yaml
generator:
  type: sequence
  values: [10, 10, 10, 10, 10, 95, 95, 95, 95, 95, 10, 10, 10, 10, 10, 10]
  repeat: true
```
With repeat: true, the pattern loops continuously. With repeat: false, the generator
holds the last value after the sequence ends.
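The repeat semantics described above can be modeled in a few lines (illustrative, not Sonda's implementation):

```python
def sequence_value(values, tick, repeat=True):
    """Loop through the list when repeat is true; otherwise hold the last value."""
    if repeat:
        return values[tick % len(values)]
    return values[min(tick, len(values) - 1)]

vals = [10, 95, 10]
print(sequence_value(vals, 4, repeat=True))   # -> 95 (wraps to index 1)
print(sequence_value(vals, 4, repeat=False))  # -> 10 (holds the last value)
```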
CSV replay generator¶
For replaying real production data, the csv_replay generator reads values from a CSV file:
```yaml
generator:
  type: csv_replay
  file: examples/sample-cpu-values.csv
  column: 1
  has_header: true
  repeat: true
```
| Parameter | Default | Description |
|---|---|---|
| `file` | (required) | Path to the CSV file |
| `column` | `0` | Zero-based column index containing numeric values. Mutually exclusive with `columns`. |
| `columns` | -- | Multi-column mode: list of `{index, name}` entries. Expands into one metric stream per entry. Mutually exclusive with `column`. See Generators. |
| `has_header` | `true` | Whether the first row is a header |
| `repeat` | `true` | Cycle back to the first value after reaching the end |
When to use csv_replay vs sequence
Use csv_replay over sequence when you have more than ~20 values. It keeps the YAML
clean and makes it easy to update the data by replacing the CSV file.
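Generating such a CSV takes only the standard library. A sketch that writes a 5-minute breach pattern in the layout the csv_replay example above expects (header row, numeric values in column 1; the filename is just an example):

```python
import csv

# A 5-minute breach at rate 1, framed by 2.5 minutes of quiet on each side
values = [10] * 150 + [95] * 300 + [10] * 150

with open("sample-cpu-values.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["tick", "cpu_usage"])  # has_header: true
    for i, v in enumerate(values):
        writer.writerow([i, v])             # numeric value lives in column 1

print(len(values))  # -> 600 data rows
```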
Full Docker Compose Example¶
For a complete end-to-end setup with VictoriaMetrics, Grafana, and sonda-server, use the included Docker Compose stack.
```bash
# Start the stack
docker compose -f examples/docker-compose-victoriametrics.yml up -d

# Push test data
sonda metrics --scenario examples/vm-push-scenario.yaml

# Verify the metric exists (wait ~15s for ingestion)
curl "http://localhost:8428/api/v1/query?query=cpu_usage"

# Open Grafana at http://localhost:3000
# Navigate to Dashboards > Sonda > Sonda Overview

# Tear down
docker compose -f examples/docker-compose-victoriametrics.yml down -v
```
| Service | Port | Purpose |
|---|---|---|
| sonda-server | 8080 | REST API for scenario management |
| VictoriaMetrics | 8428 | Time series database |
| vmagent | 8429 | Metrics relay agent |
| Grafana | 3000 | Dashboards (auto-provisioned) |
See Docker Deployment for the full stack configuration.
Close the loop with Alertmanager
This stack verifies that data arrives in VictoriaMetrics, but doesn't prove alerts fire. To add vmalert, Alertmanager, and a webhook receiver to the stack, see the Alerting Pipeline guide.
Quick Reference¶
| Scenario | Generator | Example File |
|---|---|---|
| Threshold crossing | `sine` | `sine-threshold-test.yaml` |
| Sustained breach | `constant` | `constant-threshold-test.yaml` |
| Alert resolution via gap | `constant` + `gaps` | `gap-alert-test.yaml` |
| Precise for: duration | `sequence` | `for-duration-test.yaml` |
| Compound alert | multi-scenario | `multi-metric-correlation.yaml` |
| Cardinality explosion | any + `cardinality_spikes` | `cardinality-alert-test.yaml` |
| Periodic spike / anomaly | `spike` | `spike-alert-test.yaml` |
| Incident replay (inline) | `sequence` | `sequence-alert-test.yaml` |
| Incident replay (file) | `csv_replay` | `csv-replay-metrics.yaml` |
| Push to VictoriaMetrics | any | `vm-push-scenario.yaml` |
| Remote write | any | `remote-write-vm.yaml` |
Next Steps¶
Verifying alerts fire end-to-end? See Alerting Pipeline to run vmalert, Alertmanager, and a webhook receiver with Docker Compose.
Validating alert rules in CI? See CI Alert Validation to catch broken rules before they reach production.
Validating a pipeline change? See Pipeline Validation.
Verifying recording rules? Check Recording Rules.
Browsing all example scenarios? See Example Scenarios.