Synthetic Monitoring¶
Your dashboards look great -- until the data source goes quiet and you stare at flat lines wondering if it's a real outage or a broken scrape config. Long-running synthetic monitoring gives you a persistent baseline of known metrics flowing through your stack, so you can tell "no data" from "data stopped arriving" at a glance.
This guide walks you through deploying sonda-server on Kubernetes, submitting scenarios that
run for hours or days, scraping the generated metrics with Prometheus, and building Grafana
dashboards to monitor both the synthetic data and Sonda itself.
What you need:
- A Kubernetes cluster (local or remote)
- `kubectl` and `helm` CLI tools installed
- `curl` and `jq` for API calls
- Familiarity with Prometheus scraping and Grafana dashboards
Set up a local Kubernetes cluster¶
If you already have a cluster (EKS, GKE, AKS, or an existing local one), skip to Deploy sonda-server.
For local development and testing, you need a lightweight Kubernetes distribution that runs on your workstation. Here are the most practical options:
| Tool | Best for | Runs on |
|---|---|---|
| kind | CI pipelines, fast throwaway clusters | Linux, macOS, Windows (WSL2) |
| k3d | k3s in Docker, built-in registry support | Linux, macOS, Windows (WSL2) |
| minikube | Broad driver support, add-on ecosystem | Linux, macOS, Windows (WSL2) |
| OrbStack | Native macOS experience, low resource usage | macOS only |
All four require Docker (or a compatible container runtime) installed and running.
kind runs Kubernetes nodes as Docker containers. It starts in under 30 seconds and is the lightest option.
```shell
# Install (macOS/Linux)
brew install kind

# Or download the binary directly
# https://kind.sigs.k8s.io/docs/user/quick-start/#installation

# Create a cluster
kind create cluster --name sonda-lab

# Verify
kubectl cluster-info --context kind-sonda-lab
```
Port mapping for kind
kind clusters don't expose container ports to the host by default. If you need NodePort access (for Prometheus or Grafana outside the cluster), create the cluster with a config:
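A minimal sketch of such a config (the NodePort numbers here are examples -- match them to the NodePorts your Prometheus and Grafana Services actually use):

```yaml
# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 30090   # example Prometheus NodePort
        hostPort: 30090
      - containerPort: 30030   # example Grafana NodePort
        hostPort: 30030
```

Create the cluster with `kind create cluster --name sonda-lab --config kind-config.yaml`.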
k3d wraps k3s (Rancher's lightweight Kubernetes) inside Docker. It supports built-in port mapping and a local image registry out of the box.
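A quick-start sketch (the host port mapping is illustrative):

```shell
# Install (macOS/Linux)
brew install k3d

# Create a cluster, mapping a host port through the built-in load balancer
k3d cluster create sonda-lab --port "8080:80@loadbalancer"

# Verify (k3d prefixes context names with "k3d-")
kubectl cluster-info --context k3d-sonda-lab
```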
minikube is the most established option. It supports Docker, Hyperkit, Hyper-V, and other drivers.
```shell
# Install (macOS/Linux)
brew install minikube

# Start with the Docker driver (recommended)
minikube start --driver=docker --profile sonda-lab

# Verify
kubectl cluster-info --context sonda-lab
```
Windows WSL2
On Windows, install minikube inside your WSL2 distribution and use the Docker driver. Make sure Docker Desktop's WSL2 backend is enabled. The same commands apply inside the WSL2 terminal.
OrbStack provides a native macOS Kubernetes experience with minimal resource usage. It runs a single-node k8s cluster that starts automatically.
Once your cluster is running and `kubectl get nodes` shows a Ready node, you're set.
Deploy sonda-server¶
Sonda includes a Helm chart that deploys sonda-server as a Kubernetes Deployment with health
probes, a ClusterIP Service, and optional scenario injection via ConfigMap.
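Install the chart from the repository root (release name `sonda`, chart path as used throughout this guide):

```shell
helm install sonda ./helm/sonda
```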
Wait for the pod to become ready:
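For example, with `kubectl wait` (assuming the chart sets the standard `app.kubernetes.io/name=sonda` label, as in the ServiceMonitor example later in this guide):

```shell
kubectl wait pod -l app.kubernetes.io/name=sonda \
  --for=condition=Ready --timeout=60s
```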
You should see 1/1 Running within 15--20 seconds. The Deployment configures liveness and
readiness probes against GET /health, so Kubernetes restarts the pod automatically if the
server becomes unresponsive.
Customizing the deployment
Override common settings with `--set`:

```shell
# Pin a specific image version
helm install sonda ./helm/sonda --set image.tag=0.4.0

# Custom port and resource limits
helm install sonda ./helm/sonda \
  --set server.port=9090 \
  --set resources.requests.cpu=200m \
  --set resources.limits.memory=512Mi
```
See Kubernetes deployment for the full chart reference.
Verify the server is healthy by port-forwarding to it:
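For example (assuming the release exposes a Service named `sonda` on port 8080, matching the scrape target used later in this guide):

```shell
kubectl port-forward svc/sonda 8080:8080 &
curl -s http://localhost:8080/health
```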
Now let's submit some long-running scenarios.
Submit long-running scenarios¶
A long-running scenario is simply a scenario YAML without a duration field. It runs
indefinitely until you stop it with DELETE /scenarios/{id}.
```yaml
name: continuous_cpu
rate: 10
generator:
  type: sine
  amplitude: 50.0
  period_secs: 60
  offset: 50.0
labels:
  instance: api-server-01
  job: sonda
encoder:
  type: prometheus_text
sink:
  type: stdout
```
Submit it to the server:
```shell
ID=$(curl -s -X POST -H "Content-Type: text/yaml" \
  --data-binary @examples/long-running-metrics.yaml \
  http://localhost:8080/scenarios | jq -r '.id')
echo "Scenario started: $ID"
```
The scenario runs in a background thread inside the server. Submit as many as you need -- each gets its own thread and scrape endpoint.
Multiple scenarios for richer coverage
Submit several scenarios with different shapes to simulate a realistic environment:
a sine wave for CPU, a step counter for requests, a constant for an up gauge.
Each scenario gets its own /scenarios/{id}/metrics endpoint that Prometheus can
scrape independently.
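A sketch of submitting several scenario files at once (the file glob is illustrative):

```shell
# Submit every scenario file in a directory and print the assigned IDs
for f in examples/*.yaml; do
  curl -s -X POST -H "Content-Type: text/yaml" \
    --data-binary "@$f" http://localhost:8080/scenarios | jq -r '.id'
done
```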
To verify it's running:
```shell
# List all running scenarios
curl -s http://localhost:8080/scenarios | jq '.[] | {id, name, status}'

# Check live stats for your scenario
curl -s http://localhost:8080/scenarios/$ID/stats | jq .
```
For the full API reference, see Server API.
Scrape metrics with Prometheus¶
Each running scenario exposes its metrics at GET /scenarios/{id}/metrics in Prometheus text
exposition format. You can point Prometheus (or any compatible scraper like vmagent) at this
endpoint.
Static scrape config¶
If you know the scenario ID ahead of time, configure a static scrape job:
```yaml
scrape_configs:
  - job_name: sonda
    scrape_interval: 15s
    metrics_path: /scenarios/<SCENARIO_ID>/metrics
    static_configs:
      - targets: ["sonda.default.svc:8080"]
```
Replace <SCENARIO_ID> with the UUID returned by POST /scenarios. The target address uses
the Kubernetes Service DNS name (sonda.<namespace>.svc).
Prometheus ServiceMonitor¶
If you run the Prometheus Operator (kube-prometheus-stack),
you can create a ServiceMonitor to auto-discover sonda-server. The Sonda Helm chart does
not include a ServiceMonitor template today, so create one manually:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sonda
  labels:
    release: prometheus  # must match your Prometheus Operator's selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sonda
  endpoints:
    - port: http
      interval: 15s
      path: /scenarios/<SCENARIO_ID>/metrics
```
One path per ServiceMonitor endpoint
Each ServiceMonitor endpoint scrapes a single path. If you have multiple running
scenarios, you need one endpoints entry per scenario ID (each with a different
path). For dynamic discovery, consider using a relabeling rule or a script that
queries GET /scenarios and updates the scrape config.
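A sketch of the script approach (assumes `jq`; the in-cluster URL is a placeholder). It queries the scenario list and prints one `endpoints` entry per running scenario:

```shell
# Hypothetical helper: emit one ServiceMonitor endpoints entry per
# running scenario. Assumes jq is installed; SONDA_URL is a placeholder
# for the in-cluster service address.
emit_endpoints() {
  curl -s "${SONDA_URL:-http://sonda.default.svc:8080}/scenarios" |
    jq -r '.[].id' |
    while read -r id; do
      # Relative YAML indentation; splice under your ServiceMonitor's
      # spec.endpoints list before applying.
      printf -- '- port: http\n  interval: 15s\n  path: /scenarios/%s/metrics\n' "$id"
    done
}
```

You would then patch the generated list into the ServiceMonitor (for example via `kubectl patch` or a templating step) whenever scenarios change.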
Using vmagent instead of Prometheus
vmagent supports the same scrape_configs format. Point it at sonda-server using
a standard static scrape config. If you're already running the
VictoriaMetrics Docker Compose stack,
add sonda-server as a scrape target in the vmagent config.
Build Grafana dashboards¶
Once Prometheus is scraping your synthetic metrics, you can visualize them in Grafana.
Sonda ships with a Sonda Overview dashboard (docker/grafana/dashboards/sonda-overview.json)
that shows metric values, event rates, and gap/burst indicators. You can import it directly
into any Grafana instance connected to a Prometheus-compatible datasource.
Import the shipped dashboard¶
- Open Grafana and go to Dashboards > Import.
- Upload `docker/grafana/dashboards/sonda-overview.json` or paste its contents.
- Select your Prometheus datasource when prompted.
- The dashboard uses template variables `$datasource` and `$job` -- set `$job` to `sonda` (or whatever `job` label your scenarios use).
Build a custom panel¶
For a focused monitoring panel, create a new dashboard with a time series visualization and query your synthetic metric directly:
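For the sine-wave scenario above, the panel query is just the metric selector:

```promql
continuous_cpu{job="sonda"}
```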
Add a second panel showing the emission rate over time:
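One option, assuming you also run a counter-style scenario (the metric name `continuous_requests_total` here is hypothetical), is a `rate()` query:

```promql
rate(continuous_requests_total{job="sonda"}[1m])
```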
Threshold lines
Add a fixed threshold line in the Grafana panel options (e.g., at 90 for a CPU alert threshold). This gives you a visual reference for when the sine wave crosses your alert boundary.
With dashboards in place, you can see your synthetic data flowing at a glance. Next, let's make sure Sonda itself stays healthy.
Monitor sonda-server health¶
The stats API tells you whether each scenario is emitting as expected. Poll it periodically or build monitoring around it.
Health endpoint¶
The simplest check -- Kubernetes already uses this for liveness and readiness probes:
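With the server port-forwarded to localhost:

```shell
curl -s http://localhost:8080/health
```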
Per-scenario stats¶
The /scenarios/{id}/stats endpoint returns live stats including event counts, current
emission rate, bytes emitted, error counts, and gap/burst state:
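For example:

```shell
curl -s http://localhost:8080/scenarios/$ID/stats | jq .
```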
Key fields to watch:
| Field | What it tells you |
|---|---|
| `total_events` | Running count of emitted events. Should increase steadily. |
| `current_rate` | Actual emission rate. Compare against your scenario's `rate`. |
| `errors` | Error count. Should be 0 for healthy scenarios. |
| `uptime` | Time since the scenario started. Confirms it hasn't restarted. |
List all scenarios¶
Check that all your submitted scenarios are still running:
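For example:

```shell
curl -s http://localhost:8080/scenarios | jq '.[] | {id, name, status}'
```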
If a scenario shows status: "stopped" unexpectedly, re-submit it.
Scripting a health check
Wrap the stats check in a simple script that alerts you if events stop flowing:
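A minimal sketch (assumes `jq`; `SONDA_URL` and the helper names are placeholders). It polls `total_events` twice and alerts if the count has not advanced:

```shell
#!/usr/bin/env bash
# Hypothetical watchdog sketch: alert if a scenario stops emitting.
SONDA_URL="${SONDA_URL:-http://localhost:8080}"

# Fetch the current total_events for a scenario ID
events() {
  curl -s "$SONDA_URL/scenarios/$1/stats" | jq -r '.total_events'
}

# Compare two polls, INTERVAL seconds apart; non-zero exit means stalled
check_scenario() {
  local id="$1" interval="${2:-30}"
  local before after
  before=$(events "$id")
  sleep "$interval"
  after=$(events "$id")
  if [ "$after" -le "$before" ]; then
    echo "ALERT: scenario $id stopped emitting ($before -> $after)" >&2
    return 1
  fi
  echo "OK: $id emitted $((after - before)) events in ${interval}s"
}

# Usage: ./watchdog.sh <scenario-id> [interval-seconds]
if [ -n "${1:-}" ]; then
  check_scenario "$@"
fi
```

Run it from cron or a Kubernetes CronJob and route a non-zero exit to your notification channel of choice.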
Rotate scenarios¶
Test patterns change over time. You might start with a sine wave to validate dashboards, then switch to a sequence generator to test alert thresholds. Scenario rotation is straightforward: stop the old scenario and start a new one.
Stop and replace¶
```shell
# Stop the running scenario
curl -s -X DELETE http://localhost:8080/scenarios/$ID | jq .
# {"id":"...","status":"stopped","total_events":12345}

# Submit a new scenario
NEW_ID=$(curl -s -X POST -H "Content-Type: text/yaml" \
  --data-binary @examples/sequence-alert-test.yaml \
  http://localhost:8080/scenarios | jq -r '.id')
echo "New scenario: $NEW_ID"
```
Scrape config update required
When you replace a scenario, the new scenario gets a different UUID. If your Prometheus
scrape config uses the scenario ID in the metrics_path, you need to update it to
point at the new ID.
Scripted rotation¶
For scheduled rotations (e.g., different patterns during business hours vs. overnight), wrap the stop-and-start sequence in a cron job or Kubernetes CronJob:
```shell
#!/bin/bash
SONDA_URL="http://localhost:8080"
SCENARIO_FILE="$1"

# Stop all running scenarios
for id in $(curl -s "$SONDA_URL/scenarios" | jq -r '.[].id'); do
  curl -s -X DELETE "$SONDA_URL/scenarios/$id" > /dev/null
done

# Start the new scenario
curl -s -X POST -H "Content-Type: text/yaml" \
  --data-binary "@$SCENARIO_FILE" \
  "$SONDA_URL/scenarios" | jq .
```
Alert on Sonda itself¶
Synthetic monitoring is only useful if you know when it breaks. If Sonda stops emitting, your dashboards go silent, and you need to distinguish "Sonda died" from "real outage."
Detect missing synthetic data¶
Create an alert rule that fires when your synthetic metric disappears. This uses the
absent() function in PromQL:
```yaml
groups:
  - name: sonda-watchdog
    interval: 30s
    rules:
      - alert: SondaSyntheticDataMissing
        expr: absent(continuous_cpu{job="sonda"})
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Synthetic monitoring data missing"
          description: >
            The metric continuous_cpu from Sonda has not been seen for 2 minutes.
            Either sonda-server is down or the scenario has stopped.
```
This fires if continuous_cpu{job="sonda"} hasn't been scraped for 2 minutes. Adjust the
for: duration based on your scrape interval and tolerance for gaps.
Monitor the pod itself¶
Since sonda-server runs as a Kubernetes Deployment with health probes, standard kube-state-metrics alerts cover pod-level failures:
```yaml
- alert: SondaPodNotReady
  expr: kube_pod_status_ready{pod=~"sonda.*", condition="true"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Sonda pod is not ready"
```
Layer your alerting¶
A robust setup uses both layers:
| Layer | What it catches | Alert |
|---|---|---|
| Pod health | Server crash, OOM kill, image pull failure | SondaPodNotReady |
| Metric presence | Scenario stopped, scrape misconfigured, data pipeline broken | SondaSyntheticDataMissing |
The pod alert fires fast (infrastructure issue). The metric-absent alert fires when the data pipeline is broken anywhere between Sonda and Prometheus -- which is exactly the kind of problem synthetic monitoring exists to catch.
Testing these alerts with Sonda
You can validate these watchdog rules using the same patterns from the
Alert Testing and Alerting Pipeline guides.
Submit a scenario, verify the alert stays silent, then DELETE the scenario and watch
the absent() alert fire.
Quick reference¶
| Task | Command |
|---|---|
| Deploy sonda-server | helm install sonda ./helm/sonda |
| Submit a scenario | curl -X POST -H "Content-Type: text/yaml" --data-binary @scenario.yaml http://localhost:8080/scenarios |
| List running scenarios | curl http://localhost:8080/scenarios |
| Check scenario stats | curl http://localhost:8080/scenarios/<id>/stats |
| Scrape metrics | curl http://localhost:8080/scenarios/<id>/metrics |
| Stop a scenario | curl -X DELETE http://localhost:8080/scenarios/<id> |
| Health check | curl http://localhost:8080/health |
Related pages:
- Kubernetes deployment -- Helm chart values and configuration
- Server API -- full endpoint reference
- Alert Testing -- generator patterns for alert threshold testing
- Alerting Pipeline -- end-to-end alerting with vmalert and Alertmanager