Kubernetes¶
Run sonda-server in Kubernetes when you want a continuous synthetic-telemetry baseline running inside the cluster. The service emits known metrics through your stack at all times. A flat Grafana panel then means "the data stopped," not "the scrape config broke." It is the difference between guessing whether a quiet dashboard is a real outage and knowing it in one look.
When you finish the install below, you have a sonda-server pod running in the cluster. It is reachable at a stable Service DNS name: sonda.<namespace>.svc.cluster.local:8080. It accepts scenarios sent over HTTP and exposes their metrics for Prometheus to scrape. The bundled Helm chart produces exactly that: a Deployment with health probes, a ClusterIP Service with a named http port, and optional scenario injection through a ConfigMap.
Looking for the full walkthrough?
The Synthetic Monitoring guide is the end-to-end worked example for this use case. It covers starting a local cluster, deploying the chart, submitting long-running scenarios, scraping them with Prometheus, and building Grafana dashboards. This page is the chart reference: every value, probe setting, and auth option the guide uses.
Prerequisites¶
You need a running Kubernetes cluster and these CLI tools installed:
- kubectl -- configured to talk to your cluster
- helm -- v3.x
If you don't have a cluster yet, the Synthetic Monitoring guide covers lightweight local options: kind, k3d, minikube, and OrbStack. It includes step-by-step setup instructions.
Install the chart¶
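Install the release from a checkout of the repository with helm install sonda ./helm/sonda (the release name and chart path used by the other commands on this page), then check the pod with kubectl get pods and wait for it to become ready.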
You should see 1/1 Running within 15--20 seconds. The chart defaults to
ghcr.io/davidban77/sonda:1.14.0 (the chart's appVersion). Pin a different version with
--set image.tag=<version>.
Deploy to a dedicated namespace
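Keep Sonda isolated from your application workloads by installing the release into its own namespace, for example with Helm's --namespace sonda --create-namespace flags.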
All kubectl commands on this page assume the default namespace. Add -n sonda if you
installed into a different one.
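The install, the readiness wait, and the optional dedicated namespace can be combined into one small helper. This is a minimal sketch, not part of the chart: the function name install_sonda is hypothetical, the 60-second timeout is an arbitrary choice, and it assumes the release name sonda, the chart path ./helm/sonda, and the app.kubernetes.io/name=sonda pod label that appear elsewhere on this page.

```shell
# Hypothetical helper: install the chart and block until the release is ready.
# Usage: install_sonda [namespace]   (defaults to "default")
install_sonda() {
  namespace="${1:-default}"

  # --create-namespace only creates the namespace when it does not exist yet.
  # --wait makes Helm block until the Deployment's pods report Ready.
  helm install sonda ./helm/sonda \
    --namespace "$namespace" --create-namespace \
    --wait --timeout 60s || return 1

  # Show the pod status; assumes the chart labels pods app.kubernetes.io/name=sonda.
  kubectl get pods -l app.kubernetes.io/name=sonda --namespace "$namespace"
}
```

Call install_sonda for the default namespace, or install_sonda sonda to install into a dedicated sonda namespace.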
Chart values reference¶
The chart includes default values that work for most installs. Override any value with --set flags or a
-f values.yaml file.
Image¶
| Value | Default | Description |
|---|---|---|
| image.repository | ghcr.io/davidban77/sonda | Container image registry and name |
| image.tag | "" (uses appVersion: 1.14.0) | Image tag to pull |
| image.pullPolicy | IfNotPresent | Kubernetes image pull policy |
| imagePullSecrets | [] | Secrets for private registries |
Server¶
| Value | Default | Description |
|---|---|---|
| server.port | 8080 | Port sonda-server listens on inside the container |
| server.bind | 0.0.0.0 | Bind address |
Service¶
| Value | Default | Description |
|---|---|---|
| service.type | ClusterIP | Kubernetes Service type (ClusterIP, NodePort, LoadBalancer) |
| service.port | 8080 | Service port exposed to the cluster |
The Service exposes a named port called http. ServiceMonitor and Ingress
resources reference this name.
Resources¶
| Value | Default | Description |
|---|---|---|
| resources.requests.cpu | 100m | CPU request |
| resources.requests.memory | 128Mi | Memory request |
| resources.limits.cpu | 500m | CPU limit |
| resources.limits.memory | 256Mi | Memory limit |
These defaults are sized for light workloads: a handful of scenarios at moderate rates. If you run many concurrent scenarios or high event rates, increase the limits:
helm install sonda ./helm/sonda \
--set resources.requests.cpu=200m \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=512Mi
Security¶
| Value | Default | Description |
|---|---|---|
| podSecurityContext | {} | Pod-level security context (e.g., fsGroup) |
| securityContext | {} | Container-level security context (e.g., runAsNonRoot, readOnlyRootFilesystem, capabilities) |
Authentication¶
| Value | Default | Description |
|---|---|---|
| server.auth.enabled | false | Enable API key authentication on /scenarios/* endpoints |
| server.auth.existingSecret | "" | Name of an existing Secret containing the API key |
| server.auth.secretKey | api-key | Key within the Secret that holds the API key value |
When server.auth.enabled is true, the chart injects SONDA_API_KEY into the container
from the referenced Secret. See API key authentication for setup
instructions.
Scheduling¶
| Value | Default | Description |
|---|---|---|
| replicaCount | 1 | Number of Deployment replicas (ignored when HPA is enabled) |
| nodeSelector | {} | Node selector constraints |
| tolerations | [] | Pod tolerations |
| affinity | {} | Pod affinity/anti-affinity rules |
Autoscaling (HPA)¶
| Value | Default | Description |
|---|---|---|
| autoscaling.enabled | false | Enable HorizontalPodAutoscaler |
| autoscaling.minReplicas | 1 | Minimum replica count |
| autoscaling.maxReplicas | 5 | Maximum replica count |
| autoscaling.targetCPUUtilizationPercentage | 80 | Target CPU utilization |
| autoscaling.targetMemoryUtilizationPercentage | (unset) | Target memory utilization |
Pod Disruption Budget¶
| Value | Default | Description |
|---|---|---|
| podDisruptionBudget.enabled | false | Enable PodDisruptionBudget |
| podDisruptionBudget.minAvailable | 1 | Minimum available pods during disruption |
| podDisruptionBudget.maxUnavailable | (unset) | Maximum unavailable pods during disruption |
Ingress¶
| Value | Default | Description |
|---|---|---|
| ingress.enabled | false | Enable Ingress resource |
| ingress.className | "" | Ingress class name |
| ingress.annotations | {} | Ingress annotations |
| ingress.hosts | [{host: sonda.local, paths: [{path: /, pathType: Prefix}]}] | Ingress host rules |
| ingress.tls | [] | TLS configuration |
ServiceMonitor¶
A ServiceMonitor endpoint scrapes a single path. sonda-server exposes three Prometheus paths and they answer different questions — pick by the dashboard you are filling:
| Path | What it returns | When to scrape |
|---|---|---|
| /metrics | Server-process RED + saturation (sonda_server_* series). See Server metrics. | Operational dashboards and alerts on the server itself. |
| /scenarios/metrics | Aggregate scenario data fused into one response. Supports ?label=k:v filtering. See Aggregate Prometheus scrape. | A single scrape target covering every scenario, regardless of how it was launched. |
| /scenarios/<scenario-id>/metrics | One scenario's series. | Per-scenario debugging or a stable URL per scenario loaded from the scenarios ConfigMap. |
The chart defaults serviceMonitor.path to /metrics — the right choice for operational alerting on the server process. Switch it to /scenarios/metrics when you also want one scrape job to cover every scenario the server is running.
| Value | Default | Description |
|---|---|---|
| serviceMonitor.enabled | false | Enable Prometheus Operator ServiceMonitor |
| serviceMonitor.interval | 30s | Scrape interval |
| serviceMonitor.scrapeTimeout | 10s | Scrape timeout |
| serviceMonitor.path | /metrics | Metrics endpoint path. Switch to /scenarios/metrics for aggregate scenario scraping, or /scenarios/<scenario-id>/metrics for a single scenario. |
| serviceMonitor.additionalLabels | {} | Extra labels on the ServiceMonitor resource |
One ServiceMonitor per path
A ServiceMonitor endpoint scrapes a single path. To scrape both /metrics and /scenarios/metrics, deploy two ServiceMonitor resources (or one with two endpoints entries — see the manual ServiceMonitor example below).
Scenarios (ConfigMap)¶
| Value | Default | Description |
|---|---|---|
| scenarios | {} | Map of filename to YAML content, mounted at /scenarios |
See Configuring scenarios below.
Configuring scenarios¶
You can load scenarios into sonda-server two ways. Include them in the Helm release through a
ConfigMap, or submit them at runtime over the API.
ConfigMap (deploy-time)¶
Define scenarios under the scenarios key in a values file. Each key becomes a file mounted
at /scenarios inside the container:
scenarios:
  cpu-metrics.yaml: |
    name: cpu_usage
    rate: 100
    duration: 30s
    generator:
      type: sine
      amplitude: 50
      period_secs: 60
      offset: 50
    encoder:
      type: prometheus_text
    sink:
      type: stdout
The Deployment template includes a checksum/scenarios annotation. Changing scenario
content in your values file triggers an automatic pod rollout on helm upgrade.
You can place kind: composable metric pack YAMLs in the same scenarios map next to your runnable scenarios. When scenarios is populated, the chart points the server's --catalog flag at the mounted /scenarios directory. POST /scenarios bodies that reference a pack by name (pack: <name>) then resolve automatically. No extra configuration. See Pack references over HTTP for how that resolution works.
See Scenario Fields for the full YAML schema.
API (runtime)¶
Once sonda-server is running, you can submit scenarios at runtime without redeploying:
curl -X POST -H "Content-Type: text/yaml" \
--data-binary @examples/basic-metrics.yaml \
http://localhost:8080/scenarios
This is useful for long-running synthetic monitoring where you rotate scenarios over time. See Server API for the full endpoint reference and the Synthetic Monitoring guide for operational patterns.
Health probes¶
The Deployment configures both liveness and readiness probes against GET /health:
| Probe | Initial delay | Period | Timeout | Failure threshold |
|---|---|---|---|---|
| Liveness | 5s | 10s | 3s | 3 |
| Readiness | 2s | 5s | 3s | 3 |
The /health endpoint returns {"status":"ok"} with HTTP 200 when the server is running.
Pods restart automatically if the server becomes unresponsive.
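The same /health check the probes use can be polled from a workstation or a CI job before submitting scenarios. The helper below is a minimal sketch: the name wait_for_health, the retry count, and the one-second pause are assumptions, and the default URL presumes the server is reachable at http://localhost:8080 (for example through a port-forward).

```shell
# Hypothetical helper: poll GET /health until it reports {"status":"ok"}.
# Usage: wait_for_health [base-url] [attempts]
wait_for_health() {
  url="${1:-http://localhost:8080}/health"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time mirrors the 3s probe timeout.
    body="$(curl -fsS --max-time 3 "$url" 2>/dev/null)" || body=""
    case "$body" in
      *'"status":"ok"'*)
        echo "healthy: $url"
        return 0
        ;;
    esac
    i=$((i + 1))
    sleep 1
  done
  echo "not healthy after $attempts attempts: $url" >&2
  return 1
}
```

Running wait_for_health with no arguments checks the default URL up to ten times and returns a non-zero status if the server never answers with the expected body.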
Accessing the server¶
Port-forward¶
The quickest way to reach sonda-server from your workstation is a port-forward to the Service: kubectl port-forward svc/sonda 8080:8080.
Then interact with the API at http://localhost:8080:
# Health check
curl http://localhost:8080/health
# Start a scenario
curl -X POST -H "Content-Type: text/yaml" \
--data-binary @examples/basic-metrics.yaml \
http://localhost:8080/scenarios
# List running scenarios
curl http://localhost:8080/scenarios
In-cluster DNS¶
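Other pods in the cluster can reach sonda-server using the Service DNS name, sonda.<namespace>.svc.cluster.local:8080 (or the short form sonda:8080 from within the same namespace).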
For example, a Prometheus instance in the same namespace can scrape
http://sonda:8080/scenarios/<id>/metrics directly.
Prometheus scraping¶
sonda-server exposes three scrape endpoints. Each answers a different question:
| Endpoint | What it returns | When to scrape |
|---|---|---|
| GET /scenarios/{id}/metrics | One scenario's series | Per-scenario debugging |
| GET /scenarios/metrics | Aggregate scenario data | Per-process scenario view |
| GET /metrics | Server-process RED + saturation | Operational dashboards and alerts |
GET /scenarios/metrics and GET /scenarios/{id}/metrics are idempotent snapshots — one sample per (name, labels) series with no timestamp, like a node_exporter scrape. GET /metrics returns the server's own RED and saturation telemetry — see Server metrics for the nine series and the alerts that go with them. Most Prometheus setups want a job per endpoint; the aggregate /scenarios/metrics covers every scenario and you do not need to know scenario IDs in advance. See Aggregate Prometheus scrape for the ?label=k:v AND-filter syntax.
Aggregate scrape config¶
scrape_configs:
  - job_name: sonda
    scrape_interval: 15s
    metrics_path: /scenarios/metrics
    static_configs:
      - targets: ["sonda.default.svc:8080"]
Add params: {label: ["device:srl1"]} to scope a job to one device's metrics. Repeat the label value to AND-combine selectors.
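To preview what a filtered scrape job will receive, the same label selectors can be sent by hand as repeated label query parameters. The helper below is a minimal sketch that only builds the URL: the function name scenario_metrics_url is hypothetical, and the site:lab selector in the usage note is an invented example value.

```shell
# Hypothetical helper: build the aggregate scrape URL with AND-combined selectors.
# Usage: scenario_metrics_url <base-url> [key:value ...]
scenario_metrics_url() {
  base="$1"
  shift
  query=""
  for selector in "$@"; do
    # Each selector becomes its own label= parameter, joined with "&".
    query="${query}${query:+&}label=${selector}"
  done
  printf '%s/scenarios/metrics%s\n' "$base" "${query:+?$query}"
}
```

For example, curl "$(scenario_metrics_url http://localhost:8080 device:srl1 site:lab)" requests only series that carry both labels, assuming a port-forward is active.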
Per-scenario scrape config¶
When you want one scrape job per scenario, point metrics_path at the scenario ID:
scrape_configs:
  - job_name: sonda-id
    scrape_interval: 15s
    metrics_path: /scenarios/<SCENARIO_ID>/metrics
    static_configs:
      - targets: ["sonda.default.svc:8080"]
Replace <SCENARIO_ID> with the UUID returned by POST /scenarios.
ServiceMonitor¶
If you run the Prometheus Operator (typically through kube-prometheus-stack), the chart includes an optional ServiceMonitor template. Enable it by setting serviceMonitor.enabled=true, either with --set or in a values file.
See the ServiceMonitor values reference for all options
(interval, scrapeTimeout, path, additionalLabels).
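A values file keeps the ServiceMonitor settings together and under version control. The snippet below is a minimal sketch that writes one: the file name sonda-servicemonitor-values.yaml is arbitrary, and the release: prometheus label is an assumption that has to match whatever selector your Prometheus Operator uses.

```shell
# Write a values file that enables the chart's ServiceMonitor and points it
# at the aggregate scenario endpoint instead of the default /metrics.
cat > sonda-servicemonitor-values.yaml <<'EOF'
serviceMonitor:
  enabled: true
  path: /scenarios/metrics
  interval: 30s
  scrapeTimeout: 10s
  additionalLabels:
    release: prometheus  # assumption: must match your Operator's selector
EOF
```

Apply it with helm upgrade sonda ./helm/sonda -f sonda-servicemonitor-values.yaml.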
Manual ServiceMonitor¶
You can also apply a custom ServiceMonitor manually for full control. The example below scrapes the server's own RED metrics and the aggregate scenario view in one resource:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sonda
  labels:
    release: prometheus  # must match your Prometheus Operator's selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sonda
  endpoints:
    - port: http
      interval: 15s
      path: /metrics
    - port: http
      interval: 15s
      path: /scenarios/metrics
The port: http field matches the named port on the Sonda Service. Each endpoints entry scrapes a single path. For a per-scenario route, add one entry per scenario ID with path: /scenarios/<SCENARIO_ID>/metrics.
API key authentication¶
sonda-server supports optional bearer token authentication on /scenarios/*, /scenarios/metrics, /metrics, and
/events. When enabled, clients must include an Authorization: Bearer <key> header. The
/health endpoint stays public so liveness and readiness probes work without credentials.
For the full authentication behavior (error responses, protected vs. public endpoints), see the Server API Authentication section.
Create a Secret¶
Store your API key in a Kubernetes Secret:
apiVersion: v1
kind: Secret
metadata:
  name: sonda-api-key
type: Opaque
stringData:
  api-key: "your-secret-key-here"
Generate a random key
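One way to avoid a hand-typed key is to generate it from the system's random source and create the Secret in one step. This is a minimal sketch: both function names are hypothetical, the 32-byte key length is an arbitrary choice, and the Secret name and key mirror the manifest above.

```shell
# Hypothetical helper: print 32 random bytes as 64 lowercase hex characters.
generate_api_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: create the Secret with the name and key the chart
# is pointed at in the next section (sonda-api-key / api-key).
create_api_key_secret() {
  kubectl create secret generic sonda-api-key \
    --from-literal=api-key="$(generate_api_key)"
}
```

Run create_api_key_secret in the namespace where Sonda is installed; add -n <namespace> to the kubectl call if that is not your current namespace.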
Enable auth in the Helm chart¶
Point the chart at your Secret:
helm install sonda ./helm/sonda \
--set server.auth.enabled=true \
--set server.auth.existingSecret=sonda-api-key
Or in a values file:
server:
  auth:
    enabled: true
    existingSecret: sonda-api-key
    secretKey: api-key  # default; change if your Secret uses a different key
The chart sets SONDA_API_KEY in the container environment from the Secret. On startup you
will see:
INFO sonda_server: API key authentication enabled for /scenarios/*, /events, /metrics, and /scenarios/metrics endpoints
Authenticated API calls¶
Once auth is enabled, include the bearer token in all /scenarios/*, /scenarios/metrics, /metrics, and /events requests:
# Port-forward to reach the server
kubectl port-forward svc/sonda 8080:8080
# Start a scenario (requires auth)
curl -X POST \
-H "Authorization: Bearer your-secret-key-here" \
-H "Content-Type: text/yaml" \
--data-binary @examples/basic-metrics.yaml \
http://localhost:8080/scenarios
# Health check (always public)
curl http://localhost:8080/health
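To keep the key out of shell history, the token can be read back from the Secret on each call instead of being pasted into the command. The wrapper below is a minimal sketch: the function names sonda_api_key and sonda_curl are hypothetical, and it assumes the sonda-api-key Secret with the api-key key created earlier, plus a base64 that accepts -d.

```shell
# Hypothetical helper: read the API key back out of the Kubernetes Secret.
sonda_api_key() {
  kubectl get secret sonda-api-key -o jsonpath='{.data.api-key}' | base64 -d
}

# Hypothetical wrapper: curl with the bearer token header added.
# All arguments are passed through to curl unchanged.
sonda_curl() {
  curl -fsS -H "Authorization: Bearer $(sonda_api_key)" "$@"
}
```

With a port-forward running, sonda_curl http://localhost:8080/scenarios lists running scenarios without the key ever appearing on the command line.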
Prometheus scraping with auth¶
When authentication is enabled, every scrape endpoint requires a bearer token: /metrics, /scenarios/metrics, and /scenarios/{id}/metrics. In a plain Prometheus scrape config, supply the token on the scrape job, for example through bearer_token_file with a mounted Secret.
For a ServiceMonitor, add bearerTokenSecret to the endpoint:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sonda
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sonda
  endpoints:
    - port: http
      interval: 15s
      path: /metrics
      bearerTokenSecret:
        name: sonda-api-key
        key: api-key
Add a second endpoints entry with the same bearerTokenSecret to also scrape /scenarios/metrics or a per-scenario path. See Authentication on Server metrics for the full reference.
Same Secret, same namespace
The bearerTokenSecret must reference a Secret in the same namespace as the
ServiceMonitor resource, and the Prometheus Operator must be able to read it. If the ServiceMonitor lives in a different namespace from the Sonda Secret, copy the Secret or use
bearer_token_file with a mounted volume instead.
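For a Prometheus instance that is not managed by the Operator, the token can be read from a file mounted into the Prometheus pod. The snippet below is a minimal sketch that writes such a scrape job: the file name sonda-scrape.yaml and the mount path /etc/prometheus/secrets/sonda-api-key/api-key are assumptions, and it uses the authorization block, which recent Prometheus releases offer alongside bearer_token_file.

```shell
# Write a scrape job that authenticates against the aggregate endpoint.
cat > sonda-scrape.yaml <<'EOF'
scrape_configs:
  - job_name: sonda
    scrape_interval: 15s
    metrics_path: /scenarios/metrics
    authorization:
      type: Bearer
      # assumption: the Secret is mounted into the Prometheus pod at this path
      credentials_file: /etc/prometheus/secrets/sonda-api-key/api-key
    static_configs:
      - targets: ["sonda.default.svc:8080"]
EOF
```

Merge the job into your existing Prometheus configuration; the target matches the aggregate scrape config shown earlier on this page.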
Upgrading¶
Update your release after changing values or pulling a new chart version:
# Upgrade with new values
helm upgrade sonda ./helm/sonda -f my-values.yaml
# Upgrade to a new image version
helm upgrade sonda ./helm/sonda --set image.tag=1.14.0
If your values file includes scenarios, the ConfigMap checksum annotation triggers an
automatic pod rollout. No manual restart is needed.
Uninstalling¶
Running helm uninstall sonda removes the Deployment, Service, ConfigMap (if created), and all associated resources.
Add -n <namespace> if you installed into a non-default namespace.
What's next¶
- Synthetic Monitoring guide -- deploy Sonda on Kubernetes, submit long-running scenarios, scrape with Prometheus, and build Grafana dashboards
- Server API -- full endpoint reference for sonda-server
- Server metrics -- the nine /metrics series and the PromQL alerts that matter
- Docker -- Docker image and Compose stacks for local development
- Scenario Fields -- full YAML schema for scenario configuration