
Troubleshooting

When Sonda isn't behaving as expected, start here. This guide covers the most common issues and how to resolve them, organized from general diagnostics to specific sink and deployment problems.


First steps

Before diving into specific issues, run these quick checks.

Validate your configuration

Use --dry-run to parse and validate a scenario without emitting any events:

```shell
sonda --dry-run metrics --name cpu --rate 10 --duration 30s
sonda --dry-run run --scenario my-scenario.yaml
```

If the config is valid, Sonda prints the resolved settings and exits with code 0. If there's an error, it prints the problem to stderr and exits with code 1.

Get diagnostic output

Add --verbose to print the resolved configuration at startup while still running the scenario normally. This shows exactly what Sonda parsed before it starts emitting events:

```shell
sonda --verbose metrics --name cpu --rate 10 --duration 30s \
  --sink http_push --endpoint http://localhost:8428/api/v1/import/prometheus
```

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success -- scenario completed or --dry-run validation passed |
| 1 | Error -- invalid config, connection failure, or runtime error |
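In scripts or CI, these exit codes let you gate a run on validation. A minimal sketch; validate_scenario is a hypothetical stand-in so the sketch runs without Sonda installed -- replace its body with your real sonda --dry-run invocation:

```shell
# Sketch of acting on Sonda's exit codes in automation.
# validate_scenario is a hypothetical stand-in; swap the commented
# line in for the real validation call.
validate_scenario() {
  # sonda --dry-run run --scenario my-scenario.yaml
  return 0
}

validate_scenario
status=$?
if [ "$status" -eq 0 ]; then
  echo "exit 0: config valid, safe to run"
else
  echo "exit $status: fix the config before running" >&2
fi
```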

Connection and delivery issues

Connection refused

You configured a network sink but Sonda reports a connection error.

| Symptom | Likely cause | Fix |
|---|---|---|
| connection refused on HTTP/TCP sink | Backend is not running or not listening on the expected port | Verify the backend is up: curl -s http://host:port/health |
| connection refused on gRPC (OTLP) | Collector not running, or wrong port (HTTP vs gRPC) | OTLP gRPC uses port 4317, not 4318 (HTTP). Check collector status |
| DNS resolution failure | Hostname typo or DNS not configured | Test with dig or nslookup. Use an IP address to isolate DNS |
| Timeout with no error | Firewall blocking the port | Check firewall rules. Try nc -zv host port to test connectivity |

Tip

Test connectivity to your backend before running Sonda. A quick curl -s http://localhost:8428/health for VictoriaMetrics or curl -s http://localhost:3100/ready for Loki confirms the backend is reachable.

Data not appearing at the destination

Sonda runs without errors but you don't see data in your backend.

| Symptom | Likely cause | Fix |
|---|---|---|
| No data in VictoriaMetrics | Wrong endpoint path | Use /api/v1/import/prometheus for http_push, /api/v1/write for remote_write |
| No data in Prometheus | Remote write receiver not enabled | Start Prometheus with --web.enable-remote-write-receiver |
| Encoder/sink mismatch | Using the prometheus_text encoder with the remote_write sink (or vice versa) | Match encoder to sink: remote_write encoder with remote_write sink, otlp encoder with otlp_grpc sink |
| HTTP 400 Bad Request | Wrong content_type for the endpoint | Use text/plain for the VictoriaMetrics import endpoint |

Batching delays

Data arrives in chunks or only appears when the scenario ends.

| Symptom | Likely cause | Fix |
|---|---|---|
| Stdout output appears in bursts | Normal OS-level buffering (~8 KB) | Expected behavior. Data flushes when the buffer fills or the scenario ends |
| No HTTP POST until the scenario ends | Batch threshold not reached at low rates | Lower batch_size (e.g., 1024 for http_push) or increase the rate. See Sink Batching |
| Short scenario sends only one batch | Total data smaller than the batch threshold | All data flushes on exit. This is correct behavior for short runs |

Info

At 10 events/sec with http_push at the default 64 KiB threshold, roughly 650 events (~65 seconds) must accumulate before the first POST. For faster feedback during development, set batch_size: 1024 or lower.
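The arithmetic behind that estimate can be sketched directly. The ~100-byte average encoded event size is an assumption for illustration -- it varies with the encoder and label count:

```shell
# Rough time-to-first-POST math for the http_push sink.
THRESHOLD=$((64 * 1024))   # default 64 KiB batch threshold, in bytes
EVENT_SIZE=100             # assumed average encoded event size (bytes)
RATE=10                    # events per second

EVENTS_PER_BATCH=$((THRESHOLD / EVENT_SIZE))
SECONDS_TO_FIRST_POST=$((EVENTS_PER_BATCH / RATE))
echo "first POST after ~${EVENTS_PER_BATCH} events (~${SECONDS_TO_FIRST_POST}s at ${RATE}/s)"
```

Plugging in your own event size and rate shows whether a delayed first POST is batching or an actual delivery problem.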


Sink-specific issues

Loki

| Symptom | Likely cause | Fix |
|---|---|---|
| 400 Bad Request from Loki | Label names contain invalid characters | Loki labels must match [a-zA-Z_][a-zA-Z0-9_]*. Avoid dots, dashes, or spaces in label keys |
| Logs rejected by multi-tenant Loki | Missing tenant header | Add X-Scope-OrgID via custom headers on an http_push sink, or use the default tenant if Loki is in single-tenant mode |
| No logs visible in Grafana | Wrong label selector in Explore | Check that your Grafana query matches the labels you set in the scenario |
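You can check label keys against that pattern before a run, e.g. with grep. The label names below are illustrative; service.name and host-id fail because of the dot and dash:

```shell
# Validate candidate label keys against Loki's naming rule.
for label in service_name service.name host-id _env; do
  if printf '%s\n' "$label" | grep -Eq '^[a-zA-Z_][a-zA-Z0-9_]*$'; then
    echo "ok:  $label"
  else
    echo "bad: $label"
  fi
done
```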

Tip

Sonda sends logs to {url}/loki/api/v1/push. You only configure the base URL (e.g., http://localhost:3100), not the full push path.
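As a sketch, a Loki sink configured with just the base URL might look like the fragment below. The base-URL rule comes from this page, but the field names themselves are assumptions -- verify them against the Sinks reference:

```yaml
# Illustrative loki sink -- exact field names may differ.
sink:
  type: loki
  url: http://localhost:3100   # base URL only; Sonda appends /loki/api/v1/push
```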

Kafka

| Symptom | Likely cause | Fix |
|---|---|---|
| Broker connection timeout | Wrong broker address or port | Verify the broker is reachable: nc -zv broker-host 9092. Check for TLS port (9093) vs plaintext (9092) |
| UnknownTopicOrPartition | Topic doesn't exist and auto-creation is off | Set auto.create.topics.enable=true on the broker, or create the topic before running Sonda |
| Authentication failure with SASL | Wrong mechanism, username, or password | Double-check that sasl.mechanism matches your broker config. Confluent Cloud uses PLAIN; AWS MSK uses SCRAM-SHA-256 |
| Data sent but unreadable | Consumer expects a different encoding | Ensure the consumer's deserializer matches Sonda's encoder (e.g., prometheus_text produces plain text) |

Warning

SASL credentials are sent in plaintext if TLS is not enabled. Sonda warns about this at startup, but always enable tls.enabled: true alongside SASL in production.
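A sketch of SASL paired with TLS on a kafka sink. The tls.enabled and sasl.mechanism keys appear on this page; the surrounding field names are assumptions, so check them against the Sinks reference:

```yaml
# Illustrative kafka sink with SASL/SCRAM over TLS -- field names
# other than tls.enabled and sasl.mechanism are assumed.
sink:
  type: kafka
  brokers: broker-host:9093    # TLS port, not 9092 (plaintext)
  topic: sonda-events
  tls:
    enabled: true              # always pair TLS with SASL in production
  sasl:
    mechanism: SCRAM-SHA-256
    username: sonda
    password: ${KAFKA_PASSWORD}
```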

Remote write

| Symptom | Likely cause | Fix |
|---|---|---|
| HTTP 400 from backend | Wrong endpoint URL for the backend | Each backend has a specific path. See the compatible endpoints table |
| HTTP 403 or 401 | Backend requires authentication headers | Use an http_push sink with custom auth headers instead of the remote_write sink |

Common remote write URLs:

| Backend | URL |
|---|---|
| VictoriaMetrics | http://host:8428/api/v1/write |
| Prometheus | http://host:9090/api/v1/write |
| Cortex / Mimir | http://host:9009/api/v1/push |
| Thanos Receive | http://host:19291/api/v1/receive |
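For backends that require authentication, the workaround is an http_push sink pointed at the same URL with custom headers. A sketch; the headers field name and token variable are illustrative:

```yaml
# Illustrative http_push sink with auth headers for a remote write
# endpoint -- field names are assumptions.
sink:
  type: http_push
  endpoint: http://host:8428/api/v1/write
  headers:
    Authorization: Bearer ${TOKEN}
```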

OTLP gRPC

| Symptom | Likely cause | Fix |
|---|---|---|
| gRPC INVALID_ARGUMENT | Signal type mismatch between encoder and sink | Set signal_type in the sink to match your scenario: metrics for metric scenarios, logs for log scenarios |
| Connection refused on port 4318 | Using the HTTP port instead of gRPC | OTLP gRPC uses port 4317. Port 4318 is for OTLP HTTP |
| UNAUTHENTICATED | Collector requires an auth token | Configure the collector to accept unauthenticated connections, or use an http_push sink with auth headers instead |
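A minimal otlp_grpc sink sketch showing the signal_type setting from the table; signal_type comes from this page, the other field names are assumed:

```yaml
# Illustrative otlp_grpc sink -- signal_type must match the scenario.
sink:
  type: otlp_grpc
  endpoint: http://localhost:4317   # gRPC port, not 4318 (HTTP)
  signal_type: metrics              # use logs for log scenarios
```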

Resource issues

High memory usage

| Symptom | Likely cause | Fix |
|---|---|---|
| Memory grows during cardinality spikes | Each unique label combination creates a new series in memory | Reduce cardinality in the spike config, or use shorter for windows |
| Memory spikes during CSV replay | Large CSV file loaded into memory | Use smaller CSV files, or split large files into chunks |
| Steady memory growth over long runs | Large label sets with many static labels | Reduce the number of labels per metric. Each label adds memory per series |

Info

Sonda's baseline memory footprint is roughly 5 MB. Memory scales with the number of unique series being generated simultaneously. For sizing guidance, see Capacity Planning -- Performance baselines.
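A back-of-the-envelope sizing check. The ~1 KiB per-series cost is an assumption for illustration only; use the measured numbers from Capacity Planning for real sizing:

```shell
# Rough memory estimate: baseline footprint + per-series cost.
BASELINE_MB=5           # Sonda's approximate baseline footprint
SERIES=10000            # unique label combinations in the scenario
BYTES_PER_SERIES=1024   # assumed per-series cost (illustrative)

SERIES_MB=$((SERIES * BYTES_PER_SERIES / 1024 / 1024))
TOTAL_MB=$((BASELINE_MB + SERIES_MB))
echo "estimated footprint: ~${TOTAL_MB} MB for ${SERIES} series"
```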


Configuration mistakes

YAML parsing errors

| Symptom | Likely cause | Fix |
|---|---|---|
| invalid type error on a numeric field | Value is quoted as a string in YAML (e.g., rate: "10") | Remove quotes from numeric fields: rate: 10 |
| unknown field error | Typo in a field name, or field placed at the wrong nesting level | Check indentation. labels goes at the scenario level, not inside sink |
| missing field error | Required field omitted | Run sonda --dry-run to see which field is missing |
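The first two rows as a before/after fragment. The labels placement follows this page; the surrounding fields are illustrative:

```yaml
# Wrong: quoted value parses as a string; labels nested inside sink
rate: "10"
sink:
  labels:
    env: dev

# Right: unquoted numeric; labels at the scenario level
rate: 10
labels:
  env: dev
sink:
  type: stdout
```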

Feature flag errors

Some sinks and encoders require Cargo feature flags when building from source. Pre-built release binaries include all features.

| Feature | Required for | Build command |
|---|---|---|
| http | http_push and loki sinks | cargo build --features http -p sonda |
| remote-write | remote_write encoder and sink | cargo build --features remote-write -p sonda |
| otlp | otlp encoder, otlp_grpc sink | cargo build --features otlp -p sonda |
| kafka | kafka sink | cargo build --features kafka -p sonda |

Tip

Build with all features at once: cargo build --features http,remote-write,otlp,kafka -p sonda


Container and signal handling

Sonda flushes all buffered data on clean shutdown (SIGTERM or SIGINT). If the process is killed with SIGKILL, any data still in the buffer is lost.

| Symptom | Likely cause | Fix |
|---|---|---|
| Partial data loss in Docker | Container stopped with docker kill (sends SIGKILL) | Use docker stop instead, which sends SIGTERM and waits for graceful shutdown |
| Data loss in Kubernetes | Pod killed before flush completes | Set terminationGracePeriodSeconds to at least 5 seconds in your pod spec |
| No data flushed on Ctrl+C in a script | Script traps signals before Sonda receives them | Ensure SIGTERM/SIGINT propagate to the Sonda process |
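One way to keep signals flowing from a wrapper script is trap plus wait (or simply exec sonda ..., which replaces the shell entirely when the wrapper does nothing after the run). In the sketch below, sleep stands in for the sonda invocation so the pattern is runnable without Sonda installed:

```shell
#!/bin/sh
# Forward SIGTERM/SIGINT from a wrapper script to the child process
# so it gets a chance to flush. `sleep 1` stands in for sonda here.
sleep 1 &
CHILD=$!
trap 'kill -TERM "$CHILD" 2>/dev/null' TERM INT
wait "$CHILD"
echo "child exited with status $?"
```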

SIGKILL bypasses flush

kill -9 (SIGKILL) terminates Sonda immediately with no chance to flush buffered data. Always use kill (SIGTERM) or Ctrl+C (SIGINT) for a clean shutdown.

Kubernetes: ensure graceful shutdown

```yaml
spec:
  terminationGracePeriodSeconds: 10
  containers:
    - name: sonda
      image: ghcr.io/davidban77/sonda:latest
```

Docker Compose: default stop signal is SIGTERM (correct)

```yaml
services:
  sonda:
    image: ghcr.io/davidban77/sonda:latest
    # docker compose stop sends SIGTERM by default -- no special config needed
    stop_grace_period: 10s
```

Related pages:

  • Sinks -- sink types, parameters, and retry configuration
  • Sink Batching -- how batching affects data delivery
  • CLI Reference -- all flags for --dry-run, --verbose, and sink options
  • Capacity Planning -- performance baselines and infrastructure sizing