Generators
A generator produces values for each tick of a scenario. For metrics, it produces f64 values. For logs, it produces structured log events. You select a generator with the generator.type field.
Which generator?
Pick the row that matches what you are simulating. The first table covers the eight core metric generators. The second covers operational aliases — shortcuts for the same engine that let you write flap instead of sequence with hand-aligned values.
| Generator | Use case | Shape | Key fields |
|---|---|---|---|
| `constant` | Up/down indicators, recording-rule baselines | Flat horizontal line | `value` |
| `sine` | CPU, latency, cyclical load | Smooth oscillation around a midpoint | `amplitude`, `period_secs`, `offset` |
| `sawtooth` | Counter wraps, queue fill cycles | Linear ramp that resets at period end | `min`, `max`, `period_secs` |
| `uniform` | Jitter, random-load streams | Random values drawn each tick | `min`, `max`, `seed` |
| `sequence` | Exact `for:` durations, scripted timelines | Whatever you list, tick by tick | `values`, `repeat` |
| `step` | `rate()` and `increase()` testing | Monotonic counter that increases each tick | `start`, `step_size`, optional `max` |
| `spike` | Anomaly detection, threshold alerts | Baseline with periodic outlier bursts | `baseline`, `magnitude`, `interval_secs` |
| `csv_replay` | Bit-for-bit incident reproduction | The recorded values, at the recorded cadence | `file`, `timescale`, `default_metric_name` |
For logs, choose template for synthesized messages with field pools or csv_replay for replaying a structured CSV of real log events at the recorded cadence. For latency distributions, see the histogram and summary generators.
| Alias | Operational meaning | Shape | Key fields |
|---|---|---|---|
| `steady` | Healthy "everything is fine" baseline | Sine + jitter around a center | `center`, `amplitude`, `noise` |
| `flap` | Interface or health-check toggling | Binary up/down with per-state durations | `up_duration`, `down_duration` |
| `saturation` | Buffer or disk filling, then resetting | Ramp from baseline to ceiling, repeats | `baseline`, `ceiling`, `time_to_saturate` |
| `leak` | Memory leak, unreleased connections | One-way ramp toward ceiling, no reset | `baseline`, `ceiling`, `time_to_ceiling` |
| `degradation` | Latency increasing, throughput dropping | One-way ramp with noise on top | `baseline`, `ceiling`, `time_to_degrade` |
| `spike_event` | CPU spikes, request surges, error floods | Baseline with periodic bursts | `baseline`, `spike_height`, `spike_interval` |
Aliases and core generators are interchangeable
Aliases are shortcuts. At parse time, each alias is translated into the core generators above. Use the alias when the operational meaning is clearer; use the core generator when you need a parameter the alias does not expose (for example, negative spike magnitude for dip testing).
Metric generators
constant
Returns the same value on every tick. Use it for baseline testing or known-value verification (for example, recording rule validation).

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `value` | float | yes | -- | The fixed value emitted on every tick. |

Shape: A flat horizontal line at the configured value.
```yaml
version: 2
kind: runnable
defaults:
  rate: 2
  duration: 2s
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - id: up
    signal_type: metrics
    name: up
    generator:
      type: constant
      value: 1.0
```
When no generator is configured, the default is constant with value: 0.0.
sine
Produces a sine wave that oscillates between offset - amplitude and offset + amplitude.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `amplitude` | float | yes | -- | Half the peak-to-peak swing. |
| `period_secs` | float | yes | -- | Duration of one full cycle in seconds. |
| `offset` | float | yes | -- | Vertical midpoint of the wave. |

Shape: Oscillates smoothly between offset - amplitude and offset + amplitude, completing one cycle every period_secs seconds. With amplitude: 50.0 and offset: 50.0, as in the example below, that is 0 to 100. At tick 0, the value equals the offset.
```yaml
version: 2
kind: runnable
defaults:
  rate: 2
  duration: 2s
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - id: cpu
    signal_type: metrics
    name: cpu
    generator:
      type: sine
      amplitude: 50.0
      period_secs: 4
      offset: 50.0
```
sawtooth
Ramps linearly from min to max and resets to min at the start of each period.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `min` | float | yes | -- | Value at the start of each period. |
| `max` | float | yes | -- | Value approached at the end (never reached). |
| `period_secs` | float | yes | -- | Duration of one full ramp in seconds. |

Shape: A linear ramp from min toward max over period_secs seconds, then a reset to min. With the example values below, that is a ramp from 0 toward 100 every 4 seconds.
```yaml
version: 2
kind: runnable
defaults:
  rate: 2
  duration: 2s
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - id: ramp
    signal_type: metrics
    name: ramp
    generator:
      type: sawtooth
      min: 0.0
      max: 100.0
      period_secs: 4
```
uniform
Produces uniformly distributed random values in the range [min, max]. Deterministic when seeded.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `min` | float | yes | -- | Lower bound (inclusive). |
| `max` | float | yes | -- | Upper bound (inclusive). |
| `seed` | integer | no | `0` | RNG seed for deterministic replay. |

Shape: Random values scattered between 10 and 90. The same seed produces the same sequence.
```yaml
version: 2
kind: runnable
defaults:
  rate: 2
  duration: 2s
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - id: noise
    signal_type: metrics
    name: noise
    generator:
      type: uniform
      min: 10.0
      max: 90.0
      seed: 42
```
noise 69.32519030174588 1774279698726
noise 68.2543018631486 1774279699231
noise 27.068700996215277 1774279699731
sequence
Steps through an explicit list of values. Use it to model specific incident patterns such as threshold crossings.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `values` | list of floats | yes | -- | The ordered values to step through. Must not be empty. |
| `repeat` | boolean | no | `true` | When true, cycles back to the start. When false, holds the last value. |

```yaml
generator:
  type: sequence
  values: [10, 10, 10, 10, 10, 95, 95, 95, 95, 95, 10, 10, 10, 10, 10, 10]
  repeat: true
```
Shape: Steps through the list one value per tick. With repeat: true, wraps around after the last value. With repeat: false, the last value is emitted for every subsequent tick.
cpu_spike_test{instance="server-01",job="node"} 10 1774279704026
cpu_spike_test{instance="server-01",job="node"} 10 1774279705031
cpu_spike_test{instance="server-01",job="node"} 10 1774279706031
cpu_spike_test{instance="server-01",job="node"} 10 1774279707031
cpu_spike_test{instance="server-01",job="node"} 10 1774279708031
cpu_spike_test{instance="server-01",job="node"} 95 1774279709031
step
Produces a monotonically increasing counter value: start + tick * step_size. With max set, the value wraps using modular arithmetic to simulate a counter reset. This is the standard generator for testing PromQL rate() and increase() queries.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `start` | float | no | `0.0` | Initial value at tick 0. |
| `step_size` | float | yes | -- | Increment applied per tick. |
| `max` | float | no | none | Wrap-around threshold. When set and greater than start, the value resets to start once it reaches max. |

Shape: A linear ramp from start, increasing by step_size each tick. Without max, it grows without bound. With max, it wraps back to start when it reaches the threshold.
request_count{instance="web-01",job="app"} 0 1775192670938
request_count{instance="web-01",job="app"} 1 1775192671439
request_count{instance="web-01",job="app"} 2 1775192671939
request_count{instance="web-01",job="app"} 3 1775192672443
request_count{instance="web-01",job="app"} 4 1775192672943
Simulating counter resets
Set max to a low value to see the wrap-around behavior. For example, start: 0, step_size: 1, max: 5 produces 0, 1, 2, 3, 4, 0, 1, 2, .... This is useful for verifying that your rate() queries handle counter resets correctly. Prometheus treats the drop from max-1 back to start as a counter reset, so rate() and increase() stitch across the wrap correctly. No data is lost or double-counted.
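
A minimal generator fragment for the wrap-around case described above might look like the following. It is a sketch that uses only the fields from the parameter table, meant to sit inside a scenario entry like the ones shown for the other generators.

```yaml
generator:
  type: step
  start: 0.0
  step_size: 1.0
  max: 5.0   # wraps back to start on reaching 5: 0, 1, 2, 3, 4, 0, 1, 2, ...
```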
step defaults to metric_type: counter
When scraped through sonda-server, a step scenario appears as a # TYPE <name> counter metric by default. Every other metric generator defaults to gauge. Override either default by setting metric_type: on the scenario.
spike
Outputs a constant baseline value with periodic spikes. During a spike window the value is baseline + magnitude; outside the window the value is baseline. Use it to test alert thresholds and anomaly detection rules that trigger on sudden value changes.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `baseline` | float | yes | -- | The normal output value between spikes. |
| `magnitude` | float | yes | -- | The amount added to baseline during a spike. Negative values create dips below baseline. |
| `duration_secs` | float | yes | -- | How long each spike lasts in seconds. |
| `interval_secs` | float | yes | -- | Time between spike starts in seconds. Must be greater than 0. |

```yaml
generator:
  type: spike
  baseline: 50.0
  magnitude: 200.0
  duration_secs: 10
  interval_secs: 60
```
Shape: Holds at 50 for most of the 60-second cycle, then jumps to 250 for 10 seconds.
cpu_spike_test{instance="server-01",job="node"} 250 1775195158883
cpu_spike_test{instance="server-01",job="node"} 250 1775195159888
cpu_spike_test{instance="server-01",job="node"} 250 1775195160888
cpu_spike_test{instance="server-01",job="node"} 250 1775195161888
cpu_spike_test{instance="server-01",job="node"} 250 1775195162888
Negative magnitude for dip testing
Set magnitude to a negative value to create periodic dips below the baseline. For example, baseline: 100.0 with magnitude: -50.0 produces values that drop from 100 to 50 during the spike window. This is useful for testing low-threshold alerts.
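
As a sketch, the dip configuration described above could be written like this. The baseline and magnitude come from the text; the duration_secs and interval_secs values are illustrative assumptions.

```yaml
generator:
  type: spike
  baseline: 100.0
  magnitude: -50.0     # negative: value drops to 100 + (-50) = 50 during the window
  duration_secs: 10    # illustrative
  interval_secs: 60    # illustrative
```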
csv_replay
Replays numeric values from a CSV file. Use it to reproduce real production metric patterns captured from monitoring systems, including Grafana CSV exports with embedded labels. The replay rate is derived from the CSV's column-0 timestamps and the optional timescale multiplier, so a 5-minute incident plays back over 5 minutes without manual tuning. For a step-by-step walkthrough of the Grafana export workflow, see the Grafana CSV Replay guide.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | string | yes | -- | Path to the CSV file. |
| `columns` | list | no | -- | Explicit column specs. Each entry: {index, name} with optional labels. When absent, columns are auto-discovered from the header. |
| `repeat` | boolean | no | `true` | When true, cycles back to the start. When false, holds the last value. |
| `timescale` | float | no | `1.0` | Replay speed multiplier. 2.0 plays 2x faster, 0.5 plays 2x slower. Must be strictly positive. |
| `default_metric_name` | string | no | -- | Fallback metric name for auto-discovered columns whose header has labels but no __name__. Suffixed with _<column_index> when several columns share the fallback. |

Header rows are auto-detected. If any non-time field on the first data line is non-numeric, the line is treated as a header and skipped.
When columns is omitted, Sonda reads the CSV header and auto-discovers column names and labels. If the CSV has no header (an all-numeric first row), you must provide explicit columns.
Scenario rate: is overridden for csv_replay
For csv_replay, the scenario's rate: is always replaced by timescale / median_delta_t, where median_delta_t is the median interval between consecutive timestamps in column 0 of the CSV. Setting rate: in YAML has no effect on emission cadence. Run sonda --verbose --dry-run to confirm the derived rate or inspect the startup banner. Use timescale: to speed up or slow down replay.
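
The rate formula above can be worked through on a small example. This sketch assumes a hypothetical CSV whose rows are 10 seconds apart; the file name and the spacing are illustrative, not taken from a real capture.

```yaml
generator:
  type: csv_replay
  file: incident.csv   # hypothetical path
  # Assumption: consecutive timestamps in column 0 are 10 seconds apart,
  # so median_delta_t = 10.
  #   timescale: 1.0 -> rate = 1.0 / 10 = 0.1 ticks per second (real-time replay)
  #   timescale: 2.0 -> rate = 2.0 / 10 = 0.2 ticks per second (twice as fast)
  timescale: 2.0
```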
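When columns is absent, Sonda reads the header row and creates one metric stream per data column. This works with both plain headers and Grafana-style label-aware headers. To choose the columns and metric names yourself, list them under columns, as in the following example.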
```yaml
version: 2
kind: runnable
defaults:
  rate: 1
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - signal_type: metrics
    name: ignored_when_columns_set # each column entry provides its own metric name
    generator:
      type: csv_replay
      file: examples/sample-multi-column.csv
      columns:
        - index: 1
          name: cpu_percent
        - index: 2
          name: mem_percent
        - index: 3
          name: disk_io_mbps
    labels:
      instance: prod-server-42
      job: node
```
This expands into three independent metric streams — cpu_percent, mem_percent, and disk_io_mbps — that share the same labels, rate, sink, and other scenario fields.
Each column entry can carry its own labels map. Per-column labels are merged with scenario-level labels, and column labels win on key conflict.
```yaml
scenarios:
  - signal_type: metrics
    name: system_metrics
    generator:
      type: csv_replay
      file: examples/sample-multi-column.csv
      columns:
        - index: 1
          name: cpu_percent
          labels:
            core: "0"
        - index: 2
          name: mem_percent
          labels:
            type: physical
        - index: 3
          name: disk_io_mbps
    labels:
      instance: prod-server-42
      job: node
```
cpu_percent gets {core="0", instance="prod-server-42", job="node"}. disk_io_mbps gets only the scenario-level labels.
Shape: Follows the exact pattern recorded in the CSV file. The values are replayed verbatim, one per tick.
Note
The CSV file path is relative to the working directory where you run sonda, not relative to the scenario file.
Supported header formats for auto-discovery
Sonda recognizes five column header formats:
| Format | Example | Metric name | Labels |
|---|---|---|---|
| `__name__` inside braces | `{__name__="up", job="prom"}` | `up` | `job` |
| Name before braces | `up{job="prom"}` | `up` | `job` |
| Labels only | `{job="prom"}` | from `default_metric_name` | `job` |
| Plain name | `cpu_percent` | `cpu_percent` | none |
| Simple word | `prometheus` | `prometheus` | none |

Formats 1 and 2 are produced by Grafana. Format 3 (labels only, no metric name) is supported through the default_metric_name field on the generator. See the Grafana CSV Replay guide.
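
For the labels-only case, a generator fragment along these lines would rely on auto-discovery and supply the fallback name. This is a sketch: the file path and the fallback metric name are hypothetical.

```yaml
generator:
  type: csv_replay
  file: grafana-export.csv         # hypothetical path
  default_metric_name: node_load   # hypothetical fallback name
  # No columns: block, so columns are auto-discovered from the header.
  # A header cell such as {job="prom"} has labels but no metric name,
  # so the stream takes its name from default_metric_name and keeps the job label.
```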
Operational aliases
Writing type: sawtooth with min, max, and period_secs works, but you have to translate operational behaviour into mathematical parameters. Operational aliases let you describe what is happening — a memory leak, a flapping interface, a healthy baseline — and Sonda translates that into the right generator with reasonable defaults.
Aliases are shortcuts. At config load time, each alias is translated into a concrete generator (and optionally jitter settings). The runtime never sees aliases; everything runs through the same generator engine. Every existing generator type still works unchanged.
| Alias | Translates to | Operational meaning |
|---|---|---|
| `steady` | `sine` + jitter | Normal healthy oscillation |
| `flap` | `sequence` | Binary up/down toggle (interface flapping) |
| `saturation` | `sawtooth` | Resource filling up then resetting |
| `leak` | `sawtooth` | One-way resource growth (no reset) |
| `degradation` | `sawtooth` + jitter | Gradual performance loss with noise |
| `spike_event` | `spike` | Periodic anomalous bursts |

steady
Models a healthy "everything is fine" signal. Values oscillate gently around a center point with slight noise — the kind of metric you see on a server under normal load.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `center` | float | `50.0` | Midpoint of the oscillation. |
| `amplitude` | float | `10.0` | Half the peak-to-peak swing. |
| `period` | duration | `"60s"` | Duration of one full cycle. |
| `noise` | float | `1.0` | Jitter amplitude (uniform noise in [-noise, +noise]). |
| `noise_seed` | integer | `0` | Seed for deterministic noise. |

```yaml
generator:
  type: steady
  center: 75.0
  amplitude: 10.0
  period: "60s"
  noise: 2.0
  noise_seed: 7
```
Values oscillate between 63 and 87 (75 ± 10, plus up to ± 2 noise) on a 60-second cycle.
Generate a starter file
sonda new walks through signal type → generator → rate → duration → sink and writes a ready-to-run YAML with the steady alias filled in.
flap
Models a binary signal toggling between two states — an interface going up and down, a service alternating between healthy and unhealthy.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `up_duration` | duration | `"10s"` | How long the signal stays "up" per cycle. |
| `down_duration` | duration | `"5s"` | How long the signal stays "down" per cycle. |
| `up_value` | float | `1.0` | Value emitted during the "up" state. |
| `down_value` | float | `0.0` | Value emitted during the "down" state. |
| `enum` | string | unset | Domain shortcut for (up_value, down_value). See the table below. Mutually exclusive with explicit up_value/down_value. |

With the default durations at rate: 1, flap produces 10 ticks of 1.0 followed by 5 ticks of 0.0, then repeats. The number of ticks per state is derived from the duration and the scenario rate.
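
The tick counts can be read off a fragment that spells out the default durations. This is a sketch; the rate: 2 figures in the comments assume the tick count scales as duration times rate, which the text implies but does not state as a formula.

```yaml
generator:
  type: flap
  up_duration: "10s"    # rate: 1 -> 10 ticks up; rate: 2 -> 20 ticks up (assumed: duration * rate)
  down_duration: "5s"   # rate: 1 -> 5 ticks down; rate: 2 -> 10 ticks down
```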
enum: shortcut
For metrics that operators read directly, use the enum: shortcut over hand-tuned values. It selects a (up_value, down_value) pair aligned with gNMI / openconfig conventions, so dashboards and alert rules built around standard state codes (UP=1, DOWN=2, ESTABLISHED=6, IDLE=1) keep working unchanged. enum: oper_state is the recommended starting point for any interface-state or operational-status metric.
| `enum:` value | `up_value` | `down_value` | Use case |
|---|---|---|---|
| `boolean` | 1.0 | 0.0 | Generic boolean: explicit synonym for the default (up_value, down_value) pair |
| `link_state` | 1.0 | 0.0 | Synonym of boolean |
| `oper_state` | 1.0 | 2.0 | gNMI / openconfig oper-state (UP=1, DOWN=2) |
| `admin_state` | 1.0 | 2.0 | gNMI / openconfig admin-state (UP=1, DOWN=2) |
| `neighbor_state` | 6.0 | 1.0 | BGP neighbor-state (ESTABLISHED=6, IDLE=1) |

```yaml
generator:
  type: flap
  up_duration: "60s"
  down_duration: "30s"
  enum: oper_state # up_value=1.0, down_value=2.0 -- no need to spell them out
```
enum: is mutually exclusive with explicit up_value / down_value. Combining them is rejected when the scenario is loaded, with the message flap: 'enum' is mutually exclusive with explicit 'up_value'/'down_value' — pick one.
Custom values
When the metric does not match a documented enum, use explicit up_value and down_value instead. For example, a link that alternates between full speed and degraded throughput:
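
A sketch of such a configuration follows. The two throughput values are hypothetical and stand in for whatever units your metric uses; the durations are illustrative.

```yaml
generator:
  type: flap
  up_duration: "60s"
  down_duration: "30s"
  up_value: 10000.0    # hypothetical: full link speed
  down_value: 1000.0   # hypothetical: degraded throughput
```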
saturation
Models a resource that fills up and resets on a repeating cycle — disk usage growing between log rotations, a buffer draining when a consumer catches up.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseline` | float | `0.0` | Value at the start of each cycle. |
| `ceiling` | float | `100.0` | Maximum value before reset. |
| `time_to_saturate` | duration | `"5m"` | Duration of one fill cycle. |

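
A fragment built from the fields in the table might look like this. It is a sketch: the baseline and ceiling are chosen to match the 20-to-95 ramp described next, and time_to_saturate is left at its documented default.

```yaml
generator:
  type: saturation
  baseline: 20.0
  ceiling: 95.0
  time_to_saturate: "5m"
```
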
Values ramp linearly from 20 to 95 over 5 minutes, then reset to 20 and repeat.
leak
Models a resource growing toward a ceiling without ever resetting — a memory leak, a connection pool that never releases, a queue that fills but never drains.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseline` | float | `0.0` | Starting resource level. |
| `ceiling` | float | `100.0` | Target ceiling value. |
| `time_to_ceiling` | duration | `"10m"` | Time to grow from baseline to ceiling. |

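
A fragment built from the fields in the table might look like this. It is a sketch: the values are chosen to match the 40-to-95 ramp over 120 seconds described next.

```yaml
generator:
  type: leak
  baseline: 40.0
  ceiling: 95.0
  time_to_ceiling: "120s"   # keep the scenario duration at or below this value
```
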
Values ramp linearly from 40 to 95 over 120 seconds with no reset.
time_to_ceiling must be >= duration
If you set a scenario duration and time_to_ceiling is shorter, Sonda rejects the config with an error. A leak that resets mid-run is the saturation pattern. Use that alias instead if you want repeating fill-and-reset cycles.
degradation
Models gradual performance loss with realistic noise — latency increasing over time, error rates rising, throughput dropping.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseline` | float | `0.0` | Starting performance level. |
| `ceiling` | float | `100.0` | Worst-case performance level. |
| `time_to_degrade` | duration | `"5m"` | Duration of the degradation ramp. |
| `noise` | float | `1.0` | Jitter amplitude added as noise. |
| `noise_seed` | integer | `0` | Seed for deterministic noise. |

```yaml
generator:
  type: degradation
  baseline: 0.05
  ceiling: 0.5
  time_to_degrade: "60s"
  noise: 0.02
  noise_seed: 42
```
Values ramp from 50ms to 500ms over 60 seconds with ± 20ms of noise on each tick.
spike_event
Models periodic anomalous bursts above a baseline — CPU spikes, sudden request surges, momentary error floods.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseline` | float | `0.0` | Normal value between spikes. |
| `spike_height` | float | `100.0` | Amount added to baseline during a spike. |
| `spike_duration` | duration | `"10s"` | How long each spike lasts. |
| `spike_interval` | duration | `"30s"` | Time between spike starts. |

```yaml
generator:
  type: spike_event
  baseline: 35.0
  spike_height: 60.0
  spike_duration: "10s"
  spike_interval: "30s"
```
Values hold at 35 between spikes, then jump to 95 (35 + 60) for 10 seconds every 30 seconds.
Aliases vs. core generators
You can use the underlying generator directly if you need parameters the alias does not expose. For example, spike_event does not expose the spike generator's magnitude parameter (which supports negative values for dip testing). In that case, use type: spike with magnitude: -50.0 directly.
Aliases and core generators are interchangeable in any scenario file. Mix them freely.
Histogram and summary generators
The metric generators above produce a single number per tick — one value, one time series, one line. That works for counters ("how many requests?") and gauges ("what is the CPU usage?"), but it cannot answer distribution questions: "how fast are requests?" or "what latency do 99% of users experience?"
That is the problem histograms and summaries solve. Instead of recording a single value, they observe many individual measurements (for example, request durations). They then produce multiple time series per tick that describe the shape of those measurements: where the values cluster, how they spread, and where the tail ends.
Another way to see it: a counter tells you how many requests happened. A histogram tells you how long each one took, broken down into ranges so you can compute percentiles.
How real systems work
When you instrument an HTTP handler with a histogram in a Prometheus client library, every request duration is observed into the histogram. The client does not store each individual duration. Instead, it maintains cumulative counters for predefined bucket boundaries (for example, "how many requests took 100ms or less?"). Prometheus scrapes these counters, and you use histogram_quantile() to estimate percentiles from the bucket distribution.
Sonda's histogram generator does the same thing. It samples synthetic observations from a distribution, updates cumulative bucket counters, and emits the result in Prometheus format. The output looks the same as a real instrumented service.
Histograms and summaries are scenario entries with signal_type: histogram or signal_type: summary and a distribution: block in place of the metric generator: block. Run them with sonda run like any other scenario. For a hands-on walkthrough of testing latency alerts, see the Histograms, Summaries, and Latency Alerts guide.
histogram
A histogram answers the question: "what is the distribution of observed values?" It does this by sorting observations into buckets — ranges with upper boundaries you define. Each bucket counts how many observations fell at or below that boundary.
For a metric named http_request_duration_seconds with buckets at 0.1, 0.25, and 0.5, each tick produces something like:
http_request_duration_seconds_bucket{le="0.1"} 60 # 60 requests were <= 100ms
http_request_duration_seconds_bucket{le="0.25"} 85 # 85 requests were <= 250ms
http_request_duration_seconds_bucket{le="0.5"} 97 # 97 requests were <= 500ms
http_request_duration_seconds_bucket{le="+Inf"} 100 # all 100 requests
http_request_duration_seconds_count 100 # total observations
http_request_duration_seconds_sum 15.2 # total seconds across all requests
Buckets are cumulative — the le="0.25" count includes every observation that is also in le="0.1". They are also counters, so they only increase over time. This is what makes rate() and histogram_quantile() work: Prometheus computes per-second rates from the counter deltas, then interpolates between bucket boundaries to estimate any percentile you ask for.
Choosing bucket boundaries
Bucket boundaries determine the resolution of your percentile estimates. If your SLO is "p99 latency under 500ms" but you have no bucket boundary near 500ms, the estimate is coarse. The default Prometheus buckets (0.005 to 10.0) work for general HTTP latency. For tighter SLOs, add boundaries near your threshold:
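
As an illustration, a histogram entry tuned for a 500ms SLO might cluster boundaries around 0.5. This is a sketch: the bucket list is hypothetical, and placing buckets at the scenario-entry level is inferred from the parameter table in this section.

```yaml
scenarios:
  - signal_type: histogram
    name: http_request_duration_seconds
    distribution:
      type: exponential
      rate: 10.0
    # Hypothetical boundaries: extra resolution on both sides of the 0.5s threshold.
    buckets: [0.05, 0.1, 0.25, 0.4, 0.45, 0.5, 0.55, 0.6, 1.0, 2.5]
```
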
More buckets means more time series (one per bucket per label combination), so there is a cardinality tradeoff. For most services, 10-15 buckets is a reasonable starting point.
Each tick, the generator samples observations_per_tick values from a configurable distribution, updates cumulative bucket counters, and emits one line per bucket plus +Inf, _count, and _sum. Bucket counts never decrease; they follow counter semantics.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | string | yes | -- | Base metric name. Sonda appends _bucket, _count, _sum automatically. |
| `rate` | float | yes | -- | Ticks per second. Each tick produces one full histogram sample. |
| `duration` | string | no | runs forever | Total run time. |
| `distribution` | object | yes | -- | Observation distribution model. See Distribution models. |
| `buckets` | list of floats | no | Prometheus defaults | Sorted upper boundaries. Default: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]. |
| `observations_per_tick` | integer | no | `100` | Number of observations sampled per tick. |
| `mean_shift_per_sec` | float | no | `0.0` | Linear drift applied to the distribution center per second. Simulates latency degradation. |
| `seed` | integer | no | `0` | RNG seed for deterministic output. The same seed produces the same bucket counts. |
| `labels` | map | no | none | Static labels attached to every series. |
| `encoder` | object | no | `prometheus_text` | Output format. |
| `sink` | object | no | `stdout` | Output destination. |

```yaml
version: 2
kind: runnable
defaults:
  rate: 1
  duration: 10s
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - signal_type: histogram
    name: http_request_duration_seconds
    distribution:
      type: exponential
      rate: 10.0
    observations_per_tick: 100
    seed: 42
    labels:
      method: GET
      handler: /api/v1/query
```
Shape: N+3 time series per tick (N bucket boundaries + +Inf + _count + _sum). With default buckets, that is 14 series per tick. Every bucket counter is cumulative and increases across ticks.
http_request_duration_seconds_bucket{handler="/api/v1/query",le="0.005",method="GET"} 3 1775409497421
http_request_duration_seconds_bucket{handler="/api/v1/query",le="0.01",method="GET"} 11 1775409497421
http_request_duration_seconds_bucket{handler="/api/v1/query",le="0.025",method="GET"} 26 1775409497421
...
http_request_duration_seconds_bucket{handler="/api/v1/query",le="+Inf",method="GET"} 100 1775409497421
http_request_duration_seconds_count{handler="/api/v1/query",method="GET"} 100 1775409497421
http_request_duration_seconds_sum{handler="/api/v1/query",method="GET"} 9.505 1775409497421
Simulating latency degradation
Set mean_shift_per_sec to a positive value to make the distribution center move higher over time. More observations land in higher buckets, percentile estimates rise, and latency alerts eventually trigger. See the alert testing walkthrough for a complete example.
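
A rough worked example of the drift follows. It is a sketch that assumes the shift is applied as mean + mean_shift_per_sec * elapsed_seconds, which is how "linear drift per second" reads; the values are illustrative.

```yaml
scenarios:
  - signal_type: histogram
    name: http_request_duration_seconds
    distribution:
      type: normal
      mean: 0.1
      stddev: 0.02
    mean_shift_per_sec: 0.005   # center moves up by 5ms every second
    # Under the assumption above:
    #   t = 0s  -> center 0.1
    #   t = 60s -> center 0.1 + 0.005 * 60 = 0.4
```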
summary
Where a histogram stores raw bucket counts and lets Prometheus estimate percentiles server-side, a summary does the math upfront. It computes the actual percentile values on the client and reports them directly. The p50 is 98ms. The p99 is 148ms. No estimation, no bucket interpolation.
The tradeoff is flexibility. With a histogram, you can compute any percentile after the fact from the stored buckets. With a summary, you only get the specific quantiles you configured. And critically, you cannot aggregate summary quantiles across instances — averaging the p99 of ten pods does not give you the fleet-wide p99. If you need cross-instance percentiles (and in Kubernetes, you almost always do), use histograms.
Each tick, the generator samples observations, sorts them, and computes quantile values using the nearest-rank method.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | string | yes | -- | Base metric name. Sonda appends _count, _sum for those series. |
| `rate` | float | yes | -- | Ticks per second. |
| `duration` | string | no | runs forever | Total run time. |
| `distribution` | object | yes | -- | Observation distribution model. See Distribution models. |
| `quantiles` | list of floats | no | `[0.5, 0.9, 0.95, 0.99]` | Quantile targets in (0, 1). |
| `observations_per_tick` | integer | no | `100` | Number of observations sampled per tick. |
| `mean_shift_per_sec` | float | no | `0.0` | Linear drift applied to the distribution center per second. |
| `seed` | integer | no | `0` | RNG seed for deterministic output. |
| `labels` | map | no | none | Static labels attached to every series. |
| `encoder` | object | no | `prometheus_text` | Output format. |
| `sink` | object | no | `stdout` | Output destination. |

```yaml
version: 2
kind: runnable
defaults:
  rate: 1
  duration: 10s
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - signal_type: summary
    name: rpc_duration_seconds
    distribution:
      type: normal
      mean: 0.1
      stddev: 0.02
    observations_per_tick: 100
    seed: 42
    labels:
      service: auth
      method: GetUser
```
Shape: Q+2 time series per tick (Q quantile targets + _count + _sum). With default quantiles, that is 6 series per tick. Quantile values are fresh per-tick snapshots computed from that tick's observations. _count and _sum are cumulative.
rpc_duration_seconds{method="GetUser",quantile="0.5",service="auth"} 0.098 1775409507904
rpc_duration_seconds{method="GetUser",quantile="0.9",service="auth"} 0.128 1775409507904
rpc_duration_seconds{method="GetUser",quantile="0.95",service="auth"} 0.136 1775409507904
rpc_duration_seconds{method="GetUser",quantile="0.99",service="auth"} 0.148 1775409507904
rpc_duration_seconds_count{method="GetUser",service="auth"} 100 1775409507904
rpc_duration_seconds_sum{method="GetUser",service="auth"} 9.802 1775409507904
Summaries are not aggregatable
You cannot combine quantile values across several instances. If you need percentiles across a fleet, use histograms instead. histogram_quantile() works on summed bucket counters.
Distribution models
Both histogram and summary generators require a distribution block that controls how observations are sampled. The distribution you choose determines the shape of the data — whether observations cluster tightly around a center, skew toward fast values with a long tail, or spread evenly across a range.
Pick the distribution that matches the real-world metric you are simulating. For HTTP request latency, exponential is almost always the right choice: most requests are fast, but some take much longer. For RPC durations in a healthy service with predictable behavior, normal gives you a symmetric bell curve. Uniform is mainly useful for stress-testing bucket boundaries, since real metrics rarely distribute evenly.
| Distribution | YAML type | Parameters | Typical use |
|---|---|---|---|
| Exponential | `exponential` | `rate` (lambda; mean = 1/rate) | Request latency with long tail |
| Normal | `normal` | `mean`, `stddev` | Symmetric metrics (RPC duration) |
| Uniform | `uniform` | `min`, `max` | Even spread for bucket boundary testing |

Exponential models latency where most requests are fast but some have long tails. With rate: 10.0, the mean is 1/rate = 0.1s.
Normal gives a symmetric bell curve centered at mean. It suits metrics with consistent spread.
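
The three distribution blocks could be written as follows, one per YAML document. This is a sketch: the exponential and normal values mirror the histogram and summary examples on this page, and the uniform bounds are hypothetical.

```yaml
# Exponential: mean = 1 / rate = 0.1s
distribution:
  type: exponential
  rate: 10.0
---
# Normal: bell curve centered at 0.1s
distribution:
  type: normal
  mean: 0.1
  stddev: 0.02
---
# Uniform: even spread between the bounds (hypothetical values)
distribution:
  type: uniform
  min: 0.0
  max: 1.0
```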
Log generators
Log generators produce structured log events instead of numeric values. They live on a signal_type: logs entry under the log_generator: key (not generator:).
template
Generates log events from message templates with randomized field values.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `templates` | list | yes | -- | One or more template entries (round-robin selection). |
| `templates[].message` | string | yes | -- | Message template. Use {field} for placeholders. |
| `templates[].field_pools` | map | no | `{}` | Maps placeholder names to value lists. |
| `severity_weights` | map | no | info only | Severity distribution. Keys: trace, debug, info, warn, error, fatal. |
| `seed` | integer | no | `0` | RNG seed for deterministic field and severity selection. |

log_generator:
  type: template
  templates:
    - message: "Request from {ip} to {endpoint} returned {status}"
      field_pools:
        ip: ["10.0.0.1", "10.0.0.2"]
        endpoint: ["/api", "/health"]
        status: ["200", "404", "500"]
  severity_weights:
    info: 0.7
    warn: 0.2
    error: 0.1
  seed: 42
Templates are selected round-robin by tick. Placeholders are resolved by picking randomly from the corresponding field pool.
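The selection rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not Sonda's implementation: the `render_event` helper, the second template, and the use of Python's `random` module are all assumptions, so the exact values drawn for a given seed will not match Sonda's output.

```python
import random

def render_event(tick, templates, severity_weights, rng):
    """Build one log event for a tick (illustrative only, not Sonda's code)."""
    template = templates[tick % len(templates)]      # round-robin by tick
    values = {
        name: rng.choice(pool)                       # random pick per placeholder
        for name, pool in template.get("field_pools", {}).items()
    }
    severities = list(severity_weights)
    weights = [severity_weights[name] for name in severities]
    severity = rng.choices(severities, weights=weights)[0]
    return {"severity": severity, "message": template["message"].format(**values)}

templates = [
    {
        "message": "Request from {ip} to {endpoint} returned {status}",
        "field_pools": {
            "ip": ["10.0.0.1", "10.0.0.2"],
            "endpoint": ["/api", "/health"],
            "status": ["200", "404", "500"],
        },
    },
    # Hypothetical second template, added only to make the rotation visible.
    {"message": "Cache miss for {key}", "field_pools": {"key": ["user:1", "user:2"]}},
]
severity_weights = {"info": 0.7, "warn": 0.2, "error": 0.1}

rng = random.Random(42)  # a fixed seed makes the sequence repeatable
events = [render_event(tick, templates, severity_weights, rng) for tick in range(4)]
for event in events:
    print(event["severity"], event["message"])
```

Even ticks use the first template and odd ticks the second, while the field values and severities vary with the seeded RNG.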
csv_replay¶
Replays structured log events from a CSV file. The CSV has a timestamp column that drives the emission cadence, plus optional severity and message columns and any number of free-form field columns. The replay rate is derived from the median Δt of the timestamp column, the same model the metrics-side csv_replay uses. At the default timescale of 1.0, a 10-minute window in the CSV plays back over 10 minutes of wall-clock time without manual rate tuning. For a full walkthrough including the Loki / logcli export pipeline, see the Log CSV Replay guide.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | string | yes | -- | Path to the CSV file (relative to the working directory where you run sonda). |
| timescale | float | no | 1.0 | Replay speed multiplier. 2.0 plays 2x faster, 0.5 plays 2x slower. Must be strictly positive. |
| default_severity | string | no | info | Fallback severity when the severity column is missing, empty, or unparseable. One of trace, debug, info, warn, error, fatal. |
| repeat | boolean | no | true | When true, cycles back to the start of the CSV. When false, holds the last row for every subsequent tick. |
| columns | object | no | auto-discover | Explicit name-based column mapping. Sub-fields: timestamp, severity, message. Any column not named here (and not auto-matched) becomes a field column. |
log_generator:
  type: csv_replay
  file: examples/sample-logs.csv
  default_severity: info
  repeat: true
Auto-discovery of column roles is case-insensitive: timestamp / ts / time → timestamp; severity / level → severity; message / msg / log → message. Every other header becomes a field column on every emitted LogEvent.
{"timestamp":"2026-05-15T18:37:55.791Z","severity":"info","message":"GET /api/v1/health returned 200","labels":{},"fields":{"user_id":"u-42"}}
{"timestamp":"2026-05-15T18:37:55.791Z","severity":"info","message":"GET /api/v1/metrics returned 200","labels":{},"fields":{"user_id":"u-17"}}
{"timestamp":"2026-05-15T18:37:55.791Z","severity":"warn","message":"GET /api/v1/users returned 200 with high latency","labels":{},"fields":{"user_id":"u-91"}}
The timestamp on each emitted event is the wall-clock time at emission, not the CSV row's timestamp. The CSV's timestamp column is only used to derive the replay cadence. Severity, message, and field values are taken from the CSV verbatim.
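The column auto-discovery rules described above can be approximated as follows. The alias sets come from this page; the `discover_roles` helper and the choice to let the first matching header win when two headers map to the same role are assumptions made for the sketch.

```python
ROLE_ALIASES = {
    "timestamp": {"timestamp", "ts", "time"},
    "severity": {"severity", "level"},
    "message": {"message", "msg", "log"},
}

def discover_roles(headers):
    """Split CSV headers into role columns and free-form field columns."""
    roles, field_columns = {}, []
    for header in headers:
        key = header.lower()                         # matching is case-insensitive
        role = next(
            (name for name, aliases in ROLE_ALIASES.items() if key in aliases),
            None,
        )
        if role is not None and role not in roles:   # assumption: first match wins
            roles[role] = header
        else:
            field_columns.append(header)             # everything else is a field
    return roles, field_columns

roles, field_columns = discover_roles(["TS", "Level", "msg", "user_id"])
print(roles)          # role name -> original header
print(field_columns)  # headers emitted as fields on every event
```

Here TS, Level, and msg are claimed by the three roles, and user_id is left over as a field column.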
Scenario rate: is overridden for csv_replay
For log csv_replay, the scenario's rate: is always replaced by timescale / median_delta_t, where median_delta_t is the median interval between consecutive timestamps in the timestamp column. Setting rate: in YAML has no effect on emission cadence. Run sonda --verbose --dry-run to confirm the derived rate, or inspect the startup banner. Use timescale: to speed up or slow down replay. This is the same model the metrics-side csv_replay uses; see that section for the derivation details.
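As a worked example of the formula timescale / median_delta_t, the sketch below computes the derived rate from a list of timestamps. It assumes the timestamps are already parsed into seconds and that the rate is expressed in events per second; the `derived_rate` helper is hypothetical.

```python
from statistics import median

def derived_rate(timestamps_secs, timescale=1.0):
    """Events per second implied by the CSV cadence and the timescale."""
    if timescale <= 0:
        raise ValueError("timescale must be strictly positive")
    deltas = [b - a for a, b in zip(timestamps_secs, timestamps_secs[1:])]
    if not deltas:
        raise ValueError("need at least two timestamps to derive a cadence")
    return timescale / median(deltas)

# Rows mostly 10 s apart, with one 15 s gap that the median ignores.
timestamps = [0, 10, 20, 30, 45]

rate_1x = derived_rate(timestamps)                 # 1.0 / 10 s
rate_2x = derived_rate(timestamps, timescale=2.0)  # 2.0 / 10 s
print(rate_1x, rate_2x)
```

Using the median rather than the mean keeps a single long gap in the recording from slowing down the whole replay.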
Severity fallback is soft-fail
When a row's severity cell is empty or unrecognized, Sonda falls back to default_severity instead of erroring. At expand time, Sonda emits one summary warn line counting how many rows used the fallback. Empty field cells are omitted from the row's fields map (rather than appearing as key: ""). The full failure-mode reference is in the Log CSV Replay guide.
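A minimal sketch of the soft-fail behavior, assuming each CSV row has already been read into a dict. The `normalize_row` helper is hypothetical, and exact lowercase matching of severity names is an assumption; the Log CSV Replay guide is the authority on what counts as unrecognized.

```python
VALID_SEVERITIES = {"trace", "debug", "info", "warn", "error", "fatal"}

def normalize_row(row, field_columns, default_severity="info"):
    """Return (severity, fields, used_fallback) for one CSV row."""
    raw = (row.get("severity") or "").strip()
    used_fallback = raw not in VALID_SEVERITIES
    severity = default_severity if used_fallback else raw
    # Empty cells are dropped instead of being emitted as key: "".
    fields = {name: row[name] for name in field_columns if row.get(name)}
    return severity, fields, used_fallback

rows = [
    {"severity": "warn", "message": "slow request", "user_id": "u-91"},
    {"severity": "", "message": "health check", "user_id": ""},
    {"severity": "loud", "message": "unknown level", "user_id": "u-17"},
]
results = [normalize_row(row, ["user_id"]) for row in rows]

# One summary line for the whole file rather than one warning per row.
fallback_count = sum(1 for _, _, used in results if used)
if fallback_count:
    print(f"warn: {fallback_count} row(s) fell back to the default severity")
```

The second and third rows fall back to info, and the second row's empty user_id cell is left out of its fields map.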
Jitter¶
Jitter adds deterministic uniform noise to any metric generator's output. Instead of clean, perfectly smooth values, you get realistic fluctuations — the kind you see in real production metrics.
Why jitter?
A sine wave is useful for testing alert thresholds, but real CPU metrics are never perfectly smooth. Adding jitter lets you verify that your alerting rules and dashboards handle noisy signals correctly.
Jitter is configured at the scenario level (a sibling of generator, not nested inside it) because it wraps any generator transparently.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| jitter | float | no | none | Noise amplitude. Adds uniform noise in [-jitter, +jitter] to every value. |
| jitter_seed | integer | no | 0 | Seed for deterministic noise. The same seed produces the same noise sequence. |
version: 2
kind: runnable
defaults:
  rate: 1
  duration: 30s
  encoder:
    type: prometheus_text
  sink:
    type: stdout
scenarios:
  - signal_type: metrics
    name: cpu_usage_realistic
    generator:
      type: sine
      amplitude: 20
      period_secs: 120
      offset: 50
    jitter: 3.0
    jitter_seed: 42
    labels:
      instance: server-01
      job: node
Without jitter, a sine wave with offset: 50 outputs exactly 50.0 at tick 0. With jitter: 3.0, the value at tick 0 falls somewhere in [47.0, 53.0]. The noise differs from tick to tick, but it is reproducible across runs for a given jitter_seed.
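The wrapping behavior can be sketched as a generator that adds seeded uniform noise to any stream of values. Both helpers are hypothetical: the sine formula (offset + amplitude * sin(2πt / period_secs)) is an assumption consistent with the tick-0 value quoted above, and Python's `random` module stands in for whatever RNG Sonda uses, so the concrete noise values will differ.

```python
import math
import random

def sine(tick, rate=1.0, amplitude=20.0, period_secs=120.0, offset=50.0):
    """Assumed sine shape: oscillates around offset, starting at offset."""
    elapsed_secs = tick / rate
    return offset + amplitude * math.sin(2 * math.pi * elapsed_secs / period_secs)

def with_jitter(values, jitter, jitter_seed=0):
    """Wrap any value stream, adding uniform noise in [-jitter, +jitter]."""
    rng = random.Random(jitter_seed)
    for value in values:
        yield value + rng.uniform(-jitter, jitter)

clean = [sine(tick) for tick in range(30)]
noisy = list(with_jitter(clean, jitter=3.0, jitter_seed=42))

print(clean[0])   # 50.0 at tick 0 without jitter
print(noisy[0])   # somewhere in [47.0, 53.0]
```

Because the wrapper only sees the output values, the same function works unchanged for any other metric generator's stream.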
Works with every metric generator
Jitter wraps the generator's output, so it works with constant, sine, sawtooth, uniform, sequence, step, spike, and csv_replay. It does not apply to log generators.
When to skip jitter_seed
If you omit jitter_seed, it defaults to 0. Two scenarios with the same jitter value and no explicit seed produce identical noise sequences. Set different seeds when you need independent noise on several scenarios.