CSV Import¶

You have a CSV file -- maybe a Grafana export from a production incident, maybe a hand-recorded dataset -- and you want to turn it into a portable, parameterized scenario that uses Sonda's generators instead of replaying raw values. sonda import analyzes the data, detects dominant patterns, and generates scenario YAML you can run, share, and customize.

Why import instead of replay?¶

The csv_replay generator plays back raw CSV values verbatim. That is useful for exact reproduction, but the output is tied to the original file. sonda import takes a different approach:

Portable -- the generated YAML uses generators (steady, spike_event, leak, flap, sawtooth, step), so it runs without the original CSV file.
Parameterized -- you can tune rate, duration, and generator parameters after import.
Shareable -- the YAML is self-contained. Drop it into a repo, CI pipeline, or Helm chart.

Use csv_replay when you need bit-for-bit fidelity. Use sonda import when you need the shape of the data as a reusable scenario.

The workflow¶

sonda import has three modes that form a natural pipeline:

CSV file  -->  --analyze  -->  -o scenario.yaml  -->  --run
              (understand)       (generate)          (execute)

Step 1: Analyze¶

Start by understanding what the data looks like. --analyze is read-only -- it prints detected patterns without generating any files.

sonda import examples/sample-multi-column.csv --analyze

Output

CSV Import Analysis
============================================================

Column 1 (index 1): cpu_percent
  Data points: 20
  Range: [12.30, 96.10]  Mean: 46.27
  Detected pattern: steady (center=46.27, amplitude=41.90)

Column 2 (index 2): mem_percent
  Data points: 20
  Range: [45.20, 86.20]  Mean: 59.88
  Detected pattern: steady (center=59.88, amplitude=20.50)

Column 3 (index 3): disk_io_mbps
  Data points: 20
  Range: [5.00, 65.80]  Mean: 25.04
  Detected pattern: steady (center=25.04, amplitude=30.40)

Each column shows the metric name (from the header), basic statistics, and the detected pattern with extracted parameters.

Step 2: Generate¶

Once you know the patterns look right, generate a scenario YAML file:

sonda import examples/sample-multi-column.csv -o scenario.yaml

stderr

wrote scenario to scenario.yaml

The generated file is a valid multi-scenario YAML, ready for sonda run --scenario:

scenario.yaml (generated)

scenarios:
  - signal_type: metrics
    name: cpu_percent
    rate: 1
    duration: 60s

    generator:
      type: steady
      center: 46.27
      amplitude: 41.9
      period: "60s"

    encoder:
      type: prometheus_text

    sink:
      type: stdout

  - signal_type: metrics
    name: mem_percent
    rate: 1
    duration: 60s

    generator:
      type: steady
      center: 59.88
      amplitude: 20.5
      period: "60s"

    encoder:
      type: prometheus_text

    sink:
      type: stdout

  # ... (one entry per column)

Single-column CSVs produce flat YAML

When the CSV has only one data column, the output is a flat scenario (no scenarios: wrapper). Multi-column CSVs always produce the scenarios: list format for use with sonda run.

Step 3: Run¶

If you just want to see the output without saving a file, --run generates the scenario in memory and executes it immediately:

sonda -q import examples/sample-cpu-values.csv --run --duration 3s

Output

cpu_percent 41.44404065390504 1775712694328
cpu_percent 46.07410906869991 1775712695333
cpu_percent 50.131242022026555 1775712696330
cpu_percent 55.42337922089686 1775712697333

Grafana CSV exports¶

sonda import understands Grafana's "Series joined by time" CSV format. It parses the {__name__="...", key="value"} headers to extract metric names and labels automatically.

sonda import examples/grafana-export.csv --analyze

Output

CSV Import Analysis
============================================================

Column 1 (index 1): up
  Labels: {instance="localhost:9090", job="prometheus"}
  Data points: 10
  Range: [0.00, 1.00]  Mean: 0.80
  Detected pattern: sawtooth (min=0.00, max=1.00, period=4pts)

Column 2 (index 2): up
  Labels: {instance="localhost:9100", job="node"}
  Data points: 10
  Range: [0.00, 1.00]  Mean: 0.80
  Detected pattern: sawtooth (min=0.00, max=1.00, period=6pts)

Labels are preserved in the generated YAML:

sonda import examples/grafana-export.csv -o grafana-scenario.yaml

grafana-scenario.yaml (generated, first entry)

scenarios:
  - signal_type: metrics
    name: up
    rate: 1
    duration: 60s

    generator:
      type: sawtooth
      min: 0.0
      max: 1.0
      period_secs: 4.0

    labels:
      instance: "localhost:9090"
      job: prometheus

    encoder:
      type: prometheus_text

    sink:
      type: stdout

For details on exporting from Grafana, see the Grafana CSV Export Replay guide.

Selecting columns¶

By default, all non-timestamp columns are imported. Use --columns to pick specific ones by their zero-based index:

sonda import examples/sample-multi-column.csv --columns 1,3 --analyze

Output

CSV Import Analysis
============================================================

Column 1 (index 1): cpu_percent
  Data points: 20
  Range: [12.30, 96.10]  Mean: 46.27
  Detected pattern: steady (center=46.27, amplitude=41.90)

Column 2 (index 3): disk_io_mbps
  Data points: 20
  Range: [5.00, 65.80]  Mean: 25.04
  Detected pattern: steady (center=25.04, amplitude=30.40)

Column 0 is always the timestamp and cannot be selected for import.

Detected patterns¶

The pattern detector uses statistical analysis to classify each column into one of six patterns. Each pattern maps to a Sonda generator or operational vocabulary alias.

Pattern	What it looks like	Generator / alias	Key parameters
Steady	Low variance around a center	`steady`	center, amplitude, period
Spike	Periodic outliers above a baseline	`spike_event`	baseline, spike_height, spike_duration, spike_interval
Climb	Monotonic upward trend	`leak`	baseline, ceiling, time_to_ceiling
Sawtooth	Repeating climb-reset cycles	`sawtooth`	min, max, period_secs
Flap	Bimodal toggle (up/down)	`flap`	up_value, down_value, up_duration, down_duration
Step	Constant-rate counter increments	`step`	start, step_size

The detector runs through these in priority order. When the data does not clearly match a more specific pattern, it falls back to steady.

Pattern detection is heuristic

The detector uses statistical thresholds (linear regression, IQR outlier detection, k-means clustering) to classify patterns. With very short time series (fewer than 10 data points), detection accuracy decreases. For best results, export at least 20-30 data points.

Customizing generated scenarios¶

The generated YAML is a starting point. After import, you can:

Change the sink -- replace stdout with remote_write, loki, or any other sink.
Adjust parameters -- tune amplitude, period, or baseline to match your needs.
Add scheduling -- add gaps:, bursts:, or cardinality_spike: blocks.
Override rate and duration at generation time:

sonda import data.csv -o scenario.yaml --rate 10 --duration 5m

CLI reference¶

sonda import <FILE> [OPTIONS]

Argument / Flag	Type	Default	Description
`<FILE>`	path	--	CSV file to import. Supports Grafana exports and plain CSV.
`--analyze`	flag	--	Print detected patterns (read-only). Conflicts with `-o` and `--run`.
`-o, --output <FILE>`	path	--	Write generated scenario YAML to this path. Conflicts with `--analyze` and `--run`.
`--run`	flag	--	Generate and immediately execute the scenario. Conflicts with `--analyze` and `-o`.
`--columns <INDICES>`	string	all	Comma-separated column indices (e.g., `1,3,5`). Column 0 is the timestamp.
`--rate <RATE>`	float	`1.0`	Events per second in the generated scenario.
`--duration <DURATION>`	string	`60s`	Duration of the generated scenario (e.g., `60s`, `5m`).

Exactly one of --analyze, -o, or --run must be specified.

Combine with global flags

--dry-run, --verbose, and --quiet work with sonda import --run, just like any other subcommand. Use sonda --dry-run import data.csv --run to see the resolved config without emitting events.