Configuration Sweeps
Veeksha supports running multiple benchmarks with different parameter combinations
using the !expand YAML tag. This creates a Cartesian product of configurations,
enabling systematic exploration of the parameter space.
The !expand tag
Use !expand to expand a list into multiple configurations:
```yaml
traffic_scheduler:
  type: rate
  interval_generator:
    type: poisson
    arrival_rate: !expand [5, 10, 20]  # Creates 3 runs
```
This creates three separate benchmark runs with rates 5, 10, and 20.
Hint
!expand can only be applied to fields whose declared type is not already a list. For example, arrival_rate is a float, so it can be swept.
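One way to picture the rule in the hint is as a check on each field's declared type. The sketch below is illustrative only: `ExampleConfig` and `can_expand` are hypothetical names, not part of Veeksha, and it assumes the configuration schema is modelled as Python dataclasses.

```python
from dataclasses import dataclass, field, fields
from typing import get_origin


# Hypothetical config class for illustration; Veeksha's real schema may differ.
@dataclass
class ExampleConfig:
    arrival_rate: float = 1.0
    target_channels: list[str] = field(default_factory=list)


def can_expand(cls, name: str) -> bool:
    """Return True if field `name` of dataclass `cls` may take an !expand list.

    Mirrors the rule in the hint: a field whose declared type is already a
    list (bare `list` or `list[...]`) cannot be swept.
    """
    for f in fields(cls):
        if f.name == name:
            return not (f.type is list or get_origin(f.type) is list)
    raise KeyError(f"{cls.__name__} has no field {name!r}")


print(can_expand(ExampleConfig, "arrival_rate"))     # True
print(can_expand(ExampleConfig, "target_channels"))  # False
```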
Cartesian product expansion
Multiple !expand tags create a Cartesian product:
```yaml
traffic_scheduler:
  type: concurrent
  target_concurrent_sessions: !expand [4, 8, 16]  # 3 values
session_generator:
  type: synthetic
  channels:
    - type: text
      body_length_generator:
        type: fixed
        value: !expand [256, 512]  # 2 values
```
This creates 6 runs (3 × 2):
concurrency=4, prompt=256
concurrency=4, prompt=512
concurrency=8, prompt=256
concurrency=8, prompt=512
concurrency=16, prompt=256
concurrency=16, prompt=512
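The expansion can be sketched as: walk the configuration tree, record the path and values of every !expand marker, then take `itertools.product` over the value lists and write each combination into a fresh copy of the config. This is a minimal illustration, not Veeksha's implementation; YAML parsing is skipped by building the tree as nested dicts, and `Expand`, `find_expand_points`, and `expand` are hypothetical names. It prints the six combinations in the order listed above.

```python
import copy
import itertools


class Expand(list):
    """Hypothetical stand-in for a value tagged !expand in YAML."""


def find_expand_points(node, path=()):
    """Yield (path, values) for every Expand marker, in document order."""
    if isinstance(node, Expand):  # must be checked before the plain-list case
        yield path, list(node)
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_expand_points(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_expand_points(value, path + (index,))


def expand(config):
    """Return one concrete config per element of the Cartesian product."""
    points = list(find_expand_points(config))
    runs = []
    # product() varies the last list fastest; with no markers it yields a
    # single empty tuple, so an unswept config produces exactly one run.
    for combo in itertools.product(*(values for _, values in points)):
        run = copy.deepcopy(config)
        for (path, _), value in zip(points, combo):
            target = run
            for key in path[:-1]:
                target = target[key]
            target[path[-1]] = value
        runs.append(run)
    return runs


config = {
    "traffic_scheduler": {
        "type": "concurrent",
        "target_concurrent_sessions": Expand([4, 8, 16]),
    },
    "session_generator": {
        "type": "synthetic",
        "channels": [
            {"type": "text",
             "body_length_generator": {"type": "fixed", "value": Expand([256, 512])}},
        ],
    },
}

for run in expand(config):
    concurrency = run["traffic_scheduler"]["target_concurrent_sessions"]
    prompt = run["session_generator"]["channels"][0]["body_length_generator"]["value"]
    print(f"concurrency={concurrency}, prompt={prompt}")
```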
Basic example
Create a file sweep.veeksha.yml:
```yaml
seed: 42
output_dir: sweep_results

traffic_scheduler:
  type: rate
  interval_generator:
    type: poisson
    arrival_rate: !expand [5, 10, 20, 30]

session_generator:
  type: synthetic
  session_graph:
    type: linear
    inherit_history: true
  channels:
    - type: text
      body_length_generator:
        type: uniform
        min: 100
        max: 500

client:
  type: openai_chat_completions
  api_base: http://localhost:8000/v1
  model: meta-llama/Llama-3-8B-Instruct

runtime:
  benchmark_timeout: 60

evaluators:
  - type: performance
    target_channels: ["text"]
```
Run it:
```shell
uvx veeksha benchmark --config sweep.veeksha.yml
```
Veeksha automatically runs 4 benchmarks with rates 5, 10, 20, and 30.
Output structure
Sweeps create a parent directory containing all run subdirectories and summary files:
```text
benchmark_output/
└── sweep_09:01:2026-16:38:22/           # Sweep parent directory
    ├── sweep_manifest.json              # Sweep configuration
    ├── sweep_summary.json               # Aggregated results
    ├── sweep_summary.csv                # Results in CSV format
    ├── 09:01:2026-16:38:22-960db960/    # First run (rate=10)
    │   ├── config.yml
    │   ├── metrics/
    │   └── ...
    └── 09:01:2026-16:39:00-3f3db8a5/    # Second run (rate=11)
        └── ...
```
Each run’s full metrics are preserved in its subdirectory, enabling detailed cross-run analysis (e.g., comparing TTFC distributions, throughput, or SLO compliance).
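To post-process runs yourself, the run directories can be located from the sweep parent directory. A small standard-library sketch; `list_runs` is a hypothetical helper, and it relies only on the layout shown above (summary files at the top level, one subdirectory per run holding a config.yml).

```python
from pathlib import Path


def list_runs(sweep_dir):
    """Return the run subdirectories of a sweep parent directory.

    Assumes the layout shown above: summary files sit directly in the sweep
    directory and each run is a subdirectory containing config.yml.
    Results are sorted by name, which is not guaranteed to be chronological.
    """
    sweep = Path(sweep_dir)
    return sorted(
        p for p in sweep.iterdir() if p.is_dir() and (p / "config.yml").is_file()
    )


# Example (path taken from the tree above):
# for run_dir in list_runs("benchmark_output/sweep_09:01:2026-16:38:22"):
#     print(run_dir.name, sorted(q.name for q in run_dir.iterdir()))
```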
Sweep summary
Veeksha automatically generates summary files at the end of each sweep:
sweep_summary.json aggregates key metrics across all runs:
```json
{
  "base_output_dir": "benchmark_output/sweep_09:01:2026-16:38:22",
  "num_runs": 2,
  "runs": [
    {
      "run_index": 0,
      "run_dir": "...",
      "traffic": {"arrival_rate": 10},
      "summary_stats": {...},
      "throughput_metrics": {...},
      "all_slos_met": true
    },
    ...
  ]
}
```
sweep_summary.csv provides the same data in a tabular format for easy spreadsheet analysis.
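The JSON summary can be consumed with the standard library alone. A minimal sketch, assuming only the field names visible in the example above (`runs`, `run_index`, `traffic`, `all_slos_met`); `slo_overview` is a hypothetical helper, and the contents of `summary_stats` and `throughput_metrics` are deliberately not read.

```python
import json
from pathlib import Path


def slo_overview(summary_path):
    """Return (run_index, traffic, all_slos_met) for each run in sweep_summary.json."""
    data = json.loads(Path(summary_path).read_text())
    return [
        (run["run_index"], run.get("traffic", {}), run.get("all_slos_met"))
        for run in data["runs"]
    ]


# Example:
# path = "benchmark_output/sweep_09:01:2026-16:38:22/sweep_summary.json"
# for index, traffic, ok in slo_overview(path):
#     print(index, traffic, "SLOs met" if ok else "SLOs missed")
```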
Cross-file expansion
When you use !include to split a configuration across files (see configuration-splitting),
!expand tags work across file boundaries: Veeksha collects the !expand
markers from all included files and computes a single Cartesian product over all of them.
Example: Sweep across client endpoints and traffic rates
Create client.yml with multiple endpoints:
```yaml
# client.yml - sweep across 2 servers
type: openai_chat_completions
api_base: !expand [http://server-a:8000/v1, http://server-b:8000/v1]
model: meta-llama/Llama-3-8B-Instruct
```
Create traffic.yml with multiple arrival rates:
```yaml
# traffic.yml - sweep across 3 rates
type: rate
interval_generator:
  type: poisson
  arrival_rate: !expand [5, 10, 20]
```
Create a main config using !include:
```yaml
# main.yml
client: !include client.yml
traffic_scheduler: !include traffic.yml
session_generator:
  type: synthetic
  channels:
    - type: text
      body_length_generator:
        type: uniform
        min: 100
        max: 500
runtime:
  benchmark_timeout: 60
```
Run with a single --config:
```shell
uvx veeksha benchmark --config main.yml
```
This creates 6 runs (2 servers × 3 rates):
api_base=server-a, rate=5
api_base=server-a, rate=10
api_base=server-a, rate=20
api_base=server-b, rate=5
api_base=server-b, rate=10
api_base=server-b, rate=20
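One way to picture cross-file expansion: resolve every !include first, then collect the !expand markers from the merged tree, so markers that came from different files land in one Cartesian product. This is a hypothetical sketch, not Veeksha's loader: `Include`, `Expand`, `resolve`, and `expand` are illustrative names, the main config is abbreviated, and the included files are held in a dict instead of being read from disk. Because `client` appears before `traffic_scheduler` in main.yml, the endpoint varies slowest, matching the order listed above.

```python
import copy
import itertools


class Expand(list):
    """Hypothetical stand-in for a value tagged !expand."""


class Include(str):
    """Hypothetical stand-in for a value tagged !include (holds a file name)."""


def resolve(node, files):
    """Replace each Include with the fragment it names, recursively."""
    if isinstance(node, Include):
        return resolve(copy.deepcopy(files[node]), files)
    if isinstance(node, dict):
        return {key: resolve(value, files) for key, value in node.items()}
    if isinstance(node, list) and not isinstance(node, Expand):
        return [resolve(value, files) for value in node]
    return node


def expand_points(node, path=()):
    """Yield (path, values) for every Expand marker, in document order."""
    if isinstance(node, Expand):
        yield path, list(node)
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from expand_points(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from expand_points(value, path + (index,))


def expand(config):
    """One concrete config per combination of all Expand markers."""
    points = list(expand_points(config))
    runs = []
    for combo in itertools.product(*(values for _, values in points)):
        run = copy.deepcopy(config)
        for (path, _), value in zip(points, combo):
            target = run
            for key in path[:-1]:
                target = target[key]
            target[path[-1]] = value
        runs.append(run)
    return runs


# In-memory stand-ins for client.yml and traffic.yml above.
files = {
    "client.yml": {
        "type": "openai_chat_completions",
        "api_base": Expand(["http://server-a:8000/v1", "http://server-b:8000/v1"]),
        "model": "meta-llama/Llama-3-8B-Instruct",
    },
    "traffic.yml": {
        "type": "rate",
        "interval_generator": {"type": "poisson", "arrival_rate": Expand([5, 10, 20])},
    },
}
main = {"client": Include("client.yml"), "traffic_scheduler": Include("traffic.yml")}

# Resolve includes first, then collect !expand markers from the merged tree.
for run in expand(resolve(main, files)):
    print(run["client"]["api_base"],
          run["traffic_scheduler"]["interval_generator"]["arrival_rate"])
```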
Multi-file example
You can split a configuration across as many included files as needed. If client.yml has 2
!expand values, traffic.yml has 3, and the main config has
!expand [256, 512] for prompt length, this creates 12 runs (2 × 3 × 2).
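Because the run count is a product, it grows quickly as dimensions are added, so it can be worth sizing a sweep before launching it. A short sketch; `count_runs` is a hypothetical helper, and the time estimate rests on assumptions this page does not confirm: runs execute one after another, each is capped by runtime.benchmark_timeout, and that value is in seconds.

```python
import math


def count_runs(expand_lengths):
    """Runs in a sweep = product of the lengths of all !expand lists.

    An empty list (no !expand at all) gives 1, i.e. a single run.
    """
    return math.prod(expand_lengths)


# client.yml (2 endpoints) x traffic.yml (3 rates) x main config (2 prompt lengths)
runs = count_runs([2, 3, 2])
print(runs)  # 12

# Rough time budget. ASSUMPTIONS: sequential runs, each capped by
# benchmark_timeout in seconds; startup and shutdown overhead are ignored.
benchmark_timeout_s = 60
print(f"~{runs * benchmark_timeout_s / 60:.0f} minutes")  # ~12 minutes
```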
Hint
Cross-file expansion is particularly useful for:
Testing multiple server deployments with the same workload
Comparing different models without duplicating config files
Running the same sweep against staging and production endpoints
Common sweep patterns
Rate Sweep - Find latency vs load relationship
```yaml
traffic_scheduler:
  type: rate
  interval_generator:
    type: poisson
    arrival_rate: !expand [1, 2, 5, 10, 20, 50]
```
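A Poisson arrival process has exponentially distributed gaps between requests, with mean 1 / arrival_rate. The sketch below samples such gaps with the standard library to show how the spacing tightens across the rate sweep. It illustrates the distribution only and is not Veeksha's interval generator; it also assumes arrival_rate is expressed in requests per second, and `poisson_intervals` is a hypothetical helper.

```python
import random


def poisson_intervals(arrival_rate, n, seed=42):
    """Sample n inter-arrival gaps for a Poisson process with the given rate.

    Gaps are exponentially distributed with mean 1 / arrival_rate.
    A fixed seed keeps the sample reproducible.
    """
    rng = random.Random(seed)
    return [rng.expovariate(arrival_rate) for _ in range(n)]


for rate in [1, 2, 5, 10, 20, 50]:
    gaps = poisson_intervals(rate, 10_000)
    mean_gap = sum(gaps) / len(gaps)
    print(f"rate={rate:>2}  expected gap={1 / rate:.3f}  sampled mean={mean_gap:.3f}")
```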
Concurrency Sweep - Throughput scaling
```yaml
traffic_scheduler:
  type: concurrent
  target_concurrent_sessions: !expand [1, 2, 4, 8, 16, 32]
  rampup_seconds: 10
```
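When reading concurrency-sweep results, Little's law (L = λW) is a handy consistency check: in steady state, throughput equals the average number of in-flight requests divided by the mean latency. The sketch assumes each of the target_concurrent_sessions sessions keeps roughly one request in flight, which this page does not confirm. Read through the law, if throughput stops growing as concurrency doubles, mean latency must be growing instead. `littles_law_throughput` is a hypothetical helper.

```python
def littles_law_throughput(in_flight, mean_latency_s):
    """Little's law, L = lambda * W, solved for throughput lambda.

    in_flight: average number of requests in the system (L)
    mean_latency_s: average time each request spends in the system (W)
    """
    if mean_latency_s <= 0:
        raise ValueError("mean latency must be positive")
    return in_flight / mean_latency_s


# Illustrative inputs, not measurements: 8 requests in flight, 2 s mean latency.
print(littles_law_throughput(8, 2.0))  # 4.0 requests per second
```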
Prompt Length Sweep - Prefill scaling
```yaml
session_generator:
  channels:
    - type: text
      body_length_generator:
        type: fixed
        value: !expand [128, 256, 512, 1024, 2048]
  output_spec:
    text:
      output_length_generator:
        type: fixed
        value: 128
```
Output Length Sweep - Decode scaling
```yaml
session_generator:
  channels:
    - type: text
      body_length_generator:
        type: fixed
        value: 256
  output_spec:
    text:
      output_length_generator:
        type: fixed
        value: !expand [64, 128, 256, 512, 1024]
```
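For the output-length sweep, a per-token decode figure makes runs with different output lengths comparable. A minimal sketch using the common (end-to-end latency - time to first token) / (output tokens - 1) definition; whether Veeksha reports this metric directly, and under what name, is not covered here, so `time_per_output_token` and its inputs are hypothetical. Using TTFC as the first-token time assumes the first chunk carries a single token.

```python
def time_per_output_token(e2e_latency_s, ttfc_s, output_tokens):
    """Average decode time per generated token after the first.

    (end-to-end latency - time to first chunk) / (output tokens - 1),
    treating the first chunk as the first token (an assumption).
    """
    if output_tokens < 2:
        raise ValueError("need at least two output tokens")
    return (e2e_latency_s - ttfc_s) / (output_tokens - 1)


# Illustrative inputs, not measurements.
print(time_per_output_token(2.5, 0.5, 129))  # 0.015625
```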
Multi-Dimensional Sweep
```yaml
traffic_scheduler:
  type: concurrent
  target_concurrent_sessions: !expand [4, 8]
session_generator:
  channels:
    - type: text
      body_length_generator:
        type: fixed
        value: !expand [256, 512]
  output_spec:
    text:
      output_length_generator:
        type: fixed
        value: !expand [128, 256]
```
This creates 2 × 2 × 2 = 8 runs.