Quick start
This guide walks you through running your first Veeksha benchmark in minutes.
Step 1: Create a configuration file
Create a file named my_benchmark.veeksha.yml:
# Basic benchmark configuration
seed: 42

# Where to send requests
client:
  type: openai_chat_completions
  api_base: http://localhost:8000/v1
  model: meta-llama/Llama-3-8B-Instruct

# Traffic pattern: 5 sessions per second
traffic_scheduler:
  type: rate
  interval_generator:
    type: poisson
    arrival_rate: 5.0

# Session content: single-turn with random prompts
session_generator:
  type: synthetic
  session_graph:
    type: linear
    inherit_history: false
  channels:
    - type: text
      body_length_generator:
        type: uniform
        min: 50
        max: 200
  output_spec:
    text:
      output_length_generator:
        type: uniform
        min: 100
        max: 300

# Stop conditions
runtime:
  benchmark_timeout: 60  # Run for 60 seconds
  max_sessions: -1       # No limit on sessions

# Enable metrics collection
evaluators:
  - type: performance
    target_channels: ["text"]
    slos:
      - name: "P99 TTFC"
        metric: ttfc
        percentile: 0.99
        value: 0.5
        type: constant
      - name: "P90 TBC"
        metric: tbc
        percentile: 0.90
        value: 0.05
        type: constant
Tip
Use the .veeksha.yml extension for IDE autocompletion support. We highly recommend adding Veeksha’s YAML schema to your IDE (see Export JSON Schema).
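To make the slos entries in the configuration concrete, the following Python sketch checks a percentile target against a list of latency samples. It is an illustration only, not Veeksha's evaluator: the nearest-rank percentile method, the `percentile` and `meets_slo` helpers, the sample values, and the reading of `value` as an upper bound in seconds are all assumptions.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 1].

    Assumed method for illustration; Veeksha may interpolate differently.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def meets_slo(samples, q, limit):
    """Return True when the q-th percentile is at or below the limit."""
    return percentile(samples, q) <= limit


# Hypothetical TTFC samples in seconds (made up for illustration).
ttfc_samples = [0.12, 0.15, 0.18, 0.22, 0.25, 0.31, 0.35, 0.40, 0.45, 0.62]

print("P50 within 0.5 s:", meets_slo(ttfc_samples, 0.50, 0.5))
print("P99 within 0.5 s:", meets_slo(ttfc_samples, 0.99, 0.5))
```

A tail percentile such as P99 is dominated by the slowest requests, so a single slow sample can fail the target even when the median is comfortably inside it.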
Step 2: Run the benchmark
Execute the benchmark using the CLI (against an already-running server; for managed servers see Server Management):
uvx veeksha benchmark --config my_benchmark.veeksha.yml
Step 3: View results
Navigate to the output directory to find:
benchmark_output/09:01:2026-10:30:00-abc123/
├── config.yml # Resolved configuration
├── health_check_results.txt # Benchmark verification
├── metrics/
│ ├── request_level_metrics.jsonl # Per-request data
│ ├── summary_stats.json # Aggregate statistics
│ ├── ttfc.csv # TTFC percentiles
│ ├── ttfc.png # TTFC distribution plot
│ ├── tbc.csv # TBC percentiles
│ └── ...
└── traces/
└── trace.jsonl # Request traces
Quick summary from summary_stats.json:
{
  "Number of Requests": 287,
  "Number of Completed Requests": 287,
  "Error Rate": 0.0,
  "Observed Session Dispatch Rate": 4.78
}
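A summary like this is easy to check from a script. The Python sketch below parses the sample shown here and reports whether every request completed. The key names are copied from the sample; the `check_summary` helper and its `max_error_rate` threshold are hypothetical, and a real script would read metrics/summary_stats.json from the run's output directory rather than an inline string.

```python
import json

# Inline copy of the sample summary; a real run would load
# metrics/summary_stats.json from the output directory instead.
summary_text = """
{
  "Number of Requests": 287,
  "Number of Completed Requests": 287,
  "Error Rate": 0.0,
  "Observed Session Dispatch Rate": 4.78
}
"""


def check_summary(summary, max_error_rate=0.0):
    """Return (completion_ratio, ok) for a parsed summary dict."""
    total = summary["Number of Requests"]
    completed = summary["Number of Completed Requests"]
    ratio = completed / total if total else 0.0
    ok = summary["Error Rate"] <= max_error_rate and completed == total
    return ratio, ok


summary = json.loads(summary_text)
ratio, ok = check_summary(summary)
print(f"completion ratio: {ratio:.2%}, healthy: {ok}")
```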
Essential configuration options
Here are the most commonly adjusted options:
Endpoint configuration:
client:
  api_base: http://localhost:8000/v1  # Server URL
  model: meta-llama/Llama-3-8B-Instruct
  request_timeout: 300                # Timeout per request (seconds)
Traffic rate:
traffic_scheduler:
  type: rate
  interval_generator:
    type: poisson
    arrival_rate: 10.0  # Sessions per second
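In a Poisson arrival process, the gaps between consecutive sessions are exponentially distributed with mean 1 / arrival_rate, so the configured rate is an average rather than a fixed cadence. The Python sketch below generates such a schedule to show the effect. It is not Veeksha's scheduler; the `poisson_arrival_times` helper and its parameters are hypothetical.

```python
import random


def poisson_arrival_times(arrival_rate, duration, seed=42):
    """Generate arrival timestamps (seconds) for a Poisson process.

    Inter-arrival gaps are drawn from an exponential distribution
    with mean 1 / arrival_rate.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(arrival_rate)
        if t >= duration:
            return times
        times.append(t)


arrivals = poisson_arrival_times(arrival_rate=10.0, duration=120.0)
print(len(arrivals), "sessions, observed rate", len(arrivals) / 120.0)
```

The observed rate of a finite run will sit near, but rarely exactly at, the configured value, which is consistent with the sample summary reporting 4.78 against a configured 5.0.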
Benchmark duration:
runtime:
  benchmark_timeout: 120  # Run for 2 minutes
  max_sessions: 500       # Or stop after 500 sessions (whichever first)
                          # Use -1 for unlimited sessions
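The interaction of the two stop conditions can be sketched in a few lines of Python. This is a minimal illustration of the "whichever first" rule with -1 meaning no session limit; the `should_stop` helper is hypothetical, and it assumes the limit counts sessions started, which the guide does not specify.

```python
def should_stop(elapsed_seconds, sessions_started, benchmark_timeout, max_sessions):
    """Return True once either stop condition is met.

    max_sessions == -1 disables the session limit.
    Assumption: the limit applies to sessions started, not completed.
    """
    timed_out = elapsed_seconds >= benchmark_timeout
    hit_session_cap = max_sessions != -1 and sessions_started >= max_sessions
    return timed_out or hit_session_cap


# Session cap reached before the timeout.
print(should_stop(30, 500, benchmark_timeout=120, max_sessions=500))
# Neither condition met yet.
print(should_stop(30, 100, benchmark_timeout=120, max_sessions=500))
# Unlimited sessions, so only the timeout can end the run.
print(should_stop(150, 100, benchmark_timeout=120, max_sessions=-1))
```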
Prompt/output lengths:
session_generator:
  channels:
    - type: text
      body_length_generator:
        type: fixed
        value: 256  # Fixed 256 tokens per prompt
  output_spec:
    text:
      output_length_generator:
        type: fixed
        value: 128  # Request 128 token outputs
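The difference between the fixed and uniform length generators can be seen by sampling from each. The Python sketch below mirrors the shape of the YAML specs as plain dicts. It is not Veeksha's implementation: the `sample_length` helper is hypothetical, and treating both uniform bounds as inclusive is an assumption.

```python
import random


def sample_length(spec, rng):
    """Draw one token length from a generator spec shaped like the YAML."""
    kind = spec["type"]
    if kind == "fixed":
        return spec["value"]
    if kind == "uniform":
        # Assumption: both bounds are inclusive.
        return rng.randint(spec["min"], spec["max"])
    raise ValueError(f"unsupported generator type: {kind}")


rng = random.Random(42)
fixed_prompt = {"type": "fixed", "value": 256}
uniform_prompt = {"type": "uniform", "min": 50, "max": 200}

print([sample_length(fixed_prompt, rng) for _ in range(3)])
print([sample_length(uniform_prompt, rng) for _ in range(3)])
```

Fixed lengths remove one source of variance between runs, which is why they suit repeatable measurements; uniform lengths are closer to mixed real-world traffic.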
Using CLI overrides
Override configuration values without editing the file:
uvx veeksha benchmark \
    --config my_benchmark.veeksha.yml \
    --traffic_scheduler.interval_generator.arrival_rate 20.0 \
    --runtime.benchmark_timeout 120
This runs at 20 sessions/second for 120 seconds instead of the file’s values.
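Each override flag is the dotted path to a key in the YAML file, followed by the new value. When launching many runs from a script, those flags can be generated from a nested dict. The Python sketch below does this for scalar values only, matching the examples in this guide; the `build_override_args` helper is hypothetical, and how Veeksha expects lists or booleans on the command line is not covered here.

```python
def build_override_args(overrides, prefix=""):
    """Flatten a nested dict into ['--a.b.c', 'value', ...] argument pairs."""
    args = []
    for key, value in overrides.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            args.extend(build_override_args(value, path))
        else:
            args.extend([f"--{path}", str(value)])
    return args


overrides = {
    "traffic_scheduler": {"interval_generator": {"arrival_rate": 20.0}},
    "runtime": {"benchmark_timeout": 120},
}
command = ["uvx", "veeksha", "benchmark", "--config", "my_benchmark.veeksha.yml"]
command += build_override_args(overrides)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the benchmark from Python.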
Common patterns
Quick latency test at low load:
uvx veeksha benchmark \
    --config my_benchmark.veeksha.yml \
    --traffic_scheduler.interval_generator.arrival_rate 1.0 \
    --runtime.max_sessions 50
Throughput saturation test:
traffic_scheduler:
  type: concurrent
  target_concurrent_sessions: 16
  rampup_seconds: 10
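With a fixed number of sessions in flight, the session rate the server sustains follows from Little's law: throughput equals concurrency divided by mean session duration. The Python sketch below applies that relation to a concurrency of 16. It is a back-of-the-envelope estimate for steady state after ramp-up, not something Veeksha computes; the `estimated_throughput` helper and the session durations are made up for illustration.

```python
def estimated_throughput(concurrent_sessions, mean_session_seconds):
    """Little's law: sessions per second sustained at a fixed concurrency."""
    if mean_session_seconds <= 0:
        raise ValueError("mean_session_seconds must be positive")
    return concurrent_sessions / mean_session_seconds


# Hypothetical mean session durations in seconds.
for mean_seconds in (1.0, 2.0, 4.0):
    rate = estimated_throughput(16, mean_seconds)
    print(f"mean session {mean_seconds:.1f}s -> {rate:.1f} sessions/s")
```

If raising the concurrency no longer raises the measured session rate, session latency is growing in proportion and the server is saturated.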
Fixed prompt/output for consistent measurements:
session_generator:
  channels:
    - type: text
      body_length_generator:
        type: fixed
        value: 512
  output_spec:
    text:
      output_length_generator:
        type: fixed
        value: 256
Configuration reference
For a detailed reference of all configuration options, see the Configuration Reference.
Next steps
Workload recipes - Trace replay, multi-turn, agentic, LM-Eval examples
Configuration System - Full configuration system guide
Output Files - Understand all output files
Microbenchmarks - Isolate prefill, decode, and stress performance
Capacity Search - Find maximum sustainable throughput