Quick start
===========

This guide walks you through running your first Veeksha benchmark in minutes.

Step 1: Create a configuration file
------------------------------------

Create a file named ``my_benchmark.veeksha.yml``:

.. code-block:: yaml

   # Basic benchmark configuration
   seed: 42

   # Where to send requests
   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct

   # Traffic pattern: 5 sessions per second
   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 5.0

   # Session content: single-turn with random prompts
   session_generator:
     type: synthetic
     session_graph:
       type: linear
       inherit_history: false
     channels:
       - type: text
         body_length_generator:
           type: uniform
           min: 50
           max: 200
     output_spec:
       text:
         output_length_generator:
           type: uniform
           min: 100
           max: 300

   # Stop conditions
   runtime:
     benchmark_timeout: 60   # Run for 60 seconds
     max_sessions: -1        # No limit on sessions

   # Enable metrics collection
   evaluators:
     - type: performance
       target_channels: ["text"]
       slos:
         - name: "P99 TTFC"
           metric: ttfc
           percentile: 0.99
           value: 0.5
           type: constant
         - name: "P90 TBC"
           metric: tbc
           percentile: 0.90
           value: 0.05
           type: constant

.. tip::

   Use the ``.veeksha.yml`` extension for IDE autocompletion support. We highly
   recommend adding Veeksha's YAML schema to your IDE (see :ref:`Export JSON Schema`).

Step 2: Run the benchmark
-------------------------

Execute the benchmark using the CLI (against an already-running server; for managed
servers see :doc:`/user_guide/server_management`):

.. code-block:: bash

   uvx veeksha benchmark --config my_benchmark.veeksha.yml

Step 3: View results
--------------------

Navigate to the output directory to find:

.. code-block:: text

   benchmark_output/09:01:2026-10:30:00-abc123/
   ├── config.yml                       # Resolved configuration
   ├── health_check_results.txt         # Benchmark verification
   ├── metrics/
   │   ├── request_level_metrics.jsonl  # Per-request data
   │   ├── summary_stats.json           # Aggregate statistics
   │   ├── ttfc.csv                     # TTFC percentiles
   │   ├── ttfc.png                     # TTFC distribution plot
   │   ├── tbc.csv                      # TBC percentiles
   │   └── ...
   └── traces/
       └── trace.jsonl                  # Request traces

Quick summary from ``summary_stats.json``:

.. code-block:: json

   {
     "Number of Requests": 287,
     "Number of Completed Requests": 287,
     "Error Rate": 0.0,
     "Observed Session Dispatch Rate": 4.78
   }

Essential configuration options
-------------------------------

Here are the most commonly adjusted options:

**Endpoint configuration:**

.. code-block:: yaml

   client:
     api_base: http://localhost:8000/v1      # Server URL
     model: meta-llama/Llama-3-8B-Instruct
     request_timeout: 300                    # Timeout per request (seconds)

**Traffic rate:**

.. code-block:: yaml

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 10.0   # Sessions per second

**Benchmark duration:**

.. code-block:: yaml

   runtime:
     benchmark_timeout: 120   # Run for 2 minutes
     max_sessions: 500        # Or stop after 500 sessions (whichever comes first)
                              # Use -1 for unlimited sessions

**Prompt/output lengths:**

.. code-block:: yaml

   session_generator:
     channels:
       - type: text
         body_length_generator:
           type: fixed
           value: 256   # Fixed 256 tokens per prompt
     output_spec:
       text:
         output_length_generator:
           type: fixed
           value: 128   # Request 128-token outputs
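**Putting it together:**

The snippets above are fragments of a full configuration file. Below is a minimal
sketch of how they slot together; values are illustrative, the ``type`` keys follow
the Step 1 example, and the ``seed`` and ``evaluators`` sections from Step 1 are
omitted here:

.. code-block:: yaml

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct
     request_timeout: 300

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 10.0

   runtime:
     benchmark_timeout: 120
     max_sessions: 500

   session_generator:
     type: synthetic
     session_graph:
       type: linear
       inherit_history: false
     channels:
       - type: text
         body_length_generator:
           type: fixed
           value: 256
     output_spec:
       text:
         output_length_generator:
           type: fixed
           value: 128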
Using CLI overrides
-------------------

Override configuration values without editing the file:

.. code-block:: bash

   uvx veeksha benchmark \
     --config my_benchmark.veeksha.yml \
     --traffic_scheduler.interval_generator.arrival_rate 20.0 \
     --runtime.benchmark_timeout 120

This runs at 20 sessions/second for 120 seconds instead of the file's values. (A
sketch that scripts a sweep over these flags appears at the end of this page.)

Common patterns
---------------

**Quick latency test at low load:**

.. code-block:: bash

   uvx veeksha benchmark \
     --config my_benchmark.veeksha.yml \
     --traffic_scheduler.interval_generator.arrival_rate 1.0 \
     --runtime.max_sessions 50

**Throughput saturation test:**

.. code-block:: yaml

   traffic_scheduler:
     type: concurrent
     target_concurrent_sessions: 16
     rampup_seconds: 10

**Fixed prompt/output for consistent measurements:**

.. code-block:: yaml

   session_generator:
     channels:
       - type: text
         body_length_generator:
           type: fixed
           value: 512
     output_spec:
       text:
         output_length_generator:
           type: fixed
           value: 256

Configuration reference
-----------------------

For a detailed reference of all configuration options, see the :doc:`/config_reference/index`.

Next steps
----------

- :ref:`Workload recipes` - Trace replay, multi-turn, agentic, LM-Eval examples
- :doc:`/user_guide/configuration` - Full configuration system guide
- :doc:`/user_guide/output_files` - Understand all output files
- :doc:`/microbenchmarks/index` - Isolate prefill, decode, and stress performance
- :doc:`/user_guide/capacity_search` - Find maximum sustainable throughput
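Scripting parameter sweeps
--------------------------

Because the CLI overrides shown above are plain flags, they compose naturally with a
shell loop for quick sweeps. A minimal sketch, assuming the ``my_benchmark.veeksha.yml``
file from Step 1 and the default timestamped output directories shown in Step 3; the
flags are the same ones used earlier on this page:

.. code-block:: bash

   # Run the same benchmark at several arrival rates; each run writes its own
   # timestamped directory under benchmark_output/.
   for rate in 1 2 5 10; do
     uvx veeksha benchmark \
       --config my_benchmark.veeksha.yml \
       --traffic_scheduler.interval_generator.arrival_rate "${rate}" \
       --runtime.benchmark_timeout 120
   done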