Configuration Sweeps ==================== Veeksha supports running multiple benchmarks with different parameter combinations using the ``!expand`` YAML tag. This creates a Cartesian product of configurations, enabling systematic exploration of the parameter space. The !expand tag --------------- Use ``!expand`` to expand a list into multiple configurations: .. code-block:: yaml traffic_scheduler: type: rate interval_generator: type: poisson arrival_rate: !expand [5, 10, 20] # Creates 3 runs This creates three separate benchmark runs with rates 5, 10, and 20. .. hint:: `!expand` can only be specified for fields that were not originally typed as lists. For example, `arrival_rate` is a float, so it can be swept. Cartesian product expansion --------------------------- Multiple ``!expand`` tags create a Cartesian product: .. code-block:: yaml traffic_scheduler: type: concurrent target_concurrent_sessions: !expand [4, 8, 16] # 3 values session_generator: type: synthetic channels: - type: text body_length_generator: type: fixed value: !expand [256, 512] # 2 values This creates **6 runs** (3 × 2): 1. concurrency=4, prompt=256 2. concurrency=4, prompt=512 3. concurrency=8, prompt=256 4. concurrency=8, prompt=512 5. concurrency=16, prompt=256 6. concurrency=16, prompt=512 Basic example ------------- Create a file ``sweep.veeksha.yml``: .. code-block:: yaml seed: 42 output_dir: sweep_results traffic_scheduler: type: rate interval_generator: type: poisson arrival_rate: !expand [5, 10, 20, 30] session_generator: type: synthetic session_graph: type: linear inherit_history: true channels: - type: text body_length_generator: type: uniform min: 100 max: 500 client: type: openai_chat_completions api_base: http://localhost:8000/v1 model: meta-llama/Llama-3-8B-Instruct runtime: benchmark_timeout: 60 evaluators: - type: performance target_channels: ["text"] Run it: .. code-block:: bash uvx veeksha benchmark --config sweep.veeksha.yml Veeksha automatically runs 4 benchmarks with rates 5, 10, 20, and 30. Output structure ---------------- Sweeps create a parent directory containing all run subdirectories and summary files: .. code-block:: text benchmark_output/ └── sweep_09:01:2026-16:38:22/ # Sweep parent directory ├── sweep_manifest.json # Sweep configuration ├── sweep_summary.json # Aggregated results ├── sweep_summary.csv # Results in CSV format ├── 09:01:2026-16:38:22-960db960/ # First run (rate=10) │ ├── config.yml │ ├── metrics/ │ └── ... └── 09:01:2026-16:39:00-3f3db8a5/ # Second run (rate=11) └── ... Each run's full metrics are preserved in its subdirectory, enabling detailed cross-run analysis (e.g., comparing TTFC distributions, throughput, or SLO compliance). Sweep summary ------------- Veeksha automatically generates summary files at the end of each sweep: **sweep_summary.json** aggregates key metrics across all runs: .. code-block:: json { "base_output_dir": "benchmark_output/sweep_09:01:2026-16:38:22", "num_runs": 2, "runs": [ { "run_index": 0, "run_dir": "...", "traffic": {"arrival_rate": 10}, "summary_stats": {...}, "throughput_metrics": {...}, "all_slos_met": true }, ... ] } **sweep_summary.csv** provides the same data in a tabular format for easy spreadsheet analysis. Cross-file expansion -------------------- When using :ref:`!include ` to split config across files, ``!expand`` tags work across file boundaries. Veeksha collects **all** ``!expand`` markers from all included files and computes the Cartesian product. **Example: Sweep across client endpoints and traffic rates** Create ``client.yml`` with multiple endpoints: .. code-block:: yaml # client.yml - sweep across 2 servers type: openai_chat_completions api_base: !expand [http://server-a:8000/v1, http://server-b:8000/v1] model: meta-llama/Llama-3-8B-Instruct Create ``traffic.yml`` with multiple arrival rates: .. code-block:: yaml # traffic.yml - sweep across 3 rates type: rate interval_generator: type: poisson arrival_rate: !expand [5, 10, 20] Create a main config using ``!include``: .. code-block:: yaml # main.yml client: !include client.yml traffic_scheduler: !include traffic.yml session_generator: type: synthetic channels: - type: text body_length_generator: type: uniform min: 100 max: 500 runtime: benchmark_timeout: 60 Run with a single ``--config``: .. code-block:: bash uvx veeksha benchmark --config main.yml This creates **6 runs** (2 servers × 3 rates): 1. api_base=server-a, rate=5 2. api_base=server-a, rate=10 3. api_base=server-a, rate=20 4. api_base=server-b, rate=5 5. api_base=server-b, rate=10 6. api_base=server-b, rate=20 **Multi-file example** You can split across as many included files as needed. If ``client.yml`` has 2 ``!expand`` values, ``traffic.yml`` has 3, and the main config has ``!expand [256, 512]`` for prompt length, this creates **12 runs** (2 × 3 × 2). .. hint:: Cross-file expansion is particularly useful for: - Testing multiple server deployments with the same workload - Comparing different models without duplicating config files - Running the same sweep against staging and production endpoints Common sweep patterns --------------------- **Rate Sweep** - Find latency vs load relationship .. code-block:: yaml traffic_scheduler: type: rate interval_generator: type: poisson arrival_rate: !expand [1, 2, 5, 10, 20, 50] **Concurrency Sweep** - Throughput scaling .. code-block:: yaml traffic_scheduler: type: concurrent target_concurrent_sessions: !expand [1, 2, 4, 8, 16, 32] rampup_seconds: 10 **Prompt Length Sweep** - Prefill scaling .. code-block:: yaml session_generator: channels: - type: text body_length_generator: type: fixed value: !expand [128, 256, 512, 1024, 2048] output_spec: text: output_length_generator: type: fixed value: 128 **Output Length Sweep** - Decode scaling .. code-block:: yaml session_generator: channels: - type: text body_length_generator: type: fixed value: 256 output_spec: text: output_length_generator: type: fixed value: !expand [64, 128, 256, 512, 1024] **Multi-Dimensional Sweep** .. code-block:: yaml traffic_scheduler: type: concurrent target_concurrent_sessions: !expand [4, 8] session_generator: channels: - type: text body_length_generator: type: fixed value: !expand [256, 512] output_spec: text: output_length_generator: type: fixed value: !expand [128, 256] Creates 2 × 2 × 2 = **8 runs**.