Configuration System
====================

Veeksha uses a flexible polymorphic configuration system that supports YAML
files, CLI arguments, and programmatic access. This guide explains how the
system works and how to navigate it effectively.

Configuration methods
---------------------

**YAML Files** (recommended)

Create a ``.veeksha.yml`` file with your configuration:

.. code-block:: yaml

   seed: 42

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: my-model

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 10.0

**CLI Arguments**

Override any option using dot notation:

.. code-block:: bash

   uvx veeksha benchmark \
     --client.api_base http://localhost:8000/v1 \
     --traffic_scheduler.interval_generator.arrival_rate 20.0

Argument names mirror the YAML hierarchy, with dots separating levels.

**Combined** (YAML + CLI)

CLI arguments override YAML values:

.. code-block:: bash

   # Base config from file, override arrival rate
   uvx veeksha benchmark \
     --config base.veeksha.yml \
     --traffic_scheduler.interval_generator.arrival_rate 30.0

Polymorphic options
-------------------

Many options have a ``type`` field that selects a variant with its own
options:

.. code-block:: yaml

   # Session generator can be: synthetic, trace, or lmeval
   session_generator:
     type: synthetic       # Selects the synthetic variant
     session_graph:        # Options specific to synthetic
       type: linear
     channels:
       - type: text

   # Traffic scheduler can be: rate or concurrent
   traffic_scheduler:
     type: rate            # Selects the rate variant
     interval_generator:   # Options specific to rate
       type: poisson
       arrival_rate: 10.0

Each ``type`` exposes different options. See the
:doc:`/config_reference/index` for the full list.

.. _configuration-export-json-schema:

Exporting JSON schema
---------------------

Export a JSON schema for YAML IDE autocompletion and linting:

.. code-block:: bash

   uvx veeksha benchmark --export-json-schema veeksha-schema.json

Configure your IDE to use this schema. In VSCode and forks:
.. code-block:: json

   // .vscode/settings.json
   {
     "yaml.schemas": {
       "./veeksha-schema.json": "*.veeksha.yml"
     },
     "yaml.customTags": [
       "!expand sequence"
     ]
   }

.. hint::

   The YAML IDE extension may be required for ``yaml.schemas`` to show up
   as a valid setting.

.. figure:: /_static/assets/yaml_help_text.png
   :alt: VSCode YAML integration example
   :align: center
   :width: 600px

   The VSCode YAML extension providing autocompletion and documentation on
   hover.

Common configuration sections
-----------------------------

**client** - API endpoint configuration

.. code-block:: yaml

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct
     # api_key: optional, falls back to OPENAI_API_KEY env var
     request_timeout: 300
     max_tokens_param: max_completion_tokens
     min_tokens_param: min_tokens

**traffic_scheduler** - Traffic pattern

.. code-block:: yaml

   # Rate-based
   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 10.0
     cancel_session_on_failure: true

   # OR concurrency-based
   traffic_scheduler:
     type: concurrent
     target_concurrent_sessions: 8
     rampup_seconds: 10

**session_generator** - Content generation

.. code-block:: yaml

   session_generator:
     type: synthetic
     session_graph:
       type: linear
       num_request_generator:
         type: uniform
         min: 1
         max: 5
       inherit_history: true
     channels:
       - type: text
         body_length_generator:
           type: uniform
           min: 100
           max: 500
     output_spec:
       text:
         output_length_generator:
           type: uniform
           min: 50
           max: 200

**runtime** - Execution parameters

.. code-block:: yaml

   runtime:
     benchmark_timeout: 300           # Total benchmark duration
     max_sessions: 1000               # Maximum sessions (-1 = unlimited)
     post_timeout_grace_seconds: 10   # Wait for in-flight after timeout
     num_client_threads: 3            # Async HTTP client threads

**evaluators** - Metrics collection
.. code-block:: yaml

   evaluators:
     - type: performance
       target_channels: ["text"]
       stream_metrics: true
       slos:
         - name: "P99 TTFC"
           metric: ttfc
           percentile: 0.99
           value: 0.5
           type: constant

Environment variables
---------------------

Veeksha automatically reads certain environment variables as fallbacks when
configuration values are not explicitly set:

``OPENAI_API_KEY``
   Used as the API key if ``client.api_key`` is not set in config.

``OPENAI_API_BASE``
   Used as the API base URL if ``client.api_base`` is not set in config.

This allows you to set credentials once in your environment:

.. code-block:: bash

   export OPENAI_API_KEY=your-api-key
   export OPENAI_API_BASE=http://localhost:8000/v1

Then omit them from your config file:

.. code-block:: yaml

   # No need to specify api_key or api_base
   client:
     type: openai_chat_completions
     model: meta-llama/Llama-3-8B-Instruct

This is especially useful for:

- Avoiding committing secrets to version control
- Sharing configs across environments with different servers

Veeksha also reads ``HF_TOKEN`` from the environment in order to access
gated models.

Stop conditions
---------------

Benchmarks stop when either condition is met:

.. code-block:: yaml

   runtime:
     benchmark_timeout: 300   # Stop after 300 seconds
     max_sessions: 1000       # OR after 1000 sessions

Use ``-1`` for unlimited:

.. code-block:: yaml

   runtime:
     benchmark_timeout: -1    # Run indefinitely
     max_sessions: 500        # Stop only after 500 sessions

When the timeout hits, Veeksha records all in-flight requests and keeps
dispatching sessions as usual. It then exits once
``post_timeout_grace_seconds`` have elapsed, or earlier if the session limit
is reached first.

.. code-block:: yaml

   runtime:
     benchmark_timeout: 60
     post_timeout_grace_seconds: 10   # Wait 10s for in-flight requests
                                      # -1 = wait indefinitely for all in-flight
                                      #  0 = exit immediately (cancel in-flight)

Output directory
----------------

Control where results are saved:
.. code-block:: yaml

   output_dir: benchmark_output

Results are saved to a timestamped subdirectory:

.. code-block:: text

   benchmark_output/
   └── 09:01:2026-10:30:00-a1b2c3d4/
       ├── config.yml
       ├── metrics/
       └── traces/

The subdirectory name includes:

- Date and time
- Short hash of the configuration (for uniqueness)

Trace recording
---------------

Control what's recorded for debugging:

.. code-block:: yaml

   trace_recorder:
     enabled: true            # Write trace file
     include_content: false   # Exclude prompt/response content (smaller files)

Set ``include_content: true`` to record full request content for debugging.

Validation
----------

Veeksha validates configurations at startup:

- Type checking for all fields
- Enum validation for ``type`` fields
- Required field checking
- Cross-field validation (e.g., ``min <= max``)

Invalid configurations produce clear error messages:

.. code-block:: text

   ConfigurationError: traffic_scheduler.interval_generator.arrival_rate must be positive, got -5.0

.. _configuration-splitting:

Splitting configuration across files
------------------------------------

For better organization and reusability, you can split your configuration
across multiple YAML files using the ``!include`` tag. This is useful when
you want to:

- Reuse client configuration across different benchmarks
- Keep environment-specific settings (e.g., API endpoints) separate
- Share traffic patterns across experiments

**Example: Separate client and traffic configs**

Create ``client.yml`` with just client settings:

.. code-block:: yaml

   # client.yml
   type: openai_chat_completions
   api_base: http://localhost:8000/v1
   model: meta-llama/Llama-3-8B-Instruct

Create ``traffic.yml`` with traffic settings:

.. code-block:: yaml

   # traffic.yml
   type: rate
   interval_generator:
     type: poisson
     arrival_rate: 5.0

Create a main config that includes both:
.. code-block:: yaml

   # main_config.yml
   seed: 42

   client: !include client.yml
   traffic_scheduler: !include traffic.yml

   session_generator:
     type: synthetic
     channels:
       - type: text
         body_length_generator:
           type: uniform
           min: 50
           max: 200

   runtime:
     benchmark_timeout: 60

Run the benchmark with a single ``--config`` flag:

.. code-block:: bash

   uvx veeksha benchmark --config main_config.yml

**CLI overrides still work**

You can override any value from the included files using CLI arguments:

.. code-block:: bash

   uvx veeksha benchmark \
     --config main_config.yml \
     --client.model llama-70b   # Override model from client.yml

.. _workload-recipes:

Workload recipes
================

This section shows complete, runnable configurations for common
benchmarking scenarios. Each recipe is a standalone YAML file: copy it,
point ``api_base`` at your server, and run.

Replay a request log (CSV or JSONL)
-----------------------------------

Replay a simple trace of independent requests with just token length
distributions. Works with CSV files directly, no conversion needed:

.. code-block:: yaml

   # trace_request_log.veeksha.yml
   seed: 42

   session_generator:
     type: trace
     trace_file: sharegpt_8k_filtered.csv   # CSV or JSONL
     wrap_mode: true
     flavor:
       type: request_log

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 10.0

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct

   runtime:
     benchmark_timeout: 120
     max_sessions: -1

   evaluators:
     - type: performance
       target_channels: ["text"]

The trace file needs ``input_length`` and ``output_length`` columns (or the
common alternatives ``num_prefill_tokens`` / ``num_decode_tokens``, which
are auto-normalized). Each row becomes an independent single-request
session.

**Trace flavors:**

``request_log``
   Independent requests with just token lengths. No session structure, no
   corpus files. Supports CSV and JSONL. Best for replaying public
   benchmarking datasets (ShareGPT, etc.).
``timed_synthetic_session``
   Timed session traces with synthetic content. Supports DAG replay through
   ``session_context`` and context caching via ``page_size``. Best for
   testing KV-cache reuse across linear and non-linear sessions.

``untimed_content_multi_turn``
   Replay conversation datasets with actual message content (ShareGPT,
   LMSYS-Chat, etc.). Configurable message schema for different dataset
   formats.

``shared_prefix``
   Multi-turn conversation dataset. Uses hash-based deterministic content
   generation with a configurable ``block_size``.

``rag``
   Single-turn retrieval-augmented generation. Includes ``num_documents``
   warmup documents. Good for testing long-context prefill.

Replay conversation datasets
----------------------------

Replay datasets with actual conversation content (e.g. ShareGPT):

.. code-block:: yaml

   # conversation_replay.veeksha.yml
   seed: 42

   session_generator:
     type: trace
     trace_file: sharegpt_52k.jsonl
     wrap_mode: true
     flavor:
       type: untimed_content_multi_turn
       conversation_column: conversations
       role_key: from
       content_key: value
       user_role_value: human
       assistant_role_value: gpt

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 10.0

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct

   runtime:
     benchmark_timeout: 120
     max_sessions: -1

   evaluators:
     - type: performance
       target_channels: ["text"]

Each row in the trace file should contain a ``conversations`` column
(configurable) with a list of message dicts. The schema keys
(``role_key``, ``content_key``, ``user_role_value``,
``assistant_role_value``) can be customized for different dataset formats.
For the LMSYS-Chat format, use ``role_key: role``, ``content_key:
content``, ``user_role_value: user``, ``assistant_role_value: assistant``.

Replay timed multi-turn traces
------------------------------

Replay timed multi-turn coding assistant traces with context caching:
.. code-block:: yaml

   # trace_timed_synthetic_session.veeksha.yml
   seed: 42

   session_generator:
     type: trace
     trace_file: traces/timed_synthetic_trace.jsonl
     wrap_mode: true
     flavor:
       type: timed_synthetic_session
       corpus_file: traces/corpus.txt
       page_size: 16

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 5.0

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct
     request_timeout: 120
     max_tokens_param: max_completion_tokens

   runtime:
     benchmark_timeout: 120
     max_sessions: -1

Replay shared-prefix traces
---------------------------

.. code-block:: yaml

   # trace_shared_prefix.veeksha.yml
   session_generator:
     type: trace
     trace_file: traces/shared_prefix_trace.jsonl
     flavor:
       type: shared_prefix
       corpus_file: traces/corpus.txt
       block_size: 512

   traffic_scheduler:
     type: concurrent
     target_concurrent_sessions: 32
     rampup_seconds: 10

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct

   runtime:
     benchmark_timeout: 300
     max_sessions: -1

Multi-turn conversations (synthetic)
------------------------------------

Generate multi-turn sessions with history accumulation and shared prefixes:
.. code-block:: yaml

   # multi_turn.veeksha.yml
   seed: 42

   session_generator:
     type: synthetic
     session_graph:
       type: linear
       inherit_history: true   # Each turn includes prior turns as context
       num_request_generator:
         type: uniform
         min: 2
         max: 4                # 2-4 turns per session
       request_wait_generator:
         type: poisson
         arrival_rate: 5       # ~200ms think time between turns
     channels:
       - type: text
         shared_prefix_ratio: 0.2         # 20% of prompt is shared
         shared_prefix_probability: 0.5   # 50% of sessions share a prefix
         body_length_generator:
           type: uniform
           min: 100
           max: 500
     output_spec:
       text:
         output_length_generator:
           type: uniform
           min: 50
           max: 200

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 10.0

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct

   runtime:
     benchmark_timeout: 60
     max_sessions: -1

   evaluators:
     - type: performance
       target_channels: ["text"]
       slos:
         - name: "P99 TTFC under 500ms"
           metric: ttfc
           percentile: 0.99
           value: 0.5
           type: constant

Agentic workloads (branching sessions)
--------------------------------------

Simulate agentic tool-calling patterns with a fan-out/fan-in DAG structure:
.. code-block:: yaml

   # agentic.veeksha.yml
   seed: 42

   session_generator:
     type: synthetic
     session_graph:
       type: branching
       num_layers_generator:
         type: uniform
         min: 3
         max: 5
       layer_width_generator:
         type: uniform
         min: 2
         max: 6
       fan_out_generator:
         type: uniform
         min: 1
         max: 5
       fan_in_generator:
         type: uniform
         min: 1
         max: 4
       connection_dist_generator:
         type: uniform
         min: 1
         max: 2                # Allow skip connections
       single_root: true
       inherit_history: true
       request_wait_generator:
         type: poisson
         arrival_rate: 3
     channels:
       - type: text
         body_length_generator:
           type: uniform
           min: 50
           max: 200
     output_spec:
       text:
         output_length_generator:
           type: uniform
           min: 100
           max: 300

   traffic_scheduler:
     type: rate
     interval_generator:
       type: poisson
       arrival_rate: 5.0

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct

   runtime:
     max_sessions: 100
     benchmark_timeout: 120

   evaluators:
     - type: performance
       target_channels: ["text"]

LM-Eval accuracy benchmarks
---------------------------

Run standardized evaluation tasks from the `lm-evaluation-harness `_:

.. code-block:: yaml

   # lmeval.veeksha.yml
   seed: 42

   session_generator:
     type: lmeval
     tasks: ["triviaqa", "truthfulqa_gen"]
     num_fewshot: 0

   traffic_scheduler:
     type: concurrent
     target_concurrent_sessions: 4
     rampup_seconds: 0
     cancel_session_on_failure: false

   evaluators:
     - type: performance
       target_channels: ["text"]
     - type: accuracy_lmeval
       bootstrap_iters: 200

   client:
     type: openai_completions   # Note: completions, not chat
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct
     request_timeout: 240
     max_tokens_param: max_tokens
     additional_sampling_params: '{"temperature": 0}'

   runtime:
     max_sessions: 40
     benchmark_timeout: 1200

.. note::

   LM-Eval uses ``openai_completions`` (not ``openai_chat_completions``)
   for generation tasks. The ``accuracy_lmeval`` evaluator computes
   task-specific metrics alongside the standard performance evaluator.
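As an aside on the rate-based recipes above: a Poisson interval generator with a given ``arrival_rate`` produces exponentially distributed gaps between arrivals, with mean ``1 / arrival_rate`` seconds. The standard-library sketch below illustrates that relationship only; it is not Veeksha's implementation, and ``poisson_intervals`` is a hypothetical helper name.

```python
import random


def poisson_intervals(arrival_rate: float, n: int, seed: int = 42) -> list[float]:
    """Draw n inter-arrival gaps (seconds) for a Poisson process.

    A Poisson process with rate lambda has exponentially distributed
    gaps with mean 1 / lambda, so arrival_rate: 10.0 means roughly
    0.1 s between session starts on average, with high variance.
    """
    rng = random.Random(seed)
    return [rng.expovariate(arrival_rate) for _ in range(n)]


gaps = poisson_intervals(10.0, 20_000)
mean_gap = sum(gaps) / len(gaps)
# With 20k samples, mean_gap lands close to 1 / 10.0 = 0.1 s
```

This is why an ``arrival_rate: 5`` think-time generator corresponds to roughly 200 ms between turns: the mean gap is the reciprocal of the rate.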
Throughput saturation test
--------------------------

Push the server to maximum throughput using closed-loop concurrency:

.. code-block:: yaml

   # throughput.veeksha.yml
   seed: 42

   session_generator:
     type: synthetic
     session_graph:
       type: single_request
     channels:
       - type: text
         body_length_generator:
           type: fixed
           value: 512
     output_spec:
       text:
         output_length_generator:
           type: fixed
           value: 256

   traffic_scheduler:
     type: concurrent
     target_concurrent_sessions: 32
     rampup_seconds: 10

   client:
     type: openai_chat_completions
     api_base: http://localhost:8000/v1
     model: meta-llama/Llama-3-8B-Instruct

   runtime:
     benchmark_timeout: 120
     max_sessions: -1

   evaluators:
     - type: performance
       target_channels: ["text"]

See also
--------

- :doc:`/config_reference/api_reference/BenchmarkConfig` - Complete
  benchmark configuration reference
- :doc:`/config_reference/api_reference/CapacitySearchConfig` - Capacity
  search configuration reference
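Every recipe in this section can also be tweaked with the dot-notation CLI overrides described in the configuration guide. Conceptually, a dotted key simply walks the nested YAML mapping. The sketch below illustrates that mapping with the standard library only; ``apply_override`` is a hypothetical helper, not Veeksha's actual CLI parser.

```python
from typing import Any


def apply_override(config: dict, dotted_key: str, value: Any) -> dict:
    """Set a nested config value from a dotted CLI-style key.

    Illustration only: shows how a flag such as
    --traffic_scheduler.interval_generator.arrival_rate 20.0
    maps onto the YAML hierarchy.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate maps as needed
    node[leaf] = value
    return config


cfg = {
    "traffic_scheduler": {
        "interval_generator": {"type": "poisson", "arrival_rate": 10.0}
    }
}
apply_override(cfg, "traffic_scheduler.interval_generator.arrival_rate", 20.0)
# cfg now carries arrival_rate == 20.0; all other keys are untouched
```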