Trace Flavors
=============

Use ``session_generator.type: trace`` when your workload should come from an
existing trace file instead of a synthetic session generator.

The trace ``flavor`` tells Veeksha how to:

- validate the input file
- interpret the trace structure
- reconstruct prompts, history, and timing

.. code-block:: yaml

   session_generator:
     type: trace
     trace_file: traces/my_trace.jsonl
     wrap_mode: true
     flavor:
       type: request_log

All flavors support JSONL. CSV is also supported for simple tabular traces,
and Veeksha automatically normalizes ``num_prefill_tokens`` to
``input_length`` and ``num_decode_tokens`` to ``output_length`` when those
CSV column names are used.

Choose a flavor
---------------

.. list-table::
   :header-rows: 1
   :widths: 18 30 24 28

   * - Flavor
     - What the trace already contains
     - What Veeksha preserves
     - What Veeksha reconstructs
   * - ``request_log``
     - Independent request lengths with no real prompt text
     - Input and output length distribution only
     - Random prompt text and single-request sessions
   * - ``timed_synthetic_session``
     - Per-request token counts plus session topology and wait times
     - Linear or DAG structure, think time, and history lineage
     - Prompt text rebuilt from lengths and lineage seeds
   * - ``untimed_content_multi_turn``
     - Real user and assistant message content from a chat dataset
     - Turn order and actual conversation text
     - Request history and output lengths from assistant replies
   * - ``shared_prefix``
     - Token counts plus privacy-safe ``hash_ids`` for shared prefixes
     - Session order and cross-session prefix-sharing structure
     - Shared root prompts from ``hash_ids`` and later turns from lengths
   * - ``rag``
     - Real RAG prompt text plus repeated document IDs
     - Actual prompt text and hot-document frequency
     - Warmup sessions for the top documents

The main distinction is what survives in the trace:

- only request lengths
- lengths plus timing and topology
- actual conversation text
- privacy-safe shared-prefix identifiers
- real RAG prompts plus document reuse

If you are deciding between the multi-turn flavors:

- Use ``untimed_content_multi_turn`` when you have real chat content.
- Use ``timed_synthetic_session`` when you have timing, topology, and
  lengths but not prompt text.
- Use ``shared_prefix`` when you need privacy-safe traces that still
  preserve prefix-sharing structure.
- Use ``rag`` when you have real retrieval prompts and want cache warmup
  tied to repeated documents.

Common trace fields
-------------------

Several flavors reuse the same field names:

``session_id``
   Groups multiple rows into one session. Required for multi-row session
   flavors such as ``timed_synthetic_session`` and ``shared_prefix``.

``input_length``
   Total prompt tokens expected when the request is dispatched.

``new_input_length``
   New tokens introduced by this turn or node. This matters when Veeksha
   reconstructs prompts from lengths instead of real text.

``output_length``
   Requested output tokens for the request.

For flavors that expect one row per request or one row per conversation,
omitting ``session_id`` is usually safest. Veeksha will assign one
automatically.

``request_log``
---------------

Use this for the simplest replay case: independent requests where only
input and output lengths matter.

Best for:

- replaying public length-only datasets
- matching a prompt-length and output-length distribution
- load tests that do not need real multi-turn structure

Expected trace shape:

- One row per request
- Omit ``session_id`` or keep it unique per row
- Required columns: ``input_length``, ``output_length``

Minimal JSONL example:

.. code-block:: json

   {"input_length": 512, "output_length": 128}
   {"input_length": 1024, "output_length": 256}

What Veeksha does:

- creates one single-request session per row
- generates a random prompt with ``input_length`` tokens
- requests ``output_length`` tokens from the model

``timed_synthetic_session``
---------------------------

Use this when you have recorded multi-turn or DAG-shaped workloads with
timing and token counts, but you do not want to store real prompt text.

Best for:

- coding-assistant traces
- production chat or agent traces
- KV-cache and history-lineage experiments

Expected trace shape:

- Multiple rows per session
- Required columns: ``session_id``, ``input_length``,
  ``new_input_length``, ``output_length``
- Recommended modern format: include ``session_context`` on every row

Minimal DAG-shaped JSONL example:

.. code-block:: json

   {"session_id": 8, "input_length": 8, "new_input_length": 8, "output_length": 4, "session_context": {"node_id": 0, "parent_nodes": [], "history_parent": null, "wait_after_ready": 0.0}}
   {"session_id": 8, "input_length": 8, "new_input_length": 8, "output_length": 4, "session_context": {"node_id": 1, "parent_nodes": [], "history_parent": null, "wait_after_ready": 0.1}}
   {"session_id": 8, "input_length": 16, "new_input_length": 8, "output_length": 5, "session_context": {"node_id": 2, "parent_nodes": [0, 1], "history_parent": 1, "wait_after_ready": 0.2}}

``session_context`` means:

- ``node_id``: stable request ID inside the session
- ``parent_nodes``: dependencies that must complete first
- ``history_parent``: the one parent whose history becomes this request's
  context
- ``wait_after_ready``: think time after dependencies complete

Legacy linear traces are also supported. If ``session_context`` is missing
on every row in a session, Veeksha falls back to row order (or
``turn_idx`` when present) and can use ``wait_after_previous_response_s``
for linear think time.
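The ``session_context`` rules above imply two invariants: rows should be
topologically ordered (a ``parent_nodes`` entry refers to an earlier
``node_id``), and ``history_parent`` should be one of the node's parents.
A trace can be sanity-checked before a run with a short script. This is an
illustrative sketch, not part of Veeksha: the field names follow this page,
while ``check_session_context`` is a hypothetical helper.

```python
import json


def check_session_context(rows):
    """Sanity-check the DAG fields for one session's rows.

    Returns a list of problem strings; empty means the rows look valid.
    """
    problems = []
    seen = set()
    for row in rows:
        ctx = row.get("session_context")
        if ctx is None:
            # Legacy linear trace: row order (or turn_idx) defines the chain.
            continue
        node = ctx["node_id"]
        if node in seen:
            problems.append(f"duplicate node_id {node}")
        # Parents must have appeared earlier, so the file is topologically ordered.
        for parent in ctx["parent_nodes"]:
            if parent not in seen:
                problems.append(f"node {node} depends on unseen parent {parent}")
        hp = ctx["history_parent"]
        if hp is not None and hp not in ctx["parent_nodes"]:
            problems.append(f"node {node}: history_parent {hp} not in parent_nodes")
        seen.add(node)
    return problems


lines = [
    '{"session_id": 8, "input_length": 8, "new_input_length": 8, "output_length": 4, '
    '"session_context": {"node_id": 0, "parent_nodes": [], "history_parent": null, "wait_after_ready": 0.0}}',
    '{"session_id": 8, "input_length": 16, "new_input_length": 8, "output_length": 5, '
    '"session_context": {"node_id": 1, "parent_nodes": [0], "history_parent": 0, "wait_after_ready": 0.2}}',
]
rows = [json.loads(line) for line in lines]
print(check_session_context(rows))  # [] -- the two-node chain is consistent
```

Running this over each ``session_id`` group of a trace file catches ordering
mistakes before Veeksha ever sees them.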
What Veeksha does:

- rebuilds prompt text from lengths instead of using stored content
- preserves graph structure and per-node wait times
- reuses lineage seeds when two nodes share the same ``history_parent``

``untimed_content_multi_turn``
------------------------------

Use this when your trace already contains the actual conversation text and
you want Veeksha to split it into request turns.

Best for:

- ShareGPT-style datasets
- LMSYS-Chat-style datasets
- replaying real prompt text without timestamps

Expected trace shape:

- One row per conversation
- Omit ``session_id`` or keep it unique per row
- Required column: the conversation column, ``conversations`` by default

Minimal ShareGPT-style JSONL example:

.. code-block:: json

   {"conversations": [{"from": "human", "value": "What is Python?"}, {"from": "gpt", "value": "Python is a programming language."}, {"from": "human", "value": "Tell me more."}, {"from": "gpt", "value": "It was first released in 1991."}]}

Minimal LMSYS-style flavor config:

.. code-block:: yaml

   flavor:
     type: untimed_content_multi_turn
     conversation_column: conversation
     role_key: role
     content_key: content
     user_role_value: user
     assistant_role_value: assistant

What Veeksha does:

- turns each user/assistant pair into one request
- pre-populates request history from earlier messages in the row
- skips leading assistant messages, system messages, and trailing user
  messages without a response

``shared_prefix``
-----------------

Use this when you need privacy-safe traces that preserve shared-prefix
behavior across sessions.

Best for:

- testing prefix caching without storing real text
- replaying workloads where many sessions share common roots
- synthetic reconstruction from stable prefix IDs

Expected trace shape:

- Multiple rows per session
- Rows within a session should already be in turn order
- Required columns: ``session_id``, ``input_length``,
  ``new_input_length``, ``output_length``, ``hash_ids``

Minimal JSONL example:

.. code-block:: json

   {"session_id": 1, "input_length": 1024, "new_input_length": 1024, "output_length": 128, "hash_ids": [101, 102]}
   {"session_id": 1, "input_length": 1536, "new_input_length": 512, "output_length": 128, "hash_ids": [101, 102, 103]}
   {"session_id": 2, "input_length": 1024, "new_input_length": 1024, "output_length": 128, "hash_ids": [101, 102]}

What Veeksha does:

- uses the first row of each session as the shared-prefix root request
- maps identical root ``hash_ids`` blocks to identical prompt content
- uses ``hash_ids`` for cross-session sharing only on that first row
- generates later-turn content from ``new_input_length``

In practice, this flavor is useful when you care about shared prefixes and
conversation structure, but cannot keep the original prompt text in the
trace.

``rag``
-------

Use this for retrieval-style workloads where the trace already contains
the full prompt text and repeated document IDs.

Best for:

- RAG traces with repeated retrieved documents
- document-cache warmup experiments
- long-context single-request workloads

Expected trace shape:

- One row per request
- Omit ``session_id`` or keep it unique per row
- Required columns: ``doc_id``, ``prompt_text``, ``input_length``,
  ``output_length``

Minimal JSONL example with a repeated hot document:

.. code-block:: json

   {"doc_id": "doc-17", "prompt_text": "Context: ...\n\nQuestion: Summarize the document.", "input_length": 2048, "output_length": 128}
   {"doc_id": "doc-17", "prompt_text": "Context: ...\n\nQuestion: Extract the main risks.", "input_length": 2048, "output_length": 128}
   {"doc_id": "doc-42", "prompt_text": "Context: ...\n\nQuestion: List the action items.", "input_length": 2048, "output_length": 128}

What Veeksha does:

- creates one single-request session per row
- picks the most frequent ``doc_id`` values for warmup sessions
- warms those documents before the main benchmark starts

Tune ``flavor.num_documents`` to control how many hot documents Veeksha
warms.
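Warmup selection is just a frequency count over ``doc_id``. The sketch below
previews that selection outside Veeksha, which is handy for checking how many
warmup documents a trace really justifies. ``hot_documents`` and its
``num_documents`` parameter are illustrative stand-ins, not Veeksha APIs;
only the selection rule (most frequent ``doc_id`` values win) comes from this
page.

```python
import collections
import json


def hot_documents(jsonl_lines, num_documents):
    """Return the most frequent doc_id values in a rag-flavor trace.

    Mirrors the warmup rule described above: the hottest documents
    become warmup sessions before the main benchmark.
    """
    counts = collections.Counter(
        json.loads(line)["doc_id"] for line in jsonl_lines
    )
    return [doc for doc, _ in counts.most_common(num_documents)]


trace = [
    '{"doc_id": "doc-17", "prompt_text": "...", "input_length": 2048, "output_length": 128}',
    '{"doc_id": "doc-17", "prompt_text": "...", "input_length": 2048, "output_length": 128}',
    '{"doc_id": "doc-42", "prompt_text": "...", "input_length": 2048, "output_length": 128}',
]
print(hot_documents(trace, num_documents=1))  # ['doc-17']
```

If the printed list is shorter than ``num_documents``, the trace simply has
fewer distinct documents than you asked to warm.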
Wrap mode
---------

With ``wrap_mode: true``, Veeksha keeps looping over the trace after it
reaches the end. Sessions are reshuffled between epochs, and flavors that
synthesize content regenerate or remap it as needed, so repeated epochs do
not reuse exactly the same session IDs and prompt materialization.

See also
--------

- :doc:`configuration` for full trace-session-generator configuration
- :doc:`/getting_started/common_benchmarks` for ready-to-run benchmark
  recipes
- :doc:`/design/content_generation` for the design-level view of content
  generation