Trace Flavors¶
Use `session_generator.type: trace` when your workload should come from an
existing trace file instead of a synthetic session generator. The trace
flavor tells Veeksha how to:

- validate the input file
- interpret the trace structure
- reconstruct prompts, history, and timing

```yaml
session_generator:
  type: trace
  trace_file: traces/my_trace.jsonl
  wrap_mode: true
  flavor:
    type: request_log
```
All flavors support JSONL. CSV is also supported for simple tabular traces, and
Veeksha automatically normalizes `num_prefill_tokens` to `input_length` and
`num_decode_tokens` to `output_length` when those CSV column names are used.
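For example, a CSV trace that uses those column names is read as if it
contained `input_length` and `output_length` directly. A hypothetical two-row
trace:

```csv
num_prefill_tokens,num_decode_tokens
512,128
1024,256
```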
Choose a flavor¶
| Flavor | What the trace already contains | What Veeksha preserves | What Veeksha reconstructs |
|---|---|---|---|
| `request_log` | Independent request lengths with no real prompt text | Input and output length distribution only | Random prompt text and single-request sessions |
| `timed_synthetic_session` | Per-request token counts plus session topology and wait times | Linear or DAG structure, think time, and history lineage | Prompt text rebuilt from lengths and lineage seeds |
| `untimed_content_multi_turn` | Real user and assistant message content from a chat dataset | Turn order and actual conversation text | Request history and output lengths from assistant replies |
| `shared_prefix` | Token counts plus privacy-safe shared-prefix identifiers | Session order and cross-session prefix-sharing structure | Shared root prompts from those prefix identifiers |
| `rag` | Real RAG prompt text plus repeated document IDs | Actual prompt text and hot-document frequency | Warmup sessions for the top documents |
The main distinction is what survives in the trace:

- `request_log`: only request lengths
- `timed_synthetic_session`: lengths plus timing and topology
- `untimed_content_multi_turn`: actual conversation text
- `shared_prefix`: privacy-safe shared-prefix identifiers
- `rag`: real RAG prompts plus document reuse
If you are deciding between the multi-turn flavors:

- Use `untimed_content_multi_turn` when you have real chat content.
- Use `timed_synthetic_session` when you have timing, topology, and lengths but not prompt text.
- Use `shared_prefix` when you need privacy-safe traces that still preserve prefix-sharing structure.
- Use `rag` when you have real retrieval prompts and want cache warmup tied to repeated documents.
Common trace fields¶
Several flavors reuse the same field names:
- `session_id`: Groups multiple rows into one session. Required for multi-row session flavors such as `timed_synthetic_session` and `shared_prefix`.
- `input_length`: Total prompt tokens expected when the request is dispatched.
- `new_input_length`: New tokens introduced by this turn or node. This matters when Veeksha reconstructs prompts from lengths instead of real text.
- `output_length`: Requested output tokens for the request.

For flavors that expect one row per request or one row per conversation,
omitting `session_id` is usually safest: Veeksha will assign one
automatically.
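As a quick illustration, a two-row linear session that uses all four fields
might look like this (the values are illustrative only and mirror the
`timed_synthetic_session` example below; how `input_length` relates to
history depends on the flavor):

```jsonl
{"session_id": 1, "input_length": 8, "new_input_length": 8, "output_length": 4}
{"session_id": 1, "input_length": 16, "new_input_length": 8, "output_length": 4}
```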
request_log¶
Use this for the simplest replay case: independent requests where only input and output lengths matter.
Best for:

- replaying public length-only datasets
- matching a prompt-length and output-length distribution
- load tests that do not need real multi-turn structure

Expected trace shape:

- One row per request
- Omit `session_id` or keep it unique per row
- Required columns: `input_length`, `output_length`

Minimal JSONL example:

```jsonl
{"input_length": 512, "output_length": 128}
{"input_length": 1024, "output_length": 256}
```
What Veeksha does:

- creates one single-request session per row
- generates a random prompt with `input_length` tokens
- requests `output_length` tokens from the model
timed_synthetic_session¶
Use this when you have recorded multi-turn or DAG-shaped workloads with timing and token counts, but you do not want to store real prompt text.
Best for:
- coding-assistant traces
- production chat or agent traces
- KV-cache and history-lineage experiments

Expected trace shape:

- Multiple rows per session
- Required columns: `session_id`, `input_length`, `new_input_length`, `output_length`
- Recommended modern format: include `session_context` on every row

Minimal DAG-shaped JSONL example:

```jsonl
{"session_id": 8, "input_length": 8, "new_input_length": 8, "output_length": 4, "session_context": {"node_id": 0, "parent_nodes": [], "history_parent": null, "wait_after_ready": 0.0}}
{"session_id": 8, "input_length": 8, "new_input_length": 8, "output_length": 4, "session_context": {"node_id": 1, "parent_nodes": [], "history_parent": null, "wait_after_ready": 0.1}}
{"session_id": 8, "input_length": 16, "new_input_length": 8, "output_length": 5, "session_context": {"node_id": 2, "parent_nodes": [0, 1], "history_parent": 1, "wait_after_ready": 0.2}}
```
The `session_context` fields mean:

- `node_id`: stable request ID inside the session
- `parent_nodes`: dependencies that must complete first
- `history_parent`: the one parent whose history becomes this request's context
- `wait_after_ready`: think time after dependencies complete
Legacy linear traces are also supported. If `session_context` is missing on
every row in a session, Veeksha falls back to row order (or `turn_idx` when
present) and can use `wait_after_previous_response_s` for linear think time.
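For illustration, a legacy linear session using those fields might look like
this (the values are hypothetical):

```jsonl
{"session_id": 3, "turn_idx": 0, "input_length": 8, "new_input_length": 8, "output_length": 4}
{"session_id": 3, "turn_idx": 1, "input_length": 16, "new_input_length": 8, "output_length": 4, "wait_after_previous_response_s": 0.5}
```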
What Veeksha does:

- rebuilds prompt text from lengths instead of using stored content
- preserves graph structure and per-node wait times
- reuses lineage seeds when two nodes share the same `history_parent`
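Putting it together, a `session_generator` block for this flavor could look
like the following sketch (it assumes the flavor needs no options beyond
`type`; the trace path is a placeholder):

```yaml
session_generator:
  type: trace
  trace_file: traces/agent_sessions.jsonl
  flavor:
    type: timed_synthetic_session
```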
untimed_content_multi_turn¶
Use this when your trace already contains the actual conversation text and you want Veeksha to split it into request turns.
Best for:
- ShareGPT-style datasets
- LMSYS-Chat-style datasets
- replaying real prompt text without timestamps
Expected trace shape:
- One row per conversation
- Omit `session_id` or keep it unique per row
- Required column: the conversation column, `conversations` by default

Minimal ShareGPT-style JSONL example:

```jsonl
{"conversations": [{"from": "human", "value": "What is Python?"}, {"from": "gpt", "value": "Python is a programming language."}, {"from": "human", "value": "Tell me more."}, {"from": "gpt", "value": "It was first released in 1991."}]}
```
Minimal LMSYS-style flavor config:
```yaml
flavor:
  type: untimed_content_multi_turn
  conversation_column: conversation
  role_key: role
  content_key: content
  user_role_value: user
  assistant_role_value: assistant
```
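A trace row matching that config would then look something like this
(hypothetical content, using the `conversation`, `role`, and `content` keys
named in the config above):

```jsonl
{"conversation": [{"role": "user", "content": "What is Python?"}, {"role": "assistant", "content": "Python is a programming language."}]}
```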
What Veeksha does:
- turns each user/assistant pair into one request
- pre-populates request history from earlier messages in the row
- skips leading assistant messages, system messages, and trailing user messages without a response
rag¶
Use this for retrieval-style workloads where the trace already contains the full prompt text and repeated document IDs.
Best for:
- RAG traces with repeated retrieved documents
- document-cache warmup experiments
- long-context single-request workloads

Expected trace shape:

- One row per request
- Omit `session_id` or keep it unique per row
- Required columns: `doc_id`, `prompt_text`, `input_length`, `output_length`

Minimal JSONL example with a repeated hot document:

```jsonl
{"doc_id": "doc-17", "prompt_text": "Context: ...\n\nQuestion: Summarize the document.", "input_length": 2048, "output_length": 128}
{"doc_id": "doc-17", "prompt_text": "Context: ...\n\nQuestion: Extract the main risks.", "input_length": 2048, "output_length": 128}
{"doc_id": "doc-42", "prompt_text": "Context: ...\n\nQuestion: List the action items.", "input_length": 2048, "output_length": 128}
```
What Veeksha does:

- creates one single-request session per row
- picks the most frequent `doc_id` values for warmup sessions
- warms those documents before the main benchmark starts

Tune `flavor.num_documents` to control how many hot documents Veeksha warms.
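For example, to warm only the two hottest documents (a sketch showing just
this one option; the value is illustrative):

```yaml
flavor:
  type: rag
  num_documents: 2
```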
Wrap mode¶
With `wrap_mode: true`, Veeksha keeps looping over the trace after it reaches
the end. Sessions are reshuffled between epochs, and flavors that synthesize
content regenerate or remap it as needed, so repeated epochs do not reuse
exactly the same session IDs and prompt materialization.
See also¶
- Configuration System for full trace-session-generator configuration
- Benchmark Types for ready-to-run benchmark recipes
- Content Generation for the design-level view of content generation