Trace Flavors

Use session_generator.type: trace when your workload should come from an existing trace file instead of a synthetic session generator. The trace flavor tells Veeksha how to:

  • validate the input file

  • interpret the trace structure

  • reconstruct prompts, history, and timing

session_generator:
  type: trace
  trace_file: traces/my_trace.jsonl
  wrap_mode: true
  flavor:
    type: request_log

All flavors support JSONL. CSV is also supported for simple tabular traces, and Veeksha automatically normalizes num_prefill_tokens to input_length and num_decode_tokens to output_length when those CSV column names are used.
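
For example, a minimal CSV trace using those column names (values illustrative):

num_prefill_tokens,num_decode_tokens
512,128
1024,256

Veeksha reads these rows as input_length 512 and 1024 with output_length 128 and 256.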

Choose a flavor

| Flavor | What the trace already contains | What Veeksha preserves | What Veeksha reconstructs |
| --- | --- | --- | --- |
| request_log | Independent request lengths with no real prompt text | Input and output length distribution only | Random prompt text and single-request sessions |
| timed_synthetic_session | Per-request token counts plus session topology and wait times | Linear or DAG structure, think time, and history lineage | Prompt text rebuilt from lengths and lineage seeds |
| untimed_content_multi_turn | Real user and assistant message content from a chat dataset | Turn order and actual conversation text | Request history and output lengths from assistant replies |
| shared_prefix | Token counts plus privacy-safe hash_ids for shared prefixes | Session order and cross-session prefix-sharing structure | Shared root prompts from hash_ids and later turns from lengths |
| rag | Real RAG prompt text plus repeated document IDs | Actual prompt text and hot-document frequency | Warmup sessions for the top documents |

The main distinction is what survives in the trace:

  • only request lengths

  • lengths plus timing and topology

  • actual conversation text

  • privacy-safe shared-prefix identifiers

  • real RAG prompts plus document reuse

If you are deciding between the flavors other than request_log:

  • Use untimed_content_multi_turn when you have real chat content.

  • Use timed_synthetic_session when you have timing, topology, and lengths but not prompt text.

  • Use shared_prefix when you need privacy-safe traces that still preserve prefix-sharing structure.

  • Use rag when you have real retrieval prompts and want cache warmup tied to repeated documents.
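
Whichever flavor you choose plugs into the same session_generator block shown at the top of this page. As a sketch, selecting the timed flavor might look like this (the trace path is illustrative):

session_generator:
  type: trace
  trace_file: traces/chat_sessions.jsonl
  wrap_mode: true
  flavor:
    type: timed_synthetic_session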

Common trace fields

Several flavors reuse the same field names:

session_id

Groups multiple rows into one session. Required for multi-row session flavors such as timed_synthetic_session and shared_prefix.

input_length

Total prompt tokens expected when the request is dispatched.

new_input_length

New tokens introduced by this turn or node. This matters when Veeksha reconstructs prompts from lengths instead of real text.

output_length

Requested output tokens for the request.

For flavors that expect one row per request or one row per conversation, omitting session_id is usually safest. Veeksha will assign one automatically.
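
As an illustration of how these fields combine (values illustrative, mirroring the flavor examples below), a two-turn session might look like:

{"session_id": 1, "input_length": 1024, "new_input_length": 1024, "output_length": 128}
{"session_id": 1, "input_length": 1536, "new_input_length": 512, "output_length": 128}

The first turn is entirely new tokens; in the second turn, new_input_length marks the 512 tokens introduced by that turn, with the rest of input_length carried over from earlier context.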

request_log

Use this for the simplest replay case: independent requests where only input and output lengths matter.

Best for:

  • replaying public length-only datasets

  • matching a prompt-length and output-length distribution

  • load tests that do not need real multi-turn structure

Expected trace shape:

  • One row per request

  • Omit session_id or keep it unique per row

  • Required columns: input_length, output_length

Minimal JSONL example:

{"input_length": 512, "output_length": 128}
{"input_length": 1024, "output_length": 256}

What Veeksha does:

  • creates one single-request session per row

  • generates a random prompt with input_length tokens

  • requests output_length tokens from the model

timed_synthetic_session

Use this when you have recorded multi-turn or DAG-shaped workloads with timing and token counts, but you do not want to store real prompt text.

Best for:

  • coding-assistant traces

  • production chat or agent traces

  • KV-cache and history-lineage experiments

Expected trace shape:

  • Multiple rows per session

  • Required columns: session_id, input_length, new_input_length, output_length

  • Recommended modern format: include session_context on every row

Minimal DAG-shaped JSONL example:

{"session_id": 8, "input_length": 8, "new_input_length": 8, "output_length": 4, "session_context": {"node_id": 0, "parent_nodes": [], "history_parent": null, "wait_after_ready": 0.0}}
{"session_id": 8, "input_length": 8, "new_input_length": 8, "output_length": 4, "session_context": {"node_id": 1, "parent_nodes": [], "history_parent": null, "wait_after_ready": 0.1}}
{"session_id": 8, "input_length": 16, "new_input_length": 8, "output_length": 5, "session_context": {"node_id": 2, "parent_nodes": [0, 1], "history_parent": 1, "wait_after_ready": 0.2}}

The session_context fields are:

  • node_id: stable request ID inside the session

  • parent_nodes: dependencies that must complete first

  • history_parent: the one parent whose history becomes this request’s context

  • wait_after_ready: think time after dependencies complete

Legacy linear traces are also supported. If session_context is missing on every row in a session, Veeksha falls back to row order (or turn_idx when present) and can use wait_after_previous_response_s for linear think time.
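
A sketch of that legacy linear shape (field values illustrative):

{"session_id": 3, "turn_idx": 0, "input_length": 128, "new_input_length": 128, "output_length": 64, "wait_after_previous_response_s": 0.0}
{"session_id": 3, "turn_idx": 1, "input_length": 256, "new_input_length": 128, "output_length": 64, "wait_after_previous_response_s": 2.5}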

What Veeksha does:

  • rebuilds prompt text from lengths instead of using stored content

  • preserves graph structure and per-node wait times

  • reuses lineage seeds when two nodes share the same history_parent

untimed_content_multi_turn

Use this when your trace already contains the actual conversation text and you want Veeksha to split it into request turns.

Best for:

  • ShareGPT-style datasets

  • LMSYS-Chat-style datasets

  • replaying real prompt text without timestamps

Expected trace shape:

  • One row per conversation

  • Omit session_id or keep it unique per row

  • Required column: the conversation column, conversations by default

Minimal ShareGPT-style JSONL example:

{"conversations": [{"from": "human", "value": "What is Python?"}, {"from": "gpt", "value": "Python is a programming language."}, {"from": "human", "value": "Tell me more."}, {"from": "gpt", "value": "It was first released in 1991."}]}

Minimal LMSYS-style flavor config:

flavor:
  type: untimed_content_multi_turn
  conversation_column: conversation
  role_key: role
  content_key: content
  user_role_value: user
  assistant_role_value: assistant
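
With this mapping, a matching LMSYS-style trace row would look like (content values illustrative):

{"conversation": [{"role": "user", "content": "What is Python?"}, {"role": "assistant", "content": "Python is a programming language."}]}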

What Veeksha does:

  • turns each user/assistant pair into one request

  • pre-populates request history from earlier messages in the row

  • skips leading assistant messages, system messages, and trailing user messages without a response

shared_prefix

Use this when you need privacy-safe traces that preserve shared-prefix behavior across sessions.

Best for:

  • testing prefix caching without storing real text

  • replaying workloads where many sessions share common roots

  • synthetic reconstruction from stable prefix IDs

Expected trace shape:

  • Multiple rows per session

  • Rows within a session should already be in turn order

  • Required columns: session_id, input_length, new_input_length, output_length, hash_ids

Minimal JSONL example:

{"session_id": 1, "input_length": 1024, "new_input_length": 1024, "output_length": 128, "hash_ids": [101, 102]}
{"session_id": 1, "input_length": 1536, "new_input_length": 512, "output_length": 128, "hash_ids": [101, 102, 103]}
{"session_id": 2, "input_length": 1024, "new_input_length": 1024, "output_length": 128, "hash_ids": [101, 102]}

What Veeksha does:

  • uses the first row of each session as the shared-prefix root request

  • maps identical root hash_ids blocks to identical prompt content

  • uses hash_ids for cross-session sharing only on that first row

  • generates later-turn content from new_input_length

In practice, this flavor is useful when you care about shared prefixes and conversation structure, but cannot keep the original prompt text in the trace.

rag

Use this for retrieval-style workloads where the trace already contains the full prompt text and repeated document IDs.

Best for:

  • RAG traces with repeated retrieved documents

  • document-cache warmup experiments

  • long-context single-request workloads

Expected trace shape:

  • One row per request

  • Omit session_id or keep it unique per row

  • Required columns: doc_id, prompt_text, input_length, output_length

Minimal JSONL example with a repeated hot document:

{"doc_id": "doc-17", "prompt_text": "Context: ...\n\nQuestion: Summarize the document.", "input_length": 2048, "output_length": 128}
{"doc_id": "doc-17", "prompt_text": "Context: ...\n\nQuestion: Extract the main risks.", "input_length": 2048, "output_length": 128}
{"doc_id": "doc-42", "prompt_text": "Context: ...\n\nQuestion: List the action items.", "input_length": 2048, "output_length": 128}

What Veeksha does:

  • creates one single-request session per row

  • picks the most frequent doc_id values for warmup sessions

  • warms those documents before the main benchmark starts

Tune flavor.num_documents to control how many hot documents Veeksha warms.
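
For example (the count is illustrative):

flavor:
  type: rag
  num_documents: 8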

Wrap mode

With wrap_mode: true, Veeksha keeps looping over the trace after it reaches the end. Sessions are reshuffled between epochs, and flavors that synthesize content regenerate or remap it as needed so repeated epochs do not reuse exactly the same session IDs and prompt materialization.
