OpenAIRouterClientConfig
class OpenAIRouterClientConfig(OpenAIChatCompletionsClientConfig)
OpenAI-compatible router client configuration.
Polymorphic Type:
type: openai_router

All BaseClientConfig types:
openai_chat_completions: OpenAIChatCompletionsClientConfig
openai_completions: OpenAICompletionsClientConfig
openai_router: OpenAIRouterClientConfig
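The type table above can be read as a discriminator: the `type` key in a raw config selects which client config class is built. The sketch below is a minimal illustration of that dispatch; the dataclass fields shown and the `select_config_class` helper are assumptions for illustration, not the library's actual API.

```python
# Hypothetical sketch: dispatching on the polymorphic "type" key.
# The registry mirrors the type table above; everything else is assumed.
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenAIChatCompletionsClientConfig:
    api_base: Optional[str] = None
    api_key: Optional[str] = None


@dataclass
class OpenAICompletionsClientConfig:
    api_base: Optional[str] = None
    api_key: Optional[str] = None


@dataclass
class OpenAIRouterClientConfig(OpenAIChatCompletionsClientConfig):
    model: str = "meta-llama/Meta-Llama-3-8B-Instruct"


# Registry keyed by the polymorphic "type" discriminator.
CONFIG_TYPES = {
    "openai_chat_completions": OpenAIChatCompletionsClientConfig,
    "openai_completions": OpenAICompletionsClientConfig,
    "openai_router": OpenAIRouterClientConfig,
}


def select_config_class(raw: dict):
    """Build the config class named by raw["type"] from the remaining keys."""
    cls = CONFIG_TYPES[raw.pop("type")]
    return cls(**raw)


cfg = select_config_class({"type": "openai_router", "api_key": "sk-test"})
```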
Fields:
api_base: Optional[str] = None
    API base URL. Defaults to the OPENAI_API_BASE env var.
api_key: Optional[str] = None
    API key. Defaults to the OPENAI_API_KEY env var.
model: str = "meta-llama/Meta-Llama-3-8B-Instruct"
    The model to use for this load test.
address_append_value: str = "chat/completions"
    The address append value for the LLM API.
request_timeout: int = 300
    The timeout for each request to the LLM API (in seconds).
additional_sampling_params: str = "{}"
    Additional sampling params (as a JSON string) to send with each request to the LLM API.
max_tokens_param: Optional[str] = "max_completion_tokens"
    Server parameter name for maximum tokens.
ignore_eos: bool = True
    Sets the ignore_eos sampling param on requests so that generation reaches the desired max_tokens.
min_tokens_param: Optional[str] = None
    Server parameter name for minimum tokens. Usually set if ignore_eos is not available or does not offer enough control over output tokens (see health_check_results.txt). Note: a wrong value may cause requests to fail.
use_min_tokens_prompt_fallback: bool = False
    If True, appends instructions to the prompt to generate at least N tokens (e.g. "Generate at least 20 tokens"). Useful if the server supports neither ignore_eos nor min_tokens. Only available with synthetic content generation.
completions_max_tokens_param: Optional[str] = "max_tokens"
    Server parameter name for maximum tokens on the /completions endpoint. Defaults to "max_tokens". The /chat/completions endpoint uses max_tokens_param instead.
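Several of these fields interact when a request body is assembled: the configurable max-tokens parameter name, the ignore_eos flag, and the additional_sampling_params JSON string that is merged into each request. The sketch below illustrates that interaction under stated assumptions; `build_request_body` is a hypothetical helper, not a function of the library, and only the field names and defaults come from the reference above.

```python
import json

# Hypothetical sketch of how the config fields above could shape a
# request body. Only the field names/defaults come from the reference.

def build_request_body(model, additional_sampling_params="{}",
                       max_tokens_param="max_completion_tokens",
                       ignore_eos=True, max_tokens=128):
    body = {
        "model": model,
        # The parameter name for maximum tokens is configurable because
        # servers differ (e.g. "max_tokens" on the /completions endpoint).
        max_tokens_param: max_tokens,
    }
    if ignore_eos:
        # Ask the server to keep generating until max_tokens is reached
        # instead of stopping at an end-of-sequence token.
        body["ignore_eos"] = True
    # additional_sampling_params is a JSON string merged into the body.
    body.update(json.loads(additional_sampling_params))
    return body


body = build_request_body(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    additional_sampling_params='{"temperature": 0.7}',
)
```

With the defaults above, the body carries `max_completion_tokens` and `ignore_eos`, plus whatever keys the JSON string contributes (here, `temperature`).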