OpenAIRouterClientConfig
class OpenAIRouterClientConfig(OpenAIChatCompletionsClientConfig)
OpenAI-compatible router client configuration.
Polymorphic Type:
type: openai_router

All BaseClientConfig types:
openai_chat_completions: OpenAIChatCompletionsClientConfig
openai_completions: OpenAICompletionsClientConfig
openai_router: OpenAIRouterClientConfig
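The type table above can be read as a discriminator: the `type` key in a raw config selects which client config class is built. The sketch below is a minimal illustration of that dispatch; the dataclass fields shown and the `select_config_class` helper are assumptions for illustration, not the library's actual API.

```python
# Hypothetical sketch: dispatching on the polymorphic "type" key.
# The registry mirrors the type table above; everything else is assumed.
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenAIChatCompletionsClientConfig:
    api_base: Optional[str] = None
    api_key: Optional[str] = None


@dataclass
class OpenAICompletionsClientConfig:
    api_base: Optional[str] = None
    api_key: Optional[str] = None


@dataclass
class OpenAIRouterClientConfig(OpenAIChatCompletionsClientConfig):
    model: str = "meta-llama/Meta-Llama-3-8B-Instruct"


# Registry keyed by the polymorphic "type" discriminator.
CONFIG_TYPES = {
    "openai_chat_completions": OpenAIChatCompletionsClientConfig,
    "openai_completions": OpenAICompletionsClientConfig,
    "openai_router": OpenAIRouterClientConfig,
}


def select_config_class(raw: dict):
    """Build the config class named by raw["type"] from the remaining keys."""
    cls = CONFIG_TYPES[raw.pop("type")]
    return cls(**raw)


cfg = select_config_class({"type": "openai_router", "api_key": "sk-test"})
```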
Fields:
api_base: Optional[str] = None
    API base URL. Defaults to the OPENAI_API_BASE env var.
api_key: Optional[str] = None
    API key. Defaults to the OPENAI_API_KEY env var.
model: str = "meta-llama/Meta-Llama-3-8B-Instruct"
    The model to use for this load test.
address_append_value: str = "chat/completions"
    The address append value for the LLM API.
request_timeout: int = 300
    The timeout for each request to the LLM API (in seconds).
additional_sampling_params: str = "{}"
    Additional sampling params (as a JSON string) to send with each request to the LLM API.
max_tokens_param: Optional[str] = "max_completion_tokens"
    Server parameter name for maximum tokens.
ignore_eos: bool = True
    Sets the ignore_eos sampling param on requests so that generation reaches the desired max_tokens.
min_tokens_param: Optional[str] = None
    Server parameter name for minimum tokens. Usually set if ignore_eos is not available or does not offer enough control over output tokens (see health_check_results.txt). Note: a wrong value may cause requests to fail.
use_min_tokens_prompt_fallback: bool = False
    If True, appends instructions to the prompt to generate at least N tokens (e.g. "Generate at least 20 tokens"). Useful if the server supports neither ignore_eos nor min_tokens. Only available with synthetic content generation.
completions_max_tokens_param: Optional[str] = "max_tokens"
    Server parameter name for maximum tokens on the /completions endpoint. Defaults to "max_tokens". The /chat/completions endpoint uses max_tokens_param instead.
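Several of these fields interact when a request body is assembled: the configurable max-tokens parameter name, the ignore_eos flag, and the additional_sampling_params JSON string that is merged into each request. The sketch below illustrates that interaction under stated assumptions; `build_request_body` is a hypothetical helper, not a function of the library, and only the field names and defaults come from the reference above.

```python
import json

# Hypothetical sketch of how the config fields above could shape a
# request body. Only the field names/defaults come from the reference.

def build_request_body(model, additional_sampling_params="{}",
                       max_tokens_param="max_completion_tokens",
                       ignore_eos=True, max_tokens=128):
    body = {
        "model": model,
        # The parameter name for maximum tokens is configurable because
        # servers differ (e.g. "max_tokens" on the /completions endpoint).
        max_tokens_param: max_tokens,
    }
    if ignore_eos:
        # Ask the server to keep generating until max_tokens is reached
        # instead of stopping at an end-of-sequence token.
        body["ignore_eos"] = True
    # additional_sampling_params is a JSON string merged into the body.
    body.update(json.loads(additional_sampling_params))
    return body


body = build_request_body(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    additional_sampling_params='{"temperature": 0.7}',
)
```

With the defaults above, the body carries `max_completion_tokens` and `ignore_eos`, plus whatever keys the JSON string contributes (here, `temperature`).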