OpenAIRouterClientConfig

class OpenAIRouterClientConfig(OpenAIChatCompletionsClientConfig)

OpenAI-compatible router client configuration.

Polymorphic Type:

type: openai_router

Fields:

api_base: Optional[str] = None

API base URL. Defaults to OPENAI_API_BASE env var.

api_key: Optional[str] = None

API key. Defaults to OPENAI_API_KEY env var.

model: str = "meta-llama/Meta-Llama-3-8B-Instruct"

The model to use for this load test.

address_append_value: str = "chat/completions"

The address append value for the LLM API.
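The exact joining logic is not shown in this reference; a minimal sketch of how the endpoint URL is presumably assembled from api_base and address_append_value (the build_url helper is hypothetical, not part of the library):

```python
# Hypothetical helper: join api_base with address_append_value to form the
# request URL. The field names come from the config above; the joining
# behavior itself is an assumption for illustration only.
def build_url(api_base: str, address_append_value: str = "chat/completions") -> str:
    return api_base.rstrip("/") + "/" + address_append_value.lstrip("/")

url = build_url("https://api.example.com/v1/", "chat/completions")
print(url)  # https://api.example.com/v1/chat/completions
```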

request_timeout: int = 300

The timeout for each request to the LLM API (in seconds).

additional_sampling_params: str = "{}"

Additional sampling params to send with each request to the LLM API.
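Since the default is the string "{}", this field is a JSON-encoded object. A sketch, assuming the params are decoded and merged into each request payload (the payload shape here is illustrative):

```python
import json

# additional_sampling_params is a JSON string; decode it and merge the
# extra sampling parameters into the request body sent to the LLM API.
additional_sampling_params = '{"temperature": 0.7, "top_p": 0.9}'

payload = {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": []}
payload.update(json.loads(additional_sampling_params))
print(payload["temperature"])  # 0.7
```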

max_tokens_param: Optional[str] = "max_completion_tokens"

Server parameter name for maximum tokens.

ignore_eos: bool = True

Sets the sampling param ignore_eos for requests to reach the desired max_tokens.

min_tokens_param: Optional[str] = None

Server parameter name for minimum tokens. Usually set when ignore_eos is unavailable or does not offer enough control over output tokens (see health_check_results.txt). Note: a wrong value may cause requests to fail.
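Because max_tokens_param and min_tokens_param name server-side keys rather than fixed parameters, the sampling portion of each request is presumably keyed dynamically. A sketch of that assumed logic (the sampling_payload helper is hypothetical):

```python
# Hypothetical illustration: build the sampling parameters for a request,
# using the configured server-side key names. The *_param values decide
# which keys appear in the payload; the function itself is an assumption.
def sampling_payload(max_tokens, max_tokens_param="max_completion_tokens",
                     ignore_eos=True, min_tokens=None, min_tokens_param=None):
    params = {max_tokens_param: max_tokens}
    if ignore_eos:
        # Ask the server to keep generating past end-of-sequence tokens
        # so responses reach the desired max_tokens.
        params["ignore_eos"] = True
    if min_tokens_param is not None and min_tokens is not None:
        params[min_tokens_param] = min_tokens
    return params

print(sampling_payload(128))
# {'max_completion_tokens': 128, 'ignore_eos': True}
```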

use_min_tokens_prompt_fallback: bool = False

If True, appends instructions to the prompt asking for at least N tokens (e.g. ‘Generate at least 20 tokens’). Useful if the server supports neither ignore_eos nor min_tokens. Only available with synthetic content generation.

completions_max_tokens_param: Optional[str] = "max_tokens"

Server parameter name for maximum tokens on /completions endpoint. Defaults to ‘max_tokens’. The /chat/completions endpoint uses max_tokens_param instead.
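Putting the fields together, a hypothetical complete configuration using the documented field names and the openai_router polymorphic type (all values illustrative; this dict is a sketch, not the library's constructor API):

```python
import json

# Example configuration using the field names documented above.
# Values are illustrative only; api_base/api_key fall back to the
# OPENAI_API_BASE / OPENAI_API_KEY environment variables when None.
config = {
    "type": "openai_router",
    "api_base": "https://api.example.com/v1",
    "api_key": None,
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "address_append_value": "chat/completions",
    "request_timeout": 300,
    "additional_sampling_params": "{\"temperature\": 0.0}",
    "max_tokens_param": "max_completion_tokens",
    "ignore_eos": True,
}
print(json.dumps(config, indent=2))
```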