OpenAICompletionsClientConfig
class OpenAICompletionsClientConfig(OpenAIChatCompletionsClientConfig)
OpenAI Completions client configuration.
Polymorphic Type:
type: openai_completions
All BaseClientConfig types:
openai_chat_completions: OpenAIChatCompletionsClientConfig
openai_completions: OpenAICompletionsClientConfig
openai_router: OpenAIRouterClientConfig
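The type-keyed polymorphism above can be sketched as a small registry that maps the `type` string to a config class. This is an illustrative sketch only; the class bodies and the library's actual dispatch mechanism (e.g. a discriminated union) may differ.

```python
from dataclasses import dataclass

# Minimal stand-ins for the documented config classes. Field sets are
# trimmed to what the sketch needs; the real classes carry more fields.
@dataclass
class BaseClientConfig:
    model: str

@dataclass
class OpenAIChatCompletionsClientConfig(BaseClientConfig):
    address_append_value: str = "chat/completions"

@dataclass
class OpenAICompletionsClientConfig(OpenAIChatCompletionsClientConfig):
    # Subclass overrides the default address suffix, as documented.
    address_append_value: str = "completions"

# Registry keyed by the polymorphic "type" value from the table above.
REGISTRY = {
    "openai_chat_completions": OpenAIChatCompletionsClientConfig,
    "openai_completions": OpenAICompletionsClientConfig,
}

def build_config(raw: dict) -> BaseClientConfig:
    """Pop the 'type' key to pick the class, pass the rest as fields."""
    data = dict(raw)
    cls = REGISTRY[data.pop("type")]
    return cls(**data)

cfg = build_config({
    "type": "openai_completions",
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
})
```

Because `OpenAICompletionsClientConfig` subclasses the chat-completions config, it inherits every field and only changes defaults such as `address_append_value`.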
Fields:
api_base (Optional[str] = None): API base URL. Defaults to the OPENAI_API_BASE env var.
api_key (Optional[str] = None): API key. Defaults to the OPENAI_API_KEY env var.
model (str = "meta-llama/Meta-Llama-3-8B-Instruct"): The model to use for this load test.
address_append_value (str = "completions"): The value appended to the address of the LLM API.
request_timeout (int = 300): The timeout for each request to the LLM API, in seconds.
additional_sampling_params (str = "{}"): Additional sampling params to send with each request to the LLM API.
max_tokens_param (Optional[str] = "max_tokens"): Server parameter name for the maximum number of output tokens.
ignore_eos (bool = True): Sets the ignore_eos sampling param on requests so that generation reaches the desired max_tokens.
min_tokens_param (Optional[str] = None): Server parameter name for the minimum number of output tokens. Usually set when ignore_eos is unavailable or does not offer enough control over output tokens (see health_check_results.txt). Note: a wrong value may cause requests to fail.
use_min_tokens_prompt_fallback (bool = False): If True, appends an instruction to the prompt to generate at least N tokens (e.g. "Generate at least 20 tokens"). Useful if the server supports neither ignore_eos nor min_tokens. Only available with synthetic content generation.
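How the token-control fields could combine into per-request sampling params can be sketched as below. The function name and merge order are assumptions for illustration (the `"{}"` default suggests additional_sampling_params is a JSON string, and the sketch assumes later fields override it); the library's actual request construction may differ.

```python
import json
from typing import Optional

def build_sampling_params(
    max_tokens: int,
    additional_sampling_params: str = "{}",
    max_tokens_param: Optional[str] = "max_tokens",
    min_tokens_param: Optional[str] = None,
    ignore_eos: bool = True,
) -> dict:
    """Hypothetical merge of the config's token-control fields
    into one sampling-params dict for a single request."""
    # Parse the user-supplied extras first, so the explicit
    # token controls below take precedence on key collisions.
    params = json.loads(additional_sampling_params)
    if max_tokens_param is not None:
        # Use the configured server-side name for the max-tokens knob.
        params[max_tokens_param] = max_tokens
    if min_tokens_param is not None:
        # Pin the minimum to the same target, forcing exact output length.
        params[min_tokens_param] = max_tokens
    if ignore_eos:
        params["ignore_eos"] = True
    return params

# With the documented defaults plus an explicit min_tokens_param:
params = build_sampling_params(20, min_tokens_param="min_tokens")
```

With the defaults alone, only `max_tokens` and `ignore_eos` are sent; setting `min_tokens_param` adds a lower bound for servers where `ignore_eos` does not give enough control, and `use_min_tokens_prompt_fallback` covers servers that support neither knob.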