VllmServerConfig

class VllmServerConfig(BaseServerConfig)

Polymorphic Type:

type: vllm

Fields:

env_path: Optional[str] = None

Path to a Python environment directory (virtualenv/conda).

model: str = "meta-llama/Meta-Llama-3-8B-Instruct"

Model name or path.

host: str = "localhost"

Host address for the server.

port: int = 8000

Port number for the server.

api_key: str = "token-abc123"

API key for server authentication.

gpu_ids: Optional[list[int]] = None

List of GPU IDs to use (None means auto-assign).

startup_timeout: int = 300

Timeout in seconds for server startup.

health_check_interval: float = 2.0

Interval in seconds between health checks.

require_contiguous_gpus: bool = True

Require contiguous GPU allocation (e.g., GPUs 0,1,2 rather than 0,2,5).
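Contiguity here means the assigned GPU IDs form an unbroken ascending run. A minimal sketch of such a check (`gpus_are_contiguous` is a hypothetical helper for illustration, not part of this API):

```python
def gpus_are_contiguous(gpu_ids):
    """Return True if the GPU IDs form an unbroken run once sorted.

    An empty or None assignment is treated as trivially contiguous,
    matching the auto-assign default of gpu_ids = None.
    """
    if not gpu_ids:
        return True
    ordered = sorted(gpu_ids)
    return ordered == list(range(ordered[0], ordered[0] + len(ordered)))
```

With `require_contiguous_gpus = True`, an allocation like `[0, 1, 2]` would pass such a check while `[0, 2, 5]` would not.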

tensor_parallel_size: int = 1

Number of GPUs for tensor parallelism.

dtype: str = "auto"

Data type for model weights (auto, float16, bfloat16, etc.)

max_model_len: Optional[int] = None

Maximum model context length.

additional_args: Optional[str] = "{}"

Additional engine-specific arguments as JSON string, dict, or None.
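Because `additional_args` may arrive as a JSON string, a dict, or None, consuming code typically normalizes it to a dict before passing it to the engine. A sketch of that normalization (the helper name `normalize_additional_args` is illustrative, not part of this API):

```python
import json


def normalize_additional_args(value):
    """Normalize additional_args (JSON string, dict, or None) to a dict."""
    if value is None:
        return {}
    if isinstance(value, dict):
        return value
    # A JSON string such as the default "{}"; malformed input
    # raises json.JSONDecodeError.
    return json.loads(value)
```

For example, the default `"{}"` normalizes to an empty dict, and a string like `'{"max_num_seqs": 16}'` yields the corresponding key/value pairs.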