VllmServerConfig

class VllmServerConfig(BaseServerConfig)

Polymorphic Type:

type: vllm

Fields:

env_path: Optional[str] = None

Path to a Python environment directory (virtualenv/conda).

model: str = "meta-llama/Meta-Llama-3-8B-Instruct"

Model name or path.

host: str = "localhost"

Host address for the server.

port: int = 8000

Port number for the server.

api_key: str = "token-abc123"

API key for server authentication.

gpu_ids: Optional[list[int]] = None

List of GPU IDs to use (None means auto-assign).

startup_timeout: int = 300

Timeout in seconds for server startup.

health_check_interval: float = 2.0

Interval in seconds between health checks.

require_contiguous_gpus: bool = True

Require contiguous GPU allocation (e.g., GPUs 0,1,2 rather than 0,2,5).
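Contiguity here means the assigned GPU IDs form an unbroken ascending run. A minimal sketch of such a check (`gpus_are_contiguous` is a hypothetical helper for illustration, not part of this API):

```python
def gpus_are_contiguous(gpu_ids):
    """Return True if the GPU IDs form an unbroken run once sorted.

    An empty or None assignment is treated as trivially contiguous,
    matching the auto-assign default of gpu_ids = None.
    """
    if not gpu_ids:
        return True
    ordered = sorted(gpu_ids)
    return ordered == list(range(ordered[0], ordered[0] + len(ordered)))
```

With `require_contiguous_gpus = True`, an allocation like `[0, 1, 2]` would pass such a check while `[0, 2, 5]` would not.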

tensor_parallel_size: int = 1

Number of GPUs for tensor parallelism.

dtype: str = "auto"

Data type for model weights (auto, float16, bfloat16, etc.)

max_model_len: Optional[int] = None

Maximum model context length.

additional_args: Optional[str] = "{}"

Additional engine-specific arguments as JSON string, dict, or None.
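Because `additional_args` may arrive as a JSON string, a dict, or None, consuming code typically normalizes it to a dict before passing it to the engine. A sketch of that normalization (the helper name `normalize_additional_args` is illustrative, not part of this API):

```python
import json


def normalize_additional_args(value):
    """Normalize additional_args (JSON string, dict, or None) to a dict."""
    if value is None:
        return {}
    if isinstance(value, dict):
        return value
    # A JSON string such as the default "{}"; malformed input
    # raises json.JSONDecodeError.
    return json.loads(value)
```

For example, the default `"{}"` normalizes to an empty dict, and a string like `'{"max_num_seqs": 16}'` yields the corresponding key/value pairs.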