Proprietary Systems
===================

``veeksha`` can benchmark the performance of LLM inference systems that are exposed as public APIs. The following sections describe how to benchmark these systems.

.. note::

    The tokenizer corresponding to the model is fetched from the Hugging Face Hub. Make sure you have access to the model and are logged in to Hugging Face. Check :ref:`huggingface_setup` for more details.

Export API Key and URL
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: shell

    export OPENAI_API_BASE=https://api.endpoints.anyscale.com/v1
    export OPENAI_API_KEY=secret_abcdefg

Running Benchmark
~~~~~~~~~~~~~~~~~

.. code-block:: shell

    python -m veeksha.run_benchmark \
        --client_config_model "meta-llama/Meta-Llama-3-8B-Instruct" \
        --max_completed_requests 20 \
        --request_interval_generator_config_type "gamma" \
        --request_length_generator_config_type "zipf" \
        --zipf_request_length_generator_config_max_tokens 8192 \
        --metrics_config_output_dir "results"

Be sure to set the ``--client_config_model`` flag to the model served by the proprietary system.

.. note::

    ``veeksha`` supports different generator providers for request interval and request length. For more details, refer to :doc:`../guides/request_generator_providers`.

.. _wandb_args_proprietary_systems:

Specifying wandb args [Optional]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Optionally, you can also specify the following arguments to log results to wandb:

.. code-block:: shell

    --metrics_config_should_write_metrics \
    --metrics_config_wandb_project Project \
    --metrics_config_wandb_group Group \
    --metrics_config_wandb_run_name Run

Other Arguments
^^^^^^^^^^^^^^^

There are many more arguments for running the benchmark; run the following to see them all:

.. code-block:: shell

    python -m veeksha.run_benchmark -h

Saving Results
~~~~~~~~~~~~~~

The results of the benchmark are saved in the directory specified by the ``--metrics_config_output_dir`` argument.
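After a run completes, you can inspect what was produced by simply listing the output directory. The sketch below assumes the benchmark was launched with ``--metrics_config_output_dir "results"`` as in the example above; the actual file names and formats written there depend on the ``veeksha`` version you are running, so no particular layout is assumed here.

.. code-block:: shell

    # List all artifacts written by the benchmark run.
    # The directory name matches the value passed to --metrics_config_output_dir;
    # file names and formats vary by veeksha version.
    ls -R results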