OpenAI-Compatible Models

Configure HolmesGPT to use any OpenAI-compatible API.

Function Calling Required

Your model and inference server must support function calling (tool calling). Models that lack this capability may produce incorrect results.

Requirements

  • Function calling support - OpenAI-style tool calling (see the quick check below)
  • OpenAI-compatible API - Standard endpoints and request/response format
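
A quick way to verify both requirements is to send a minimal tool-calling request to your server before wiring up HolmesGPT. This is a sketch: your-model and get_current_time are placeholder names, and the payload follows the standard OpenAI chat completions format. A server with working function calling should answer with a tool_calls entry rather than plain text:

# Probe the endpoint with a minimal tool-calling request
curl -s "$OPENAI_API_BASE/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "your-model",
    "messages": [{"role": "user", "content": "What time is it in UTC?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_current_time",
        "description": "Return the current UTC time",
        "parameters": {"type": "object", "properties": {}}
      }
    }]
  }'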

Supported Inference Servers

Any inference server that exposes an OpenAI-compatible API with function calling should work. The setup examples below cover LocalAI and llama-cpp-python; see Known Limitations for caveats with vLLM and Text Generation WebUI.

Configuration

# Point HolmesGPT at your server's OpenAI-compatible endpoint
export OPENAI_API_BASE="http://localhost:8000/v1"
# The key must be set even if your server does not check it
export OPENAI_API_KEY="not-needed"
# Replace <your-model> with a model loaded on your server
holmes ask "what pods are failing?" --model="openai/<your-model>"

Setup Examples

LocalAI

# Start LocalAI and expose its OpenAI-compatible API on port 8080
docker run -p 8080:8080 localai/localai:latest
export OPENAI_API_BASE="http://localhost:8080/v1"
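
Once the container is up, you can confirm the endpoint responds by listing the models it serves; /v1/models is part of the standard OpenAI-compatible API, and the names it returns are what you pass to --model:

# Verify the server is reachable and see which models are available
curl http://localhost:8080/v1/models

# Then query HolmesGPT against one of the listed models
export OPENAI_API_KEY="not-needed"
holmes ask "what pods are failing?" --model="openai/<your-model>"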

llama-cpp-python

# Install the server extra and launch the OpenAI-compatible server (default port 8000)
pip install 'llama-cpp-python[server]'
python -m llama_cpp.server --model model.gguf --chat_format chatml
export OPENAI_API_BASE="http://localhost:8000/v1"
holmes ask "analyze my deployment" --model="openai/<your-model>"
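
One caveat: the plain chatml template does not necessarily expose OpenAI-style tool calls. If tool calling fails, llama-cpp-python also ships a chatml-function-calling chat format that is worth trying (a sketch; how well it works depends on your model):

# Alternative launch using the function-calling chat template
python -m llama_cpp.server --model model.gguf --chat_format chatml-function-calling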

Custom SSL Certificates

If your LLM provider uses a custom Certificate Authority (CA):

# Base64 encode your certificate and set it as an environment variable
export CERTIFICATE="base64-encoded-cert-here"
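
If you are unsure how to produce the encoded value, a typical command on Linux (GNU coreutils) looks like the following; /path/to/ca.crt is a placeholder, and on macOS use base64 -i instead of -w 0:

# Encode the CA certificate as a single line and export it
export CERTIFICATE="$(base64 -w 0 /path/to/ca.crt)"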

Known Limitations

  • vLLM: Does not yet support function calling
  • Text Generation WebUI: Requires OpenAI extension enabled
  • Some models: May hallucinate responses instead of reporting that they cannot call functions; the quick check under Requirements helps catch this