OpenAI-Compatible Models¶
Configure HolmesGPT to use any OpenAI-compatible API.
**Function Calling Required**
Your model and inference server must support function calling (tool calling). Models that lack this capability may produce incorrect results.
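To check whether a server supports tool calling before wiring it up, you can send a chat completion with a `tools` array and look for `tool_calls` in the response. The endpoint, model name, and the `get_current_time` function below are illustrative:

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-model>",
    "messages": [{"role": "user", "content": "What time is it in Tokyo?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_current_time",
        "description": "Get the current time in a given timezone",
        "parameters": {
          "type": "object",
          "properties": {"timezone": {"type": "string"}},
          "required": ["timezone"]
        }
      }
    }]
  }'
```

A function-calling-capable server responds with a `tool_calls` entry in the assistant message rather than plain text.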
Requirements¶
- Function calling support - OpenAI-style tool calling
- OpenAI-compatible API - Standard endpoints and request/response format
Supported Inference Servers¶
- llama-cpp-python
- LocalAI
- Text Generation WebUI (with OpenAI extension)
Configuration¶
```bash
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_API_KEY="not-needed"  # placeholder; local servers typically ignore the key

holmes ask "what pods are failing?" --model="openai/<your-model>"
```
Setup Examples¶
LocalAI¶
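A minimal sketch using LocalAI's Docker image; the image tag and model name are illustrative, so check the LocalAI documentation for current tags and for models that support function calling:

```bash
# Start LocalAI (serves an OpenAI-compatible API on port 8080 by default)
docker run -p 8080:8080 localai/localai:latest-aio-cpu

# Point HolmesGPT at it
export OPENAI_API_BASE="http://localhost:8080/v1"
export OPENAI_API_KEY="not-needed"
holmes ask "what pods are failing?" --model="openai/<your-model>"
```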
llama-cpp-python¶
```bash
pip install 'llama-cpp-python[server]'

# Start the OpenAI-compatible server. Plain chatml does not expose tool
# calling; use a function-calling chat format such as chatml-function-calling.
python -m llama_cpp.server --model model.gguf --chat_format chatml-function-calling

export OPENAI_API_BASE="http://localhost:8000/v1"
holmes ask "analyze my deployment" --model="openai/<your-loaded-model>"
```
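If you are unsure what name the server loaded the model under, the standard OpenAI models endpoint lists it; use the returned `id` as `<your-loaded-model>` above:

```bash
curl -s http://localhost:8000/v1/models
```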
Custom SSL Certificates¶
If your LLM provider uses a custom Certificate Authority (CA):
```bash
# Base64 encode your CA certificate (path is illustrative) and expose it
# to HolmesGPT; on macOS use `base64 -i ca.crt` instead of `-w 0`.
export CERTIFICATE="$(base64 -w 0 ca.crt)"
```
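To confirm the variable round-trips to a valid PEM certificate, you can decode it with openssl (`base64 -d` is GNU coreutils; macOS also accepts `-D`):

```bash
echo "$CERTIFICATE" | base64 -d | openssl x509 -noout -subject -enddate
```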
Known Limitations¶
- vLLM: Does not yet support function calling
- Text Generation WebUI: Requires OpenAI extension enabled
- Some models: May hallucinate plausible answers instead of reporting that they cannot call functions; the tool-calling check under *Function Calling Required* above shows whether a server actually returns `tool_calls`