AKS Node Health
When enabled, this toolset lets HolmesGPT run specialized health checks on Azure Kubernetes Service (AKS) nodes and troubleshoot them, including node-specific diagnostics and performance analysis.
Prerequisites
- Azure CLI installed and configured
- Appropriate Azure RBAC permissions for AKS clusters
- Access to the target AKS cluster
- Node-level access permissions
Configuration
First, ensure you're authenticated with Azure:
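A typical sign-in sequence with the Azure CLI might look like the sketch below. The `aks_login` wrapper and its three arguments are hypothetical names used only for illustration; `az login`, `az account set`, and `az aks get-credentials` are standard Azure CLI commands. Whether the `get-credentials` step is needed depends on how you already reach the cluster.

```shell
# Hypothetical helper (not part of HolmesGPT): sign in and select the cluster.
# Steps: interactive sign-in, pin the subscription used by later commands,
# then merge the cluster's credentials into the local kubeconfig.
aks_login() {
  subscription_id="$1"   # Azure subscription ID
  resource_group="$2"    # resource group that contains the AKS cluster
  cluster_name="$3"      # AKS cluster name

  az login &&
    az account set --subscription "$subscription_id" &&
    az aks get-credentials --resource-group "$resource_group" --name "$cluster_name"
}
```

It would be called as `aks_login "<subscription ID>" "<resource group>" "<cluster name>"`.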
Then add the following to ~/.holmes/config.yaml, creating the file if it doesn't exist:
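The snippet below sketches one way to create the file with a minimal entry for this toolset. It assumes the minimal form is the `toolsets` entry from the advanced example without its `config` block. `write_minimal_config` is a hypothetical helper, not a HolmesGPT command. It refuses to overwrite an existing file; in that case the `aks/node-health` entry should be merged by hand under the existing `toolsets` key.

```shell
# Hypothetical helper: create a minimal HolmesGPT config if none exists yet.
write_minimal_config() {
  config_file="${1:-$HOME/.holmes/config.yaml}"

  # Never clobber an existing config; merge the entry manually instead.
  if [ -e "$config_file" ]; then
    echo "$config_file already exists; add the aks/node-health entry by hand" >&2
    return 1
  fi

  mkdir -p "$(dirname "$config_file")"
  # Assumed minimal form: the toolset enabled with no extra parameters.
  cat > "$config_file" <<'EOF'
toolsets:
  aks/node-health:
    enabled: true
EOF
}
```

Calling `write_minimal_config` with no argument targets `~/.holmes/config.yaml`; passing a path writes elsewhere, which is handy for trying it out safely.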
Advanced Configuration
You can configure additional health check parameters:
toolsets:
  aks/node-health:
    enabled: true
    config:
      subscription_id: "<your Azure subscription ID>"
      resource_group: "<your AKS resource group>"
      cluster_name: "<your AKS cluster name>"
      health_check_interval: 300 # Health check interval in seconds
      max_unhealthy_nodes: 3 # Maximum number of unhealthy nodes to report
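If the placeholder values are not at hand, the Azure CLI can usually report them. The sketch below wraps two standard commands in a hypothetical `show_aks_config_values` helper: `az account show` prints the active subscription ID, and `az aks list` prints each cluster's name and resource group. It assumes you are already signed in and that the active subscription is the one containing the cluster.

```shell
# Hypothetical helper: print candidate values for the config placeholders.
show_aks_config_values() {
  # Active subscription ID, for subscription_id.
  az account show --query id --output tsv
  # Cluster names and their resource groups, for cluster_name and resource_group.
  az aks list \
    --query "[].{cluster_name:name, resource_group:resourceGroup}" \
    --output table
}
```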
Capabilities
Tool Name | Description |
---|---|
aks_check_node_health | Perform comprehensive health checks on AKS nodes |
aks_get_node_metrics | Get detailed metrics for AKS nodes |
aks_diagnose_node_issues | Diagnose common node-level issues |
aks_check_node_readiness | Check if nodes are ready and schedulable |
aks_get_node_events | Get events related to specific nodes |
aks_check_node_resources | Check resource utilization on nodes |
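For orientation, the kinds of information these tools surface can also be inspected by hand with `kubectl`. The commands below are rough manual counterparts, not a description of what the toolset runs internally. `inspect_node` is a hypothetical helper, and `kubectl top` assumes a metrics source such as Metrics Server is available in the cluster.

```shell
# Hypothetical helper: manual look at one node's health with standard kubectl commands.
inspect_node() {
  node="$1"   # node name as shown by `kubectl get nodes`

  # Readiness and schedulability (STATUS column, e.g. Ready or SchedulingDisabled).
  kubectl get node "$node"
  # Conditions, capacity, and allocated resources.
  kubectl describe node "$node"
  # Current CPU and memory usage.
  kubectl top node "$node"
  # Events that reference this node.
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$node"
}
```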