Every Feature, Explained.
Free diagnostics that replace 30 minutes of manual investigation. Pro auto-fix PRs that eliminate the other 10. See exactly what each tier gives you.
The Free Tier Is Not a Demo. It’s the Full Diagnostic Stack.
Most tools give you a 14-day trial and hope you forget to cancel. KI-Ops Community gives you unlimited cluster analysis, log correlation, and AI-powered root-cause identification, forever. Bring your own key (BYOK), no strings attached.
Kubernetes Cluster Analysis
Full diagnostics across Pods, Deployments, Services, StatefulSets, DaemonSets, and Events. Point KI-Ops at your kubeconfig and get a complete health report in seconds — no config, no agents, no sidecars.
What you get
- Replaces 30+ minutes of manual kubectl commands per incident
- Detects CrashLoopBackOff, OOMKilled, ImagePullBackOff, Pending Pods automatically
- Node pressure, HPA status, and resource quota analysis included
- Works with EKS, GKE, AKS, k3s, kind, and any kubeconfig-accessible cluster
- Unlimited analyses — no daily caps, no artificial limits
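The failure states listed above all appear in standard `kubectl get pods -o json` output. As a rough illustration of that kind of check (a sketch, not KI-Ops's actual implementation), the snippet below scans a parsed pod list for them. The function name `find_pod_problems` and the constant `FAILURE_REASONS` are hypothetical.

```python
# Waiting/terminated reasons the feature list names as automatically detected.
FAILURE_REASONS = {"CrashLoopBackOff", "OOMKilled", "ImagePullBackOff"}


def find_pod_problems(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod name, problem) pairs from a parsed `kubectl get pods -o json` document."""
    problems: list[tuple[str, str]] = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        if status.get("phase") == "Pending":
            problems.append((name, "Pending"))
        for cs in status.get("containerStatuses", []):
            # The reason sits under state.waiting, state.terminated, or
            # lastState.terminated (the previous crash of a restarting container).
            for state in (cs.get("state", {}), cs.get("lastState", {})):
                for detail in state.values():
                    reason = detail.get("reason")
                    if reason in FAILURE_REASONS and (name, reason) not in problems:
                        problems.append((name, reason))
    return problems
```

To try it against a real cluster, the input could come from `json.loads()` applied to the output of `kubectl get pods -A -o json`.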
Grafana & Loki Log Analysis
Query your Grafana dashboards and Loki log streams directly from KI-Ops. AI correlates metrics with logs to surface anomalies, error spikes, and patterns you’d spend 30 minutes finding manually.
What you get
- Cross-correlates logs + metrics + cluster state in a single analysis
- Surfaces error spikes, latency anomalies, and resource trends automatically
- No need to switch between Grafana, Loki, and kubectl — one unified view
- Works with your existing Prometheus/Grafana/Loki stack — no migration needed
- Vendor-neutral: OpenTelemetry-native architecture
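To make "surfaces error spikes" concrete, here is a minimal sketch of one common approach: compare each minute's error count with a trailing baseline. It assumes the per-minute counts were already pulled from Loki; the function name, window size, and threshold are illustrative choices, not KI-Ops internals.

```python
from statistics import mean, stdev


def find_error_spikes(counts: list[int], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indexes whose count exceeds the trailing-window mean by `threshold` std devs."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Floor sigma at 1.0 so a nearly flat baseline does not flag tiny changes.
        sigma = max(stdev(baseline), 1.0)
        if counts[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes
```

The floor on the standard deviation is a deliberate simplification: without it, a quiet service with a constant error count would report a spike on any increase of one.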
Network & DNS Troubleshooting
DNS resolution, service reachability, network path analysis, and ingress health checks. Catch misconfigurations before they cascade into production incidents.
What you get
- Detects DNS resolution failures, service unreachability, and ingress misconfigurations
- Certificate expiry warnings before they page your on-call engineer
- Network policy analysis to catch blocked traffic paths
- Validates service mesh connectivity (Istio, Linkerd)
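A certificate expiry warning comes down to date arithmetic on the certificate's `notAfter` timestamp. The sketch below shows that step only, under the assumption that the timestamp has already been read from the certificate; the function names and the 14-day default are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days left before notAfter (negative once the certificate has expired)."""
    return (not_after - now) // timedelta(days=1)


def expiry_warning(host: str, not_after: datetime, now: datetime,
                   warn_days: int = 14) -> Optional[str]:
    """Return a warning string when the certificate is expired or close to expiry."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return f"{host}: certificate expired {-remaining} day(s) ago"
    if remaining <= warn_days:
        return f"{host}: certificate expires in {remaining} day(s)"
    return None
```

In Python, the timestamp could be obtained from the `notAfter` field of `ssl.SSLSocket.getpeercert()` and converted with `ssl.cert_time_to_seconds()`.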
AI-Powered Root-Cause Analysis
Claude AI analyzes your cluster state, logs, and metrics together — then generates specific, validated action items. Not generic advice. Context-aware recommendations based on your actual infrastructure.
What you get
- Identifies the actual root cause, not just symptoms (e.g., "OOMKilled because of a traffic spike", not just "pod crashed")
- Generates copy-pasteable kubectl commands to fix the issue
- Cross-references Kubernetes best practices and common patterns
- BYOK: you control the API key, the cost (~$5–15/month), and the data flow
- No data sent to KI-Ops — LLM calls go directly from your infra to Anthropic
Knowledge Base & Best Practices
Built-in runbooks for Kubernetes, Helm, Terraform, and Docker patterns. AI cross-references best practices in every analysis, so your team learns while fixing.
What you get
- Embedded K8s troubleshooting runbooks covering 50+ common failure modes
- AI explains why a problem happened, not just what to do
- Junior engineers can resolve incidents that normally require senior expertise
- Continuously updated with community-contributed patterns
Health Dashboard & Scheduled Analysis
Real-time cluster health status, node health, resource utilization trends, and historical tracking. Schedule recurring analyses to catch degradation before it becomes an incident.
What you get
- At-a-glance health overview: nodes, pods, workloads, storage, networking
- Proactive alerting: catch memory creep, disk pressure, and HPA limits before they cause an outage
- Historical trend data to identify recurring patterns
- Schedule analyses (hourly, daily) for continuous health monitoring
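Catching memory creep from historical samples can be illustrated with a plain least-squares trend line. This is a sketch under simple assumptions (evenly spaced samples, roughly linear growth), not a description of how KI-Ops computes trends; both function names are hypothetical.

```python
from typing import Optional


def slope_per_sample(values: list[float]) -> float:
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def samples_until_limit(values: list[float], limit: float) -> Optional[float]:
    """Estimate how many more samples fit before the trend reaches `limit`."""
    slope = slope_per_sample(values)
    if slope <= 0:
        return None  # flat or shrinking usage: no projected breach
    return (limit - values[-1]) / slope
```

With hourly samples, the returned value reads as "hours until the memory limit is hit", which is the kind of lead time a scheduled analysis can report.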
Stop Copy-Pasting kubectl Fixes. Let AI Ship Validated PRs.
Pro adds auto-remediation on top of free diagnostics. Every fix is validated with kubectl dry-run, Helm template, and Terraform plan before a PR is created. Your team reviews and merges. MTTR drops from 40 minutes to 4.
Auto-Generated Fix Pull Requests
Claude generates Kubernetes manifests, Helm values, or Terraform configs and opens PRs automatically. Multi-file support for complex fixes. Your team reviews and merges — the AI does the YAML editing.
What changes for your team
- Eliminates 10–20 minutes of manual YAML editing per incident
- Multi-file PRs: deployment + HPA + configmap changes in one PR
- PR includes full incident context: root cause, impact assessment, rollback plan
- Suggested reviewers based on file ownership and on-call rotation
- Auto-merge policies available for low-risk changes
Pre-Merge Validation Pipeline
Every fix passes a 6-step validation pipeline before the PR is created: kubectl dry-run, Helm template check, Terraform plan, Dockerfile security lint, policy checks, and resource quota verification.
What changes for your team
- kubectl apply --dry-run=server validates the change against the API server
- Helm template check ensures chart renders valid YAML
- Terraform plan confirms no unintended infrastructure changes
- Custom policy checks (OPA/Rego, Kyverno) enforced automatically
- Resource quota check confirms the cluster has capacity for the change
- Zero broken PRs — if validation fails, the PR is not created
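The "fail fast, no PR on failure" behavior can be sketched as a sequence of CLI checks that stops at the first non-zero exit code. The example covers only the three steps whose commands are named above (kubectl, Helm, Terraform); the function names, arguments, and the injectable `run` parameter are assumptions made for illustration, not the KI-Ops pipeline itself.

```python
import subprocess
from typing import Callable, Optional


def validation_commands(manifest: str, chart_dir: str, terraform_dir: str) -> list[list[str]]:
    """Build the three CLI checks, in the order they should run."""
    return [
        # Server-side dry run: the API server validates the object without persisting it.
        ["kubectl", "apply", "--dry-run=server", "-f", manifest],
        # Render the chart locally; a non-zero exit means the templates are broken.
        ["helm", "template", chart_dir],
        # Produce a plan without prompting for input.
        ["terraform", f"-chdir={terraform_dir}", "plan", "-input=false"],
    ]


def run_pipeline(commands: list[list[str]],
                 run: Callable = subprocess.run) -> tuple[bool, Optional[list[str]]]:
    """Run each command in order; stop and report the first one that fails."""
    for cmd in commands:
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return False, cmd
    return True, None
```

A caller would open the pull request only when `run_pipeline` returns `True`. Passing `run` as a parameter keeps the function testable without kubectl, Helm, or Terraform installed.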
GitHub/GitLab Integration & Multi-Repo
Connect all your repositories. KI-Ops reads, searches, and modifies files across repos. Automatic branch creation, PR opening, and post-merge branch cleanup.
What changes for your team
- Multi-repo support: fix a deployment in repo A and a Helm chart in repo B in one workflow
- Automatic feature branch creation with descriptive naming
- Post-merge branch cleanup — no stale branches accumulating
- Full audit trail: every change is a reviewed, approved PR
- Works with GitHub and GitLab (self-hosted or cloud)
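"Descriptive naming" for fix branches can be as simple as slugifying the incident summary. The scheme below (`fix/<incident-id>-<slug>`) is purely hypothetical and shown only to illustrate the idea; the naming convention KI-Ops actually uses is not documented here.

```python
import re


def branch_name(incident_id: str, summary: str, max_len: int = 60) -> str:
    """Build a git-friendly branch name from an incident id and a short summary."""
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    name = f"fix/{incident_id.lower()}-{slug}"
    # Truncate long names and avoid ending on a dangling hyphen.
    return name[:max_len].rstrip("-")
```

For example, an incident `INC-142` (a made-up id) with the summary "Raise memory limit for checkout-api (OOMKilled)" yields `fix/inc-142-raise-memory-limit-for-checkout-api-oomkilled`.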
Full GitOps Workflow
From incident detection to merged fix in under 4 minutes. Feature branch creation → validated PR with incident context → team review → merge → automatic cleanup. Zero manual git work.
What changes for your team
- MTTR drops from 40 minutes to 4 minutes on average
- Compatible with ArgoCD, Flux, and other GitOps controllers
- Human-in-the-loop: AI proposes, your team decides
- One-click incident response — from alert to deployed fix
- Complete audit trail for compliance and post-mortems
Free vs. Pro — The Full Breakdown
Both tiers are production-ready. Free handles diagnostics. Pro eliminates the manual work that comes after.
| Capability | Community (0€ forever) | Pro (250€/year) |
|---|---|---|
| Cluster Diagnostics | | |
| Pod, Deployment & Node analysis (kubectl-based, real-time) | | |
| CrashLoopBackOff / OOMKilled detection | | |
| HPA & resource quota analysis | | |
| DNS & network checks | | |
| Cluster health dashboard | | |
| Scheduled recurring analyses | | |
| Observability Integration | | |
| Grafana dashboard queries | | |
| Loki log analysis | | |
| Prometheus metric correlation | | |
| Cross-signal correlation (logs + metrics + cluster state) | | |
| AI Analysis | | |
| LLM-powered root-cause analysis | | |
| Actionable recommendations (text) | | |
| Knowledge base & best practices | | |
| Incident clustering & deduplication | | |
| Auto-Remediation (Pro) | | |
| Auto-fix pull request creation (AI generates YAML diffs, opens PR) | | |
| Multi-file PR support (Deployment + HPA + ConfigMap in one PR) | | |
| kubectl dry-run validation | | |
| Helm template validation | | |
| Terraform plan validation | | |
| Dockerfile security lint | | |
| Custom policy checks (OPA/Kyverno) | | |
| Git & DevOps Integration (Pro) | | |
| GitHub / GitLab integration | | |
| Multi-repo support | | |
| Automatic branch creation & cleanup | | |
| PR with incident context & rollback plan | | |
| ArgoCD / Flux GitOps compatible | | |
| Deployment & Security | | |
| Self-hosted deployment | | |
| BYOK (Bring Your Own Claude API key) | | |
| Data stays on your infrastructure | | |
| Open Source (MIT) | | |
| EKS / GKE / AKS / k3s compatible | | |
| Helm chart installation (under 5 minutes setup) | | |
Both tiers include:
Self-hosted deployment · BYOK (Bring Your Own Claude API key) · No vendor lock-in · Open Source (MIT)
Deep Dives
Want the technical details? Each feature area has its own page with architecture, CLI examples, and real-world scenarios.
AIOps & Incident Intelligence
AI-assisted incident analysis, automatic clustering, and intelligent root-cause analysis for faster problem resolution.
Observability Integration & Signal Correlation
Logs, metrics, and traces, unified. Vendor-neutral via OpenTelemetry, Grafana, Loki, and any logging system.
Kubernetes Monitoring & Self-Healing
Complete cluster diagnostics: Pod, Node, and Deployment analysis with automatic detection of typical K8s failures.
eBPF Zero-Code Instrumentation
Kernel-level visibility without code changes. Syscall tracing, network flows, and latency profiling in real time.
Security Observability & Anomaly Detection
Detect anomalies in logs, metrics, and behavior. Integrate runtime security signals directly into your ops pipeline.
Agentic AI Operations & Auto-Fix
KI-Ops Pro: automatic remediation via Git pull requests. YAML/Helm/Terraform validation with human-in-the-loop review.
Your next incident diagnosed in seconds, fixed in minutes.
Start free with full diagnostics. Upgrade to Pro when you're ready for auto-fix PRs — 250€/year, whole team.
Analyze Your Cluster Free