Every Feature, Explained.

Free diagnostics that replace 30 minutes of manual investigation. Pro auto-fix PRs that eliminate the other 10. See exactly what each tier gives you.

FREE FOREVER

The Free Tier Is Not a Demo. Its the Full Diagnostic Stack.

Most tools give you a 14-day trial and hope you forget to cancel. KI-Ops Community gives you unlimited cluster analysis, log correlation, and AI-powered root-cause identification forever. BYOK, no strings.

🔍

Kubernetes Cluster Analysis

Full diagnostics across Pods, Deployments, Services, StatefulSets, DaemonSets, and Events. Point KI-Ops at your kubeconfig and get a complete health report in seconds — no config, no agents, no sidecars.

run_kubectlget_podsdescribe_nodecheck_hpa

What you get

  • Replaces 30+ minutes of manual kubectl commands per incident
  • Detects CrashLoopBackOff, OOMKilled, ImagePullBackOff, Pending Pods automatically
  • Node pressure, HPA status, and resource quota analysis included
  • Works with EKS, GKE, AKS, k3s, kind, and any kubeconfig-accessible cluster
  • Unlimited analyses — no daily caps, no artificial limits
📊

Grafana & Loki Log Analysis

Query your Grafana dashboards and Loki log streams directly from KI-Ops. AI correlates metrics with logs to surface anomalies, error spikes, and patterns you’d spend 30 minutes finding manually.

query_lokiquery_grafanacorrelate_traces

What you get

  • Cross-correlates logs + metrics + cluster state in a single analysis
  • Surfaces error spikes, latency anomalies, and resource trends automatically
  • No need to switch between Grafana, Loki, and kubectl — one unified view
  • Works with your existing Prometheus/Grafana/Loki stack — no migration needed
  • Vendor-neutral: OpenTelemetry-native architecture
🌐

Network & DNS Troubleshooting

DNS resolution, service reachability, network path analysis, and ingress health checks. Catch misconfigurations before they cascade into production incidents.

check_dnscheck_servicecheck_ingress

What you get

  • Detects DNS resolution failures, service unreachability, and ingress misconfigurations
  • Certificate expiry warnings before they page your on-call engineer
  • Network policy analysis to catch blocked traffic paths
  • Validates service mesh connectivity (Istio, Linkerd)
🧠

AI-Powered Root-Cause Analysis

Claude AI analyzes your cluster state, logs, and metrics together — then generates specific, validated action items. Not generic advice. Context-aware recommendations based on your actual infrastructure.

BYOK — your own Claude API key

What you get

  • Identifies the actual root cause, not just symptoms (e.g., "OOMKilled because traffic spike", not just "pod crashed")
  • Generates copy-pasteable kubectl commands to fix the issue
  • Cross-references Kubernetes best practices and common patterns
  • BYOK: you control the API key, the cost (~$5–15/month), and the data flow
  • No data sent to KI-Ops — LLM calls go directly from your infra to Anthropic
📚

Knowledge Base & Best Practices

Built-in runbooks for Kubernetes, Helm, Terraform, and Docker patterns. AI cross-references best practices in every analysis, so your team learns while fixing.

What you get

  • Embedded K8s troubleshooting runbooks covering 50+ common failure modes
  • AI explains why a problem happened, not just what to do
  • Junior engineers can resolve incidents that normally require senior expertise
  • Continuously updated with community-contributed patterns
💚

Health Dashboard & Scheduled Analysis

Real-time cluster health status, node health, resource utilization trends, and historical tracking. Schedule recurring analyses to catch degradation before it becomes an incident.

What you get

  • At-a-glance health overview: nodes, pods, workloads, storage, networking
  • Proactive alerting: catch memory creep, disk pressure, and HPA limits before they break
  • Historical trend data to identify recurring patterns
  • Schedule analyses (hourly, daily) for continuous health monitoring
PRO 250/year whole team

Stop Copy-Pasting kubectl Fixes. Let AI Ship Validated PRs.

Pro adds auto-remediation on top of free diagnostics. Every fix is validated with kubectl dry-run, Helm template, and Terraform plan before a PR is created. Your team reviews and merges. MTTR drops from 40 minutes to 4.

🔧

Auto-Generated Fix Pull Requests

Claude generates Kubernetes manifests, Helm values, or Terraform configs and opens PRs automatically. Multi-file support for complex fixes. Your team reviews and merges — the AI does the YAML editing.

propose_file_changecreate_new_filecreate_pr

What changes for your team

  • Eliminates 10–20 minutes of manual YAML editing per incident
  • Multi-file PRs: deployment + HPA + configmap changes in one PR
  • PR includes full incident context: root cause, impact assessment, rollback plan
  • Suggested reviewers based on file ownership and on-call rotation
  • Auto-merge policies available for low-risk changes

Pre-Merge Validation Pipeline

Every fix passes a 6-step validation pipeline before the PR is created. kubectl dry-run, Helm template check, Terraform plan, Dockerfile security lint, policy checks, and resource quota verification.

validate_yamlkubectl_dry_runhelm_templateterraform_plan

What changes for your team

  • kubectl apply --dry-run validates API server compatibility
  • Helm template check ensures chart renders valid YAML
  • Terraform plan confirms no unintended infrastructure changes
  • Custom policy checks (OPA/Rego, Kyverno) enforced automatically
  • Resource quota check confirms the cluster has capacity for the change
  • Zero broken PRs — if validation fails, the PR is not created
📁

GitHub/GitLab Integration & Multi-Repo

Connect all your repositories. KI-Ops reads, searches, and modifies files across repos. Automatic branch creation, PR opening, and post-merge branch cleanup.

list_reposread_repo_filesearch_repocreate_branch

What changes for your team

  • Multi-repo support: fix a deployment in repo A and a Helm chart in repo B in one workflow
  • Automatic feature branch creation with descriptive naming
  • Post-merge branch cleanup — no stale branches accumulating
  • Full audit trail: every change is a reviewed, approved PR
  • Works with GitHub and GitLab (self-hosted or cloud)
🚀

Full GitOps Workflow

From incident detection to merged fix in under 4 minutes. Feature branch creation → validated PR with incident context → team review → merge → automatic cleanup. Zero manual git work.

create_branchcleanup_branchescreate_pr

What changes for your team

  • MTTR drops from 40 minutes to 4 minutes on average
  • Compatible with ArgoCD, Flux, and other GitOps controllers
  • Human-in-the-loop: AI proposes, your team decides
  • One-click incident response — from alert to deployed fix
  • Complete audit trail for compliance and post-mortems

Free vs. Pro The Full Breakdown

Both tiers are production-ready. Free handles diagnostics. Pro eliminates the manual work that comes after.

CapabilityCommunity0 foreverPro250/year
Cluster Diagnostics
Pod, Deployment & Node analysiskubectl-based, real-time
CrashLoopBackOff / OOMKilled detection
HPA & resource quota analysis
DNS & network checks
Cluster health dashboard
Scheduled recurring analyses
Observability Integration
Grafana dashboard queries
Loki log analysis
Prometheus metric correlation
Cross-signal correlationLogs + metrics + cluster state
AI Analysis
LLM-powered root-cause analysis
Actionable recommendations (text)
Knowledge base & best practices
Incident clustering & deduplication
Auto-Remediation (Pro)
Auto-fix pull request creationAI generates YAML diffs, opens PR
Multi-file PR supportDeployment + HPA + ConfigMap in one PR
kubectl dry-run validation
Helm template validation
Terraform plan validation
Dockerfile security lint
Custom policy checks (OPA/Kyverno)
Git & DevOps Integration (Pro)
GitHub / GitLab integration
Multi-repo support
Automatic branch creation & cleanup
PR with incident context & rollback plan
ArgoCD / Flux GitOps compatible
Deployment & Security
Self-hosted deployment
BYOK (Bring Your Own Claude API key)
Data stays on your infrastructure
Open Source (MIT)
EKS / GKE / AKS / k3s compatible
Helm chart installationUnder 5 minutes setup

Both tiers include:

Self-hosted deployment · BYOK (Bring Your Own Claude API key) · No vendor lock-in · Open Source (MIT)

See Pricing

Your next incident diagnosed in seconds, fixed in minutes.

Start free with full diagnostics. Upgrade to Pro when you're ready for auto-fix PRs — 250€/year, whole team.

Analyze Your Cluster Free