Getting started
Quickstart
Get AlertINT running as a single self-hosted binary, then connect your AI agent over the Model Context Protocol (MCP) to start analyzing incidents with production context.
Download the Binary
Download the latest release for your architecture. AlertINT is distributed as a static Go binary with zero external dependencies.
Create Configuration
Define your settings in a config.yaml file, with secrets supplied through environment variables. This lets AlertINT authenticate with your alert source and AI backend. See the Configuration section for more details.
Start the Server
Launch the core service to start ingestion. The server begins listening for incoming webhooks and indexing incident history.
alertint serve --config config.yaml
Start Sending Alerts
Point Alertmanager at the AlertINT webhook receiver so that AlertINT starts receiving and analyzing alerts. See the Integrations section for more information.
Connect an MCP Client
Open your preferred tool (Claude Code, Cursor, or Windsurf) and point it at the local MCP endpoint. Verify connectivity by asking a specific operational question:
List recent AlertINT incidents and summarize the most critical one.
Configuration
AlertINT runs from a single config.yaml plus environment-based secrets.
alertmanager:
  webhook_token: ${ALERTINT_WEBHOOK_TOKEN}
anthropic:
  api_key: ${ANTHROPIC_API_KEY}
state:
  path: ./.alertint
slack:
  webhook_url: ${SLACK_WEBHOOK_URL}
prometheus:
  base_url: http://localhost:9090
  bearer_token: ${PROMETHEUS_BEARER_TOKEN}
Concepts
Architecture
One self-hosted binary sits between Alertmanager and your AI agent, and turns raw alerts into context worth investigating.
Webhook Transmission
Alertmanager fires a POST to the AlertINT webhook receiver over HTTP(S). The payload is the standard Alertmanager webhook JSON — no custom format required.
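As a rough illustration of what arrives at the receiver, the sketch below parses a trimmed Alertmanager webhook body. The JSON field names follow the standard Alertmanager webhook format. The sample values, the Alert class, and parse_webhook are invented for illustration and do not describe AlertINT's internals.

```python
import json
from dataclasses import dataclass

# Trimmed example of a standard Alertmanager webhook body (values are invented).
SAMPLE_PAYLOAD = """
{
  "version": "4",
  "status": "firing",
  "receiver": "alertint",
  "groupLabels": {"alertname": "HighCPU"},
  "commonLabels": {"alertname": "HighCPU", "cluster": "prod"},
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "HighCPU", "cluster": "prod", "service": "api"},
      "annotations": {"summary": "CPU above 90% for 5m"},
      "startsAt": "2024-01-01T12:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "fingerprint": "a1b2c3d4e5f60718"
    }
  ]
}
"""


@dataclass
class Alert:
    """Hypothetical in-memory view of one entry in the "alerts" array."""
    fingerprint: str
    status: str
    labels: dict
    annotations: dict
    starts_at: str


def parse_webhook(body: str) -> list[Alert]:
    """Extract the per-alert entries from a webhook body."""
    doc = json.loads(body)
    return [
        Alert(
            fingerprint=a["fingerprint"],
            status=a["status"],
            labels=a.get("labels", {}),
            annotations=a.get("annotations", {}),
            starts_at=a["startsAt"],
        )
        for a in doc.get("alerts", [])
    ]
```

Calling parse_webhook(SAMPLE_PAYLOAD) yields one Alert whose labels and fingerprint can feed the later deduplication and correlation steps.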
Data Persistence & Deduplication
Received alerts are written to local state (SQLite by default). Duplicate firings of the same alert fingerprint within the dedup window are collapsed — one record per logical alert.
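The collapsing behavior can be pictured with a minimal sketch. It assumes the dedup window is measured from the most recent firing of a fingerprint (a sliding window). Whether AlertINT measures it this way is not stated here, and the dedupe function is hypothetical.

```python
from datetime import datetime, timedelta


def dedupe(firings, window):
    """Collapse repeat firings of one fingerprint that arrive within `window`.

    `firings` is an iterable of (fingerprint, timestamp) pairs.
    Assumption: the window slides, so each repeat extends it.
    """
    last_seen = {}  # fingerprint -> timestamp of its most recent firing
    kept = []
    for fingerprint, ts in sorted(firings, key=lambda f: f[1]):
        previous = last_seen.get(fingerprint)
        # Keep the firing if the fingerprint is new or the gap exceeds the window.
        if previous is None or ts - previous > window:
            kept.append((fingerprint, ts))
        last_seen[fingerprint] = ts
    return kept
```

With a five-minute window, a second firing one minute after the first is dropped, while a firing of the same fingerprint nine minutes after that produces a new record.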
Correlation Engine
Alerts that fire within a configurable time window and share common label dimensions (cluster, namespace, service, alertname) are grouped into a single incident record. Grouping is deterministic and re-evaluated as new alerts arrive.
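One way to picture the grouping is the sketch below. It is an assumption-laden illustration, not AlertINT's algorithm: it groups on cluster, namespace, and service only, treats the window as the maximum gap between consecutive alerts in an incident, and uses a hypothetical correlate function and dict-based alert shape.

```python
from datetime import datetime, timedelta

# Assumed subset of the label dimensions named above.
GROUP_BY = ("cluster", "namespace", "service")


def correlate(alerts, window, group_by=GROUP_BY):
    """Group alerts that share label values and start within `window` of each other.

    Each alert is a dict with "fingerprint", "labels", and "starts_at" keys.
    Sorting by (starts_at, fingerprint) keeps the result deterministic.
    """
    incidents = []
    open_by_key = {}  # label-value tuple -> most recent incident for that key
    for alert in sorted(alerts, key=lambda a: (a["starts_at"], a["fingerprint"])):
        key = tuple(alert["labels"].get(dim, "") for dim in group_by)
        incident = open_by_key.get(key)
        # Start a new incident if none exists for this key or the gap is too large.
        if incident is None or alert["starts_at"] - incident["last"] > window:
            incident = {"key": key, "fingerprints": [], "last": alert["starts_at"]}
            incidents.append(incident)
            open_by_key[key] = incident
        incident["fingerprints"].append(alert["fingerprint"])
        incident["last"] = alert["starts_at"]
    return incidents
```

Re-running correlate over the full alert set whenever a new alert arrives gives the same grouping for the same input, which is the property the deterministic re-evaluation relies on.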
AI Synthesis
Once an incident is stable (no new alerts for the settle window), AlertINT sends the grouped alert payloads to the configured LLM (Anthropic Claude). The model returns a structured finding: probable cause, severity assessment, and suggested next checks. The finding is stored locally alongside the incident.
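The settle check and the shape of the stored finding might look roughly like this. The Finding field names and the is_settled helper are assumptions chosen to mirror the description above, not AlertINT's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Finding:
    """Hypothetical shape of the structured finding returned by the model."""
    probable_cause: str
    severity: str
    next_checks: list[str] = field(default_factory=list)


def is_settled(last_alert_at: datetime, now: datetime, settle_window: timedelta) -> bool:
    """Return True once no new alert has arrived for at least `settle_window`."""
    return now - last_alert_at >= settle_window
```

Under this sketch, an incident whose last alert arrived three minutes ago is not yet settled with a five-minute settle window, so synthesis would wait.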
Outbound Notification
The AI finding is posted to the configured Slack webhook. The message links back to the incident and includes the probable cause summary, so on-call engineers can judge whether the incident is worth opening.
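For orientation, a Slack incoming webhook accepts a JSON body with a text field. The sketch below builds such a body. The message layout, the build_slack_message helper, and the incident URL are assumptions for illustration, since the exact message AlertINT sends is not specified here.

```python
import json


def build_slack_message(incident_id: str, severity: str,
                        probable_cause: str, incident_url: str) -> str:
    """Build the JSON body for a Slack incoming-webhook POST.

    All parameters and the text layout are hypothetical.
    """
    text = (
        f"[{severity}] Incident {incident_id}\n"
        f"Probable cause: {probable_cause}\n"
        f"Details: {incident_url}"
    )
    return json.dumps({"text": text})
```

The returned string would be sent as the request body with Content-Type application/json to the URL configured under slack.webhook_url.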
Agent Entry via MCP
An engineer opens their MCP-capable AI client (Claude Code, Cursor, Windsurf, or any MCP-compatible tool). The client is pointed at the AlertINT MCP server endpoint, which runs as part of the same binary. No separate daemon is needed.
Evidence Query
The agent calls AlertINT MCP tools to list recent incidents, retrieve alert payloads, and read the stored AI finding. All data is served from local state — no external calls at this stage.
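Under the hood, an MCP client invokes a server tool with a JSON-RPC 2.0 tools/call request. The sketch below builds one such message. The tool name list_incidents and its limit argument are hypothetical, so check the tools the AlertINT MCP server actually advertises before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests each need an id


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool name and argument, for illustration only.
request = tools_call("list_incidents", {"limit": 5})
```

In practice the MCP client builds these messages for you. The sketch only shows what crosses the wire when the agent asks for recent incidents.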
Telemetry Context
The agent issues read-only PromQL queries through AlertINT MCP tools. AlertINT proxies the query to the configured Prometheus instance and returns the result. The agent can inspect CPU, memory, latency, error rate, or any metric stored in Prometheus — scoped to the incident time window.
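A proxied query of this kind plausibly ends up as a call to the Prometheus HTTP API range-query endpoint. The sketch below builds such a request without sending it. The padding around the incident window, the default step, and the build_range_query helper are assumptions, not documented AlertINT behavior.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import Request


def build_range_query(base_url, promql, start, end,
                      pad=timedelta(minutes=10), step="30s", bearer_token=None):
    """Build a read-only GET request for Prometheus /api/v1/query_range.

    The incident window [start, end] is widened by `pad` on both sides
    (an assumed default) so trends before and after the alerts are visible.
    """
    params = {
        "query": promql,
        "start": int((start - pad).timestamp()),  # Unix seconds
        "end": int((end + pad).timestamp()),
        "step": step,
    }
    req = Request(f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}")
    if bearer_token:
        req.add_header("Authorization", f"Bearer {bearer_token}")
    return req
```

Passing the returned request to urllib.request.urlopen would run the query against the instance configured under prometheus.base_url.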
Decision Point
The agent synthesizes alert payloads, the stored AI finding, and live metric context into a response. The engineer decides the next action — re-query, escalate, or begin remediation — with full context already in the conversation.
The engineer acts with full incident context — alert history, correlated findings, and live metrics — all sourced read-only from within the local runtime.
MCP-first investigation
The built-in MCP server is the primary way you and your agent interact with AlertINT. Example prompts to give your agent:
List recent AlertINT incidents.
Open the latest critical incident and summarize the evidence.
Show the alert labels and annotations for this incident.
Query Prometheus for CPU and memory around the incident window.
Compare the finding with the metric trend and suggest next checks.
Scope and limits
These are deliberate boundaries, not missing features.
What it does not do
- No remediation, silences, or routing changes.
- No Alertmanager, Kubernetes, or infrastructure writes.
- No ticketing or paging integrations (PagerDuty, Jira, Linear).
Reference
FAQ
Short answers for the decisions that matter before you install it.
Do I have to host it myself?
Yes — AlertINT is self-hosted. It runs as a single binary with local state, so your alert data stays with you.
Does it replace Alertmanager?
No. Alertmanager still routes and manages alerts. AlertINT receives a webhook copy and builds investigation context.
Does it create silences or change routing?
No. AlertINT is read-only — it never changes Alertmanager, Kubernetes, or your infrastructure.
Why MCP?
Because engineers increasingly investigate through agentic tools. MCP lets those tools inspect AlertINT state and query telemetry without copy-pasting alert text into chat.
Why Prometheus?
Alert payloads alone are shallow. Read-only Prometheus queries let the connected agent inspect metrics around the incident window.
Can it remediate incidents?
No. AlertINT is read-only by design. Operator-controlled, approval-gated write workflows are a possible long-term direction, not a current feature.