Configure Alerting Rules¶
Goal¶
Configure end-to-end alerting for the Verity platform — from Prometheus rules that fire on access-decay score thresholds and SLA violations, through Grafana dashboards that visualise trends, to notification channels that reach the right people at the right time.
Prerequisites¶
| Requirement | Details |
|---|---|
| Prometheus | v2.50+ scraping Verity services (see Monitoring) |
| Alertmanager | Running and connected to Prometheus |
| Grafana | v10+ with Prometheus data source configured |
| Notification accounts | Slack webhook, PagerDuty integration key, or Teams webhook URL |
1 — Prometheus Alert Rules¶
Create a rules file and mount it into your Prometheus configuration directory.
```yaml
groups:
  # ── Access-Decay Score Alerts ─────────────────────────────────
  - name: verity.scores
    rules:
      - alert: HighDecayScoreDetected
        expr: verity_grant_decay_score > 80
        for: 10m
        labels:
          severity: warning
          team: identity-security
        annotations:
          summary: "High decay score on {{ $labels.principal }}→{{ $labels.asset }}"
          description: >-
            Grant {{ $labels.grant_id }} has a decay score of
            {{ $value | printf "%.1f" }} (threshold: 80).
            A review should be triggered automatically.
          runbook_url: https://mjtpena.github.io/verity/operations/runbooks/#high-decay-score

      - alert: CriticalDecayScore
        expr: verity_grant_decay_score > 95
        for: 5m
        labels:
          severity: critical
          team: identity-security
        annotations:
          summary: "Critical decay score — immediate review required"
          description: >-
            Grant {{ $labels.grant_id }} scored {{ $value | printf "%.1f" }}.
            Verify that the review generator has created a review and that
            an assignee is notified.

  # ── Review SLA Alerts ─────────────────────────────────────────
  - name: verity.reviews
    rules:
      - alert: ReviewSLABreached
        expr: >-
          (time() - verity_review_created_timestamp)
          > on(review_id) group_left() verity_review_sla_seconds
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Review {{ $labels.review_id }} has breached its SLA"
          description: >-
            Review for grant {{ $labels.grant_id }} exceeded the configured
            SLA. Check escalation status in the Workflow Engine.

      - alert: ReviewBacklogGrowing
        expr: verity_reviews_pending_total > 200
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Pending review backlog exceeds 200"

  # ── Service Health Alerts ─────────────────────────────────────
  - name: verity.health
    rules:
      - alert: ConnectorIngestStalled
        expr: rate(verity_events_ingested_total[15m]) == 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Connector {{ $labels.connector }} has stopped ingesting"
          runbook_url: https://mjtpena.github.io/verity/operations/runbooks/#connector-stalled

      - alert: KafkaConsumerLag
        expr: verity_kafka_consumer_lag > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Kafka consumer lag above 10 000 for {{ $labels.group_id }}"

      - alert: DecayEngineLatencyHigh
        expr: >-
          histogram_quantile(0.99, rate(verity_score_computation_duration_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 decay-score computation exceeds 2 s"
```
**Loading the rules**

If you use the Verity Helm chart, set `prometheus.rules.enabled: true` in
`values.yaml` — the chart ships these rules automatically via a
`PrometheusRule` CRD.
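If you run the Prometheus Operator but manage rules yourself rather than via the chart, the equivalent `PrometheusRule` object looks roughly like the sketch below. The object name, namespace, and `release` label are assumptions — the label must match whatever rule selector your Prometheus instance is configured with.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: verity-alert-rules      # name is an assumption
  namespace: monitoring
  labels:
    release: prometheus         # must match your Prometheus ruleSelector
spec:
  groups:
    # same group contents as the rules file above
    - name: verity.scores
      rules:
        - alert: HighDecayScoreDetected
          expr: verity_grant_decay_score > 80
          for: 10m
          labels:
            severity: warning
```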
2 — Alertmanager Routing¶
Route alerts to the correct channel based on severity:
```yaml
global:
  resolve_timeout: 5m

route:
  receiver: default-slack
  group_by: ["alertname", "team"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Critical → PagerDuty
    - match:
        severity: critical
      receiver: pagerduty-critical
      repeat_interval: 15m
    # Warning → Slack
    - match:
        severity: warning
      receiver: verity-slack-warnings

receivers:
  - name: default-slack
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T00/B00/xxxx"
        channel: "#verity-alerts"
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ "\n" }}{{ end }}'
  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: "<PAGERDUTY_INTEGRATION_KEY>"
        severity: critical
  - name: verity-slack-warnings
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T00/B00/yyyy"
        channel: "#verity-warnings"
```
For Microsoft Teams, point an Alertmanager webhook-style receiver at a Teams Incoming Webhook URL; Alertmanager v0.26+ also ships a native `msteams_configs` receiver.
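A minimal sketch of a native Teams receiver, assuming Alertmanager v0.26 or later — the receiver name and webhook URL are placeholders:

```yaml
receivers:
  - name: verity-teams
    msteams_configs:
      - webhook_url: "https://<tenant>.webhook.office.com/webhookb2/..."
        send_resolved: true
```

On older Alertmanager versions, a bridge service that translates the Alertmanager webhook payload into a Teams card (e.g. a generic `webhook_configs` receiver pointed at such a bridge) is the usual workaround.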
3 — Grafana Dashboard¶
Import the Verity overview dashboard or build one manually with these panels:
Recommended Panels¶
| Panel | Query | Visualisation |
|---|---|---|
| Decay Score Distribution | `histogram_quantile(0.50, verity_grant_decay_score_bucket)` | Heatmap |
| Ingestion Rate | `sum(rate(verity_events_ingested_total[5m])) by (connector)` | Time series |
| Review SLA Compliance | `verity_reviews_within_sla_total / verity_reviews_completed_total` | Gauge (%) |
| Pending Reviews | `verity_reviews_pending_total` | Stat |
| Connector Lag | `verity_events_lag_seconds` | Time series |
| Decay Engine p99 | `histogram_quantile(0.99, rate(verity_score_computation_duration_seconds_bucket[5m]))` | Time series |
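Ratio queries like the SLA-compliance panel can be precomputed with a Prometheus recording rule so the dashboard query stays cheap. A sketch, reusing the metric names from the table — the recording-rule name is an assumption, not something the Verity chart ships:

```yaml
groups:
  - name: verity.recording
    rules:
      - record: verity:review_sla_compliance:ratio
        expr: verity_reviews_within_sla_total / verity_reviews_completed_total
```

The Grafana gauge would then query `verity:review_sla_compliance:ratio` directly.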
Dashboard JSON¶
A pre-built dashboard is available in the repository:
```bash
# Import into Grafana via the API
curl -X POST http://localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAFANA_API_KEY" \
  -d @deploy/grafana/verity-overview.json
```
**Alerting in Grafana**

Grafana can evaluate alert rules independently of Prometheus. If you prefer Grafana-managed alerts, create them in the dashboard UI and point contact points at the same Slack/PagerDuty channels.
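If you provision Grafana from files rather than the UI, a contact point for the same Slack webhook can be declared in an alerting provisioning file. A sketch assuming Grafana's file-provisioning format (v9+); the file path, contact-point name, and `uid` are assumptions:

```yaml
# provisioning/alerting/contact-points.yaml (path is an assumption)
apiVersion: 1
contactPoints:
  - orgId: 1
    name: verity-slack
    receivers:
      - uid: verity-slack-uid
        type: slack
        settings:
          url: https://hooks.slack.com/services/T00/B00/xxxx
```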
4 — Verify¶
1. **Fire a test alert** — temporarily lower a rule threshold in the rules file and reload Prometheus.
2. **Check Alertmanager** — open http://localhost:9093 and confirm the alert appears under the correct receiver.
3. **Verify the notification** — confirm a message lands in the Slack channel or PagerDuty service.
4. **Revert the threshold** and reload again.
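For the first step, the edit can be as small as a lowered copy of one rule — the threshold of 10 here is arbitrary, chosen only so the alert fires on live data:

```yaml
# temporarily lowered for testing — revert after verifying
- alert: HighDecayScoreDetected
  expr: verity_grant_decay_score > 10
  for: 0m
```

After editing, reload Prometheus — for example with `kill -HUP <pid>`, or `curl -X POST http://localhost:9090/-/reload` if Prometheus was started with `--web.enable-lifecycle`.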
Alert Summary¶
```mermaid
flowchart LR
    P[Prometheus] -->|rules evaluate| A[Alertmanager]
    A -->|critical| PD[PagerDuty]
    A -->|warning| SL[Slack / Teams]
    A -->|all| G[Grafana<br/>Dashboard]
    style PD fill:#e74c3c,color:#fff
    style SL fill:#2ecc71,color:#fff
```
Next Steps¶
- Review the full Monitoring & Alerting reference
- Set up Runbooks for each alert
- Configure Review Workflows to automate responses