Configure Alerting Rules

Goal

Configure end-to-end alerting for the Verity platform — from Prometheus rules that fire on access-decay score thresholds and SLA violations, through Grafana dashboards that visualise trends, to notification channels that reach the right people at the right time.


Prerequisites

| Requirement | Details |
|---|---|
| Prometheus | v2.50+ scraping Verity services (see Monitoring) |
| Alertmanager | Running and connected to Prometheus |
| Grafana | v10+ with Prometheus data source configured |
| Notification accounts | Slack webhook, PagerDuty integration key, or Teams webhook URL |

1 — Prometheus Alert Rules

Create a rules file and mount it into your Prometheus configuration directory.

prometheus/rules/verity-alerts.yml
groups:
  # ── Access-Decay Score Alerts ─────────────────────────────────
  - name: verity.scores
    rules:
      - alert: HighDecayScoreDetected
        expr: verity_grant_decay_score > 80
        for: 10m
        labels:
          severity: warning
          team: identity-security
        annotations:
          summary: "High decay score on {{ $labels.principal }}→{{ $labels.asset }}"
          description: >-
            Grant {{ $labels.grant_id }} has a decay score of
            {{ $value | printf "%.1f" }} (threshold: 80).
            A review should be triggered automatically.
          runbook_url: https://mjtpena.github.io/verity/operations/runbooks/#high-decay-score

      - alert: CriticalDecayScore
        expr: verity_grant_decay_score > 95
        for: 5m
        labels:
          severity: critical
          team: identity-security
        annotations:
          summary: "Critical decay score: immediate review required"
          description: >-
            Grant {{ $labels.grant_id }} scored {{ $value | printf "%.1f" }}.
            Verify that the review generator has created a review and that
            an assignee is notified.

  # ── Review SLA Alerts ─────────────────────────────────────────
  - name: verity.reviews
    rules:
      - alert: ReviewSLABreached
        expr: >-
          (time() - verity_review_created_timestamp)
          > on(review_id) group_left() verity_review_sla_seconds
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Review {{ $labels.review_id }} has breached its SLA"
          description: >-
            Review for grant {{ $labels.grant_id }} exceeded the configured
            SLA. Check escalation status in the Workflow Engine.

      - alert: ReviewBacklogGrowing
        expr: verity_reviews_pending_total > 200
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Pending review backlog exceeds 200"

  # ── Service Health Alerts ─────────────────────────────────────
  - name: verity.health
    rules:
      - alert: ConnectorIngestStalled
        expr: rate(verity_events_ingested_total[15m]) == 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Connector {{ $labels.connector }} has stopped ingesting"
          runbook_url: https://mjtpena.github.io/verity/operations/runbooks/#connector-stalled

      - alert: KafkaConsumerLag
        expr: verity_kafka_consumer_lag > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Kafka consumer lag above 10 000 for {{ $labels.group_id }}"

      - alert: DecayEngineLatencyHigh
        expr: >-
          histogram_quantile(0.99, rate(verity_score_computation_duration_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 decay-score computation exceeds 2 s"
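Before Prometheus loads the file, it can be validated offline with promtool (shipped with Prometheus); the path below assumes the rules file location used above:

```shell
# Check syntax and PromQL expressions without restarting Prometheus
promtool check rules prometheus/rules/verity-alerts.yml
```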

Loading the rules

If you use the Verity Helm chart, set prometheus.rules.enabled: true in values.yaml — the chart ships these rules automatically via a PrometheusRule CRD.
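Outside Helm, reference the file from your Prometheus configuration directly; a minimal sketch, assuming the rules directory is mounted at /etc/prometheus/rules:

```yaml
# prometheus.yml (fragment)
rule_files:
  - /etc/prometheus/rules/verity-alerts.yml
```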


2 — Alertmanager Routing

Route alerts to the correct channel based on severity:

alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  receiver: default-slack
  group_by: ["alertname", "team"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    # Critical → PagerDuty
    - match:
        severity: critical
      receiver: pagerduty-critical
      repeat_interval: 15m

    # Warning → Slack
    - match:
        severity: warning
      receiver: verity-slack-warnings

receivers:
  - name: default-slack
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T00/B00/xxxx"
        channel: "#verity-alerts"
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ "\n" }}{{ end }}'

  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: "<PAGERDUTY_INTEGRATION_KEY>"
        severity: critical

  - name: verity-slack-warnings
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T00/B00/yyyy"
        channel: "#verity-warnings"

Alertmanager v0.26+ ships a native Microsoft Teams receiver (msteams_configs); on older versions, use the generic webhook receiver with a Teams Incoming Webhook URL, typically via a bridge such as prometheus-msteams, since Teams does not render raw Alertmanager payloads. Email notifications go through a standard SMTP smarthost. Both can be added to the same receivers list:

receivers:
  - name: verity-teams
    webhook_configs:
      - url: "https://outlook.office.com/webhook/..."
        send_resolved: true

  - name: verity-email
    email_configs:
      - to: "security-team@example.com"
        from: "verity-alerts@example.com"
        smarthost: "smtp.example.com:587"
        auth_username: "verity-alerts@example.com"
        auth_password: "<SMTP_PASSWORD>"
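The routing tree can be dry-run with amtool before deploying; a sketch, assuming amtool is installed alongside Alertmanager and the config path used above:

```shell
# Show which receiver a critical Verity alert would be routed to
amtool config routes test --config.file=alertmanager/alertmanager.yml \
  severity=critical team=identity-security
```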

3 — Grafana Dashboard

Import the Verity overview dashboard or build one manually with these panels:

| Panel | Query | Visualisation |
|---|---|---|
| Decay Score Distribution | histogram_quantile(0.50, verity_grant_decay_score_bucket) | Heatmap |
| Ingestion Rate | sum(rate(verity_events_ingested_total[5m])) by (connector) | Time series |
| Review SLA Compliance | verity_reviews_within_sla_total / verity_reviews_completed_total | Gauge (%) |
| Pending Reviews | verity_reviews_pending_total | Stat |
| Connector Lag | verity_events_lag_seconds | Time series |
| Decay Engine p99 | histogram_quantile(0.99, rate(verity_score_computation_duration_seconds_bucket[5m])) | Time series |

Dashboard JSON

A pre-built dashboard is available in the repository:

# Import into Grafana via the API
curl -X POST http://localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAFANA_API_KEY" \
  -d @deploy/grafana/verity-overview.json

Alerting in Grafana

Grafana can evaluate alert rules independently from Prometheus. If you prefer Grafana-managed alerts, create them in the dashboard UI and point contact points to the same Slack/PagerDuty channels.
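Grafana-managed contact points can also be provisioned from a file instead of the UI; a minimal sketch, assuming Grafana v9+ file provisioning and a hypothetical verity-slack contact point (the uid and webhook URL are placeholders):

```yaml
# provisioning/alerting/contact-points.yml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: verity-slack
    receivers:
      - uid: verity-slack
        type: slack
        settings:
          url: "https://hooks.slack.com/services/T00/B00/xxxx"
```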


4 — Verify

  1. Fire a test alert — temporarily lower a threshold:

    # Temporarily lower the HighDecayScoreDetected threshold
    sed -i 's/> 80/> 0/' prometheus/rules/verity-alerts.yml
    curl -X POST http://localhost:9090/-/reload   # hot-reload rules (requires --web.enable-lifecycle)
    
  2. Check Alertmanager — open http://localhost:9093 and confirm the alert appears under the correct receiver.

  3. Verify notification — confirm a message lands in the Slack channel or PagerDuty service.

  4. Revert the threshold and reload again.
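Steps 2–3 can also be checked programmatically: Alertmanager's v2 API (GET /api/v2/alerts) returns a JSON array of alerts with labels and status. A sketch that filters firing alerts by severity, shown here against a canned payload rather than a live endpoint:

```python
import json

def firing_by_severity(alerts_json: str, severity: str) -> list:
    """Return alert names whose state is active and whose severity label matches."""
    names = []
    for alert in json.loads(alerts_json):
        if (alert.get("status", {}).get("state") == "active"
                and alert.get("labels", {}).get("severity") == severity):
            names.append(alert["labels"].get("alertname", "<unnamed>"))
    return names

# Example payload shaped like an /api/v2/alerts response
sample = json.dumps([
    {"labels": {"alertname": "HighDecayScoreDetected", "severity": "warning"},
     "status": {"state": "active"}},
    {"labels": {"alertname": "CriticalDecayScore", "severity": "critical"},
     "status": {"state": "suppressed"}},
])

print(firing_by_severity(sample, "warning"))  # -> ['HighDecayScoreDetected']
```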


Alert Summary

flowchart LR
    P[Prometheus] -->|rules evaluate| A[Alertmanager]
    A -->|critical| PD[PagerDuty]
    A -->|warning| SL[Slack / Teams]
    A -->|all| G[Grafana<br/>Dashboard]

    style PD fill:#e74c3c,color:#fff
    style SL fill:#2ecc71,color:#fff

Next Steps