Operations Guide
This section covers the day-to-day operation of the Verity platform in staging and production environments.
Overview
graph LR
    subgraph Observe
        PROM["Prometheus<br/>Metrics"]
        GRAFANA["Grafana<br/>Dashboards"]
        LOGS["Log<br/>Aggregation"]
        ALERTS["Alert<br/>Manager"]
    end
    subgraph Respond
        RUNBOOKS["Runbooks"]
        TROUBLESHOOT["Troubleshooting<br/>Guide"]
    end
    subgraph Act
        RESTART["Service<br/>Restart"]
        SCALE["Scale<br/>Workers"]
        MAINTAIN["Database<br/>Maintenance"]
    end
    PROM --> GRAFANA
    PROM --> ALERTS
    ALERTS --> RUNBOOKS
    RUNBOOKS --> RESTART & SCALE & MAINTAIN
    LOGS --> TROUBLESHOOT
    TROUBLESHOOT --> RESTART & SCALE & MAINTAIN
    style PROM fill:#7c3aed,color:#fff
    style GRAFANA fill:#7c3aed,color:#fff
    style ALERTS fill:#ef4444,color:#fff
    style RUNBOOKS fill:#f59e0b,color:#000
Key Operational Areas
| Area | Guide | Description |
|---|---|---|
| Monitoring | Monitoring & Alerting | Prometheus metrics, alerting rules, Grafana dashboards |
| Runbooks | Operational Runbooks | Step-by-step procedures for common operational tasks |
| Troubleshooting | Troubleshooting | Common errors, debugging techniques, and solutions |
Health Check Endpoints
All Verity services expose the following health check and metrics endpoints:
| Endpoint | Purpose |
|---|---|
| `/health` | Basic liveness check |
| `/health/ready` | Readiness check (includes dependency health) |
| `/v1/metrics` | Prometheus metrics endpoint |
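As a quick way to exercise these endpoints by hand, the following is a minimal sketch that requests each path and prints the HTTP status code. The function name `probe_endpoints` and the base URL are hypothetical, and treating anything other than 200 as a failure is an assumption about how the endpoints respond, not documented Verity behavior.

```shell
#!/usr/bin/env bash
# Sketch: probe the three documented endpoints on one service.
# Assumption: the service is reachable at the base URL passed as $1
# (for example through a local port-forward); that URL is not from this guide.

probe_endpoints() {
  local base_url="$1" path code failed=0
  for path in /health /health/ready /v1/metrics; do
    # -w '%{http_code}' prints only the status code; curl prints 000 when no response arrives
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${base_url}${path}") || true
    printf '%s %s\n' "$path" "$code"
    # Assumption: a healthy endpoint answers 200
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}
```

Called as `probe_endpoints http://localhost:8080` (port assumed), it prints one line per path and returns non-zero if any request did not come back with 200.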
Quick Commands
# Check pod status
kubectl get pods -n verity -o wide
# View recent logs for a service
kubectl logs -n verity -l app.kubernetes.io/component=api-gateway --tail=100
# Check Prometheus alerts firing
kubectl port-forward -n monitoring svc/prometheus 9090:9090
# Then visit http://localhost:9090/alerts
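# Check Kafka consumer lag
# Single quotes make $KAFKA_BOOTSTRAP_SERVERS expand inside the pod rather than in your local shell
# (assumes the container image provides sh)
kubectl exec -n verity deploy/verity-ingestion -- \
  sh -c 'kafka-consumer-groups.sh --bootstrap-server "$KAFKA_BOOTSTRAP_SERVERS" --group decay-engine --describe'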
# Scale a service
kubectl scale -n verity deployment/verity-api-gateway --replicas=4
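
The consumer lag check above prints one row per partition, which can be hard to read at a glance. The sketch below sums the `LAG` column into a single number. The function name `sum_consumer_lag` is hypothetical, and because the column layout of `--describe` output differs between Kafka versions, it locates the column by its header name instead of assuming a fixed position.

```shell
#!/usr/bin/env bash
# Sketch: total the LAG column of `kafka-consumer-groups.sh --describe` output read from stdin.
# Assumption: the output contains a header row with a column literally named LAG.

sum_consumer_lag() {
  awk '
    # Until the header row has been seen, look for the column named LAG
    !lag_col { for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i; next }
    # Data rows: add numeric lag values; non-numeric entries such as "-" are skipped
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}
```

Piping the output of the consumer lag command into `sum_consumer_lag` yields one total for the group, which is easier to compare before and after scaling a service.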