Data Flow
This document describes the end-to-end data pipeline that transforms raw access
data from source systems into decay scores, review decisions, and automated
remediations. All inter-service communication flows through Apache Kafka (KRaft
mode).
End-to-End Sequence
The following sequence diagram traces a single access event from ingestion through
remediation.
sequenceDiagram
autonumber
participant SRC as Source System<br/>(e.g. Entra ID)
participant CON as Connector
participant K1 as Kafka<br/>verity.events.raw.{platform}
participant IW as Ingest Worker
participant K2 as Kafka<br/>verity.events.normalised
participant NE as Normalise Engine
participant PG as PostgreSQL<br/>(TimescaleDB)
participant DE as Decay Engine
participant K3 as Kafka<br/>verity.scores.updated
participant RG as Review Generator
participant K4 as Kafka<br/>verity.reviews.created
participant WE as Workflow Engine<br/>(Temporal)
participant K5 as Kafka<br/>verity.reviews.decided
participant RE as Remediation Executor
participant K6 as Kafka<br/>verity.remediations.completed
participant AW as Audit Writer
participant CH as ClickHouse
SRC->>CON: Poll / webhook
CON->>K1: Produce raw event
K1->>IW: Consume raw event
IW->>IW: Validate & deduplicate
IW->>K2: Produce validated event
K2->>NE: Consume validated event
NE->>NE: Map to canonical model
NE->>PG: Upsert Principal, Asset, AccessGrant, AccessEvent
NE->>K2: Produce normalised event
K2->>DE: Consume normalised event
DE->>PG: Read grants & peer groups
DE->>DE: Compute 6-factor decay score
DE->>PG: Write access_scores
DE->>K3: Produce score update
K3->>RG: Consume score update
RG->>RG: Evaluate policy thresholds
RG->>PG: Create review_packet
RG->>K4: Produce review created
K4->>WE: Consume review created
WE->>WE: Start Temporal workflow
WE->>WE: Assign → Notify → Await decision
WE->>K5: Produce review decided
K5->>RE: Consume decided review
RE->>SRC: Execute remediation action
RE->>PG: Update grant status
RE->>K6: Produce remediation completed
Note over AW,CH: All services also emit to verity.audit.all
K6->>AW: Consume audit events
AW->>CH: Append to verity_audit
Kafka Topic Catalogue
All topics use the verity. prefix and follow the naming convention
verity.<domain>.<event-type>[.<qualifier>].
Raw Event Topics
| Topic |
Producer |
Consumer(s) |
Key |
Partitions |
Description |
verity.events.raw.azure |
connector-azure-ad |
ingest-worker |
tenant_id |
6 |
Raw users, groups, roles, and sign-in logs from Entra ID. |
verity.events.raw.snowflake |
connector-snowflake |
ingest-worker |
account_id |
6 |
Raw grants and query history from Snowflake. |
verity.events.raw.databricks |
connector-databricks |
ingest-worker |
workspace_id |
6 |
Raw permissions and audit logs from Databricks. |
verity.events.raw.fabric |
connector-fabric |
ingest-worker |
workspace_id |
6 |
Raw workspace items and permissions from Fabric. |
Pipeline Topics
| Topic |
Producer |
Consumer(s) |
Key |
Partitions |
Description |
verity.events.normalised |
normalise-engine |
decay-engine |
principal_id |
12 |
Canonical AccessEvent records after identity resolution and asset classification. |
verity.scores.updated |
decay-engine |
review-generator, dashboard-ui |
grant_id |
12 |
Score change events emitted after every scoring run. |
verity.reviews.created |
review-generator |
workflow-engine |
review_id |
6 |
New review packets awaiting assignment. |
verity.reviews.decided |
workflow-engine |
remediation-executor |
review_id |
6 |
Reviews with an approved decision (revoke, downgrade, or maintain). |
verity.remediations.completed |
remediation-executor |
audit-writer |
grant_id |
6 |
Confirmation of remediation actions executed in source systems. |
Infrastructure Topics
| Topic |
Producer |
Consumer(s) |
Key |
Partitions |
Description |
verity.audit.all |
All services |
audit-writer |
entity_id |
12 |
Unified audit stream — every significant action across the platform. |
verity.identity.resolve |
normalise-engine |
normalise-engine |
email |
6 |
Identity-resolution requests for cross-platform principal matching. |
verity.asset.classify |
normalise-engine |
normalise-engine |
asset_id |
6 |
Asset-classification requests for sensitivity and data-classification tagging. |
Topic Configuration Defaults
| Setting |
Value |
Rationale |
replication.factor |
3 |
Durability across broker failures. |
min.insync.replicas |
2 |
Strong consistency guarantee. |
retention.ms |
604 800 000 (7 days) |
Pipeline topics — sufficient for replay. |
retention.ms (audit) |
220 752 000 000 (7 years) |
verity.audit.all — regulatory retention. |
cleanup.policy |
delete |
Default; compacted topics noted individually. |
compression.type |
zstd |
Best compression ratio for JSON payloads. |
Data Lifecycle
Access Events
| Phase |
Store |
Retention |
Details |
| Raw ingestion |
Kafka (verity.events.raw.*) |
7 days |
Replay buffer; retained for re-processing. |
| Normalised |
PostgreSQL access_events (hypertable) |
7 years |
1-day chunks; compressed after 7 days. |
| Audit copy |
ClickHouse verity_audit |
7 years |
Partitioned monthly; TTL-enforced. |
Access Scores
| Phase |
Store |
Retention |
Details |
| Score history |
PostgreSQL access_scores (hypertable) |
1 year |
7-day chunks; used for trend analysis. |
| Latest scores |
PostgreSQL access_scores_latest (continuous aggregate) |
Indefinite |
Materialised view; refreshed continuously. |
| Score events |
Kafka (verity.scores.updated) |
7 days |
Consumed by review-generator and dashboard. |
Reviews
| Phase |
Store |
Retention |
Details |
| Active reviews |
PostgreSQL review_packets |
Indefinite |
Until closed; archived after 1 year. |
| Decisions |
PostgreSQL review_decisions |
7 years |
Regulatory requirement — who decided what. |
| Workflow state |
Temporal |
Configurable |
Temporal manages its own retention. |
Access Review Workflow
The review lifecycle is implemented as a Temporal workflow in the workflow-engine
service. The following state machine describes all possible transitions.
stateDiagram-v2
[*] --> CREATED: Review packet generated
CREATED --> ASSIGNED: Reviewer matched by policy
ASSIGNED --> NOTIFIED: Email / Slack notification sent
NOTIFIED --> DECIDED: Reviewer submits decision
NOTIFIED --> ESCALATED: SLA breached (no response)
ESCALATED --> DECIDED: Escalated reviewer decides
DECIDED --> REMEDIATED: Decision = Revoke or Downgrade
DECIDED --> CLOSED: Decision = Maintain
REMEDIATED --> CLOSED: Remediation confirmed
CLOSED --> [*]
state DECIDED {
[*] --> Revoke
[*] --> Downgrade
[*] --> Maintain
}
Workflow States
| State |
Description |
SLA |
Escalation |
CREATED |
Review packet generated by review-generator when score exceeds threshold. |
— |
— |
ASSIGNED |
Reviewer identified by ownership rules or round-robin assignment. |
— |
— |
NOTIFIED |
Notification sent via configured channel (email, Slack, Teams). |
— |
— |
DECIDED |
Reviewer has submitted a decision: Revoke, Downgrade, or Maintain. |
5 business days |
Escalates to manager after SLA breach. |
ESCALATED |
Original reviewer did not respond within SLA; escalated to next-level approver. |
3 business days |
Auto-revoke if second SLA breached. |
REMEDIATED |
Remediation action executed in the source system. |
1 hour |
Alert on failure. |
CLOSED |
Terminal state — review complete, all actions confirmed. |
— |
— |
Temporal Workflow Details
| Property |
Value |
| Task Queue |
verity-reviews |
| Workflow ID |
review-{review_id} |
| Execution Timeout |
30 days |
| Run Timeout |
30 days |
| Retry Policy |
3 attempts, 60 s initial interval, 2.0 backoff coefficient |
Signal: submit_decision |
Reviewer submits a decision (revoke / downgrade / maintain + justification). |
Signal: reassign |
Admin reassigns the review to a different reviewer. |
Query: get_status |
Returns current workflow state and metadata. |
Error Handling & Dead-Letter Queues
Every consumer implements a retry-then-DLQ pattern:
flowchart LR
A[Kafka Topic] --> B{Consumer}
B -->|Success| C[Process & Commit]
B -->|Transient Error| D[Retry — exponential backoff<br/>max 3 attempts]
D -->|Recovered| C
D -->|Exhausted| E[Dead-Letter Topic<br/>verity.dlq.{original-topic}]
E --> F[Manual Investigation]
| Setting |
Value |
| Max retries |
3 |
| Initial backoff |
1 second |
| Max backoff |
30 seconds |
| Backoff multiplier |
2.0 |
| DLQ topic naming |
verity.dlq.{original-topic-name} |
Dead-letter messages retain the original headers (x-original-topic,
x-retry-count, x-error-message) for investigation and replay.