Decay Engine

Path: services/analytics/decay-engine/ · Type: Worker

The Decay Engine is the core analytical service of the Verity platform. It computes an Access Decay Score for every principal–asset grant from six factors: four weighted inputs (recency, trend, organisational context, peer behaviour) and two multipliers (data sensitivity, review freshness).

Architecture

graph LR
    PG[(PostgreSQL)] --> DE[Decay Engine]
    CH[(ClickHouse)] --> DE
    DE -->|verity.scores.updated| K{{Kafka}}
    DE --> PG

The Decay Engine runs as a scheduled batch job. Each cycle:

1. Reads all active grants from PostgreSQL.
2. Queries activity and audit data from PostgreSQL and ClickHouse.
3. Computes scores using Polars lazy evaluation.
4. Writes scores back to PostgreSQL.
5. Publishes score-change events to Kafka.

Scoring Model

Factor Definitions

The Access Decay Score is composed of six factors (F1–F6), each measuring a distinct dimension of access relevance.

F1 — Recency (weight: 30%)

Exponential decay based on inactivity:

$$ F_1 = e^{-\frac{\text{days_inactive}}{\text{half_life_days}}} $$

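Parameter	Default	Description
`half_life_days`	90	Decay time constant in days. With the formula above, F1 falls to 1/e ≈ 0.368 (not 0.5) after this many days, so despite its name the parameter is not a true half-life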

Examples:

Days Inactive	F1 Value
0	1.000
30	0.717
90	0.368
180	0.135
365	0.017
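As a quick check of the formula, the following minimal sketch evaluates F1 at the same inactivity periods. The function name `recency_factor` is illustrative and not taken from the service code; the default of 90 mirrors `half_life_days`.

```python
import math


def recency_factor(days_inactive: float, half_life_days: float = 90.0) -> float:
    """Return F1 = exp(-days_inactive / half_life_days)."""
    if days_inactive < 0:
        raise ValueError("days_inactive must be non-negative")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return math.exp(-days_inactive / half_life_days)


# Print F1, rounded to three decimals, for each inactivity period.
for days in (0, 30, 90, 180, 365):
    print(f"{days:>3} days inactive -> F1 = {recency_factor(days):.3f}")
```

Because the exponent is divided by `half_life_days` directly, F1 equals 1/e when `days_inactive` equals `half_life_days`.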

F2 — Trend (weight: 20%)

Activity trend comparing recent vs. prior period:

$$ F_2 = \frac{\text{events_last_90d}}{\text{events_prior_90d}} $$

- The ratio is clamped to the range [0.0, 2.0] so that outliers cannot inflate the score.
- If events_prior_90d = 0, the ratio is undefined and F2 defaults to 1.0 (neutral).
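The two rules above can be sketched as a small function. This is an illustration only; `trend_factor` is a hypothetical name, and the event counts are assumed to be non-negative integers.

```python
def trend_factor(events_last_90d: int, events_prior_90d: int) -> float:
    """Return F2: the recent-to-prior activity ratio, clamped to [0.0, 2.0]."""
    if events_prior_90d == 0:
        # No prior activity to compare against: neutral value.
        return 1.0
    ratio = events_last_90d / events_prior_90d
    return max(0.0, min(2.0, ratio))


print(trend_factor(5, 10))   # activity halved
print(trend_factor(30, 10))  # tripled, but clamped to the upper bound
print(trend_factor(4, 0))    # no prior activity, neutral
```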

F3 — Organisational Context (weight: 20%)

Captures structural changes that indicate access may no longer be needed:

Condition	Multiplier
Team / department changed	0.60×
Project ended	0.50×
No change	1.00×

When multiple conditions apply, the lowest multiplier is used. The resulting value (between 0.50 and 1.00) enters the base score as the F3 term.
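A minimal sketch of the "lowest multiplier wins" rule follows. The condition keys `team_changed` and `project_ended` are hypothetical labels for the two table rows, not identifiers from the service.

```python
from typing import Iterable

# Hypothetical keys for the conditions listed in the table above.
ORG_CONTEXT_MULTIPLIERS = {
    "team_changed": 0.60,
    "project_ended": 0.50,
}


def org_context_factor(conditions: Iterable[str]) -> float:
    """Return F3: the lowest applicable multiplier, or 1.00 if nothing changed."""
    return min((ORG_CONTEXT_MULTIPLIERS[c] for c in conditions), default=1.00)


print(org_context_factor([]))                                 # no change
print(org_context_factor(["team_changed"]))                   # one condition
print(org_context_factor(["team_changed", "project_ended"]))  # lowest wins
```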

F4 — Sensitivity Multiplier

Scales the score based on asset sensitivity — more sensitive data decays faster:

Sensitivity Level	Multiplier
`PII`	0.70
`FINANCIAL`	0.75
`CONFIDENTIAL`	0.85
`INTERNAL`	0.95
`PUBLIC`	1.00

F5 — Peer Deviation (weight: 10%)

Compares the principal's activity with the 80th-percentile activity of its peers:

$$ F_5 = \frac{\text{principal_activity}}{\text{peer_p80_activity}} $$

- The ratio is clamped to the range [0.0, 2.0].
- Peers are defined as principals with the same role accessing the same asset.
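The sketch below shows one way to compute F5 with the standard library. Several details are assumptions, because the text does not specify them: the percentile interpolation method (`inclusive` here), and the neutral fallback of 1.0 when there are fewer than two peers or when the peer 80th percentile is zero (mirroring the F2 rule). The name `peer_deviation_factor` is illustrative.

```python
import statistics
from typing import Sequence


def peer_deviation_factor(
    principal_activity: float, peer_activity: Sequence[float]
) -> float:
    """Return F5: principal activity over the peer p80, clamped to [0.0, 2.0]."""
    if len(peer_activity) < 2:
        # Assumption: too few peers for a percentile, so stay neutral.
        return 1.0
    # n=5 yields cut points at 20/40/60/80 %; index 3 is the 80th percentile.
    p80 = statistics.quantiles(peer_activity, n=5, method="inclusive")[3]
    if p80 == 0:
        # Assumption: undefined ratio is treated as neutral.
        return 1.0
    return max(0.0, min(2.0, principal_activity / p80))


peers = [0, 10, 20, 30, 40, 50]
print(peer_deviation_factor(20, peers))
print(peer_deviation_factor(100, peers))
```

With these six peers the inclusive 80th percentile is 40, so an activity of 20 gives 0.5 and an activity of 100 is clamped to 2.0.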

F6 — Review Recency

Boost or penalty based on the last human review:

Last Review	Multiplier
≤ 30 days ago	1.10
31–90 days ago	1.05
> 90 days ago	0.95
Never reviewed	0.90
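The table can be read as a simple threshold lookup, sketched below. Representing "never reviewed" as `None` is an assumption made for this example, as is the function name.

```python
from typing import Optional


def review_recency_multiplier(days_since_review: Optional[int]) -> float:
    """Return F6 from the number of days since the last human review."""
    if days_since_review is None:
        return 0.90  # never reviewed
    if days_since_review <= 30:
        return 1.10
    if days_since_review <= 90:
        return 1.05
    return 0.95


for days in (7, 60, 200, None):
    print(days, review_recency_multiplier(days))
```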

Score Formula

Step 1 — Compute base score from the weighted factors:

$$ \text{base} = \frac{F_1 \times 0.30 + F_2 \times 0.20 + F_3 \times 0.20 + F_5 \times 0.10}{0.80} $$

Step 2 — Apply multipliers and scale:

$$ \text{score} = \text{base} \times F_4 \times F_6 \times 100 $$

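The final score is an integer in the range 0–100. Because F2 and F5 can each reach 2.0 and F6 can reach 1.10, the raw product can exceed 100, so arriving at this range implies rounding and clamping the result; the rounding mode is not specified here.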
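The two steps can be combined into a short worked example. This is a sketch under stated assumptions: the final clamp to 0–100 and the use of Python's built-in `round` (which rounds halves to the nearest even integer) are choices made for the example, since the text does not specify them. The function name `decay_score` is illustrative.

```python
def decay_score(
    f1: float, f2: float, f3: float, f5: float, f4: float, f6: float
) -> int:
    """Combine the weighted factors (F1, F2, F3, F5) and multipliers (F4, F6)."""
    # Step 1: weighted sum, normalised by the total weight of 0.80.
    base = (f1 * 0.30 + f2 * 0.20 + f3 * 0.20 + f5 * 0.10) / 0.80
    # Step 2: apply the multipliers and scale to 0-100.
    raw = base * f4 * f6 * 100
    # Assumption: clamp to [0, 100] and round to the nearest integer.
    return int(round(max(0.0, min(100.0, raw))))


# 90 days inactive (F1 = 0.368), activity halved (F2 = 0.5), no org change
# (F3 = 1.0), half the peer p80 (F5 = 0.5), PII asset (F4 = 0.70),
# never reviewed (F6 = 0.90).
print(decay_score(f1=0.368, f2=0.5, f3=1.0, f5=0.5, f4=0.70, f6=0.90))  # 36
```

In this example the base is 0.4604 / 0.80 = 0.5755, and 0.5755 × 0.70 × 0.90 × 100 ≈ 36.26, which rounds to 36.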

flowchart LR
    F1[F1: Recency\n30%] --> BASE
    F2[F2: Trend\n20%] --> BASE
    F3[F3: Org Context\n20%] --> BASE
    F5[F5: Peer Deviation\n10%] --> BASE[Base Score\nweighted sum / 0.80]
    BASE --> MUL[Apply Multipliers]
    F4[F4: Sensitivity] --> MUL
    F6[F6: Review Recency] --> MUL
    MUL --> SCORE["score × 100\n(0–100)"]

Risk Levels & SLAs

Score Range	Risk Level	SLA (hours)	SLA (human-readable)
0–20	CRITICAL	48	2 days
21–40	HIGH	168	7 days
41–60	MEDIUM	720	30 days
61–80	LOW	2,160	90 days
81–100	No action	—	Access considered healthy

Grants scoring ≤ 80 trigger the review workflow. Grants scoring > 80 are considered healthy and do not generate review packets.
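The score bands translate directly into a lookup such as the sketch below. The label `NO_ACTION` and the `None` SLA for healthy grants are naming choices made for this example, not identifiers from the service.

```python
from typing import Optional, Tuple


def risk_level(score: int) -> Tuple[str, Optional[int]]:
    """Map a 0-100 score to (risk level, SLA in hours)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in the range 0-100")
    if score <= 20:
        return ("CRITICAL", 48)
    if score <= 40:
        return ("HIGH", 168)
    if score <= 60:
        return ("MEDIUM", 720)
    if score <= 80:
        return ("LOW", 2160)
    return ("NO_ACTION", None)  # healthy: no review packet


def needs_review(score: int) -> bool:
    """Grants scoring 80 or below enter the review workflow."""
    return risk_level(score)[1] is not None


print(risk_level(36), needs_review(36))
print(risk_level(81), needs_review(81))
```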

Batch Computation

The engine processes all active grants in a single batch using Polars with lazy evaluation for memory efficiency:

sequenceDiagram
    participant Scheduler
    participant Engine as Decay Engine
    participant PG as PostgreSQL
    participant CH as ClickHouse
    participant Kafka

    Scheduler->>Engine: Trigger batch run
    Engine->>PG: Load active grants
    Engine->>PG: Load principal metadata
    Engine->>CH: Query activity aggregates
    Engine->>Engine: Compute scores (Polars lazy)
    Engine->>PG: Bulk upsert scores
    Engine->>Kafka: Publish verity.scores.updated
    Engine->>Scheduler: Batch complete

Performance characteristics:

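- Lazy evaluation lets Polars optimise the whole query plan before execution (for example, projection and predicate pushdown), which lowers peak memory. Memory use is not strictly constant; it still depends on grant count and on which operations can be streamed.
- Work is partitioned by principal for parallelism.
- Typical throughput: ~100k grants/minute on a 4-vCPU worker.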

Configuration

Variable	Required	Default	Description
`DECAY_DATABASE_URL`	Yes	—	PostgreSQL connection string
`DECAY_CLICKHOUSE_URL`	Yes	—	ClickHouse HTTP URL
`DECAY_KAFKA_BOOTSTRAP`	Yes	—	Kafka bootstrap servers
`DECAY_HALF_LIFE_DAYS`	No	`90`	Exponential decay half-life
`DECAY_BATCH_SCHEDULE`	No	`0 2 * * *`	Cron schedule (default: 2 AM daily)
`DECAY_PARALLELISM`	No	`4`	Number of parallel partitions
`DECAY_LOG_LEVEL`	No	`INFO`	Python log level
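One way to load these variables with their defaults is sketched below. The `DecayConfig` class and `load_config` function are hypothetical and not part of the service; the environment mapping is passed in explicitly so the loader is easy to test, and the URLs in the usage example are placeholders.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DECAY_DATABASE_URL", "DECAY_CLICKHOUSE_URL", "DECAY_KAFKA_BOOTSTRAP")


@dataclass(frozen=True)
class DecayConfig:
    database_url: str
    clickhouse_url: str
    kafka_bootstrap: str
    half_life_days: float = 90.0
    batch_schedule: str = "0 2 * * *"
    parallelism: int = 4
    log_level: str = "INFO"


def load_config(env: Optional[Mapping[str, str]] = None) -> DecayConfig:
    """Build a DecayConfig from environment variables, applying defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return DecayConfig(
        database_url=env["DECAY_DATABASE_URL"],
        clickhouse_url=env["DECAY_CLICKHOUSE_URL"],
        kafka_bootstrap=env["DECAY_KAFKA_BOOTSTRAP"],
        half_life_days=float(env.get("DECAY_HALF_LIFE_DAYS", "90")),
        batch_schedule=env.get("DECAY_BATCH_SCHEDULE", "0 2 * * *"),
        parallelism=int(env.get("DECAY_PARALLELISM", "4")),
        log_level=env.get("DECAY_LOG_LEVEL", "INFO"),
    )


# Placeholder values for illustration only.
example_env = {
    "DECAY_DATABASE_URL": "postgresql://user:pass@localhost:5432/verity",
    "DECAY_CLICKHOUSE_URL": "http://localhost:8123",
    "DECAY_KAFKA_BOOTSTRAP": "localhost:9092",
    "DECAY_PARALLELISM": "8",
}
print(load_config(example_env))
```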

Observability

Metric	Type	Description
`decay_batch_duration_seconds`	Histogram	Total batch computation time
`decay_grants_scored_total`	Counter	Cumulative number of grants scored (increases with each batch)
`decay_scores_by_risk_level`	Gauge	Current grant count per risk level
`decay_score_distribution`	Histogram	Score value distribution
`decay_kafka_publish_errors_total`	Counter	Failed Kafka publishes