Remediation Pipeline
Overview
Remediation is where decisions become actions. When a reviewer decides to revoke, reduce, or suspend access — or when an automated policy triggers — the Remediation service executes the action safely, auditably, and reversibly.
Verity treats remediation as a high-risk operation. Incorrectly revoking a production service account can cause an outage; failing to revoke a compromised credential can cause a breach. The pipeline is designed to balance both risks.
Principle: Safe by Default
Every remediation action passes through validation, dry-run (if enabled), blast-radius checks, and execution, in that order. No step can be skipped in production deployments; the only exception is an explicit administrator override of a blast-radius hold.
Remediation Actions
The Remediation service supports four action types:
| Action | Effect | Example |
|---|---|---|
| Revoke | Completely remove the access grant. | Delete an IAM role binding, remove a GitHub collaborator. |
| Reduce | Downgrade to a lower permission level. | Change write to read, demote admin to member. |
| Suspend | Temporarily disable without deletion. | Disable an Entra ID account, suspend an API key. |
| Notify | No access change — send an alert only. | Notify the security team about a high-risk grant for manual handling. |
```mermaid
graph LR
    D["Review Decision<br/>or Policy Trigger"]
    D --> R{Action Type}
    R -->|Revoke| REV["Remove Grant"]
    R -->|Reduce| RED["Downgrade Permission"]
    R -->|Suspend| SUS["Disable Account/Key"]
    R -->|Notify| NOT["Send Alert Only"]
    REV & RED & SUS --> CONN["Connector<br/>Write Operation"]
    CONN --> AUDIT["Audit Log<br/>(ClickHouse)"]
    NOT --> AUDIT
    style D fill:#536dfe,color:#fff,stroke:none
    style CONN fill:#7c4dff,color:#fff,stroke:none
    style AUDIT fill:#263238,color:#fff,stroke:none
    style REV fill:#f44336,color:#fff,stroke:none
    style RED fill:#ff9800,color:#000,stroke:none
    style SUS fill:#ffc107,color:#000,stroke:none
    style NOT fill:#4caf50,color:#fff,stroke:none
```
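The routing in the diagram can be sketched in a few lines of Python. This is an illustration only, not Verity's implementation: `ActionType` and `route` are hypothetical names, and only the four action types and their two destinations come from the text above.

```python
from enum import Enum


class ActionType(Enum):
    REVOKE = "revoke"
    REDUCE = "reduce"
    SUSPEND = "suspend"
    NOTIFY = "notify"


def route(action: ActionType) -> list[str]:
    """Return the (hypothetical) stages an action passes through, in order."""
    if action is ActionType.NOTIFY:
        # Notify changes no access, so it skips the connector write.
        return ["send_alert", "audit_log"]
    # Revoke, Reduce and Suspend all end in a connector write operation.
    return ["connector_write", "audit_log"]
```

Every path ends at the audit log, which matches the diagram: even an alert-only action leaves a record.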
Connector Framework
Verity uses a unified connector framework — the same connectors that ingest audit events and entitlements during the Ingest phase are used to execute remediation actions during the Remediate phase.
This bidirectional design ensures:
- Consistency: The connector understands the target system's data model for both read and write operations.
- Credential reuse: A single set of service credentials (with appropriate read + write scopes) is configured once.
- Atomicity: The connector can verify the current state immediately before modifying it, which narrows the window for race conditions.
| Connector | Supported Actions | Write API |
|---|---|---|
| Microsoft Entra ID | Revoke, Reduce, Suspend | Microsoft Graph API |
| AWS IAM | Revoke, Reduce | AWS IAM / STS API |
| GitHub | Revoke, Reduce | GitHub REST API |
| Okta | Revoke, Suspend | Okta API |
| Google Workspace | Revoke, Reduce, Suspend | Google Admin SDK |
| PostgreSQL (roles) | Revoke, Reduce | SQL REVOKE / ALTER ROLE |
| Generic SCIM | Revoke, Suspend | SCIM 2.0 PATCH / DELETE |
| Custom (webhook) | All | HTTP POST with action payload |
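One way to use the table above is as a capability lookup that runs before an action is dispatched. The sketch below is an assumption about how such a check might look: the dictionary keys are made-up identifiers rather than real Verity connector IDs, and "All" for the webhook connector is read as all four action types.

```python
# Assumption: "All" in the table means every action type, including notify.
ALL_ACTIONS = frozenset({"revoke", "reduce", "suspend", "notify"})

# Supported actions per connector, transcribed from the table above.
# Keys are hypothetical identifiers chosen for this example.
SUPPORTED_ACTIONS = {
    "entra_id": frozenset({"revoke", "reduce", "suspend"}),
    "aws_iam": frozenset({"revoke", "reduce"}),
    "github": frozenset({"revoke", "reduce"}),
    "okta": frozenset({"revoke", "suspend"}),
    "google_workspace": frozenset({"revoke", "reduce", "suspend"}),
    "postgresql": frozenset({"revoke", "reduce"}),
    "scim": frozenset({"revoke", "suspend"}),
    "webhook": ALL_ACTIONS,
}


def supports(connector: str, action: str) -> bool:
    """Return True if the named connector lists the action as supported."""
    return action in SUPPORTED_ACTIONS.get(connector, frozenset())
```

For example, `supports("okta", "reduce")` is false, so a Reduce decision on an Okta grant would need to be rejected or mapped to a different action before reaching the connector.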
Custom Connectors
If your target system is not covered by a built-in connector, the webhook connector can forward remediation actions to any HTTP endpoint. Your service receives a JSON payload and executes the action in your own code.
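A receiving service could handle the webhook payload roughly as follows. The payload schema is not specified on this page, so the `action` and `grant_id` fields are borrowed from the dry-run example further down and should be treated as assumptions; `handle_webhook`, `HANDLERS`, and the status codes are likewise hypothetical choices for this sketch.

```python
import json
from typing import Callable


def revoke(grant_id: str) -> str:
    # Placeholder: a real handler would call the target system here.
    return f"revoked {grant_id}"


def suspend(grant_id: str) -> str:
    return f"suspended {grant_id}"


# Actions this example service knows how to execute.
HANDLERS: dict[str, Callable[[str], str]] = {"revoke": revoke, "suspend": suspend}


def handle_webhook(body: bytes) -> tuple[int, dict]:
    """Return an (HTTP status, response body) pair for one remediation payload."""
    try:
        payload = json.loads(body)
        action = payload["action"]
        grant_id = payload["grant_id"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "malformed payload"}
    handler = HANDLERS.get(action) if isinstance(action, str) else None
    if handler is None:
        return 422, {"error": f"unsupported action: {action}"}
    return 200, {"result": handler(grant_id)}
```

Keeping the handler a pure function of the request body makes it easy to plug into any HTTP framework and to test without a running server.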
Safety Mechanisms
Remediation is guarded by four layers of safety:
```mermaid
graph TD
    ACTION["Remediation Action Requested"]
    ACTION --> V["① Validation<br/>Pre-flight checks"]
    V --> DR["② Dry-Run<br/>(if enabled)"]
    DR --> BR["③ Blast-Radius Check<br/>Limit enforcement"]
    BR --> EXEC["④ Execution<br/>Connector write"]
    EXEC --> AUDIT["Audit Log"]
    V -.->|"Fails validation"| REJECT["❌ Rejected"]
    DR -.->|"Dry-run mode"| LOG["📋 Logged, not executed"]
    BR -.->|"Limit exceeded"| HOLD["⏸ Held for admin approval"]
    style ACTION fill:#536dfe,color:#fff,stroke:none
    style V fill:#7c4dff,color:#fff,stroke:none
    style DR fill:#651fff,color:#fff,stroke:none
    style BR fill:#448aff,color:#fff,stroke:none
    style EXEC fill:#40c4ff,color:#000,stroke:none
    style AUDIT fill:#263238,color:#fff,stroke:none
    style REJECT fill:#f44336,color:#fff,stroke:none
    style LOG fill:#ff9800,color:#000,stroke:none
    style HOLD fill:#ffc107,color:#000,stroke:none
```
① Validation
Before any action, the Remediation service runs pre-flight checks:
- Grant still exists: Verify the access grant has not already been removed.
- No protection flag: Check that the grant is not marked as `protected` (break-glass accounts, critical service accounts).
- Connector health: Confirm the target system's connector is reachable and authenticated.
- Permission check: Verify the connector's service credentials have write scope.
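The four pre-flight checks can be sketched as a function that collects failures instead of stopping at the first one. The `PreflightInput` fields and the failure codes are hypothetical names for this example; in a real deployment each boolean would come from a live lookup against the grant store and the connector.

```python
from dataclasses import dataclass


@dataclass
class PreflightInput:
    grant_exists: bool       # the grant has not already been removed
    protected: bool          # break-glass or critical service account
    connector_healthy: bool  # connector reachable and authenticated
    has_write_scope: bool    # service credentials allow writes


def preflight(inp: PreflightInput) -> list[str]:
    """Return the failed checks; an empty list means the action may proceed."""
    failures = []
    if not inp.grant_exists:
        failures.append("grant_missing")
    if inp.protected:
        failures.append("grant_protected")
    if not inp.connector_healthy:
        failures.append("connector_unhealthy")
    if not inp.has_write_scope:
        failures.append("missing_write_scope")
    return failures
```

Returning every failure at once is a design choice made here so that an operator sees the full picture in one rejection message.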
② Dry-Run Mode
When dry-run is enabled (globally or per-connector), the Remediation service performs all validation and blast-radius checks but does not execute the action. Instead, it logs the action that would have been taken.
Use Dry-Run During Onboarding
When connecting a new system to Verity for the first time, enable dry-run mode for 2–4 weeks. This lets you validate that scoring and review routing work correctly before any automated remediation occurs.
```json
{
  "action": "revoke",
  "grant_id": "g_01HZ3V9K7QWXR5YJNBM2C8F6D",
  "dry_run": true,
  "result": "would_execute",
  "details": "Would remove role binding 'Storage Admin' for user@example.com on project prod-data"
}
```
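A minimal sketch of how the dry-run decision and the record above might be produced. It assumes that either the global flag or a per-connector flag is enough to enable dry-run; the precedence between the two is not stated on this page. The function names and the `connector_flags` mapping are hypothetical.

```python
def dry_run_enabled(global_flag: bool, connector_flags: dict[str, bool], connector_id: str) -> bool:
    """Assumption: dry-run applies if it is set globally OR for this connector."""
    return global_flag or connector_flags.get(connector_id, False)


def dry_run_record(action: str, grant_id: str, details: str) -> dict:
    """Build a record with the same fields as the JSON example above."""
    return {
        "action": action,
        "grant_id": grant_id,
        "dry_run": True,
        "result": "would_execute",
        "details": details,
    }
```

With this reading, a newly onboarded connector can stay in dry-run mode while established connectors execute normally.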
③ Blast-Radius Limits
Blast-radius limits prevent cascading failures from mass revocations:
| Limit | Default | Description |
|---|---|---|
| Per-connector rate limit | 50 actions / hour | Max actions per connector per hour |
| Per-identity limit | 5 actions / day | Max grants revoked for a single user per day |
| Per-resource limit | 10 actions / hour | Max grants revoked on a single resource per hour |
| Global circuit breaker | 200 actions / hour | Absolute max across all connectors |
When a limit is hit, pending actions are held and an admin notification is sent. An administrator must explicitly approve the held actions or raise the limit.
Circuit Breaker
The global circuit breaker exists to prevent a bug or misconfiguration from triggering an organisation-wide revocation storm. If it trips, all remediation pauses until an admin intervenes.
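The limits above could be enforced with a sliding-window counter per scope. This sketch uses the default values from the table; whether Verity uses sliding or fixed windows is not stated here, so the windowing strategy, the `BlastRadiusLimiter` class, and its method names are all assumptions.

```python
from collections import defaultdict, deque

HOUR, DAY = 3600, 86400

# (max actions, window in seconds), taken from the defaults in the table.
LIMITS = {
    "connector": (50, HOUR),
    "identity": (5, DAY),
    "resource": (10, HOUR),
    "global": (200, HOUR),
}


class BlastRadiusLimiter:
    def __init__(self, limits=LIMITS):
        self.limits = limits
        # (scope, key) -> timestamps of actions that were allowed
        self.events = defaultdict(deque)

    def check(self, connector, identity, resource, now):
        """Return the first exceeded scope, or None after recording the action."""
        keys = {"connector": connector, "identity": identity,
                "resource": resource, "global": "*"}
        for scope, key in keys.items():
            limit, window = self.limits[scope]
            stamps = self.events[(scope, key)]
            # Drop timestamps that have aged out of this scope's window.
            while stamps and stamps[0] <= now - window:
                stamps.popleft()
            if len(stamps) >= limit:
                return scope  # caller should hold the action for admin approval
        for scope, key in keys.items():
            self.events[(scope, key)].append(now)
        return None
```

An action is only counted once every scope has room for it, so a held action does not consume quota in the other scopes.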
④ Admin Override
Administrators can:
- Force-execute a held action (bypasses blast-radius limits).
- Cancel a pending action.
- Roll back a completed action (if the connector supports it).
- Pause all remediation globally (emergency stop).
Rollback
For connectors that support it, Verity can roll back a completed remediation action, restoring the access grant to its pre-remediation state.
The rollback mechanism works because every action logs a before-state snapshot:
```json
{
  "action_id": "act_01HZ4A2X9MBJR7YKND3F8G5HK",
  "action": "revoke",
  "status": "executed",
  "before_state": {
    "role": "Storage Admin",
    "scope": "projects/prod-data",
    "principal": "user@example.com",
    "granted_at": "2024-03-15T10:00:00Z"
  },
  "after_state": {
    "role": null,
    "scope": null,
    "principal": "user@example.com"
  },
  "rollback_available": true,
  "rollback_expires_at": "2025-02-14T14:32:00Z"
}
```
Rollback Window
Rollbacks are available for 30 days after execution (configurable). After the rollback window, the before-state is retained in the audit log for compliance but can no longer be auto-applied.
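Given an action record shaped like the JSON above, a rollback eligibility check might look like the following. The helper names are hypothetical, and the rule that only `executed` actions are eligible is an assumption drawn from the example record rather than a documented guarantee.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # Older Python versions of fromisoformat do not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def can_roll_back(record: dict, now: datetime) -> bool:
    """True if the action was executed, is flagged as reversible, and is still in the window."""
    return (
        record.get("status") == "executed"
        and bool(record.get("rollback_available", False))
        and now < parse_ts(record["rollback_expires_at"])
    )


example = {
    "status": "executed",
    "before_state": {"role": "Storage Admin", "scope": "projects/prod-data"},
    "rollback_available": True,
    "rollback_expires_at": "2025-02-14T14:32:00Z",
}
```

Once the check passes, the `before_state` snapshot holds the values a connector would need to re-create the grant.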
Audit Trail
Every remediation action — whether executed, dry-run, held, or rolled back — is logged immutably to ClickHouse with full before/after state:
| Field | Description |
|---|---|
| `action_id` | Unique identifier for the remediation action |
| `review_id` | Link to the originating review (if any) |
| `grant_id` | The access grant being remediated |
| `action_type` | `revoke`, `reduce`, `suspend`, `notify` |
| `status` | `pending`, `dry_run`, `executed`, `failed`, `rolled_back`, `held` |
| `before_state` | Full state of the grant before the action (JSON) |
| `after_state` | Full state of the grant after the action (JSON) |
| `connector_id` | Which connector executed the action |
| `initiated_by` | User or policy that triggered the action |
| `decided_by` | Reviewer who made the decision (if human-triggered) |
| `executed_at` | Timestamp of execution |
| `metadata` | Additional connector-specific data |
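For client code that reads or writes these records, the fields can be modelled as a small typed structure that rejects unknown `action_type` and `status` values. The field names and allowed values come from the table; the Python types, the choice of which fields are optional, and the class name are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

ACTION_TYPES = {"revoke", "reduce", "suspend", "notify"}
STATUSES = {"pending", "dry_run", "executed", "failed", "rolled_back", "held"}


@dataclass
class RemediationRecord:
    action_id: str
    grant_id: str
    action_type: str
    status: str
    connector_id: str
    initiated_by: str
    before_state: dict
    after_state: dict
    review_id: Optional[str] = None    # absent for policy-triggered actions
    decided_by: Optional[str] = None   # only set for human decisions
    executed_at: Optional[str] = None  # assumed empty until execution
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action_type: {self.action_type}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
```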
```sql
-- Example: Find all remediation actions for a specific user in the last 30 days
SELECT
    action_id,
    action_type,
    status,
    before_state,
    after_state,
    executed_at
FROM remediation_actions
WHERE grant_id IN (
    SELECT grant_id FROM access_grants
    WHERE identity_id = 'id_01HZ3V9K7QWXR5YJNBM2C8F6E'
)
AND executed_at > now() - INTERVAL 30 DAY
ORDER BY executed_at DESC;
```
Retention
| Data | Retention | Storage |
|---|---|---|
| Full action records | 7 years | ClickHouse |
| Before/after state snapshots | 7 years | ClickHouse |
| Rollback capability | 30 days (configurable) | ClickHouse + connector state |
End-to-End Flow
The complete remediation flow, from review decision to audit:
```mermaid
sequenceDiagram
    participant RV as Reviewer
    participant WF as Workflow Engine
    participant RM as Remediation Service
    participant VAL as Validator
    participant CON as Connector
    participant CK as ClickHouse
    participant KF as Kafka
    RV->>WF: Submit decision (revoke)
    WF->>RM: ExecuteRemediation(action)
    RM->>VAL: Pre-flight checks
    VAL-->>RM: ✓ Validated
    alt Dry-run enabled
        RM->>CK: Log dry-run result
        RM-->>WF: DryRunComplete
    else Production
        RM->>RM: Check blast-radius limits
        alt Within limits
            RM->>CON: Snapshot before-state
            CON-->>RM: Before-state captured
            RM->>CON: Execute action
            CON-->>RM: ✓ Action complete
            RM->>CK: Log action + before/after state
            RM->>KF: Publish remediation.event
            RM-->>WF: RemediationComplete
        else Limit exceeded
            RM->>CK: Log held action
            RM-->>WF: ActionHeld(awaiting_admin)
        end
    end
```
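The branches of the sequence diagram can be expressed as a single function with its collaborators injected. This is a sketch under stated assumptions, not the service's actual code: `execute_remediation`, the `Connector` protocol, and the callable parameters are hypothetical, the `log` and `publish` callables stand in for ClickHouse and Kafka, and the "Rejected" outcome comes from the safety-mechanisms diagram because the sequence diagram does not show the validation-failure path.

```python
from typing import Callable, Protocol


class Connector(Protocol):
    def snapshot(self, grant_id: str) -> dict: ...
    def execute(self, action: str, grant_id: str) -> dict: ...


def execute_remediation(action: str, grant_id: str, *, connector: Connector,
                        validate: Callable[[str, str], bool],
                        within_limits: Callable[[str, str], bool],
                        log: Callable[[dict], None],
                        publish: Callable[[dict], None],
                        dry_run: bool) -> str:
    base = {"action": action, "grant_id": grant_id}
    if not validate(action, grant_id):
        return "Rejected"  # path not shown in the sequence diagram
    if dry_run:
        log({**base, "status": "dry_run"})
        return "DryRunComplete"
    if not within_limits(action, grant_id):
        log({**base, "status": "held"})
        return "ActionHeld"
    before = connector.snapshot(grant_id)       # capture before-state first
    after = connector.execute(action, grant_id)
    record = {**base, "status": "executed",
              "before_state": before, "after_state": after}
    log(record)      # stands in for the ClickHouse write
    publish(record)  # stands in for the Kafka remediation.event
    return "RemediationComplete"


class InMemoryConnector:
    """Toy connector used only to exercise the sketch."""

    def __init__(self):
        self.grants = {"g_1": {"role": "Storage Admin"}}

    def snapshot(self, grant_id: str) -> dict:
        return dict(self.grants[grant_id])

    def execute(self, action: str, grant_id: str) -> dict:
        self.grants[grant_id] = {"role": None}
        return dict(self.grants[grant_id])


audit_log: list[dict] = []
events: list[dict] = []
outcome = execute_remediation(
    "revoke", "g_1", connector=InMemoryConnector(),
    validate=lambda a, g: True, within_limits=lambda a, g: True,
    log=audit_log.append, publish=events.append, dry_run=False,
)
```

Taking the snapshot before the write is what makes a later rollback possible, so the order of those two connector calls matters.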
Next Steps
- Access Decay Theory: Return to the fundamentals of access decay.
- Architecture Overview: See how the Remediation service fits into the overall platform architecture.