Remediation Pipeline

Overview

Remediation is where decisions become actions. When a reviewer decides to revoke, reduce, or suspend access — or when an automated policy triggers — the Remediation service executes the action safely, auditably, and reversibly.

Verity treats remediation as a high-risk operation. Incorrectly revoking a production service account can cause an outage; failing to revoke a compromised credential can cause a breach. The pipeline is designed to balance both risks.

Principle: Safe by Default

Every remediation action passes through validation, dry-run (if enabled), blast-radius checks, and execution — in that order. No step is skippable in production deployments.


Remediation Actions

The Remediation service supports four action types:

| Action | Effect | Example |
| --- | --- | --- |
| Revoke | Completely remove the access grant. | Delete an IAM role binding, remove a GitHub collaborator. |
| Reduce | Downgrade to a lower permission level. | Change write to read, demote admin to member. |
| Suspend | Temporarily disable without deletion. | Disable an Entra ID account, suspend an API key. |
| Notify | No access change; send an alert only. | Notify the security team about a high-risk grant for manual handling. |

graph LR
    D["Review Decision<br/>or Policy Trigger"]

    D --> R{Action Type}
    R -->|Revoke| REV["Remove Grant"]
    R -->|Reduce| RED["Downgrade Permission"]
    R -->|Suspend| SUS["Disable Account/Key"]
    R -->|Notify| NOT["Send Alert Only"]

    REV & RED & SUS --> CONN["Connector<br/>Write Operation"]
    CONN --> AUDIT["Audit Log<br/>(ClickHouse)"]
    NOT --> AUDIT

    style D fill:#536dfe,color:#fff,stroke:none
    style CONN fill:#7c4dff,color:#fff,stroke:none
    style AUDIT fill:#263238,color:#fff,stroke:none
    style REV fill:#f44336,color:#fff,stroke:none
    style RED fill:#ff9800,color:#000,stroke:none
    style SUS fill:#ffc107,color:#000,stroke:none
    style NOT fill:#4caf50,color:#fff,stroke:none

Connector Framework

Verity uses a unified connector framework — the same connectors that ingest audit events and entitlements during the Ingest phase are used to execute remediation actions during the Remediate phase.

This bidirectional design ensures:

  • Consistency: The connector understands the target system's data model for both read and write operations.
  • Credential reuse: A single set of service credentials (with appropriate read + write scopes) is configured once.
  • Atomicity: The connector can verify the current state before modifying it, preventing race conditions.

| Connector | Supported Actions | Write API |
| --- | --- | --- |
| Microsoft Entra ID | Revoke, Reduce, Suspend | Microsoft Graph API |
| AWS IAM | Revoke, Reduce | AWS IAM / STS API |
| GitHub | Revoke, Reduce | GitHub REST API |
| Okta | Revoke, Suspend | Okta API |
| Google Workspace | Revoke, Reduce, Suspend | Google Admin SDK |
| PostgreSQL (roles) | Revoke, Reduce | SQL REVOKE / ALTER ROLE |
| Generic SCIM | Revoke, Suspend | SCIM 2.0 PATCH / DELETE |
| Custom (webhook) | All | HTTP POST with action payload |
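
The support matrix above can be expressed as a lookup that a caller consults before routing an action to a connector. This is a minimal sketch, not Verity's implementation: the connector keys (`entra_id`, `aws_iam`, and so on), the `Action` enum, and the `supports` helper are hypothetical names, and treating Notify as always allowed is an assumption based on it making no access change.

```python
from enum import Enum


class Action(Enum):
    REVOKE = "revoke"
    REDUCE = "reduce"
    SUSPEND = "suspend"
    NOTIFY = "notify"


# Write actions per connector, copied from the table above.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
SUPPORTED_ACTIONS = {
    "entra_id": {Action.REVOKE, Action.REDUCE, Action.SUSPEND},
    "aws_iam": {Action.REVOKE, Action.REDUCE},
    "github": {Action.REVOKE, Action.REDUCE},
    "okta": {Action.REVOKE, Action.SUSPEND},
    "google_workspace": {Action.REVOKE, Action.REDUCE, Action.SUSPEND},
    "postgresql": {Action.REVOKE, Action.REDUCE},
    "scim": {Action.REVOKE, Action.SUSPEND},
    "webhook": set(Action),  # the table lists "All" for the custom connector
}


def supports(connector: str, action: Action) -> bool:
    """Return True if the connector can carry out the action.

    Assumption: Notify changes no access, so it needs no connector write
    and is allowed for every connector, including unknown ones.
    """
    if action is Action.NOTIFY:
        return True
    return action in SUPPORTED_ACTIONS.get(connector, set())
```

For example, `supports("okta", Action.REDUCE)` is false because the table lists only Revoke and Suspend for Okta, so a Reduce decision on an Okta grant would need a different handling path.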

Custom Connectors

If your target system is not covered by a built-in connector, the webhook connector can forward remediation actions to any HTTP endpoint. Your service receives a JSON payload and executes the action in your own code.
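
A receiving service for the webhook connector might look like the sketch below. The payload schema is not specified on this page, so the `action` and `grant_id` fields are assumed to mirror the dry-run example later in this document, and the response body, status codes, and port are illustrative choices rather than a documented contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

VALID_ACTIONS = {"revoke", "reduce", "suspend", "notify"}


def handle_payload(raw: bytes) -> tuple[int, dict]:
    """Parse a remediation payload and return (HTTP status, response body)."""
    try:
        payload = json.loads(raw)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}
    action = payload.get("action")
    grant_id = payload.get("grant_id")
    if not isinstance(action, str) or action not in VALID_ACTIONS or not grant_id:
        return 422, {"error": "missing or unknown action / grant_id"}
    # Your own remediation logic for the target system would run here.
    return 200, {"grant_id": grant_id, "action": action, "status": "accepted"}


class RemediationHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_payload."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Start the receiver on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RemediationHandler).serve_forever()
```

Keeping the parsing in a plain function makes the handler easy to unit-test without opening a socket. A production receiver would also need to authenticate the caller, which this sketch omits.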


Safety Mechanisms

Remediation is guarded by four layers of safety:

graph TD
    ACTION["Remediation Action Requested"]

    ACTION --> V["① Validation<br/>Pre-flight checks"]
    V --> DR["② Dry-Run<br/>(if enabled)"]
    DR --> BR["③ Blast-Radius Check<br/>Limit enforcement"]
    BR --> EXEC["④ Execution<br/>Connector write"]
    EXEC --> AUDIT["Audit Log"]

    V -.->|"Fails validation"| REJECT["❌ Rejected"]
    DR -.->|"Dry-run mode"| LOG["📋 Logged, not executed"]
    BR -.->|"Limit exceeded"| HOLD["⏸ Held for admin approval"]

    style ACTION fill:#536dfe,color:#fff,stroke:none
    style V fill:#7c4dff,color:#fff,stroke:none
    style DR fill:#651fff,color:#fff,stroke:none
    style BR fill:#448aff,color:#fff,stroke:none
    style EXEC fill:#40c4ff,color:#000,stroke:none
    style AUDIT fill:#263238,color:#fff,stroke:none
    style REJECT fill:#f44336,color:#fff,stroke:none
    style LOG fill:#ff9800,color:#000,stroke:none
    style HOLD fill:#ffc107,color:#000,stroke:none

① Validation

Before any action, the Remediation service runs pre-flight checks:

  • Grant still exists: Verify the access grant has not already been removed.
  • No protection flag: Check that the grant is not marked as protected (break-glass accounts, critical service accounts).
  • Connector health: Confirm the target system's connector is reachable and authenticated.
  • Permission check: Verify the connector's service credentials have write scope.
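
The four pre-flight checks above can be sketched as a single function that collects every failure rather than stopping at the first one, so the rejection reason is complete. The `PreflightContext` fields and the failure codes are hypothetical names for illustration; in practice each boolean would come from a connector or grant-store lookup.

```python
from dataclasses import dataclass


@dataclass
class PreflightContext:
    """Inputs to the checks; every field name here is hypothetical."""
    grant_exists: bool
    grant_protected: bool
    connector_healthy: bool
    connector_can_write: bool


def preflight(ctx: PreflightContext) -> list[str]:
    """Return the failed checks; an empty list means the action may proceed."""
    failures = []
    if not ctx.grant_exists:
        failures.append("grant_missing")
    if ctx.grant_protected:
        failures.append("grant_protected")
    if not ctx.connector_healthy:
        failures.append("connector_unreachable")
    if not ctx.connector_can_write:
        failures.append("missing_write_scope")
    return failures
```

A break-glass account with an otherwise healthy connector would yield `["grant_protected"]`, and the action would be rejected before any write is attempted.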

② Dry-Run Mode

When dry-run is enabled (globally or per-connector), the Remediation service performs all validation and blast-radius checks but does not execute the action. Instead, it logs the action that would have been taken.

Use Dry-Run During Onboarding

When connecting a new system to Verity for the first time, enable dry-run mode for 2–4 weeks. This lets you validate that scoring and review routing work correctly before any automated remediation occurs.

{
  "action": "revoke",
  "grant_id": "g_01HZ3V9K7QWXR5YJNBM2C8F6D",
  "dry_run": true,
  "result": "would_execute",
  "details": "Would remove role binding 'Storage Admin' for user@example.com on project prod-data"
}
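
The branch that produces a record like the one above can be sketched as follows. This is an illustration under stated assumptions: `run_action`, `write`, and `log` are hypothetical names, and the `"executed"` result value for the non-dry-run path is assumed, since only `would_execute` appears in the example.

```python
from typing import Callable


def run_action(
    action: dict,
    dry_run: bool,
    write: Callable[[dict], None],
    log: Callable[[dict], None],
) -> dict:
    """Execute the action, or only log it when dry-run is enabled."""
    if dry_run:
        # Dry-run: record what would have happened and skip the connector write.
        record = {**action, "dry_run": True, "result": "would_execute"}
        log(record)
        return record
    write(action)
    record = {**action, "dry_run": False, "result": "executed"}
    log(record)
    return record
```

Passing the write and log operations in as callables keeps the dry-run decision in one place and makes it straightforward to confirm in a test that no write happens while dry-run is on.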

③ Blast-Radius Limits

Blast-radius limits prevent cascading failures from mass revocations:

| Limit | Default | Description |
| --- | --- | --- |
| Per-connector rate limit | 50 actions / hour | Max actions per connector per hour |
| Per-identity limit | 5 actions / day | Max grants revoked for a single user per day |
| Per-resource limit | 10 actions / hour | Max grants revoked on a single resource per hour |
| Global circuit breaker | 200 actions / hour | Absolute max across all connectors |

When a limit is hit, pending actions are held and an admin notification is sent. An administrator must explicitly approve the held actions or raise the limit.
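
One way to enforce limits like these is a sliding-window counter per scope. The sketch below uses the default values from the table; the class and method names are hypothetical, the sliding window is an assumed technique (the page does not say how windows are measured), and for simplicity it counts every action type against every limit rather than only revocations.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional

HOUR = 3600.0
DAY = 24 * HOUR


@dataclass(frozen=True)
class Limit:
    max_actions: int
    window_seconds: float


# Defaults from the table above.
DEFAULT_LIMITS = {
    "connector": Limit(50, HOUR),
    "identity": Limit(5, DAY),
    "resource": Limit(10, HOUR),
    "global": Limit(200, HOUR),
}


class BlastRadiusLimiter:
    def __init__(self, limits: Optional[dict] = None) -> None:
        self.limits = dict(limits or DEFAULT_LIMITS)
        self._events = defaultdict(deque)  # (scope, key) -> action timestamps

    def _count(self, bucket: tuple, window: float, now: float) -> int:
        """Drop timestamps older than the window and count the rest."""
        events = self._events[bucket]
        while events and events[0] <= now - window:
            events.popleft()
        return len(events)

    def check(self, connector: str, identity: str, resource: str, now: float) -> Optional[str]:
        """Return the first exceeded scope, or record the action and return None."""
        buckets = {
            "connector": ("connector", connector),
            "identity": ("identity", identity),
            "resource": ("resource", resource),
            "global": ("global", "*"),
        }
        for scope, bucket in buckets.items():
            limit = self.limits[scope]
            if self._count(bucket, limit.window_seconds, now) >= limit.max_actions:
                return scope  # caller holds the action for admin approval
        for bucket in buckets.values():
            self._events[bucket].append(now)
        return None
```

An action is recorded only when it passes all four checks, so held actions do not consume quota. Timestamps are passed in explicitly (seconds as floats) to keep the limiter deterministic in tests.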

Circuit Breaker

The global circuit breaker exists to prevent a bug or misconfiguration from triggering an organisation-wide revocation storm. If it trips, all remediation pauses until an admin intervenes.

④ Admin Override

Administrators can:

  • Force-execute a held action (bypasses blast-radius limits).
  • Cancel a pending action.
  • Roll back a completed action (if the connector supports it).
  • Pause all remediation globally (emergency stop).

Rollback

For connectors that support it, Verity can roll back a completed remediation action, restoring the access grant to its pre-remediation state.

The rollback mechanism works because every action logs a before-state snapshot:

{
  "action_id": "act_01HZ4A2X9MBJR7YKND3F8G5HK",
  "action": "revoke",
  "status": "executed",
  "before_state": {
    "role": "Storage Admin",
    "scope": "projects/prod-data",
    "principal": "user@example.com",
    "granted_at": "2024-03-15T10:00:00Z"
  },
  "after_state": {
    "role": null,
    "scope": null,
    "principal": "user@example.com"
  },
  "rollback_available": true,
  "rollback_expires_at": "2025-02-14T14:32:00Z"
}

Rollback Window

Rollbacks are available for 30 days after execution (configurable). After the rollback window, the before-state is retained in the audit log for compliance but can no longer be auto-applied.
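
A rollback eligibility check against a record shaped like the example above might look like this. It is a sketch under assumptions: the helper names are hypothetical, and the shape of the restore request (`action_id` plus a `restore` copy of `before_state`) is invented for illustration, since the page does not describe the connector's rollback call.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def can_roll_back(record: dict, now: datetime) -> bool:
    """True if the action was executed and is still inside its rollback window."""
    return (
        record.get("status") == "executed"
        and record.get("rollback_available") is True
        and now < parse_ts(record["rollback_expires_at"])
    )


def rollback_request(record: dict, now: datetime) -> dict:
    """Build a restore request from the before-state snapshot."""
    if not can_roll_back(record, now):
        raise ValueError("rollback is not available for this action")
    return {"action_id": record["action_id"], "restore": dict(record["before_state"])}
```

With the example record, a call at `datetime(2025, 2, 1, tzinfo=timezone.utc)` falls before `rollback_expires_at` and yields a restore request; a call after that timestamp raises, matching the rule that expired snapshots are kept for compliance but not re-applied.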


Audit Trail

Every remediation action — whether executed, dry-run, held, or rolled back — is logged immutably to ClickHouse with full before/after state:

| Field | Description |
| --- | --- |
| action_id | Unique identifier for the remediation action |
| review_id | Link to the originating review (if any) |
| grant_id | The access grant being remediated |
| action_type | revoke, reduce, suspend, notify |
| status | pending, dry_run, executed, failed, rolled_back, held |
| before_state | Full state of the grant before the action (JSON) |
| after_state | Full state of the grant after the action (JSON) |
| connector_id | Which connector executed the action |
| initiated_by | User or policy that triggered the action |
| decided_by | Reviewer who made the decision (if human-triggered) |
| executed_at | Timestamp of execution |
| metadata | Additional connector-specific data |

-- Example: Find all remediation actions for a specific user in the last 30 days
SELECT
    action_id,
    action_type,
    status,
    before_state,
    after_state,
    executed_at
FROM remediation_actions
WHERE grant_id IN (
    SELECT grant_id FROM access_grants
    WHERE identity_id = 'id_01HZ3V9K7QWXR5YJNBM2C8F6E'
)
AND executed_at > now() - INTERVAL 30 DAY
ORDER BY executed_at DESC;

Retention

| Data | Retention | Storage |
| --- | --- | --- |
| Full action records | 7 years | ClickHouse |
| Before/after state snapshots | 7 years | ClickHouse |
| Rollback capability | 30 days (configurable) | ClickHouse + connector state |

End-to-End Flow

The complete remediation flow, from review decision to audit:

sequenceDiagram
    participant RV as Reviewer
    participant WF as Workflow Engine
    participant RM as Remediation Service
    participant VAL as Validator
    participant CON as Connector
    participant CK as ClickHouse
    participant KF as Kafka

    RV->>WF: Submit decision (revoke)
    WF->>RM: ExecuteRemediation(action)
    RM->>VAL: Pre-flight checks
    VAL-->>RM: ✓ Validated

    alt Dry-run enabled
        RM->>CK: Log dry-run result
        RM-->>WF: DryRunComplete
    else Production
        RM->>RM: Check blast-radius limits
        alt Within limits
            RM->>CON: Snapshot before-state
            CON-->>RM: Before-state captured
            RM->>CON: Execute action
            CON-->>RM: ✓ Action complete
            RM->>CK: Log action + before/after state
            RM->>KF: Publish remediation.event
            RM-->>WF: RemediationComplete
        else Limit exceeded
            RM->>CK: Log held action
            RM-->>WF: ActionHeld(awaiting_admin)
        end
    end
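
The control flow in the sequence diagram can be sketched as one function. This follows the diagram's ordering, where a dry-run returns before the blast-radius check. Everything named here is an assumption for illustration: the `Connector` protocol and its `snapshot` and `execute` methods, the callable parameters, and the returned outcome strings (taken from the diagram's message labels). The page does not say whether rejected actions are logged, so this sketch returns without logging them.

```python
from typing import Callable, Protocol


class Connector(Protocol):
    """Hypothetical write interface; method names are assumptions."""

    def snapshot(self, grant_id: str) -> dict: ...
    def execute(self, action: str, grant_id: str) -> dict: ...


def execute_remediation(
    action: str,
    grant_id: str,
    *,
    connector: Connector,
    validate: Callable[[str], bool],
    within_limits: Callable[[str], bool],
    audit_log: Callable[[dict], None],
    publish: Callable[[dict], None],
    dry_run: bool = False,
) -> str:
    base = {"action_type": action, "grant_id": grant_id}
    if not validate(grant_id):  # pre-flight checks
        return "Rejected"
    if dry_run:  # log only, no connector write
        audit_log({**base, "status": "dry_run"})
        return "DryRunComplete"
    if not within_limits(grant_id):  # blast-radius check
        audit_log({**base, "status": "held"})
        return "ActionHeld"
    before = connector.snapshot(grant_id)  # capture state for rollback
    after = connector.execute(action, grant_id)
    record = {**base, "status": "executed", "before_state": before, "after_state": after}
    audit_log(record)  # ClickHouse in the diagram
    publish(record)  # Kafka remediation.event in the diagram
    return "RemediationComplete"
```

Taking the snapshot immediately before the write keeps the before-state as close as possible to what the connector actually changed, which is what makes the later rollback meaningful.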

Next Steps

  • Access Decay Theory: Return to the fundamentals of access decay.
  • Architecture Overview: See how the Remediation service fits into the overall platform architecture.