Cloud Data Platform Security

The Challenge

Modern data teams move fast — spinning up Databricks notebooks, Synapse SQL pools, and Fabric lakehouses to meet business deadlines. Access is granted liberally to unblock work, but almost never revoked once the project ends.

Pain Points

  • Broad initial grants — "Just give them admin on the workspace" is the fastest path to unblocking a data engineer.
  • No usage visibility — Platform admins cannot see who actually uses their access vs. who simply has it.
  • Cross-platform blind spots — A single analyst may have access in Databricks, Synapse, and Fabric — but no single tool shows the full picture.
  • Sensitive data exposure — Workspaces often contain PII, financial data, or health records with no automated access-decay detection.
  • Manual reviews don't scale — Data-platform teams lack tooling to run structured access-review campaigns.

How Verity Helps

Verity connects to each data platform, correlates usage telemetry with permission grants, and scores every grant on a continuous 0–100 decay scale.

```mermaid
flowchart LR
    subgraph Data Platforms
        DB["Databricks"]
        SY["Azure Synapse"]
        FB["Microsoft Fabric"]
    end

    subgraph Verity
        I["Ingest Plane"]
        N["Normalise Plane"]
        S["Score Plane"]
        R["Review Plane"]
        X["Remediation Plane"]
    end

    DB -->|REST API| I
    SY -->|REST API| I
    FB -->|REST API| I
    I --> N --> S --> R --> X
    X -->|Revoke / Downgrade| DB
    X -->|Revoke / Downgrade| SY
    X -->|Revoke / Downgrade| FB

    style I fill:#7c4dff,color:#fff,stroke:none
    style N fill:#651fff,color:#fff,stroke:none
    style S fill:#536dfe,color:#fff,stroke:none
    style R fill:#448aff,color:#fff,stroke:none
    style X fill:#40c4ff,color:#000,stroke:none
```
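This page does not publish the decay formula itself. Purely as an illustration of the idea, the sketch below assumes a hypothetical score that rises with days since last observed use and with asset sensitivity (1–5), then maps the result onto the four bands used in the Week 1 scoring results (Low 0–29, Medium 30–59, High 60–79, Critical 80–100). The 75/25 weighting, the 180-day saturation point, and the names `Grant`, `decay_score`, and `band` are assumptions, not Verity's implementation.

```python
from dataclasses import dataclass

# Band floors taken from the Week 1 score distribution (Low 0-29, Medium 30-59,
# High 60-79, Critical 80-100), checked from the highest floor down.
BANDS = [(80, "Critical"), (60, "High"), (30, "Medium"), (0, "Low")]


@dataclass
class Grant:
    principal: str
    days_since_last_use: int  # derived from usage telemetry (queries, notebooks, pipelines)
    sensitivity: int          # asset sensitivity tag, 1 (low) to 5 (high)


def decay_score(grant: Grant, saturation_days: int = 180) -> int:
    """Hypothetical 0-100 decay score; an illustration, NOT Verity's formula."""
    # Inactivity component: 0.0 if used today, 1.0 at or beyond saturation_days.
    inactivity = min(grant.days_since_last_use, saturation_days) / saturation_days
    # Sensitivity component: 0.0 for level 1 up to 1.0 for level 5.
    sensitivity = (grant.sensitivity - 1) / 4
    # Assumed weighting: inactivity dominates, sensitivity amplifies.
    return round(100 * (0.75 * inactivity + 0.25 * sensitivity))


def band(score: int) -> str:
    """Map a 0-100 score onto the Low/Medium/High/Critical bands."""
    for floor, name in BANDS:
        if score >= floor:
            return name
    raise ValueError(f"score out of range: {score}")


for g in [Grant("alice", 200, 5), Grant("bob", 90, 3), Grant("carol", 0, 1)]:
    s = decay_score(g)
    print(g.principal, s, band(s))
# alice 100 Critical
# bob 50 Medium
# carol 0 Low
```

Keeping scoring and banding as separate functions means the band thresholds can be tuned without touching the scoring logic.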

Key Capabilities

| Capability | Description |
|---|---|
| Usage-Based Scoring | Actual query execution, notebook runs, and pipeline activity feed into the decay formula. |
| Cross-Platform Identity | A single principal view, even when a user has different identifiers across Databricks, Synapse, and Fabric. |
| Asset Sensitivity Tagging | Classify workspaces, schemas, and tables by sensitivity level (1–5). |
| Data-Owner Routing | Review packets go to the workspace or schema owner, not a central IT team. |
| Least-Privilege Recommendations | Verity suggests downgrade paths (e.g., Contributor → Reader) based on observed behaviour. |
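Cross-platform identity amounts to collapsing platform-specific identifiers into one principal. The page does not say how Verity performs that match. The sketch below assumes each platform record also carries a shared directory key (for example, a Microsoft Entra ID object ID) and simply groups grants by that key. `PlatformGrant`, `unify_principals`, and all sample identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformGrant:
    platform: str         # "databricks", "synapse", or "fabric"
    local_id: str         # identifier as the platform itself reports it
    directory_id: str     # ASSUMED shared key, e.g. an Entra ID object ID
    permission: str


def unify_principals(grants: list[PlatformGrant]) -> dict[str, list[PlatformGrant]]:
    """Group grants from every platform under one principal key."""
    principals: dict[str, list[PlatformGrant]] = defaultdict(list)
    for g in grants:
        principals[g.directory_id].append(g)
    return dict(principals)


# Hypothetical sample: one analyst known by three different local identifiers.
grants = [
    PlatformGrant("databricks", "jdoe@northwind.com", "oid-42", "CAN_MANAGE"),
    PlatformGrant("synapse", "John Doe", "oid-42", "Synapse Contributor"),
    PlatformGrant("fabric", "jdoe", "oid-42", "Contributor"),
    PlatformGrant("fabric", "asmith", "oid-77", "Viewer"),
]

for key, items in unify_principals(grants).items():
    print(key, sorted({g.platform for g in items}))
# oid-42 ['databricks', 'fabric', 'synapse']
# oid-77 ['fabric']
```

In practice the hard part is choosing a reliable join key; when a platform exposes only display names or local usernames, an exact-key grouping like this would need a fallback matching step.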

Example Scenario

Northwind Analytics — 800-Person Data Organisation

Northwind connects Databricks, Azure Synapse, and Microsoft Fabric to Verity.

Discovery

| Platform | Workspaces | Grants | Avg. Grant Age |
|---|---:|---:|---:|
| Databricks | 120 | 4,200 | 14 months |
| Azure Synapse | 45 | 2,800 | 18 months |
| Microsoft Fabric | 30 | 1,600 | 6 months |
| Total | 195 | 8,600 | 13 months |

Scoring Results (Week 1)

```mermaid
pie title Decay Score Distribution — 8,600 Grants
    "Low (0–29)" : 3400
    "Medium (30–59)" : 2900
    "High (60–79)" : 1500
    "Critical (80–100)" : 800
```

Key Finding

27% of all data-platform grants scored High or Critical; most belonged to users who had not queried the platform in 6+ months.
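The 27% figure follows directly from the Week 1 score distribution: High and Critical together account for 1,500 + 800 = 2,300 of the 8,600 grants, which is also the number of review packets generated. A quick check:

```python
# Week 1 band counts from the score distribution.
bands = {"Low": 3400, "Medium": 2900, "High": 1500, "Critical": 800}

total = sum(bands.values())                   # all discovered grants
flagged = bands["High"] + bands["Critical"]   # grants that receive review packets
share = flagged / total

print(f"{flagged:,} of {total:,} grants = {share:.1%}")
# 2,300 of 8,600 grants = 26.7%
```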

Review & Remediation

Verity generates 2,300 review packets (High + Critical), routed to 85 workspace owners.

Within 14 days:

  • 780 Critical grants auto-remediated (revoked or downgraded).
  • 1,220 High grants decided by data owners.
  • 300 remaining escalated per SLA.
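The 14-day outcomes account for every packet. Assuming, as the list states, that auto-remediation applied only to Critical grants and owner decisions only to High grants, the 300 escalations split by band as follows:

```python
# Band sizes from the Week 1 scoring results.
critical, high = 800, 1500

auto_remediated = 780   # Critical grants revoked or downgraded automatically
owner_decided = 1220    # High grants decided by data owners
escalated = 300         # remaining grants escalated per SLA

packets = critical + high
assert auto_remediated + owner_decided + escalated == packets  # every packet accounted for

# Whatever was not closed within each band is what got escalated.
critical_escalated = critical - auto_remediated
high_escalated = high - owner_decided
assert critical_escalated + high_escalated == escalated

print(f"{packets:,} packets: {critical_escalated} Critical + {high_escalated} High escalated")
# 2,300 packets: 20 Critical + 280 High escalated
```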

Before & After

| Metric | Before Verity | After Verity |
|---|---|---|
| Visibility into actual usage | None | Full query-level telemetry |
| Grants reviewed per quarter | 0 (no process) | 2,300 (risk-prioritised) |
| Over-provisioned grants | Unknown | 27% scored High or Critical in Week 1 |
| Cross-platform identity view | Siloed per platform | Unified principal model |
| Time to detect orphaned access | Never | ≤ 6 hours |
| Mean time to revoke (MTTR) | ∞ (never revoked) | 3.1 days |

Connector Deep Dive

Databricks Connector

```yaml
# config/connectors/databricks.yaml
connector:
  type: databricks
  workspace_url: https://adb-1234567890.azuredatabricks.net
  auth:
    method: service_principal
    client_id: ${DATABRICKS_CLIENT_ID}
    client_secret: ${DATABRICKS_CLIENT_SECRET}
  sync:
    schedule: "0 */6 * * *"       # Every 6 hours
    resources:
      - workspace_permissions
      - cluster_permissions
      - sql_warehouse_permissions
      - unity_catalog_grants
```
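The connector files reference credentials as `${DATABRICKS_CLIENT_ID}`-style placeholders rather than literal values. This page does not specify how Verity resolves them. A common approach, sketched here under that assumption, is to substitute each placeholder from the process environment and fail fast when a variable is unset, so secrets never live in the YAML itself. `expand_placeholders` and the demo values are hypothetical.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME}, where NAME is an environment-variable-style identifier.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${NAME} with env[NAME]; raise KeyError if any NAME is unset."""
    env = os.environ if env is None else env
    missing = sorted({m.group(1) for m in PLACEHOLDER.finditer(text)} - set(env))
    if missing:
        raise KeyError(f"unset environment variables: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: env[m.group(1)], text)


snippet = """\
auth:
  method: service_principal
  client_id: ${DATABRICKS_CLIENT_ID}
  client_secret: ${DATABRICKS_CLIENT_SECRET}
"""

# Hypothetical values for the demo; real deployments would read os.environ.
demo_env = {
    "DATABRICKS_CLIENT_ID": "00000000-0000-0000-0000-000000000000",
    "DATABRICKS_CLIENT_SECRET": "example-secret",
}
print(expand_placeholders(snippet, demo_env))
```

Collecting all missing names before substituting gives one error listing every unset variable, rather than failing on the first.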

Synapse Connector

```yaml
# config/connectors/synapse.yaml
connector:
  type: synapse
  workspace_name: northwind-analytics
  auth:
    method: managed_identity
  sync:
    schedule: "0 */6 * * *"
    resources:
      - sql_pool_permissions
      - spark_pool_permissions
      - pipeline_permissions
      - workspace_roles
```

Fabric Connector

```yaml
# config/connectors/fabric.yaml
connector:
  type: fabric
  tenant_id: ${AZURE_TENANT_ID}
  auth:
    method: service_principal
    client_id: ${FABRIC_CLIENT_ID}
    client_secret: ${FABRIC_CLIENT_SECRET}
  sync:
    schedule: "0 */6 * * *"
    resources:
      - workspace_permissions
      - lakehouse_permissions
      - warehouse_permissions
```
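All three connectors share the cron expression `0 */6 * * *`: minute 0 of every hour divisible by 6, i.e. 00:00, 06:00, 12:00, and 18:00. A grant change made just after one sync is therefore picked up no later than the next one, which is consistent with the "≤ 6 hours" detection time in the Before & After table. The sketch below computes the next firing time for this one fixed pattern only; it is not a general cron parser, `next_sync` is a hypothetical helper, and because the page does not state which timezone the schedule uses, it works with naive datetimes.

```python
from datetime import datetime, timedelta

SYNC_EVERY_HOURS = 6  # the "*/6" hour field of "0 */6 * * *"


def next_sync(now: datetime) -> datetime:
    """Next firing of "0 */6 * * *" strictly after `now` (naive datetimes)."""
    # Truncate to the top of the hour, then back up to the last 6-hour boundary.
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    candidate = top_of_hour - timedelta(hours=top_of_hour.hour % SYNC_EVERY_HOURS)
    # If that boundary is not in the future, the next one is 6 hours later.
    if candidate <= now:
        candidate += timedelta(hours=SYNC_EVERY_HOURS)
    return candidate


print(next_sync(datetime(2024, 5, 1, 7, 30)))   # 2024-05-01 12:00:00
print(next_sync(datetime(2024, 5, 1, 23, 10)))  # 2024-05-02 00:00:00
```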

Deployment Topology

For cloud data-platform use cases, Verity is typically deployed in the same Azure region as the data platforms to minimise API latency.

```mermaid
graph TB
    subgraph "Azure Region — East US 2"
        subgraph "Data Platforms"
            DB["Databricks<br/>Workspace"]
            SY["Synapse<br/>Workspace"]
            FB["Fabric<br/>Workspace"]
        end

        subgraph "AKS Cluster"
            V["Verity<br/>(19 microservices)"]
            PG["PostgreSQL +<br/>TimescaleDB"]
            CH["ClickHouse"]
        end
    end

    DB <--> V
    SY <--> V
    FB <--> V
```

Getting Started

Recommended First Steps

  1. Deploy Verity with the Quick Start guide.
  2. Connect Databricks first — it typically has the most grants.
  3. Tag asset sensitivity on workspaces containing PII or financial data.
  4. Run scoring for 1 week to establish baselines.
  5. Enable data-owner reviews and set SLAs per risk level.
  6. Add Synapse and Fabric connectors to complete cross-platform coverage.
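Step 5 pairs data-owner reviews with per-risk-level SLAs, and the Northwind scenario escalates packets that miss them. The page gives no SLA values or configuration format, so the sketch below uses placeholder durations (not Verity defaults) and a hypothetical `ReviewPacket` record to show the check itself: an undecided packet is flagged for escalation once its band's SLA has elapsed.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Placeholder SLAs per risk band, in days to decide; NOT Verity defaults.
SLA_DAYS = {"Critical": 3, "High": 7}


@dataclass
class ReviewPacket:
    grant_id: str
    band: str       # "Critical" or "High", the bands that receive packets
    owner: str      # workspace or schema owner the packet was routed to
    opened: date
    decided: bool = False


def overdue(packets: list[ReviewPacket], today: date) -> list[ReviewPacket]:
    """Undecided packets whose band SLA has elapsed: candidates for escalation."""
    return [
        p for p in packets
        if not p.decided and today > p.opened + timedelta(days=SLA_DAYS[p.band])
    ]


# Hypothetical packets for illustration.
packets = [
    ReviewPacket("g-1", "Critical", "sales-ws-owner", date(2024, 5, 1)),
    ReviewPacket("g-2", "High", "finance-schema-owner", date(2024, 5, 1)),
    ReviewPacket("g-3", "High", "finance-schema-owner", date(2024, 5, 1), decided=True),
]

for p in overdue(packets, today=date(2024, 5, 6)):
    print(f"escalate {p.grant_id} ({p.band}) from {p.owner}")
# escalate g-1 (Critical) from sales-ws-owner
```

Shorter SLAs for higher bands keep the riskiest grants from lingering while owners work through the larger High queue.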