# Cloud Data Platform Security

## The Challenge
Modern data teams move fast — spinning up Databricks notebooks, Synapse SQL pools, and Fabric lakehouses to meet business deadlines. Access is granted liberally to unblock work, but almost never revoked once the project ends.
**Pain Points**
- Broad initial grants — "Just give them admin on the workspace" is the fastest path to unblocking a data engineer.
- No usage visibility — Platform admins cannot see who actually uses their access vs. who simply has it.
- Cross-platform blind spots — A single analyst may have access in Databricks, Synapse, and Fabric — but no single tool shows the full picture.
- Sensitive data exposure — Workspaces often contain PII, financial data, or health records with no automated access-decay detection.
- Manual reviews don't scale — Data-platform teams lack tooling to run structured access-review campaigns.
## How Verity Helps
Verity connects to each data platform, correlates usage telemetry with permission grants, and scores every grant on a continuous 0–100 decay scale.
```mermaid
flowchart LR
    subgraph Data Platforms
        DB["Databricks"]
        SY["Azure Synapse"]
        FB["Microsoft Fabric"]
    end
    subgraph Verity
        I["Ingest Plane"]
        N["Normalise Plane"]
        S["Score Plane"]
        R["Review Plane"]
        X["Remediation Plane"]
    end
    DB -->|REST API| I
    SY -->|REST API| I
    FB -->|REST API| I
    I --> N --> S --> R --> X
    X -->|Revoke / Downgrade| DB
    X -->|Revoke / Downgrade| SY
    X -->|Revoke / Downgrade| FB
    style I fill:#7c4dff,color:#fff,stroke:none
    style N fill:#651fff,color:#fff,stroke:none
    style S fill:#536dfe,color:#fff,stroke:none
    style R fill:#448aff,color:#fff,stroke:none
    style X fill:#40c4ff,color:#000,stroke:none
```
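The Score Plane's continuous 0–100 decay scale can be illustrated with a toy formula: the longer a grant sits unused, the higher (riskier) it scores. The exponential half-life model below is a hypothetical sketch for intuition only, not Verity's actual scoring algorithm, and the 90-day half-life is an assumed parameter:

```python
from datetime import datetime, timedelta
from typing import Optional

def decay_score(last_used: Optional[datetime],
                granted: datetime,
                now: datetime,
                half_life_days: float = 90.0) -> float:
    """Toy 0-100 decay score. Hypothetical sketch, not Verity's formula."""
    reference = last_used or granted  # never-used grants decay from the grant date
    idle_days = (now - reference).days
    # Exponential saturation: 0 right after use, approaching 100 as idle time grows.
    return round(100.0 * (1.0 - 0.5 ** (idle_days / half_life_days)), 1)

now = datetime(2025, 1, 1)
fresh = decay_score(now - timedelta(days=7), now - timedelta(days=365), now)    # ~5.2 -> Low
stale = decay_score(now - timedelta(days=270), now - timedelta(days=720), now)  # 87.5 -> Critical
```

Under these assumed parameters, a grant untouched for roughly nine months lands in the Critical (80–100) band, while one used in the last week scores Low.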
## Key Capabilities
| Capability | Description |
|---|---|
| Usage-Based Scoring | Actual query execution, notebook runs, and pipeline activity feed into the decay formula. |
| Cross-Platform Identity | A single principal view even when a user has different identifiers across Databricks, Synapse, and Fabric. |
| Asset Sensitivity Tagging | Classify workspaces, schemas, and tables by sensitivity level (1–5). |
| Data-Owner Routing | Review packets go to the workspace or schema owner — not a central IT team. |
| Least-Privilege Recommendations | Verity suggests downgrade paths (e.g., Contributor → Reader) based on observed behaviour. |
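The downgrade logic in the last row can be sketched minimally: map each role to the actions it permits, then pick the least-privileged role that still covers everything the principal was actually observed doing. The role names and action sets below are illustrative assumptions, not Verity's model:

```python
# Hypothetical sketch: recommend the lowest role covering observed activity.
ROLE_ORDER = ["Reader", "Contributor", "Admin"]  # least -> most privileged
ROLE_ACTIONS = {
    "Reader": {"read"},
    "Contributor": {"read", "write"},
    "Admin": {"read", "write", "manage"},
}

def recommend_role(current: str, observed_actions: set) -> str:
    """Return the least-privileged role whose action set covers
    everything the principal was actually seen doing."""
    for role in ROLE_ORDER:
        if observed_actions <= ROLE_ACTIONS[role]:
            return role
    return current  # fallback: no downgrade if activity exceeds known roles

# A Contributor who only ever read data gets a Reader recommendation.
print(recommend_role("Contributor", {"read"}))  # Reader
```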
## Example Scenario

### Northwind Analytics — 800-Person Data Organisation
Northwind connects Databricks, Azure Synapse, and Microsoft Fabric to Verity.
#### Discovery
| Platform | Workspaces | Grants | Avg. Grant Age |
|---|---|---|---|
| Databricks | 120 | 4,200 | 14 months |
| Azure Synapse | 45 | 2,800 | 18 months |
| Microsoft Fabric | 30 | 1,600 | 6 months |
| Total | 195 | 8,600 | 13 months |
#### Scoring Results (Week 1)
```mermaid
pie title Decay Score Distribution — 8,600 Grants
    "Low (0–29)" : 3400
    "Medium (30–59)" : 2900
    "High (60–79)" : 1500
    "Critical (80–100)" : 800
```
**Key Finding**

27% of all data-platform grants scored High or Critical — most belonging to users who hadn't queried the platform in 6+ months.
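The headline figure follows directly from the distribution above:

```python
# High (60-79) plus Critical (80-100) grants as a share of all 8,600 grants.
high, critical, total = 1_500, 800, 8_600
share = (high + critical) / total
print(f"{share:.1%}")  # 26.7%, reported as ~27%
```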
#### Review & Remediation
Verity generates 2,300 review packets (High + Critical), routed to 85 workspace owners.
Within 14 days:
- 780 Critical grants auto-remediated (revoked or downgraded).
- 1,220 High grants decided by data owners.
- 300 remaining escalated per SLA.
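The three outcomes above could be driven by score thresholds plus a review SLA. In the hypothetical sketch below, the `ReviewPacket` shape, the 80-point auto-remediation cut-off, and the 14-day owner SLA are illustrative assumptions, not Verity's actual rules:

```python
from dataclasses import dataclass

@dataclass
class ReviewPacket:
    grant_id: str
    score: float    # 0-100 decay score
    days_open: int  # days since the packet was issued

OWNER_SLA_DAYS = 14  # assumed SLA window

def disposition(p: ReviewPacket) -> str:
    if p.score >= 80:
        return "auto-remediate"  # Critical: revoke or downgrade immediately
    if p.days_open > OWNER_SLA_DAYS:
        return "escalate"        # owner missed the SLA window
    return "owner-review"        # High: wait for the data-owner decision

print(disposition(ReviewPacket("g1", 85, 2)))   # auto-remediate
print(disposition(ReviewPacket("g2", 65, 20)))  # escalate
```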
#### Before & After
| Metric | Before Verity | After Verity |
|---|---|---|
| Visibility into actual usage | None | Full query-level telemetry |
| Grants reviewed per quarter | 0 (no process) | 2,300 (risk-prioritised) |
| Over-provisioned accounts | Unknown | 27 % identified in Week 1 |
| Cross-platform identity view | Siloed per platform | Unified principal model |
| Time to detect orphaned access | Never | ≤ 6 hours |
| Mean time to revoke (MTTR) | ∞ (never revoked) | 3.1 days |
## Connector Deep Dive

### Databricks Connector
```yaml
# config/connectors/databricks.yaml
connector:
  type: databricks
  workspace_url: https://adb-1234567890.azuredatabricks.net
  auth:
    method: service_principal
    client_id: ${DATABRICKS_CLIENT_ID}
    client_secret: ${DATABRICKS_CLIENT_SECRET}
  sync:
    schedule: "0 */6 * * *"  # Every 6 hours
    resources:
      - workspace_permissions
      - cluster_permissions
      - sql_warehouse_permissions
      - unity_catalog_grants
```
### Synapse Connector
```yaml
# config/connectors/synapse.yaml
connector:
  type: synapse
  workspace_name: northwind-analytics
  auth:
    method: managed_identity
  sync:
    schedule: "0 */6 * * *"
    resources:
      - sql_pool_permissions
      - spark_pool_permissions
      - pipeline_permissions
      - workspace_roles
```
### Fabric Connector
```yaml
# config/connectors/fabric.yaml
connector:
  type: fabric
  tenant_id: ${AZURE_TENANT_ID}
  auth:
    method: service_principal
    client_id: ${FABRIC_CLIENT_ID}
    client_secret: ${FABRIC_CLIENT_SECRET}
  sync:
    schedule: "0 */6 * * *"
    resources:
      - workspace_permissions
      - lakehouse_permissions
      - warehouse_permissions
```
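All three connector files use `${VAR}` placeholders for secrets. Assuming these are resolved from environment variables at load time (a common pattern; the loader below is an illustrative sketch, not Verity's actual code), substitution can fail fast on anything undefined:

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def interpolate_env(text: str) -> str:
    """Replace ${VAR} placeholders with environment-variable values,
    raising immediately if any referenced variable is unset."""
    def resolve(match: re.Match) -> str:
        name = match.group(1)
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"Missing environment variable: {name}")
        return value
    return _PLACEHOLDER.sub(resolve, text)

os.environ["DATABRICKS_CLIENT_ID"] = "abc-123"
print(interpolate_env("client_id: ${DATABRICKS_CLIENT_ID}"))
# client_id: abc-123
```

Failing fast here is deliberate: a connector that silently syncs with an empty credential is harder to debug than one that refuses to start.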
## Deployment Topology
For cloud data-platform use cases, Verity is typically deployed in the same Azure region as the data platforms to minimise API latency.
```mermaid
graph TB
    subgraph "Azure Region — East US 2"
        subgraph "Data Platforms"
            DB["Databricks<br/>Workspace"]
            SY["Synapse<br/>Workspace"]
            FB["Fabric<br/>Workspace"]
        end
        subgraph "AKS Cluster"
            V["Verity<br/>(19 microservices)"]
            PG["PostgreSQL +<br/>TimescaleDB"]
            CH["ClickHouse"]
        end
    end
    DB <--> V
    SY <--> V
    FB <--> V
```
## Getting Started

**Recommended First Steps**
1. Deploy Verity with the Quick Start guide.
2. Connect Databricks first — it typically has the most grants.
3. Tag asset sensitivity on workspaces containing PII or financial data.
4. Run scoring for 1 week to establish baselines.
5. Enable data-owner reviews and set SLAs per risk level.
6. Add the Synapse and Fabric connectors to complete cross-platform coverage.