# Cloud Data Platform Security

## The Challenge
Modern data teams move fast — spinning up Databricks notebooks, Synapse SQL pools, and Fabric lakehouses to meet business deadlines. Access is granted liberally to unblock work, but almost never revoked once the project ends.
**Pain Points**
- Broad initial grants — "Just give them admin on the workspace" is the fastest path to unblocking a data engineer.
- No usage visibility — Platform admins cannot see who actually uses their access vs. who simply has it.
- Cross-platform blind spots — A single analyst may have access in Databricks, Synapse, and Fabric — but no single tool shows the full picture.
- Sensitive data exposure — Workspaces often contain PII, financial data, or health records with no automated access-decay detection.
- Manual reviews don't scale — Data-platform teams lack tooling to run structured access-review campaigns.
## How Verity Helps
Verity connects to each data platform, correlates usage telemetry with permission grants, and scores every grant on a continuous 0–100 decay scale.
```mermaid
flowchart LR
    subgraph Data Platforms
        DB["Databricks"]
        SY["Azure Synapse"]
        FB["Microsoft Fabric"]
    end
    subgraph Verity
        I["Ingest Plane"]
        N["Normalise Plane"]
        S["Score Plane"]
        R["Review Plane"]
        X["Remediation Plane"]
    end
    DB -->|REST API| I
    SY -->|REST API| I
    FB -->|REST API| I
    I --> N --> S --> R --> X
    X -->|Revoke / Downgrade| DB
    X -->|Revoke / Downgrade| SY
    X -->|Revoke / Downgrade| FB
    style I fill:#7c4dff,color:#fff,stroke:none
    style N fill:#651fff,color:#fff,stroke:none
    style S fill:#536dfe,color:#fff,stroke:none
    style R fill:#448aff,color:#fff,stroke:none
    style X fill:#40c4ff,color:#000,stroke:none
```
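The Score Plane's continuous 0–100 decay scale can be illustrated with a toy formula: the longer a grant sits unused, the higher (riskier) it scores. The exponential half-life model below is a hypothetical sketch for intuition only, not Verity's actual scoring algorithm, and the 90-day half-life is an assumed parameter:

```python
from datetime import datetime, timedelta
from typing import Optional

def decay_score(last_used: Optional[datetime],
                granted: datetime,
                now: datetime,
                half_life_days: float = 90.0) -> float:
    """Toy 0-100 decay score. Hypothetical sketch, not Verity's formula."""
    reference = last_used or granted  # never-used grants decay from the grant date
    idle_days = (now - reference).days
    # Exponential saturation: 0 right after use, approaching 100 as idle time grows.
    return round(100.0 * (1.0 - 0.5 ** (idle_days / half_life_days)), 1)

now = datetime(2025, 1, 1)
fresh = decay_score(now - timedelta(days=7), now - timedelta(days=365), now)    # ~5.2 -> Low
stale = decay_score(now - timedelta(days=270), now - timedelta(days=720), now)  # 87.5 -> Critical
```

Under these assumed parameters, a grant untouched for roughly nine months lands in the Critical (80–100) band, while one used in the last week scores Low.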
## Key Capabilities
| Capability | Description |
|---|---|
| Usage-Based Scoring | Actual query execution, notebook runs, and pipeline activity feed into the decay formula. |
| Cross-Platform Identity | A single principal view even when a user has different identifiers across Databricks, Synapse, and Fabric. |
| Asset Sensitivity Tagging | Classify workspaces, schemas, and tables by sensitivity level (1–5). |
| Data-Owner Routing | Review packets go to the workspace or schema owner — not a central IT team. |
| Least-Privilege Recommendations | Verity suggests downgrade paths (e.g., Contributor → Reader) based on observed behaviour. |
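The downgrade logic in the last row can be sketched minimally: map each role to the actions it permits, then pick the least-privileged role that still covers everything the principal was actually observed doing. The role names and action sets below are illustrative assumptions, not Verity's model:

```python
# Hypothetical sketch: recommend the lowest role covering observed activity.
ROLE_ORDER = ["Reader", "Contributor", "Admin"]  # least -> most privileged
ROLE_ACTIONS = {
    "Reader": {"read"},
    "Contributor": {"read", "write"},
    "Admin": {"read", "write", "manage"},
}

def recommend_role(current: str, observed_actions: set) -> str:
    """Return the least-privileged role whose action set covers
    everything the principal was actually seen doing."""
    for role in ROLE_ORDER:
        if observed_actions <= ROLE_ACTIONS[role]:
            return role
    return current  # fallback: no downgrade if activity exceeds known roles

# A Contributor who only ever read data gets a Reader recommendation.
print(recommend_role("Contributor", {"read"}))  # Reader
```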
## Example Scenario

### Northwind Analytics — 800-Person Data Organisation
Northwind connects Databricks, Azure Synapse, and Microsoft Fabric to Verity.
#### Discovery
| Platform | Workspaces | Grants | Avg. Grant Age |
|---|---|---|---|
| Databricks | 120 | 4,200 | 14 months |
| Azure Synapse | 45 | 2,800 | 18 months |
| Microsoft Fabric | 30 | 1,600 | 6 months |
| Total | 195 | 8,600 | 13 months |
#### Scoring Results (Week 1)
```mermaid
pie title Decay Score Distribution — 8,600 Grants
    "Low (0–29)" : 3400
    "Medium (30–59)" : 2900
    "High (60–79)" : 1500
    "Critical (80–100)" : 800
```
**Key Finding**

27% of all data-platform grants scored High or Critical — most belonging to users who hadn't queried the platform in 6+ months.
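The headline figure follows directly from the distribution above:

```python
# High (60-79) plus Critical (80-100) grants as a share of all 8,600 grants.
high, critical, total = 1_500, 800, 8_600
share = (high + critical) / total
print(f"{share:.1%}")  # 26.7%, reported as ~27%
```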
#### Review & Remediation
Verity generates 2,300 review packets (High + Critical), routed to 85 workspace owners.
Within 14 days:
- 780 Critical grants auto-remediated (revoked or downgraded).
- 1,220 High grants decided by data owners.
- 300 remaining escalated per SLA.
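The three outcomes above could be driven by score thresholds plus a review SLA. In the hypothetical sketch below, the `ReviewPacket` shape, the 80-point auto-remediation cut-off, and the 14-day owner SLA are illustrative assumptions, not Verity's actual rules:

```python
from dataclasses import dataclass

@dataclass
class ReviewPacket:
    grant_id: str
    score: float    # 0-100 decay score
    days_open: int  # days since the packet was issued

OWNER_SLA_DAYS = 14  # assumed SLA window

def disposition(p: ReviewPacket) -> str:
    if p.score >= 80:
        return "auto-remediate"  # Critical: revoke or downgrade immediately
    if p.days_open > OWNER_SLA_DAYS:
        return "escalate"        # owner missed the SLA window
    return "owner-review"        # High: wait for the data-owner decision

print(disposition(ReviewPacket("g1", 85, 2)))   # auto-remediate
print(disposition(ReviewPacket("g2", 65, 20)))  # escalate
```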
#### Before & After
| Metric | Before Verity | After Verity |
|---|---|---|
| Visibility into actual usage | None | Full query-level telemetry |
| Grants reviewed per quarter | 0 (no process) | 2,300 (risk-prioritised) |
| Over-provisioned accounts | Unknown | 27 % identified in Week 1 |
| Cross-platform identity view | Siloed per platform | Unified principal model |
| Time to detect orphaned access | Never | ≤ 6 hours |
| Mean time to revoke (MTTR) | ∞ (never revoked) | 3.1 days |
## Connector Deep Dive

### Databricks Connector
```yaml
# config/connectors/databricks.yaml
connector:
  type: databricks
  workspace_url: https://adb-1234567890.azuredatabricks.net
  auth:
    method: service_principal
    client_id: ${DATABRICKS_CLIENT_ID}
    client_secret: ${DATABRICKS_CLIENT_SECRET}
  sync:
    schedule: "0 */6 * * *"  # Every 6 hours
    resources:
      - workspace_permissions
      - cluster_permissions
      - sql_warehouse_permissions
      - unity_catalog_grants
```
### Synapse Connector
```yaml
# config/connectors/synapse.yaml
connector:
  type: synapse
  workspace_name: northwind-analytics
  auth:
    method: managed_identity
  sync:
    schedule: "0 */6 * * *"
    resources:
      - sql_pool_permissions
      - spark_pool_permissions
      - pipeline_permissions
      - workspace_roles
```
### Fabric Connector
```yaml
# config/connectors/fabric.yaml
connector:
  type: fabric
  tenant_id: ${AZURE_TENANT_ID}
  auth:
    method: service_principal
    client_id: ${FABRIC_CLIENT_ID}
    client_secret: ${FABRIC_CLIENT_SECRET}
  sync:
    schedule: "0 */6 * * *"
    resources:
      - workspace_permissions
      - lakehouse_permissions
      - warehouse_permissions
```
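All three connector files use `${VAR}` placeholders for secrets. Assuming these are resolved from environment variables at load time (a common pattern; the loader below is an illustrative sketch, not Verity's actual code), substitution can fail fast on anything undefined:

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def interpolate_env(text: str) -> str:
    """Replace ${VAR} placeholders with environment-variable values,
    raising immediately if any referenced variable is unset."""
    def resolve(match: re.Match) -> str:
        name = match.group(1)
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"Missing environment variable: {name}")
        return value
    return _PLACEHOLDER.sub(resolve, text)

os.environ["DATABRICKS_CLIENT_ID"] = "abc-123"
print(interpolate_env("client_id: ${DATABRICKS_CLIENT_ID}"))
# client_id: abc-123
```

Failing fast here is deliberate: a connector that silently syncs with an empty credential is harder to debug than one that refuses to start.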
## Deployment Topology
For cloud data-platform use cases, Verity is typically deployed in the same Azure region as the data platforms to minimise API latency.
```mermaid
graph TB
    subgraph "Azure Region — East US 2"
        subgraph "Data Platforms"
            DB["Databricks<br/>Workspace"]
            SY["Synapse<br/>Workspace"]
            FB["Fabric<br/>Workspace"]
        end
        subgraph "AKS Cluster"
            V["Verity<br/>(19 microservices)"]
            PG["PostgreSQL +<br/>TimescaleDB"]
            CH["ClickHouse"]
        end
    end
    DB <--> V
    SY <--> V
    FB <--> V
```
## Getting Started

**Recommended First Steps**
1. Deploy Verity with the Quick Start guide.
2. Connect Databricks first — it typically has the most grants.
3. Tag asset sensitivity on workspaces containing PII or financial data.
4. Run scoring for 1 week to establish baselines.
5. Enable data-owner reviews and set SLAs per risk level.
6. Add the Synapse and Fabric connectors to complete cross-platform coverage.