# Databricks MCP for AI Agents

> Databricks MCP connects your agent directly to your data intelligence platform. You can audit SQL warehouses, list compute clusters, track job executions, and explore structured data across Unity Catalog without leaving your chat window. It gives you conversational visibility into your lakehouse orchestration.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** lakehouse, data-engineering, cluster-management, sql-warehousing, data-governance, big-data

## Description

You're managing a massive data lakehouse, but checking status means jumping between dashboards, running manual queries, and copying logs. This MCP lets you talk to your platform instead. You can ask your agent to list all active compute clusters or check the recent run history for a specific ETL job. Need to know where your structured data lives? Your agent will query the Unity Catalog and map out every root catalog and schema. It’s about getting instant, auditable visibility into everything running on your platform. Because Vinkius hosts this MCP, you connect once from any compatible client (Claude, Cursor, or Windsurf) and get immediate access to the full set of tools listed below.

## Tools

### list_clusters
Retrieves a full list of all compute clusters configured in your Databricks workspace.

### get_cluster
Fetches detailed operational information for a specific cluster ID or name.

### list_jobs
Lists every configured data workflow and job that runs across your platform.

### list_job_runs
Provides a history of all executed jobs, showing success or failure status for specific workflows.

### list_warehouses
Enumerates every SQL Serverless warehouse configured in your environment.

### list_catalogs
Lists all root catalogs defined within the Unity Catalog structure.

### list_schemas
Retrieves a list of databases or schemas contained within a specified catalog.

### get_me
Identifies the current user's profile and active permissions in the Databricks workspace.
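
Under the hood, an MCP client invokes any of the tools above with a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what those requests look like. The argument names `cluster_id` and `catalog_name` are assumptions for illustration, not confirmed parameter names for this MCP; your client discovers the real input schema from the server.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this illustrative session.
_request_ids = count(1)


def build_tool_call(name, arguments=None):
    """Build an MCP `tools/call` JSON-RPC 2.0 request for a single tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# NOTE: `cluster_id` and `catalog_name` are assumed argument names.
example_requests = [
    build_tool_call("list_clusters"),
    build_tool_call("get_cluster", {"cluster_id": "<your-cluster-id>"}),
    build_tool_call("list_schemas", {"catalog_name": "<your-catalog>"}),
]

for request in example_requests:
    print(json.dumps(request))
```

In practice your agent client builds and sends these requests for you; the sketch is only meant to show what a tool invocation amounts to.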

## Prompt Examples

**Prompt:** 
```
List all compute clusters for my team so I can check their status.
```

**Response:** 
```
Here are the 4 clusters in the workspace:

*   **Standard-ETL-Cluster:** Running (Nodes: 6). Last updated: 5 minutes ago.
*   **ML-Training-Node:** Terminated. Reason: Job completion.
*   **Shared-Dev-Pool:** Running (Nodes: 3). High usage detected.
*   **Finance-Audit-Cluster:** Stopped. Needs manual restart.
```

**Prompt:** 
```
I need to know which SQL warehouses are currently running and how many there are.
```

**Response:** 
```
I found three serverless SQL warehouses:

| Warehouse Name | Status | Last Used | Purpose |
| :--- | :--- | :--- | :--- |
| Prod_Reporting | Active | 10 mins ago | BI Dashboards |
| Dev_Sandbox | Active | Now | Testing Queries |
| Historical_Arch | Idle | Never | Read-only access |

Two are active right now; Historical_Arch is idle and has never been used.
```

**Prompt:** 
```
What were the results for my last three data jobs?
```

**Response:** 
```
Checking job history... Here’s a quick overview:

*   **Job: Daily Sales ETL (ID 987):** Success. Finished at 6:00 AM.
*   **Job: User Sync (ID 985):** Failure. Timeout error detected. Check logs for details.
*   **Job: Inventory Update (ID 984):** Success. Completed in 12 minutes.
```

## Capabilities

### Audit and manage compute clusters
List all clusters and retrieve detailed information on a specific cluster's current health and capacity limits.

### Track job pipelines and workflows
See every configured workflow, list jobs, and monitor recent executions to verify data pipeline status or find failure points.

### Govern structured data locations
Identify where your data lives by listing root Unity Catalog catalogs and detailed schemas across the workspace.

### Manage SQL data warehousing resources
Enumerate all configured SQL Serverless warehouses and check their current state to support cost control.

### Verify user permissions and identity
Fetch profile information for the authenticated user or service principal to audit active workspace permissions.

## Use Cases

### Debugging a failed ETL pipeline
A data engineer asks their agent, 'What went wrong with the Daily Sales ETL?' The agent calls `list_job_runs`, identifies the failing job run ID, and reports that the error was due to an upstream cluster timeout. They then use `get_cluster` to check if resource limits were hit.
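
The triage step in this scenario can be sketched as a small filter over the run history. This is a minimal sketch that assumes `list_job_runs` returns run records shaped like the Databricks Jobs API (`run_id`, `job_id`, `start_time`, and a nested `state` with `result_state`); the actual tool output may differ, and the sample records are made up for illustration.

```python
def failed_runs(runs):
    """Return finished runs whose result is not SUCCESS, newest first."""
    # Runs still in progress have no result_state yet, so skip them.
    finished = [r for r in runs if r.get("state", {}).get("result_state")]
    failures = [r for r in finished if r["state"]["result_state"] != "SUCCESS"]
    return sorted(failures, key=lambda r: r.get("start_time", 0), reverse=True)


# Illustrative data only; field names are an assumption about the tool output.
sample_runs = [
    {"run_id": 4, "job_id": 985, "start_time": 400,
     "state": {"result_state": "FAILED"}},
    {"run_id": 3, "job_id": 987, "start_time": 300,
     "state": {"result_state": "SUCCESS"}},
    {"run_id": 2, "job_id": 985, "start_time": 200,
     "state": {"result_state": "TIMEDOUT"}},
    {"run_id": 1, "job_id": 984, "start_time": 100,
     "state": {"life_cycle_state": "RUNNING"}},
]

for run in failed_runs(sample_runs):
    print(run["job_id"], run["run_id"], run["state"]["result_state"])
```

Sorting newest first puts the most recent failure at the top, which is usually the run you want to inspect before checking the cluster with `get_cluster`.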

### Auditing data governance for compliance
An analytics engineer needs proof of all structured data sources. The agent uses `list_catalogs` and then iterates through `list_schemas`, providing a complete, auditable map of the entire Unity Catalog structure.
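
The catalog-then-schema iteration described here might look like the sketch below. The `call_tool` callable is a hypothetical stand-in for whatever your MCP client uses to invoke a tool, and both the `catalog_name` argument and the `name` field in the results are assumptions; a fake caller with made-up catalogs keeps the example runnable offline.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: (tool_name, arguments) -> list of result records.
ToolCaller = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def map_unity_catalog(call_tool: ToolCaller) -> Dict[str, List[str]]:
    """Return {catalog name: sorted schema names} for every root catalog."""
    catalog_map: Dict[str, List[str]] = {}
    for catalog in call_tool("list_catalogs", {}):
        name = catalog["name"]
        # Assumed argument name; check the tool's real input schema.
        schemas = call_tool("list_schemas", {"catalog_name": name})
        catalog_map[name] = sorted(s["name"] for s in schemas)
    return catalog_map


def fake_call_tool(tool: str, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Offline stand-in that serves made-up catalogs and schemas."""
    fixtures = {"main": ["sales", "default"], "sandbox": ["scratch"]}
    if tool == "list_catalogs":
        return [{"name": n} for n in fixtures]
    if tool == "list_schemas":
        return [{"name": s} for s in fixtures[arguments["catalog_name"]]]
    raise ValueError(f"unexpected tool: {tool}")


print(map_unity_catalog(fake_call_tool))
```

Passing the caller in as a parameter keeps the mapping logic independent of any particular client library.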

### Resource optimization before scaling
An MLOps engineer wants to know if they can afford more compute power. They run `list_clusters` and compare the active count against the usage reported by `get_cluster`, determining exactly which clusters need adjustment.
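
One way to picture that comparison is a tally of clusters by state plus a count of workers on running clusters. The sketch assumes each cluster record carries `state` and `num_workers` fields, as in the Databricks Clusters API; the tool's real output shape is not confirmed here, autoscaling clusters may report workers differently, and the sample values are illustrative.

```python
from collections import Counter


def summarize_clusters(clusters):
    """Tally clusters by state and sum workers across running clusters."""
    by_state = Counter(c.get("state", "UNKNOWN") for c in clusters)
    running_workers = sum(
        c.get("num_workers", 0) for c in clusters if c.get("state") == "RUNNING"
    )
    return {"by_state": dict(by_state), "running_workers": running_workers}


# Illustrative records; field names are an assumption about the tool output.
sample_clusters = [
    {"cluster_name": "Standard-ETL-Cluster", "state": "RUNNING", "num_workers": 6},
    {"cluster_name": "ML-Training-Node", "state": "TERMINATED", "num_workers": 8},
    {"cluster_name": "Shared-Dev-Pool", "state": "RUNNING", "num_workers": 3},
    {"cluster_name": "Finance-Audit-Cluster", "state": "TERMINATED"},
]

print(summarize_clusters(sample_clusters))
```

Terminated clusters are excluded from the worker count because they are not consuming compute, which is the figure that matters when deciding whether to scale.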

### Verifying data access permissions
A platform team member needs to know if a new service account has full visibility. They run `get_me` and audit the returned profile information against required workspace roles, confirming proper identity oversight.

## Benefits

- Audit cluster health instantly. You can use `get_cluster` to get detailed specifications or `list_clusters` to see the full inventory of nodes running in your workspace.
- Never miss a failed pipeline run again. By listing jobs with `list_jobs` and their runs with `list_job_runs`, you pinpoint exactly where data workflows break, saving hours of manual debugging.
- Manage costs through visibility. You can list all SQL warehouses (`list_warehouses`) to track which serverless resources are active and consuming credits right now.
- Understand your data map. Instead of guessing where a table is, use `list_catalogs` and `list_schemas` to get an auditable inventory of every piece of structured data.
- Verify access rights. You can run the identity check (`get_me`) to confirm that the service principal currently running the job has the necessary permissions.

## How It Works

In short, you manage your data lakehouse by talking to it: complex operations become simple prompts.

1. Subscribe to this MCP on Vinkius.
2. Input your Databricks Host URL and Personal Access Token (PAT) into your agent client.
3. Start asking questions. Your agent uses the connected tools to perform audits, list resources, or check job statuses in natural language.

## Frequently Asked Questions

**How does the Databricks MCP help me track my cluster usage?**
The Databricks MCP lets you list all compute clusters and get detailed information on specific nodes. This means you can audit which resources are running, check their health, and understand your overall capacity limits without logging into the platform.

**Can I use this MCP to see if my data pipelines ran correctly?**
Yes. You can list all configured jobs and monitor job runs. Your agent checks the status of past executions, telling you immediately which workflows succeeded or failed, and why.

**Does the Databricks MCP help with data governance in Unity Catalog?**
Absolutely. You can list root catalogs and then drill down to find all schemas within them. This gives you a full inventory map of where every piece of structured data resides, which is key for compliance.

**What if I need to verify my user permissions in Databricks?**
The MCP has an identity oversight tool that fetches your profile information. This lets you confirm exactly what roles and permissions are active for the service principal running your workflow, which is critical for security audits.

**Is this better than checking status on a dashboard?**
Yes. Instead of manually clicking through multiple dashboards, you ask your agent a question, and it executes the necessary checks (like listing job runs or warehouses) and gives you a summarized answer instantly.