# StarRocks MCP

> StarRocks MCP Server connects your AI agent directly to a high-performance OLAP database. It lets you query data, check cluster health, and manage schemas using natural language prompts—no SQL terminal needed. Your agent can list databases, retrieve full table structures via `get_table_schema`, monitor node status with `list_nodes`, or run complex analytical queries using `execute_query`. This is for advanced database operations.

## Overview
- **Category:** brain-trust
- **Price:** Free
- **Tags:** olap, distributed-database, data-querying, schema-management, cluster-monitoring, real-time-analytics

## Description

Forget wrestling with a terminal or writing boilerplate SQL just to check your data warehouse health. This server hooks up your agent straight to StarRocks, letting you run complex database operations using natural language—no command line needed. Your AI client acts like an expert DBA sitting right next to you.

**Discovering the Data Structure**
You can start by listing every database available in the cluster with `list_databases`. Once you've got a DB name, you pull a list of all tables inside it using `list_tables`, or check out defined views via `list_views`. To know exactly what columns you’re working with, run `get_table_schema` on any table to pull its full structure and data types.

**Running Queries & Analyzing Insights**
When you need answers, just use `execute_query` and run any SQL query against your entire StarRocks data set instantly. If you're optimizing performance, check out all defined materialized views using `list_mvs`. You can also track the lifeblood of your platform by viewing a list of active or completed data load jobs through `list_jobs`.

**Monitoring Cluster Health**
To make sure the platform isn't coughing up smoke, you can check overall system status and general metadata using `get_cluster_info`. For a detailed look at the actual backend components, `list_nodes` provides specific status information for every node in the cluster. If you gotta know how much space your data is eating up, run `get_storage_usage` to get disk usage and total data size statistics across all cluster parts.

## Tools

### execute_query
Runs any arbitrary SQL query against your StarRocks data set.

### get_cluster_info
Retrieves general status and metadata for the frontend nodes of the cluster.

### get_storage_usage
Checks the total disk usage and data size statistics across all cluster components.

### get_table_schema
Pulls the full structure, including columns and types, for a specified table.

### list_databases
Returns a list of all databases available in the StarRocks cluster.

### list_jobs
Shows a list of active or completed data load jobs running through your pipelines.

### list_mvs
Lists all defined materialized views used for accelerating analytics queries.

### list_nodes
Provides detailed status information for every backend node in the cluster.

### list_tables
Lists all tables that exist inside a specified database.

### list_views
Retrieves a list of views defined within a specific StarRocks database.

## Prompt Examples

**Prompt:** 
```
List all databases in my StarRocks cluster.
```

**Response:** 
```
I've retrieved your StarRocks databases. You have 5 databases, including 'sales_analytics', 'user_behavior', and 'logs_archive'. Which one would you like to see the tables for?
```

**Prompt:** 
```
Show me the average order value from the 'sales' table.
```

**Response:** 
```
I've executed the query. The average order value across the 'sales' table is $142.50. Would you like a breakdown of this metric by region or product category?
```

**Prompt:** 
```
Check for any offline backend nodes.
```

**Response:** 
```
I've checked the backend nodes. All 10 Backend (BE) nodes are currently online and healthy. CPU and memory usage across the cluster is within normal limits.
```

## Capabilities

### Discover Database Structure
List all databases, then retrieve the specific schema or list tables within any given database.

### Run Analytical Queries
Execute custom SQL queries on demand to pull real-time metrics and insights from your data warehouse.

### Monitor Cluster Health
Check the status and metadata of all cluster nodes (FE and BE) or get overall system info via `get_cluster_info`.

### Track Data Pipelines
List active data load jobs (`list_jobs`) and review materialized view definitions to track data movement.

### Assess Storage Usage
Get total disk usage or specific storage statistics across the entire distributed cluster using `get_storage_usage`.

## Use Cases

### Schema Validation After a Data Source Change
A data engineer needs to confirm if a new source table, `raw_user_data`, was added to the 'staging' database. Instead of running 10 separate checks, they prompt their agent: 'What is the schema for raw_user_data?' The agent calls `get_table_schema` and instantly delivers the full column list, confirming the data structure before they write any transformation code.

### Investigating a Slow Query
An analytics lead notices that reports are slow. They first ask the agent to check the cluster status using `get_cluster_info` and then run `list_nodes`. If all nodes look healthy, they use `execute_query` with specific monitoring SQL to pinpoint if a particular query is hitting an unindexed column.

### Auditing Materialized Views
An SRE needs to know which materialized views exist and how often they run. They prompt the agent: 'List all MVs in the sales database.' The agent uses `list_mvs` and then confirms the definition details, ensuring no critical data view was missed during maintenance.

### Pre-Deployment Data Check
A developer is about to deploy a new dataset. They first ask the agent to `list_databases` to confirm the target environment is correct, then use `get_storage_usage` to estimate if the incoming data will exceed allocated disk space before running any inserts.

## Benefits

- Audit your entire schema instantly. Instead of running `SHOW DATABASES;` and then manually checking each one, just ask the agent to list databases using `list_databases`. It handles the whole flow.
- Check cluster health without logging into a dashboard. The `get_cluster_info` tool gives you immediate status on your FE nodes, letting you confirm everything is running fine in seconds.
- Know exactly what data exists before querying it. Use `list_tables` and then `get_table_schema` to validate the structure of any table before writing a single line of SQL.
- Track ETL pipelines reliably. The agent can use `list_jobs` to show if your recent data loads succeeded or failed, saving you from manually checking job logs.
- Get an immediate storage breakdown. Running `get_storage_usage` lets you see disk space consumption across the entire distributed cluster in one single query.

## How It Works

The bottom line is that your AI client translates complex database actions into conversation and executes them for you.

1. First, subscribe to the StarRocks server and provide your FE Host, Port, User, and Password credentials.
2. Next, talk to your AI client (Claude, Cursor, etc.) using natural language. Ask it to perform a task, like 'Show me all databases.'
3. Finally, the agent uses the appropriate tool (e.g., `list_databases`), gets the output, and presents you with actionable data or results.

## Frequently Asked Questions

**How do I find out all available databases using list_databases?**
Just ask the agent to 'list all databases.' The tool `list_databases` runs automatically and gives you a clean list of every database name in your cluster, so you know what scope you're working with.

**What is the difference between list_nodes and get_cluster_info?**
They provide different levels of detail. `get_cluster_info` gives you general system status for the frontend nodes, while `list_nodes` provides deep, specific metadata about every individual backend node.

**Can I check a table's schema without listing databases first?**
No. The agent must establish context. You should start by using `list_databases` to narrow down the scope, and then use `list_tables` within that database before asking for the specific structure with `get_table_schema`.

**How does the execute_query tool work?**
The `execute_query` tool runs standard SQL queries. You simply tell your agent what data you need, and it formats and executes the query for you, returning the results directly in the chat.

**How do I use list_jobs to monitor the status of data ingestion pipelines?**
It lists all current and historical data load jobs. You can check which jobs are running, their completion status, or review past performance records for your analytical pipelines.

**How do I use get_storage_usage to audit disk space across my StarRocks cluster?**
This tool retrieves detailed statistics on data storage consumption. It shows you the total disk usage and size metrics, letting you spot which part of your distributed cluster is running low on capacity.

**After listing a database, how do I use list_tables to get all table names?**
It outputs a simple list of every table housed within the selected database. This helps you quickly enumerate available tables before needing to run schema checks or query data.

**What is the purpose of list_mvs, and how do I check for materialized views?**
list_mvs lists all existing materialized views in your cluster. You use these MVs to pre-calculate common metrics, which significantly speeds up complex queries that rely on aggregated data.

**How do I find my StarRocks FE Host and Port?**
The FE Host is the IP address or domain of your StarRocks Frontend node. The default HTTP port is 8030. You can verify these in your deployment configuration or by contacting your administrator.

**Can I execute DELETE or UPDATE queries through this server?**
Yes. Use the `execute_query` tool. As long as your authenticated user has the necessary permissions in StarRocks, you can execute any valid SQL statement, including DDL and DML operations.

**Is it possible to monitor cluster health through the agent?**
Yes! Use the `get_cluster_info` and `list_nodes` tools to retrieve real-time status for all Frontend and Backend nodes, helping you identify offline or overloaded nodes instantly.