# ClickHouse (Vector Search) MCP for AI Agents MCP

> ClickHouse (Vector Search) provides AI agents with direct access to your massive analytical database. It lets you manage vector embeddings and run complex SQL queries conversationally, performing high-speed semantic searches without writing boilerplate code.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** olap, vector-embeddings, sql-execution, high-performance-data, real-time-analytics

## Description

Managing modern data means juggling structured records, specialized metadata, and high-dimensional vectors. This MCP connects any compatible AI client directly to your ClickHouse cluster, letting your agent speak the language of your database. Instead of navigating dozens of tabs or writing complex connection strings, you simply ask questions in natural English.

Your agent handles everything from listing available schemas and running arbitrary DML or SELECT statements to identifying mathematical distance traces using advanced vector metrics like cosineDistance. You can audit cluster health—checking row counts and compression ratios—or verify exact capability branches, such as HNSW support. By connecting your ClickHouse data through Vinkius, you get full control of both your raw SQL records and your semantic search capabilities from one central point.

## Tools

### list_databases
Identifies all top-level schemas available within your ClickHouse cluster.

### list_tables
Retrieves the list of specific tables housed inside a selected database.

### describe_table
Provides a detailed structural breakdown, including all column types and properties for a chosen table.

### execute_sql
Runs any arbitrary SQL query (SELECT, INSERT, UPDATE, DELETE) directly against the cluster.

### vector_search
Calculates and identifies mathematical distances between vector embeddings to find semantic matches.

### get_table_stats
Extracts key internal metrics, such as row counts and compression ratios, for a given table.

### get_version
Retrieves the precise version information and binary limits of the ClickHouse execution instance.

## Prompt Examples

**Prompt:** 
```
What are the total row counts for my top three sales tables?
```

**Response:** 
```
**Table Health Report: Sales Data**

| Table Name | Rows Counted | Compression Ratio |
| :--- | :---: | :---: |
| `analytics_prod.sales` | 1,200,000 |×4.2 |
| `analytics_prod.leads` | 85,000 |×3.1 |
| `system.metrics` | 9,100 | ×6.0 |

*Note: All counts are current as of the last successful query.*
```

**Prompt:** 
```
Find me records that talk about 'sustainable energy' in the embeddings table.
```

**Response:** 
```
**Vector Search Complete**

Found 5 matches for 'sustainable energy'. The top hit is from `analytics_prod.documents`.

*   **Document ID:** D401
*   **Distance Score (cosine):** 0.032
*   **Snippet:** "...requires shifting focus toward sustainable energy sources by 2035."

Would you like to see the associated metadata for this document?
```

**Prompt:** 
```
I need a list of all available databases and what columns they contain.
```

**Response:** 
```
**Database Schema Overview**

We found four primary schemas:

1. **`default`**: Contains general system tables.
2. **`analytics_prod`**: Primary data lake for reports (e.g., `sales`, `users`).
3. **`vector_store`**: Dedicated schema holding high-dimensional embeddings.
4. **`system`**: Operational metrics and metadata.

For example, the `analytics_prod` schema includes tables like `sales` which has columns for `sale_id (Int64)`, `date (Date)`, and `vector_data (Array(Float32))`.
```

## Capabilities

### View database schemas
List all available logical databases and inspect the detailed structure of tables within them.

### Execute structured queries
Run any complex SQL query, including data definitions (DDL), data manipulation (DML), or simple SELECT statements.

### Perform vector similarity searches
Find the closest matching records in your dataset by calculating mathematical distances between vector embeddings.

### Analyze table health statistics
Pull internal structural states, including row counts and compression ratios, to audit how healthy a specific table is.

### Check cluster configuration
Retrieve the version number and binary limits of the ClickHouse instance to verify its current capability set.

## Use Cases

### Auditing data quality after a migration
A DBA needs to check if the new 'sales' table has been loaded correctly and if compression rates are within acceptable limits. They use `get_table_stats` and then `get_version` to confirm both structural integrity and platform capability.

### Building a document search engine
An AI developer needs to find documents semantically related to 'supply chain risk' using vectors. They prompt their agent, which executes the `vector_search` tool, delivering the top matches and associated metadata immediately.

### Generating a quarterly business review report
A data analyst needs to combine sales figures (DDL/DML) with user behavior metrics across three different tables. Instead of writing five separate queries, they ask their agent to run the necessary `execute_sql` commands and compile the results.

### Prototyping a new data source integration
A product team wants to know if a specific column type (like Array(Float32)) is available in an existing database. They use `describe_table` to inspect the schema, confirming feasibility before writing any code.

## Benefits

- Stop writing complex SQL. You tell your agent what data you need, and it generates the necessary query using `execute_sql`.
- Analyze vector similarity without coding boilerplate. Use `vector_search` to find semantic matches just by describing the relationship you're looking for.
- Maintain cluster health easily. Run `get_table_stats` to instantly check row counts and compression ratios across multiple tables.
- Understand your data model quickly. Use `describe_table` to get a deep, reliable schema inspection without needing to consult documentation.
- Verify system limits on the fly. Check capability branches using `get_version` to confirm support for features like HNSW.

## How It Works

The bottom line is, you get to run complex analytical tasks—from simple reporting to advanced semantic matching—using only conversational prompts.

1. Subscribe to this MCP and provide your ClickHouse URL, along with your username and password.
2. Your AI client authenticates and gains read/write access to the specified database cluster.
3. You prompt your agent in natural language; it translates that intent into specific SQL or vector operations against your data.

## Frequently Asked Questions

**How do I perform advanced semantic searches using the ClickHouse MCP?**
You can run high-speed vector similarity searches by telling your agent what concept you are looking for. The MCP executes this search against your embeddings, returning records that match conceptually, not just keyword-for-keyword.

**Can the ClickHouse MCP help me write or fix SQL queries?**
Yes. You can use natural language to request complex reports or data modifications. The MCP translates your intent into precise SELECT, DML, or DDL statements that run directly against your cluster.

**What if I don't know the schema of a table?**
No problem. You can ask the MCP to describe any table you point it toward. It immediately pulls up the column names, data types, and properties so you know exactly what data is available for querying.

**Is this ClickHouse (Vector Search) MCP good for monitoring database health?**
It's great for that. You can run commands to check the table stats, getting real-time metrics like row counts and compression ratios, which helps you audit performance without logging into a command line.

**Does this MCP support different types of data beyond simple text?**
It handles advanced analytical types, including specialized Array(Float32) vectors. This means you can build complex pipelines that combine traditional structured data with high-dimensional semantic information.