# MyScale (SQL Vector Database API) MCP

> MyScale (SQL Vector Database API) lets you manage complex vector databases using standard SQL syntax. Your AI agent runs everything—from simple data lookups (`execute_sql_query`) to high-recall semantic searches (`vector_search`), and managing the underlying indices (HNSW, ScaNN)—all in one chat session. It makes running RAG pipelines feel like just another query.

## Overview
- **Category:** databases
- **Price:** Free
- **Tags:** sql, vector-search, rag, clickhouse, database-management

## Description

You gotta manage high-performance vector databases, but you don't wanna write some arcane API calls just to run a query. MyScale lets your AI agent handle complex vector data using standard SQL syntax. Think of it like this: you talk to your agent how you always have—with SQL—and it handles the deep database magic under the hood.

First, you gotta make sure everything's talking. You start by pinging the cluster with `ping_cluster`. That confirms your credentials are good and that the whole MyScale system is actually up and ready to take connections. Once that green light pops up, you’re cleared for action.

When it comes to querying data, it's straightforward. If you need to run any custom SQL—whether it's a simple `SELECT`, an `INSERT` statement, or even an `ALTER` command—you just call `execute_sql_query`. The thing here is that no matter what query you throw at it, the results come back formatted as clean JSON. It makes parsing those data sets a breeze for your agent.

But standard queries aren't enough when you're dealing with meaning. You need semantic search. That’s where `vector_search` comes in play. You feed it a vector embedding, and the tool automatically builds and runs complex SQL to find content that's semantically similar to what you're looking for. It does all the distance calculation stuff so you don't have to.

If your data structure is missing something, you can build it out. You use `create_vector_table` when you need a brand new database table set up specifically to handle vector types and metadata constraints. This function lets you define that entire schema—the vector column and all the supporting info—straight from your prompt.

When you've got data in place, you gotta optimize it for speed. You can add specialized performance indexes using `create_vector_index`. You tell it to build an index like HNSW or ScaNN on an existing table, which dramatically speeds up how fast the database searches huge amounts of data. If you run that and need to know if it's done yet, you check its status with `check_index_status`. This tells you instantly whether the index is 'Built,' still 'InProgress,' or if something went wrong ('Error').

So, you validate connectivity first; then, you build your tables and indexes; next, you run queries using plain SQL or perform deep similarity searches. The whole process—from basic data lookup to complex vector indexing—runs through the same familiar SQL syntax your agent knows. You never have to leave the chat window to manage your entire RAG pipeline.

## Tools

### check_index_status
Checks the build status of a vector index (Built, InProgress, or Error).

### create_vector_index
Adds a specialized performance index to an existing table.

### create_vector_table
Creates a brand new database table that supports vector data types.

### execute_sql_query
Runs any custom SQL query against the MyScale cluster, automatically formatting results as JSON.

### ping_cluster
Verifies that the entire MyScale database cluster is reachable and accepting connections.

### vector_search
Performs a semantic similarity search using a given vector embedding.

## Prompt Examples

**Prompt:** 
```
Check if my MyScale cluster is reachable and credentials are correct.
```

**Response:** 
```
I've pinged the cluster using `ping_cluster`. The connection is successful and the database is responding normally.
```

**Prompt:** 
```
Search the 'articles' table for the top 5 matches for vector [0.12, -0.05, 0.33] where author is 'John'.
```

**Response:** 
```
I'm executing a `vector_search` on the 'articles' table. I've applied the filter for author 'John' and limited the results to 5. Here are the most relevant documents found...
```

**Prompt:** 
```
Create a new vector table called 'product_embeddings' with 1536 dimensions and an 'id' column.
```

**Response:** 
```
I've initiated the `create_vector_table` tool for 'product_embeddings'. The table has been created with a 1536-dimension float array constraint and an additional 'id' column using the MergeTree engine.
```

## Capabilities

### Run arbitrary SQL queries
Execute any standard database query (SELECT, INSERT, ALTER) on the connected MyScale cluster.

### Perform vector similarity searches
Find content most similar to a given data point by automatically constructing and running complex SQL queries.

### Create new tables with vectors
Define an entirely new table structure, including adding the necessary vector column and metadata constraints.

### Build specialized vector indexes
Add performance indices (like HNSW) to existing tables, optimizing search speed for large datasets.

### Monitor index build status
Check the operational state of any vector index to confirm if it is Built, InProgress, or in an Error state.

### Validate cluster connectivity
Confirm that your provided credentials and the MyScale cluster are fully reachable and active.

## Use Cases

### Building a Product Knowledge Base
A data scientist wants to query product specs (structured) and user manuals (unstructured). Instead of running two services, they ask their agent: 'Find the top 3 matching documents for vector [0.12...] in the 'manuals' table, but only show results where the 'product_id' column from the main 'products' table is 'XYZ'.'. The agent executes `vector_search` and then uses the resulting IDs to filter with a structured query via `execute_sql_query`, giving them one unified answer.

### Debugging Index Performance
A backend developer suspects slow search times. They run `check_index_status` on the 'articles' table and see the HNSW index is marked 'Error'. They immediately use `create_vector_index` to rebuild it, then re-run their complex query with `vector_search`, confirming the fix.

### Onboarding New Data Sources
An AI engineer gets a new dataset (e.g., internal HR policies). They first use `create_vector_table` to define the structure and dimensions, then use `execute_sql_query` to load initial metadata records, all before running their first live vector search.

### Verifying Connection Before Deployment
A team lead needs to ensure the staging database is ready for testing. They start by calling `ping_cluster`. If it succeeds, they proceed directly to using `vector_search` because they know the connection layer won't fail.

## Benefits

- **Unified Querying:** You don't need separate tools for structured data and embeddings. Use `vector_search` or `execute_sql_query` to combine both in one prompt, eliminating context switching between services.
- **Zero Setup Time:** Forget writing boilerplate code to manage indexes. Calling `create_vector_index` lets your agent build complex structures like HNSW on demand, making the backend appear simple.
- **Full Schema Control:** When you need a new source of truth, `create_vector_table` handles defining dimensions and constraints automatically. You just describe what's needed in natural language.
- **Operational Safety:** Running `ping_cluster` or `check_index_status` upfront means your agent confirms the database is healthy *before* running costly searches, saving compute time.
- **Advanced RAG Pipelines:** This setup lets you move beyond simple document lookup. You can use `execute_sql_query` to filter results by user ID or date, then pass that filtered list to a subsequent `vector_search` for better context.

## How It Works

The bottom line is you get database control—indexing, querying, and searching—without writing SQL or embedding code yourself.

1. Subscribe to this server and provide your specific MyScale Cluster URL, Username, and Password.
2. Start by running `ping_cluster` or `check_index_status`. This confirms the connection is live and ready for work.
3. Use a single prompt (e.g., 'Find articles about AI governance related to vector [0.12...]') which triggers the agent to run `vector_search` and returns structured results.

## Frequently Asked Questions

**How do I check if my MyScale cluster is ready before running queries using ping_cluster?**
You run `ping_cluster` first. A successful response confirms the connection and credentials are good, meaning your subsequent calls to `vector_search` or `execute_sql_query` should work without authentication errors.

**Can I create a new data table using create_vector_table?**
Yes. You use `create_vector_table` and specify the necessary dimensions (e.g., 1536-dimension float array) in your prompt. The tool handles defining the correct schema constraints.

**What's the difference between vector_search and execute_sql_query?**
`vector_search` is designed specifically for semantic similarity lookups, automatically building distance functions (like cosine). `execute_sql_query` runs any arbitrary SQL statement; use this when you need to run maintenance queries or complex joins that aren't vector-related.

**Do I have to worry about index performance after running a query? How does check_index_status help?**
If your search is slow, the first thing to check is the index. Use `check_index_status` to see if an index needs building or fixing. If it's not 'Built', you need to run `create_vector_index`.

**After I build a table with `create_vector_table`, when should I run `create_vector_index`?**
You must use `create_vector_index` before performing searches. This tool builds the necessary data structure on your new vector column, which is required for fast similarity lookups and efficient querying.

**Does running a general query using `execute_sql_query` always return JSON format?**
Yes, when you run a SELECT query via `execute_sql_query`, the API automatically appends FORMAT JSON to your results. This ensures that all returned data is consistently structured for easy parsing by your agent.

**If my vector index fails, how do I diagnose the problem using `check_index_status`?**
The `check_index_status` tool reports if the build status is 'Error'. If it shows an error, you'll need to review the cluster logs for specific failure messages and adjust your source data or index definition.

**When I use `create_vector_table`, what do I specify regarding dimensions?**
You must define the dimension size when creating a vector table. This process sets up a float array constraint, ensuring that every row you insert has the exact required number of dimensions for your embedding data.

**How can I check if my vector index has finished building?**
Use the `check_index_status` tool. It queries the system tables to show you the current status (Built, InProgress, or Error) for all vector indices in your cluster.

**Can I perform a vector search with metadata filtering?**
Yes! The `vector_search` tool includes an optional `filter` parameter where you can provide a SQL WHERE clause (e.g., "category = 'science'") to restrict your search results.

**What SQL commands are supported by the execute tool?**
The `execute_sql_query` tool supports standard MyScale/ClickHouse SQL, including SELECT, CREATE, ALTER, and INSERT. For SELECT queries, it automatically formats the output as JSON for the agent.