# GroundX MCP

> GroundX is an MCP for connecting your AI agent directly to private enterprise data stores. It lets you index large numbers of documents, from local files or entire websites, and run them through a Retrieval-Augmented Generation (RAG) pipeline. Instead of generic web searches, this tool gives your AI client access to your company's specific knowledge, turning unstructured PDFs and internal wikis into actionable context for answering questions.

## Overview
- **Category:** knowledge-management
- **Price:** Free
- **Tags:** rag-as-a-service, data-search, document-retrieval, enterprise-data, semantic-indexing, llm-context

## Description

Your AI agent needs more than just general internet knowledge; it needs to know what lives inside your organization. That’s where GroundX comes in. This MCP lets you treat all your private data—PDFs, manuals, support tickets, everything—as a searchable source of truth for your LLM calls. You can feed documents into the system via URLs or local paths and then tell the agent to search across them. Need to keep track of what's indexed? The toolset lets you list all content buckets and check ingestion progress with status checks. If you build agents using Vinkius, this MCP gives your client a dedicated pathway to your proprietary knowledge base. It turns raw data into context, allowing the agent to answer questions based on *your* policies and *your* product specs, not just general training data.

## Tools

### create_bucket
Sets up a new container where you can store and categorize documents.

### create_group
Organizes multiple buckets into a single logical collection for easier management.

### get_customer_info
Retrieves specific account and customer details needed for context during searches.

### get_ingest_status
Checks if a document ingestion task is finished processing or still running.

### ingest_documents
Loads documents into the system using provided URLs or paths.

### ingest_website
Crawls and pulls all textual content from a given website address.

### list_buckets
Shows you every container (bucket) you have set up for storing documents.

### list_content
Lists all the individual documents that have been successfully indexed into the system.

### list_groups
Displays all organized groupings of buckets you've created.

### list_workflows
Lists the predefined pipelines used to manage complex retrieval and indexing processes (RAG workflows).

### search_content
Runs a semantic (conceptual) search across your indexed knowledge and returns the most relevant chunks of text.

### search_documents
Finds specific documents by querying their metadata or content directly.

## Prompt Examples

**Prompt:** 
```
List all my GroundX data buckets.
```

**Response:** 
```
You have 2 active buckets: 'Knowledge Base' (ID: 101) and 'Support Docs' (ID: 102).
```

**Prompt:** 
```
Search for 'refund policy' in bucket 102.
```

**Response:** 
```
According to the 'Support Docs', refunds are processed within 5-7 business days upon request.
```

**Prompt:** 
```
Check the document count in bucket 101.
```

**Response:** 
```
Bucket 101 ('Knowledge Base') currently contains 14,502 indexed documents.
```

## Capabilities

### Feed Data from Files
You can send documents by URL or local path for the system to ingest.

### Index Entire Websites
The MCP crawls a given website and ingests all the content it finds there.

### Find Specific Records
You can perform semantic searches across all ingested data to find relevant chunks of text.

### Locate Documents by Metadata
You can search for specific files by their content or associated metadata.

### Manage Data Containers
You can list and create the dedicated containers (buckets) where different sets of documents are stored.

## Use Cases

### Handling a Product Inquiry
A customer service agent needs to answer questions about warranty changes. Instead of guessing, they call the GroundX MCP's `search_content` tool, pointing it at the 'Warranty Policy' bucket, ensuring the AI uses only the most current company documents for its response.

### Onboarding New Employees
A new employee needs to know the internal HR policy. The agent calls `ingest_documents` with the latest handbook PDFs, then uses `get_ingest_status` and `list_content` to confirm all relevant source material is indexed before answering.

### Analyzing Competitor Data
A market analyst wants to compare internal specs against public data. They use `ingest_website` on competitor sites, then run a targeted search using `search_documents` to pull out specific features for comparison.

### Debugging an Agent Flow
A developer needs to know which documents are available to the agent. They first use `list_buckets` and then check `list_content` to confirm that all required data sources were uploaded correctly before testing.
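
The check in this debugging flow can be sketched as a simple set difference. This is a minimal illustration, assuming `list_content` returns records that carry a `file_name` field; that field name and the sample records below are hypothetical and not taken from the GroundX API.

```python
def missing_sources(indexed, required):
    """Return the required file names that are absent from the indexed records."""
    present = {doc.get("file_name") for doc in indexed}
    return set(required) - present


# Hypothetical stand-in for a parsed `list_content` result.
indexed_docs = [
    {"file_name": "warranty-policy.pdf"},
    {"file_name": "hr-handbook.pdf"},
]
required_docs = {"warranty-policy.pdf", "hr-handbook.pdf", "pricing.pdf"}

missing = missing_sources(indexed_docs, required_docs)
print(sorted(missing))  # prints ['pricing.pdf']
```

An empty result means every required source is present, so testing can start; anything else names the files that still need to be ingested.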

## Benefits

- Stop relying on general web knowledge. By using `ingest_documents` or `ingest_website`, your agent searches only the documents you control, making answers specific and reliable.
- You don't have to rebuild indexing logic every time. `list_workflows` shows the predefined RAG pipelines that are already available, so you can reuse them instead of writing custom code.
- Need context about a client? Use `get_customer_info` within your agent’s prompt flow. This gives the AI relevant account details when it generates an action or answer.
- When you need to know if the data is ready, use `get_ingest_status`. It tells you exactly where the process stands—finished, failed, or still processing.
- GroundX lets you structure your knowledge by creating buckets with `create_bucket` and then grouping them with `create_group`, so related documents can be searched together. Use `list_groups` to review the groups you have set up.

## How It Works

The bottom line is you tell your AI agent where the data lives and what questions it needs to answer, and this MCP handles the rest of the indexing and retrieval work.

1. First, call `create_bucket` to set up the content buckets that define your data structure.
2. Next, feed the MCP with data, either by running `ingest_documents` on files or by using `ingest_website` to crawl a site. `list_workflows` shows the RAG pipelines available for processing.
3. Finally, send search queries that trigger semantic searches across all indexed content.
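
The three steps above can be sketched as the requests an MCP client would send. MCP clients invoke tools through the JSON-RPC `tools/call` method; the tool names come from this document, while every argument name (`name`, `bucket_id`, `documents`, `query`) and the example URL are assumptions made for illustration, so check each tool's published input schema before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request for a single tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and values below are hypothetical placeholders.
requests = [
    # Step 1: create the container for the documents.
    tool_call("create_bucket", {"name": "Support Docs"}),
    # Step 2: ingest a document by URL.
    tool_call(
        "ingest_documents",
        {"bucket_id": 102, "documents": ["https://example.com/refund-policy.pdf"]},
    ),
    # Step 3: run a semantic search over the indexed content.
    tool_call("search_content", {"bucket_id": 102, "query": "refund policy"}),
]

for request in requests:
    print(json.dumps(request))
```

In practice the AI client builds these requests for you from a natural-language prompt; the sketch only shows what travels over the wire.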

## Frequently Asked Questions

**How do I get my documents into GroundX using `ingest_documents`?**
You provide the tool with the URLs or local file paths. The MCP handles the actual loading and indexing process for you, so you don't have to write any upload scripts.

**What is the difference between `search_content` and `search_documents`?**
`search_content` performs a conceptual search across everything indexed. `search_documents` finds specific documents by querying their metadata or content directly, which helps when you already know roughly which file you need.

**Can I crawl an entire website using `ingest_website`?**
Yes, `ingest_website` crawls the specified URL and pulls all textual content it finds. This is much faster than manual copy-pasting or uploading dozens of individual pages.

**How do I ensure my data is ready before searching?**
Check the processing status with `get_ingest_status`. Documents that are still processing may not appear in search results yet, so wait until the tool reports that the ingestion task has finished before you query.

**What is the best way to organize my data sources using `create_bucket` and `list_buckets`?**
You first call `list_buckets` to see all existing containers. Then, use `create_bucket` to isolate new document sets by topic or department. This keeps your knowledge base structured and easy for the agent to target.
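
The list-then-create pattern can be sketched as a small helper that avoids creating duplicate buckets. This is an illustration only: `call_tool` stands for whatever function your MCP client exposes for invoking a tool, and the response shapes (a `buckets` list with `id` and `name` fields) and the `name` argument are assumptions. `FakeGroundX` is a hypothetical in-memory stand-in so the example runs without a server.

```python
def ensure_bucket(call_tool, name):
    """Return the bucket called `name`, creating it only if it is missing."""
    existing = call_tool("list_buckets", {}).get("buckets", [])
    for bucket in existing:
        if bucket.get("name") == name:
            return bucket
    return call_tool("create_bucket", {"name": name})


class FakeGroundX:
    """In-memory stand-in for an MCP client (hypothetical response shapes)."""

    def __init__(self):
        self.buckets = [{"id": 101, "name": "Knowledge Base"}]

    def __call__(self, tool, arguments):
        if tool == "list_buckets":
            return {"buckets": list(self.buckets)}
        if tool == "create_bucket":
            bucket = {"id": 101 + len(self.buckets), "name": arguments["name"]}
            self.buckets.append(bucket)
            return bucket
        raise ValueError(f"unexpected tool: {tool}")


client = FakeGroundX()
first = ensure_bucket(client, "Support Docs")   # not found, so it is created
second = ensure_bucket(client, "Support Docs")  # found, so nothing is created
print(first == second, len(client.buckets))  # prints True 2
```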

**How can I see which automated data pipelines are defined with `list_workflows`?**
`list_workflows` lists the RAG pipelines defined within GroundX. This lets you confirm that the document processing workflow you expect exists before attempting a search.

**How can I monitor a large data upload job and confirm it finished processing with `get_ingest_status`?**
Use `get_ingest_status` to poll the ingestion job's status. It provides a definitive confirmation of whether your document task succeeded, failed, or is still running in the background.
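
A polling loop for this could look like the sketch below. It assumes a `call_tool` function supplied by your MCP client, a `process_id` argument, and the status strings `"complete"`, `"failed"`, and `"processing"`; all of these are hypothetical placeholders for whatever `get_ingest_status` actually accepts and returns. The scripted fake at the bottom exists only so the example runs on its own.

```python
import time

# Hypothetical status strings; adjust to what the tool really reports.
TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_ingest(call_tool, process_id, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `get_ingest_status` until the job ends or the timeout expires."""
    deadline = clock() + timeout
    while True:
        result = call_tool("get_ingest_status", {"process_id": process_id})
        status = result.get("status", "unknown")
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"ingest {process_id} still '{status}' after {timeout}s"
            )
        sleep(interval)


# Scripted fake: two "processing" replies, then "complete".
_responses = iter(["processing", "processing", "complete"])
calls = []


def fake_call_tool(tool, arguments):
    calls.append((tool, arguments))
    return {"status": next(_responses)}


final = wait_for_ingest(fake_call_tool, "job-1", sleep=lambda _seconds: None)
print(final, len(calls))  # prints complete 3
```

The timeout keeps an agent from waiting forever on a stalled job, and returning `"failed"` rather than raising leaves the retry decision to the caller.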

**After running an ingest job, what does `list_content` show me about my documents?**
It shows a full manifest of every indexed document. This list provides critical details like timestamps and source file names, confirming exactly what data is available for your agent to search.

**How do I query my indexed documents?**
Simply ask the AI agent to search for a specific term or concept, and it will query the GroundX API to retrieve the most relevant textual chunks.

**Can I manage data buckets from the agent?**
Yes, you can list your active buckets, check their document count, and verify index status.

**Does it support adding new files to a bucket?**
Yes. Use `ingest_documents` to load files from URLs or local paths, or `ingest_website` to crawl a site, then confirm the job finished with `get_ingest_status`.