# Amazon Bedrock KB MCP

> Amazon Bedrock KB connects your AI agent directly to AWS Bedrock Knowledge Bases, allowing semantic search and managed Retrieval-Augmented Generation (RAG). It lets you query massive corporate datasets—like S3 buckets or internal documents—by executing vector searches without building custom data pipelines. You get grounded LLM responses by letting your agent access proprietary knowledge exactly where it lives in AWS.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** rag, semantic-search, vector-search, foundation-models, data-retrieval, cloud-infrastructure

## Description

This MCP connects your AI client to Amazon Bedrock Knowledge Bases through six tools for discovery, monitoring, and retrieval. It lets your agent retrieve information directly from private, internal document stores inside AWS. Instead of relying on generic internet data, your agent queries vector indices built from your own documents, such as HR manuals or engineering specs. Bedrock manages the pipeline: it indexes your source files, splits them into chunks, and runs semantic searches when prompted. You don't need to build custom ingestion pipelines; you connect your credentials and start querying. If you're building an agent that must reference specific corporate policies, this MCP gives it access to those documents where they already reside in AWS. Vinkius hosts this capability so you can give your agent reliable, grounded context from day one.

## Tools

### list_knowledge_bases
Lists all Amazon Bedrock knowledge bases available in your account for the configured region.

### get_knowledge_base
Fetches the detailed configuration parameters for a specific AWS Bedrock knowledge base instance.

### retrieve
Executes a pure vector query to pull raw text chunks from the index without generating an answer.
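
This listing does not document the tool's argument names, so the following is only a sketch of the Bedrock `Retrieve` request and response shapes that a tool like this presumably wraps. The helper names (`build_retrieve_request`, `extract_snippets`, `run_retrieve`) are hypothetical; the boto3 call is the public `bedrock-agent-runtime` API.

```python
from typing import Any


def build_retrieve_request(kb_id: str, query: str, top_k: int = 3) -> dict[str, Any]:
    """Build keyword arguments for the bedrock-agent-runtime Retrieve call."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def extract_snippets(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Retrieve response into text, source URI, and relevance score."""
    snippets = []
    for result in response.get("retrievalResults", []):
        location = result.get("location", {})
        snippets.append(
            {
                "text": result.get("content", {}).get("text", ""),
                # Only S3-backed sources carry s3Location; other types differ.
                "source": location.get("s3Location", {}).get("uri"),
                "score": result.get("score"),
            }
        )
    return snippets


def run_retrieve(kb_id: str, query: str, top_k: int = 3) -> list[dict[str, Any]]:
    """Call Bedrock directly; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(**build_retrieve_request(kb_id, query, top_k))
    return extract_snippets(response)
```

No answer is generated at any point; the result is just the matched chunks, which is the distinction between this tool and `retrieve_and_generate`.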

### retrieve_and_generate
Generates a complete, grounded LLM response by first retrieving relevant context and then synthesizing an answer using it.
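
For orientation, here is a minimal sketch of the underlying Bedrock `RetrieveAndGenerate` request, assuming the tool forwards a knowledge base ID, a model ARN, and the question. The function names are hypothetical, and any model ARN you pass is your own choice; the boto3 call is the public `bedrock-agent-runtime` API.

```python
from typing import Any


def build_rag_request(kb_id: str, model_arn: str, question: str) -> dict[str, Any]:
    """Build keyword arguments for the bedrock-agent-runtime RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,  # the Bedrock model that writes the answer
            },
        },
    }


def run_rag(kb_id: str, model_arn: str, question: str) -> str:
    """Call Bedrock directly; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so build_rag_request works without it

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_rag_request(kb_id, model_arn, question)
    )
    return response["output"]["text"]
```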

### list_data_sources
Lists all data sources (such as S3 buckets) currently attached to an Amazon Bedrock Knowledge Base.

### list_ingestion_jobs
Shows the status and history of document syncing operations running through AWS Bedrock's chunking pipelines.
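
As a rough illustration, the sketch below condenses a Bedrock `ListIngestionJobs` response into the fields you would typically care about when checking a sync. It assumes the tool surfaces the same `ingestionJobSummaries` structure as the AWS API; `summarize_jobs` and `needs_attention` are hypothetical helper names.

```python
from typing import Any


def summarize_jobs(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Reduce each ingestion job summary to its ID, status, and document counts."""
    rows = []
    for job in response.get("ingestionJobSummaries", []):
        stats = job.get("statistics", {})
        rows.append(
            {
                "job_id": job.get("ingestionJobId"),
                "status": job.get("status"),
                "new_docs": stats.get("numberOfNewDocumentsIndexed", 0),
                "failed_docs": stats.get("numberOfDocumentsFailed", 0),
            }
        )
    return rows


def needs_attention(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return jobs that failed outright or skipped at least one document."""
    return [r for r in rows if r["status"] == "FAILED" or r["failed_docs"] > 0]
```

A job can finish with status `COMPLETE` and still report failed documents, which is why the second helper checks both fields.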

## Prompt Examples

**Prompt:** 
```
Which knowledge bases and embedding models do I have set up?
```

**Response:** 
```
You have 1 Knowledge Base in your region: 'Internal Wiki KB' (ID: ABCDE12345). It uses the Amazon Titan Text Embeddings V2 model and is active for incoming syncs.
```

**Prompt:** 
```
Run a retrieval query for 'onboarding process checklist' on my KB and show me the top 3 snippets.
```

**Response:** 
```
I retrieved 3 matches from your KB. Segment 1 (from s3://hr-docs/onboarding.pdf) states: 'Ensure HR syncs Slack accounts within 24h...' Segment 2 (from s3://hr-docs/it-protocols.docx) mentions hardware issuance. Segment 3 highlights the welcome email template.
```

**Prompt:** 
```
Check the status of the S3 ingestion job for my Documentation bucket.
```

**Response:** 
```
The ingestion job for Data Source ID XYZ098 on Knowledge Base ABCDE12345 completed successfully today at 08:30. 15 new documents were chunked and mapped to the index without errors.
```

## Capabilities

### Discover available knowledge bases
You check which Amazon Bedrock Knowledge Bases are configured and active in your region.

### Get specific KB details
The agent fetches the explicit configuration parameters for a single, identified Knowledge Base instance.

### Inspect data source connections
You list and inspect which external storage buckets are actively feeding data into your knowledge base.

### Monitor sync status
The system tracks the real-time operational status of document ingestion jobs, confirming chunking pipelines completed without errors.

### Perform vector queries
Your agent runs a precise query against the vector index to pull back the top text chunks and their source URLs.

### Generate grounded responses
The MCP combines retrieval and generation, producing an LLM answer that cites the internal documents it drew from.

## Use Cases

### Onboarding HR Policies
An HR specialist asks their agent: 'What's the policy for remote work equipment?' The agent uses `list_knowledge_bases` to select the Policy KB, then runs a query via `retrieve_and_generate`. It returns an answer citing the exact section in the company handbook and listing required forms.

### Debugging Data Sync
An operations engineer notices stale data. They use `list_ingestion_jobs` to check the Documentation bucket's sync job, confirming whether it completed successfully at 08:30 and how many new documents were processed.

### Retrieving Source Code Context
A developer needs to know the specific AWS service parameters for a legacy system. They run `list_data_sources` to confirm the correct S3 bucket, then use `retrieve` to pull the precise configuration text chunks needed for coding.

### Validating System Scope
A cloud architect needs to know what knowledge bases exist. They call `list_knowledge_bases`, which shows all active KBs and their associated embedding models, so they can confirm what is in scope before designing around it.

## Benefits

- You eliminate custom vector pipeline development. By using this MCP, your agent queries massive corporate datasets directly where they sit in AWS.
- The `list_ingestion_jobs` tool lets you track sync status in real time; you know immediately if new documents are being chunked and mapped correctly.
- Instead of simple keyword searches, the `retrieve_and_generate` function performs semantic retrieval, understanding context to give accurate answers.
- You can audit your connections using `list_data_sources`; this confirms exactly which S3 buckets are feeding knowledge into the system.
- `get_knowledge_base` gives you explicit control over the KB's configuration; you see the assigned embedding models and boundaries.
- If you only need the source material, use `retrieve`. This function pulls the top-K text chunks and their original document URLs without attempting to write an answer.

## How It Works

You connect your existing AWS resources, such as S3 buckets, and start querying them through your agent client.

1. Subscribe to the MCP and provide your AWS IAM Role or User Access Credentials.
2. Configure your agent client to use the MCP with these credentials; this connects it to Bedrock's services.
3. Invoke a retrieval tool. It runs the semantic search against your attached data sources and returns either raw text chunks (`retrieve`) or a grounded answer (`retrieve_and_generate`).
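
Agent clients normally send tool invocations for you, but it can help to see what one looks like on the wire. MCP tool calls are JSON-RPC 2.0 messages using the `tools/call` method. The sketch below builds one for `retrieve`; the argument names (`knowledge_base_id`, `query`, `top_k`) are hypothetical, since this listing does not publish the tool's input schema.

```python
import json
from typing import Any


def tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names are assumptions; check the schema the server reports via tools/list.
request = tool_call(
    1,
    "retrieve",
    {
        "knowledge_base_id": "ABCDE12345",
        "query": "onboarding process checklist",
        "top_k": 3,
    },
)
print(json.dumps(request, indent=2))
```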

## Frequently Asked Questions

**How do I know what knowledge bases are available? (using list_knowledge_bases)**
You run `list_knowledge_bases`. This command immediately lists all the Amazon Bedrock KBs configured in your account, so you can pick the correct one for your query.

**Do I need to worry about data sync status? (using list_ingestion_jobs)**
Yes. The `list_ingestion_jobs` tool lets you check the real-time status of all chunking pipelines, ensuring your source documents are fully mapped before querying.

**What's the difference between retrieve and retrieve_and_generate? (using both)**
`retrieve` only pulls raw text snippets from the vector index; it doesn't write an answer. Use `retrieve_and_generate` when you need a complete, synthesized response grounded in those documents.

**How do I see what data sources are attached to my KB? (using list_data_sources)**
Use the `list_data_sources` tool. It provides an explicit list of all external storage buckets, confirming exactly where your knowledge base pulls its information from.

**When I run a query using the `retrieve` tool, does it enforce specific AWS security policies or IAM roles?**
Yes, the operation runs strictly under your provided AWS credentials. This means the retrieval is limited to only those data sources and knowledge bases your role has explicit read permissions for.
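
If you want the credentials to grant no more than these six tools need, a read-only IAM policy along the following lines is one starting point. This is a sketch under assumptions: the action names come from the Bedrock IAM namespace, but which actions support resource-level scoping should be verified against the AWS service authorization reference, and `read_only_kb_policy` is a hypothetical helper.

```python
import json
from typing import Any


def read_only_kb_policy(kb_arn: str, model_arn: str) -> dict[str, Any]:
    """Build an IAM policy document limited to read and query actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DiscoverAndGenerate",
                "Effect": "Allow",
                # Assumed not to support resource-level scoping; verify.
                "Action": ["bedrock:ListKnowledgeBases", "bedrock:RetrieveAndGenerate"],
                "Resource": "*",
            },
            {
                "Sid": "ReadOneKnowledgeBase",
                "Effect": "Allow",
                "Action": [
                    "bedrock:GetKnowledgeBase",
                    "bedrock:ListDataSources",
                    "bedrock:ListIngestionJobs",
                    "bedrock:Retrieve",
                ],
                "Resource": kb_arn,
            },
            {
                "Sid": "InvokeAnswerModel",
                "Effect": "Allow",
                # Needed when retrieve_and_generate calls the answering model.
                "Action": ["bedrock:InvokeModel"],
                "Resource": model_arn,
            },
        ],
    }


def policy_json(kb_arn: str, model_arn: str) -> str:
    """Serialize the policy for pasting into the IAM console or IaC templates."""
    return json.dumps(read_only_kb_policy(kb_arn, model_arn), indent=2)
```

Nothing in this policy allows creating, updating, or deleting knowledge bases, or starting ingestion jobs.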

**Before running complex queries, how can I use `get_knowledge_base` to confirm the setup parameters of my Bedrock Knowledge Base?**
This tool returns the KB's core configuration details. You can verify things like the assigned embedding model and regional settings before attempting any retrieval.
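
A minimal sketch of that check, assuming the tool returns the same structure as the Bedrock `GetKnowledgeBase` API (a top-level `knowledgeBase` object). The helper name `describe_kb` is hypothetical.

```python
from typing import Any


def describe_kb(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the fields worth verifying before running a retrieval."""
    kb = response.get("knowledgeBase", {})
    vector_cfg = kb.get("knowledgeBaseConfiguration", {}).get(
        "vectorKnowledgeBaseConfiguration", {}
    )
    return {
        "id": kb.get("knowledgeBaseId"),
        "name": kb.get("name"),
        "status": kb.get("status"),
        "embedding_model_arn": vector_cfg.get("embeddingModelArn"),
        # The vector store backing the KB, as reported by the API.
        "storage_type": kb.get("storageConfiguration", {}).get("type"),
    }


def is_queryable(summary: dict[str, Any]) -> bool:
    """Treat a KB as ready only when it is ACTIVE and has an embedding model."""
    return summary["status"] == "ACTIVE" and bool(summary["embedding_model_arn"])
```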

**If I use `list_data_sources`, can I confirm if the underlying document format is compatible with chunking?**
The tool returns metadata about each attached source. Use it to identify which bucket to inspect, then check that the files there are in formats Bedrock's ingestion pipeline supports.

**If my queries fail, how can I use `list_knowledge_bases` to identify all available KBs and catch potential ID errors?**
Running this tool gives you a definitive list of every KB in your region. Comparing the listed IDs against your query ensures you're targeting the correct resource.
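
The comparison can be sketched as follows, assuming results arrive in pages with a `nextToken` as they do in the Bedrock `ListKnowledgeBases` API. The helpers `iter_knowledge_bases` and `find_kb` are hypothetical; `list_page` is any callable that returns one page, such as a boto3 `bedrock-agent` client's `list_knowledge_bases` method.

```python
from typing import Any, Callable, Iterator, Optional


def iter_knowledge_bases(
    list_page: Callable[..., dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield every knowledge base summary, following nextToken across pages."""
    token = None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = list_page(**kwargs)
        yield from page.get("knowledgeBaseSummaries", [])
        token = page.get("nextToken")
        if not token:
            break


def find_kb(
    list_page: Callable[..., dict[str, Any]], kb_id: str
) -> Optional[dict[str, Any]]:
    """Return the summary whose ID matches exactly, or None if it is absent."""
    return next(
        (kb for kb in iter_knowledge_bases(list_page) if kb["knowledgeBaseId"] == kb_id),
        None,
    )
```

A `None` result for an ID you expected to exist usually points to a typo or to the KB living in a different region.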

**Can my AI agent directly run RAG without calling external LLMs?**
Yes. Use the `retrieve_and_generate` capability. Your agent passes the query and a designated Bedrock model ARN. Bedrock fetches chunks from the knowledge base's vector index and synthesizes the final answer inside AWS boundaries, returning a grounded response in a single call.

**How can I check whether newly uploaded documents were indexed successfully?**
Ask your agent to list ingestion jobs for a specific Knowledge Base ID and Data Source ID. It reports the status of each job (e.g., IN_PROGRESS, COMPLETE, FAILED) as chunks are mapped to your vector index.

**Can I see exactly where an answer came from in my documentation?**
Absolutely. Both `retrieve` and `retrieve_and_generate` return the origin document URLs (e.g., S3 paths) and expose the raw text snippets that matched your query.
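
For a generated answer, the source documents sit inside the `citations` list of the Bedrock `RetrieveAndGenerate` response. The sketch below collects them, assuming the tool passes that structure through unchanged; `cited_sources` is a hypothetical helper name.

```python
from typing import Any


def cited_sources(response: dict[str, Any]) -> list[str]:
    """Collect the distinct S3 URIs cited in a generated answer, in order."""
    sources: list[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            # Skip non-S3 references and avoid listing a document twice.
            if uri and uri not in sources:
                sources.append(uri)
    return sources
```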