watsonx Discovery MCP. Query enterprise data collections using plain English.
Works with every AI agent you already use, and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
watsonx Discovery MCP Server connects your AI client to a cognitive search engine for complex, unstructured data. It lets you query large document repositories using natural language or specialized query languages (DQL), retrieving semantic insights and metadata from massive datasets.
What your AI agents can do
Get component settings
Retrieves the configuration and health status for all project components.
Get document details
Fetches specific metadata, technical details, and ingestion status for a single indexed document.
List available enrichments
Lists all NLP models (like Sentiment or Entities) currently configured to process your documents.
Query discovery content
Performs natural language or Discovery Query Language (DQL) queries against specified data collections.
List discovery collections
Lists all available data collections within your project, providing the necessary IDs for querying.
watsonx Discovery MCP Server: 6 Tools for Enterprise Search
Master document retrieval, metadata analysis, and complex querying across all your enterprise data collections.
get_component_settings
Retrieves the configuration and health status for all project components.
get_document_details
Fetches specific metadata, technical details, and ingestion status for a single indexed document.
list_available_enrichments
Lists all NLP models (like Sentiment or Entities) currently configured to process your documents.
list_collection_documents
Generates a list of every document ID contained within a specified data collection.
list_discovery_collections
Lists all available data collections within your watsonx Discovery project.
query_discovery_content
Performs a natural language or DQL query against a specified discovery collection, given its ID and the query text.
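The descriptions above imply which inputs each tool needs. Here is a minimal Python sketch of that mapping, useful for catching a missing ID before a call goes out. The argument names `collection_id` and `query_text` come from the example call in The Tradeoffs section; `document_id`, the rest of the table, and the helper `missing_arguments` are illustrative assumptions, not the server's published schema.

```python
# Assumed required arguments per tool, inferred from the tool descriptions.
# The real input schemas may differ; check the server's own tool listing.
REQUIRED_ARGS = {
    "get_component_settings": [],
    "get_document_details": ["document_id"],
    "list_available_enrichments": [],
    "list_collection_documents": ["collection_id"],
    "list_discovery_collections": [],
    "query_discovery_content": ["collection_id", "query_text"],
}


def missing_arguments(tool: str, arguments: dict) -> list[str]:
    """Return the assumed-required argument names that are absent or empty."""
    if tool not in REQUIRED_ARGS:
        raise ValueError(f"unknown tool: {tool}")
    return [name for name in REQUIRED_ARGS[tool] if not arguments.get(name)]


if __name__ == "__main__":
    # A query without a collection ID is flagged before it is sent.
    print(missing_arguments("query_discovery_content", {"query_text": "Q3 sales"}))
```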
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with watsonx Discovery, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Look, this MCP Server connects your AI client straight into watsonx Discovery. It gives you a cognitive search engine for unstructured data—the kind of stuff buried in massive document repositories. You don't have to manually dig through some clunky console dashboard; you just ask natural language questions or run precise queries using the specialized Discovery Query Language (DQL).
It treats your whole collection of documents like one big, searchable knowledge base.
When you connect it, query_discovery_content lets your agent run a natural language or DQL query against a specific data collection: you pass the collection ID and the query text. This function is how you pull semantic insights and metadata out of huge datasets. But before you run that query, you need to know what collections exist. Use list_discovery_collections to see every available data collection within your watsonx Discovery project; this gives you the IDs you need for querying.
If you've got an ID already, you can check which documents belong to it. list_collection_documents generates a full list of every document ID inside that specified data collection. Once you have those IDs, if you want the deep details on any single file—like its technical specs, ingestion status, or comprehensive metadata—you run get_document_details.
This pulls all the specific info for one indexed document.
You wanna know what models are running on your documents? You can use list_available_enrichments to see every Natural Language Processing (NLP) model configured to process your files. These include things like Sentiment analysis or Entity extraction, which enrich your data before you even query it. To make sure the whole project is actually working right, get_component_settings retrieves the operational configuration and health status for every single component in your Discovery setup.
It's a full diagnostic suite. You use these tools to understand what data you have (list_discovery_collections), confirm which documents are present (list_collection_documents), check how healthy the system is (get_component_settings), see what processing models are active (list_available_enrichments), and finally, run a query or check specific file metadata using query_discovery_content or get_document_details.
You're not guessing; you've got the full operational picture.
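The inventory part of that workflow (collections first, then the documents in each) can be sketched in Python. This is only an illustration: `call_tool` stands for whatever function your MCP client exposes for invoking a tool, and the result shapes (a list of dicts with a `collection_id` key, a list of document ID strings) are assumptions, as is the made-up data in `fake_call_tool`.

```python
from typing import Any, Callable

# call_tool takes (tool_name, arguments) and returns the tool's result;
# a real MCP client session would supply it.
CallTool = Callable[[str, dict], Any]


def build_inventory(call_tool: CallTool) -> dict[str, list[str]]:
    """Map each collection ID to the document IDs it contains."""
    inventory = {}
    # Assumed result shape: a list of dicts with a "collection_id" key.
    for collection in call_tool("list_discovery_collections", {}):
        cid = collection["collection_id"]
        # Assumed result shape: a list of document ID strings.
        inventory[cid] = call_tool("list_collection_documents", {"collection_id": cid})
    return inventory


def fake_call_tool(tool: str, arguments: dict) -> Any:
    """Stand-in backend with made-up data so the sketch runs offline."""
    documents = {"col-legal": ["doc-1", "doc-2"], "col-sales": ["doc-3"]}
    if tool == "list_discovery_collections":
        return [{"collection_id": cid} for cid in documents]
    if tool == "list_collection_documents":
        return documents[arguments["collection_id"]]
    raise ValueError(f"unexpected tool: {tool}")


if __name__ == "__main__":
    print(build_inventory(fake_call_tool))
```

Passing the caller in as a parameter keeps the sketch independent of any particular client library.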
How watsonx Discovery MCP Works
1. Subscribe to this server, providing your watsonx URL, API Key, and Project ID.
2. Your AI agent connects to the endpoint and is ready for a query prompt (e.g., 'List all my Discovery collections').
3. The agent executes the appropriate tool call (list_discovery_collections) and returns the structured data results to you.
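At the protocol level, the tool call in step 3 is a JSON-RPC 2.0 message with the method `tools/call`, as defined by the Model Context Protocol. Your MCP client normally builds this for you; the Python sketch below only shows the shape of the message. The helper name `tool_call_request` is illustrative.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # list_discovery_collections is described as needing no inputs,
    # so the arguments object is left empty here (an assumption).
    print(tool_call_request(1, "list_discovery_collections", {}))
```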
The bottom line is, your AI client becomes a direct interface to complex enterprise data, eliminating the need for manual console navigation.
Who Is watsonx Discovery MCP For?
Built for Data Scientists who struggle to validate query inputs, Knowledge Analysts who need rapid context from vast document archives, and Enterprise Developers building grounded applications that require semantic search against proprietary datasets.
Uses list_available_enrichments to audit what metadata models are running, then uses query_discovery_content to surface answers from document repositories.
Routinely tests and refines DQL queries using query_discovery_content, monitoring data flow by calling get_component_settings.
Implements the core search logic by chaining calls: first, running list_collection_documents to find IDs, then using those IDs in a query.
What Changes When You Connect
- Instant Data Inventory: Instead of navigating multiple console tabs, use list_discovery_collections to get a quick list of all your accessible data sources. You know exactly what you're querying against immediately.
- Deep Context Retrieval: The query_discovery_content tool handles complex semantic searches. It doesn't just find keywords; it finds the actual answer buried across massive, unstructured datasets.
- Auditability on Demand: Need to verify if your documents are processed correctly? Use get_document_details to pull comprehensive metadata and confirm the ingestion status of any specific file ID.
- Know Your Pipes (Health): Keep your data pipeline running by using get_component_settings. This checks project-level configurations and notices, letting you fix issues before they break a query.
- Full Visibility into Processing: Don't guess what enrichment is happening. Call list_available_enrichments to see exactly which NLP models (Sentiment, Entities, etc.) are active on your data.
Real-World Use Cases
Finding the source of truth for a policy change
A Knowledge Analyst needs to know how termination clauses changed in 2023. They use list_discovery_collections first, identifying 'Legal Documents'. Then, they execute a targeted query using query_discovery_content with DQL against that collection ID, retrieving the most relevant document snippet and its full metadata via get_document_details.
Verifying data readiness for an AI app
An Enterprise Developer needs to build a new grounded application. They start by calling list_available_enrichments to confirm Sentiment analysis is active. Next, they use get_component_settings to ensure every project component is healthy before writing any code.
Debugging data flow issues
A Data Scientist suspects a document wasn't indexed properly. They first call list_collection_documents to get the target ID, then use get_document_details on that specific ID. If the status isn't 'Completed', they know where to focus their fix.
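The status check in this scenario could look like the Python sketch below. It assumes get_document_details returns a dict with a `status` key; that key name and the sample records are hypothetical, while the 'Completed' value is the one named above.

```python
def needs_attention(details: dict) -> bool:
    """True when the ingestion status is anything other than 'Completed'."""
    # "status" is an assumed key name; a missing key also counts as not ready.
    return details.get("status") != "Completed"


def unfinished_documents(details_by_id: dict[str, dict]) -> list[str]:
    """Return the IDs of documents whose ingestion has not completed."""
    return [doc_id for doc_id, details in details_by_id.items() if needs_attention(details)]


if __name__ == "__main__":
    # Made-up results, one per get_document_details call.
    sample = {
        "doc-1": {"status": "Completed"},
        "doc-2": {"status": "Processing"},
        "doc-3": {},
    }
    print(unfinished_documents(sample))
```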
Mapping all available data sets
A Product Team is scoping a new feature and needs a full list of sources. They run list_discovery_collections to map the scope, then call list_collection_documents on the most promising collection ID to get sample document IDs for testing.
The Tradeoffs
Querying without a Collection ID
Running query_discovery_content with just text like 'What were the Q3 sales?' fails because the system doesn't know which dataset to search.
→
First, always call list_discovery_collections. Identify your target collection ID (e.g., 'col-sales'). Then, run the query using the correct structure: query_discovery_content(collection_id='col-sales', query_text='What were the Q3 sales?').
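The two-step pattern can be sketched in Python as: resolve the collection by name, then build the query arguments. The `collection_id` and `query_text` argument names come from the call shown above. The result shape of list_discovery_collections (dicts with `collection_id` and `name`) and both helper names are assumptions made for illustration.

```python
def resolve_collection_id(collections: list[dict], name: str) -> str:
    """Find exactly one collection whose name matches, ignoring case."""
    matches = [c["collection_id"] for c in collections if c["name"].lower() == name.lower()]
    if len(matches) != 1:
        raise LookupError(f"expected one collection named {name!r}, found {len(matches)}")
    return matches[0]


def query_arguments(collections: list[dict], name: str, question: str) -> dict:
    """Build arguments for query_discovery_content, refusing to guess the ID."""
    return {
        "collection_id": resolve_collection_id(collections, name),
        "query_text": question,
    }


if __name__ == "__main__":
    # Made-up output of list_discovery_collections.
    collections = [
        {"collection_id": "col-sales", "name": "Sales"},
        {"collection_id": "col-legal", "name": "Legal Documents"},
    ]
    print(query_arguments(collections, "sales", "What were the Q3 sales?"))
```

Raising an error when no collection (or more than one) matches keeps the agent from silently querying the wrong dataset.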
Assuming data is indexed
Relying on a document containing critical information, but never confirming its status. The AI agent pulls nothing back.
→
Before querying, use get_document_details(document_id='...'). This confirms the document's ingestion status and metadata, ensuring the data is actually ready for retrieval.
Overlooking processing models
Getting a search result that seems vague or generic because key context wasn't extracted. The answer is shallow.
→
Call list_available_enrichments first. This confirms if necessary NLP tools (like Entity Extraction) are active on the data, making your subsequent queries far more precise.
When It Fits, When It Doesn't
Use this server when your primary need is to interrogate large volumes of unstructured or semi-structured text—think legal contracts, support tickets, research papers, etc. The core action is always retrieval based on content understanding.
Don't use it if:
1. Your data lives in a highly structured relational database (SQL). Use a dedicated SQL connector instead.
2. You need to change the data (write/update records). This server is read-only for retrieval and configuration lookup, not modification.
3. You only need metadata about a small, known set of files. While get_document_details helps, if you just want a list of file names without status, another simple listing tool might suffice. Use this when you need the context (metadata + content) together.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by watsonx Discovery. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Finding specific data points shouldn't involve switching between five different console tabs.
Today, if a team needs to answer a question about company history or legal policy, they have to manually navigate the IBM Cloud portal. They click into the Discovery project, then select a collection from a dropdown list. If that fails, they must check the component status page, and if all else fails, they copy an ID and paste it somewhere else just to see if it's indexed.
With this MCP server, your AI agent handles the entire flow. You ask, 'What are our Q3 compliance requirements?' The agent automatically calls `list_discovery_collections` for context, then runs `query_discovery_content`, and delivers a direct answer without you touching a single dashboard.
Using the watsonx Discovery MCP Server gives you control over the entire data lifecycle.
Before, monitoring your data was reactive. You'd find out an enrichment failed or that a component was degraded by checking specific health dashboards—a time-consuming manual audit of multiple links. The agent makes this proactive: one prompt calls `get_component_settings` and instantly reports the full project status.
This shift means you move from being a data consumer who waits for reports to a true knowledge worker who can interrogate the system in real-time. It's immediate, actionable intelligence.
Common Questions About watsonx Discovery MCP
How do I find out what collections I have? (list_discovery_collections)
You use list_discovery_collections. This tool quickly lists all data collections in your project, giving you the necessary IDs to start querying.
What if my query fails because of permissions? (get_component_settings)
Check the system first by calling get_component_settings. This tool verifies the operational configuration and health settings for all project components, helping you spot access or setup errors.
How do I know if a document is fully processed? (get_document_details)
Run get_document_details on the specific document ID. It returns metadata and explicitly shows the ingestion status, letting you confirm it's ready for retrieval.
Can I list all documents in a collection? (list_collection_documents)
Yes, use list_collection_documents. You provide the collection ID, and this tool returns a comprehensive list of every document contained within it.
What do I need to know about available models when using `list_available_enrichments`?
The tool lists all configured NLP enrichments for your project. These names indicate which specific models—like Sentiment or Entity Extraction—have been applied during the document ingestion process.
When should I use DQL versus natural language when calling `query_discovery_content`?
You can use either a structured Discovery Query Language (DQL) query or plain English text. Use DQL for precise, repeatable searches, and natural language for quick, semantic questions against your data.
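To make the contrast concrete, here is a hedged Python sketch that builds one natural language query and one DQL query for the same collection. It assumes both forms are passed in the same `query_text` argument, which the tool description suggests but does not confirm. The DQL form uses the `field:"value"` contains operator from IBM's Discovery query language; the field name `text`, the collection ID, and the helper `dql_contains` are illustrative.

```python
def dql_contains(field: str, value: str) -> str:
    """Build a `field:"value"` expression, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


# Natural language: quick, semantic, phrased as a question.
natural_language_args = {
    "collection_id": "col-legal",  # hypothetical ID
    "query_text": "How did termination clauses change in 2023?",
}

# DQL: precise and repeatable; the field must exist in your collection.
dql_args = {
    "collection_id": "col-legal",  # hypothetical ID
    "query_text": dql_contains("text", "termination clause"),
}

if __name__ == "__main__":
    print(natural_language_args["query_text"])
    print(dql_args["query_text"])
```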
How do I check the general operational status of my project with `get_component_settings`?
Running get_component_settings displays the configuration and health metrics for every component in your watsonx Discovery project. This helps you quickly diagnose if ingestion pipelines or data sources are running optimally.
Are there rate limits I should worry about when frequently using `query_discovery_content`?
While the server handles high volume, excessive querying may hit project-level rate limits. For large-scale batch processing, consider chaining your queries or utilizing a dedicated data export tool.
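One common way to stay under a rate limit is to space out retries with exponential backoff. The Python sketch below only computes the wait times; the defaults are arbitrary, and nothing here reflects documented limits for this server or for watsonx Discovery.

```python
from typing import Iterator


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> Iterator[float]:
    """Yield one wait time in seconds per retry, doubling each time up to `cap`."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= 2


if __name__ == "__main__":
    # In real use, sleep for each delay before retrying a rate-limited query.
    print(list(backoff_delays(5)))
```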
Can I query my data collections using natural language?
Yes. The query_discovery_content tool allows your AI agent to perform natural language queries against your watsonx Discovery collections, returning highly relevant results based on IBM's cognitive search engine.
How do I see what NLP models are being applied to my documents?
Use the list_available_enrichments tool to see all NLP enrichments (like Sentiment, Entity Extraction, or Category Classification) configured for your project and applied during the ingestion pipeline.
Can I monitor the ingestion status of a specific document?
Absolutely. Using the get_document_details tool, you can check the ingestion status and technical metadata for any specific document ID, ensuring your data is correctly indexed and searchable.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.