watsonx Discovery MCP. Query enterprise data collections using plain English.


Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel


Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

watsonx Discovery MCP Server connects your AI client to a cognitive search engine for complex, unstructured data. It lets you query large document repositories using natural language or the Discovery Query Language (DQL), retrieving semantic insights and metadata from massive datasets.

What your AI agents can do

Get component settings

Retrieves the configuration and health status for all project components.

Get document details

Fetches specific metadata, technical details, and ingestion status for a single indexed document.

List available enrichments

Lists all NLP models (like Sentiment or Entities) currently configured to process your documents.


Search unstructured content

Performs natural language or Discovery Query Language (DQL) queries against specified data collections.

Inventory data sources

Lists all available data collections within your project, providing the necessary IDs for querying.




Free for Subscribers


watsonx Discovery MCP Server: 6 Tools for Enterprise Search

Master document retrieval, metadata analysis, and complex querying across all your enterprise data collections.

get_component_settings

Retrieves the configuration and health status for all project components.

get_document_details

Fetches specific metadata, technical details, and ingestion status for a single indexed document.

list_available_enrichments

Lists all NLP models (like Sentiment or Entities) currently configured to process your documents.

list_collection_documents

Generates a list of every document ID contained within a specified data collection.

list_discovery_collections

Lists all available data collections within your watsonx Discovery project.

query_discovery_content

Performs a natural language or DQL query against a specified Discovery collection ID, using the supplied query text.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private


Make Your AI Do More

Start with watsonx Discovery, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Look, this MCP Server connects your AI client straight into watsonx Discovery. It gives you a cognitive search engine for unstructured data—the kind of stuff buried in massive document repositories. You don't have to manually dig through some clunky console dashboard; you just ask natural language questions or run precise queries using the specialized Discovery Query Language (DQL).

It treats your whole collection of documents like one big, searchable knowledge base.

When you connect it, query_discovery_content lets your agent run a natural language or DQL query against a specific data collection ID, using the query text you supply. This function is how you pull semantic insights and metadata out of huge datasets. But before you run that query, you need to know what collections exist. Use list_discovery_collections to see every available data collection within your watsonx Discovery project; this gives you the IDs you need for querying.

If you've got an ID already, you can check which documents belong to it. list_collection_documents generates a full list of every document ID inside that specified data collection. Once you have those IDs, if you want the deep details on any single file—like its technical specs, ingestion status, or comprehensive metadata—you run get_document_details.

This pulls all the specific info for one indexed document.

Want to know what models are running on your documents? Use list_available_enrichments to see every Natural Language Processing (NLP) model configured to process your files. These include things like Sentiment analysis or Entity extraction, which enrich your data before you even query it. To make sure the whole project is actually working right, get_component_settings retrieves the operational configuration and health status for every component in your Discovery setup.

It's a full diagnostic suite. You use these tools to understand what data you have (list_discovery_collections), confirm which documents are present (list_collection_documents), check how healthy the system is (get_component_settings), see what processing models are active (list_available_enrichments), and finally, run a query or check specific file metadata using query_discovery_content or get_document_details.

You're not guessing; you've got the full operational picture.

How watsonx Discovery MCP Works

1. Subscribe to this server, providing your watsonx URL, API key, and project ID.
2. Your AI agent connects to the endpoint and is ready for a prompt (e.g., 'List all my Discovery collections').
3. The agent executes the appropriate tool call (list_discovery_collections) and returns the structured results to you.

The bottom line is, your AI client becomes a direct interface to complex enterprise data, eliminating the need for manual console navigation.
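Under the hood, step 3 is an ordinary MCP tool invocation. The sketch below builds the JSON-RPC 2.0 message an MCP client sends using the `tools/call` method from the MCP specification. `make_tool_call` is a hypothetical helper, and the empty arguments object assumes `list_discovery_collections` takes no parameters.

```python
import json
from typing import Any, Dict, Optional


def make_tool_call(request_id: int, tool_name: str,
                   arguments: Optional[Dict[str, Any]] = None) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # MCP carries the tool name and its arguments inside "params".
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(message)


# The call an agent might issue for "List all my Discovery collections".
print(make_tool_call(1, "list_discovery_collections"))
```

Your MCP client normally builds and sends this for you; seeing it spelled out just makes clear that each prompt maps to a named tool plus a small argument object.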

Who Is watsonx Discovery MCP For?

Data Scientists who need to test and validate their queries; Knowledge Analysts who need rapid context from vast document archives; and Enterprise Developers building grounded applications that require semantic search against proprietary datasets.

Knowledge Analyst

Uses list_available_enrichments to audit what metadata models are running, then uses query_discovery_content to surface answers from document repositories.

Data Scientist

Routinely tests and refines DQL queries using query_discovery_content, monitoring data flow by calling get_component_settings.

Enterprise Developer

Implements the core search logic by chaining calls: first running list_discovery_collections to find collection IDs, then passing those IDs to query_discovery_content.
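One way to chain calls like this is sketched below: resolve a human-readable collection name to its ID, then query that collection. The `call_tool` callable stands in for whatever your MCP client exposes, and the response shape (`{"collections": [{"collection_id", "name"}]}`) and the `collection_id`/`query_text` argument names are assumptions, not a confirmed contract of this server.

```python
from typing import Any, Callable, Dict, List

# Hypothetical: takes a tool name and arguments, returns the parsed JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def search_named_collection(call_tool: ToolCaller, collection_name: str,
                            query_text: str) -> Dict[str, Any]:
    """Resolve a collection name to its ID, then query that collection."""
    listing = call_tool("list_discovery_collections", {})
    # Assumed response shape: {"collections": [{"collection_id": ..., "name": ...}]}
    matches: List[str] = [
        c["collection_id"] for c in listing.get("collections", [])
        if c.get("name") == collection_name
    ]
    if not matches:
        raise LookupError(f"No collection named {collection_name!r}")
    return call_tool("query_discovery_content",
                     {"collection_id": matches[0], "query_text": query_text})


def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real MCP client that returns canned data for the demo."""
    if name == "list_discovery_collections":
        return {"collections": [{"collection_id": "col-sales", "name": "Sales"}]}
    return {"tool": name, "args": args}


print(search_named_collection(fake_call_tool, "Sales", "What were the Q3 sales?"))
```

Failing loudly when no collection matches keeps the agent from silently querying the wrong dataset.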

What Changes When You Connect

Instant Data Inventory: Instead of navigating multiple console tabs, use list_discovery_collections to get a quick list of all your accessible data sources. You know exactly what you're querying against immediately.
Deep Context Retrieval: The query_discovery_content tool handles complex semantic searches. Rather than matching keywords alone, it ranks results by relevance to your question, so it can surface answers buried across massive, unstructured datasets.
Auditability on Demand: Need to verify if your documents are processed correctly? Use get_document_details to pull comprehensive metadata and confirm the ingestion status of any specific file ID.
Know Your Pipes (Health): Keep your data pipeline running by using get_component_settings. This checks project-level configurations and notices, letting you fix issues before they break a query.
Full Visibility into Processing: Don't guess what enrichment is happening. Call list_available_enrichments to see exactly which NLP models (Sentiment, Entities, etc.) are active on your data.

Real-World Use Cases

Finding the source of truth for a policy change

A Knowledge Analyst needs to know how termination clauses changed in 2023. They use list_discovery_collections first, identifying 'Legal Documents'. Then, they execute a targeted query using query_discovery_content with DQL against that collection ID, retrieving the most relevant document snippet and its full metadata via get_document_details.
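A DQL query for this scenario might look like the string built below. In DQL a comma means AND, `:` means "includes", and a quoted value matches a phrase; the field names `text` and `publication_date` are hypothetical and depend on your collection's schema.

```python
def dql_all(*clauses: str) -> str:
    """Combine DQL clauses so every clause must match (',' is DQL's AND)."""
    return ",".join(clauses)


def termination_clause_query(year: int) -> str:
    """DQL for documents mentioning termination clauses, dated within `year`."""
    # Hypothetical field names; use the fields your collection actually has.
    return dql_all(
        'text:"termination clause"',  # ':' means "includes"; quotes match a phrase
        f"publication_date>={year}-01-01",
        f"publication_date<{year + 1}-01-01",
    )


print(termination_clause_query(2023))
```

Building the query from small clauses keeps the date window and the phrase match independently adjustable when the analyst refines the search.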

Verifying data readiness for an AI app

An Enterprise Developer needs to build a new grounded application. They start by calling list_available_enrichments to confirm Sentiment analysis is active. Next, they use get_component_settings to ensure every project component is healthy before writing any code.
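The two checks can be folded into one readiness gate, sketched below. It takes the results of list_available_enrichments and get_component_settings as plain dictionaries; both response shapes (an `enrichments` list with `name`, a `components` list with `name` and `status`) and the `"healthy"` status value are assumptions for illustration only.

```python
from typing import Any, Dict, Iterable, List


def readiness_problems(enrichments: Dict[str, Any], components: Dict[str, Any],
                       required: Iterable[str]) -> List[str]:
    """Return human-readable blockers; an empty list means ready to build."""
    problems: List[str] = []
    # Assumed shape: {"enrichments": [{"name": "Sentiment"}, ...]}
    active = {e.get("name") for e in enrichments.get("enrichments", [])}
    for name in required:
        if name not in active:
            problems.append(f"enrichment not configured: {name}")
    # Assumed shape: {"components": [{"name": ..., "status": "healthy" or other}]}
    for comp in components.get("components", []):
        if comp.get("status") != "healthy":
            problems.append(f"component {comp.get('name')} is {comp.get('status')}")
    return problems


print(readiness_problems(
    {"enrichments": [{"name": "Entities"}]},
    {"components": [{"name": "ingestion", "status": "healthy"}]},
    required=["Sentiment"],
))
```

Returning a list of reasons, rather than a bare boolean, lets the agent tell the developer exactly what to fix.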

Debugging data flow issues

A Data Scientist suspects a document wasn't indexed properly. They first call list_collection_documents to get the target ID, then use get_document_details on that specific ID. If the ingestion status shows the document is still processing or has failed, they know where to focus their fix.
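The status check can be turned into a next-step hint, as in this sketch. It assumes the status values mirror IBM's Discovery v2 document API (`available`, `processing`, `pending`, `failed`) and that failures come with a `notices` list; confirm both against the real get_document_details output before relying on them.

```python
from typing import Any, Dict

# Assumption: status values mirror IBM's Discovery v2 document API;
# verify against what get_document_details actually returns.
READY = "available"
IN_PROGRESS = {"pending", "processing"}


def diagnose(details: Dict[str, Any]) -> str:
    """Turn a get_document_details result into a next-step hint."""
    status = str(details.get("status", "")).lower()
    if status == READY:
        return "indexed: safe to query"
    if status in IN_PROGRESS:
        return "still ingesting: wait and re-check"
    if status == "failed":
        # Assumed field: a list of notices explaining the failure.
        return f"failed: inspect {len(details.get('notices', []))} notice(s)"
    return f"unknown status {status!r}: check the raw response"


print(diagnose({"status": "processing"}))
```

Unknown values fall through to a "check the raw response" hint instead of being treated as success.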

Mapping all available data sets

A Product Team is scoping a new feature and needs a full list of sources. They run list_discovery_collections to map the scope, then call list_collection_documents on the most promising collection ID to get sample document IDs for testing.

The Tradeoffs

Querying without a Collection ID

Trying to run query_discovery_content with just text like 'What were the Q3 sales?' The system fails because it doesn't know which dataset to search.

→ First, always call list_discovery_collections. Identify your target collection ID (e.g., 'col-sales'). Then, run the query using the correct structure: query_discovery_content(collection_id='col-sales', query_text='What were the Q3 sales?').
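A simple guard catches this mistake before any call goes out. The sketch reuses the `collection_id`/`query_text` argument names from the example above; `build_query_args` is a hypothetical helper, not part of the server.

```python
from typing import Any, Dict


def build_query_args(collection_id: str, query_text: str) -> Dict[str, Any]:
    """Validate arguments before calling query_discovery_content."""
    if not collection_id or not collection_id.strip():
        raise ValueError("collection_id is required; call list_discovery_collections first")
    if not query_text or not query_text.strip():
        raise ValueError("query_text must not be empty")
    return {"collection_id": collection_id, "query_text": query_text}


print(build_query_args("col-sales", "What were the Q3 sales?"))
```

The error message points back at list_discovery_collections, so an agent that hits it knows which call to make next.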

Assuming data is indexed

Relying on a document containing critical information, but never confirming its status. The AI agent pulls nothing back.

→ Before querying, use get_document_details(document_id='...'). This confirms the document's ingestion status and metadata, ensuring the data is actually ready for retrieval.

Overlooking processing models

Getting a search result that seems vague or generic because key context wasn't extracted. The answer is shallow.

→ Call list_available_enrichments first. This confirms if necessary NLP tools (like Entity Extraction) are active on the data, making your subsequent queries far more precise.

When It Fits, When It Doesn't

Use this server when your primary need is to interrogate large volumes of unstructured or semi-structured text—think legal contracts, support tickets, research papers, etc. The core action is always retrieval based on content understanding.

Don't use it if:
1. Your data lives in a highly structured relational database (SQL). Use a dedicated SQL connector instead.
2. You need to change the data (write/update records). This server is read-only for retrieval and configuration lookup, not modification.
3. You only need metadata about a small, known set of files. While get_document_details helps, if you just want a list of file names without status, another simple listing tool might suffice. Use this when you need the context (metadata + content) together.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by watsonx Discovery. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: managed infra
V8 Isolated: sandboxed per request
Zero-Trust Proxy: no stored credentials
DLP Enforced: policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction


Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_component_settings
get_document_details
list_available_enrichments
list_collection_documents
list_discovery_collections
query_discovery_content

Finding specific data points shouldn't involve switching between five different console tabs.

Today, if a team needs to answer a question about company history or legal policy, they have to manually navigate the IBM Cloud portal. They click into the Discovery project, then select a collection from a dropdown list. If that fails, they must check the component status page, and if all else fails, they copy an ID and paste it somewhere else just to see if it's indexed.

With this MCP server, your AI agent handles the entire flow. You ask, 'What are our Q3 compliance requirements?' The agent automatically calls `list_discovery_collections` for context, then runs `query_discovery_content`, and delivers a direct answer without you touching a single dashboard.

Using the watsonx Discovery MCP Server gives you visibility into the entire data lifecycle.

Before, monitoring your data was reactive. You'd find out an enrichment failed or that a component was degraded by checking specific health dashboards—a time-consuming manual audit of multiple links. The agent makes this proactive: one prompt calls `get_component_settings` and instantly reports the full project status.

This shift means you move from being a data consumer who waits for reports to a true knowledge worker who can interrogate the system in real-time. It's immediate, actionable intelligence.

Common Questions About watsonx Discovery MCP

How do I find out what collections I have? (list_discovery_collections)

You use list_discovery_collections. This tool quickly lists all data collections in your project, giving you the necessary IDs to start querying.

What if my query fails because of permissions? (get_component_settings)

Check the system first by calling get_component_settings. This tool verifies the operational configuration and health settings for all project components, helping you spot access or setup errors.

How do I know if a document is fully processed? (get_document_details)

Run get_document_details on the specific document ID. It returns metadata and explicitly shows the ingestion status, letting you confirm it's ready for retrieval.

Can I list all documents in a collection? (list_collection_documents)

Yes, use list_collection_documents. You provide the collection ID, and this tool returns a comprehensive list of every document contained within it.

What do I need to know about available models when using `list_available_enrichments`?

The tool lists all configured NLP enrichments for your project. These names indicate which specific models—like Sentiment or Entity Extraction—have been applied during the document ingestion process.

When should I use DQL versus natural language when calling `query_discovery_content`?

You can use either a structured Discovery Query Language (DQL) query or plain English text. Use DQL for precise, repeatable searches, and natural language for quick, semantic questions against your data.
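For reference, IBM's Discovery v2 query API keeps the two styles in separate request-body fields: DQL goes in `query` and plain English goes in `natural_language_query`, alongside `collection_ids` and `count`. The sketch below assumes, without confirmation, that this MCP server ultimately sends a body of that shape; `discovery_query_body` is a hypothetical helper.

```python
from typing import Any, Dict, List


def discovery_query_body(collection_ids: List[str], text: str,
                         use_dql: bool, count: int = 10) -> Dict[str, Any]:
    """Build a Discovery v2-style query body using either DQL or natural language."""
    body: Dict[str, Any] = {"collection_ids": collection_ids, "count": count}
    # DQL goes in "query"; plain-English questions go in "natural_language_query".
    body["query" if use_dql else "natural_language_query"] = text
    return body


print(discovery_query_body(["col-legal"], 'text:"termination clause"', use_dql=True))
print(discovery_query_body(["col-sales"], "What were the Q3 sales?", use_dql=False))
```

Setting only one of the two fields makes the choice explicit: exact, repeatable DQL matching or relevance-ranked natural language retrieval.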

How do I check the general operational status of my project with `get_component_settings`?

Running get_component_settings displays the configuration and health metrics for every component in your watsonx Discovery project. This helps you quickly diagnose if ingestion pipelines or data sources are running optimally.

Are there rate limits I should worry about when frequently using `query_discovery_content`?

While the server handles high volume, excessive querying may hit project-level rate limits. For large-scale batch processing, consider spacing out or batching your queries, or use a dedicated data export tool.
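If you do hit a limit, a common client-side pattern is retrying with exponential backoff and jitter, sketched below. `RateLimited` is a hypothetical exception standing in for however your client surfaces a rate-limit response (for example HTTP 429); the delays are illustrative defaults, not values documented for this server.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the server reports a rate limit."""


def with_backoff(fn: Callable[[], T], retries: int = 5, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimited with exponential backoff plus jitter."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fn()
        except RateLimited:
            if attempt == retries - 1:
                raise  # out of attempts: surface the rate-limit error
            # Wait base_delay * 2^attempt, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky_query() -> str:
    """Demo stand-in that is rate limited twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "results"


print(with_backoff(flaky_query, sleep=lambda seconds: None))
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` spaces retries out so bursts don't keep tripping the limit.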

Can I query my data collections using natural language?

Yes. The query_discovery_content tool allows your AI agent to perform natural language queries against your watsonx Discovery collections, returning highly relevant results based on IBM's cognitive search engine.

How do I see what NLP models are being applied to my documents?

Use the list_available_enrichments tool to see all NLP enrichments (like Sentiment, Entity Extraction, or Category Classification) configured for your project and applied during the ingestion pipeline.

Can I monitor the ingestion status of a specific document?

Absolutely. Using the get_document_details tool, you can check the ingestion status and technical metadata for any specific document ID, ensuring your data is correctly indexed and searchable.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)