Unstructured MCP. Automate Data Ingestion from Any Source to Vector DB.

Q: How do I check if my S3 bucket is connected using listdatasources?

Run listdatasources. This command checks all configured remote connectors and tells you whether your S3 credentials are active and recognized by the system.

Q: Can I see which destination databases are available with listdatadestinations?

Yes, running listdatadestinations lists all configured target locations. You can confirm if Pinecone or MongoDB Atlas is set up to receive the processed data.

Q: How do I check if a past ingestion job failed using listworkflowjobs?

Run listworkflowjobs. The output gives you a history of all runs, including success/fail status and the exact time stamp for quick debugging.

Q: How do I monitor the real-time status of an ongoing job using listworkflowjobs?

You query listworkflowjobs and filter by 'status: running'. This shows if a job is currently queued, actively processing data, or paused.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Unstructured MCP Server manages the entire lifecycle of raw data. Connect it to your AI client to pull documents from sources like S3 or SharePoint, define processing rules, and send clean outputs directly to Vector DBs or SQL records.

It lets you automate document ingestion pipelines without opening a dashboard.

What your AI agents can do

Get workflow details

Gets configuration details for a specific document processing pipeline workflow.

List data destinations

Lists all configured target locations where processed data can be stored (Vector DBs, SQL).

List data sources

Lists all configured remote connectors to find where documents are currently located (S3, GCS).

+ 3 more capabilities included

List Available Data Sources

Retrieves a list of all configured external connectors, such as AWS S3 buckets or Google Cloud Storage locations.

List Target Databases

Displays every destination where processed data can be sent, including specific Vector DBs and SQL endpoints.

View Workflow Definitions

Retrieves the precise configuration details for any defined document processing pipeline.

Start Data Ingestion Job

Immediately triggers a full workflow run to ingest and process documents from your specified sources.

Track Job Status

Lists active and historical jobs, letting you monitor progress or check failure logs for document processing tasks.

Ask AI about this MCP

Ask ChatGPT

Ask Claude

Ask Perplexity

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Waiting for input…

AI Agent

Unstructured: 6 Tools for Data & Workflow Ops

These tools let you programmatically list sources, check destinations, view workflow definitions, and trigger immediate data ingestion jobs.

get019d7619

get workflow details

Gets configuration details for a specific document processing pipeline workflow.

list019d7619

list data destinations

Lists all configured target locations where processed data can be stored (Vector DBs, SQL).

list019d7619

list data sources

Lists all configured remote connectors to find where documents are currently located (S3, GCS).

list019d7619

list processing workflows

Lists every defined end-to-end pipeline that processes raw documents.

list019d7619

list workflow jobs

Shows a history of all active and completed document processing tasks, including success/fail status.

trigger019d7619

trigger workflow execution

Manually starts an immediate run of a defined processing workflow and returns a job ID.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Unstructured, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Listen up. This server handles your entire raw data lifecycle, taking messy documents—PDFs, reports, whatever—and turning them into clean, structured data your AI client can actually use. You connect this thing to your agent so it can automate document ingestion pipelines without you having to open some clunky dashboard and mess around with settings.

It’ll pull docs from sources like AWS S3 or Google Cloud Storage, let you define the processing rules, and spit out clean records straight into Vector DBs or SQL tables. Your AI agent becomes a command center for building and running Retrieval-Augmented Generation (RAG) pipelines using real data.

Here's what it does:

Listing Data Sources and Targets: You can check where your documents sit and where the clean output needs to go. The list_data_sources tool shows you every configured remote connector, whether that’s an AWS S3 bucket or a Google Cloud Storage location. Similarly, if you need to know what kind of databases are waiting for data, the list_data_destinations tool displays all target locations—that means specific Vector DB endpoints and SQL table definitions.

Managing Pipelines: To see what's possible, you first gotta list out every defined end-to-end pipeline. The list_processing_workflows function gives you a rundown of every existing document processing workflow. Once you know which pipelines exist, the get_workflow_details tool lets you pull up the precise configuration details for any specific one; it shows exactly how that data transformation is supposed to happen.

Running and Tracking Jobs: You don't wanna wait around watching things happen. The server lets you manually kick off a whole workflow run immediately using trigger_workflow_execution, and it returns a job ID so you know what started. To keep tabs on that, the list_workflow_jobs tool shows you both a history of completed tasks and any jobs that are still running.

This log tells you if the sync finished successfully or if there was an error in the processing task.

This whole setup means you can manage data ingestion from source discovery—using list_data_sources to find your files on S3 or GCS—all the way through defining the rules with get_workflow_details, triggering the job with trigger_workflow_execution, and finally confirming the clean output landed exactly where it needed to go, listing those destinations via list_data_destinations.

It's a full loop: find it, define how to process it, run it, check its status.

How Unstructured MCP Works

1 First, your agent uses list_data_sources to confirm the raw documents are available (e.g., a specific S3 bucket).
2 Next, it calls get_workflow_details to ensure the defined workflow correctly maps those sources to the desired destination (e.g., Pinecone or PostgreSQL).
3 Finally, the agent executes trigger_workflow_execution, starting the job and receiving a unique Job ID to track its completion status.

The bottom line is: you manage complex data flow from source identification through execution monitoring using only commands in your chat interface.

Who Is Unstructured MCP For?

Data Engineers, MLOps Teams, and AI Developers need this. If you spend too much time clicking between documentation dashboards, ETL tools, and vector store UIs, this server saves hours. It keeps your entire RAG knowledge base ingestion pipeline in one place.

Data Engineer

Troubleshoots a failed sync by running list_workflow_jobs and then uses get_workflow_details to adjust the source-to-destination mapping before retrying.

MLOps Team Lead

Verifies that scheduled nightly data uploads finished successfully by listing historical jobs, ensuring the vector database population is current and complete.

AI Developer

Tests a new document ingestion flow on demand. They run trigger_workflow_execution to get fresh data into the knowledge base without writing boilerplate cloud code.

What Changes When You Connect

Audit Pipelines: Use list_processing_workflows and get_workflow_details to audit every step of your data flow without logging into the main dashboard. You see exactly how raw documents map to clean JSON records.
Debug Data Flow: If a vector store is missing data, run list_workflow_jobs. This shows you if the job failed and gives you the ID needed to investigate the specific failure point.
Know Your Inputs/Outputs: Run list_data_sources before building anything. It confirms which remote buckets (S3, GCS) are connected and ready to feed documents into your system.
Test Immediately: Don't wait for a cron job. Use trigger_workflow_execution to manually run the pipeline on demand, proving the whole stack works with one command.
Centralized View: You get a unified view of data movement—from raw file storage (Source) → Processing rules (Workflow) → Final database (Destination)—all in your chat.

Real-World Use Cases

Debugging an Intermittent Sync Failure

The MLOps team notices the vector store is incomplete. They ask their agent to run list_workflow_jobs. The agent finds a recent job that failed and reports it couldn't connect to Pinecone. This tells the engineer exactly where the failure happened, allowing them to fix the destination credentials immediately.

Setting up New Data Sources

A Product Manager needs to start indexing documents from a new department SharePoint site. They ask their agent to run list_data_sources to see if SharePoint is supported, confirm the connection credentials are set, and then use get_workflow_details to map that source into an existing workflow.

On-Demand Knowledge Base Update

A company releases a major policy update. Instead of waiting for the scheduled ETL run, the developer asks their agent to use trigger_workflow_execution. The job starts instantly, pulling data from GCS and populating the vector DB within minutes.

Schema Validation Check

Before deployment, a developer uses list_data_destinations to verify that the target SQL database structure is correct. They then use get_workflow_details to confirm the workflow's output schema matches the destination table columns.

The Tradeoffs

Assuming connectivity

The developer writes code that assumes data exists in S3 but doesn't check if the bucket or connection was configured first. The job fails hours later with a vague permissions error.

→ First, run list_data_sources to verify all required connectors are active and accessible. This confirms the foundational connectivity before writing any execution code.

Bypassing workflow validation

A developer manually writes a script that reads raw files but doesn't apply the necessary data cleaning or chunking logic, resulting in garbage data being sent to the vector store.

→ Always use get_workflow_details first. This ensures you are following the predefined, tested pipeline rules—the source-to-destination mapping is handled automatically.

Ignoring job history

The system fails silently on a weekend sync. The developer has no idea if it failed or finished because they never checked the logs.

→ Run list_workflow_jobs immediately. This gives you a chronological list of all runs, showing success status and timestamps for quick triage.

When It Fits, When It Doesn't

Use this server if your core problem is taking raw data (PDFs, documents) sitting in various cloud locations and reliably transforming it into clean, structured JSON/vector records. You need visibility across the entire ingest pipeline: source listing → workflow definition → job execution.

Don't use it if you just need to move files between two directories (use a simple file copy tool). Also, don't use it if your data is already perfectly clean and structured; this server handles unstructured inputs. If your goal is only monitoring the database itself without knowing the source pipeline status, dedicated database tools are better.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Unstructured. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

How we secure it →

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_workflow_details list_data_destinations list_data_sources list_processing_workflows list_workflow_jobs trigger_workflow_execution

Data ingestion pipelines require too many dashboard clicks today.

Right now, getting data into a vector store means jumping between five different UIs: The cloud storage console to check the files; the ETL tool's dashboard to define the processing rules; the destination DB console to verify schema; and finally, running a manual job via a separate scheduler. It’s slow, and if one tab fails, you lose context.

With this MCP server, your AI agent handles all of it. You ask for data movement, and the agent uses `list_data_sources`, confirms the workflow with `get_workflow_details`, and then triggers the whole process via `trigger_workflow_execution`. The entire pipeline runs—and you monitor it—from one single chat window.

Unstructured MCP Server: Run full data pipelines from your agent.

You no longer have to manually check if the required destination connector is online. You simply ask, and `list_data_destinations` confirms its status, or you use it to verify which target database is ready for the incoming data.

The process of verifying sources and destinations becomes a simple lookup command. It eliminates the need to open multiple credential management panels just to confirm connectivity. The whole system talks to your agent now.

Common Questions About Unstructured MCP

How do I check if my S3 bucket is connected using list_data_sources? +

Run list_data_sources. This command checks all configured remote connectors and tells you whether your S3 credentials are active and recognized by the system.

What does trigger_workflow_execution return when I run it? +

It returns a unique job ID. You must capture this ID to track the execution status using list_workflow_jobs later on.

Can I see which destination databases are available with list_data_destinations? +

Yes, running list_data_destinations lists all configured target locations. You can confirm if Pinecone or MongoDB Atlas is set up to receive the processed data.

How do I check if a past ingestion job failed using list_workflow_jobs? +

Run list_workflow_jobs. The output gives you a history of all runs, including success/fail status and the exact time stamp for quick debugging.

What specific configuration data does `get_workflow_details` provide? +

It returns the full blueprint for a single workflow. You get details like required input sources, expected output destinations, and any custom steps or transformations needed before execution.

I need to see all available pipelines; how does `list_processing_workflows` help? +

The function lists every end-to-end processing pipeline configured on your account. It gives you a quick overview of workflow names and their high-level purpose so you can choose the right one.

How do I monitor the real-time status of an ongoing job using `list_workflow_jobs`? +

You query list_workflow_jobs and filter by 'status: running'. This shows if a job is currently queued, actively processing data, or paused.

Can I pass specific parameters when calling `trigger_workflow_execution`? +

Yes. When triggering the workflow, you must include necessary input parameters in the payload. This lets you target a specific directory path or file list for immediate processing.

Can my AI agent trigger an immediate document processing job? +

Yes! If you have a workflow configured to pull files from an S3 bucket and load them into a Pinecone index, you can ask your agent to trigger workflow XYZ. It will start the execution and return the new Job ID, which you can use to track the progress.

How can I verify if my RAG pipelines are failing or succeeding? +

Ask your agent to list your workflow jobs. It will securely connect to Unstructured's engine and return historical and active executions, displaying statuses such as 'completed', 'failed', or 'in_progress'. This is extremely useful for MLOps engineers diagnosing ingestion alerts directly in their terminal.

Can I edit the destination database directly through the agent? +

This server is focused on auditing and executing your existing pipelines. Currently, you can list all connections (sources and destinations) and obtain their details, but creating or destructively modifying vector database connectors must be done inside the Unstructured dashboard for security.

View all recipes →

Build Document Intelligence Using MCP Servers

You have 500 PDFs, contracts and reports that contain critical business knowledge locked inside files nobody reads , Unstructured extracts the content, Pinecone makes it searchable, and Notion indexes every document

Unstructured Pinecone Notion

View all recipes

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python