Dataiku DSS MCP. Control data pipelines and models via natural chat.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

Just plug in your AI agents and start using Vinkius.

Dataiku DSS. Connects your Dataiku DSS instance directly to your AI client. You get full control over enterprise data science workflows—list projects, check dataset schemas, monitor model performance, and run pipeline jobs—all through natural conversation.

It's the API layer for your entire data science stack.

What your AI agents can do

Dataset schema

Gets the column names and data types for a specific dataset.

Get job

Retrieves the status, timing, and outputs of a specific pipeline job.

Get model

Gets the metadata, algorithm, and performance scores for a saved model.

+ 11 more capabilities included

Check dataset schemas

You pass a dataset name and get a list of its columns and data types.

Get job status and timing

You check the current state, timing, and outputs of specific data pipeline jobs.

Get model metadata and metrics

You retrieve the saved ML model's details, including the algorithm used and its performance metrics.

List all projects and connections

You get a list of accessible DSS projects and all connected data sources (APIs, databases).

Audit data transformations

You retrieve the exact configuration and settings for a specific data recipe (Python, SQL, or Visual).

Execute workflows and scenarios

You trigger specific automation scenarios, like rebuilding a pipeline or retraining a model.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Dataiku DSS, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

You're hooking your Dataiku DSS instance up to your AI client. This lets you run your whole data science workflow through conversation. You can list every project you have access to and check all the data sources connected to DSS, including databases and cloud storage.

Need to scope out a dataset? You pass a dataset name, and the agent gets you a list of its columns and data types. You can also list every dataset within a specific project. To check out the setup, you can list all the data transformation recipes in a project, or list all the installed extensions and plugins for the DSS platform.

When it comes to running workloads, you can list all the pipeline jobs in a project with list_jobs and check the status, timing, and outputs of any one of them with get_job. You can list all the saved and deployed ML models in a project with list_models, and get_model returns the metadata, algorithm, and performance scores for a specific model.

You can list all the defined automation scenarios in a project with list_scenarios and trigger one of them with run_scenario.

For deep dives, get_recipe returns the exact configuration and settings for a specific data recipe, and get_project returns a project's metadata.

It's all about full control: you can list every DSS project with list_projects and check every dataset schema with dataset_schema.

How Dataiku DSS MCP Works

1. First, subscribe to the Dataiku DSS server and provide your Dataiku Instance URL and API Key (Personal, Project, or Global key).
2. Next, you ask your AI client to perform an action, like listing projects or checking a schema.
3. The server runs the specific tool, returns the structured data, and your AI client interprets it for you.
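
At the protocol level, the "perform an action" step is a tool call. MCP clients invoke tools with a JSON-RPC 2.0 request whose method is `tools/call`; the sketch below builds such a request for dataset_schema. The argument names `project_key` and `dataset_name` and the sample values are assumptions for illustration only; the server's published input schema is the authority on what each tool accepts.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names and values; check the tool's input schema.
request = build_tool_call(
    "dataset_schema",
    {"project_key": "MY_PROJECT", "dataset_name": "raw_logs"},
)
print(json.dumps(request, indent=2))
```

In normal use your AI client builds and sends this request for you; the sketch only shows what travels over the wire.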

The bottom line is, you manage your entire data science stack using natural conversation, without ever leaving your AI agent.

Who Is Dataiku DSS MCP For?

The data scientist who needs to check a schema or run a model without opening the Dataiku UI. The data engineer tracking pipeline failures or verifying recipe logic. The MLOps team that needs to trigger a model retraining run on demand. Or the analytics manager who needs a quick inventory of all connected data sources.

Data Scientist

Checks dataset schemas and monitors model training runs to stay in the research flow.

Data Engineer

Tracks pipeline jobs and verifies recipe configurations using natural language queries.

MLOps Engineer

Triggers automation scenarios and monitors deployed models in real time for production issues.

Analytics Manager

Audits project metadata and verifies data connections across the whole organization.

What Changes When You Connect

See model performance metrics instantly. Instead of navigating to the model tab, you just ask for it using get_model. You get the algorithm and the current performance scores right in your chat window.
Verify data lineage logic. Need to know if a recipe changed? Run get_recipe to pull the exact configuration structure for Python, SQL, or Visual recipes. No more guessing about data flow.
Manage job runs without leaving your agent. Use list_jobs to see if the build tasks finished, or get_job to check the exact timing and output of a specific pipeline run.
Audit your environment quickly. Run list_connections to get an immediate inventory of all connected databases and APIs. It's your single source of truth for data sources.
Automate complex actions. Don't manually rebuild pipelines. Use list_scenarios to find the right automation and then run_scenario to trigger the build or retraining instantly.
Explore the entire data landscape. Use list_projects and list_datasets together to map out every project and dataset in your DSS environment.
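
Chaining list_projects and list_datasets could look roughly like the sketch below. The `call_tool` callable, the `project_key` argument, and the `key` and `name` fields in the responses are all hypothetical; a stub with canned data stands in for a real MCP client so the example runs on its own.

```python
def map_datasets(call_tool):
    """Return {project_key: [dataset names]} by chaining the two list tools."""
    landscape = {}
    for project in call_tool("list_projects", {}):
        key = project["key"]
        datasets = call_tool("list_datasets", {"project_key": key})
        landscape[key] = sorted(d["name"] for d in datasets)
    return landscape


def fake_call_tool(name, arguments):
    """Canned responses standing in for a real MCP client (assumed shapes)."""
    if name == "list_projects":
        return [{"key": "SALES"}, {"key": "FRAUD"}]
    canned = {
        "SALES": [{"name": "orders"}, {"name": "customers"}],
        "FRAUD": [{"name": "raw_logs"}],
    }
    return canned[arguments["project_key"]]


landscape = map_datasets(fake_call_tool)
print(landscape)
```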

Real-World Use Cases

Checking a Dataset's Structure

A data scientist needs to confirm that the 'user_email' field is still a string type before starting a new model. Instead of opening the dataset in Dataiku and clicking through tabs, they ask their agent: "What is the schema for the 'raw_logs' dataset?" The agent calls dataset_schema and returns the exact column types, letting the scientist validate the data structure instantly.
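
The same check can be expressed in a few lines once the schema is in hand. This is a minimal sketch that assumes dataset_schema returns a list of objects with `name` and `type` fields; that response shape, the extra columns, and their types are invented for illustration.

```python
def column_type(schema, column_name):
    """Return the declared type of a column, or None if it is absent."""
    for column in schema:
        if column["name"] == column_name:
            return column["type"]
    return None


# Hypothetical dataset_schema result for 'raw_logs'.
raw_logs_schema = [
    {"name": "user_id", "type": "bigint"},
    {"name": "user_email", "type": "string"},
    {"name": "event_time", "type": "date"},
]

email_type = column_type(raw_logs_schema, "user_email")
print(f"user_email is {email_type}")
```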

Troubleshooting a Failed Pipeline Run

The MLOps engineer sees a job failure. Instead of logging into the dashboard and clicking through status pages, they tell their agent: "Check the job status for the 'Fraud-Detection-Live' pipeline." Because get_job needs a specific job ID, the agent first runs list_jobs to find the failed run, then calls get_job, which returns the status, timing, and outputs in a single response so the engineer can locate the failure and fix it faster.
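
Since get_job takes a job ID, the failed run has to be picked out of the list_jobs result first. The sketch below shows one way to do that selection. The `id`, `state`, and `start_time` fields and the "FAILED" state name are assumptions about the response shape, not documented output.

```python
def latest_failed_job(jobs):
    """Return the most recently started job whose state is FAILED, or None."""
    failed = [job for job in jobs if job["state"] == "FAILED"]
    if not failed:
        return None
    return max(failed, key=lambda job: job["start_time"])


# Hypothetical list_jobs result; field names and values are assumptions.
jobs = [
    {"id": "job-001", "state": "DONE", "start_time": 1700000000},
    {"id": "job-002", "state": "FAILED", "start_time": 1700003600},
    {"id": "job-003", "state": "FAILED", "start_time": 1700007200},
]

job = latest_failed_job(jobs)
print(job["id"] if job else "no failed jobs")
```

The ID it returns is what would then be passed to get_job.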

Retraining a Model on Demand

The data team decides to retrain the 'Sales-Forecasting' model with new data. Instead of manually following a deployment checklist, they ask the agent to 'Retrain the sales model now.' The agent uses list_scenarios to find the right automation, then calls run_scenario, triggering the build and retraining process immediately.
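
The list_scenarios then run_scenario sequence might be sketched as follows. `call_tool`, the `project_key` and `scenario_id` arguments, and the scenario records are hypothetical stand-ins. The lookup refuses to guess when zero or several scenarios match, since triggering the wrong automation has side effects.

```python
def find_scenario(scenarios, keyword):
    """Return the single scenario whose name contains keyword (case-insensitive)."""
    matches = [s for s in scenarios if keyword.lower() in s["name"].lower()]
    if len(matches) != 1:
        raise LookupError(f"expected one match for {keyword!r}, found {len(matches)}")
    return matches[0]


def retrain(call_tool, project_key, keyword):
    """List scenarios, pick the matching one, then trigger it."""
    scenarios = call_tool("list_scenarios", {"project_key": project_key})
    scenario = find_scenario(scenarios, keyword)
    return call_tool(
        "run_scenario",
        {"project_key": project_key, "scenario_id": scenario["id"]},
    )


def fake_call_tool(name, arguments):
    """Stand-in for a real MCP client call, returning canned data."""
    if name == "list_scenarios":
        return [
            {"id": "REBUILD_PIPELINE", "name": "Rebuild pipeline"},
            {"id": "RETRAIN_SALES", "name": "Retrain Sales-Forecasting"},
        ]
    return {"triggered": arguments["scenario_id"]}


result = retrain(fake_call_tool, "SALES", "sales")
print(result)
```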

Inventorying Data Assets

A new analytics manager joins the team and needs to know every project and data source. They ask the agent to 'List all projects and connections.' The agent runs list_projects and list_connections, giving the manager a comprehensive, immediate overview of the entire DSS environment.
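
For a quick inventory, the list_connections result can be summarized by connection type. This sketch assumes each entry carries a `name` and a `type` field; the sample connections are invented.

```python
from collections import Counter


def summarize_connections(connections):
    """Count connections per type, e.g. how many SQL vs. cloud storage."""
    return Counter(conn["type"] for conn in connections)


# Hypothetical list_connections result.
connections = [
    {"name": "warehouse", "type": "SQL"},
    {"name": "reporting_db", "type": "SQL"},
    {"name": "raw_bucket", "type": "Cloud Storage"},
]

summary = summarize_connections(connections)
for conn_type, count in summary.most_common():
    print(f"{conn_type}: {count}")
```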

The Tradeoffs

Manual UI Navigation

A user manually clicks through the Dataiku web UI: Project > Dataset > Schema Tab. This takes multiple clicks and context switching, and it leaves no record of what was checked.

→ Just ask your agent: 'What is the schema for dataset X?' The agent uses dataset_schema to pull the exact column structure instantly. This avoids the clicks and gets you the data you need immediately.

Guessing Pipeline State

A user checks the dashboard and sees 'Running...' but doesn't know if it's stuck or progressing. They have to wait or manually check multiple status endpoints, wasting time.

→ Ask your agent to check the job status. The agent runs get_job to return the current status, timing, and any outputs, confirming whether the pipeline is actually progressing.
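
If you want to wait for a job rather than ask repeatedly, a polling loop around get_job is one option. This is a sketch under stated assumptions: the `state` field, the state names, and the `get_job` callable are hypothetical, and a stub replaces the real tool call so the example runs without a DSS instance.

```python
import time

TERMINAL_STATES = {"DONE", "FAILED", "ABORTED"}  # assumed state names


def wait_for_job(get_job, job_id, interval_s=5.0, max_polls=60, sleep=time.sleep):
    """Poll get_job until the job reaches a terminal state or polls run out."""
    for _ in range(max_polls):
        job = get_job(job_id)
        if job["state"] in TERMINAL_STATES:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Stub that reports RUNNING twice, then DONE.
_states = iter(["RUNNING", "RUNNING", "DONE"])


def fake_get_job(job_id):
    return {"id": job_id, "state": next(_states)}


# Pass a no-op sleep so the example finishes immediately.
final = wait_for_job(fake_get_job, "job-003", sleep=lambda _seconds: None)
print(final["state"])
```

The `max_polls` cap keeps a stuck job from blocking the caller indefinitely.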

Verifying Recipe Logic by Reading Code

A developer has to open a recipe, scroll through potentially hundreds of lines of Python or SQL, and visually verify a small configuration change, which is error-prone and slow.

→ Use get_recipe. This tool pulls the explicit configuration structure for the recipe, letting you verify the data logic in a clean, structured output, regardless of how complex the underlying code is.
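
Structured output also makes change detection mechanical. The sketch below compares two recipe configurations, for example get_recipe results saved on different days, and reports only the keys that differ. The configuration keys and values shown are invented; the real structure depends on the recipe type.

```python
def diff_config(old, new, prefix=""):
    """Return {dotted.path: (old_value, new_value)} for every differing key."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        before, after = old.get(key), new.get(key)
        if isinstance(before, dict) and isinstance(after, dict):
            # Recurse into nested sections so only the changed leaf is reported.
            changes.update(diff_config(before, after, prefix=f"{path}."))
        elif before != after:
            changes[path] = (before, after)
    return changes


# Hypothetical get_recipe snapshots taken at two points in time.
yesterday = {"type": "sql_query", "params": {"join": "left", "limit": 1000}}
today = {"type": "sql_query", "params": {"join": "inner", "limit": 1000}}

changes = diff_config(yesterday, today)
print(changes)
```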

When It Fits, When It Doesn't

Use this if you need to interact with Dataiku DSS's backend functions (schemas, jobs, models, recipes) without opening the browser. This is for engineers who need to script, audit, or programmatically interact with the data science platform's state. Don't use it if you are just trying to visualize data in a chart; for that, you still need the Dataiku UI. If your goal is simple data ingestion from an external source, you might use a general data integration tool instead of relying on list_connections alone.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Dataiku. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 14 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

dataset_schema
get_job
get_model
get_project
get_recipe
list_connections
list_datasets
list_jobs
list_models
list_plugins
list_projects
list_recipes
list_scenarios
run_scenario
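
Of these 14 tools, run_scenario is the one described as triggering something; the rest are described as listing or retrieving. A client that wants a confirmation step before any action could separate the two groups as sketched below. Classifying by name prefix is an assumption made for illustration, not behavior documented by the server.

```python
TOOLS = [
    "dataset_schema", "get_job", "get_model", "get_project", "get_recipe",
    "list_connections", "list_datasets", "list_jobs", "list_models",
    "list_plugins", "list_projects", "list_recipes", "list_scenarios",
    "run_scenario",
]


def is_read_only(tool_name):
    """Treat list_*, get_* and dataset_schema as reads; anything else as an action."""
    return tool_name.startswith(("list_", "get_")) or tool_name == "dataset_schema"


# Tools that a cautious client might gate behind user confirmation.
actions = [tool for tool in TOOLS if not is_read_only(tool)]
print(actions)
```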

Sifting through Dataiku tabs to find one simple schema detail.

Today, checking a dataset's schema means navigating the Dataiku UI. You click into the project, select the dataset, then look for the 'Schema' tab. If you're on a large project, it's easy to get lost in the menu structure, and you waste time clicking around just to confirm a column type.

With this MCP server, you just talk to your agent. You ask, 'What is the schema for dataset X?' and the agent uses `dataset_schema` to pull the exact column structure immediately. You get the data, not a confusing menu.

Dataiku DSS MCP Server: Model Monitoring

Before, checking model performance required jumping between the 'Model' tab and the 'Metrics' dashboard, often needing to filter by date range or specific metric. It was a multi-step process that always felt manual and disconnected.

Now, you just ask your agent to check the model. The agent runs `get_model`, giving you the performance metrics and algorithm details right in the chat. It’s one step, zero clicks, and it's always current.

Common Questions About Dataiku DSS MCP

How do I use the `dataset_schema` tool?

The dataset_schema tool requires you to specify the target dataset name. It then returns a list of column names and their corresponding data types for that dataset.

Can I use `run_scenario` to rebuild my pipeline?

Yes. The run_scenario tool executes predefined automation flows. You must first use list_scenarios to find the exact scenario name (e.g., 'REBUILD_PIPELINE') before triggering the run.

What is the difference between `list_jobs` and `get_job`?

list_jobs gives you a list of all jobs (builds, training runs) in a project. get_job requires a specific job ID and gives you the detailed status, timing, and outputs for just that one job.

Does `list_connections` list all data types?

No. list_connections lists the established data sources (databases, cloud storage, APIs) that Dataiku can access. It doesn't list the data types within those connections.

How do I check the settings for a recipe using `get_recipe`?

You must provide the specific recipe name and the project ID. The tool then returns the full configuration structure, letting you audit the data logic.

What information does `list_connections` provide about data sources?

It lists every data connection configured in your DSS instance. You'll see the connection type (SQL, Cloud Storage, API) and the name assigned to it. This helps you audit which external sources your projects rely on.

How do I use `list_models` to check model performance?

The list_models tool lists the saved ML models in a project along with their metadata. For performance, use get_model, which retrieves a specific model's metadata, the algorithm used, and its performance metrics.

Can I check the metadata for a project using `get_project`?

Yes, get_project retrieves the project's metadata. This includes general settings, tags, and other configurations. You can use this to understand the project's scope without opening the UI.

Can my agent trigger a Dataiku automation scenario?

Yes. Use the 'run_scenario' tool. Provide the project key and the scenario ID. The agent triggers a new run of that scenario, for example a pipeline rebuild or a model retraining.

How do I check the schema of a specific dataset via chat?

Provide the project key and dataset name to the 'dataset_schema' tool. Your agent returns the dataset's column names and data types, so you can confirm the structure of your data.

Can I monitor the performance of saved ML models?

Yes. Use the 'get_model' tool. Your agent retrieves the model's metadata and performance metrics, letting you audit model quality and drift without opening the DSS UI.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)