# Dataiku DSS MCP for AI Agents

> The Dataiku DSS MCP connects your AI client directly to your Dataiku data science environment. You can list projects, check dataset schemas, monitor complex pipeline jobs, and audit ML model performance without leaving your chat interface. It brings inspection and scenario-based control of enterprise data workflows into the conversation.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** data-science, ml-ops, pipeline-orchestration, predictive-modeling, data-pipelines, automation

## Description

Need a more natural way to manage collaborative data science work? This MCP lets you query your Dataiku DSS instance in plain language. Instead of navigating dozens of tabs and clicking through build logs, you ask your AI agent for what you need, whether that's listing all available projects or checking the exact schema of a raw dataset. You get immediate status updates on pipeline jobs, monitor training runs, and trigger automation scenarios to rebuild pipelines when something breaks. It gives you CLI-style control over data science workflows through natural language conversation. When you connect this MCP through Vinkius, your agent gets access to the full catalog of tools listed below, covering everything from model metadata to underlying data connections.

## Tools

### list_projects
Lists all DSS projects that your API key has access to.

### get_project
Retrieves metadata, settings, and tags for a specific Dataiku project.

### list_datasets
Lists every dataset contained within a specified project.

### dataset_schema
Returns the complete list of column names and data types for a given dataset.

### list_recipes
Lists all recipes (the data transformation steps) defined in a project.

### list_jobs
Shows all pipeline jobs associated with a project, covering build tasks and model training runs.

### get_job
Gets the current status, timing data, and outputs for a specific job run.

### list_scenarios
Retrieves a list of available automation scenarios within a project.

### list_models
Lists all machine learning models that have been saved or deployed in the project.

### get_model
Retrieves metadata, algorithm details, and performance metrics for a specific ML model.

### run_scenario
Triggers an automation scenario execution, which can rebuild pipelines or retrain models.

### list_plugins
Lists all DSS plugins that have been installed in the environment.

### list_connections
Shows a list of data connections, including configured databases, cloud storage accounts, or APIs.

### get_recipe
Retrieves the full configuration and settings for a specific data transformation recipe.
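
Under the Model Context Protocol, a client invokes any of the tools above with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `dataset_schema`. The argument names `project_key` and `dataset_name` and their values are assumptions for illustration; the actual input schema is whatever the server advertises through `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # ids distinguish requests within a session

def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical argument names; check the server's tools/list output for the real ones.
request = build_tool_call(
    "dataset_schema",
    {"project_key": "FRAUD_DETECTION_LIVE", "dataset_name": "raw_user_logs"},
)
print(json.dumps(request, indent=2))
```

In normal use the AI client assembles this message for you; the sketch only shows what travels over the wire.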

## Prompt Examples

**Prompt:** 
```
List all my active projects and tell me which one has the 'Customer' tag.
```

**Response:** 
```
**Found 4 Projects:**

*   `Churn-Prediction-Q1`: (Tags: Customer, High Priority)
*   `Sales-Forecasting`: (Tags: Sales, Quarterly)
*   `Fraud-Detection-Live`: (Tags: Security, Live)
*   `Staging-Sandbox`: (No tags)
```
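
The tag lookup in this exchange is a simple filter over the project list. A minimal sketch, assuming each project comes back as a dictionary with hypothetical `name` and `tags` fields (the real `list_projects` payload may be shaped differently):

```python
def projects_with_tag(projects: list[dict], tag: str) -> list[str]:
    """Return the names of projects carrying `tag` (case-insensitive)."""
    wanted = tag.casefold()
    return [
        p["name"]
        for p in projects
        if any(t.casefold() == wanted for t in p.get("tags", []))
    ]

# Sample data mirroring the response above; field names are assumed.
projects = [
    {"name": "Churn-Prediction-Q1", "tags": ["Customer", "High Priority"]},
    {"name": "Sales-Forecasting", "tags": ["Sales", "Quarterly"]},
    {"name": "Fraud-Detection-Live", "tags": ["Security", "Live"]},
    {"name": "Staging-Sandbox", "tags": []},
]

print(projects_with_tag(projects, "customer"))
```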

**Prompt:** 
```
Check the schema for the raw user logs dataset in the Fraud project.
```

**Response:** 
```
Schema retrieved successfully:

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `timestamp` | Date | Time of action completion |
| `user_id` | String | Unique ID for user |
| `ip_address` | String | IP address used in transaction |
| `action_type` | String | e.g., login, purchase, view |

*Total columns: 4*
```

**Prompt:** 
```
I need to trigger the pipeline rebuild for my sales data.
```

**Response:** 
```
Attempting to run scenario 'REBUILD_PIPELINE' in project 'SALES-FORECASTING'.

**Status:** Triggered.
**Details:** The system is now rebuilding underlying datasets and recipes. This process can take 15–20 minutes depending on data volume. I will notify you when the run is complete.
```
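
A completion notice like the one promised above generally comes from polling, since the trigger in this example returns before the run finishes. A minimal sketch of such a loop, assuming a hypothetical `fetch_status` callable that wraps whatever status lookup the agent uses, and an assumed set of terminal state names:

```python
import time
from typing import Callable

# Assumed terminal states; the values a real server reports may differ.
TERMINAL_STATES = {"SUCCESS", "FAILED", "ABORTED"}

def wait_for_run(
    fetch_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_seconds} s")
        sleep(poll_seconds)

# Stand-in for a real status lookup (for example, a `get_job` call).
fake_statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
result = wait_for_run(lambda: next(fake_statuses), poll_seconds=0, sleep=lambda _: None)
print(result)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.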

## Capabilities

### Discover and map projects
List all accessible DSS projects and retrieve detailed structural information about their datasets.

### Audit dataset schemas
Get the column names, data types, and full structure for any specified dataset in a project.

### Monitor pipeline execution
Track build tasks, training runs, and job status by listing pipeline jobs and analyzing their current state or timing.

### Verify data transformation logic
Retrieve the exact configuration of a recipe, whether it uses Python, SQL, or visual tools, to audit data flow.

### Control automation scenarios
List available automation scenarios and trigger their execution to rebuild pipelines or retrain models.

### Review deployed ML models
Identify saved machine learning models and retrieve their metadata, algorithm details, and performance metrics.

### Audit system connections
List all installed plugins and data source connections (like cloud storage or APIs) to verify organizational access rights.

## Use Cases

### Investigating why a model failed
The MLOps team notices the 'Fraud-Detection' model score dropped. They ask their agent to get the model details, check performance metrics, and then use `dataset_schema` on the source data to see if the input structure changed.
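
The last step of that investigation amounts to diffing two schema snapshots. A minimal sketch, assuming the `dataset_schema` output has been reduced to a plain mapping of column name to type (that reduction and the snapshot values are hypothetical):

```python
def diff_schemas(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {column_name: data_type} mappings and report what changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(
            (name, old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]
        ),
    }

# Illustrative snapshots; the values are made up for this example.
baseline = {"timestamp": "Date", "user_id": "String", "ip_address": "String", "action_type": "String"}
current = {"timestamp": "String", "user_id": "String", "ip_address": "String", "device_id": "String"}

print(diff_schemas(baseline, current))
```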

### Verifying a complex ETL job
A Data Engineer needs to confirm that an old sales forecasting pipeline ran correctly. They list the jobs, check the status of the last run using `get_job`, and then use `get_recipe` on the transformation recipe to audit the exact SQL logic used.
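
Picking out "the last run" from a job listing can be sketched as follows. The `id`, `state`, and `started_at` fields and their values are hypothetical; the real `list_jobs` output may name them differently.

```python
from datetime import datetime

def latest_job(jobs: list[dict]) -> dict:
    """Return the most recently started job from a list of job summaries."""
    if not jobs:
        raise ValueError("no jobs to inspect")
    return max(jobs, key=lambda job: datetime.fromisoformat(job["started_at"]))

# Hypothetical job summaries for illustration.
jobs = [
    {"id": "build_sales_01", "state": "DONE", "started_at": "2024-03-01T02:00:00"},
    {"id": "build_sales_02", "state": "FAILED", "started_at": "2024-03-02T02:00:00"},
]

last = latest_job(jobs)
print(last["id"], last["state"])
```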

### Setting up a new environment
An Analytics Manager needs an overview. They list all projects available, check which plugins are installed via `list_plugins`, and verify if cloud storage connections are properly listed using `list_connections`.

### Resuming interrupted data flow
A Data Scientist is working on a new segmentation project. They notice the build tasks failed due to bad source data. They use `run_scenario` to trigger the pipeline rebuild and then check `dataset_schema` to confirm the raw input columns are correct.

## Benefits

- Instead of manually checking job status, you simply ask your agent to list jobs and get the current execution state or timing.
- You can audit data logic by asking for the explicit configuration structures of recipes (Python, SQL, Visual), verifying transformations instantly.
- Instead of triggering pipelines from the CLI or the DSS UI, you tell your agent to run a scenario that rebuilds datasets or retrains models.
- Model performance review is faster. You get metadata, algorithm details, and performance metrics for saved ML models without needing to open the DSS UI.
- System oversight gets simple. You can list all data connections and installed plugins to quickly verify organizational constraints.
- You gain full visibility into your entire data graph, from listing projects to checking dataset schemas in one conversational flow.

## How It Works

Your AI client acts as a single command center for your Dataiku data science workflow.

1. Subscribe to this MCP and provide your Dataiku Instance URL along with a valid API key (Personal, Project, or Global).
2. Your AI agent connects to the service endpoint managed by Vinkius.
3. You then use natural conversation to execute data science commands, like asking the system to list all projects in an environment.
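
For reference, the instance URL and API key from step 1 are the same two values a direct call to the DSS public REST API needs. The sketch below shows how they combine, assuming the usual DSS convention of HTTP Basic authentication with the API key as the username and an empty password. The host and key are placeholders, and how Vinkius forwards these credentials is not specified here.

```python
import base64

def dss_request_parts(instance_url: str, api_key: str, path: str) -> tuple[str, dict]:
    """Return the URL and headers for a DSS public API call (no request is sent)."""
    # Basic auth: API key as the username, empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    url = f"{instance_url.rstrip('/')}/public/api/{path.lstrip('/')}"
    return url, {"Authorization": f"Basic {token}"}

# Placeholder values for illustration only.
url, headers = dss_request_parts("https://dss.example.com/", "my-api-key", "projects/")
print(url)
```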

## Frequently Asked Questions

**How do I check if my Dataiku projects are connected to external databases?**
The MCP allows you to list all data connections and installed plugins. This lets you quickly audit your entire environment by seeing which cloud storage, APIs, or SQL databases are linked to your DSS instance.

**Can this MCP help me monitor if a model is performing well?**
Yes, you can list saved machine learning models and then request detailed performance metrics. This helps data scientists compare models and track changes in prediction quality directly through conversation.

**Does Dataiku DSS MCP let me run manual pipeline jobs?**
Yes, through automation scenarios. You can list a project's pipeline jobs with `list_jobs`, check a run's status with `get_job`, and trigger a full rebuild or retraining cycle with `run_scenario`.

**What if I need to audit the SQL logic in my data transformations?**
You can list a project's recipes with `list_recipes`, then use `get_recipe` to pull the full configuration of a specific one. This lets you verify the exact Python or SQL code without opening the DSS interface.

**How do I find out what projects I have access to?**
You simply ask your agent to list all accessible DSS projects. It provides a comprehensive overview, including project metadata and tags, so you know exactly what resources are available for your team.