# Stitch Data MCP

> Stitch Data connects your data warehouse infrastructure directly to any AI agent. It lets you manage entire ETL pipelines—from listing sources and configuring destinations (like Snowflake or Redshift) to running manual syncs and pushing custom data batches—all through natural conversation.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** etl, data-pipeline, data-ingestion, warehouse, stitch-data

## Description

You're using Stitch Data to manage your ETL pipelines, and this server gives your AI client full control over everything—from setting up connections to running custom data loads. You don't have to leave your chat window to handle complex data workflows.

**Managing Connections: Sources and Destinations**
You can start by listing all the types of sources Stitch supports using `list_source_types`, or see every source currently connected through `list_sources`. If you need to add a new connection, you use `create_source` to register it. You can also delete connections permanently with `delete_source`. On the destination side, you check which types are supported via `list_destination_types`, then view all configured endpoints using `list_destinations`. To set up a new spot for your data, run `create_destination`. If that destination changes or gets retired, you use `update_destination` to modify it, or `delete_destination` to wipe it clean.

**Controlling Data Flow and Replication Jobs**
When it comes to moving data, you've got multiple options. You can manually kick off a full data refresh job using `start_sync`, which keeps your warehouse current with the source changes. For more control over what moves, you use `update_stream_metadata` to select specific tables and fields that need replication. If you wanna update an existing connection or pause it, run `update_source`. You'll also find `list_streams` lets you see all the individual tables—the streams—that exist inside a source you’ve selected.

**Loading Custom Batches and Data Integrity**
Sometimes, you don't want a full sync; you just need to push specific data. If you send a controlled batch of records for one table, using `push_import_batch` forces schema validation, which keeps your data high integrity before it lands. When you know the schemas are solid and just need to dump raw data into multiple tables without strict rules, use `push_import_data`. For partners who manage client accounts, you can create a new Stitch client account with `create_account`, or if your front-end client needs temporary access, you generate a token using `create_ephemeral_session`.

**Monitoring and Maintenance Checks**
How do you know if this thing is running right? You check the overall health of the import process using `get_import_status`. To look back at what ran, you use `list_extractions` to view a list of recent attempts to pull data from sources, or `list_loads` for records detailing past load attempts into your destination warehouse. Before you commit any actual data, run `validate_import_data`; this checks if your credentials and data format are correct without saving anything. Finally, when you're done setting up the whole thing, you can still check which destinations are set up via `list_destinations` or what types of sources you've got connected with `list_sources`. You just talk to the agent, and it handles all this heavy lifting for ya.

## Tools

### create_account
Creates a new Stitch client account (only for partners).

### create_destination
Sets up and configures a new data destination for the account.

### create_ephemeral_session
Generates a temporary token needed for front-end client connections.

### create_source
Registers and creates an entirely new data source within the system.

### delete_destination
Removes a configured destination from the account list.

### delete_source
Permanently removes an existing data source connection.

### get_import_status
Checks if the Stitch Import API is currently operational and running correctly.

### list_destination_types
Shows a list of all supported destination types, like Redshift or Snowflake.

### list_destinations
Lists all the data destinations currently configured for your account.

### list_extractions
Retrieves a list of recent job attempts to pull data from sources.

### list_loads
Displays records of past attempts to load data into your destination warehouse.

### list_source_types
Shows all types of sources that Stitch can connect to.

### list_sources
Lists every data source currently connected and configured for your account.

### list_streams
Shows all tables (streams) available within a specific, selected source.

### push_import_batch
Sends a controlled batch of records for one table to the Import API using schema validation.

### push_import_data
Loads raw data for one or more tables without enforcing strict schemas.

### start_sync
Manually triggers a full replication job to update the data warehouse from a source.

### update_destination
Modifies the settings or credentials of an existing destination connection.

### update_source
Changes settings, pauses, or unpauses a connected data source.

### update_stream_metadata
Selects specific tables and fields that should be included in the replication process.

### validate_import_data
Checks if your credentials and data format are correct without saving any records.

## Prompt Examples

**Prompt:** 
```
List all my configured data destinations in Stitch.
```

**Response:** 
```
I've retrieved your destinations. You have 2 active destinations: 'Production Redshift' (ID: dest_123) and 'Analytics Snowflake' (ID: dest_456).
```

**Prompt:** 
```
Push this batch of user records to the 'users' table using push_import_batch.
```

**Response:** 
```
Processing the batch... The data has been successfully pushed to the 'users' table. Stitch has queued the records for ingestion into your warehouse.
```

**Prompt:** 
```
Trigger a manual sync for the source with ID src_987.
```

**Response:** 
```
I've initiated the sync for source src_987. The replication job is now running and you can monitor the progress in the extraction logs.
```

## Capabilities

### Manage Connections
Create and delete data sources and destinations (like S3 or Snowflake) directly through the server.

### Run Sync Jobs
Manually start a replication job to keep your warehouse up-to-date with source changes.

### Load Custom Data Batches
Push specific groups of records for a table, ensuring best data typing and high integrity loading.

### Check Pipeline Health
List recent extraction jobs or check the overall import status to verify if your pipelines are running correctly.

### Update Metadata
Control which specific tables and fields from a source are selected for replication.

## Use Cases

### Debugging a Broken Sync
The Data Engineer notices the Snowflake warehouse is empty. Instead of clicking through three different monitoring dashboards, they ask their agent to run `list_extractions` and then check the logs using `get_import_status`. The agent reports that the source connection failed due to bad credentials, allowing an immediate fix via `update_source`.

### Ad-Hoc Reporting Test
A Product Manager needs to test a new report using user data from last night. They can't wait for the next scheduled sync. They use their agent to run `push_import_batch` with the raw records, loading them directly into Redshift for instant validation.

### Onboarding New Data Sources
A team needs to connect a new CRM system. Instead of manually configuring everything on the web portal, they use their agent to first run `list_source_types` to confirm compatibility, then `create_source`, and finally `start_sync` in sequence.

### Auditing Data Flow
The Analytics team suspects data might be missing. They instruct their agent to run `list_destinations` first to ensure the target is correct, then use `list_loads` to track all recent transfer attempts and verify the count of records.

## Benefits

- Check sync status immediately. Instead of logging into the web UI to see if a job ran, just run `list_extractions` or `get_import_status` and know what's up.
- Handle schema changes easily. You can use `update_stream_metadata` to tell the system exactly which fields you want replicated without touching the source API.
- Load data when needed. Use `push_import_batch` for custom, high-integrity testing loads into your warehouse, bypassing scheduled syncs entirely.
- Manage connections in one go. You can list all sources with `list_sources`, then use `update_source` to pause or restart any connection without leaving the chat window.
- Audit everything. Run `list_loads` to see a history of every attempt to move data, helping you pinpoint when and where data went wrong.

## How It Works

The bottom line is you manage your entire data flow by talking to it; no dashboards or clicks required.

1. Subscribe to the server and provide your Stitch Connect Token, Import Token, and Region.
2. Use commands like `list_sources` or `create_destination` to define where your data is coming from and going.
3. Tell your AI agent what to do—for example, 'Start a manual sync for Source ID ABC'—and watch the status update in the chat.

## Frequently Asked Questions

**How do I check my data pipeline health using Stitch Data?**
You run the `get_import_status` tool. This confirms if the core Import API is running correctly. You can also use `list_extractions` to see detailed logs about recent job runs.

**Can I load custom records using Stitch Data?**
Yes, you use the `push_import_batch` tool. This is best because it validates your data types before loading into your warehouse, keeping your data clean.

**I need to connect a new source; which tool do I use? Stitch Data?**
Start by using `list_source_types` to confirm compatibility. Then, run the `create_source` tool to register the connection with the system.

**What if my destination needs updating? Use Stitch Data.**
Use the `update_destination` tool. This lets you change credentials or settings for an existing warehouse without having to delete and re-create the entire connection.

**How do I generate temporary credentials for frontend integration using Stitch Data?**
Use the `create_ephemeral_session` tool. This generates a time-limited token specifically for your Connect JavaScript client, keeping your core credentials safe while letting you build out front-end integrations.

**Using Stitch Data, how do I control which specific tables or fields get replicated?**
You manage this via the `update_stream_metadata` tool. After listing available streams with `list_streams`, you select the exact fields and data types for replication, preventing unnecessary data transfer.

**Before I push a large batch of records, how can I validate my credentials and the data format using Stitch Data?**
Run `validate_import_data`. This checks your formatting and authentication without actually writing any data to your warehouse. It's an essential pre-flight step for reliable ingestion.

**If I need to force a sync or check past replication job histories with Stitch Data, what tools do I use?**
Start the process by using `start_sync` to manually trigger a replication job. Then, check historical runs by calling `list_extractions` or `list_loads` to monitor successful and failed attempts.

**How can I verify if my data format is correct before pushing it to the warehouse?**
You can use the `validate_import_data` tool. It functions exactly like a data push but only tests your credentials and data formatting without actually persisting any records.

**Is there a way to check if the Stitch Import API is currently operational?**
Yes, use the `get_import_status` tool. It checks the operational status of the Import API and returns whether it is functioning correctly.

**How do I see which types of sources I can connect to my Stitch account?**
Run the `list_source_types` tool. It will provide a comprehensive list of all available source connectors supported by Stitch.