# Scale AI MCP

> Scale AI connects your agent directly to industrial-grade data labeling and fine-tuning pipelines. It lets you manage massive annotation projects—for images, video, text, or entity recognition—using natural conversation. You can create new projects, organize high-volume work into batches, submit tasks for multi-modal formats, and track progress without leaving your chat interface.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** data-labeling, rlhf, machine-learning, annotation, training-data, project-management

## Description

If you're running large data labeling projects across images, video feeds, and text documents, you don't want to jump between ten different platforms. This server connects your agent directly to Scale AI's labeling pipelines, so you manage the whole process through natural conversation. Everything from setting up a new annotation project to triggering the final processing run happens inside your chat window.

To start, define what you're labeling. Use **`create_project`** to set up a new labeling project. That call defines the scope and rules, whether you're doing simple image annotation or a more complex text collection job. If the quality requirements change halfway through, run **`update_project_params`** to modify the existing project's parameters. This lets you refine labeling instructions on the fly without recreating the project.
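An MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below shows roughly what the two project calls might look like on the wire. The `tool_call` helper and every argument name (`name`, `type`, `project_name`, `instruction`) are assumptions for illustration, not this server's confirmed schema, so check each tool's input schema before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names: the real schema may differ.
create = tool_call(
    "create_project",
    {"name": "Lidar-Obstacles", "type": "imageannotation"},
)
update = tool_call(
    "update_project_params",
    {
        "project_name": "Lidar-Obstacles",
        "instruction": "Label every obstacle, including partially occluded ones.",
    },
)

print(json.dumps(create, indent=2))
print(json.dumps(update, indent=2))
```

In normal use your agent builds these requests for you; the sketch only makes the shape of a call visible.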

Once the project is configured, you submit the actual work. The server covers images, video, and text. For images, start a standard job with **`create_image_annotation_task`**, which requires both the source data and the labeling type. If an image or video needs pixel-level detail, use **`create_segment_annotation_task`**; it handles both images and videos. For continuous video footage, **`create_video_playback_annotation_task`** creates a task for marking events or objects. For text, **`create_named_entity_recognition_task`** tags specific entities in documents, while **`create_text_collection_task`** handles gathering and labeling collections of raw text.
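The choice of tool depends only on the kind of data and labeling you need, so it can be sketched as a small lookup table. The tool names come from this server; the short keys (`image`, `segmentation`, and so on) and the `pick_task_tool` helper are hypothetical labels chosen for this example.

```python
# Map a (hypothetical) short label for the job to the server's tool name.
TASK_TOOLS = {
    "image": "create_image_annotation_task",
    "segmentation": "create_segment_annotation_task",
    "video": "create_video_playback_annotation_task",
    "ner": "create_named_entity_recognition_task",
    "text_collection": "create_text_collection_task",
}


def pick_task_tool(kind: str) -> str:
    """Return the task-creation tool for a given kind of labeling job."""
    try:
        return TASK_TOOLS[kind]
    except KeyError:
        raise ValueError(
            f"unsupported task kind {kind!r}; expected one of {sorted(TASK_TOOLS)}"
        ) from None


print(pick_task_tool("segmentation"))  # prints: create_segment_annotation_task
```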

At volume, you don't manage tasks one by one. Use **`create_batch`** to group individual annotation tasks into a single batch for processing. Once every task in the batch is accounted for and structured correctly, run **`finalize_batch`**. That call marks the batch as complete and starts the data labeling run at Scale AI.
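The batch lifecycle can be sketched as an ordered list of tool calls. This follows the order described in the FAQ (create the batch, submit tasks into it, then finalize). The `batch_workflow` helper and the argument names (`project`, `batch_name`, `batch`, `attachment`) are assumptions for illustration rather than the server's documented schema.

```python
def batch_workflow(project: str, batch: str, image_urls: list[str]) -> list[dict]:
    """Return the tool calls for one batch, in the order they should run."""
    # 1. Open the batch inside the project.
    calls = [
        {"name": "create_batch", "arguments": {"project": project, "batch_name": batch}}
    ]
    # 2. Submit one image annotation task per URL, tagged with the batch.
    for url in image_urls:
        calls.append(
            {
                "name": "create_image_annotation_task",
                "arguments": {"project": project, "batch": batch, "attachment": url},
            }
        )
    # 3. Finalize last, so the labeling run starts only once all tasks are in.
    calls.append({"name": "finalize_batch", "arguments": {"batch_name": batch}})
    return calls


calls = batch_workflow(
    "Lidar-Obstacles",
    "sprint-01-batch",
    ["https://example.com/car.jpg"],
)
for call in calls:
    print(call["name"])
```

Keeping `finalize_batch` as the last step matters: it is the call that tells Scale to start processing.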

Work at this scale needs constant oversight. To see where a specific job stands, use **`get_task`** with its task ID; it returns the current status details for that job. If something went wrong, or you changed your mind about a pending job, use **`cancel_task`**. It cancels a specific pending task, and if you want to reuse the task's unique ID afterward, you can set the `clear_unique_id` parameter to true.
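A check-then-cancel step might look like the sketch below. The `clear_unique_id` parameter is the one named in the FAQ; the `task_id` and `status` field names, the `"pending"` status value, and the `cancel_call_if_pending` helper are assumptions about what `get_task` returns, so verify them against a real response.

```python
from typing import Optional


def cancel_call_if_pending(task: dict, clear_unique_id: bool = False) -> Optional[dict]:
    """Build a cancel_task call for a task, or return None if it is not pending.

    `task` stands in for the result of a get_task call (field names assumed).
    """
    if task.get("status") != "pending":
        return None  # only pending tasks can be cancelled
    return {
        "name": "cancel_task",
        "arguments": {
            "task_id": task["task_id"],
            "clear_unique_id": clear_unique_id,
        },
    }


pending = {"task_id": "task_abc123", "status": "pending"}
print(cancel_call_if_pending(pending, clear_unique_id=True))
```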

Together, these tools cover every stage. Define the project scope with **`create_project`** and keep the rules current with **`update_project_params`**. Submit work with **`create_image_annotation_task`** for images, **`create_segment_annotation_task`** for pixel-level detail, **`create_video_playback_annotation_task`** for video events, **`create_named_entity_recognition_task`** for entity tags in text, or **`create_text_collection_task`** for raw text. Group tasks with **`create_batch`**, start processing with **`finalize_batch`**, and keep track of individual jobs with **`get_task`** and **`cancel_task`**.

## Tools

### cancel_task
Cancels a pending data labeling task, optionally clearing its unique ID so that the ID can be reused later.

### create_batch
Groups multiple individual annotation tasks together into one large batch for processing.

### create_image_annotation_task
Creates a specific task to annotate images, defining the required labeling type and source data.

### create_named_entity_recognition_task
Submits a job for recognizing and tagging named entities within provided text documents.

### create_project
Sets up an entirely new project, defining the rules, scope, and type of annotation required.

### create_segment_annotation_task
Creates a task requiring pixel-level segmentation annotation for images or videos.

### create_text_collection_task
Submits a job to gather and label collections of raw text data.

### create_video_playback_annotation_task
Creates an annotation task specifically for marking events or objects within video footage.

### finalize_batch
Marks a prepared batch as complete, triggering the official data labeling processing run by Scale AI.

### get_task
Retrieves all current status details for any specific task ID you provide.

### update_project_params
Modifies the rules and parameters of an existing project to refine labeling quality requirements.

## Prompt Examples

**Prompt:** 
```
Create a new project called 'Lidar-Obstacles' with the type imageannotation.
```

**Response:** 
```
I've created the project 'Lidar-Obstacles'. You can now start creating batches or submitting tasks to this project.
```

**Prompt:** 
```
Submit an image annotation task to project 'Lidar-Obstacles' with the image URL 'https://example.com/car.jpg'.
```

**Response:** 
```
Task created successfully. The Task ID is 'task_abc123'. You can track its progress using the get_task tool.
```

**Prompt:** 
```
Finalize the batch named 'sprint-01-batch'.
```

**Response:** 
```
Batch 'sprint-01-batch' has been finalized. Scale will now begin processing the tasks within this batch.
```

## Capabilities

### Project Initialization
You create and configure a new labeling project to define the scope (e.g., image annotation vs. text collection) and specific rules.

### Multi-Modal Task Submission
The agent submits tasks for different data types, including images, video clips, and raw text collections.

### High-Volume Batching
You group many individual tasks into a single batch and then finalize that batch to start the processing run.

### Status Tracking and Management
The agent retrieves detailed status for any task ID, letting you check progress or cancel pending jobs.

### Parameter Adjustment
You update project-level instructions on the fly to adjust labeling quality requirements without recreating the whole project.

## Use Cases

### Training a Model on Lidar Data
A self-driving car ML Engineer needs to label thousands of obstacle images. Instead of writing an API script, they ask their agent: 'Create a project for image annotation and submit 50 tasks using `create_image_annotation_task`.' The agent handles the setup, submission, and tracking until the data is ready.

### Annotating Video Game Footage
An AI Researcher needs to tag specific actions in video. They tell their agent: 'Set up a video annotation project and submit 10 clips using `create_video_playback_annotation_task`.' The server handles the multi-modal task setup and returns a task ID for each clip, which the researcher can check with `get_task`.

### Massive Document Review
A Data Ops Manager needs to process 10,000 legal documents for named entity recognition. They call `create_project` first, open a batch with `create_batch`, submit the documents with `create_named_entity_recognition_task`, and then call `finalize_batch` so the entire corpus gets processed.
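One way to organize a corpus of this size is to split it into several fixed-size batches, each created and finalized separately. This is a sketch of the planning step only; the batch size of 500, the `legal-ner` name prefix, and the `doc-N` identifiers are arbitrary choices for the example, not limits or conventions imposed by Scale AI or this server.

```python
from typing import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_batches(
    doc_ids: list[str], size: int = 500, prefix: str = "legal-ner"
) -> dict[str, list[str]]:
    """Assign every document to a named batch, e.g. 'legal-ner-001'."""
    return {
        f"{prefix}-{index:03d}": chunk
        for index, chunk in enumerate(chunked(doc_ids, size), start=1)
    }


docs = [f"doc-{n}" for n in range(10_000)]
plan = plan_batches(docs)
print(len(plan), "batches")  # prints: 20 batches
```

Each entry in `plan` then maps to one `create_batch` call, one task submission per document, and one `finalize_batch` call.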

### Iterative Quality Improvement
Labels are coming back below the quality bar. Instead of restarting, a team member uses `update_project_params` to tighten the labeling instructions on the existing project. They can then check newly submitted tasks with `get_task` to confirm the fix before running another batch.

## Benefits

- Manage the entire data flow, from `create_project` setup to final processing, all through your chat interface. You don't leave your workflow.
- Handle diverse data types using single commands: Image Annotation, Semantic Segmentation, and Video Playback tasks are all submitted via dedicated tools like `create_image_annotation_task`.
- Never lose track of work again. Use `get_task` to pull up the current status on any task ID or cancel pending items immediately if needed.
- Process large amounts of data efficiently by grouping individual jobs with `create_batch`, and then executing them all at once using `finalize_batch`.
- Need to change the rules? You can update project parameters dynamically using `update_project_params`. This lets you refine quality without restarting the whole pipeline.

## How It Works

The bottom line is that you manage an entire data pipeline—from project definition to task completion—using only conversational commands.

1. First, subscribe to the server and provide your Scale AI Live API Key.
2. Next, use tools like `create_project` to set up the specific labeling job (e.g., NER or Image Annotation).
3. Finally, open a batch with `create_batch`, submit tasks using the specialized task tools (`create_image_annotation_task`, etc.), and wrap up the process by calling `finalize_batch`.

## Frequently Asked Questions

**How do I start annotating images using `create_image_annotation_task`?**
You first need to use `create_project` to define the rules. Then, you can submit your data by calling `create_image_annotation_task`, which uses the project context to submit the actual job.

**What's the difference between `create_batch` and `finalize_batch`?**
`create_batch` just organizes a list of tasks, preparing them for processing. You must call `finalize_batch` afterward to actually tell Scale AI to start running the labeling job.

**Can I change my labeling rules after starting a project? Which tool handles that?**
Yes. Use `update_project_params`. This tool lets you modify instructions and parameters on an existing project without having to delete and rebuild the whole thing.

**How do I check if a task was successful? Which tool should I use?**
Use `get_task` with the unique ID. This retrieves all current status details, letting you know exactly where that specific job stands in the queue.

**Before I use `create_project`, what authentication credentials do I need to connect my agent?**
You must provide your Scale AI Live API Key. Your agent uses this key to authenticate all calls, ensuring it has the correct permissions and access limits for your specific account.

**What happens if I submit a task using `create_image_annotation_task` but realize I need to stop it?**
You use the `cancel_task` tool. It cancels the task as long as the task is still pending. If you want to resubmit with corrected parameters under the same unique ID, set `clear_unique_id` to true when you cancel.

**When should I use `create_segment_annotation_task` instead of standard image annotation?**
Use `create_segment_annotation_task` when your data requires pixel-level precision. This tool handles semantic segmentation, enabling you to draw precise outlines around objects rather than just using bounding boxes.

**What is the correct sequence for handling large volumes of data using `create_batch` and `finalize_batch`?**
First, use `create_batch` to group all related tasks. Next, you must call `finalize_batch`. This commits the entire batch to Scale's system and triggers the actual processing pipeline.

**How do I start a high-volume labeling job using batches?**
First, use `create_batch` to initialize a group for your project. After submitting your tasks to this batch, call `finalize_batch` to signal Scale to begin the labeling process.

**Can I check the status of a specific annotation task?**
Yes, use the `get_task` tool with the specific Task ID. It will return the full metadata, current status, and any available results for that unit of work.

**What should I do if I submitted a task by mistake?**
You can use the `cancel_task` tool with the Task ID. If you need to reuse the unique identifier, you can also set the `clear_unique_id` parameter to true.