# Octoparse MCP

> Octoparse connects your AI agent directly to a full cloud web scraping platform. Run complex extraction jobs, monitor crawler progress in real time, and pull structured data from external websites straight into your chat context. It lets you treat the entire process—from triggering the scrape to analyzing the resulting rows—as one conversational command.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** data-extraction, web-crawling, no-code, automation, data-pipeline, cloud-scraping

## Description

Octoparse turns web crawling into a simple conversation with your AI agent. Instead of writing API calls by hand or switching between browser tabs, you tell your agent what data you need from a website. The MCP launches the cloud scraping job and tracks its progress until it’s done. Once the data is ready, your agent pulls the extracted rows directly into the chat context. You can then ask the AI to summarize competitive pricing or structure an email list based on that newly acquired information. If you're looking for a central place to manage these connections, Vinkius hosts this MCP alongside thousands of other specialized tools, making it easy for your agent to access everything from data extraction to messaging services.

## Tools

### clear_task_data
Deletes all data associated with a specific Octoparse task, useful for cleaning up test runs before starting production crawls.

### get_task_data
Fetches the un-exported rows from a completed Octoparse scraping job so your agent can analyze them (limited to 1000 records).

### get_task_status
Retrieves and reports the current running status of any active task in Octoparse's cloud environment.

### get_token
Obtains a fresh OAuth 2.0 access token from Octoparse, which is necessary for subsequent API calls.

### list_task_groups
Lists all top-level folders or groups of tasks within your entire Octoparse account structure.

### list_tasks
Provides a list of every configured cloud scraping task, including its status and creation date.

### mark_data_exported
Changes the status of all stored data in an Octoparse task to 'exported', confirming it has been retrieved.

### start_task
Initiates a cloud scraping job immediately, changing its status to running within Octoparse.

### stop_task
Halts any currently running Octoparse cloud task before it completes its cycle.

### update_task_params
Adjusts the core search URL or specific keywords driving a task, so you can reuse a parameterized task for new targets without rebuilding it.

## Prompt Examples

**Prompt:** 
```
Look up task 'LinkedIn Profiles Q4' and tell me how many rows it extracted.
```

**Response:** 
```
The Cloud Agent confirms the task 'LinkedIn Profiles Q4' finished running successfully and acquired `4523` rows of active data.
```

**Prompt:** 
```
Start my Amazon Price Monitor crawler task now.
```

**Response:** 
```
Task started. Your 'Amazon Price Monitor' has been queued to the cloud servers and will begin fetching targeted DOM elements shortly.
```

**Prompt:** 
```
Get the data extracted from task 'Real Estate NYC' and format it as a markdown table.
```

**Response:** 
```
I've fetched the rows successfully. Here is the structured breakdown highlighting the `Address`, `Square Footage`, `Beds`, and estimated `Asking Price`...
```

## Capabilities

### Start and stop scraping tasks
You can launch a cloud scrape job when you need fresh data or instantly halt a task that's running too long.

### Check live task status
Your agent reports the current progress of any active scraping project, letting you know if it’s running smoothly or stalled.

### List all projects and tasks
You can view every folder and individual scraping task configured in your Octoparse account.

### Get raw extracted data rows
The MCP fetches the final, structured web rows from a completed job and loads them directly into your agent's working memory for immediate use.

### Update scraper parameters
You can dynamically change the core URLs or keywords driving a task without having to rebuild the entire scraping project.

## Use Cases

### Competitive pricing intelligence
A business analyst needs to see price changes across 10 major retail sites. They use the MCP to run multiple scrapers, then feed all the resulting data into the agent via `get_task_data`. The agent then builds a comparative markdown table showing only items that dropped in price by over 20%.
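
The filtering step in this scenario is plain arithmetic once the rows are in context. A minimal sketch, assuming each row carries hypothetical `item`, `old_price`, and `new_price` fields (the real column names depend on how each task was configured):

```python
def price_drops(rows, threshold=0.20):
    """Return rows whose price fell by more than `threshold` (a fraction)."""
    drops = []
    for row in rows:
        old, new = float(row["old_price"]), float(row["new_price"])
        if old <= 0:
            continue  # skip rows with no usable baseline price
        change = (old - new) / old
        if change > threshold:
            drops.append({**row, "drop_pct": round(change * 100, 1)})
    return drops


def to_markdown(rows):
    """Render the filtered rows as a markdown table."""
    lines = ["| Item | Old | New | Drop % |", "| --- | --- | --- | --- |"]
    for r in rows:
        lines.append(
            f"| {r['item']} | {r['old_price']} | {r['new_price']} | {r['drop_pct']} |"
        )
    return "\n".join(lines)


# Illustrative rows only, not real scraped data.
sample = [
    {"item": "Kettle", "old_price": 50.0, "new_price": 35.0},   # 30% drop
    {"item": "Toaster", "old_price": 40.0, "new_price": 36.0},  # 10% drop
]
print(to_markdown(price_drops(sample)))
```

The comparison is strict, so an item that dropped by exactly 20% is left out, matching "over 20%".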

### Building lead lists from LinkedIn
A growth hacker wants to build an email list of specific job titles. They use the MCP to start and monitor a targeted scraper, then prompt the agent to pull all collected data using `get_task_data` so the AI can validate the emails against known patterns.
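
The validation step could look something like the sketch below. It assumes the scraped rows expose a hypothetical `email` field, and the pattern is deliberately simple: it checks the shape of the address only and says nothing about whether the mailbox exists.

```python
import re

# One "@", no whitespace, and a dotted domain that does not end in a dot.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")


def split_by_email_validity(rows, field="email"):
    """Partition rows into (valid, invalid) by the shape of their email field."""
    valid, invalid = [], []
    for row in rows:
        value = str(row.get(field, "")).strip()
        (valid if EMAIL_RE.match(value) else invalid).append(row)
    return valid, invalid
```

Rows that land in the invalid list can be handed back to the agent for review rather than silently dropped.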

### Debugging web schemas
A data engineer needs to verify if a new scrape job captured the correct fields. They use `list_tasks` first, then trigger a specific task run using `start_task`, and finally pull sample JSON via `get_task_data` to debug the schema without leaving their terminal.
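
Once a sample of rows is back from `get_task_data`, the schema check itself is small. A sketch, assuming the rows arrive as a list of dictionaries keyed by column name (the actual response shape may differ):

```python
def missing_fields(rows, expected_fields):
    """Map each expected field to the number of rows where it is absent or empty."""
    gaps = {}
    for field in expected_fields:
        count = sum(1 for row in rows if row.get(field) in (None, ""))
        if count:
            gaps[field] = count
    return gaps


# Illustrative rows only; field names are borrowed from the prompt examples.
rows = [
    {"Address": "1 Main St", "Beds": "2"},
    {"Address": "", "Beds": "3"},
]
print(missing_fields(rows, ["Address", "Beds", "Asking Price"]))
```

An empty result means every expected field was populated in every sampled row.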

### Automating market monitoring
A business analyst needs daily pricing reports. Instead of manually re-running tasks, they instruct the agent to check status with `get_task_status`, ensuring the scheduled job ran successfully before requesting the latest data dump.

## Benefits

- Data ingestion is instant. Instead of downloading CSVs, you use the `get_task_data` tool to pull structured rows directly into your agent's context, letting it format or summarize results immediately.
- Monitoring is transparent. You get real-time status updates using `get_task_status`, so you never waste time wondering if a crawler is stuck or still working.
- Control is absolute. If a scrape job goes rogue, the `stop_task` tool lets your agent shut it down instantly, saving compute time.
- Flexibility matters. Need to change what you are looking for? The `update_task_params` tool lets you shift keywords or URLs driving a task without rebuilding the whole project.
- Discovery is systematic. The `list_tasks` tool gives your agent a complete map of every scraping job, making data retrieval reliable.

## How It Works

You manage complex web scraping processes using only natural language commands in your preferred AI client.

1. First, subscribe to this MCP and provide your premium Octoparse API credentials.
2. Next, command your AI agent to perform a specific data action, like starting a task or listing available projects.
3. Finally, the MCP executes the request against Octoparse's cloud servers, delivering the status updates or raw data directly back to your chat window.
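
Your AI client normally builds these requests for you, but it can help to see what step 3 looks like on the wire. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` message. The sketch below builds one for `start_task`; the `task_id` argument name and its value are placeholders, since the real input schema is defined by the server.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def tool_call_request(tool_name, arguments=None):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


# `task_id` is an assumed argument name; check the tool's input schema.
request = tool_call_request("start_task", {"task_id": "YOUR_TASK_ID"})
print(json.dumps(request, indent=2))
```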

## Frequently Asked Questions

**How do I start scraping with Octoparse MCP?**
You must first obtain an access token using `get_token`, then instruct your agent to call the `start_task` tool, specifying which task you want to run.

**What if my scrape fails halfway through Octoparse MCP?**
You can check the current progress using `get_task_status`. If it's stuck, use the `stop_task` tool to halt the job and figure out what went wrong.
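
If you script this check instead of prompting for it, the logic is a poll loop with a deadline. The sketch below is built on assumptions: `call_tool(name, arguments)` is a hypothetical helper that forwards a call to the MCP server and returns a dict, and the `task_id` argument, the `status` key, and the `"running"` label are illustrative names, not the server's confirmed schema.

```python
import time


def wait_or_stop(call_tool, task_id, timeout_s=600, interval_s=15,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `get_task_status`; call `stop_task` if the task outlives `timeout_s`."""
    deadline = clock() + timeout_s
    while True:
        status = call_tool("get_task_status", {"task_id": task_id})["status"]
        if status != "running":  # assumed status label
            return status
        if clock() >= deadline:
            # Still running past the deadline: treat it as stuck and halt it.
            call_tool("stop_task", {"task_id": task_id})
            return "stopped_after_timeout"
        sleep(interval_s)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised in tests without real waiting.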

**Can I change the target website mid-scrape with Octoparse MCP?**
Yes. You don't have to rebuild the whole project; you can use `update_task_params` to dynamically adjust the core search URL or keywords driving the task.

**How do I get the data out of Octoparse MCP?**
Use the `get_task_data` tool. This fetches un-exported rows from a completed job, making them available for your agent to analyze and structure immediately.
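
Because `get_task_data` is limited to 1000 records, larger result sets need more than one call. One possible pattern is to alternate fetching with `mark_data_exported`. This sketch assumes that marking only affects rows already returned by the previous fetch, which you should verify before relying on it. The `call_tool` helper, the `task_id` and `size` arguments, and the `rows` key are all hypothetical names.

```python
def drain_task_rows(call_tool, task_id, batch_size=1000, max_batches=100):
    """Collect un-exported rows by alternating fetch and mark-as-exported."""
    collected = []
    for _ in range(max_batches):  # hard cap so a bad response cannot loop forever
        result = call_tool("get_task_data", {"task_id": task_id, "size": batch_size})
        rows = result.get("rows", [])
        if not rows:
            break
        collected.extend(rows)
        # Mark only after the batch is safely held, so a failure never skips rows.
        call_tool("mark_data_exported", {"task_id": task_id})
    return collected
```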

**What is the best way to manage multiple scrapers with Octoparse MCP?**
Use `list_task_groups` and then `list_tasks`. This gives you a full overview of everything configured in your account, letting your agent target specific jobs.