Octoparse MCP for AI. Manage scraping tasks and pull web data from chat.


Works with every AI agent you already use: Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any other MCP-compatible client.

Connect to your AI in seconds.

Octoparse MCP Server lets your AI client manage web scraping tasks directly in chat. It connects to the Octoparse API and gives you full control over complex data extraction workflows, with no manual exporting required.

Your agent can list all task groups, check the real-time status of scrapers, start new extractions on demand, and pull filtered, non-exported records, all through conversation.

This tool turns your AI client into a dedicated data researcher.

What your AI can do

Get new data

Fetches all extracted records that have not been marked as exported for a specific task.

Get task data

Retrieves structured data from a specified scraping task using an offset value to handle large result sets.

Get task status

Returns the current operational status of any defined web scraping task.


List All Task Groups

Lists every defined task group in your Octoparse account.

Manage and List Tasks

Lists specific scraping tasks, optionally filtered by a task group ID.

Start/Stop Scraping Jobs

Starts or stops a cloud-based data extraction job for a specified task.

Update Data Status

Marks a given set of data records as exported, changing their status within Octoparse.



Octoparse MCP Server: 8 Tools for Web Scraping Operations

These tools let your agent list groups, manage task status, start/stop scraping jobs, and retrieve data batches from Octoparse directly.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.


Get New Data

Fetches all extracted records that have not been marked as exported for a specific task.

Get Task Data

Retrieves structured data from a specified scraping task using an offset value to...

Get Task Status

Returns the current operational status of any defined web scraping task.

List Task Groups

Lists all existing, managed groups of scraping tasks within your Octoparse account.

List Tasks

Retrieves a list of specific scraping tasks, optionally filtered by an associated...

Start Task

Initiates the execution of a specified web scraping task in the cloud environment.

Stop Task

Halts an actively running web scraping task immediately.

Update Data Status

Manually marks a given set of data records as exported, changing their status within...

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you are up and running. Vinkius handles the backend infrastructure and supports Streamable HTTP, SSE, and OAuth2 out of the box, so no manual routing is required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.
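If you also reuse the endpoint in scripts (for example in the LangChain or CrewAI setups further down), it helps to keep the token out of source files. The sketch below builds the endpoint URL from a token argument or, failing that, from an environment variable. The helper name build_endpoint and the variable name VINKIUS_TOKEN are hypothetical and not part of Vinkius's tooling.

```python
import os
from typing import Optional
from urllib.parse import quote

PLACEHOLDER = "[YOUR_TOKEN_HERE]"
ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"


def build_endpoint(token: Optional[str] = None, env_var: str = "VINKIUS_TOKEN") -> str:
    """Return the Vinkius MCP endpoint URL for a token.

    Falls back to an environment variable (hypothetical name) so the token
    never has to be committed to source control.
    """
    token = (token or os.environ.get(env_var, "")).strip()
    if not token or token == PLACEHOLDER:
        raise ValueError("Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com")
    # Percent-encode so stray characters in the token cannot change the URL path.
    return ENDPOINT_TEMPLATE.format(token=quote(token, safe=""))


if __name__ == "__main__":
    print(build_endpoint("example-token"))
```

Rejecting the literal placeholder catches the most common setup mistake before any request reaches the server.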

Start a conversation

Open a new chat. The Octoparse integration is available immediately, with no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "octoparse-alternative": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Octoparse tools with full Vinkius guardrails applied.

VS Code Copilot


One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"octoparse-alternative": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

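Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint. This example assumes the Streamable HTTP transport and a recent adapter version, where get_tools() is a coroutine:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

async def main():
    client = MultiServerMCPClient({"octoparse": {  # connect to Vinkius
        "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
        "transport": "streamable_http"}})  # Streamable HTTP transport
    return await client.get_tools()

tools = asyncio.run(main())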

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

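from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install 'crewai-tools[mcp]'

# Connect securely to Vinkius over Streamable HTTP
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# The adapter keeps the MCP connection open only inside this block
with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent (goal and backstory are required; these values are placeholders)
    researcher = Agent(
        role='Data Researcher',
        goal='Collect and summarize Octoparse scraping results',
        backstory='An analyst who monitors competitor pricing',
        tools=vinkius_tools,
    )
    # Build and kick off your Crew here, while the connection is still open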

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private


Make Your AI Do More

Start with Octoparse, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Octoparse. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: managed infra
V8 Isolated: sandboxed per request
Zero-Trust Proxy: no stored credentials
DLP Enforced: policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 8 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Checking web data shouldn't require jumping between tabs.

Today, monitoring competitor pricing means logging into Octoparse, checking the task status panel (Is it Running? Is it Failed?), then navigating to the results tab, manually filtering for non-exported records, and finally downloading a CSV file—all before you can even start your analysis.

With this MCP server, you just tell your agent: 'Check if the pricing monitor is done.' The agent runs `get_task_status` and responds immediately. If the task has finished, it uses `get_new_data` to pull the latest records right into your chat window. You get instant context, with no manual exports required.

Octoparse MCP Server: Manage data flow from conversation.

The old way involved running a job and then treating the result as a black box—you'd download it, check its structure in Excel, and hope you hadn't missed any critical status updates. The entire cycle felt disconnected and manual.

Now, your agent handles the whole loop. You tell it to run `start_task`, wait for confirmation via `get_task_status`, and when ready, pull filtered data using `get_new_data`. Your AI client acts as a dedicated extraction lead; it manages state transitions so you don't have to.
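The start, poll, and pull loop described above can be sketched in Python as follows. This illustrates the pattern and is not the server's implementation. call_tool is a hypothetical stand-in for your MCP client's tool-call method. The taskId argument name, the {"status": ...} and {"data": [...]} response shapes, and the exact status strings are all assumptions. Because a started job keeps running until it finishes or is stopped, the sketch calls stop_task once a time budget runs out.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for your MCP client's tool-call method.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Status values taken from the page; the real API strings may differ.
TERMINAL_STATUSES = {"Completed", "Stopped"}


def run_and_collect(call_tool: ToolCaller, task_id: str, *, poll_every: float = 30.0,
                    timeout: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> List[dict]:
    """Start a task, poll get_task_status, then return its non-exported records."""
    call_tool("start_task", {"taskId": task_id})
    deadline = clock() + timeout
    while True:
        status = call_tool("get_task_status", {"taskId": task_id})["status"]
        if status in TERMINAL_STATUSES:
            break
        if clock() >= deadline:
            # The job will not stop on its own, so halt it explicitly.
            call_tool("stop_task", {"taskId": task_id})
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(poll_every)
    return call_tool("get_new_data", {"taskId": task_id})["data"]
```

sleep and clock are injectable so the loop can be exercised with a fake caller without actually waiting.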

Support: 24/7 at support@vinkius.com

Security: Vinkius Trust Center

SLA: Service Level Agreement

What your AI can actually do with this

Octoparse MCP Server lets your AI client run complex web scraping jobs right in chat. You don't have to click around in the Octoparse interface; you just tell your agent what data you need, and it handles every API call behind the scenes. You keep full control over your extraction workflows without ever leaving the conversation window.

Managing Tasks and Groups
You can ask your agent to pull up a list of all defined task groups in your account using list_task_groups. To see the specific scrapers within those groups, use list_tasks; you can even filter that list down by a designated group ID. For any running job, you'll get real-time updates on its operational status—whether it’s Running or Completed—by calling get_task_status.
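Combining list_task_groups with a filtered list_tasks gives you a map of every scraper in the account. The sketch below is a minimal illustration. call_tool is a hypothetical stand-in for an MCP tool call, and the taskGroupId argument name and the {"groups": [...]} and {"tasks": [...]} response shapes are assumptions.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for your MCP client's tool-call method.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def map_groups_to_tasks(call_tool: ToolCaller) -> Dict[str, List[str]]:
    """Build {group name: [task names]} from list_task_groups and list_tasks."""
    inventory: Dict[str, List[str]] = {}
    for group in call_tool("list_task_groups", {})["groups"]:
        # Filter list_tasks by this group's ID (the argument name is an assumption).
        tasks = call_tool("list_tasks", {"taskGroupId": group["id"]})["tasks"]
        inventory[group["name"]] = [task["name"] for task in tasks]
    return inventory
```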

Controlling the Scraping Job
When you need data, your agent can start the extraction process using start_task, firing up a cloud job for any specified task. If something goes sideways, don't sweat it; you can halt an active job immediately with stop_task. Once the scraping is done and the data sits waiting in the system, use update_data_status to manually mark specific records as exported, changing their status within Octoparse.
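A common way to use update_data_status is to process new records first and only then mark them as exported. That way, a failure during processing leaves the records available for the next get_new_data call. This is a sketch under assumptions. call_tool is hypothetical, and so are the argument names. In particular, marking by taskId alone is a guess, since the real tool may expect explicit record identifiers.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for your MCP client's tool-call method.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def pull_and_acknowledge(call_tool: ToolCaller, task_id: str,
                         handle: Callable[[List[dict]], None]) -> int:
    """Fetch non-exported records, process them, then mark them as exported."""
    records = call_tool("get_new_data", {"taskId": task_id})["data"]
    if not records:
        return 0
    # Process first: if handle() raises, update_data_status is never called,
    # so the same records come back on the next get_new_data call.
    handle(records)
    call_tool("update_data_status", {"taskId": task_id})
    return len(records)
```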

Retrieving the Data
The real power is getting the raw intel. To pull only the records that have been extracted but haven't been marked as exported yet, you call get_new_data. If you need a huge dataset and it comes back in chunks, you can use get_task_data to fetch structured data batches using an offset value, letting your agent paginate through massive result sets.
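Paging through a task with get_task_data might look like the sketch below. It makes several hypothetical assumptions: the tool accepts taskId, offset, and size arguments, it returns {"data": [...]}, and offset counts the records already read. If the real response returns its own next offset, use that value instead. A batch shorter than size is treated as the end of the data.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical stand-in for your MCP client's tool-call method.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_task_data(call_tool: ToolCaller, task_id: str, size: int = 100) -> Iterator[dict]:
    """Yield every record of a task by advancing the offset one batch at a time."""
    offset = 0
    while True:
        args = {"taskId": task_id, "offset": offset, "size": size}
        batch = call_tool("get_task_data", args)["data"]
        yield from batch
        if len(batch) < size:  # a short (or empty) batch means we reached the end
            return
        offset += len(batch)
```

Because it is a generator, the agent can stop after the first batch (for example, the first 100 records) without fetching the rest.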

This gives you granular control over exactly what data gets pulled directly into the chat.

Octoparse MCP Server - Web Data Extraction & Task Management
Built, hosted, and managed by Vinkius.

Server ID: 019dd130-1dab-72f9-ad56-ced5fd1e2a77

Vinkius Inspector: compliance grade A+, score 100/100

Here's how it actually works

In short, you manage complex web scraping processes by talking to your AI client instead of navigating multiple dashboards.

Subscribe to the server and provide your Octoparse OpenAPI Access Token in the settings.

You send your AI client a conversational prompt (e.g., 'Start the pricing monitor task').

The agent routes that request through the correct tool (start_task) and returns the results, status updates, or data directly to the chat.
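Under the hood, MCP clients invoke a tool by sending a JSON-RPC 2.0 request with the method tools/call. Your AI client builds this message for you. The sketch below only shows what it roughly looks like. The taskId argument name is hypothetical, and tools_call_request is an illustrative helper, not part of any SDK.

```python
import itertools
import json
from typing import Any, Dict

# JSON-RPC request IDs only need to be unique per connection; a counter is enough here.
_request_ids = itertools.count(1)


def tools_call_request(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    # "taskId" is a hypothetical argument name for illustration.
    print(tools_call_request("start_task", {"taskId": "YOUR_TASK_ID"}))
```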

Who is this actually for?

Market researchers and data analysts who are tired of switching between Octoparse, a spreadsheet, and their chat window. This server is for the person whose job requires constant web monitoring—the one who needs to check competitor pricing or monitor trends without ever leaving their primary workflow tool.

Market Researcher

Uses list_tasks and get_new_data to quickly pull competitor data and track changing price points, avoiding app switching.

Data Analyst

Manages the ingestion pipeline by checking status with get_task_status, ensuring extraction health before pulling final datasets via get_task_data.

Developer/Automation Engineer

Integrates real-time web scraping into code flows, using the agent to trigger tasks (start_task) and manage data status updates (update_data_status).

What Changes When You Connect

You control task flow without leaving your agent. Instead of opening Octoparse, you simply ask to list_tasks or check status with get_task_status. This keeps your entire research process centralized in one window.

Data retrieval is smarter and faster. Don't manually export CSVs; use get_new_data to pull only the records that are ready for review, filtering out already processed data points.

Full automation of complex jobs. Need to monitor a competitor? Use your AI client to execute start_task on demand and then check progress with get_task_status, all in one continuous conversation thread.

Granular control over the dataset lifecycle. If you need to mark records as exported yourself, use update_data_status. This capability lets you manage data flags right where your AI agent is working.

Streamlined data access for analysts. When a task is done and you need results, don't just download everything. Use get_task_data with offsets to pull specific batches of records directly into the chat context.

See it in action

01

Monitoring Competitor Pricing Shifts

A market researcher needs to know if a competitor changed their pricing page. They prompt their agent: 'Start the Amazon Monitor task and check its status.' The agent runs start_task, reports progress from the get_task_status response, and, once the task completes, pulls the newly extracted records using get_new_data. Problem solved without opening a browser.

02

Debugging Data Pipelines

A developer runs a scraper but suspects some data is marked incorrectly. They use the agent to run list_tasks first, verify the task ID, and then call update_data_status to mark a batch of records as exported, so that subsequent pulls via get_new_data return only records that have not yet been processed.

03

Comprehensive Data Audit

A data analyst needs an overview of all scraping projects. They ask the agent to run list_task_groups, getting a full map of available scrapers. Then, they can individually check each group using list_tasks before deciding which one to kick off via start_task.

04

Handling Large Datasets

The agent fetches results from the 'Competitor Monitor' task. Instead of receiving a massive data dump, it uses get_task_data with an offset parameter to pull the first 100 records, keeping the conversation manageable and actionable.

The honest tradeoffs

Manual Exporting

Anti-pattern

A user runs a task, waits for completion, then manually clicks 'Export CSV' in the Octoparse UI, downloads the file, and re-uploads it to their analysis tool.

The Fix

Let your agent handle it. After running start_task, use get_new_data or get_task_data directly through the chat interface to ingest results immediately.

Unsure of Task Status

Anti-pattern

A user runs a task, gets disconnected, and then doesn't know if it finished or failed. They waste time re-running the job unnecessarily.

The Fix

Always check the status first. Use get_task_status to confirm the current state of the scraping job before taking any action.

Mixing up data access

Anti-pattern

A user tries to pull 'all' data without knowing what has been processed, resulting in duplicated or incomplete record sets.

The Fix

When you only want new findings, use get_new_data. This tool specifically retrieves records that haven't been marked as exported yet.

When It Fits, When It Doesn't

Use this server if your workflow relies on managing multiple state changes: starting a job, monitoring its status, and then retrieving specific batches of resulting data. It’s essential when the process is iterative—you run it, check it, fix it, repeat.

Don't use this if you just need to read static web content; standard scraping tools handle that fine. Also, don't try to build a custom database schema directly in the chat; the data must be pulled via get_task_data first so your agent can validate its structure. If all you need is a simple list of tasks without status checks, list_tasks works, but relying only on that misses the operational context provided by get_task_status and start_task. You need the full cycle for true automation.
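One simple way for the agent to check a dataset's structure before you design a schema is to profile which fields appear across a batch pulled with get_task_data. The sketch below assumes records arrive as flat dictionaries, which the page does not confirm. profile_fields is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def profile_fields(records: List[dict]) -> Dict[str, object]:
    """Summarize which fields appear in a batch of scraped records."""
    counts = Counter(key for record in records for key in record)
    total = len(records)
    return {
        "total": total,
        # Fields present in every record are safe candidates for required columns.
        "always_present": sorted(k for k, c in counts.items() if c == total),
        # Fields missing from some records should be nullable (or investigated).
        "sometimes_missing": sorted(k for k, c in counts.items() if c < total),
    }
```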

Questions you might have

How do I get the status of my scraping task using Octoparse MCP Server?

Use get_task_status. This tool returns the current operational state (Running, Completed, Stopped) of your specified task ID. It's the first check you should always run.

Can I only get new data using Octoparse MCP Server?

No. get_new_data pulls non-exported records, but you can also use get_task_data to paginate through large datasets or retrieve specific batches by offset.

What happens when I use start_task? Does it run forever?

No. start_task launches the job in the Octoparse cloud, where it runs until the extraction finishes or you stop it. Check its progress with get_task_status, and call stop_task to halt the job if it goes off track.

Do I need to manually export data after scraping with Octoparse MCP Server?

No. The whole point is that your AI agent interacts directly with the API. You can pull and filter results in chat using get_new_data or get_task_data, bypassing manual exports.

How do I authenticate my connection to the Octoparse MCP Server?

Enter your Octoparse OpenAPI Access Token in the server settings. You get this token from your Octoparse account, and the server uses it to manage your web scrapers on behalf of your AI client.

How do I get all historical data using the `get_task_data` tool?

You must call get_task_data repeatedly, incrementing the offset parameter each time. This allows you to pull through every record in a task, not just the first batch.

What is the purpose of using the `update_data_status` tool?

This tool marks data records as exported within Octoparse. Later get_new_data calls skip those records, so you don't retrieve the same data twice.

How do I see all available scraping setups using `list_task_groups`?

list_task_groups retrieves a list of all task groups in your account. Pass a group's ID to list_tasks to find the specific set of tasks you need.

Can my AI automatically find the latest extracted data for a specific task?

Yes. Use the get_new_data tool with the task ID. Your agent returns the newest records that haven't been marked as exported yet.

How do I find my Octoparse OpenAPI Access Token?

Log in to Octoparse, navigate to the OpenAPI section in your profile or developer portal, and follow the instructions to generate a Bearer token using your account credentials.

Can I start a scraper via the AI?

Absolutely. Use the start_task tool with your Task ID. The AI will command Octoparse to begin the extraction in the cloud immediately.
