# ParseHub MCP

> ParseHub connects advanced cloud scraping jobs directly into your AI workflow. List configured projects, dispatch headless runs, check crawler status in real time, and pull structured datasets via chat commands. Stop managing web scrapers through separate dashboards; control complex data collection right where you write.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** data-extraction, headless-browser, web-crawling, json-output, cloud-scraping, automation-workflows

## Description

Web scraping used to mean logging into a dedicated dashboard, setting parameters, hitting 'run,' then waiting for emails or refreshing pages until the data appeared. With this MCP, you manage that entire process from your chat agent and treat a crawl like any other function call. List your existing projects, including their start URLs and templates. Need new data? Dispatch a run on command, naming the project to use and optionally overriding its default start URL. The MCP reports whether the job is queued, running, or finished. When it's done, you get more than a 'Success' message: you pull down structured JSON containing the scraped records, ready for your agent to process.

## Tools

### cancel_run
Stops a running or queued scrape job to free up cloud resources and prevent unnecessary charges.

### delete_run
Permanently removes old scraping run history and associated data, helping you clean up your account storage quota.

### get_project
Retrieves the full configuration details for a specific web scraping project token.

### get_run_data
Downloads the final, structured JSON payload from a run only after it has been confirmed as complete and data-ready.

### get_run_details
Checks the current status of a specific scrape job to determine if it's waiting in queue, running, or finished.

### get_last_ready_data
Immediately fetches the latest completed data for a project without needing to track individual run tokens first.

### list_projects
Lists all available web scraping projects in your account, providing unique tokens and status information.

### list_runs
Provides a historical record of every run for a project, useful for auditing or finding specific past data points.

### run_project
Initiates a new scrape job using the default start URL and template configured in an existing project.

### run_project_with_url
Starts a scraping run targeting a specific, custom web address while maintaining all of the project's original extraction rules.
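
The choice between `run_project` and `run_project_with_url` can be sketched as a small helper. This is a minimal illustration, not the MCP's actual schema: the argument names `project_token` and `start_url` and the helper `build_run_request` are assumptions, since this listing does not document tool parameters.

```python
from typing import Any, Dict, Optional, Tuple


def build_run_request(
    project_token: str, start_url: Optional[str] = None
) -> Tuple[str, Dict[str, Any]]:
    """Pick the tool name and arguments for starting a run.

    Argument names are hypothetical; check the MCP's tool schema for the real ones.
    """
    if not project_token:
        raise ValueError("project_token is required")
    if start_url is None:
        # No override: run with the project's configured start URL.
        return "run_project", {"project_token": project_token}
    if not start_url.startswith(("http://", "https://")):
        raise ValueError(f"start_url must be an absolute http(s) URL: {start_url!r}")
    # Override only the start page; the project's extraction rules stay the same.
    return "run_project_with_url", {
        "project_token": project_token,
        "start_url": start_url,
    }
```

Validating the URL before dispatch avoids spending a cloud run on an input that could never load.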

## Prompt Examples

**Prompt:** 
```
Fetch the list of scrape projects I have on my ParseHub account.
```

**Response:** 
```
Found 3 projects in your account. 1: 'Pricing Monitor' (Token: t9zx..., runs: 4). 2: 'Real Estate Leads' (Token: t4ax...). 3: 'Competitor Blogs'. Want me to start a run for any of these?
```

**Prompt:** 
```
Start a new run for project 't9zx...' and check its status.
```

**Response:** 
```
Dispatched `run_project` for project `t9zx...`. ParseHub accepted the job: run ID `run_k1l`, current status 'queued'. Extraction will begin once the run leaves the queue.
```

**Prompt:** 
```
Extract the finished data JSON payload from run ID 'run_k1l'.
```

**Response:** 
```
Retrieved the payload with `get_run_data`. The JSON contains 40 records, each with title and price fields. Would you like me to process this data or format it differently?
```

## Capabilities

### List configured projects
View every web scraping project saved in your account, including their unique tokens and template details.

### Start a data extraction run
Tell the MCP to trigger a new headless scrape job for any specified project.

### Target custom URLs
Start a scraping run that focuses on specific pages, bypassing the default starting URL for a project.

### Check run status and progress
Get real-time updates on whether a scrape is queued, running, or completed.

### Download extracted data payload
Retrieve the final structured JSON data from any completed scraping run for immediate use.

## Use Cases

### Monitoring Competitor Pricing Changes
A market analyst needs to know if a competitor changed its pricing structure. They ask the agent to run an extractor on the main product page, wait for `get_run_details` to confirm completion, and then use `get_run_data` to pull the structured JSON of all price points.
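
The run, wait, and fetch sequence in this use case could be sketched as follows. It assumes a generic `call_tool(name, arguments)` function supplied by your MCP client. The field names `run_token`, `status`, and `data_ready`, and the failure statuses `cancelled` and `error`, are assumptions for illustration; this listing only names the `queued` and `running` states.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumed terminal failure states; not documented in this listing.
TERMINAL_FAILURES = {"cancelled", "error"}


def run_and_fetch(
    call_tool: ToolCaller,
    project_token: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a run, poll until its data is ready, then return the payload."""
    run = call_tool("run_project", {"project_token": project_token})
    run_token = run["run_token"]

    for _ in range(max_polls):
        details = call_tool("get_run_details", {"run_token": run_token})
        if details.get("data_ready"):
            # Only fetch once the run reports its data as ready.
            return call_tool("get_run_data", {"run_token": run_token})
        if details.get("status") in TERMINAL_FAILURES:
            raise RuntimeError(f"run {run_token} ended as {details['status']}")
        sleep(poll_seconds)

    raise TimeoutError(f"run {run_token} not ready after {max_polls} polls")
```

Capping the number of polls keeps a stalled run from blocking the conversation indefinitely; the `sleep` parameter is injectable so the loop can be exercised without real delays.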

### Processing a Batch of Articles
A research team has 50 articles on different websites. Instead of running 50 jobs manually, they ask the agent to use `run_project_with_url` for each unique URL, then collect all the resulting structured data into one payload.
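
A batch like this might be split into a dispatch step and a collect step, as in the sketch below. As before, `call_tool` stands in for whatever your MCP client exposes, and the argument and field names (`project_token`, `start_url`, `run_token`, `data_ready`) are assumptions rather than documented schema.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: call_tool(tool_name, arguments) -> dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def start_batch(
    call_tool: ToolCaller, project_token: str, urls: Iterable[str]
) -> Dict[str, str]:
    """Dispatch one run per URL and return a {url: run_token} mapping."""
    run_tokens: Dict[str, str] = {}
    for url in urls:
        run = call_tool(
            "run_project_with_url",
            {"project_token": project_token, "start_url": url},
        )
        run_tokens[url] = run["run_token"]
    return run_tokens


def collect_batch(
    call_tool: ToolCaller, run_tokens: Dict[str, str]
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Merge payloads of finished runs; return (records, urls_still_pending)."""
    merged: List[Dict[str, Any]] = []
    pending: List[str] = []
    for url, run_token in run_tokens.items():
        details = call_tool("get_run_details", {"run_token": run_token})
        if not details.get("data_ready"):
            pending.append(url)
            continue
        data = call_tool("get_run_data", {"run_token": run_token})
        # Keep the source URL next to each payload so records stay traceable.
        merged.append({"source_url": url, "data": data})
    return merged, pending
```

Calling `collect_batch` again later with the same mapping picks up runs that were still pending on the first pass.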

### Auditing Historical Scrapes
A data engineer needs proof of what was scraped last month. They ask the agent to `list_runs`, find a specific run ID, and confirm its contents using `get_run_data` before moving on.
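
Picking last month's runs out of a `list_runs` result might look like the sketch below. It assumes each run record carries a `run_token` and an ISO 8601 `start_time` string without a timezone offset; those field names and that format are assumptions, not something this listing documents.

```python
from datetime import datetime
from typing import Any, Dict, List


def runs_in_window(
    runs: List[Dict[str, Any]], start: datetime, end: datetime
) -> List[Dict[str, Any]]:
    """Return runs whose start_time falls in [start, end), oldest first."""
    selected = []
    for run in runs:
        # Assumed format, e.g. "2024-05-20T08:00:00".
        started = datetime.fromisoformat(run["start_time"])
        if start <= started < end:
            selected.append((started, run))
    selected.sort(key=lambda pair: pair[0])
    return [run for _, run in selected]
```

Each returned record's `run_token` can then be passed to `get_run_data` to confirm what that run captured.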

### Stopping an Overdue Job
A job gets stuck in an infinite loop. The user asks the agent to check the status via `get_run_details`, sees that the run has stalled, and immediately calls `cancel_run` to free up resources.
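
One way to decide that a run has stalled is to compare its age against a time budget, as in this sketch. The `status` and `start_time` fields, the `run_token` argument, and the two-hour budget in the usage note are all assumptions for illustration; `call_tool` again stands in for your MCP client's tool-calling function.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# States in which a run is still consuming a slot (named in this listing).
ACTIVE_STATUSES = {"queued", "running"}


def is_stalled(details: Dict[str, Any], now: datetime, max_runtime: timedelta) -> bool:
    """True if the run is still active and older than the allowed runtime."""
    if details.get("status") not in ACTIVE_STATUSES:
        return False
    started = datetime.fromisoformat(details["start_time"])  # assumed ISO 8601
    return now - started > max_runtime


def cancel_if_stalled(
    call_tool: ToolCaller, run_token: str, now: datetime, max_runtime: timedelta
) -> bool:
    """Check a run and cancel it if it exceeded max_runtime; return True if cancelled."""
    details = call_tool("get_run_details", {"run_token": run_token})
    if is_stalled(details, now, max_runtime):
        call_tool("cancel_run", {"run_token": run_token})
        return True
    return False
```

For example, `cancel_if_stalled(call_tool, run_token, datetime.now(), timedelta(hours=2))` would cancel only runs that have been active for more than two hours and leave finished runs untouched.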

## Benefits

- You don't have to switch between the ParseHub dashboard and your agent. You trigger, monitor, and retrieve data, all within one chat session.
- Need fresh data fast? Use `get_last_ready_data` to grab the latest completed payload without having to track a specific run token first.
- When you need to scrape different pages using the same template (like product categories), use `run_project_with_url`. It changes only the start page, not your extraction rules.
- The system keeps track of everything. Use `get_run_details` to check if a job is queued or running without needing to refresh an external web app.
- You can clean up old jobs and manage costs by using tools like `cancel_run` or permanently removing data with `delete_run`.

## How It Works

Your agent handles the entire sequence (setup, execution, monitoring, and final data retrieval) from a single conversation thread.

1. Subscribe to this MCP and provide your ParseHub API key.
2. Ask your agent to list available projects, or specify a custom URL, so it can identify the correct job parameters.
3. Once you confirm the run details, the MCP executes the scrape. You then use subsequent commands to track status until the data is ready for extraction.

## Frequently Asked Questions

**How do I start a scrape if I want to use different pages?**
You use the `run_project_with_url` tool. This lets you target custom URLs while keeping all of your project's original scraping rules and template definitions intact.

**Can ParseHub MCP list what projects I already have?**
Yes, use the `list_projects` tool. It shows every web scraping project you’ve set up, giving you the unique tokens needed for subsequent commands.

**What if my scrape job stalls? Can I stop it?**
Monitor the status with `get_run_details`. If the run has stalled or is taking too long, use the `cancel_run` tool to stop it and free up resources.

**How do I get data from a run that finished yesterday?**
First, call `list_runs` to find the run's ID. Once you have the ID for a completed job, use `get_run_data` to pull down the structured JSON payload.

**Do I need an API key for ParseHub MCP?**
Yes. You must subscribe and provide your ParseHub API key during setup so the agent can authenticate and manage cloud scraping jobs on your behalf.