# Import.io Web Data MCP

> Import.io Web Data Extraction MCP lets your AI client scrape and structure data from public websites. Run targeted extractors on specific URLs for clean JSON output, start bulk crawls across multiple pages, or let the Magic API pull tables automatically without pre-configured rules. Monitor job status and retrieve results as CSV or JSON.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** data-extraction, web-crawling, structured-data, json-export, automation, data-pipeline

## Description

Connect this MCP to your agent and control web data extraction through natural conversation. Tell it what you need from a public website, whether that's specific product pricing or a list of contacts. Start by triggering predefined extractors on single URLs to get clean JSON right away. Need something bigger? Run large-scale jobs across multiple pages concurrently and track their progress. If you have no extractor configured for a page, use the automated Magic API to pull tables without setup. Once the work's done, you can retrieve results as structured JSON or as CSV text, ready for your spreadsheet program. Everything runs through Vinkius, giving your agent access to thousands of other tools alongside this one.

## Tools

### get_crawl_data
Pulls the final, organized JSON output after a large crawl job has finished.

### get_crawl_status
Checks if an ongoing bulk crawl is still running and how many pages it's processed so far.

### download_csv
Downloads the extracted data as plain CSV text, ready to paste into a spreadsheet.

### get_extractor_data
Retrieves structured JSON data from a single extraction run once that job has completed.

### list_extractors
Lists all the custom extractors already set up in your Import.io account so you know which ones to use.

### run_magic_api
Scans a URL and automatically pulls out tables and structured information without any setup.

### run_extractor
Starts a specific, predefined data extraction job on a single website URL.

### start_crawl
Initiates a large-scale bulk crawling operation across multiple pages at the same time.

### get_extractor_status
Checks the current state of any single extraction job, showing if it's running, done, or failed.

### account_usage
Reports how many API credits you've used this month against your subscription limit.
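
Most AI clients issue these tool calls for you, but it can help to see what one looks like on the wire. The Model Context Protocol invokes tools with a JSON-RPC 2.0 `tools/call` request that carries the tool name and an `arguments` object. The sketch below builds such a request for `run_extractor`. The argument names `extractor_id` and `url` are assumptions made for illustration, not confirmed parameter names; the tool's published input schema is the authority.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique within a session.
_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names; check the tool's input schema before relying on them.
request = build_tool_call(
    "run_extractor",
    {"extractor_id": "ext-123", "url": "https://example.com/products"},
)
print(json.dumps(request, indent=2))
```

The same helper works for any tool listed above by changing the name and arguments.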

## Prompt Examples

**Prompt:** 
```
Run extractor 'ext-123' against 'https://example.com/products'
```

**Response:** 
```
Extraction run started. Run ID: 'run-98765'. I'll monitor the status for you. It usually takes a few seconds to process dynamic pages. Would you like me to fetch the results once it's completed?
```

**Prompt:** 
```
List all extractors in my Import.io account
```

**Response:** 
```
I've found 5 extractors in your account: 'Amazon Scraper' (ext-001), 'Competitor Pricing' (ext-002), 'Real Estate Leads' (ext-003), 'Stock Tracker' (ext-004), and 'News Monitor' (ext-005). Which one would you like to run?
```

**Prompt:** 
```
Check my monthly API credit usage
```

**Response:** 
```
You've used 12,450 credits out of your 50,000 monthly limit (24.9%). You have 37,550 credits remaining for this billing cycle. Your account health is excellent.
```
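
The figures in the sample response above follow from simple arithmetic on the credits used and the monthly limit. A minimal sketch, using only the numbers from that example (the function name is illustrative, not part of the MCP):

```python
def credit_summary(used: int, limit: int) -> tuple[float, int]:
    """Return (percent of the limit used, rounded to one decimal; credits remaining)."""
    percent = round(used / limit * 100, 1)
    return percent, limit - used


percent, remaining = credit_summary(12_450, 50_000)
print(f"{percent}% used, {remaining:,} credits remaining")
# prints "24.9% used, 37,550 credits remaining"
```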

## Capabilities

### Run Targeted Data Extracts
Trigger specific, predefined data extractors on single web pages to pull clean JSON content directly into your workflow.

### Execute Bulk Crawls
Start large-scale scraping jobs across many pages at once and monitor the progress of the entire crawl job.

### Extract Unstructured Data
Use the automated Magic API to identify and pull tabular data from a public web page, even if you haven't set up a specific extractor for it.

### Track Job Statuses
Poll ongoing extraction runs or bulk crawl jobs to check their current state, success rates, and total pages processed.

### Export Structured Files
Retrieve final extraction results in either structured JSON format or ready-to-use CSV text for immediate processing.
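
Because `download_csv` returns plain CSV text, a script can parse the result with Python's standard `csv` module instead of pasting it into a spreadsheet. The sketch below assumes the text starts with a header row; the sample data is made up and does not reflect real extractor output.

```python
import csv
import io


def parse_csv_text(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Made-up sample standing in for text returned by `download_csv`.
sample = "name,price\nWidget,9.99\nGadget,24.50\n"
rows = parse_csv_text(sample)
print(rows[0]["name"], rows[0]["price"])
```

All values come back as strings, so convert prices or counts explicitly before doing arithmetic on them.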

## Use Cases

### Monitoring Competitor Price Changes
A market researcher needs daily pricing data from a rival's product catalog. Instead of manually entering URLs into a scraper, they ask their agent to call the `run_extractor` tool with the 'Product Pricing' extractor once for each of the 50 competitor product URLs. The results are compiled and delivered as structured JSON.

### Building an Industry Directory
A business developer needs contact details (email, phone) from a list of websites that don't have standardized data. They ask their agent to use the Magic API (`run_magic_api`) across all 10 sites and then compile the resulting data into one clean CSV file for follow-up.
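
The compile step in this use case can be sketched as merging per-site row lists into one CSV. Sites rarely expose the same fields, so the sketch uses the union of all keys as columns and leaves missing values blank. The row shape (a list of flat dicts per site) is an assumption about what the agent hands over, not a documented `run_magic_api` output format, and the sample rows are invented.

```python
import csv
import io


def rows_to_csv(batches: list[list[dict]]) -> str:
    """Merge row batches into one CSV string; columns are the union of all keys."""
    fieldnames: list[str] = []
    for batch in batches:
        for row in batch:
            for key in row:
                if key not in fieldnames:
                    fieldnames.append(key)  # keep first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for batch in batches:
        writer.writerows(batch)
    return buffer.getvalue()


# Invented rows standing in for data pulled from two different sites.
site_a = [{"name": "Acme", "email": "info@acme.example"}]
site_b = [{"name": "Globex", "phone": "555-0100"}]
merged = rows_to_csv([site_a, site_b])
print(merged)
```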

### Auditing Website Content Scale
A content strategist wants to see how many articles a competitor has published over two years. They ask their agent to use `start_crawl` across the main blog section, monitor the progress using `get_crawl_status`, and get a final count of pages processed.
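
The monitoring step in this use case amounts to a polling loop. In the sketch below, `call_tool` is a hypothetical callable standing in for whatever your MCP client uses to invoke a tool, and the `crawl_id`, `state`, and `pages_processed` names are assumptions for illustration; the real status payload may use different names. A fake client replaces the live service so the loop can run offline.

```python
import time
from typing import Callable


def wait_for_crawl(
    call_tool: Callable[[str, dict], dict],
    crawl_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> int:
    """Poll `get_crawl_status` until the crawl finishes; return pages processed."""
    for _ in range(max_polls):
        # Hypothetical argument and field names; adjust to the real schema.
        status = call_tool("get_crawl_status", {"crawl_id": crawl_id})
        if status["state"] == "finished":
            return status["pages_processed"]
        if status["state"] == "failed":
            raise RuntimeError(f"crawl {crawl_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_polls} polls")


def make_fake_client(responses: list[dict]) -> Callable[[str, dict], dict]:
    """Return a stand-in client that replays canned status responses in order."""
    remaining = iter(responses)

    def call_tool(name: str, arguments: dict) -> dict:
        return next(remaining)

    return call_tool


fake = make_fake_client([
    {"state": "running", "pages_processed": 40},
    {"state": "finished", "pages_processed": 112},
])
pages = wait_for_crawl(fake, "crawl-1", poll_seconds=0)
print(pages)
```

Capping the number of polls keeps a stalled crawl from blocking the agent indefinitely.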

### Validating Data Schema for New Products
A product manager needs to confirm that all new product listing sites follow the same data format. They run several targeted extracts, check the output using `get_extractor_data`, and ensure the JSON keys are consistent across all sources.
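
The consistency check in this use case can be sketched as a comparison of key sets. The sketch assumes each source's output from `get_extractor_data` has already been reduced to a list of flat dicts, which is an assumption about the shape, not a documented format. The first source serves as the reference, and the sample records are invented.

```python
def find_key_mismatches(
    records_by_source: dict[str, list[dict]],
) -> dict[str, dict[str, set[str]]]:
    """Report, per source, which keys are missing or extra versus the first source."""
    key_sets = {
        source: set().union(*(record.keys() for record in records))
        for source, records in records_by_source.items()
    }
    if not key_sets:
        return {}
    reference = next(iter(key_sets.values()))  # first source is the baseline
    mismatches = {}
    for source, keys in key_sets.items():
        if keys != reference:
            mismatches[source] = {
                "missing": reference - keys,
                "extra": keys - reference,
            }
    return mismatches


# Invented records standing in for extractor output from two sites.
outputs = {
    "site-a": [{"title": "Lamp", "price": "19.00", "sku": "A1"}],
    "site-b": [{"title": "Desk", "price": "120.00"}],
}
report = find_key_mismatches(outputs)
print(report)
```

An empty report means every source exposes the same set of keys as the baseline.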

## Benefits

- Automate market intelligence collection by running predefined extractors via `run_extractor` against specific competitor product pages. You get the exact data points you need in JSON format every time.
- Handle massive web audits without writing a single line of code. Use `start_crawl` to launch a job across hundreds of URLs and `get_crawl_status` to monitor its progress, so your agent can tell you when the full dataset is ready.
- Bypass setup entirely with the Magic API (`run_magic_api`). If you just need pricing tables from a site you have no extractor for, this feature attempts to pull them automatically.
- Keep track of everything. You can check job status using `get_extractor_status` or monitor your budget by running `account_usage`, so you never hit a credit wall when you need data most.
- Get the output in the format you need: use `download_csv` to export results as CSV text for spreadsheet processing, skipping the manual copy-pasting steps entirely.

## How It Works

In short, you turn unstructured website content into clean, usable data formats without writing any scraping code.

1. Subscribe to this MCP and provide your Import.io API Key.
2. Tell your agent what you want: run a targeted extractor, start a bulk crawl, or use the Magic API.
3. Your agent will manage the job, track its progress (like pages processed), and then retrieve the final data in JSON or CSV format.

## Frequently Asked Questions

**How does Import.io Web Data MCP handle websites that change often?**
Predefined extractors work best for stable data points. If a site's layout changes completely, you can use the Magic API to try to pull general structured tables automatically.

**Can I run Import.io Web Data MCP on private sites?**
No. This MCP is designed for public web data extraction using standard scraping methods. It cannot access protected or login-required content.

**What's the difference between `run_extractor` and `start_crawl`?**
`run_extractor` targets a single, specific page with known data points. `start_crawl` is for bulk jobs across multiple pages or sections of a large website.

**What information does the `account_usage` tool provide?**
It tells you exactly how many API credits you've consumed this billing cycle and what your remaining credit balance is, helping manage your budget.