Import.io MCP. Get structured data from any website, conversationally.

Import.io (Web Data Extraction) MCP Server handles web data extraction and large-scale scraping. Trigger predefined extractors for specific URLs to get clean JSON data.

Start bulk crawls across many pages and monitor progress in real-time. Use the Magic API to pull structured data from any site without setting up extractors.

What your AI agents can do

Run predefined data extractions

Triggers a specific, pre-configured data extractor against a target URL and returns the resulting structured JSON data.

Manage large-scale web crawls

Initiates bulk data gathering jobs across multiple pages and allows monitoring of the job's progress and status.

Extract data using AI (Magic API)

Runs the automated Magic API against any URL to identify and pull structured data, even without a pre-configured extractor.

Download extracted data as CSV

Retrieves the final extraction results and formats them as downloadable CSV text for spreadsheet use.

Check job status and progress

Polls the status of an ongoing crawl or extraction run, providing metrics like pages processed and success rates.

View API credit usage

Checks your current Import.io account usage against your monthly API credit limit.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Import.io (Web Data Extraction) MCP Server: 10 Tools

These tools allow you to manage every stage of web data extraction, from initiating a crawl job to downloading the final, structured CSV output.

account_usage

Checks your current Import.io account API credit usage.

download_csv

Downloads extracted data directly as CSV text, suitable for spreadsheet processing.

get_crawl_data

Retrieves the full JSON output of a completed Import.io crawl job.

get_crawl_status

Checks the progress, success rate, and current state of an ongoing Import.io crawl job.

get_extractor_data

Retrieves structured JSON data from a completed Import.io extraction run.

get_extractor_status

Checks the status and metadata of a specific Import.io extraction run.

list_extractors

Lists all data extractors configured within your Import.io account.

run_extractor

Triggers a specific Import.io extractor for a defined URL, which starts an asynchronous data run.

run_magic_api

Runs the automated Magic API against a URL to extract data without needing a pre-configured extractor.

start_crawl

Starts a large-scale, multi-page data extraction job across multiple URLs.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Import.io (Web Data Extraction), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

This server lets your AI agent handle web data extraction and large-scale scraping. You get clean JSON data by triggering predefined extractors against specific URLs. You can start large-scale crawls across many pages and watch their progress in real time. If you need structured data from a site but have not set up an extractor, run the Magic API against the URL to pull that data without any pre-work.

Use list_extractors to list all data extractors configured in your account. Use run_extractor to run a specific extractor against a defined URL, which starts an asynchronous data run. You get the resulting structured JSON data from a completed extraction run using get_extractor_data, and you can check the status and metadata of that run with get_extractor_status.

If you're doing a big crawl, you start a large-scale, multi-page data extraction job using start_crawl. You can track the job's progress, success rate, and current state with get_crawl_status, and once it's done, you retrieve the full JSON output of that crawl job with get_crawl_data. Once the data is ready, you can download it directly as CSV text using download_csv.

You can always check your current account API credit usage with account_usage.

How Import.io MCP Works

1 Subscribe to the Import.io server and provide your API Key.
2 Tell your agent what you need (e.g., 'Run extractor X on URL Y' or 'Start a crawl on site Z').
3 The agent triggers the job, and you use subsequent tools (like get_crawl_status) to monitor the process until the data is ready for retrieval.

The bottom line is, you tell your agent the data you want, and the agent manages the entire extraction lifecycle from start to finish.
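Under the hood, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below shows what such a request could look like for run_extractor. The argument names (`extractor_id`, `url`) are assumptions for illustration; the real names come from the tool schema your client reports.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names; check the schema exposed by the server.
request = build_tool_call(
    "run_extractor",
    {"extractor_id": "EXTRACTOR_ID", "url": "https://example.com/pricing"},
)
print(json.dumps(request, indent=2))
```

In normal use your agent builds and sends these requests for you; the sketch is only meant to make step 3 concrete.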

Who Is Import.io MCP For?

This is for data analysts and market researchers who spend too much time manually copy-pasting data from websites. It also suits the product manager who needs to verify a competitor's pricing structure across dozens of pages but doesn't have time to write custom Python scraping scripts. In each case, the need is a repeatable, auditable data pipeline run directly from a chat interface.

Market Researcher

Runs large-scale competitor monitoring and web audits across multiple domains without writing manual scraping scripts.

Data Analyst

Automates the collection of market data and pricing intelligence by triggering specific, known extractors against target URLs.

Product Manager

Verifies data extraction schemas and monitors crawler health across several concurrent projects, tracking progress and usage limits.

What Changes When You Connect

You get clean, structured JSON data immediately by using run_extractor against a specific URL. This means you don't have to manually parse HTML; the tool handles the structure for you.
Manage large sites without writing complex code. Just run start_crawl to cover multiple pages, and then use get_crawl_status to track exactly how far along the job is.
Need to scrape something fast? run_magic_api is the answer. It automatically finds and extracts tabular data from any site, even if you never built an extractor for it.
When the job is done, download_csv grabs the results. The data is immediately in CSV format, ready to dump into a spreadsheet or database.
You can monitor your budget using account_usage. This keeps your data extraction runs within your planned API credit limits, so you don't hit a paywall mid-project.
Use list_extractors to see every extractor you've set up. This helps you confirm the correct ID before you try to run a specific job with run_extractor.

Real-World Use Cases

Monitoring a competitor's entire product line

A market researcher needs to track pricing and features across 50 different product pages. They use start_crawl to cover the entire site range. They then check the progress using get_crawl_status until all pages are processed. Finally, they call download_csv to get a clean spreadsheet of all the data.
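The crawl workflow in this use case can be sketched as a start, poll, download loop. This is a minimal illustration, not the server's actual interface: `call_tool` stands in for whatever function your MCP client uses to invoke a tool, and the argument and result keys (`urls`, `crawl_id`, `state`, `id`, `csv`) are assumptions. A stub caller replaces the real server so the sketch runs on its own.

```python
import time
from typing import Callable

ToolCaller = Callable[[str, dict], dict]


def crawl_to_csv(
    call_tool: ToolCaller,
    urls: list[str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a crawl, poll until it finishes, then return the CSV text."""
    crawl_id = call_tool("start_crawl", {"urls": urls})["crawl_id"]
    for _ in range(max_polls):
        status = call_tool("get_crawl_status", {"crawl_id": crawl_id})
        if status["state"] == "completed":
            return call_tool("download_csv", {"id": crawl_id})["csv"]
        if status["state"] == "failed":
            raise RuntimeError(f"crawl {crawl_id} failed")
        sleep(poll_seconds)
    raise TimeoutError(f"crawl {crawl_id} still running after {max_polls} polls")


def make_fake_caller() -> ToolCaller:
    """Stub that reports 'running' twice, then 'completed'."""
    states = iter(["running", "running", "completed"])

    def fake(name: str, arguments: dict) -> dict:
        if name == "start_crawl":
            return {"crawl_id": "crawl-1"}
        if name == "get_crawl_status":
            return {"state": next(states)}
        if name == "download_csv":
            return {"csv": "product,price\nWidget,9.99\n"}
        raise ValueError(f"unexpected tool: {name}")

    return fake


csv_text = crawl_to_csv(
    make_fake_caller(),
    ["https://example.com/products"],
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(csv_text)
```

The `max_polls` cap keeps a stalled job from blocking forever; tune it and `poll_seconds` to the size of the crawl.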

Quickly analyzing a new industry report

A data analyst finds a PDF-heavy website and needs to grab a table of contents or key metrics. They don't have a specific extractor. They use run_magic_api on the main URL, which automatically identifies the tables and sends the structured data back.

Validating a complex data schema

A product manager needs to confirm that a specific company's annual report has the correct revenue and profit structure. They first use list_extractors to see if an existing extractor works, then use run_extractor to trigger it on the report URL. They check the result with get_extractor_data.

Auditing data extraction costs

A data science team is running multiple parallel experiments. Before starting a big job, they check account_usage to see how many API credits they have left. This prevents unexpected billing spikes and keeps the project on budget.
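A pre-flight budget check along these lines might look like the sketch below. The shape of the account_usage result (`used` and `limit` keys) and the per-job credit estimate are assumptions made for illustration.

```python
def credits_remaining(usage: dict) -> int:
    """Credits left this month, never negative."""
    return max(usage["limit"] - usage["used"], 0)


def can_afford(usage: dict, estimated_credits: int, reserve: int = 0) -> bool:
    """True if the job fits while keeping `reserve` credits untouched."""
    return credits_remaining(usage) - reserve >= estimated_credits


# Hypothetical account_usage result.
usage = {"used": 8200, "limit": 10000}

print(credits_remaining(usage))                      # 1800
print(can_afford(usage, estimated_credits=1500))     # True
print(can_afford(usage, 1500, reserve=500))          # False
```

Holding back a reserve is a design choice for teams running parallel experiments, so one large job cannot consume the credits another job depends on.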

The Tradeoffs

Trying to scrape massive sites page by page

Calling run_extractor repeatedly for every single URL on a site. This is slow, inefficient, and you'll miss pages in between, making the dataset incomplete.

→ Instead, use start_crawl to define the scope of the whole site. This starts a single, managed job that covers all pages concurrently, and you track it with get_crawl_status.

Assuming all data needs a custom extractor

Getting stuck because the data isn't in a clean format and you think you need to build a bespoke extractor. This wastes time and only works if the data is perfectly structured.

→ Try run_magic_api first. It handles unstructured exploration and can pull data from any website without you having to pre-build a dedicated extractor.

Ignoring job status checks

Triggering a job (like run_extractor) and then immediately trying to retrieve the data using get_extractor_data before the job is done. This fails because the run is still in progress.

→ After running the job, always check the status first using get_extractor_status. Only when the status shows 'completed' should you attempt to call get_extractor_data.
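The status-first rule can be expressed as a small guard that only fetches data once the run reports completed. The state names (running, completed, failed) follow this page; `call_tool`, the `run_id` argument, and the `state` key are assumptions standing in for your client's real interface.

```python
from typing import Callable, Optional

ToolCaller = Callable[[str, dict], dict]


def get_data_if_ready(call_tool: ToolCaller, run_id: str) -> Optional[dict]:
    """Return extractor data only when the run has completed."""
    status = call_tool("get_extractor_status", {"run_id": run_id})
    state = status["state"]
    if state == "completed":
        return call_tool("get_extractor_data", {"run_id": run_id})
    if state == "failed":
        raise RuntimeError(f"run {run_id} failed: {status}")
    return None  # still running; check again later


def fake_caller(state: str) -> ToolCaller:
    """Stub server that always reports the given state."""

    def fake(name: str, arguments: dict) -> dict:
        if name == "get_extractor_status":
            return {"state": state}
        if name == "get_extractor_data":
            return {"rows": [{"price": "9.99"}]}
        raise ValueError(f"unexpected tool: {name}")

    return fake


print(get_data_if_ready(fake_caller("running"), "run-1"))
print(get_data_if_ready(fake_caller("completed"), "run-1"))
```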

When It Fits, When It Doesn't

Use this if your goal is high-volume, reliable data acquisition from the web. Specifically, use start_crawl when you need to cover hundreds of pages. Use run_extractor when you know the exact data schema and have a pre-built extractor. Use run_magic_api when you're exploring a site and don't know what data is there—it's for discovery. Don't use this if you are trying to scrape a private, authenticated internal network; this tool requires public web access. If your primary need is to clean or transform data that is already in a database, use a dedicated database connector tool instead.
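The selection rule above can be written down as a tiny helper. This is only a restatement of the guidance in this section; the function name and its two boolean inputs are hypothetical.

```python
def choose_tool(multi_page: bool, has_extractor: bool) -> str:
    """Pick a starting tool based on job size and extractor availability."""
    if multi_page:
        return "start_crawl"      # many pages: one managed crawl job
    if has_extractor:
        return "run_extractor"    # known schema with a pre-built extractor
    return "run_magic_api"        # exploration without an extractor


print(choose_tool(multi_page=True, has_extractor=False))
print(choose_tool(multi_page=False, has_extractor=True))
print(choose_tool(multi_page=False, has_extractor=False))
```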

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Import.io. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

account_usage, download_csv, get_crawl_data, get_crawl_status, get_extractor_data, get_extractor_status, list_extractors, run_extractor, run_magic_api, start_crawl

Manually scraping data is a time sink.

Today, pulling data from a competitor's site means opening dozens of tabs. You click into the pricing page, copy the price, switch tabs, find the feature list, and paste it into a spreadsheet. You repeat this process, praying you don't miss a page or copy the wrong column.

With the Import.io MCP Server, you tell your agent to crawl the site. It handles the hundreds of clicks and the data parsing automatically. You get a clean, structured JSON output that you can immediately download as a CSV.

Import.io (Web Data Extraction) MCP Server: get structured data.

You eliminate the need for writing boilerplate Python scripts just to manage HTTP requests and parse basic HTML tags. You don't need to worry about JavaScript rendering or rate limits; the server handles that infrastructure layer.

It's not just about scraping; it's about building a reliable, repeatable data pipeline right from your chat. The process is managed, the status is visible, and the output is always ready for use.

Common Questions About Import.io MCP

How do I use the Import.io (Web Data Extraction) MCP Server to scrape multiple pages?

You use the start_crawl tool. This initiates a single job that covers all pages concurrently. You then use get_crawl_status to track its progress until it completes.

Is `run_magic_api` the same as `run_extractor`?

No. run_extractor uses a specific, pre-built template you defined. run_magic_api is for quick exploration; it automatically finds and extracts data from any site without needing a pre-configured extractor.

What should I do with the data after the extraction is done?

Once the job is done, use download_csv to get the data as a CSV file, or use get_crawl_data to get the raw JSON output for further processing in your agent.

How do I check if my API credits are low using Import.io (Web Data Extraction) MCP Server?

Call the account_usage tool. This instantly tells you your API credit consumption against your monthly limit.

How do I check the progress of a job using the `get_crawl_status` tool?

Use get_crawl_status with the crawl job ID. This tool tells you the job's current state, how many pages it's processed, and its success rate. You can monitor large crawls in real-time.

What should I use to find the correct extractor ID before running it with `run_extractor`?

First, call the list_extractors tool. This gives you a list of all extractors configured in your account. Once you have the ID, you can use it with run_extractor to start the data pull.
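As a sketch of that lookup, the helper below picks an extractor ID by name from a list_extractors result. The result shape (a list of records with `id` and `name` keys) is an assumption; adapt it to what the tool actually returns.

```python
def find_extractor_id(extractors: list[dict], name: str) -> str:
    """Return the id of the single extractor whose name matches (case-insensitive)."""
    matches = [e["id"] for e in extractors if e["name"].lower() == name.lower()]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one extractor named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical list_extractors result.
extractors = [
    {"id": "ext-001", "name": "Competitor Pricing"},
    {"id": "ext-002", "name": "Product Reviews"},
]

print(find_extractor_id(extractors, "competitor pricing"))  # ext-001
```

Failing loudly on zero or multiple matches avoids running the wrong extractor when names are reused.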

If my extraction fails, how do I debug the issue using the `get_extractor_status` tool?

The get_extractor_status tool checks the run's current state (running, completed, or failed) and provides metadata about the run. This helps you see why a job failed and what needs fixing.

Is there a way to download the data I get from `get_crawl_data` for my spreadsheet?

Yes, use the download_csv tool. This function downloads the extraction data directly as CSV text, which is perfect for pasting into any spreadsheet program.
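If you would rather process the CSV text in code than paste it into a spreadsheet, Python's standard `csv` module can parse it. The sample text and its column names are made up for illustration; only the parsing approach is the point.

```python
import csv
import io


def csv_text_to_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Made-up sample standing in for download_csv output.
sample = 'product,price\n"Widget, large",9.99\nGadget,19.50\n'

rows = csv_text_to_rows(sample)
print(rows[0]["product"])  # Widget, large
print(len(rows))           # 2
```

Using `csv.DictReader` instead of splitting on commas keeps quoted fields that contain commas intact.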

Can I extract data from a website without a pre-configured extractor?

Yes. Use the run_magic_api tool. It uses Import.io's AI logic to automatically detect and extract structured or tabular data from any URL, making it ideal for quick exploration of new data sources.

How do I monitor the progress of a bulk crawl job?

Use the get_crawl_status tool by providing the Crawl ID returned when you started the job. Your agent will report the current state, number of pages processed, and success rate in real-time.

Can I get my extracted data in CSV format for spreadsheet analysis?

Absolutely. Use the download_csv tool with a completed Run ID. Your agent will retrieve the extraction data in CSV format, perfect for processing in tools like Excel or Google Sheets.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)