Import.io MCP. Get structured data from any website, conversationally.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Import.io (Web Data Extraction) MCP Server handles web data extraction and large-scale scraping. Trigger predefined extractors for specific URLs to get clean JSON data.
Start bulk crawls across many pages and monitor progress in real-time. Use the Magic API to pull structured data from any site without setting up extractors.
What your AI agents can do
Account usage
Checks your current Import.io account API credit usage.
Download csv
Downloads extracted data directly as CSV text, suitable for spreadsheet processing.
Get crawl data
Retrieves the full JSON output of a completed Import.io crawl job.
Triggers a specific, pre-configured data extractor against a target URL and returns the resulting structured JSON data.
Initiates bulk data gathering jobs across multiple pages and allows monitoring of the job's progress and status.
Runs the automated Magic API against any URL to identify and pull structured data, even without a pre-configured extractor.
Retrieves the final extraction results and formats them as downloadable CSV text for spreadsheet use.
Polls the status of an ongoing crawl or extraction run, providing metrics like pages processed and success rates.
Checks your current Import.io account usage against your monthly API credit limit.
Import.io (Web Data Extraction) MCP Server: 10 Tools
These tools allow you to manage every stage of web data extraction, from initiating a crawl job to downloading the final, structured CSV output.
account_usage
Checks your current Import.io account API credit usage.
download_csv
Downloads extracted data directly as CSV text, suitable for spreadsheet processing.
get_crawl_data
Retrieves the full JSON output of a completed Import.io crawl job.
get_crawl_status
Checks the progress, success rate, and current state of an ongoing Import.io crawl job.
get_extractor_data
Retrieves structured JSON data from a completed Import.io extraction run.
get_extractor_status
Checks the status and metadata of a specific Import.io extraction run.
list_extractors
Lists all data extractors configured within your Import.io account.
run_extractor
Triggers a specific Import.io extractor for a defined URL, which starts an asynchronous data run.
run_magic_api
Runs the automated Magic API against a URL to extract data without needing a pre-configured extractor.
start_crawl
Starts a large-scale, multi-page data extraction job across multiple URLs.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Import.io (Web Data Extraction), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
This server lets your AI agent handle web data extraction and large-scale scraping. You get clean JSON data by triggering predefined extractors against specific URLs. You can kick off large-scale crawls across many pages and watch their progress in real time. Need structured data from a site where you haven't set up an extractor? Run the Magic API against any URL to pull that data without any setup work.
You can list all data extractors configured in your account with list_extractors. You can run a specific extractor for a defined URL with run_extractor, which starts an asynchronous data run. You then retrieve the structured JSON data from a completed extraction run using get_extractor_data, and you can check the status and metadata of that run with get_extractor_status.
If you're doing a big crawl, you start a large-scale, multi-page data extraction job using start_crawl. You can track the job's progress, success rate, and current state with get_crawl_status, and once it's done, you retrieve the full JSON output of that crawl job with get_crawl_data. When the job is finished, you can also download the extracted data directly as CSV text using download_csv.
You can always check your current account API credit usage with account_usage.
How Import.io MCP Works
- 1 Subscribe to the Import.io server and provide your API Key.
- 2 Tell your agent what you need (e.g., 'Run extractor X on URL Y' or 'Start a crawl on site Z').
- 3 The agent triggers the job, and you use subsequent tools (like get_crawl_status) to monitor the process until the data is ready for retrieval.
The bottom line is, you tell your agent the data you want, and the agent manages the entire extraction lifecycle from start to finish.
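The lifecycle in the steps above can be sketched as a small loop. This is a minimal illustration, not the server's actual interface: the tool names come from this page, but the `call_tool` helper, the argument names (`urls`, `crawl_id`), the response fields, and the state values are all assumptions. A fake in-memory caller stands in for a real MCP client so the sketch runs offline.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical shape of an MCP client call: (tool_name, arguments) -> result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_crawl(call_tool: ToolCaller, urls: List[str],
              poll_seconds: float = 5.0, max_polls: int = 60) -> Dict[str, Any]:
    """Start a crawl, poll its status, and fetch the JSON output once done."""
    started = call_tool("start_crawl", {"urls": urls})  # argument name assumed
    crawl_id = started["crawl_id"]                       # field name assumed

    for _ in range(max_polls):
        status = call_tool("get_crawl_status", {"crawl_id": crawl_id})
        if status["state"] == "completed":               # state values assumed
            return call_tool("get_crawl_data", {"crawl_id": crawl_id})
        if status["state"] == "failed":
            raise RuntimeError(f"crawl {crawl_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {crawl_id} not finished after {max_polls} polls")


def make_fake_caller() -> ToolCaller:
    """In-memory stand-in for an MCP client; reports 'completed' on poll 3."""
    polls = {"count": 0}

    def fake(tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if tool == "start_crawl":
            return {"crawl_id": "demo-1"}
        if tool == "get_crawl_status":
            polls["count"] += 1
            return {"state": "completed" if polls["count"] >= 3 else "running"}
        if tool == "get_crawl_data":
            return {"rows": [{"url": "https://example.com/a", "price": "9.99"}]}
        raise ValueError(f"unknown tool: {tool}")

    return fake


result = run_crawl(make_fake_caller(), ["https://example.com/a"], poll_seconds=0)
print(result["rows"])
```

In practice the agent runs this loop for you; the sketch only makes the order of calls explicit.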
Who Is Import.io MCP For?
This is for data analysts and market researchers who spend too much time manually copy-pasting data from websites. It's also for the product manager who needs to verify a competitor's pricing structure across dozens of pages but doesn't have time to write custom Python scraping scripts. Either way, you need a repeatable, auditable data pipeline run directly from your chat interface.
Runs large-scale competitor monitoring and web audits across multiple domains without writing manual scraping scripts.
Automates the collection of market data and pricing intelligence by triggering specific, known extractors against target URLs.
Verifies data extraction schemas and monitors crawler health across several concurrent projects, tracking progress and usage limits.
What Changes When You Connect
- You get clean, structured JSON data immediately by using run_extractor against a specific URL. This means you don't have to manually parse HTML; the tool handles the structure for you.
- Manage large sites without writing complex code. Just run start_crawl to cover multiple pages, and then use get_crawl_status to track exactly how far along the job is.
- Need to scrape something fast? run_magic_api is the answer. It automatically finds and extracts tabular data from any site, even if you never built an extractor for it.
- When the job is done, download_csv grabs the results. The data is immediately in CSV format, ready to dump into a spreadsheet or database.
- You can monitor your budget using account_usage. This keeps your data extraction runs within your planned API credit limits, so you don't hit a paywall mid-project.
- Use list_extractors to see every extractor you've set up. This helps you confirm the correct ID before you try to run a specific job with run_extractor.
Real-World Use Cases
Monitoring a competitor's entire product line
A market researcher needs to track pricing and features across 50 different product pages. They use start_crawl to cover the entire site range. They then check the progress using get_crawl_status until all pages are processed. Finally, they call download_csv to get a clean spreadsheet of all the data.
Quickly analyzing a new industry report
A data analyst finds a PDF-heavy website and needs to grab a table of contents or key metrics. They don't have a specific extractor. They use run_magic_api on the main URL, which automatically identifies the tables and sends the structured data back.
Validating a complex data schema
A product manager needs to confirm that a specific company's annual report has the correct revenue and profit structure. They first use list_extractors to see if an existing extractor works, then use run_extractor to trigger it on the report URL. They check the result with get_extractor_data.
Auditing data extraction costs
A data science team is running multiple parallel experiments. Before starting a big job, they check account_usage to see how many API credits they have left. This prevents unexpected billing spikes and keeps the project on budget.
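A pre-flight budget check like the one described here could look roughly like this. The page does not document what account_usage returns, so the `credits_used` and `credit_limit` field names are assumptions, and the numbers are invented sample values.

```python
from typing import Any, Dict


def has_budget(usage: Dict[str, Any], estimated_credits: int,
               reserve: int = 0) -> bool:
    """Return True if the job fits in the remaining credits, keeping `reserve` spare."""
    # Field names are assumed; adjust to whatever account_usage actually reports.
    remaining = usage["credit_limit"] - usage["credits_used"]
    return remaining - reserve >= estimated_credits


# Invented sample figures, standing in for an account_usage response.
usage = {"credits_used": 7200, "credit_limit": 10000}

fits = has_budget(usage, estimated_credits=2000)
fits_with_reserve = has_budget(usage, estimated_credits=2000, reserve=1000)
print(fits, fits_with_reserve)  # True False
```

Keeping a reserve is one way to leave room for other experiments running in parallel on the same account.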
The Tradeoffs
Trying to scrape massive sites page by page
Calling run_extractor repeatedly for every single URL on a site. This is slow, inefficient, and you'll miss pages in between, making the dataset incomplete.
→
Instead, use start_crawl to define the scope of the whole site. This starts a single, managed job that covers all pages concurrently, and you track it with get_crawl_status.
Assuming all data needs a custom extractor
Getting stuck because the data isn't in a clean format and you think you need to build a bespoke extractor. This wastes time and only works if the data is perfectly structured.
→
Try run_magic_api first. It handles unstructured exploration and can pull data from any website without you having to pre-build a dedicated extractor.
Ignoring job status checks
Triggering a job (like run_extractor) and then immediately trying to retrieve the data using get_extractor_data before the job is done. This fails because the run is still in progress.
→
After running the job, always check the status first using get_extractor_status. Only when the status shows 'completed' should you attempt to call get_extractor_data.
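One way to enforce that ordering is a small guard that refuses to fetch data until the status check passes. This is a sketch under assumptions: the `call_tool` helper, the `run_id` argument, and the `state` field are hypothetical, and only the tool names and the 'completed' status come from this page.

```python
from typing import Any, Callable, Dict

# Hypothetical shape of an MCP client call: (tool_name, arguments) -> result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class RunNotReady(Exception):
    """Raised when data is requested before the run has completed."""


def get_data_if_completed(call_tool: ToolCaller, run_id: str) -> Dict[str, Any]:
    """Check get_extractor_status first; only then call get_extractor_data."""
    status = call_tool("get_extractor_status", {"run_id": run_id})
    state = status.get("state")  # field name assumed
    if state != "completed":
        raise RunNotReady(f"run {run_id} is {state!r}; retry later")
    return call_tool("get_extractor_data", {"run_id": run_id})


def make_fake_caller(states: Dict[str, str]) -> ToolCaller:
    """In-memory stand-in for an MCP client, keyed by run ID."""
    def fake(tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        run_id = args["run_id"]
        if tool == "get_extractor_status":
            return {"state": states[run_id]}
        if tool == "get_extractor_data":
            return {"run_id": run_id, "rows": [{"title": "Example"}]}
        raise ValueError(f"unknown tool: {tool}")
    return fake


caller = make_fake_caller({"run-a": "running", "run-b": "completed"})
data = get_data_if_completed(caller, "run-b")

try:
    get_data_if_completed(caller, "run-a")
    early_fetch_blocked = False
except RunNotReady:
    early_fetch_blocked = True

print(data["rows"], early_fetch_blocked)
```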
When It Fits, When It Doesn't
Use this if your goal is high-volume, reliable data acquisition from the web. Specifically, use start_crawl when you need to cover hundreds of pages. Use run_extractor when you know the exact data schema and have a pre-built extractor. Use run_magic_api when you're exploring a site and don't know what data is there—it's for discovery. Don't use this if you are trying to scrape a private, authenticated internal network; this tool requires public web access. If your primary need is to clean or transform data that is already in a database, use a dedicated database connector tool instead.
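The selection rules above can be written down as a tiny decision function. The `crawl_threshold` of 100 pages is an assumption standing in for "hundreds of pages"; nothing on this page fixes an exact cutoff.

```python
def choose_tool(page_count: int, has_extractor: bool,
                crawl_threshold: int = 100) -> str:
    """Pick a tool name following the guidance in the paragraph above."""
    if page_count >= crawl_threshold:   # many pages: one managed crawl job
        return "start_crawl"
    if has_extractor:                   # known schema and a pre-built extractor
        return "run_extractor"
    return "run_magic_api"              # exploration with no extractor yet


print(choose_tool(page_count=300, has_extractor=True))
print(choose_tool(page_count=1, has_extractor=True))
print(choose_tool(page_count=1, has_extractor=False))
```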
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Import.io. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
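For readers curious what a standardized connection looks like on the wire: MCP is built on JSON-RPC 2.0, and a client invokes a tool with a `tools/call` request carrying the tool name and its arguments. The sketch below builds such a request; the `crawl_id` argument is a hypothetical example, since this page does not list each tool's parameters.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC requests carry a unique ID per request


def build_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "crawl_id" is an assumed argument name, used here only for illustration.
request = build_tool_call("get_crawl_status", {"crawl_id": "demo-1"})
print(json.dumps(request, indent=2))
```

MCP clients such as Claude or Cursor construct these messages for you; you never need to write them by hand.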
Available Capabilities
Manually scraping data is a time sink.
Today, pulling data from a competitor's site means opening dozens of tabs. You click into the pricing page, copy the price, switch tabs, find the feature list, and paste it into a spreadsheet. You repeat this process, praying you don't miss a page or copy the wrong column.
With the Import.io MCP Server, you tell your agent to crawl the site. It handles the hundreds of clicks and the data parsing automatically. You get a clean, structured JSON output that you can immediately download as a CSV.
Import.io (Web Data Extraction) MCP Server: get structured data.
You eliminate the need for writing boilerplate Python scripts just to manage HTTP requests and parse basic HTML tags. You don't need to worry about JavaScript rendering or rate limits; the server handles that infrastructure layer.
It's not just about scraping; it's about building a reliable, repeatable data pipeline right from your chat. The process is managed, the status is visible, and the output is always ready for use.
Common Questions About Import.io MCP
How do I use the Import.io (Web Data Extraction) MCP Server to scrape multiple pages?
You use the start_crawl tool. This initiates a single job that covers all pages concurrently. You then use get_crawl_status to track its progress until it completes.
Is `run_magic_api` the same as `run_extractor`?
No. run_extractor uses a specific, pre-built template you defined. run_magic_api is for quick exploration; it automatically finds and extracts data from any site without needing a pre-configured extractor.
What should I do with the data after the extraction is done?
Once the job is done, use download_csv to get the data as a CSV file, or use get_crawl_data to get the raw JSON output for further processing in your agent.
How do I check if my API credits are low using Import.io (Web Data Extraction) MCP Server?
Call the account_usage tool. This instantly tells you your API credit consumption against your monthly limit.
How do I check the progress of a job using the `get_crawl_status` tool?
Use get_crawl_status with the crawl job ID. This tool tells you the job's current state, how many pages it's processed, and its success rate. You can monitor large crawls in real-time.
What should I use to find the correct extractor ID before running it with `run_extractor`?
First, call the list_extractors tool. This gives you a list of all extractors configured in your account. Once you have the ID, you can use it with run_extractor to start the data pull.
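The lookup step might be sketched like this. The list-of-dicts shape with `id` and `name` keys is an assumption about what list_extractors returns, and the sample entries are invented.

```python
from typing import Any, Dict, List


def find_extractor_id(extractors: List[Dict[str, Any]], name: str) -> str:
    """Return the ID of the single extractor whose name matches, ignoring case."""
    matches = [e for e in extractors if e["name"].lower() == name.lower()]
    if not matches:
        raise LookupError(f"no extractor named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} extractors named {name!r}; use the ID")
    return matches[0]["id"]


# Invented sample data, standing in for a list_extractors response.
extractors = [
    {"id": "ext-001", "name": "Competitor Pricing"},
    {"id": "ext-002", "name": "Product Reviews"},
]

extractor_id = find_extractor_id(extractors, "competitor pricing")
print(extractor_id)  # ext-001
```

Failing loudly on a missing or ambiguous name avoids running the wrong extractor and spending credits on it.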
If my extraction fails, how do I debug the issue using the `get_extractor_status` tool?
The get_extractor_status tool checks the run's current state (running, completed, or failed) and provides metadata about the run. This helps you see why a job failed and what needs fixing.
Is there a way to download the data I get from `get_crawl_data` for my spreadsheet?
Yes, use the download_csv tool. This function downloads the extraction data directly as CSV text, which is perfect for pasting into any spreadsheet program.
Can I extract data from a website without a pre-configured extractor?
Yes. Use the run_magic_api tool. It uses Import.io's AI logic to automatically detect and extract structured or tabular data from any URL, making it ideal for quick exploration of new data sources.
How do I monitor the progress of a bulk crawl job?
Use the get_crawl_status tool by providing the Crawl ID returned when you started the job. Your agent will report the current state, number of pages processed, and success rate in real-time.
Can I get my extracted data in CSV format for spreadsheet analysis?
Absolutely. Use the download_csv tool with a completed Run ID. Your agent will retrieve the extraction data in CSV format, perfect for processing in tools like Excel or Google Sheets.
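If you would rather process the CSV text in code than paste it into a spreadsheet, Python's standard `csv` module can parse it directly. The sample text below is invented; real column names depend on your extractor.

```python
import csv
import io
from typing import Dict, List


def csv_text_to_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Invented sample, standing in for the text returned by download_csv.
sample = "product,price\nWidget,9.99\nGadget,19.50\n"

rows = csv_text_to_rows(sample)
total = sum(float(row["price"]) for row in rows)
print(rows[0]["product"], len(rows), round(total, 2))
```

All values come back as strings, so convert numeric columns explicitly, as the `float(...)` call does here.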
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.