Apify MCP. Scrape websites and manage data extraction flows.
Works with every AI agent you already use, and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Apify connects your AI agent to a full-stack web scraping platform. Use it to run custom scrapers, extract structured JSON data from entire websites, and manage large-scale data collection jobs.
You can monitor usage limits, read cached screenshots, and dynamically push URLs to active scraping queues, all through conversation.
What your AI agents can do
You tell the agent to run a specific scraper bot (Actor) with custom inputs, and the job starts immediately.
You ask the agent for the structured data from a finished run, receiving it as JSON records.
You check the status and metadata of an active or completed scraping job to track its progress.
You retrieve specific files, like screenshots or configuration inputs, saved during the scraping run.
You instruct the agent to stop a runaway scrape or push new URLs to an active job's queue.
You verify your current compute unit usage and subscription limits to manage costs.
abort_run
Stops an active Apify actor run, preserving any data already collected.
get_account_limits
Checks your Apify subscription limits and current compute unit usage.
get_dataset_items
Exports structured JSON data from an Apify dataset, supporting bulk downloads and pagination.
get_key_value_store
Retrieves arbitrary files, like screenshots or configuration data, stored within a run's key-value store.
get_run
Polls the status and metadata of a specific Apify actor run to track its progress.
list_actors
Lists all accessible scrapers (Actors) in your Apify account, including IDs for running them.
list_webhooks
Lists all configured webhooks, which enable event-driven notifications when runs succeed or fail.
push_to_queue
Adds new URLs to an active scraping job's queue, allowing the job to crawl newly discovered pages.
run_actor
Starts an Apify actor asynchronously with custom inputs, returning a run ID immediately.
run_actor_sync
Runs an Apify actor and waits for it to finish before returning, best for short, simple scrapes.
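The get_dataset_items entry mentions pagination. A minimal sketch of how a client might page through a large dataset follows. It assumes the tool accepts an offset and a limit, which this page does not document, so `fetch_page` is a hypothetical stand-in for the real tool call.

```python
from typing import Callable, Iterator


def iter_dataset_items(
    fetch_page: Callable[[int, int], list],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every record by requesting pages until a short or empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        offset += len(page)


# Stand-in for get_dataset_items: serves slices of an in-memory list.
_records = [{"n": i} for i in range(25)]


def fake_fetch_page(offset: int, limit: int) -> list:
    return _records[offset:offset + limit]


items = list(iter_dataset_items(fake_fetch_page, page_size=10))
print(len(items))
```

Yielding records one page at a time keeps memory use flat, which matters when the results are headed for a limited context window.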
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Apify, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Apify connects your AI agent to a full-stack web scraping platform. You'll use it to run custom scrapers, pull structured JSON data from entire websites, and manage large-scale data collection jobs. Through conversation, you can monitor usage limits, read cached screenshots, and dynamically push URLs to active scraping queues.
To start a scrape, you'll tell your agent to run a specific scraper bot (Actor) with custom inputs, and the job starts immediately. You'll use run_actor to start a job asynchronously, or run_actor_sync if you need a quick result and don't want to wait for a separate check. To pull the structured data from a finished run, you'll ask for the records and use get_dataset_items to get the JSON.
You can check the status and metadata of any scraping job using get_run, which lets you track progress. You'll pull specific files, like screenshots or config data, from the run's key-value store using get_key_value_store. You'll manage the job by adding new URLs to a running queue with push_to_queue, or stopping a runaway scrape with abort_run.
You can check your compute unit usage and subscription limits with get_account_limits. You'll list all available scrapers with list_actors, and if you need to know what event notifications are set up, you'll use list_webhooks to list all configured webhooks.
How Apify MCP Works
1. First, you connect your API token to the server. The agent gains permission to interact with your Apify account.
2. Next, you prompt the agent to perform an action, like listing available scrapers or running a specific actor via `run_actor`.
3. Finally, the agent executes the necessary tools, manages the data flow, and returns the scraped results (e.g., using `get_dataset_items`) to your chat context.
The bottom line is, your agent handles the complexity of the scraping API calls, letting you focus only on the data and the results.
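Under the hood, the second step amounts to the client sending a `tools/call` request to the server. Here is a rough sketch of such a request in the JSON-RPC 2.0 shape that MCP defines. The argument names `actor_id` and `input` are assumptions for illustration, since this page does not document the tools' parameter schemas.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request in a session gets its own id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names; check the server's tool schema for the real ones.
request = build_tool_call("run_actor", {"actor_id": "my-actor-id", "input": {}})
print(json.dumps(request, indent=2))
```

In practice your MCP client builds and sends these requests for you; the sketch only shows what the agent is doing on your behalf.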
Who Is Apify MCP For?
This is for the Data Engineer who needs to run scheduled, large-scale data extractions without writing boilerplate Python scripts. It's for the Market Researcher who needs to compile product price lists from multiple competitor sites. And it's for the AI Developer who needs to feed massive amounts of structured web data into an agent's context window for RAG.
Data Engineer: Runs complex, scheduled extraction logic and maps Apify objects directly into conversational QA checks.
Market Researcher: Commands the agent to scrape product prices using Apify actors, then compiles the resulting JSON datasets into readable markdown tables.
AI Developer: Augments their agent's real-time capabilities by feeding it massive structured site data freshly scraped via headless browsers.
What Changes When You Connect
- Get Structured Data: Use `get_dataset_items` to pull detailed, structured JSON records from a completed scrape. You get clean data ready for immediate use, not messy HTML.
- Control Long Jobs: If a scrape gets stuck or hits a dead end, use `abort_run` to stop it immediately. You also use `get_run` to track progress and see exactly what stage the job is in.
- Scale Data Collection: Start large-scale jobs asynchronously with `run_actor`. This lets the agent kick off a scrape and continue talking to you while the data is being collected in the background.
- Keep Track of Costs: Use `get_account_limits` to check your Compute Units and proxy bandwidth before kicking off a massive run. You won't get surprise overage charges.
- Build Dynamic Crawlers: If a scraper finds new links mid-run, you can use `push_to_queue` to dynamically feed those new URLs back into the active job, extending its reach.
- Retrieve Assets: Need a screenshot or a config file? Use `get_key_value_store` to pull specific, non-data assets from the run's storage.
Real-World Use Cases
Competitor Price Monitoring
A market researcher needs to track competitor pricing across 50 pages. They ask the agent to list_actors to find a product scraper. The agent uses run_actor to start the job, monitors it with get_run, and finally uses get_dataset_items to pull all the compiled JSON data into a table.
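The last step, turning JSON records into a table, can be sketched as follows. The field names `name` and `price` and the sample records are made up for illustration; real records depend on the Actor you run.

```python
def to_markdown_table(records: list, columns: list) -> str:
    """Render a list of dict records as a markdown table with the given columns."""

    def cell(value) -> str:
        # Escape pipes and flatten newlines so a value cannot break the row.
        return str(value).replace("|", "\\|").replace("\n", " ")

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        "| " + " | ".join(cell(record.get(col, "")) for col in columns) + " |"
        for record in records
    ]
    return "\n".join([header, divider, *rows])


# Sample records only; a missing field becomes an empty cell.
sample = [{"name": "Widget", "price": 9.99}, {"name": "Gadget"}]
table = to_markdown_table(sample, ["name", "price"])
print(table)
```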
Website Content Audit
A developer needs to check the structure of a competitor's entire site. They use list_actors to find a site crawler, start it, and then use get_key_value_store to retrieve sample screenshots for manual review.
Handling Unexpected Links
A scraper is running, but it finds a promising new category page not in the original list. The developer uses push_to_queue to tell the agent to add the new URL to the active run, continuing the crawl until all related pages are covered.
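Before pushing newly discovered links, it can help to drop ones the job already knows about. The sketch below is one way to do that on the client side, assuming you keep your own list of URLs already queued. Both helper names are hypothetical, and the page does not say whether the queue deduplicates on its own.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, default an empty path to '/'."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def new_urls_to_push(discovered: list, already_queued: list) -> list:
    """Return normalized URLs from `discovered` that are not queued yet, in order."""
    seen = {normalize(u) for u in already_queued}
    fresh = []
    for url in discovered:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


queued = ["https://shop.example.com/a"]
found = [
    "https://Shop.example.com/new#top",
    "https://shop.example.com/a",
    "https://shop.example.com/new",
]
to_push = new_urls_to_push(found, queued)
print(to_push)
```

The resulting list is what you would hand to push_to_queue.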
Running a Quick Test Scrape
You only need to test a small scraper for 5 minutes. You use run_actor_sync. The agent waits for the result and immediately gives you the data via get_dataset_items without needing to poll.
The Tradeoffs
Assuming the API is simple
The user thinks they just need one tool call. They try to run a huge scrape and then immediately call get_dataset_items before the job finishes, getting an empty or incomplete dataset.
→
You must use run_actor to start the job. Then, periodically check the status with get_run until the status is SUCCEEDED. Only then should you call get_dataset_items to retrieve the data.
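The start, poll, extract sequence can be sketched as a small polling loop. This assumes get_run returns an object with a `status` field, which is not documented on this page. `fake_get_run` is a stand-in for the real tool, and ABORTED is an assumed terminal status alongside the SUCCEEDED, FAILED, and TIMED-OUT states the page names.

```python
import time
from typing import Callable

# Statuses after which a run will not change any more (ABORTED is assumed).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def wait_for_run(
    get_run: Callable[[str], dict],
    run_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_run until the run reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        run = get_run(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still not finished after {max_polls} polls")


# Stand-in for the real tool: reports RUNNING twice, then SUCCEEDED.
_fake_statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])


def fake_get_run(run_id: str) -> dict:
    return {"id": run_id, "status": next(_fake_statuses)}


final = wait_for_run(fake_get_run, "run-123", sleep=lambda seconds: None)
if final["status"] == "SUCCEEDED":
    print("safe to call get_dataset_items now")
```

Capping the number of polls keeps a stuck run from blocking the conversation forever; at that point abort_run is the natural next step.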
Overlooking job boundaries
The user runs a job, gets the run ID, and then forgets to save the ID. Later, they can't track the job status or retrieve the resulting data because they don't have the correct run ID.
→
Always save the runId returned by run_actor or run_actor_sync. This ID is required for get_run, get_dataset_items, and abort_run.
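If you drive the tools from a script rather than a chat, one simple way to keep run IDs around is a small JSON file keyed by a label you choose. This is only a sketch; the file name, labels, and helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def remember_run(store: Path, label: str, run_id: str) -> None:
    """Record run_id under a human-readable label in a small JSON file."""
    runs = json.loads(store.read_text()) if store.exists() else {}
    runs[label] = run_id
    store.write_text(json.dumps(runs, indent=2))


def recall_run(store: Path, label: str):
    """Return the saved run ID for label, or None if nothing was saved."""
    if not store.exists():
        return None
    return json.loads(store.read_text()).get(label)


# Demo in a throwaway directory; "run-123" stands in for a real run ID.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "runs.json"
    remember_run(store, "competitor-prices", "run-123")
    saved = recall_run(store, "competitor-prices")
    missing = recall_run(store, "unknown-job")

print(saved)
```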
Ignoring cost limits
The user runs multiple massive scrapes back-to-back without checking usage, leading to unexpected billing overages.
→
Always check get_account_limits first. This tool tells you exactly how many Compute Units and how much proxy bandwidth you have left before starting a resource-heavy task.
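A pre-flight check along these lines can be sketched as below. The field names `compute_units_limit` and `compute_units_used` are assumptions, since this page does not show what get_account_limits actually returns, and the numbers are made up.

```python
def has_headroom(limits: dict, estimated_units: float, reserve_fraction: float = 0.1) -> bool:
    """True if the run fits while leaving a reserve of the total allowance."""
    allowance = limits["compute_units_limit"]
    used = limits["compute_units_used"]
    remaining = allowance - used
    reserve = allowance * reserve_fraction
    return remaining - estimated_units >= reserve


# Made-up numbers for illustration only.
limits = {"compute_units_limit": 100.0, "compute_units_used": 80.0}
small_ok = has_headroom(limits, estimated_units=5.0)
large_ok = has_headroom(limits, estimated_units=15.0)
print(small_ok, large_ok)
```

Keeping a reserve instead of spending down to zero leaves room for a run that turns out larger than estimated.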
When It Fits, When It Doesn't
Use this if you need to systematically extract structured data from the live web. This server fits best when your data source is a website and you need multi-step operations (start -> poll -> extract). Don't use it if you are only trying to read a single, static API endpoint; use a direct API call instead. If you only want to check on a job that is already running, use get_run instead of starting a new actor. If you don't yet know which scraper to use, call list_actors first so you have a valid Actor ID before you run anything.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Apify. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Collecting data from websites is a mess of tabs and copy-pasting.
Today, collecting data means opening a target site, clicking through dozens of pages, right-clicking to save images, and then manually copying every price and name into a spreadsheet. You spend hours just navigating and cleaning up the data.
With the Apify MCP Server, your agent runs the entire workflow. You tell it to scrape the site, and it handles the clicks, the pagination, and the data extraction. You get clean, structured JSON data ready for your analysis.
Apify MCP Server: Get and control scraped data.
Before this server, if a scrape failed or hit a dead end, you were stuck. You had to manually figure out which part of the process failed and re-run it. You couldn't dynamically add new URLs or check why the job stopped.
Now, the agent lets you use `get_run` to check the status and `push_to_queue` to fix the job mid-stream. You control the entire process, from start to finish.
Common Questions About Apify MCP
How do I start a scrape using the Apify MCP Server?
You start by using run_actor. This tool takes the scraper ID and initial configuration. The agent returns a run ID immediately, which you then use to monitor the job.
Is `get_dataset_items` the only way to get the data?
No. You can also use get_key_value_store to retrieve non-tabular assets like screenshots or configuration files. get_dataset_items is specifically for the structured JSON records.
Can I stop a running scrape with the Apify MCP Server?
Yes, you use the abort_run tool. This stops the job immediately, and any data successfully scraped before the stop is preserved.
What is the difference between `run_actor` and `run_actor_sync`?
run_actor runs the scrape in the background and gives you a run ID. run_actor_sync waits until the scrape finishes before returning a result. Use the async version (run_actor) for anything that takes longer than a few minutes.
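As a rule of thumb, the choice can be written down as a tiny helper. The five-minute cutoff is an assumption borrowed from the quick-test example on this page, not a documented limit.

```python
# Assumed cutoff: the page's quick-test example uses run_actor_sync for ~5 minutes.
SYNC_CUTOFF_SECONDS = 5 * 60


def pick_run_tool(expected_seconds: float) -> str:
    """Pick the sync tool for short jobs and the async tool for everything else."""
    if expected_seconds <= SYNC_CUTOFF_SECONDS:
        return "run_actor_sync"
    return "run_actor"


print(pick_run_tool(60))
print(pick_run_tool(3600))
```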
How do I check my usage limits using the `get_account_limits` tool?
You use get_account_limits to check your account's consumption status. This tool reports your current usage of Compute Units and proxy bandwidth, helping you avoid overage charges.
What is the purpose of `get_key_value_store` in an Apify run?
get_key_value_store retrieves arbitrary data stored by the actor. You use this for non-structured outputs like screenshots, configuration files, or intermediate results.
How do I monitor the progress of a long-running scrape using `get_run`?
You call the get_run tool with the run ID at intervals. It returns the current status and metadata of the actor run, letting you track whether it is still running, succeeded, or failed.
Can I dynamically add new URLs to an active scrape using `push_to_queue`?
Yes, you use push_to_queue to feed new URLs into the active request queue. This enables dynamic crawling when new pages are discovered during the run.
How can the AI agent run a scrape on a list of product URLs?
First, find the Actor ID of your scraper via list_actors. Then prompt your agent to execute run_actor, providing the target URLs as a structured JSON input payload. It returns a run ID. You can poll this run via get_run, and once it succeeds, the agent calls get_dataset_items to pull the collected data into your chat.
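A sketch of building such an input payload is shown below. The `startUrls` key follows a convention many Apify Actors use, but each Actor defines its own input schema, so treat the key name as an assumption and check the Actor's documentation.

```python
import json
from urllib.parse import urlsplit


def build_actor_input(urls: list) -> dict:
    """Validate the URLs and wrap them in an assumed `startUrls` payload."""
    start_urls = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        start_urls.append({"url": url})
    return {"startUrls": start_urls}


payload = build_actor_input([
    "https://shop.example.com/p/1",
    " https://shop.example.com/p/2 ",
])
print(json.dumps(payload, indent=2))
```

Rejecting malformed URLs up front is cheaper than discovering them after a run has already consumed compute units.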
Can the agent interact with run configurations mid-way during crawling?
Yes. If an Apify crawler is currently executing and uses a Request Queue, you can instruct your agent to call push_to_queue. Doing so adds new URLs to the active queue, extending the current crawl without needing to stop or restart the Actor.
Can my AI automatically detect scraping timeouts and debug the failure?
Yes. Because your agent can track a run with get_run, it can see when the run moves to the TIMED-OUT or FAILED state. You can then ask the agent to examine the log outputs the run saved to its key-value store with get_key_value_store, which helps identify the underlying issue (e.g., a captcha block or a blocked proxy).
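The decision the agent makes after each status check can be sketched as a small dispatch function. It assumes get_run returns an object with a `status` field, which this page does not document; the tool names are the ones listed on this page.

```python
def next_step(run: dict) -> str:
    """Map a run's status to the tool that makes sense to call next."""
    status = run.get("status")
    if status == "SUCCEEDED":
        return "get_dataset_items"    # results are ready to pull
    if status in {"FAILED", "TIMED-OUT"}:
        return "get_key_value_store"  # inspect saved outputs for the cause
    return "get_run"                  # anything else: keep polling


print(next_step({"status": "TIMED-OUT"}))
```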
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.