Apify MCP. Scrape websites and manage data extraction flows.

Apify connects your AI agent to a full-stack web scraping platform. Use it to run custom scrapers, extract structured JSON data from entire websites, and manage large-scale data collection jobs.

You can monitor usage limits, read cached screenshots, and dynamically push URLs to active scraping queues, all through conversation.

What your AI agents can do

Start a Web Scrape

You tell the agent to run a specific scraper bot (Actor) with custom inputs, and the job starts immediately.

Get Scraped Data

You ask the agent for the structured data from a finished run, receiving it as JSON records.

Monitor Job Status

You check the status and metadata of an active or completed scraping job to track its progress.

Access Cached Assets

You retrieve specific files, like screenshots or configuration inputs, saved during the scraping run.

Adjust Running Jobs

You instruct the agent to stop a runaway scrape or push new URLs to an active job's queue.

Check Account Limits

You verify your current compute unit usage and subscription limits to manage costs.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

abort_run

Stops an active Apify actor run, preserving any data already collected.

get_account_limits

Checks your Apify subscription limits and current compute unit usage.

get_dataset_items

Exports structured JSON data from an Apify dataset, supporting bulk downloads and pagination.

get_key_value_store

Retrieves arbitrary files, like screenshots or configuration data, stored within a run's key-value store.

get_run

Polls the status and metadata of a specific Apify actor run to track its progress.

list_actors

Lists all accessible scrapers (Actors) in your Apify account, including IDs for running them.

list_webhooks

Lists all configured webhooks, which enable event-driven notifications when runs succeed or fail.

push_to_queue

Adds new URLs to an active scraping job's queue, allowing the job to crawl newly discovered pages.

run_actor

Starts an Apify actor asynchronously with custom inputs, returning a run ID immediately.

run_actor_sync

Runs an Apify actor and waits for it to finish before returning, best for short, simple scrapes.
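The get_dataset_items description above mentions pagination. As a rough sketch of what paging through a large dataset could look like, the Python below requests fixed-size pages until a short page signals the end. The `call_tool` callable and the argument names `datasetId`, `offset`, and `limit` are assumptions made for illustration; check the tool's actual input schema in your MCP client before relying on them.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical stand-in for however your MCP client invokes a tool:
# (tool_name, arguments) -> list of JSON records.
ToolCaller = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def iter_dataset_items(call_tool: ToolCaller, dataset_id: str,
                       page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every record by requesting pages until one comes back short."""
    offset = 0
    while True:
        page = call_tool("get_dataset_items", {
            "datasetId": dataset_id,  # assumed argument name
            "offset": offset,         # assumed argument name
            "limit": page_size,       # assumed argument name
        })
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size


# Fake tool backed by 250 in-memory records, so the sketch runs offline.
def fake_call_tool(name: str, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    data = [{"n": i} for i in range(250)]
    start = arguments["offset"]
    return data[start:start + arguments["limit"]]


items = list(iter_dataset_items(fake_call_tool, "demo-dataset"))
```

Using a generator keeps memory flat for bulk downloads, since each page can be processed and discarded before the next one is requested.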

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Apify, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Apify connects your AI agent to a full-stack web scraping platform. You'll use it to run custom scrapers, pull structured JSON data from entire websites, and manage large-scale data collection jobs. Through conversation, you can monitor usage limits, read cached screenshots, and dynamically push URLs to active scraping queues.

To start a scrape, you'll tell your agent to run a specific scraper bot (Actor) with custom inputs, and the job starts immediately. You'll use run_actor to start a job asynchronously, or run_actor_sync if you need a quick result and don't want to wait for a separate check. To pull the structured data from a finished run, you'll ask for the records and use get_dataset_items to get the JSON.

You can check the status and metadata of any scraping job using get_run, which lets you track progress. You'll pull specific files, like screenshots or config data, from the run's key-value store using get_key_value_store. You'll manage the job by adding new URLs to a running queue with push_to_queue, or stopping a runaway scrape with abort_run.

You can check your compute unit usage and subscription limits with get_account_limits. You'll list all available scrapers with list_actors, and if you need to know what event notifications are set up, you'll use list_webhooks to list every configured webhook.

How Apify MCP Works

1. First, you connect your API token to the server. The agent gains permission to interact with your Apify account.
2. Next, you prompt the agent to perform an action, like listing available scrapers or running a specific actor via run_actor.
3. Finally, the agent executes the necessary tools, manages the data flow, and returns the scraped results (e.g., using get_dataset_items) to your chat context.

The bottom line is, your agent handles the complexity of the scraping API calls, letting you focus only on the data and the results.
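For readers curious what the agent sends in step 2, MCP clients invoke tools with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds such a message in Python. The tool name run_actor comes from this page; the argument names `actorId` and `input`, the Actor name, and the URL are hypothetical placeholders, not the server's confirmed schema.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str,
                    arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names and values below are illustrative assumptions.
request = build_tool_call(1, "run_actor", {
    "actorId": "my-product-scraper",
    "input": {"startUrls": [{"url": "https://example.com/products"}]},
})
```

In normal use you never write this by hand; the MCP client generates it from your prompt.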

Who Is Apify MCP For?

This is for the Data Engineer who needs to run scheduled, large-scale data extractions without writing boilerplate Python scripts. It's for the Market Researcher who needs to compile product price lists from multiple competitor sites. And it's for the AI Developer who needs to feed massive amounts of structured web data into an agent's context window for RAG.

Data Engineer

Runs complex, scheduled extraction logic and maps Apify objects directly into conversational QA checks.

Market Researcher

Commands the agent to scrape product prices using Apify actors, then compiles the resulting JSON datasets into readable markdown tables.

AI Developer

Augments their agent's real-time capabilities by feeding it massive structured site data freshly scraped via headless browsers.

What Changes When You Connect

Get Structured Data: Use get_dataset_items to pull detailed, structured JSON records from a completed scrape. You get clean data ready for immediate use, not messy HTML.
Control Long Jobs: If a scrape gets stuck or hits a dead end, use abort_run to stop it immediately. You also use get_run to track progress and see exactly what stage the job is in.
Scale Data Collection: Start large-scale jobs asynchronously with run_actor. This lets the agent kick off a scrape and continue talking to you while the data is being collected in the background.
Keep Track of Costs: Use get_account_limits to check your Compute Units and proxy bandwidth before kicking off a massive run, so you can avoid surprise overage charges.
Build Dynamic Crawlers: If a scraper finds new links mid-run, you can use push_to_queue to dynamically feed those new URLs back into the active job, extending its reach.
Retrieve Assets: Need a screenshot or a config file? Use get_key_value_store to pull specific, non-data assets from the run's storage.

Real-World Use Cases

Competitor Price Monitoring

A market researcher needs to track competitor pricing across 50 pages. They ask the agent to list_actors to find a product scraper. The agent uses run_actor to start the job, monitors it with get_run, and finally uses get_dataset_items to pull all the compiled JSON data into a table.
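The last step of this use case, turning JSON records into a table, might look roughly like the Python sketch below. It assumes the records are flat dictionaries; the sample field names (`product`, `price`, `site`) are invented for illustration and will differ depending on the Actor you run.

```python
from typing import Any, Dict, List


def to_markdown_table(records: List[Dict[str, Any]]) -> str:
    """Render a list of flat JSON records as a Markdown table."""
    if not records:
        return ""
    # Collect columns in first-seen order so ragged records still line up.
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        "| " + " | ".join(str(record.get(col, "")) for col in columns) + " |"
        for record in records
    ]
    return "\n".join([header, divider, *rows])


# Sample records with made-up fields.
sample = [
    {"product": "Widget", "price": 19.99},
    {"product": "Gadget", "price": 24.5, "site": "example.com"},
]
table = to_markdown_table(sample)
```

Missing fields render as empty cells. Values containing a pipe character are not escaped here, so a production version would need to handle that.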

Website Content Audit

A developer needs to check the structure of a competitor's entire site. They use list_actors to find a site crawler, start it, and then use get_key_value_store to retrieve sample screenshots for manual review.

Handling Unexpected Links

A scraper is running, but it finds a promising new category page not in the original list. The developer uses push_to_queue to tell the agent to add the new URL to the active run, continuing the crawl until all related pages are covered.
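Before pushing newly discovered links, it usually helps to skip URLs the run already knows about. The Python sketch below prepares one push_to_queue argument set per new URL. The argument names `runId` and `url` are assumptions, and the deduplication happens on the client side here; it is not a documented feature of the tool.

```python
from typing import Any, Dict, Iterable, List, Set
from urllib.parse import urldefrag


def plan_queue_pushes(run_id: str, discovered: Iterable[str],
                      already_queued: Set[str]) -> List[Dict[str, Any]]:
    """Build push_to_queue arguments for each URL not yet queued.

    Note: `already_queued` is updated in place with the URLs that get planned.
    """
    calls: List[Dict[str, Any]] = []
    for url in discovered:
        # Drop any "#fragment" so in-page anchors don't count as new pages.
        normalized = urldefrag(url).url
        if normalized in already_queued:
            continue
        already_queued.add(normalized)
        calls.append({"runId": run_id, "url": normalized})  # assumed argument names
    return calls


seen = {"https://example.com/shoes"}
calls = plan_queue_pushes("run-123", [
    "https://example.com/shoes#reviews",  # same page as one already queued
    "https://example.com/boots",
    "https://example.com/boots",          # duplicate within the batch
], seen)
```

Each dictionary in `calls` would then be passed as the arguments of a separate push_to_queue invocation.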

Running a Quick Test Scrape

You only need to test a small scraper for 5 minutes. You use run_actor_sync. The agent waits for the result and immediately gives you the data via get_dataset_items without needing to poll.

The Tradeoffs

Assuming the API is simple

The user thinks they just need one tool call. They try to run a huge scrape and then immediately call get_dataset_items before the job finishes, getting an empty or incomplete dataset.

→ You must use run_actor to start the job. Then, periodically check the status with get_run until the status is SUCCEEDED. Only then should you call get_dataset_items to retrieve the data.
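The start, poll, then extract order can be sketched as a small polling helper in Python. This is a minimal illustration, not the server's implementation: `call_tool` stands in for whatever your MCP client uses to invoke a tool, and the `runId` argument and `status` response field are assumed names. SUCCEEDED, FAILED, and TIMED-OUT appear on this page; ABORTED is added as an assumption for runs stopped with abort_run.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical stand-in for however your MCP client invokes a tool.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# ABORTED is an assumption; the other three statuses are named on this page.
TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"})


def wait_for_run(call_tool: ToolCaller, run_id: str, poll_seconds: float = 10.0,
                 max_polls: int = 360,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll get_run until the run reaches a terminal status, then return it."""
    for _ in range(max_polls):
        run = call_tool("get_run", {"runId": run_id})  # assumed argument name
        if run["status"] in TERMINAL_STATUSES:         # assumed response field
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still active after {max_polls} polls")


# Scripted fake so the sketch runs without a real server.
_statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    return {"runId": arguments["runId"], "status": next(_statuses)}


final = wait_for_run(fake_call_tool, "run-123", sleep=lambda _: None)
ready_for_data = final["status"] == "SUCCEEDED"
```

Only when `ready_for_data` is true would the next step call get_dataset_items. The `max_polls` cap keeps a stuck run from blocking forever.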

Overlooking job boundaries

The user runs a job, gets the run ID, and then forgets to save the ID. Later, they can't track the job status or retrieve the resulting data because they don't have the correct run ID.

→ Always save the runId returned by run_actor or run_actor_sync. This ID is required for get_run, get_dataset_items, and abort_run.
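One lightweight way to avoid losing run IDs is to write them to a small JSON file under a label you will remember. The Python sketch below is purely illustrative: the file layout, the labels, and the helper names `remember_run` and `recall_run` are hypothetical and not part of the Apify MCP Server.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def remember_run(store: Path, label: str, run_id: str) -> None:
    """Record a run ID under a human-readable label in a JSON file."""
    runs: Dict[str, str] = json.loads(store.read_text()) if store.exists() else {}
    runs[label] = run_id
    store.write_text(json.dumps(runs, indent=2))


def recall_run(store: Path, label: str) -> str:
    """Look up a previously saved run ID; raises KeyError if the label is unknown."""
    return json.loads(store.read_text())[label]


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "runs.json"
    remember_run(store, "price-monitor", "run-123")
    remember_run(store, "site-audit", "run-456")
    recalled = recall_run(store, "price-monitor")
```

In a chat-only workflow, asking the agent to repeat the run ID back and pinning that message serves the same purpose.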

Ignoring cost limits

The user runs multiple massive scrapes back-to-back without checking usage, leading to unexpected billing overages.

→ Always check get_account_limits first. This tool tells you exactly how many Compute Units and how much proxy bandwidth you have left before starting a resource-heavy task.
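A pre-flight budget check could be sketched in Python as follows. The response field names `maxComputeUnits` and `usedComputeUnits`, and the idea of estimating a run's cost up front, are assumptions for illustration; inspect the real get_account_limits output to see which fields it returns.

```python
from typing import Any, Dict


def remaining_compute_units(limits: Dict[str, Any]) -> float:
    """Compute Units left in the current billing period (assumed field names)."""
    return float(limits["maxComputeUnits"]) - float(limits["usedComputeUnits"])


def can_afford(limits: Dict[str, Any], estimated_units: float,
               safety_margin: float = 0.1) -> bool:
    """True if the estimate plus a safety margin fits in the remaining budget."""
    needed = estimated_units * (1 + safety_margin)
    return needed <= remaining_compute_units(limits)


# Example response shape; the values are made up.
limits = {"maxComputeUnits": 100, "usedComputeUnits": 92}
small_run_ok = can_afford(limits, estimated_units=5)  # needs 5.5 of 8 remaining
large_run_ok = can_afford(limits, estimated_units=8)  # needs 8.8 of 8 remaining
```

The safety margin is there because actual consumption often drifts from the estimate; tune it to how predictable your Actors are.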

When It Fits, When It Doesn't

Use this if you need to systematically extract structured data from the live web. This server fits best when your data source is a website and you need to perform multi-step operations (start -> poll -> extract). Don't use it if you are only trying to read a single, static API endpoint; use a direct API call instead. If you only want to check the status of an existing job, use get_run rather than starting a new actor. If you don't yet know which scraper to run, call list_actors first to see what's available.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Apify. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

abort_run, get_account_limits, get_dataset_items, get_key_value_store, get_run, list_actors, list_webhooks, push_to_queue, run_actor, run_actor_sync

Collecting data from websites is a mess of tabs and copy-pasting.

Today, collecting data means opening a target site, clicking through dozens of pages, right-clicking to save images, and then manually copying every price and name into a spreadsheet. You spend hours just navigating and cleaning up the data.

With the Apify MCP Server, your agent runs the entire workflow. You tell it to scrape the site, and it handles the clicks, the pagination, and the data extraction. You get clean, structured JSON data ready for your analysis.

Apify MCP Server: Get and control scraped data.

Before this server, if a scrape failed or hit a dead end, you were stuck. You had to manually figure out which part of the process failed and re-run it. You couldn't dynamically add new URLs or check why the job stopped.

Now, the agent lets you use `get_run` to check the status and `push_to_queue` to fix the job mid-stream. You control the entire process, from start to finish.

Common Questions About Apify MCP

How do I start a scrape using the Apify MCP Server?

You start by using run_actor. This tool takes the scraper ID and initial configuration. The agent returns a run ID immediately, which you then use to monitor the job.

Is `get_dataset_items` the only way to get the data?

No. You can also use get_key_value_store to retrieve non-tabular assets like screenshots or configuration files. get_dataset_items is specifically for the structured JSON records.

Can I stop a running scrape with the Apify MCP Server?

Yes, you use the abort_run tool. This stops the job immediately, and any data successfully scraped before the stop is preserved.

What is the difference between `run_actor` and `run_actor_sync`?

run_actor runs the scrape in the background and gives you a run ID. run_actor_sync waits until the scrape finishes before giving you a result. Use the async version for anything that takes longer than a few minutes.

How do I check my usage limits using the `get_account_limits` tool?

You use get_account_limits to check your account's consumption status. This tool reports your current usage of Compute Units and proxy bandwidth, helping you avoid overage charges.

What is the purpose of `get_key_value_store` in an Apify run?

get_key_value_store retrieves arbitrary data stored by the actor. You use this for non-structured outputs like screenshots, configuration files, or intermediate results.

How do I monitor the progress of a long-running scrape using `get_run`?

You poll the get_run tool with the run ID. This gives you the current status and metadata of the actor run, letting you track whether it is still running, succeeded, or failed.

Can I dynamically add new URLs to an active scrape using `push_to_queue`?

Yes, you use push_to_queue to feed new URLs into the active request queue. This enables dynamic crawling when new pages are discovered during the run.

How can the AI agent run a scrape on a list of product URLs?

First, find your specific scraping Actor ID via list_actors. Then, prompt your agent to execute run_actor, providing the target URLs formatted as a structured JSON input payload. It returns a run ID. You can poll this run via get_run, and once it succeeds, the agent calls get_dataset_items to pull the collected data straight into your chat.

Can the agent interact with run configurations mid-way during crawling?

Yes. If an Apify crawler is currently executing and utilizes a Request Queue, you can instruct your agent to call push_to_queue. Doing so dynamically ships new URLs to the active queue instance, extending the current web crawl without needing to stop or restart the Actor.

Can my AI automatically detect scraping timeouts and debug the failure?

Yes. Because your agent can track run status with get_run, it can tell when a run transitions to the TIMED-OUT or FAILED state. You can then ask the agent to examine any log output the Actor saved to its key-value store, using get_key_value_store, to help identify the underlying issue (e.g., a CAPTCHA block or a blocked proxy).

MCP Servers to Build AI Training Datasets

You need a dataset of 10,000 product listings for your RAG system, but there is no API. Apify scrapes them, Chroma stores them as searchable embeddings, and Notion tracks every data source with quality scores.

Apify, Chroma Vector DB, Notion

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)