Web Scraper MCP. Access Real-Time Web Content & Data Streams

Web Scraper gives your AI agent direct access to real-time public web data. It reads articles, crawls documentation sites, and extracts structured metadata from any URL you provide.

Instead of relying on cached or hallucinated facts, your agent gets clean Markdown content directly from the source. Use it for academic research, SEO auditing, or comparing multiple technical documents at once.

What your AI agents can do

Extract Article Content

The read tool pulls out the main text from any public webpage and formats it as clean Markdown.

Map Site Structure

Using crawl, your agent automatically maps a site from a starting URL, following links to gather up to 10 pages.

Compare Multiple Sources

The batch_read tool fetches and processes up to 10 web URLs simultaneously for comparison or aggregation.

Get Structured Page Data

The extract tool pulls specific metadata—like the SEO title, description, and OG tags—without needing the full article content.

Map Internal Links

The list_links tool grabs every single hyperlink found on a page to audit its internal structure.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Web Scraper: 5 Tools for Web Data Extraction

These five tools let your agent fetch, map, and parse content directly from public websites into usable formats.

batch_read

Fetches up to 10 web pages at the same time, processing them in parallel for quick comparison.

crawl

Starts at one URL and automatically crawls a website, mapping out linked content on up to 10 consecutive pages.

extract

Pulls structured metadata like titles, descriptions, OG tags, and canonical links from any single web page.

list_links

Gathers every hyperlink found on a web page, useful for mapping internal navigation paths.

read

Retrieves the full, main article content from any public URL and cleans it up into Markdown format.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Web Scraper, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

This Web Scraper gives your AI agent direct access to the live web. Instead of relying on stale caches or hallucinated facts, your agent reads clean data directly from any public URL you point it at.

When you use this server, your agent handles the dirty work of cleaning up messy webpages, stripping out ads, navigation bars, and other boilerplate, so you get clean Markdown content. It's built for academic research, SEO audits, or just comparing a bunch of technical docs at once.

If you need to pull out the main story from an article, use read. You hand it any public URL, and this tool pulls out only the core text, formatting it into clean Markdown. That's your go-to for news sites, blogs, or documentation pages where you just want the meat of the content.

If you want to map out a whole site, use crawl. You give it a starting URL, and your agent automatically follows links on that domain, gathering up to ten pages and tracking how they connect.

For comparing multiple sources side-by-side, run batch_read. This tool fetches and processes up to ten web URLs simultaneously, letting you aggregate or compare those documents in parallel—it’s fast.

If you don't need the full article text but just want structured data, use extract. You can pull specific metadata like the SEO title, description, Open Graph tags, and canonical links from any single page without downloading the entire body.

To audit how a site is linked internally, run list_links. This tool grabs every single hyperlink on a given webpage, mapping out all the internal navigation paths.

This setup means your agent doesn't need API keys or complex authentication; you just pass it a link in the chat and tell it what to do.
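Behind the chat message, an MCP client turns a request like "read this" into a `tools/call` message. The sketch below shows roughly what that payload could look like. The JSON-RPC envelope follows the Model Context Protocol; the argument names `url` and `urls` are assumptions for illustration, since this page does not list each tool's input schema, and `build_tool_call` is a hypothetical helper.

```python
import json
from itertools import count

# Tool names listed on this page under "Available Capabilities".
KNOWN_TOOLS = {"batch_read", "crawl", "extract", "list_links", "read"}

_request_ids = count(1)


def build_tool_call(tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


if __name__ == "__main__":
    # Assumption: `url` / `urls` are hypothetical argument names.
    single = build_tool_call("read", {"url": "https://example.com/docs"})
    batch = build_tool_call(
        "batch_read",
        {"urls": ["https://example.com/a", "https://example.com/b"]},
    )
    print(json.dumps(single, indent=2))
    print(json.dumps(batch, indent=2))
```

In normal use you never write this by hand; the AI client builds and sends it for you.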

How Web Scraper MCP Works

1 Subscribe to the Web Scraper server in your AI client.
2 Give the agent a link or list of links and specify the goal (e.g., 'read this,' 'crawl this site').
3 The tool runs, fetches the data in real-time, and sends back clean text formatted for your agent to use.

The bottom line is: you tell your AI client what website or links to look at, and it brings back clean, structured content.

Who Is Web Scraper MCP For?

This server is for developers who need current API documentation, researchers who need fresh academic data, and SEO specialists auditing site structure. Use it if you're tired of your AI agent making up facts or relying on outdated cached information.

Technical Writer

Needs to pull the absolute latest syntax or feature changes from a vendor's API documentation and format it into user guides.

SEO Specialist

Audits competitor websites, checking metadata (via extract) and mapping out their full site link structure (via list_links).

Market Researcher

Compares product announcements across multiple industry blogs by running a batch of URLs through batch_read.

What Changes When You Connect

Stop relying on cached facts. The read tool pulls live content from any URL, ensuring your agent uses the absolute latest article text.
Speed up research using batch_read. Send 10 links at once and get all the data back in parallel, letting you compare sources fast.
Don't waste time downloading full pages. Use extract to grab only the metadata—SEO titles, descriptions, etc.—in a single, clean API call.
Map entire documentation hubs with crawl. Point the agent to a starting URL and let it automatically discover up to 10 related pages.
Audit site architecture easily. The list_links tool grabs every hyperlink on a page, internal and external, giving you a complete map of how the page is linked.

Real-World Use Cases

Comparing competitor feature lists

A product manager needs to know how three competitors handled authentication flow. They use batch_read by sending the main documentation pages for all three sites. The agent processes them simultaneously, allowing the PM to compare architectural details and find common gaps.

Auditing a client's website health

An SEO specialist runs list_links on several key landing pages. They then use extract on those same pages to pull the metadata, checking for missing canonical tags or incorrect site descriptions.
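As a rough illustration of the metadata check in this use case, the sketch below pulls the title, description, OG tags, and canonical link out of an HTML string and reports which ones are missing. It uses Python's standard `html.parser` as a stand-in; the server's actual extraction logic is not documented on this page, and `extract_metadata` and `audit` are hypothetical names.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect title, meta description, Open Graph tags, and canonical link."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata = {"title": None, "description": None, "canonical": None, "og": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attr.get("name") or "").lower()
            prop = (attr.get("property") or "").lower()
            if name == "description":
                self.metadata["description"] = attr.get("content")
            elif prop.startswith("og:"):
                self.metadata["og"][prop] = attr.get("content")
        elif tag == "link":
            # rel can hold several space-separated values.
            if "canonical" in (attr.get("rel") or "").lower().split():
                self.metadata["canonical"] = attr.get("href")

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = (self.metadata["title"] or "") + data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def extract_metadata(html: str) -> dict:
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    if parser.metadata["title"] is not None:
        parser.metadata["title"] = parser.metadata["title"].strip()
    return parser.metadata


def audit(metadata: dict) -> list[str]:
    """Return the names of the audited fields that are missing or empty."""
    return [field for field in ("title", "description", "canonical") if not metadata.get(field)]
```

Running `audit(extract_metadata(html))` on each landing page gives a quick list of pages that lack a canonical tag or description.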

Writing a summary of new standards

A developer needs to write code using the newest React syntax but doesn't know where to look. They point the agent to the library's API docs and use read on several key sections, letting the AI synthesize the current best practices.

Mapping a niche wiki

A researcher is studying an academic wiki with hundreds of pages. Instead of clicking through manually, they run the crawl tool from the main hub page. The agent automatically maps and retrieves content from up to 10 related articles for review.

The Tradeoffs

Trying to read deep documentation

The user sends a single link to a massive, multi-section technical guide (like a full framework manual) and just asks the AI to 'read it all.' The agent gets overwhelmed with boilerplate and fails.

→ Don't send one huge link. First, use list_links or run crawl to identify the specific sections you need. Then, target those exact URLs using read for focused content.

Doing metadata extraction first

The user runs extract on a page, gets the title and description, but realizes they also needed the actual body text to summarize it. They then have to run another tool.

→ If you need the body text as well as the title, run read. It returns the title and the main article as clean Markdown. Add extract only when you also need tags such as OG or canonical links.

Comparing articles sequentially

The user sends Link A, waits for results. Then sends Link B, waits. This wastes time and slows down the workflow unnecessarily.

→ If your goal is comparison or summary across multiple sources, use batch_read. It fetches up to 10 URLs in parallel, which is much faster than sending them one at a time.
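To see why parallel fetching helps, here is a minimal sketch using a thread pool. It is not the server's implementation; `fetch_all` is a hypothetical helper, and the `fetch` callable is injected so the example runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_all(urls: list[str], fetch: Callable[[str], str], max_workers: int = 10) -> dict[str, str]:
    """Fetch every URL concurrently and return {url: content} in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return dict(zip(urls, pool.map(fetch, urls)))


def slow_fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request: waits briefly, then returns text."""
    time.sleep(0.1)
    return f"content of {url}"


if __name__ == "__main__":
    urls = [f"https://example.com/article-{i}" for i in range(5)]

    start = time.perf_counter()
    sequential = {url: slow_fake_fetch(url) for url in urls}
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    parallel = fetch_all(urls, slow_fake_fetch)
    print(f"parallel:   {time.perf_counter() - start:.2f}s")
```

With five simulated requests, the sequential loop waits for each one in turn, while the pooled version waits for roughly one request's worth of time.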

When It Fits, When It Doesn't

Use this server if your data source is the public internet (a URL) and you need to extract content or metadata. Specifically:

* Use read: When you only care about the clean, readable article text.
* Use extract + list_links: When you are auditing a page's structure—checking SEO tags and mapping out every single link.
* Use batch_read or crawl: When you need to process multiple sources (many URLs) simultaneously, whether for comparison or deep site mapping.

Don't use this if your data already lives in a structured database (SQL/NoSQL). For that, you need a dedicated API connector tool, not a web scraper.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Web Scraper. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 5 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

batch_read, crawl, extract, list_links, read

Manually comparing multiple articles is slow and messy.

Right now, if you want to compare how three different tech blogs covered the new JavaScript standard, you have to open three separate tabs. You copy the core paragraphs from each one, paste them into a spreadsheet, and manually highlight the key differences. It's tedious, easily misses nuances, and wastes time.

With Web Scraper MCP Server, you just send all three URLs to your agent using `batch_read`. The tool fetches everything in parallel. You get back clean Markdown for all three sources at once. Your agent can then instantly synthesize a comparison table showing the key differences.

Web Scraper MCP Server: Get structured data without the noise.

Before this, if you needed to know a page's title or its main link structure, you had two options: right-click and copy (which is unreliable) or run complex server-side scripts that only gave you raw HTML. You were either getting garbage data or running into technical roadblocks.

Now, the `extract` tool gives your agent clean JSON objects containing titles, descriptions, OG tags, and link counts—all in one go. It’s structured data delivered right to your chat.

Common Questions About Web Scraper MCP

How do I get the full text of an article using Web Scraper MCP Server?

Just use the read tool and provide the URL. It strips out all the junk—ads, menus, footers—and gives you only the main content as Markdown.

Can I crawl a whole documentation site using Web Scraper MCP Server?

Yes, use the crawl tool. Give it a starting URL, and the agent will automatically follow links and map up to 10 pages on that domain for you.

What is better: `extract` or `read` for metadata?

Use extract when you only need structured metadata: title, description, OG tags, and canonical links. Use read when you need the article itself; it returns the title and the main content as Markdown.

How do I compare 10 articles at once with Web Scraper MCP Server?

You need to use batch_read. This tool takes up to ten URLs and fetches them in parallel. It's the fastest way to process a large group of sources.

How does the `list_links` tool work to pull out every hyperlink on a webpage?

It extracts all hyperlinks present in the page's HTML structure. This is useful for auditing link architecture; it returns both internal and external links, letting you map out the site's connectivity without reading any content.
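The internal/external split can be sketched with the standard library. This is an illustration under assumptions, not the tool's actual code: the hypothetical `list_links` function below treats a link as internal when its host matches the page's host, and it skips non-HTTP schemes such as `mailto:`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def list_links(html: str, page_url: str) -> dict[str, list[str]]:
    """Return absolute, de-duplicated links grouped as internal or external."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    page_host = urlparse(page_url).netloc
    result: dict[str, list[str]] = {"internal": [], "external": []}
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar
        group = "internal" if parsed.netloc == page_host else "external"
        if absolute not in result[group]:
            result[group].append(absolute)
    return result
```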

What are the limitations of running multiple URLs with the `batch_read` function?

You can process up to 10 different web pages in a single batch request. The tool fetches all these URLs simultaneously, making it ideal for comparing sources or summarizing several articles at once.

If I only need SEO tags like titles or descriptions, is the `extract` tool reliable?

Yes, the extract tool pulls structured metadata reliably. It gives you title, description, OG tags, and canonical links without needing to download or process the entire page body.

What is the maximum depth when using the `crawl` tool on a documentation site?

The crawl tool stops after 10 pages. This limit keeps the response size manageable while still allowing you to map out the major sections of a wiki or docs hub.

Can it read documentation sites that are split into multiple pages?

Yes! You can use the crawl tool. For example: 'Crawl the getting started guide at https://example.com/docs'. The agent will fetch the starting page and automatically follow inner links to gather up to 10 pages of context.
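A crawl with a page cap can be pictured as a breadth-first walk that stops once it has visited enough pages. The sketch below is one plausible shape, not the server's algorithm: the breadth-first order and the same-host rule are assumptions, and `get_links` is a hypothetical callable that returns the absolute links found on a page.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 10,
) -> list[str]:
    """Visit pages breadth-first on the start URL's host, up to max_pages."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, to avoid revisits
    visited: list[str] = []     # URLs in the order they were fetched
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen and urlparse(link).netloc == host:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "site" so the example runs without network access.
    site = {
        "https://example.com/docs": ["https://example.com/docs/install", "https://example.com/docs/api"],
        "https://example.com/docs/install": ["https://example.com/docs", "https://other.example.org/"],
        "https://example.com/docs/api": ["https://example.com/docs/api/auth"],
    }
    for page in crawl("https://example.com/docs", lambda url: site.get(url, [])):
        print(page)
```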

How does it handle ads and cluttered websites?

The read tool uses the same underlying technology as Firefox's 'Reader View' (@mozilla/readability). It intelligently strips out standard website boilerplate—like navbars, sidebars, footers, and ads—leaving only the title and the clean main article text converted to Markdown.
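To give a feel for what "stripping boilerplate" means, here is a deliberately simplified stand-in written with Python's standard `html.parser`. It is not Readability and does not reproduce its scoring heuristics; it only drops a fixed set of tags and emits headings, paragraphs, and list items as Markdown. The names `MainTextParser` and `html_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser

# Elements whose contents are treated as boilerplate in this sketch.
SKIP_TAGS = {"nav", "aside", "footer", "header", "script", "style", "form"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = {"p", "li"} | set(HEADING_PREFIX)


class MainTextParser(HTMLParser):
    """Keep text from headings, paragraphs, and list items outside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._skip_depth = 0   # > 0 while inside any skipped element
        self._current = None   # (markdown prefix, text pieces) of the open block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            prefix = HEADING_PREFIX.get(tag, "- " if tag == "li" else "")
            self._current = (prefix, [])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS and self._current is not None:
            prefix, pieces = self._current
            text = " ".join("".join(pieces).split())  # collapse whitespace
            if text:
                self.blocks.append(prefix + text)
            self._current = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._current is not None:
            self._current[1].append(data)


def html_to_markdown(html: str) -> str:
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Real pages rarely mark their boilerplate this cleanly, which is why a heuristic library like Readability is used in practice.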

Is there a limit on how many URLs I can batch process?

Yes, to ensure conversational AI latency remains reasonable, the batch_read tool accepts a maximum of 10 URLs in a single request. All 10 URLs are fetched simultaneously in parallel for maximum speed.
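If you have more than 10 URLs, one straightforward approach is to split them into groups that each fit within the limit and send one request per group. The sketch below shows that split; `batches` is a hypothetical helper, and removing duplicates first is a design choice of this example rather than documented tool behavior.

```python
from typing import Iterator

MAX_BATCH_URLS = 10  # limit stated on this page for batch_read


def batches(urls: list[str], size: int = MAX_BATCH_URLS) -> Iterator[list[str]]:
    """Yield de-duplicated URLs in order, in groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(urls))  # drop repeats, keep first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


if __name__ == "__main__":
    urls = [f"https://example.com/post-{i}" for i in range(23)]
    for number, group in enumerate(batches(urls), start=1):
        print(f"batch {number}: {len(group)} URLs")
```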

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)