Wayback MCP. Analyze historical web snapshots from the archive.

Internet Archive Wayback provides access to the world's largest web archive (800B+ pages over 25 years). Your AI agent can check if a URL was captured, find its first or latest snapshot, and analyze its entire capture history.

Use `get_captures_by_status` to see if a site reliably returned 200 OK, or `get_subdomain_captures` to map a domain's entire archived footprint.

Check URL Preservation Status

Determines if a given URL was archived and returns the timestamp for the closest snapshot.

Calculate Capture Frequency

Retrieves the total count of times a URL has been archived by the Wayback Machine over its history.

Analyze Content Type History

Filters capture records to show only specific resource types, like PDFs or JPEG images.

Audit Site Status Over Time

Filters captures based on HTTP status codes (e.g., 200 OK, 404 Not Found) to analyze site availability patterns.

Discover Domain Subdomains

Finds all archived subdomains associated with a root domain, mapping its full digital footprint.

Retrieve Detailed Capture Logs

Gets a comprehensive list of every capture, including timestamp, MIME type, status code, and file size.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Internet Archive Wayback: 10 Tools for Web Archive Analysis

These tools let your agent query the massive Wayback Machine archive, giving you granular control over web history, from finding the earliest capture to filtering by status code.

check_availability

Verifies if a URL was archived and returns the timestamp and status of the closest snapshot.

get_capture_count

Counts the total number of times a specific URL has been archived.

get_captures_by_mime_type

Filters all captures to only include specific resource types, like PDFs or images.

get_captures_by_status

Filters captures based on HTTP status codes (e.g., 200, 404, 301) to analyze site availability patterns.

get_captures_by_year

Filters captures to show records only from a specific four-digit year.

get_captures_collapsed

Provides a unique list of captured pages, removing redundant entries for the same URL key.

get_cdx_captures

Retrieves detailed capture history, listing the timestamp, status code, MIME type, and file size for each capture.

get_first_capture

Finds the earliest recorded capture time, along with the status and original URL.

get_latest_capture

Gets the most recent archived version of a URL, including the timestamp, status code, and URL.

get_subdomain_captures

Retrieves capture records for all subdomains associated with a given domain.

Make Your AI Do More

Start with Internet Archive Wayback, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

The Wayback Machine gives your AI agent access to the world's biggest web archive, covering over 800 billion pages from more than 25 years. Your agent can track a URL's full life cycle and audit a domain's history.

Your agent uses check_availability to verify if a URL was archived, returning the timestamp and status of the closest snapshot. get_capture_count tells you the total number of times a URL got archived. You can find the earliest record with get_first_capture, or the most recent one using get_latest_capture. To map a domain's full digital footprint, your agent runs get_subdomain_captures for all associated subdomains.

You can narrow down the history by filtering captures using get_captures_by_mime_type for specific resources like PDFs or images, or by status code with get_captures_by_status to see if a site reliably returned 200 OK. You can also filter records to show only content from a specific year using get_captures_by_year. For a full breakdown, get_cdx_captures pulls every capture's details, including timestamp, status code, MIME type, and file size. get_captures_collapsed provides a unique list of captured pages, stripping out redundant entries for the same URL key.
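This page does not document how the server implements these filters. As a rough sketch, assuming the tools wrap the Internet Archive's public CDX endpoint, the filters described above would map onto CDX query parameters roughly like this. The function name `build_cdx_query` and its keyword arguments are hypothetical and are not part of the MCP server.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, year=None, status=None, mime_type=None,
                    collapse_urlkey=False, include_subdomains=False):
    """Build a CDX query URL (hypothetical helper, not the server's code)."""
    params = [("url", url), ("output", "json")]
    if year is not None:
        # Partial timestamps: restrict the range to a single year.
        params += [("from", str(year)), ("to", str(year))]
    if status is not None:
        params.append(("filter", f"statuscode:{status}"))
    if mime_type is not None:
        params.append(("filter", f"mimetype:{mime_type}"))
    if collapse_urlkey:
        # One row per unique URL key.
        params.append(("collapse", "urlkey"))
    if include_subdomains:
        # Match the host and all of its subdomains.
        params.append(("matchType", "domain"))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


# PDFs captured in 2023 for an example domain.
print(build_cdx_query("example.com", year=2023, mime_type="application/pdf"))
```

The helper only builds the URL; it sends no request, so it can be inspected without network access.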

How Wayback MCP Works

1. Subscribe to the Internet Archive Wayback MCP Server. It's a public, zero-key service.
2. Your AI client invokes a specific tool (e.g., get_captures_by_year), providing the URL and required parameters.
3. The agent executes the tool call and receives a structured data response showing the historical capture metadata.

The bottom line is, your AI client uses the tools to query the Wayback Machine's data and gets back a structured report on a URL's past.
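For readers curious what step 2 looks like on the wire, MCP clients invoke tools with a JSON-RPC `tools/call` request. The sketch below builds such a message in Python. The argument names `url` and `year` are assumptions for illustration; the server's actual input schema is not shown on this page.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names below are hypothetical; check the tool's schema in your client.
request = make_tool_call(
    1, "get_captures_by_year", {"url": "example.com", "year": "2015"}
)
print(json.dumps(request, indent=2))
```

In practice your MCP client builds and sends this message for you; the sketch is only meant to show the shape of the call.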

Who Is Wayback MCP For?

Journalists, researchers, and legal teams who need to verify content or track changes are the primary users. If you need proof of what a website looked like in 2015, or if you're tracking how a competitor changed their privacy policy, this is for you. It’s for anyone whose job depends on knowing what was said, or what was online, at a specific point in time.

Investigative Journalist

Uses check_availability to verify if a source's article was archived, and then uses get_cdx_captures to find the exact date and content status.

Web Developer

Runs get_captures_by_mime_type to see how a website's CSS or image assets evolved, or uses get_subdomain_captures to audit a whole domain's structure.

Compliance Officer

Uses get_latest_capture and get_first_capture to establish an official record of content presence and track changes for legal retention.

What Changes When You Connect

See a URL's entire history in one go. Instead of guessing, use get_cdx_captures to get a detailed log of every capture, including the exact timestamp, MIME type, and status code.
Pinpoint exact dates. Use get_first_capture and get_latest_capture to know precisely when a piece of content first appeared or when it was last visible online.
Track site health. Running get_captures_by_status lets you analyze if a site used to return 200 OK consistently, or if it frequently failed with 404s.
Audit entire domains. Running get_subdomain_captures maps a domain's full archived footprint, revealing subdomains that might otherwise be missed.
Filter content types. If you only care about images or PDFs, get_captures_by_mime_type pulls only those records, cutting through the noise of general HTML pages.
Analyze temporal trends. Use get_captures_by_year to isolate captures from a specific year, allowing you to study changes in content or structure year-over-year.

Real-World Use Cases

Verifying a Leaked Claim

A journalist is investigating a claim about a company's product launch. They use check_availability to confirm whether the specific press release URL was ever archived. If it was, they run get_cdx_captures for the full capture log, then get_captures_by_status (looking for 200 OK) to isolate the successful captures and their exact timestamps, showing that the page was online at that time.

Website Design Evolution Audit

A web developer wants to compare the look of a corporate site from 2010 versus today. They run get_captures_by_year for both 2010 and the current year. They then use get_captures_by_mime_type to compare the archived CSS and image files to see how the site's design actually changed over time.

Legal Evidence Collection

A compliance officer needs evidence of a specific website disclaimer that was active last quarter. They use get_latest_capture to find the most recent snapshot, and then get_first_capture to establish the earliest date the content was online, giving them a precise, auditable timeline.
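A timeline like this comes down to reading capture timestamps. Wayback timestamps are 14-digit strings in `YYYYMMDDhhmmss` form; the sketch below converts them to datetimes and picks the earliest and latest. The sample timestamps are made up, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_wayback_timestamp(ts):
    """Convert a 14-digit Wayback timestamp to a datetime (treated as UTC)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def capture_window(timestamps):
    """Return (earliest, latest) capture times from a list of timestamps."""
    # Fixed-width digit strings sort chronologically as plain strings.
    ordered = sorted(timestamps)
    return parse_wayback_timestamp(ordered[0]), parse_wayback_timestamp(ordered[-1])


# Made-up sample timestamps for illustration.
sample = ["20190607080000", "20150102030405", "20240331235959"]
first, latest = capture_window(sample)
print(first.isoformat(), latest.isoformat())
```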

Mapping a Company's Digital Footprint

A cybersecurity researcher suspects a domain is using multiple hidden subdomains. They use get_subdomain_captures on the main domain. This quickly reveals all archived subdomains, allowing them to investigate potential phishing or data leak vectors across the entire organizational structure.

The Tradeoffs

Guessing the right tool

Trying to find all PDFs from 2023 by just running get_captures_by_mime_type and hoping the year filter is available.

→ You must chain the tools. First, run get_captures_by_year with '2023'. Then, filter that result set using get_captures_by_mime_type to isolate only 'application/pdf' captures.
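The chaining described here can be sketched as two filters applied in sequence. The example assumes capture records shaped like the public CDX JSON output (a header row followed by data rows); the rows themselves are made-up sample data, and the helper names are hypothetical.

```python
def rows_to_records(rows):
    """Turn CDX-style JSON rows (header row first) into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def by_year(records, year):
    """Keep captures whose timestamp falls in the given four-digit year."""
    return [r for r in records if r["timestamp"].startswith(str(year))]


def by_mime_type(records, mime_type):
    """Keep captures with an exact MIME type match."""
    return [r for r in records if r["mimetype"] == mime_type]


# Made-up sample rows for illustration.
SAMPLE_ROWS = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/report.pdf", "20230114093000", "http://example.com/report.pdf",
     "application/pdf", "200", "AAA", "52344"],
    ["com,example)/", "20230301120000", "http://example.com/",
     "text/html", "200", "BBB", "1200"],
    ["com,example)/old.pdf", "20190607080000", "http://example.com/old.pdf",
     "application/pdf", "200", "CCC", "8000"],
]

records = rows_to_records(SAMPLE_ROWS)
pdfs_2023 = by_mime_type(by_year(records, 2023), "application/pdf")
print([r["original"] for r in pdfs_2023])
```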

Overlooking the full scope

Checking only the main domain URL and assuming that covers all related content or subdomains.

→ Always run get_subdomain_captures first. This reveals the full domain footprint, ensuring you don't miss a critical piece of evidence hosted on a secondary subdomain.

Treating all captures equally

Analyzing a massive list of captures without knowing which status codes are valid or useful.

→ Use get_captures_by_status to narrow the focus. If you only care about successful content, filter for '200'. If you suspect a change, filter for '404' to see what was removed.
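Before filtering, it can help to see which status codes appear at all. The following sketch tallies status codes across capture records and then isolates one code. The records are made-up sample data, and the field name `statuscode` follows the public CDX output rather than anything this server is confirmed to return.

```python
from collections import Counter


def status_summary(records):
    """Count how many captures returned each HTTP status code."""
    return Counter(r["statuscode"] for r in records)


def only_status(records, code):
    """Keep captures that returned the given status code."""
    return [r for r in records if r["statuscode"] == str(code)]


# Made-up sample records for illustration.
captures = [
    {"timestamp": "20200101000000", "statuscode": "200"},
    {"timestamp": "20200601000000", "statuscode": "200"},
    {"timestamp": "20210101000000", "statuscode": "301"},
    {"timestamp": "20220101000000", "statuscode": "404"},
]

summary = status_summary(captures)
removed = only_status(captures, 404)
print(dict(summary), [r["timestamp"] for r in removed])
```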

When It Fits, When It Doesn't

Use this server if your core problem is historical data verification or trend analysis over time. You need to know when content was online, what it looked like then, or how a site's structure changed across years. The key is time-series analysis of web content.

Don't use this if you just need to know the current status of a URL; use a simple HTTP request tool for that. Also, if your need is to compare two unrelated data sets (e.g., financial records vs. web history), this server won't help. When in doubt, check the get_cdx_captures output; it contains the raw metadata fields (timestamp, status code, MIME type, file size) so you can build your own filtering logic.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Internet Archive Wayback Machine. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

check_availability, get_capture_count, get_captures_by_mime_type, get_captures_by_status, get_captures_by_year, get_captures_collapsed, get_cdx_captures, get_first_capture, get_latest_capture, get_subdomain_captures

Figuring out if a website changed its content is a nightmare.

Today, if a researcher needs to verify a claim from five years ago, they have to rely on fragmented sources—a cached screenshot here, a news report there. They copy URLs, hit different archive sites, and manually cross-reference dates and content. It's slow, error-prone, and often incomplete.

With this MCP server, your agent handles the heavy lifting. You just ask it to compare the site's structure across two years. The agent runs `get_captures_by_year` for both years, then uses `get_captures_by_mime_type` to pull only the HTML content, giving you a direct, side-by-side comparison of the site's evolution.

Use `get_subdomain_captures` to map a domain's full footprint.

Before, finding all related content for a major brand meant manually checking `blog.company.com`, `support.company.com`, and `careers.company.com`. You'd spend hours digging through multiple search interfaces, often missing the archived data for minor subdomains.

Now, the agent runs `get_subdomain_captures` on the root domain. It automatically finds and compiles the archived data for every known subdomain, letting you see the full, historical scope of the organization's online presence in one query.
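Once the subdomain captures come back, a natural next step is grouping them by host. The sketch below does that for a list of original URLs. The URLs are made-up sample data, and `group_by_host` is a hypothetical helper rather than part of the server.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(original_urls):
    """Group captured URLs by hostname (port and case are normalized)."""
    hosts = defaultdict(list)
    for url in original_urls:
        host = urlsplit(url).hostname
        if host:
            hosts[host].append(url)
    return dict(hosts)


# Made-up sample URLs for illustration.
urls = [
    "http://www.example.com/",
    "https://blog.example.com/post",
    "http://blog.example.com:80/about",
    "http://example.com/",
]

footprint = group_by_host(urls)
for host in sorted(footprint):
    print(host, len(footprint[host]))
```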

Common Questions About Wayback MCP

How do I check if a specific URL was archived using `check_availability`?

The check_availability tool verifies if the URL exists in the archive and returns the timestamp of the closest snapshot. It's the best starting point for any historical investigation.
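For a sense of what a "closest snapshot" result can look like, the sketch below parses a response shaped like the Internet Archive's public availability API. Whether this MCP tool returns the same structure is an assumption; the sample payloads are illustrative, and `closest_snapshot` is a hypothetical helper.

```python
def closest_snapshot(payload):
    """Return the closest snapshot dict, or None if nothing was archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest


# Illustrative payloads modeled on the public availability API.
archived = {
    "url": "example.com",
    "archived_snapshots": {
        "closest": {
            "status": "200",
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
        }
    },
}
not_archived = {"url": "example.invalid", "archived_snapshots": {}}

hit = closest_snapshot(archived)
miss = closest_snapshot(not_archived)
print(hit["timestamp"] if hit else "no snapshot", miss)
```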

Can I find out how many times a URL has been archived using `get_capture_count`?

Yes, get_capture_count gives you the total number of times the URL has been preserved. This number tells you the overall frequency of the URL's documented presence.

What is the best tool for seeing a URL's entire history?

For the full, raw data dump, use get_cdx_captures. It provides a comprehensive log detailing the timestamp, status code, MIME type, and file size for every single capture.

How do I find all subdomains of a domain using `get_subdomain_captures`?

Running get_subdomain_captures takes a root domain and returns all captured subdomains, making it easy to map the full archival scope of an organization.

Can I filter for only PDF documents from a specific year using `get_captures_by_mime_type` and `get_captures_by_year`?

Yes, you can combine these. First, use get_captures_by_year to narrow the scope to the target year. Then, filter that result set using get_captures_by_mime_type to isolate only 'application/pdf' records.

How do I find the earliest or most recent snapshot of a URL using `get_first_capture` or `get_latest_capture`?

Use get_first_capture to find the earliest preservation date. For the most recent version, run get_latest_capture. Both tools return the timestamp, status code, and original URL for immediate use.

Can I analyze site availability patterns over time using `get_captures_by_status`?

Yes, get_captures_by_status filters captures by HTTP code. This lets you see if a site consistently returned 200 (OK), or if it often generated 404 or 500 errors over its history.

How far back does the Wayback Machine go?

The Wayback Machine has archived web pages since 1996. However, coverage varies significantly — major websites have captures going back 20+ years, while smaller or newer sites may have fewer or no captures. Use get_first_capture to find the earliest archived version of any URL.

Can I find captures that returned 404 errors?

Yes! Use get_captures_by_status with status_code="404". This returns all archived versions where the page returned a Not Found error. This is useful for tracking when pages were removed or URLs changed structure.

Can I discover all subdomains of a website that have been archived?

Yes! Use get_subdomain_captures with the base domain (e.g., "example.com"). This returns captures for all subdomains like www.example.com, blog.example.com, api.example.com, etc. It's useful for mapping the full archival footprint of an organization's web presence.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)