# Wayback MCP

> Internet Archive Wayback MCP accesses the world's largest web archive, giving you access to over 800 billion archived web pages spanning decades of internet history. Check a URL's current preservation status, analyze its capture timeline, and find specific content—like images or PDFs—from any year. It lets researchers track content changes, legal teams verify evidence, and developers study how websites evolved.

## Overview
- **Category:** knowledge-management
- **Price:** Free
- **Tags:** web-archiving, url-history, snapshot-analysis, digital-preservation, internet-history, cdx-api

## Description

Need to know what a website looked like five years ago? This MCP connects your AI agent directly to the Internet Archive Wayback Machine. Instead of guessing or relying on single-point snapshots, you can check a URL's full history—a massive archive spanning over 25 years. Your agent verifies if a page was ever archived and finds its most recent snapshot instantly. You can dig into granular details: Did they change their logo? Find all JPEG images from 2018. Was the site down on a specific date? Check the HTTP status codes for that year. The power of this data is channeled through Vinkius, making historical web analysis available to any compatible client. It's ideal for journalists tracking deleted content or developers comparing design iterations over time.

## Tools

### get_captures_by_mime_type
Finds archived pages filtered by the specific file type, like showing only PDFs or images from a URL's history.

### get_captures_by_status
Filters captured records by HTTP status code (e.g., finding all 404 errors across a domain over time).

### get_captures_by_year
Retrieves all archived snapshots for a specific calendar year, allowing you to analyze content frequency during that period.

### get_cdx_captures
Gets a detailed list of every capture record, including the timestamp, MIME type, and file size for full archival analysis.

### check_availability
Quickly determines if a URL has been archived and returns the date of the closest available snapshot.

### get_captures_collapsed
Shows unique page captures for a given URL, eliminating redundant entries so you only see distinct versions.

### get_capture_count
Calculates and returns the total number of times an entire URL has been archived over its history.

### get_first_capture
Identifies and retrieves metadata for the earliest preserved version of a URL, showing when it was first captured.

### get_latest_capture
Gets the most recent archived snapshot of a page, giving you the newest recorded version available.

### get_subdomain_captures
Maps out an entire domain's historical presence by finding all captured subdomains (e.g., www and help).

## Prompt Examples

**Prompt:** 
```
Check if https://example.com has been archived.
```

**Response:** 
```
✅ URL is archived! Closest snapshot: 20240115120000 (January 15, 2024). View at: https://web.archive.org/web/20240115120000/https://example.com. Total captures: 1,247 over 28 years.
```

**Prompt:** 
```
Show me all captures of https://example.com from 2020.
```

**Response:** 
```
Found 52 captures of https://example.com in 2020. First: 2020-01-03 (status 200), Last: 2020-12-28 (status 200). Average capture frequency: ~1 per week. All returned HTTP 200 (OK).
```

**Prompt:** 
```
Find all subdomains of archive.org that have been captured.
```

**Response:** 
```
Found captures for 15 subdomains of archive.org: www, web, developer, donate, help, blog, advancedsearch, petabox, us.archive.org, and others. Most captures are from www and web subdomains. Oldest capture dates back to 1998.
```

## Capabilities

### Check URL availability
Determine if a specific website address has been archived and get the timestamp of its closest preserved version.

### Analyze capture timeline
Find out when a page was first captured or what the most recent snapshot is, giving you clear start and end points for content history.

### Filter by resource type
Limit searches to specific file types like PDFs, images, or stylesheets to pinpoint necessary historical assets.

### Track status codes
Analyze capture records specifically for HTTP error or success codes (like 404 or 200) across a period of time.

### Discover domain footprint
Find all archived subdomains associated with a main website, helping map out an entire organization's historical online presence.

## Use Cases

### Tracking a Journalist's Claim
A journalist needs proof that a rival company made a claim in 2017. They ask their agent to check the URL, using `get_captures_by_year` and then `get_first_capture`. The MCP reports all available snapshots from 2017, allowing them to pinpoint the exact date and status code of the original post.

### Legal Discovery for a Breach
A compliance officer needs evidence that a specific policy was visible on a website in late 2021. They use `get_captures_by_status` to filter out error pages and then check the resulting records to confirm the presence of the required text block.

### Developer Comparing Design Changes
A developer wants to see how a site's structure changed over time. They use `get_subdomain_captures` first, then run `get_captures_by_mime_type` for CSS files across multiple years to analyze the evolution of stylesheets.

### Academic Research on Web Trends
A historian wants to study how a particular industry presented itself online over 20 years. They use `get_capture_count` and `get_captures_by_year` repeatedly across different domains to quantify the change in web presence.

## Benefits

- Instantly verify content history. Use `check_availability` to confirm if a page was ever archived, saving you the manual effort of checking multiple archives.
- Analyze site evolution with precision. Instead of guessing, use `get_captures_by_year` to pull all snapshots from a specific year for comparison.
- Track content changes over time. Use `get_first_capture` and `get_latest_capture` together to measure the gap between a page's debut and its most recent update.
- Build domain maps easily. The `get_subdomain_captures` tool reveals the full historical footprint of an organization, finding subdomains you didn't know existed.
- Filter data for specific evidence. Need only to check if images were posted? Use `get_captures_by_mime_type` to filter out irrelevant text and status codes.

## How It Works

The bottom line is that your AI agent treats the vast web archive like a searchable database, letting you query specific pieces of history without needing to browse the raw data yourself.

1. Subscribe to this MCP on Vinkius. No API key is needed; the connection is open and public.
2. Your AI agent sends a query—for example, 'Show me all captures for X URL in 2015'—to the connected archive data.
3. The MCP executes the necessary checks and returns structured historical metadata detailing the status codes, dates, and types of captured content.

## Frequently Asked Questions

**How do I check if a URL was ever on the Internet Archive Wayback using the Internet Archive Wayback MCP?**
You run `check_availability` with the target URL. This tool immediately tells you if the page has been archived and provides the timestamp of the closest preserved version.

**Can I find out when a website was first online using get_first_capture?**
Yes, running `get_first_capture` gives you the initial metadata for the earliest snapshot available. It includes the timestamp and status code of that very first recorded version.

**How do I analyze a domain's full history using get_subdomain_captures?**
Use `get_subdomain_captures` with the root domain. This tool discovers and lists all associated subdomains that have been captured, letting you map out the entire corporate footprint.

**What is the best way to filter for images in a specific year?**
You combine two tools: first, use `get_captures_by_year` to narrow down the date range. Then, refine that list using `get_captures_by_mime_type` and specify 'image/jpeg' or similar.

**Do I need an API key for Internet Archive Wayback MCP?**
No. This connection is free and public, meaning you don't have to worry about managing credentials; just connect via your preferred AI client.