# Internet Archive Metadata MCP

> Internet Archive Metadata gives your AI client structured access to records on archive.org: metadata, file lists, user reviews, collection memberships, and modification history for any item. This MCP turns the archive's item records into data you can query and analyze.

## Overview
- **Category:** knowledge-management
- **Price:** Free
- **Tags:** digital-archiving, metadata-extraction, library-science, open-data, file-retrieval, historical-records

## Description

Need to research an obscure piece of media or a niche historical record? This MCP connects your AI client directly to the Internet Archive's item data. Instead of clicking through dozens of web pages to compile facts such as file formats, reviewers, or view counts, your agent gathers them in one query. You can ask for a complete item profile, from the title and creator down to the storage location.

If you're building complex knowledge tools, Vinkius makes this MCP available alongside thousands of others, giving your AI client a wide range of data sources. Your agent doesn't just summarize; it retrieves specific facts, from changes over time to every downloadable file attached to the record.

## Tools

### get_collections
This tool shows all the specific collections an item belongs to, giving you its structural context.
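
In Internet Archive metadata records, the `collection` field can hold either a single string or a list of strings. The sketch below normalizes both shapes into a list. It assumes the tool's response passes that field through unchanged, which this listing does not confirm; `collections_of` and the sample record are hypothetical.

```python
def collections_of(metadata: dict) -> list[str]:
    """Return an item's collections as a list, whatever shape the field has."""
    value = metadata.get("collection", [])
    if isinstance(value, str):
        # A single membership is stored as a bare string.
        return [value]
    return list(value)


# Hypothetical sample records, for illustration only.
single = {"identifier": "example_item", "collection": "example_collection"}
several = {"identifier": "example_item", "collection": ["example_a", "example_b"]}

print(collections_of(single))
print(collections_of(several))
```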

### get_derivatives
It lists automatically processed versions of the original upload, such as optimized thumbnails or OCR text files.
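
Internet Archive file entries carry a `source` field (`original`, `derivative`, or `metadata`), and derivative entries name the file they were generated from in `original`. As a rough sketch, assuming the tool returns entries in that shape, derivatives can be grouped by their source file. The helper name and sample data are hypothetical.

```python
from collections import defaultdict


def derivatives_by_original(files: list[dict]) -> dict[str, list[str]]:
    """Group derivative file names under the original file they came from."""
    grouped = defaultdict(list)
    for entry in files:
        if entry.get("source") == "derivative":
            grouped[entry.get("original", "unknown")].append(entry["name"])
    return dict(grouped)


# Hypothetical file list, for illustration only.
files = [
    {"name": "scan.pdf", "source": "original"},
    {"name": "scan_djvu.txt", "source": "derivative", "original": "scan.pdf"},
    {"name": "scan_thumb.jpg", "source": "derivative", "original": "scan.pdf"},
    {"name": "example_item_meta.xml", "source": "metadata"},
]

print(derivatives_by_original(files))
```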

### get_files
This tool retrieves a list of every downloadable file attached to the item, along with each file's format.
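
A file list is often easier to read once it is grouped by format. The sketch below assumes each entry has `format` and `size` keys, with `size` given in bytes as a string, as in Internet Archive file records; the MCP's actual response shape may differ, and `summarize_formats` is a hypothetical helper.

```python
def summarize_formats(files: list[dict]) -> dict[str, tuple[int, int]]:
    """Map each format to (number of files, total size in bytes)."""
    summary: dict[str, tuple[int, int]] = {}
    for entry in files:
        fmt = entry.get("format", "Unknown")
        count, total = summary.get(fmt, (0, 0))
        # Sizes arrive as strings; entries without a size count as 0 bytes.
        summary[fmt] = (count + 1, total + int(entry.get("size", 0)))
    return summary


# Hypothetical file list, for illustration only.
files = [
    {"name": "a.pdf", "format": "Text PDF", "size": "100"},
    {"name": "b.pdf", "format": "Text PDF", "size": "50"},
    {"name": "a.txt", "format": "DjVuTXT"},
]

for fmt, (count, total) in summarize_formats(files).items():
    print(f"{fmt}: {count} file(s), {total} bytes")
```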

### get_metadata
It fetches the complete, core data about an item: creator, date, subjects, description, and license details.

### get_history
This tool tracks every recorded change to an item over time, providing a full audit trail of modifications.

### get_metadata_only
Use this when you only need the basic descriptive data about the item without pulling file lists or reviews.

### get_parents
It reveals the higher-level categorization structure, showing which broader parent collections the item falls under.

### get_reviews
This tool pulls community ratings and review text from users who have viewed the material.

### get_server_info
It provides technical details on where the item's files are stored, useful for diagnosing access issues.
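
Internet Archive item records include a `server` hostname and a `dir` path, and files are also reachable through the stable `https://archive.org/download/{identifier}/{filename}` route. The sketch below builds both URLs so they can be compared when a download fails. It assumes the tool exposes `server` and `dir` under those names; the hostname in the sample is made up.

```python
from urllib.parse import quote


def direct_url(server_info: dict, filename: str) -> str:
    """URL on the specific storage node reported for the item."""
    return f"https://{server_info['server']}{server_info['dir']}/{quote(filename)}"


def stable_url(identifier: str, filename: str) -> str:
    """Redirecting URL that does not depend on the current storage node."""
    return f"https://archive.org/download/{quote(identifier)}/{quote(filename)}"


# Hypothetical server details, for illustration only.
info = {"server": "ia800000.us.archive.org", "dir": "/1/items/example_item"}

print(direct_url(info, "my file.pdf"))
print(stable_url("example_item", "my file.pdf"))
```

If the stable URL works while the direct one does not, the problem is more likely the storage node than the file itself.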

### get_stats
This tool returns key usage metrics, including download counts and general access statistics.

## Prompt Examples

**Prompt:** 
```
Get metadata for item big_buck_bunny.
```

**Response:** 
```
big_buck_bunny — Creator: Blender Foundation. Date: 2008. Type: Animation/Short film. Duration: 10 minutes. Collection: Community Video. Available in MP4, OGV, and archival formats.
```

**Prompt:** 
```
List all files for item gutenberg_etext1.
```

**Response:** 
```
Found 8 files: gutenberg_etext1.txt (plain text, 1.2 MB), gutenberg_etext1.epub (EPUB, 800 KB), gutenberg_etext1.mobi (MOBI, 900 KB), gutenberg_etext1.pdf (PDF, 2.1 MB), and various metadata files.
```

**Prompt:** 
```
Get reviews for item nasa_apollo11.
```

**Response:** 
```
Found 23 reviews. Average rating: 4.8/5 stars. Top review from user 'spacefan42': "Incredible historical footage. The quality restoration is remarkable. A must-watch for anyone interested in space exploration."
```

## Capabilities

### Identify item context and groupings
The MCP determines which larger collections or parent categories an item belongs to.

### List all available digital files
It pulls a comprehensive list of downloadable assets, detailing formats like PDF, EPUB, MP4, and their specific sizes.

### Extract community feedback and ratings
The MCP retrieves user reviews, including star averages and the text written by other users.

### Get item usage statistics
It provides access counts, showing how popular or frequently accessed the archived material is.

### Trace data evolution over time
The MCP tracks the modification history of an item, letting you see when and what changes were made to the record.

### View technical hosting details
It supplies server information regarding where the files are physically hosted.

## Use Cases

### Verifying source credibility
A student needs to cite old film footage. They ask their agent for the item's full metadata, then use get_history to see if key details (like the creator name or date) were corrected after initial upload. This helps them judge how reliable the record is before citing it.

### Optimizing digital asset libraries
A library manager wants to know which physical collections should be digitized next. They use get_collections and then check get_stats on related items, prioritizing those that have high download counts but no corresponding parent collection data.
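
The prioritization described here can be sketched as a filter followed by a sort. The record shape below (`identifier`, `downloads`, `parents`) is hypothetical: it stands in for whatever the agent assembles from get_stats and get_parents, whose response formats this listing does not specify.

```python
def digitization_candidates(items: list[dict]) -> list[str]:
    """Items with no parent collection data, most downloaded first."""
    orphans = [item for item in items if not item.get("parents")]
    ranked = sorted(orphans, key=lambda item: item.get("downloads", 0), reverse=True)
    return [item["identifier"] for item in ranked]


# Hypothetical combined records, for illustration only.
items = [
    {"identifier": "item_a", "downloads": 120, "parents": []},
    {"identifier": "item_b", "downloads": 900, "parents": ["example_collection"]},
    {"identifier": "item_c", "downloads": 450, "parents": []},
]

print(digitization_candidates(items))
```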

### Troubleshooting file access
A user can't open a specific file type. They prompt their agent to run get_files and get_server_info together, which shows whether the format is missing from the item or whether the problem lies with the server hosting the files.

### Understanding content lineage
A researcher finds a derivative file but needs context. They use get_derivatives to see what was processed and then run get_parents to understand the broader thematic grouping of that content within the archive.

## Benefits

- You instantly get full context on an asset. Using get_metadata ensures you don't miss the creator, license, or subject matter that defines a record.
- Never worry about missing formats again. get_files lists every downloadable file, from PDF to MP3, in one call.
- You can gauge an item’s relevance by checking community sentiment through the get_reviews tool, getting star ratings and user commentary right away.
- Tracking changes is simple. Running the get_history function provides a clear timeline of modifications, which is critical for academic integrity and provenance research.
- Quick checks are fast. If you only need basic item details without downloading massive amounts of data, use get_metadata_only to keep your queries light and fast.

## How It Works

In short, the MCP replaces manual web scraping with a single programmatic query.

1. You give your AI client an item's unique identifier (e.g., from its URL).
2. The MCP executes the necessary queries, pulling metadata, file listings, reviews, and statistics into a structured data payload.
3. Your agent receives clean, organized JSON or plain text containing all requested historical details.
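
The steps above can be sketched from the client side. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`; your AI client normally builds these for you. The argument name `identifier` is an assumption, since this listing does not document each tool's input schema, and both helper functions are hypothetical.

```python
import json
from urllib.parse import urlparse


def identifier_from_url(url: str) -> str:
    """Pull the item identifier out of an archive.org item URL."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) >= 2 and parts[0] in {"details", "download", "metadata"}:
        return parts[1]
    raise ValueError(f"no item identifier found in {url!r}")


def tool_call(tool: str, identifier: str, request_id: int = 1) -> dict:
    """Build an MCP tools/call request; the argument name is assumed."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": {"identifier": identifier}},
    }


item = identifier_from_url("https://archive.org/details/big_buck_bunny")
print(json.dumps(tool_call("get_metadata", item), indent=2))
```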

## Frequently Asked Questions

**How do I use Internet Archive Metadata MCP to find all file types for a record?**
Run get_files. This tool specifically lists every format available, whether it's plain text, an EPUB book, or a high-res MP4 video.

**Can Internet Archive Metadata MCP track if item details were changed over time?**
Yes, use get_history. It provides a modification timeline, letting you see exactly when the record was updated and what changes were made to it.

**Do I need to run all tools for full metadata on Internet Archive Metadata MCP?**
No. For basic descriptive facts, use get_metadata_only. get_metadata returns the fuller profile, including files and reviews, so run get_reviews on its own only when community opinion is all you need.

**What is the difference between get_metadata and get_metadata_only?**
get_metadata provides a comprehensive profile including files and reviews. get_metadata_only runs a lighter query, giving you just the core descriptive fields for faster lookups.

**How do I find out which collections an item belongs to using Internet Archive Metadata MCP?**
Use get_collections. This tool explicitly lists all the various groups or categories that contain the specific archived item.