# Markdown Frontmatter Harvester MCP

> Markdown Frontmatter Harvester indexes your local knowledge base by scanning Obsidian or Hugo vaults and extracting all YAML metadata into a single, queryable JSON file. It lets your AI agent read tags, dates, statuses, and other metadata stored in your notes without searching thousands of scattered Markdown files.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** yaml-parsing, metadata-extraction, markdown, obsidian, vault-management, structured-data

## Description

Tools like Obsidian and Hugo rely on YAML frontmatter: the small block at the top of a Markdown file that holds metadata such as `status: draft` or `tags: [idea]`. When you ask your AI client, 'Which posts are marked as drafts from 2024?', it often struggles because it has no quick way to index every local Markdown file. This MCP fixes that. It acts like a fast librarian, recursively scanning your folder structure and extracting only the YAML frontmatter from each document. The result is a clean JSON index of your whole vault: one structured data set your agent can filter, sort, and query to give reliable answers about your scattered notes.

## Tools

### harvest_markdown_frontmatter
Takes the absolute path of a directory, scans the local Markdown files in it, and extracts their YAML tags, dates, and other metadata into an index.

## Prompt Examples

**Prompt:** 
```
Scan my Obsidian vault at C:/Notes and list all files that have the tag 'urgent'.
```

**Response:** 
```
I found 5 notes with the tag 'urgent'. Here are their filenames: ProjectX.md, MeetingNotes.md...
```

**Prompt:** 
```
Harvest the frontmatter from my blog repo and tell me which posts are still marked as 'status: draft'.
```

**Response:** 
```
Based on the frontmatter, you have 12 posts still marked as 'draft'. Would you like the list?
```

**Prompt:** 
```
Count how many notes I created in the year 2023 based on the YAML 'date' field.
```

**Response:** 
```
According to the metadata, you created exactly 142 notes in 2023.
```

## Capabilities

### Index entire vaults
The MCP scans massive local directories to build a unified index of metadata from all contained markdown files.

### Extract specific fields
It pulls out named data points like tags, dates, and status markers written in YAML frontmatter.

### Query structured data
Your agent queries the generated JSON index directly, allowing precise filtering across thousands of documents at once.
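
As a rough illustration of the kind of filtering described above, here is a minimal Python sketch. The index shape (a JSON object mapping each file path to its frontmatter fields) and the helper name `files_with_tag` are assumptions made for this example; the actual output format of the tool may differ.

```python
import json

# Hypothetical index shape: file path -> frontmatter fields.
# The real structure returned by the tool is not documented here.
INDEX_JSON = """
{
  "ProjectX.md": {"tags": ["urgent", "work"], "status": "active"},
  "MeetingNotes.md": {"tags": ["urgent"], "status": "draft"},
  "Ideas/Garden.md": {"tags": "idea", "status": "draft"}
}
"""


def files_with_tag(index, tag):
    """Return the sorted paths whose `tags` field contains `tag`."""
    matches = []
    for path, meta in index.items():
        tags = meta.get("tags", [])
        if isinstance(tags, str):
            # YAML allows a single tag written as a plain string.
            tags = [tags]
        if tag in tags:
            matches.append(path)
    return sorted(matches)


index = json.loads(INDEX_JSON)
print(files_with_tag(index, "urgent"))
```

Because the whole vault is already reduced to one object, a query like this is a single pass over the index and does not reopen any Markdown files.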

## Use Cases

### Finding all outdated drafts
A content manager needs to know which blog posts were created before 2023 but still have `status: draft` in their frontmatter. They ask their agent, and the MCP uses `harvest_markdown_frontmatter` to generate an index, allowing the AI client to list every file that meets both criteria.
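
The two-criteria filter in this use case could look something like the sketch below. It assumes a hypothetical index that maps file paths to frontmatter fields and assumes `date` values are ISO-formatted strings (`YYYY-MM-DD`); the sample entries and the function name `outdated_drafts` are illustrative only.

```python
from datetime import date

# Hypothetical sample index: file path -> frontmatter fields.
index = {
    "posts/old-idea.md": {"status": "draft", "date": "2022-03-14"},
    "posts/launch.md": {"status": "published", "date": "2021-11-02"},
    "posts/new-idea.md": {"status": "draft", "date": "2024-06-30"},
    "posts/undated.md": {"status": "draft"},
}


def outdated_drafts(index, cutoff):
    """Return sorted paths of drafts whose `date` is earlier than `cutoff`."""
    matches = []
    for path, meta in index.items():
        if meta.get("status") != "draft":
            continue
        try:
            created = date.fromisoformat(str(meta.get("date", "")))
        except ValueError:
            # Missing or non-ISO date: skip rather than guess.
            continue
        if created < cutoff:
            matches.append(path)
    return sorted(matches)


print(outdated_drafts(index, date(2023, 1, 1)))
```

Notes without a usable `date` field are skipped here; an agent could just as reasonably report them separately.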

### Auditing research topics
A researcher wants to count how many notes they wrote in 2024 about 'quantum computing' based on the date and tag fields. The agent runs `harvest_markdown_frontmatter` against their vault path, generating a clean dataset that lets the AI client perform an accurate count.

### Listing urgent items
A student asks to see every note marked with the 'urgent' tag across three different subfolders. The agent runs `harvest_markdown_frontmatter` on the parent directory, providing a single index that lets it pull all relevant file names instantly.

## Benefits

- Instant Querying: Instead of manually searching file names or using complex local scripts, your agent queries a unified JSON index. You get immediate answers about metadata like tags and status.
- Massive Scale: It scans 1,000+ files in milliseconds, making it practical for large Obsidian vaults without slowing down your AI client's response time.
- Data Structure: The output is clean YAML frontmatter converted into structured JSON. This format is ideal for any agent to consume and reason over.
- Local Processing: Your private journal entries and business notes never leave your machine; all processing happens locally, so nothing is uploaded.
- Zero Setup: You don't need complex coding or configuration files. Just point the MCP at your root folder, and it does the rest.

## How It Works

You get one clean, queryable index of your entire knowledge base instead of thousands of individual files.

1. You provide the MCP with the absolute path to your notes folder or vault.
2. The tool scans every Markdown file in that directory and its subfolders and extracts all the YAML frontmatter it finds, ignoring the body text.
3. It returns a single, unified JSON object containing metadata for every file found, which your AI client can then use for querying.
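
The three steps above can be sketched in Python as follows. This is not the tool's actual implementation: the function names are hypothetical, the index shape (relative path mapped to fields) is an assumption, and `parse_simple_yaml` is a deliberately simplified stand-in that handles only flat `key: value` pairs and inline lists, where a real harvester would use a full YAML parser.

```python
import json
import os
import tempfile


def extract_frontmatter(text):
    """Return the raw block between the opening and closing '---' lines, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return None  # no closing delimiter, so treat the file as having no frontmatter


def parse_simple_yaml(block):
    """Simplified stand-in for a YAML parser: flat pairs and inline lists only."""
    data = {}
    for line in block.splitlines():
        if not line.strip() or line.lstrip().startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            data[key.strip()] = [v.strip().strip("'\"") for v in items if v.strip()]
        else:
            data[key.strip()] = value.strip("'\"")
    return data


def harvest(root):
    """Walk `root` recursively and map each Markdown file's relative path to its metadata."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as fh:
                block = extract_frontmatter(fh.read())
            if block is not None:
                rel = os.path.relpath(path, root).replace(os.sep, "/")
                index[rel] = parse_simple_yaml(block)
    return index


if __name__ == "__main__":
    # Build a tiny throwaway vault so the sketch can run on its own.
    with tempfile.TemporaryDirectory() as vault:
        os.makedirs(os.path.join(vault, "posts"))
        with open(os.path.join(vault, "posts", "hello.md"), "w", encoding="utf-8") as fh:
            fh.write("---\nstatus: draft\ntags: [idea, urgent]\n---\nBody text is ignored.\n")
        print(json.dumps(harvest(vault), indent=2))
```

Files without a frontmatter block are left out of the index in this sketch; whether the real tool omits them or lists them with empty metadata is not stated.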

## Frequently Asked Questions

**How does Markdown Frontmatter Harvester read my local Obsidian vault?**
The MCP scans the absolute directory path you provide. It specifically targets YAML frontmatter blocks within markdown files to extract tags, dates, and status markers.

**Is this tool private or does it upload my notes?**
It is private and uploads nothing. Your journal entries and business notes never leave your machine; all processing happens locally on your system.

**What file types can harvest_markdown_frontmatter handle?**
It is designed to scan Markdown files (like those used in Obsidian or Hugo) and extract the YAML frontmatter contained within them.

**Does this MCP read the body text of my notes?**
No, it only reads the metadata. It extracts the structured YAML data at the top of the file; the actual content of your note is ignored during indexing.