# DocBreach MCP

> DocBreach gives your AI client direct access to any API documentation. It finds, reads, and structures information from OpenAPI specs, Swagger files, and complex Single Page Applications (SPAs) without needing a headless browser or API keys. Stop fighting with poorly scraped web pages; DocBreach delivers clean, LLM-ready Markdown so your agent can actually use the API context.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** documentation, api-docs, openapi, swagger, scraping

## Description

API documentation is notoriously difficult for AI agents to read. Most tools fail when they encounter modern websites: JavaScript frameworks, hidden endpoints, or the need for complex authentication headers. This MCP addresses that problem. It lets your agent get past the web's usual roadblocks, giving it clean access to technical docs whether they are in OpenAPI format, Postman collections, or a standard website.

Your agent can first find documentation sources for any service using `docs_discover`. Once a source is found, `docs_read` fetches the page and returns clean Markdown. If you need structure, your agent can map out an entire domain's sitemap with `docs_map` or pull structured lists of endpoints directly from specs with `docs_extract`. Getting this level of deep, reliable API knowledge used to require multiple developer tools; now you just subscribe to DocBreach on Vinkius and let your AI client handle the rest.

## Tools

### docs_search
Searches for specific keywords or phrases across an entire known documentation site.

### docs_discover
Finds the official documentation source for any service, library, or API by querying a general topic.

### docs_map
Generates a complete table of contents and sitemap structure for an entire domain.

### docs_read
Reads any documentation URL, returning clean Markdown that is ready for your AI agent to use.

### docs_extract
Pulls structured data about endpoints from raw OpenAPI or Postman specification files.
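
MCP clients invoke tools like the ones above by sending a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what such a request could look like for `docs_read`. The `url` argument name is an assumption, since this listing does not document the tools' input schemas; your MCP client normally builds this message for you.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# The "url" argument name is hypothetical; check the tool's published input schema.
request = build_tool_call(1, "docs_read", {"url": "https://docs.stripe.com/webhooks"})
print(request)
```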

## Prompt Examples

**Prompt:** 
```
Find the official integration guide for Stripe webhooks and read the first steps.
```

**Response:** 
```
I've reviewed the Stripe Webhooks documentation at `https://docs.stripe.com/webhooks`. The guide outlines 3 main steps to listen for events: setting up an endpoint, parsing the payload, and verifying signatures. Would you like me to generate the implementation code based on this guide?
```

**Prompt:** 
```
Extract all the user authentication endpoints from the GitHub REST API specification.
```

**Response:** 
```
I've analyzed the GitHub OpenAPI spec and found 5 endpoints related to user authentication under the 'users' tag. The primary endpoints are `GET /user` for the authenticated user and `PATCH /user` to update the profile. Which endpoint would you like to see the required parameters for?
```

**Prompt:** 
```
Map the documentation structure for Vercel's v0 to see all available topics.
```

**Response:** 
```
I've mapped the documentation for `v0.dev`. There are 42 pages in total, structured under 'Concepts', 'UI Components', and 'Billing'. The most relevant page for getting started seems to be `https://v0.dev/docs/introduction`. Shall I read that page for you?
```

**Prompt:** 
```
Search the Next.js documentation for how to implement Server Actions.
```

**Response:** 
```
Based on the Next.js documentation, specifically the 'Data Fetching: Server Actions and Mutations' guide, you implement Server Actions using the `"use server"` directive. Should I output the code example for a basic form mutation?
```

**Prompt:** 
```
Read the React 19 documentation on the useActionState hook and summarize its return values.
```

**Response:** 
```
According to the React 19 documentation, `useActionState` returns an array with three values: `[state, formAction, isPending]`. The `isPending` boolean is new and replaces the need for `useFormStatus` in many cases. Do you want to see a full implementation example?
```

## Capabilities

### Find documentation sources
Directly locate where a service's technical documentation lives, even if you only know general keywords.

### Read and clean documents
Extract any API manual or guide from a URL into clean, usable Markdown for your agent to process.

### Map site structures
Generate a complete table of contents for an entire documentation domain, showing all available sections and pages.

### Search within docs
Find specific topics or keywords inside large documentation sites without needing to browse through menus.

### Extract structured API endpoints
Take raw OpenAPI, Swagger, or Postman specs and pull out only the necessary endpoint details in an organized format.

## Use Cases

### Onboarding a new client with complex APIs
A developer needs to connect their product to Stripe. Instead of spending days reading scattered docs, they ask their agent to use `docs_discover` for 'Stripe webhooks'. The agent finds the correct guide and uses `docs_read` to summarize the three main implementation steps, giving the developer a clear path forward.

### Auditing an internal API documentation site
An architect needs to know if their team covered all parts of the new billing module. They use `docs_map` on the internal domain URL. The resulting sitemap reveals sections that have pages but no content, allowing them to flag gaps immediately.

### Generating code from a raw spec file
A team member gets a raw OpenAPI JSON file for an external partner integration. They use `docs_extract` with the necessary tags (e.g., 'users') to pull out only the user-related endpoints, feeding the clean list directly to their agent for code scaffolding.

### Comparing multiple cloud provider APIs
A data scientist needs to compare two different cloud provider APIs. They use `docs_read` on both providers' 'authentication' guides and then feed both Markdown documents into the agent, asking it to summarize the differences in OAuth flows.

## Benefits

- Stops the 'JavaScript wall.' You don't have to worry if a site uses React or Vue; DocBreach reads documentation regardless of how it's rendered.
- Saves hours on research. Instead of manually checking multiple sites for API calls, use `docs_discover` to locate the source in seconds.
- Guarantees usable context. The content pulled by `docs_read` is clean Markdown, meaning your agent spends time coding, not cleaning up messy HTML tags.
- Structured data access. For specs like OpenAPI or Postman, `docs_extract` bypasses reading prose and gives you a pure list of callable endpoints.
- Complete site overview. Use `docs_map` to get the full scope of an API domain, ensuring your agent doesn't miss any crucial setup guides.

## How It Works

DocBreach gives you reliable, structured documentation context, so you don't have to worry about how modern websites break AI parsing tools. A typical workflow looks like this:

1. First, use `docs_discover` to find the official documentation source for a service you're interested in.
2. Next, if you need comprehensive context, use `docs_map` to get a full map of the site structure. Then, pass specific URLs from that map to `docs_read` to pull clean content.
3. Finally, feed the resulting Markdown or structured data directly into your agent for analysis and code generation.
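
The three steps above can be sketched as a small pipeline. This is only an illustration of the order of calls: the argument names (`query`, `url`) and the return shapes are assumptions, and `fake_call_tool` is a hypothetical stand-in for whatever tool-calling function your MCP client exposes.

```python
from typing import Any, Callable

# A tool caller takes (tool_name, arguments) and returns the tool's result.
ToolCaller = Callable[[str, dict], Any]


def gather_docs(call_tool: ToolCaller, topic: str, max_pages: int = 3) -> list:
    """Discover a docs site, map it, and read the first few pages."""
    # Step 1: locate the documentation source (assumed to return a URL string).
    root_url = call_tool("docs_discover", {"query": topic})
    # Step 2: list the site's pages (assumed to return a list of URL strings).
    page_urls = call_tool("docs_map", {"url": root_url})
    # Step 3: read a bounded number of pages as Markdown.
    return [call_tool("docs_read", {"url": u}) for u in page_urls[:max_pages]]


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Hypothetical stand-in for a real MCP client session."""
    if name == "docs_discover":
        return "https://example.com/docs"
    if name == "docs_map":
        return ["https://example.com/docs/a", "https://example.com/docs/b"]
    return f"# Markdown for {arguments['url']}"


pages = gather_docs(fake_call_tool, "example webhooks")
print(len(pages))
```

Capping the number of pages with `max_pages` keeps the agent's context window from filling up when a site map is large.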

## Frequently Asked Questions

**How does DocBreach bypass modern Single Page Application (SPA) barriers?**
DocBreach features a built-in hydration engine. Rather than relying on a heavy headless browser, it intercepts and evaluates underlying framework payloads (like `__NEXT_DATA__` for Next.js or Docusaurus state) to extract the raw documentation text directly. This avoids the startup time and memory cost of launching a browser for every page.
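
To make the idea concrete, here is a minimal sketch of reading a Next.js payload without a browser. It is not DocBreach's actual implementation, only an illustration of the general technique: Next.js pages that use the Pages Router embed their data in a `<script id="__NEXT_DATA__">` tag as JSON. Where the documentation text lives inside that payload varies by site, so the `title` field below is purely illustrative.

```python
import json
import re
from typing import Optional

# Matches the JSON script tag that Next.js (Pages Router) embeds in the HTML.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ payload, or None if the page has none."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


sample_html = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"title": "Webhooks"}}}'
    "</script></body></html>"
)

payload = extract_next_data(sample_html)
print(payload["props"]["pageProps"]["title"])  # prints "Webhooks"
```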

**What documentation formats and frameworks are officially supported?**
It natively parses OpenAPI/Swagger specifications, Postman Collections, and standard `llms.txt` files. For web-based docs, it surgically cleans noise from 12+ major frameworks including Docusaurus, Nextra, VitePress, Mintlify, GitBook, and ReadMe, returning LLM-optimized Markdown.
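
As an illustration of why `llms.txt` is easy to consume, the sketch below groups the links in such a file by section. It assumes the commonly proposed layout (an H1 title, a blockquote summary, then H2 sections containing Markdown link lists) and is not a description of DocBreach's own parser; the sample content is invented.

```python
import re

# Matches list items of the form "- [Title](https://...)".
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)")


def parse_llms_txt(text: str) -> dict:
    """Group the (title, url) links in an llms.txt file by their H2 section."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append((match.group(1), match.group(2)))
    return sections


sample = """# Example Project

> Short summary of the project.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Setup steps
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

links = parse_llms_txt(sample)
print(links["Docs"][0])
```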

**Do I need any third-party API keys or paid scraping subscriptions?**
Absolutely not. DocBreach is designed to operate completely independently. It accesses public documentation directly via optimized HTTP clients using smart resolution strategies, eliminating the need for proxy services, SaaS subscriptions, or scraping API keys.

**How does this MCP server improve my AI Agent's performance?**
AI agents often hallucinate when APIs are updated or when they encounter undocumented endpoints. With DocBreach, your agent can autonomously search, map, and read the current, authoritative documentation before writing a single line of code, so integrations are grounded in the published source rather than in guesswork.

**Can it extract specific endpoints from massive OpenAPI specifications?**
Yes! The `docs_extract` tool is built precisely for this. Instead of loading a 5MB JSON spec into your agent's context window (which wastes tokens and ruins attention), DocBreach parses the spec server-side and allows your agent to query exactly the endpoints and methods it needs.
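
The kind of filtering described here can be sketched in a few lines. This is an assumption about the general approach, not DocBreach's server-side code: it walks the standard OpenAPI `paths` object and keeps only operations carrying a given tag. The tiny `spec` below is an invented example.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoints_for_tag(spec: dict, tag: str) -> list:
    """List the operations in an OpenAPI document that carry the given tag."""
    results = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            # Path items also hold non-operation keys such as "parameters".
            if method not in HTTP_METHODS:
                continue
            if tag in operation.get("tags", []):
                results.append(
                    {
                        "method": method.upper(),
                        "path": path,
                        "summary": operation.get("summary", ""),
                    }
                )
    return results


spec = {
    "openapi": "3.0.3",
    "paths": {
        "/user": {
            "get": {"tags": ["users"], "summary": "Get the authenticated user"},
            "patch": {"tags": ["users"], "summary": "Update the authenticated user"},
        },
        "/repos": {"get": {"tags": ["repos"], "summary": "List repositories"}},
    },
}

for endpoint in endpoints_for_tag(spec, "users"):
    print(endpoint["method"], endpoint["path"])
```

Returning only the method, path, and summary keeps the result small enough to hand to an agent, which can then ask for the full parameters of a single operation.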