# Diffbot MCP for AI Agents

> Diffbot lets your AI agent automatically extract structured data from any website. It processes complex web pages—whether they're news articles, e-commerce product listings, or forum discussions—and converts the messy content into clean JSON. You just point it at a URL, and your agent handles everything else.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** web-scraping, data-extraction, computer-vision, structured-data, e-commerce-intelligence, sentiment-analysis, content-parsing

## Description

Diffbot gives your AI client direct access to structured web data extraction. Instead of writing complex scrapers or manually copying key details from dozens of sites, you ask your agent for what you need, and Diffbot retrieves it. The system first classifies the page type (a product, an article, a list of search results) and then extracts the relevant data automatically.

Whether you're tracking competitor pricing across multiple e-commerce sites or pulling clean, readable text from academic journals, your agent handles the dirty work. You can also analyze forum threads to gauge public sentiment, or track job market trends by gathering structured details like salary ranges and employer names. Because this MCP is available on Vinkius, you connect once from Claude, Cursor, or any compatible client and get a toolkit for turning raw web pages into actionable data.

## Tools

### analyze_page
Automatically classifies any web page and extracts structured data like articles, products, or events in a single pass.

### extract_article
Extracts clean content from news sites, identifying the title, author, date, and body text of an article.

### extract_custom_api
Allows you to pull data using specific extraction rules that you define in your own Diffbot dashboard setup.

### extract_discussion
Gathers comments and reviews from forum threads, allowing analysis of user-generated conversations.

### extract_event
Pulls schedules and details for events, giving you organized information about dates and locations.

### extract_image
Retrieves the main images from a page so you can build galleries or identify key visuals.

### extract_job
Extracts specific job details, including titles, employer names, and salary ranges, from career pages.

### extract_list
Identifies bounded search results or directory listings on a page to extract arrays of links and item titles.

### extract_product
Extracts comprehensive e-commerce data points like pricing, brand names, SKUs, and product descriptions.

### extract_video
Gathers video metadata and content details from a webpage so you can track media assets.

## Prompt Examples

**Prompt:** 
```
What are the core specs and pricing for this phone model: https://techsite.com/iphone16?
```

**Response:** 
```
**Product Details Report**

*   **Name:** iPhone 16 Pro Max
*   **Brand:** Apple
*   **SKU:** IPH-16PM-256
*   **Price:** $1,199.00
*   **Availability:** In stock (limited).

**Specifications:**
*   Display: 6.7 inches, ProMotion
*   Storage options: 128GB to 1TB
*   Camera: 48MP Main Lens

Need help comparing this against last year's model?
```

**Prompt:** 
```
Find the main topics and sentiment from these customer reviews: [Review List URL]
```

**Response:** 
```
**Customer Feedback Analysis**

Based on 450 reviews, here are the top three themes:

1.  **Battery Life (Positive):** Most users report a significant improvement in longevity. **(Sentiment Score: +0.8)**
2.  **Camera Quality (Neutral/Negative):** While sharp, many complained about low-light performance compared to previous models. **(Sentiment Score: -0.3)**
3.  **User Interface (Positive):** The new UI is simple and intuitive.

I pulled this data using the discussion extraction tool.
```

**Prompt:** 
```
Can you summarize the key takeaways from this industry report article? [Article URL]
```

**Response:** 
```
**Key Takeaways: Global AI Adoption (2024)**

*   **Title:** The Future of Enterprise AI
*   **Author:** Dr. Evelyn Reed
*   **Date Published:** June 1, 2024

The report highlights three major shifts:

✅ **Decentralization:** Companies are moving away from single vendor solutions toward hybrid models.
✅ **Edge Computing:** Processing power is shifting closer to the user for real-time data handling.
✅ **Regulatory Focus:** New global standards require greater transparency in AI model training. 

Would you like me to generate a bulleted summary of the executive abstract?
```

## Capabilities

### Classify Page Content
Automatically determines if a webpage is an article, product, list, image gallery, or job posting.
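
One way to picture the classify-then-extract flow is a lookup from the detected page type to the matching tool in this listing. The type labels (`"article"`, `"product"`, and so on) and the `pick_tool` helper are illustrative assumptions, not Diffbot's documented output; only the tool names come from the Tools section.

```python
# Hypothetical page-type labels mapped to the tool names listed in this document.
TOOL_FOR_PAGE_TYPE = {
    "article": "extract_article",
    "product": "extract_product",
    "discussion": "extract_discussion",
    "event": "extract_event",
    "image": "extract_image",
    "job": "extract_job",
    "list": "extract_list",
    "video": "extract_video",
}


def pick_tool(page_type: str) -> str:
    """Return the specialised tool for a page type, falling back to analyze_page."""
    return TOOL_FOR_PAGE_TYPE.get(page_type.lower(), "analyze_page")


print(pick_tool("Product"))  # extract_product
print(pick_tool("recipe"))   # analyze_page (no dedicated tool, so use the general one)
```

In practice the agent makes this choice for you; the table only shows why a single classification step is enough to route a URL to the right extractor.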

### Extract Article Text
Pulls clean text and HTML from news or blog posts while identifying the author and publication date.

### Capture E-commerce Details
Retrieves structured product information, including SKUs, specific pricing, brand names, and technical specifications.

### Analyze Discussions & Reviews
Gathers content from forum threads or reviews, allowing you to analyze the overall sentiment of user feedback.

### Scrape Search Results or Directories
Identifies structured lists on a page, pulling out arrays of titles and direct links for batch processing.

## Use Cases

### Competitive Pricing Monitoring
A market researcher needs to track how three competitors change their pricing on key products each week. Instead of visiting each site and logging prices by hand, the agent uses Diffbot’s API to gather structured product details from every URL and returns a clean JSON report of price changes.
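
The weekly comparison in this use case could be sketched as a diff between two snapshots of extracted product records. The record fields `sku` and `price` and the helper `price_changes` are hypothetical stand-ins for whatever the extraction actually returns.

```python
def price_changes(previous: list[dict], current: list[dict]) -> list[dict]:
    """Compare two snapshots of product records and report SKUs whose price moved."""
    old_prices = {item["sku"]: item["price"] for item in previous}
    changes = []
    for item in current:
        old = old_prices.get(item["sku"])
        # Skip products that are new this week or whose price is unchanged.
        if old is not None and old != item["price"]:
            changes.append({
                "sku": item["sku"],
                "old_price": old,
                "new_price": item["price"],
                "delta": round(item["price"] - old, 2),
            })
    return changes


# Made-up sample data standing in for two weekly extraction runs.
last_week = [{"sku": "A-100", "price": 49.99}, {"sku": "B-200", "price": 19.00}]
this_week = [{"sku": "A-100", "price": 44.99}, {"sku": "B-200", "price": 19.00}]
print(price_changes(last_week, this_week))
```

Keying the comparison on SKU rather than product name avoids false changes when a retailer rewords a listing title.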

### Curating News Aggregators
A content marketer needs to build a daily summary of industry news. The agent runs the `extract_article` tool on top search results to pull only the clean text and author information, eliminating boilerplate site clutter.

### Building Job Market Reports
An HR analyst wants to see salary trends for software engineers in a specific city. The agent uses Diffbot’s job extraction tool across multiple recruitment sites, providing a consolidated list of explicit salary ranges and employer names.

### Analyzing Customer Feedback
A product manager wants to understand why customers are leaving 1-star reviews. The agent runs the `extract_discussion` tool on review pages, so the product manager can analyze thousands of comments for common themes and sentiment.
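
As a rough illustration of the theme analysis step, the sketch below tallies how many extracted comments mention each theme. It is a deliberately simple keyword count, not real sentiment scoring; the `text` field, the keyword lists, and `count_themes` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical themes and the keywords that signal them.
THEME_KEYWORDS = {
    "battery": ["battery", "charge"],
    "camera": ["camera", "photo", "low light"],
}


def count_themes(comments: list[dict], theme_keywords: dict) -> Counter:
    """Count how many comments mention at least one keyword for each theme."""
    counts = Counter()
    for comment in comments:
        text = comment["text"].lower()
        for theme, keywords in theme_keywords.items():
            if any(keyword in text for keyword in keywords):
                counts[theme] += 1
    return counts


# Made-up comments standing in for extracted review text.
comments = [
    {"text": "Battery life is great"},
    {"text": "The camera struggles in low light"},
    {"text": "Battery drains while taking photos"},
]
print(count_themes(comments, THEME_KEYWORDS))
```

A comment can count toward several themes, which matches how reviews tend to mix complaints; an agent would typically layer its own sentiment judgment on top of a tally like this.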

## Benefits

- Get precise e-commerce data, including SKU numbers and brand names. The `extract_product` tool pulls the critical product details from a page in one call.
- Stop guessing what a page is. Use the general classification tool (`analyze_page`) to instantly determine if you're looking at an article, list, or job posting before running any extraction.
- Analyze public sentiment without reading thousands of comments. The `extract_discussion` tool pulls forum threads and prepares them for automated sentiment scoring.
- Monitor market trends by gathering standardized data. You can use the `extract_job` tool to pull salary ranges and employer names from career sites across different industries.
- Process content efficiently with `extract_article`. This gives you clean, readable text bodies separated from boilerplate site navigation or ads.

## How It Works

In short, your AI client turns raw URLs into reliable, usable data structures without you needing to write any scraping code.

1. Subscribe to this MCP and enter your Diffbot Developer Token into your AI client.
2. Tell your agent the URL you want data from, along with what specific information you need (e.g., 'What is the price and SKU for this product?').
3. Your agent invokes the appropriate tool, and Diffbot returns a clean JSON object containing only the structured data.
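
Step 3 can be pictured at the protocol level. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request, and the sketch below builds one for `extract_product`. The `url` argument name is an assumption (this listing does not document tool parameters) and `build_tool_call` is a hypothetical helper; normally your AI client constructs this message for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0) as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: the tool accepts the target page under a "url" argument.
request = build_tool_call("extract_product", {"url": "https://techsite.com/iphone16"})
print(json.dumps(request, indent=2))
```

The Diffbot Developer Token from step 1 is not part of this message; it is supplied when the connection is configured.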

## Frequently Asked Questions

**How does Diffbot MCP for AI Agents help with web scraping when I don't know the HTML structure?**
You don't need to know the page's HTML. The MCP classifies each piece of content (a price, an article title, a user comment) and returns structured data automatically.

**Can I use Diffbot MCP for AI Agents to track competitor pricing across multiple product pages?**
Yes. You can feed the agent a list of URLs and ask it to pull standardized fields like SKU, price, and brand from every page into one report.

**Is Diffbot MCP for AI Agents better than just using my AI client's native web browsing feature?**
For data extraction, yes. Native browsing gives you raw text; this MCP gives you machine-readable, structured JSON. That lets your agent use the data more reliably in subsequent steps, since it does not have to parse fields out of free text.

**What kind of websites can Diffbot MCP for AI Agents handle? Is it limited to news sites?**
It handles almost anything: e-commerce, job boards, academic articles, forum discussions, and even specialized directories. The tool adapts to the page type.

**I want to analyze customer reviews; what specific data can Diffbot MCP for AI Agents extract?**
It pulls out individual comments from discussion threads, allowing your agent to run automated sentiment scoring and group common feedback themes across thousands of entries.