Diffbot MCP for AI. Turn messy web pages into clean, structured data.
Works with every AI agent you already use
…and any MCP-compatible client
How this MCP server connects to your AI agent
Diffbot takes unstructured web content and turns it into usable data. Connect your AI client to extract metadata from articles, product pages, or forum threads with simple instructions.
It also lets you query a massive knowledge graph to find specific company details and people profiles using natural language.
What AI agents can do with Diffbot Automation
Analyze page type
Automatically detects if a given web page is an article, product listing, or something else entirely.
Enhance company profile
Adds professional details and background information to a company using its name or domain.
Enhance person profile
Enriches personal profiles by adding professional history, social links, and contact data.
Pull specific information—like article content or product details—from any given web address.
Add professional details, funding history, and company background to existing names or domains.
Search a massive database of billions of entities to find specific market signals or industry data points.
Automatically detect if a URL is an article, product page, or forum thread before extracting the data.
What AI agents can do with Diffbot: 11 Tools for Data Extraction
These tools let you programmatically analyze websites, extract specific content types, and query vast databases using natural language commands.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Diffbot on Vinkius
Analyze Page Type
Automatically detects if a given web page is an article, product listing, or something else entirely.
Enhance Company Profile
Adds professional details and background information to a company using its name or...
Enhance Person Profile
Enriches personal profiles by adding professional history, social links, and contact...
Extract Article Data
Pulls the clean text, author, and metadata from news or blog posts.
Extract Forum Thread
Gathers key comments and content from online discussion threads or message boards.
Extract Images
Identifies the main visual assets and primary images on a webpage.
Extract Product Data
Extracts specific details like SKU, price, and descriptions from e-commerce product pages.
Extract Video Metadata
Identifies embedded videos on a page and retrieves their associated metadata.
Search Knowledge Graph
Queries the massive world knowledge graph to find specific organizational or...
Verify Api Credentials
Checks that your Diffbot API credentials are valid.
List Active Crawls
Provides a list and operational status check for any running data crawling jobs.
Security and governance baked right in.
Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing for you to configure.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Diffbot, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Diffbot. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Built on the Model Context Protocol (MCP) for Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 11 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
The Headache of Manual Data Collection, Solved with Vinkius AI Gateway
Think about spending a Tuesday afternoon gathering competitive data. You open Google, find five competitor pages, and start clicking through. On one site you copy the product price; on another you have to manually navigate to the 'About Us' section for the founding date, and then remember to scroll down to grab employee count details. It's slow, it requires jumping between tabs, and every single manual step is a chance to miss something or misformat data.
With this MCP connected to your agent, you stop clicking through pages and start asking questions. You give the URL and say, 'Give me the product price AND the founding year.' Your AI client runs the necessary extraction tools in the background, delivering one single block of clean, structured JSON that's ready for your database.
Structured Data Extraction with Diffbot
Manual scraping involves opening a link, finding the right element (the price, the date), and then painstakingly copying it into a spreadsheet. If you're dealing with forums, you have to read through dozens of comments just to pull out the main points.
Now, using `extract_article_data` or `extract_forum_thread`, your agent handles all that messy work. You receive clean data structures automatically. The process isn't about clicking; it’s about conversing.
What your AI can actually do with this
You don't have to build complex scrapers or spend hours copy-pasting data into spreadsheets anymore. This MCP acts as your dedicated research analyst, letting your AI agent pull structured information directly from any web page URL. You can ask it to find the main text of a news article, gather all product specs from an e-commerce listing, or summarize key points from a discussion board.
If you need more than just raw text, you can use its massive database to look up company details or employee information using simple queries. It’s like having a data engineer ready for conversation. Vinkius hosts this connector so your AI client gets access to these powerful data tools alongside everything else you're building.
It's about going from 'here' (a messy webpage) to 'there' (clean, structured data points) with just a prompt.
Here's how it actually works
The bottom line is, you use your AI client to talk to this MCP, and it handles all the data extraction and structuring behind the scenes.
Subscribe to this MCP and retrieve your API token from the Diffbot dashboard.
Your AI client connects using that token. You then instruct your agent with a URL or query, telling it exactly what kind of data you need extracted.
The tool processes the request, returning clean, structured results—whether that's a list of company bios or a specific product SKU.
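Under the hood, MCP clients send tool requests as JSON-RPC 2.0 messages with the `tools/call` method. Your AI client normally builds this message for you, so the following is only a minimal sketch of what a request for extract_article_data could look like. The `url` argument name and the example.com address are assumptions for illustration, not confirmed details of this connector.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "url" is an assumed argument name; example.com is a placeholder address.
request = build_tool_call(
    1, "extract_article_data", {"url": "https://example.com/some-article"}
)
print(json.dumps(request, indent=2))
```

The structured result comes back in the matching JSON-RPC response, which your agent then reads and summarizes for you.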
Who is this actually for?
Anyone who spends time manually collecting information from the web—market researchers, growth strategists, or specialized data engineers. If you're tired of copy-pasting 50 links into a spreadsheet and hoping nothing gets missed, this MCP is for you.
Uses the Knowledge Graph to search for industry signals or company data without leaving their main workspace.
Instantly extracts structured metadata from thousands of URLs using natural language commands instead of writing complex scraping code.
Automates lead enrichment and competitive analysis by enhancing company profiles through simple AI queries.
What Changes When You Connect
Stop manual copy-pasting. With extract_article_data, you simply point your agent at a URL and get the core text and author details instantly, without cleanup.
Go beyond basic scraping. Use search_knowledge_graph to query specific industry signals or firmographics from billions of world entities in one step.
Boost your lead database quality. Running enhance_company_profile on a domain gives you structured data like employee count and funding metadata, not just a name.
Handle diverse content types. Your agent first runs analyze_page_type, so it knows whether to apply the model for an e-commerce listing (extract_product_data) or for a discussion forum (extract_forum_thread).
Maintain operational visibility. You can monitor running jobs with list_active_crawls and confirm your API access with the Verify Api Credentials tool.
Contextualize your data gathering. Need to know if the page is even an article? Running analyze_page_type confirms the source type before you waste time trying to extract content.
See it in action
Competitive Intelligence Gathering
A growth marketer needs to compare product features across five rival websites. Instead of visiting each site and manually gathering specs, they instruct their agent to use extract_product_data on all five URLs in a batch, getting clean data points for direct comparison.
Deep Market Sizing
A market researcher needs to find all companies in the 'AI' sector located in London with over 50 employees. They use search_knowledge_graph and define the parameters, receiving a filtered list of detailed firmographics instantly.
Content Aggregation
A content curator wants to build an internal summary of industry news. They feed their agent 10 links and ask it to use extract_article_data on each, getting ten clean summaries with authors attached for immediate review.
Contact List Cleanup
A sales team member has a list of old client names. They feed the agent each name and domain and ask it to run enhance_person_profile to verify current job titles, social links, and company affiliation.
The honest tradeoffs
Treating all web content as articles
Asking the agent to run article extraction on an e-commerce page will pull out random text blocks that aren't actual product descriptions. You lose data integrity.
First, use analyze_page_type to verify it’s a product page. Then, call extract_product_data. This ensures the right extraction model runs on the correct type of content.
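The detect-then-extract order described above can be sketched as a small routing table. This is an illustration only: the page-type labels ("article", "product", "discussion") are assumed values for what analyze_page_type might report, and the tool names are the ones used on this page.

```python
from typing import Optional

# Maps a detected page type to the extraction tool named in this document.
# The page-type labels are assumptions, not confirmed output values.
EXTRACTOR_BY_PAGE_TYPE = {
    "article": "extract_article_data",
    "product": "extract_product_data",
    "discussion": "extract_forum_thread",
}


def pick_extractor(page_type: str) -> Optional[str]:
    """Return the extraction tool for a page type, or None if unsupported."""
    return EXTRACTOR_BY_PAGE_TYPE.get(page_type.strip().lower())


for detected in ("product", "Article", "homepage"):
    tool = pick_extractor(detected)
    print(detected, "->", tool or "no matching extractor, skip this URL")
```

Returning None for unknown types lets the agent skip a URL instead of forcing the wrong extraction model onto it.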
Searching without filters
Running a general query against the knowledge graph yields millions of irrelevant results, and you have no way to filter by industry or geography.
Always specify parameters when using search_knowledge_graph. Use DQL syntax to target specific industries (e.g., 'AI') and geographic areas.
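As a rough sketch of what "specifying parameters" means in practice, the helper below assembles a filtered query string in the field:"value" form. The helper itself (build_dql) is hypothetical and only handles simple quoted filters; the field name used in the example, industries, is the one that appears in the query type:Organization industries:"AI".

```python
def build_dql(entity_type: str, **filters: str) -> str:
    """Join a type clause and quoted field filters into one query string."""
    clauses = [f"type:{entity_type}"]
    for field, value in filters.items():
        # Quote escaping is not covered in this sketch, so reject such values.
        if '"' in value:
            raise ValueError(f"unsupported character in value for {field!r}")
        clauses.append(f'{field}:"{value}"')
    return " ".join(clauses)


query = build_dql("Organization", industries="AI")
print(query)  # type:Organization industries:"AI"
```

You would pass the resulting string to search_knowledge_graph; adding more keyword arguments narrows the search further, provided the field names exist in the Knowledge Graph schema.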
Forgetting the context of the source
If you just copy text from a forum, it's hard to tell who said what or if the comment is relevant years later.
Use extract_forum_thread for structured context. It captures comments and helps isolate key conversations from the noise.
When It Fits, When It Doesn't
Use this MCP if your problem involves extracting structured data (like product IDs, author names, or specific company metrics) from unstructured web sources like articles, e-commerce sites, or forums. It's ideal when you need to build a database of knowledge.
Don't use this if all you need is pure text processing on data already in your system (e.g., summarizing internal documents). For those tasks, a generic LLM prompt will suffice. Also, don't expect it to 'fix' bad data; it only extracts what exists. If you just need basic web scraping without advanced structure detection or knowledge graph lookup, other simple HTTP request tools might work, but this MCP gives you the intelligence layer on top of that.
Questions you might have
How do I use Diffbot MCP to extract product data? +
You tell your agent to use extract_product_data on the URL. It will pull out structured details like SKUs, prices, and specifications found on e-commerce listing pages.
Can I find company information using search_knowledge_graph? +
Yes, you can use search_knowledge_graph to query its massive database. You just need to define the parameters like industry or location, and it returns structured results.
What is the difference between extract_article_data and extract_forum_thread? +
They are for different types of content. Use extract_article_data for clean blog posts or news articles, and use extract_forum_thread when you need to pull key points from a discussion board.
Does Diffbot MCP handle data enrichment? +
Yes. You can run enhance_company_profile or enhance_person_profile with just a name or domain, and the tool adds professional background details to that entity.
How do I know what kind of page I'm looking at? +
Before extracting anything, you can run analyze_page_type. This tells your agent if the URL is an article, a product listing, or something else so it uses the right extraction method.
How do I find my Diffbot API Token? +
Log in to your Diffbot account and navigate to the Dashboard or Manage Tokens section to copy your unique access token.
What is DQL and how can I use it? +
DQL (Diffbot Query Language) allows you to filter the Knowledge Graph. Use the search_knowledge_graph tool with queries like type:Organization industries:"AI".
Can I extract comments from articles? +
Yes! The extract_article_data tool has an optional discussion parameter. Set it to true to retrieve structured comment threads if available.
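A minimal sketch of how the arguments for such a call might be assembled. The discussion flag is the optional parameter described above; the `url` argument name and the example.com address are assumptions for illustration, and in normal use your agent fills these in from your prompt.

```python
def article_arguments(url: str, include_comments: bool = False) -> dict:
    """Build arguments for an extract_article_data call.

    `discussion` is the optional flag mentioned in the answer above;
    `url` is an assumed argument name.
    """
    arguments: dict = {"url": url}
    if include_comments:
        # Only send the flag when comments are wanted, keeping the default lean.
        arguments["discussion"] = True
    return arguments


print(article_arguments("https://example.com/post"))
print(article_arguments("https://example.com/post", include_comments=True))
```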
We've already built the connector for Diffbot. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 11 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.