# Exa AI MCP

> Exa AI gives your agent deep web research capabilities. Instead of matching keywords alone, it interprets the meaning of your query and finds relevant pages across the web. You can programmatically extract clean text from articles, find pages similar to a source URL, or crawl specific domains to build comprehensive datasets.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** neural-search, semantic-search, content-intelligence, web-research, embeddings, data-extraction

## Description

Need to do deep research without manually clicking through dozens of search results? This MCP connects your agent directly to Exa AI's neural search engine. Instead of relying on keyword matches, it interprets the intent behind your query and returns relevant pages even when they use different wording. You can also extract clean text or highlights from any URL, with ads and boilerplate stripped out automatically. If you're building a retrieval-augmented generation (RAG) pipeline, this gives you clean, structured page content to work with. Connect it through Vinkius to give your agent web research skills without writing scraping scripts.

## Tools

### advanced_custom_search
Runs complex search queries that combine multiple criteria into a single, precise search request.

### find_similar_pages
Identifies multiple web pages that cover topics or concepts closely related to a provided source URL.

### get_api_status
Checks the current connection status of your Exa AI account.

### extract_page_content
Pulls the main body text from any specified web page URL, discarding ads and navigation elements.

### get_query_highlights
Extracts short, relevant snippets or highlights from a search result set based on the original query.

### keyword_search
Performs basic searches using exact keywords provided by the user.

### perform_live_crawl
Fetches a fresh copy of a given site or set of links, rather than relying on previously crawled content, to get the most current version.

### search_with_contents
Searches and simultaneously extracts text, allowing you to find information and pull it out in a single step.

### search_by_category
Restricts searches so results only come from specific site types, like company websites or news blogs.

### search_by_crawl_date
Limits search results to pages that were crawled (discovered) within a specified date range.

### search_specific_domains
Restricts the search to content from one or more named domains.

### semantic_search
Uses AI understanding to perform highly contextual searches based on the meaning of your request.
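Every tool above is invoked the same way: the MCP client sends a JSON-RPC 2.0 `tools/call` request naming the tool and passing its arguments. The sketch below builds such a request and rejects names that are not in this list. The argument names `query` and `num_results` are hypothetical placeholders, because this listing does not publish the tools' input schemas; a client can read the real ones from the server's `tools/list` response.

```python
import json

# The twelve tool names listed in this section.
EXA_TOOLS = frozenset({
    "advanced_custom_search", "find_similar_pages", "get_api_status",
    "extract_page_content", "get_query_highlights", "keyword_search",
    "perform_live_crawl", "search_with_contents", "search_by_category",
    "search_by_crawl_date", "search_specific_domains", "semantic_search",
})

def build_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP `tools/call` request (JSON-RPC 2.0) for one of the tools above."""
    if tool_name not in EXA_TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# "query" and "num_results" are hypothetical argument names, not a published schema.
request = build_tool_call(1, "semantic_search", {
    "query": "latest research on room-temperature superconductors",
    "num_results": 5,
})
print(request)
```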

## Prompt Examples

**Prompt:** 
```
Search for the latest research on room-temperature superconductors using Exa.
```

**Response:** 
```
Running semantic search... I've found 5 high-quality research papers and articles from the last 3 months. Notable results include a study from Nature and a detailed technical blog from MIT. Would you like the highlights for any of these?
```

**Prompt:** 
```
Find pages similar to 'https://openai.com/blog/sora/' from news domains.
```

**Response:** 
```
Discovery complete! I've located 3 semantically similar articles on news platforms like TechCrunch and The Verge. They discuss generative video trends and competitor reactions. Shall I extract the clean text from these links?
```

**Prompt:** 
```
Get the clean text content from 'https://en.wikipedia.org/wiki/Artificial_intelligence'.
```

**Response:** 
```
Extraction complete! I've retrieved the clean Markdown content for the AI Wikipedia page. It includes the history, definitions, and main research areas, excluding all sidebars and ads. Would you like a summary?
```

## Capabilities

### Deep Semantic Search
Perform searches that understand the meaning of a query, delivering relevant results even if they don't contain the exact keywords you used.

### Content Extraction and Highlighting
Pull clean text or specific highlights directly from any web page URL, stripping out noise like ads and navigation elements automatically.

### Domain Mapping and Crawling
Discover all reachable links on a site and perform targeted crawls to map the structure of an entire domain.

### Finding Related Information
Locate web pages that are semantically similar to a given source URL, helping you build out research datasets.

### Advanced Query Filtering
Run searches restricted by specific criteria like the site type (e.g., news vs blog), domain name, or discovery date.
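To make "combining criteria" concrete, here is a minimal sketch that merges optional filters into one argument object and leaves out any filter that isn't set. The keys `category`, `include_domains`, and `start_crawl_date` are hypothetical names chosen for readability, not a documented schema for this MCP.

```python
from datetime import date

def build_filtered_search_args(query, category=None, domains=None, crawled_after=None):
    """Combine optional filters into a single argument object, omitting unset ones.

    Every key name here is hypothetical; check the server's tools/list schema
    for the names it really expects.
    """
    args = {"query": query}
    if category is not None:
        args["category"] = category  # site type, e.g. news vs blog
    if domains:
        args["include_domains"] = sorted(set(domains))  # deduplicated, stable order
    if crawled_after is not None:
        args["start_crawl_date"] = crawled_after.isoformat()  # ISO 8601, YYYY-MM-DD
    return args

print(build_filtered_search_args(
    "generative video model announcements",
    category="news",
    domains=["news.example.com", "tech.example.org"],
    crawled_after=date(2024, 1, 1),
))
```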

## Use Cases

### Comparing Competitor Messaging
A competitor analysis firm needs to know what tech publications are writing about a new product. Instead of running 10 separate searches, they ask their agent to run `find_similar_pages` on the original press release URL and then narrow the coverage to news sites with `search_by_category`. The result is a focused set of semantically related news articles.
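If the similar-pages result set still contains non-news sites, the agent can also narrow it locally by hostname. This sketch assumes each result is a dict with a `url` key (a hypothetical shape) and keeps results from an allowed domain or any of its subdomains. The URLs are placeholders.

```python
from urllib.parse import urlparse

def keep_domains(results, allowed_domains):
    """Keep results whose hostname equals an allowed domain or is a subdomain of one."""
    allowed = {d.lower() for d in allowed_domains}
    kept = []
    for item in results:
        host = urlparse(item["url"]).hostname or ""  # .hostname is already lowercased
        if any(host == d or host.endswith("." + d) for d in allowed):
            kept.append(item)
    return kept

# Placeholder results; the {"url": ...} shape is an assumption about the tool's output.
similar = [
    {"url": "https://news.example.com/a"},
    {"url": "https://www.news.example.com/b"},
    {"url": "https://blog.example.org/c"},
]
print([r["url"] for r in keep_domains(similar, ["news.example.com"])])
```

Matching on `"." + domain` rather than a plain suffix keeps a host such as `fakenews.example.com` from slipping through a `news.example.com` filter.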

### Building a Knowledge Base on a Legacy Standard
A developer wants to build a knowledge base on an older industry standard. They use `perform_live_crawl` to pull a fresh copy of the primary documentation site, then run `search_with_contents` alongside `search_by_crawl_date` to collect only pages crawled within a recent date range, so the knowledge base reflects the most current information.
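A crawl-date filter needs concrete start and end dates. The sketch below computes a rolling window as ISO 8601 strings; the key names `start_crawl_date` and `end_crawl_date` are hypothetical, and the 90-day window is an arbitrary example.

```python
from datetime import date, timedelta

def crawl_date_args(query, days_back, today=None):
    """Build arguments restricting a search to pages crawled in the last `days_back` days.

    The key names are hypothetical; dates are ISO 8601 (YYYY-MM-DD) strings.
    """
    end = today or date.today()
    start = end - timedelta(days=days_back)
    return {
        "query": query,
        "start_crawl_date": start.isoformat(),
        "end_crawl_date": end.isoformat(),
    }

# Pinning `today` makes the output reproducible.
print(crawl_date_args("industry standard specification", 90, today=date(2024, 6, 30)))
```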

### Synthesizing a Technical Overview
A technical writer gets an article link and needs the core concepts for a blog post. They feed it into `extract_page_content` to get clean text, then use `get_query_highlights` with their query ('key takeaways') to pull out only the most important sentences.

### Deep Dive on Specific Industry Topics
A financial analyst needs data only from major banking sites. They use `search_specific_domains` and then execute a highly focused query using `advanced_custom_search`, ensuring the results are limited to known, trusted sources.
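Domain lists often arrive as a mix of full URLs and bare hostnames. This minimal sketch normalizes them into lowercase hostnames before they are passed to `search_specific_domains`; whether the tool expects bare hostnames, and whether it matches subdomains, are assumptions to check against its schema. The `.example` domains are placeholders for the banks' real sites.

```python
from urllib.parse import urlparse

def normalize_domain(entry):
    """Reduce a URL or bare hostname to a lowercase hostname without a leading 'www.'."""
    text = entry.strip()
    if "://" not in text:
        text = "https://" + text  # urlparse only fills in .hostname when a scheme is present
    host = urlparse(text).hostname or ""  # .hostname is already lowercased
    return host[4:] if host.startswith("www.") else host

def domain_list(entries):
    """Normalize entries and drop duplicates and blanks, keeping first-seen order."""
    result = []
    for entry in entries:
        domain = normalize_domain(entry)
        if domain and domain not in result:
            result.append(domain)
    return result

print(domain_list(["https://www.Bank-A.example/investors", "bank-a.example", "ir.bank-b.example"]))
```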

## Benefits

- You don't have to guess the right keywords. With `semantic_search`, your agent understands the *intent* of your question and returns relevant results even when they use different terminology.
- Stop copy-pasting article text into notes. Use `extract_page_content` to grab clean Markdown content from any URL in one go, leaving out ads and sidebars every time.
- Building a research database used to mean crawling dozens of sites. Now, you can map an entire domain using `perform_live_crawl` and gather all reachable links programmatically.
- Want to see who else is covering a story? The `find_similar_pages` tool locates content that discusses the same topic as a source URL, which makes it well suited to competitor analysis.
- You control your search scope. Use `search_by_category` or `search_specific_domains` to limit results to sources you trust, such as major news outlets or specific industry blogs.

## How It Works

In short, you tell your agent what to research, and it handles the web retrieval needed to return clean, focused data.

1. Subscribe to this MCP and copy your API key from the Exa AI dashboard.
2. Pass that key to your agent through any MCP-compatible client (Claude, Cursor, etc.).
3. Describe your research task in natural language; the agent invokes the appropriate tool, such as `semantic_search` or `extract_page_content`.
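Under the hood, step 3 follows the Model Context Protocol's message sequence: an `initialize` handshake, a `notifications/initialized` notification, `tools/list` to discover the tools, and then `tools/call`. Your MCP client sends these for you; the sketch only makes the order visible. The `url` argument name and the client name are hypothetical, `2024-11-05` is one published protocol revision (clients and servers negotiate it), and how the API key is passed depends on the client and on Vinkius, so it is not shown.

```python
import json

def session_messages(tool_name, arguments):
    """JSON-RPC messages an MCP client sends, in order, before and during a tool call."""
    return [
        # 1. Handshake: announce the protocol revision and client identity.
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                    "clientInfo": {"name": "example-client", "version": "0.1.0"}}},
        # 2. Notification (no "id", so no response expected): the client is ready.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. Discover the server's tools and their input schemas.
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        # 4. Invoke the tool the agent picked for the prompt.
        {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
         "params": {"name": tool_name, "arguments": arguments}},
    ]

for message in session_messages(
    "extract_page_content",
    {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"},  # hypothetical key
):
    print(json.dumps(message))
```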

## Frequently Asked Questions

**How does `semantic_search` differ from `keyword_search`?**
It understands meaning, not just words. If you search for 'best way to save money,' `semantic_search` will pull up articles about budgeting and financial planning, even if they don't use the exact phrase 'save money.'

**What does `find_similar_pages` return?**
Given one URL, it returns other web pages that discuss the same topic or concept, which is useful for competitive research.

**Do I need to use `perform_live_crawl` every time I search?**
No. Use `perform_live_crawl` when you need the freshest version of a site. Otherwise, the other tools search and extract from previously crawled (cached) content.

**What is the best way to get clean text? Should I use `extract_page_content` or `search_with_contents`?**
Use `extract_page_content` when you already have the URL (or a list of URLs) and want the full, cleaned article body. Use `search_with_contents` when you want to search across many pages and extract their text in the same query.

**How do I verify my connection status using `get_api_status`?**
It confirms whether your API credentials are valid and active. The check runs without performing a full search, so you can catch rate-limit issues or incorrect keys before starting complex data workflows.
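What `get_api_status` returns is not documented in this listing, but every MCP tool result shares one shape: a list of `content` blocks plus an optional `isError` flag. A client-side pre-flight check can rely on just that. The sample messages below are invented for illustration.

```python
def tool_result_text(result):
    """Return the text of an MCP tool result, raising if the server flagged an error."""
    texts = [block["text"] for block in result.get("content", []) if block.get("type") == "text"]
    if result.get("isError", False):
        raise RuntimeError("tool reported an error: " + " ".join(texts))
    return "\n".join(texts)

# Sample results in the generic MCP result shape; the message text is invented.
ok = {"content": [{"type": "text", "text": "API key valid"}], "isError": False}
bad = {"content": [{"type": "text", "text": "401 Unauthorized"}], "isError": True}

print(tool_result_text(ok))
try:
    tool_result_text(bad)
except RuntimeError as err:
    print("pre-flight check failed:", err)
```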

**When should I use `get_query_highlights` instead of retrieving full page content?**
Use it when you only need quick snippets of the most relevant text. This tool extracts key passages directly related to your search intent, saving processing time and giving you immediate answers without having to filter through boilerplate or lengthy sections.

**How can I restrict my research scope using `search_specific_domains`?**
This limits all results to a single domain or group of sites. If you're researching competitors, for example, this tool keeps your search focused only on the target company's website, ignoring external noise.

**What is the advantage of running an `advanced_custom_search`?**
It allows you to build highly structured, complex queries that combine multiple constraints. This goes beyond simple keywords or semantic meaning by letting your agent target specific data points across different criteria.

**How do I find my Exa AI API Key?**
Log in to your [**Exa AI dashboard**](https://dashboard.exa.ai/) and copy your unique API key from the settings section.

**What makes semantic search different?**
Traditional search uses keywords. Semantic search uses neural embeddings to understand the meaning and context of your query, finding better matches.
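The idea can be shown with cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors below are made-up toy values (real embedding models typically produce hundreds of dimensions), so this illustrates embedding-based ranking in general, not Exa's actual model or scoring.

```python
import math

def cosine(a, b):
    """Cosine similarity: the dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration, not output from a real embedding model.
query = [0.9, 0.1, 0.0]  # stands in for "best way to save money"
documents = {
    "history of the printing press": [0.0, 0.1, 0.9],
    "household budgeting guide": [0.8, 0.2, 0.1],
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(documents, key=lambda title: cosine(query, documents[title]), reverse=True)
print(ranked)  # the budgeting guide ranks first even though it never says "save money"
```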

**Can I extract text from multiple URLs at once?**
Yes! The `extract_page_content` tool accepts a JSON array of URLs and retrieves the parsed content for all of them programmatically.
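Before sending a batch, an agent might validate and deduplicate the list. This sketch assumes the array is passed under a key named `urls`, which is a hypothetical name; only the fact that the tool accepts a JSON array comes from this FAQ.

```python
import json
from urllib.parse import urlparse

def build_extract_args(urls):
    """Validate a list of http(s) URLs, drop duplicates, and serialize it as JSON."""
    clean = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {raw!r}")
        if url not in clean:
            clean.append(url)
    # "urls" is a hypothetical key name for the JSON array this FAQ mentions.
    return json.dumps({"urls": clean})

print(build_extract_args([
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "https://openai.com/blog/sora/",
]))
```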