# Jina AI MCP

> Jina AI MCP connects your agent to real-time web intelligence. It lets you search the live internet for highly specific data, read entire webpages and extract only the useful text, check statements for factual accuracy, and compare documents by meaning to find connections that keywords miss. Think of it as giving your agent a research assistant that never loses track of its sources.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** multimodal-search, web-crawling, content-extraction, data-parsing, ai-search, information-retrieval

## Description

This MCP turns complex web research into natural conversation. You connect Jina AI to your preferred Vinkius client, and your agent gains the ability to do more than just search: it audits information. Your agent can query the live web for optimized snippets and then clean up entire pages so they are ready for an LLM. It won't just find links; it will figure out what those pages *say*. Need to know if a claim is true? Use the fact-checking tool. Want to narrow down 50 articles to the top three most relevant? The agent can rerank them using semantic scoring. This capability lets your AI client work like a careful researcher, grounding its answers in sources you can verify.

## Tools

### check_fact
It determines if a given statement is factually accurate by searching for external sources.

### get_embeddings
This tool converts lists of text into numerical vectors, which you can compare to measure the similarity between different pieces of text.

### read_url
It fetches a specific web address and returns clean, structured content that LLMs can easily process.

### rerank_documents
You supply multiple documents or snippets and it reorders them to show the most relevant information first for your query.

### search_web
It performs a web search using Jina Search and returns results structured for AI agents.

### tokenize_text
This splits large blocks of text into smaller units (tokens), which helps you measure and manage how much text you send to an LLM.

## Prompt Examples

**Prompt:** 
```
Search the web for 'best open source LLMs 2024' using Jina AI.
```

**Response:** 
```
I've retrieved the AI-optimized search results. Top matches include Llama 3, Mistral, and Mixtral. Would you like the snippets and links for the top 5 results?
```

**Prompt:** 
```
Read the content of https://jina.ai/news and give me a summary.
```

**Response:** 
```
I've read the URL! The main content discusses Jina's latest updates on Search and Reader APIs. Notable items include new model releases. Would you like a more detailed breakdown?
```

**Prompt:** 
```
Check the fact: 'The moon is made of green cheese'.
```

**Response:** 
```
I've performed a grounded search. The statement is false; according to scientific consensus, the moon is composed of rock and metal. I can provide the sources if you'd like.
```

## Capabilities

### Search live web content
Your agent queries the current internet for snippets optimized specifically for LLM consumption.

### Extract and clean webpages
The agent reads any given URL and returns only structured, ready-to-use text, stripping out navigation clutter.

### Verify claims against sources
It checks a statement's factual accuracy by grounding the search in real data and providing evidence.

### Identify most relevant documents
The agent takes a bunch of retrieved snippets or files and sorts them to put the best information first based on your query.

### Map semantic meaning
It generates numerical representations (embeddings) for text, allowing the agent to find content that means the same thing but uses different words.
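
To make "means the same thing" concrete, here is a minimal sketch of how embedding vectors are typically compared, using cosine similarity. The three short vectors are made up for illustration; they are not output from `get_embeddings`, and real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors standing in for real embeddings.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# The related pair should score higher than the unrelated pair.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Texts whose vectors point in nearly the same direction score close to 1.0, even when they share no words.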

## Use Cases

### Validating a competitive claim
A marketing analyst needs to verify whether a competitor's recent product claims are accurate. They run `search_web` for the announcement, collect several articles, and then use `check_fact` on specific bullet points from those articles to build a well-sourced report.

### Building a legal research bot
A paralegal builds a bot that needs to read multiple case law websites. They use `read_url` on each citation, clean the content, and then pass all of it through `rerank_documents` so their agent can instantly see the most relevant sections without manually reading everything.

### Academic literature review
A student is writing a paper requiring data from three different academic journals. They use `get_embeddings` on key concepts to find related papers they missed, then pass those results through `tokenize_text` before feeding them into their agent for synthesis.

## Benefits

- Fact-checking is built right in. You don't just get an answer; you can run `check_fact` on a claim to see whether external sources support it.
- Stop scraping messy HTML. Use `read_url` to pull only clean, structured text from any web address, making it immediately usable for your LLM pipeline.
- Need to sift through dozens of search results? Pass them to `rerank_documents`. This tool automatically sorts the noise and puts the absolute best matches right at the top.
- Semantic search becomes precise. Instead of relying on keywords, you can use `get_embeddings` to find content that shares meaning, even if the words are totally different.
- The agent handles initial research by calling `search_web`, pulling down AI-optimized results so your process starts with high quality from minute one.

## How It Works

The bottom line is that your agent works from filtered, relevance-ranked content, not raw search results.

1. First, your agent calls `search_web` to query the live internet and get initial results.
2. Next, it can pass a specific URL through `read_url` to get clean text, or pass the retrieved documents through `get_embeddings` to compare them by meaning.
3. Finally, it uses `rerank_documents` on the processed content to identify the most relevant passages before answering.
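
The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the server's actual contract: `call_tool` is a hypothetical callable that stands in for your MCP client, and the argument names (`query`, `url`, `documents`) and return shapes are guesses. The stub at the bottom lets the sketch run offline.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, dict], Any]

def research(call_tool: ToolCaller, query: str, top_n: int = 3) -> list[str]:
    """Search, read each hit, then rerank the cleaned pages for the query."""
    # Assumed result shape: a list of dicts that each carry a "url" key.
    hits = call_tool("search_web", {"query": query})
    # Assumed result shape: the cleaned page text as a string.
    pages = [call_tool("read_url", {"url": hit["url"]}) for hit in hits]
    # Assumed result shape: the same documents, most relevant first.
    ranked = call_tool("rerank_documents", {"query": query, "documents": pages})
    return ranked[:top_n]

def fake_call_tool(name: str, arguments: dict) -> Any:
    """Offline stand-in for a real MCP client; returns canned data."""
    if name == "search_web":
        return [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]
    if name == "read_url":
        return f"text of {arguments['url']}"
    if name == "rerank_documents":
        return sorted(arguments["documents"], reverse=True)
    raise ValueError(f"unknown tool: {name}")

print(research(fake_call_tool, "open source LLMs", top_n=1))
```

Passing the caller in as a parameter keeps the workflow separate from any particular client library, so you can swap the stub for a real connection without touching `research`.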

## Frequently Asked Questions

**How does the `search_web` tool work with my agent?**
`search_web` performs an AI-optimized web search, giving you curated results instead of raw links. This means your agent gets snippets specifically formatted to be useful for LLM processing.
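
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. A minimal sketch of that message follows; the `query` argument name is an assumption, so check the tool's published input schema for the real parameter names.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "query" is an assumed argument name, used here only for illustration.
request = build_tool_call(1, "search_web", {"query": "best open source LLMs 2024"})
print(json.dumps(request, indent=2))
```

In practice your client builds and sends this for you; the sketch only shows what travels over the wire.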

**Can I use `read_url` on a private site?**
No, it reads public URLs. The tool's function is to fetch and clean content from publicly accessible web addresses so your agent can process the text reliably.

**`rerank_documents` helps me narrow down search results, right?**
Exactly. If you gather a bunch of documents or snippets, `rerank_documents` sorts them by relevance to your query. It puts the best material at the top so you don't have to sift through noise.

**What is the difference between `search_web` and `read_url`?**
`search_web` finds many potential sources across the web. `read_url`, on the other hand, takes one specific source and extracts all of its clean content.

**When I use `get_embeddings`, does the tool handle large lists of strings effectively?**
Yes, it processes multiple inputs in optimized batches. This is crucial for performance when you need to calculate semantic similarity across hundreds of documents.
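
If you would rather control batch size yourself on the client side, a small helper like the one below can split a long list before each call. The batch size of 3 is arbitrary, and the helper is an illustrative sketch rather than part of the tool.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"document {i}" for i in range(7)]
batches = list(batched(texts, 3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

Each batch can then be sent as its own `get_embeddings` call, with the resulting vectors concatenated in order.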

**What makes the `check_fact` output trustworthy? Does it just guess?**
No, it grounds its responses using verifiable search sources. You get more than a simple true/false; you receive details and links to the sources behind the verdict.

**If I run `read_url` on an article with messy HTML, will the content still be usable?**
It strips out the junk. The tool is designed to deliver clean, LLM-ready text, keeping only the actual readable content from the URL.

**For very long documents, how should I use `tokenize_text`? Is there a limit?**
It's built to manage large inputs by breaking them into manageable chunks. This process helps you control context size and avoid hitting token limits in your agent.
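
As a rough illustration of chunking against a token budget, the sketch below uses whitespace splitting as a stand-in tokenizer. That is an assumption made to keep the example self-contained: real token counts from `tokenize_text` will differ, and you would substitute them here.

```python
def chunk_by_token_budget(text, max_tokens):
    """Split text into chunks of at most `max_tokens` whitespace-separated tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = text.split()  # stand-in tokenizer, not the model's real one
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]

chunks = chunk_by_token_budget("one two three four five", 2)
print(chunks)  # ['one two', 'three four', 'five']
```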