# NVIDIA AI MCP

> NVIDIA AI MCP connects your agent directly to industry-leading, GPU-accelerated foundation models. It lets you chat with large language models like Llama or Mistral, generate code from simple prompts, convert natural questions into SQL queries, and create vector embeddings for advanced search—all without managing complex infrastructure.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** llm, gpu-acceleration, embeddings, model-inference, natural-language-processing, code-generation

## Description

This MCP gives your agent direct access to the power of NVIDIA’s API Catalog. You don't have to worry about GPU hardware; you just use what you need. Need your AI client to write Python code? Use the `generate_code` tool. Want to know if a piece of text is positive or negative? Run sentiment analysis right away. You can even feed natural language questions into the system and convert them into functional SQL queries using `text_to_sql`. Beyond basic chat, you can generate vector embeddings for advanced search, condense massive reports with summarization, or translate content across dozens of languages. When you connect this MCP via Vinkius, your agent gets instant access to all these capabilities from a single point, making complex AI tasks simple commands.

## Tools

### ask_question
Asks a question using a powerful reasoning model with optional context for better answers.

### chat_completion
Chats with an NVIDIA AI model (Llama, Mistral, etc.) by specifying the desired model name and conversation history.

### generate_code
Creates code from a natural language prompt when you specify a programming language.

### get_embeddings
Generates vector embeddings for any given text using the specified NVIDIA model.

### list_models
Provides a list of all AI models currently available through the entire NVIDIA API Catalog.

### text_to_sql
Converts natural language questions into executable SQL queries for database interaction.

### analyze_sentiment
Determines the emotional tone (positive, negative, neutral) of a provided piece of text.

### summarize_text
Condenses long documents or articles into short, concise summaries while retaining key information.

### translate_text
Translates text accurately between dozens of supported languages.

## Prompt Examples

**Prompt:** 
```
Generate Python code for a REST API with FastAPI.
```

**Response:** 
```
Generated code: `from fastapi import FastAPI
app = FastAPI()

@app.get('/items')
async def get_items():
    return {'items': []}`
```

**Prompt:** 
```
Translate 'Hello, how are you?' to Japanese.
```

**Response:** 
```
こんにちは、お元気ですか？ (Konnichiwa, ogenki desu ka?)
```

**Prompt:** 
```
Summarize: The quarterly report shows revenue grew 15% YoY...
```

**Response:** 
```
Q3 revenue increased 15% year-over-year, driven by strong demand in AI and cloud services.
```

## Capabilities

### Advanced Reasoning
Ask deep questions and receive answers generated by powerful reasoning models.

### Chat with Large Language Models
Engage in conversations using top-tier foundation models like Llama 3.1 or Mistral.

### Vector Embedding Creation
Turn any block of text into a numerical vector for use in search, clustering, and retrieval systems.

### Code Generation
Write functional code snippets—like Python or JavaScript—by giving the agent a simple description of what you want.

### Natural Language Data Querying
Convert human-readable questions into precise SQL queries that can interact with databases.

## Use Cases

### Analyzing Customer Feedback at Scale
A data scientist receives thousands of customer reviews and needs to know the overall mood. They ask their agent to run `analyze_sentiment` on all the text, grouping results by 'negative' sentiment so they can immediately flag critical issues for the product team.

### Building a Knowledge Retrieval System
A developer needs an internal wiki search engine. They first run `get_embeddings` on all existing documents, then use those vectors to power a semantic search that finds relevant context when responding to user queries.

### Translating and Summarizing Global Content
A marketing analyst receives a long white paper written in German. They first run `translate_text` into English, then feed the result into `summarize_text` so they can create quick, accurate summaries for local press releases.

### Interacting with Internal Databases
A business analyst needs Q3 sales data but doesn't know the underlying schema. They simply ask their agent, 'What were the top selling products in Q3?' and use `text_to_sql` to generate the exact query needed for the BI tool.

## Benefits

- Generate working code on demand. Instead of leaving the chat window to use a separate tool, your agent can call `generate_code` right away, writing full snippets like FastAPI APIs based only on your prompt.
- Go from question to query instantly. Stop drafting SQL queries manually for every data request. Use `text_to_sql` to convert natural language into database code with zero friction.
- Handle massive amounts of text efficiently. Need a quick digest of a 50-page report? Run the `summarize_text` tool and get the core findings without reading through filler paragraphs.
- Power up your search functionality. Instead of keyword matching, you can use `get_embeddings` to create dense vector representations of documents for true semantic retrieval.
- Stay in one place. By connecting this MCP via Vinkius, your agent gets access to everything—from chatting with Llama 3.1 using `chat_completion` to analyzing sentiment—without switching services.

## How It Works

The bottom line is that you connect the API key once and gain access to dozens of GPU-backed models through your AI client's tool library.

1. Subscribe to the NVIDIA AI MCP and enter your personal API key from build.nvidia.com.
2. Select this MCP within your preferred client, like Cursor or Claude.
3. Your agent can now call tools directly—for example, running `chat_completion` to chat with Llama 3.1.

## Frequently Asked Questions

**How does the NVIDIA AI MCP help with embedding vectors?**
The `get_embeddings` tool converts any text into a numerical vector using the specified model. This is crucial for advanced search, allowing your agent to find conceptual matches instead of relying only on exact keywords.

**Can I use chat_completion with different models?**
Yes, you specify which AI model—like Mistral or Llama 3.1—you want to talk to directly within the `chat_completion` tool call, giving you control over performance and style.

**What is text_to_sql used for?**
The `text_to_sql` tool translates human language questions into accurate SQL queries. This lets your agent query databases without needing to know the database schema or write complex syntax.

**Is summarize_text good enough for legal documents?**
It's excellent for condensing long texts, but remember it is a summary tool. For highly sensitive legal review, you should always pair `summarize_text` with detailed context provided through the chat completions.

**Does NVIDIA AI MCP support multiple programming languages?**
The `generate_code` tool allows you to specify various languages. You just need to tell your agent what language you want, and it writes the code in that syntax.