NVIDIA AI MCP. Run advanced ML tasks from a single API gateway.

Q: How do I get an NVIDIA API Key?

Sign up at build.nvidia.com, go to your account settings, and generate an API key. The Developer Program includes free inference credits.

Q: Can I generate code in specific languages?

Yes! The generatecode tool lets you specify the programming language (Python, JavaScript, TypeScript, Java, etc.) for better results.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

NVIDIA AI connects your agent to GPU-accelerated foundation models. You can chat with Llama or Mistral, write code from plain language prompts, create vector embeddings, analyze text sentiment, or turn natural questions into SQL queries—all through one API catalog.

What your AI agents can do

Analyze sentiment

Checks if a given piece of text has a positive, negative, or neutral emotional tone.

Ask question

Asks an advanced reasoning model (405B parameters) complex questions and requires optional context for better answers.

Chat completion

Allows chatting with various models, including Llama 3.1 or Mistral, using the OpenAI message format.

+ 6 more capabilities included

Query Databases with Natural Language

Run the text_to_sql tool to convert any question (e.g., 'Who hit their quota?') into a precise SQL query for database execution.

Prototype Code from Text Prompts

Use generate_code to write complete, executable code blocks in languages like Python or JavaScript just by describing the functionality you need.

Index and Search Large Datasets

Generate vector embeddings with get_embeddings, allowing your agent to index large bodies of text for semantic search and retrieval-augmented generation (RAG).

Analyze Text Tone and Intent

Pass any piece of written content through the analyze_sentiment tool to determine if the tone is positive, negative, or neutral.

Manage Model Access and Options

Run list_models to see every available AI model on the NVIDIA API Catalog before calling chat_completion.

Translate and Condense Content

Use translate_text for accurate cross-language translation, or run summarize_text to cut down multi-page reports into key bullet points.

Ask AI about this MCP

Ask ChatGPT

Ask Claude

Ask Perplexity

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Waiting for input…

AI Agent

NVIDIA AI: 9 Tools for Model Inference & Reasoning

This server lets your agent run advanced ML tasks like generating code, querying databases, or creating vector embeddings using NVIDIA's full catalog.

analyze019d75e0

analyze sentiment

Checks if a given piece of text has a positive, negative, or neutral emotional tone.

ask019d75e0

ask question

Asks an advanced reasoning model (405B parameters) complex questions and requires optional context for better answers.

chat019d75e0

chat completion

Allows chatting with various models, including Llama 3.1 or Mistral, using the OpenAI message format.

generate019d75e0

generate code

Writes functional code in a specified language based on a simple natural language description of what's needed.

get019d75e0

get embeddings

Converts input text into vector embeddings using the dedicated `nvidia/nv-embed-v1` model for data indexing.

list019d75e0

list models

Retrieves a list of every AI model available and supported on the NVIDIA API Catalog.

summarize019d75e0

summarize text

Takes long documents or articles and condenses them into a concise, readable summary.

text019d75e0

text to sql

Converts natural language questions directly into runnable SQL query strings for databases.

translate019d75e0

translate text

Translates text accurately between dozens of supported languages using neural translation models.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with NVIDIA AI, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Listen up. This NVIDIA AI MCP Server hooks your agent directly into GPU-accelerated foundation models via the entire NVIDIA API Catalog. You don't gotta mess with local GPU setup; it just gives your client straight access to some seriously state-of-the-art LLMs for complex, real-world tasks.

You can talk shop with Llama 3.1 or Mistral models. Use the chat_completion tool to handle general conversations and task execution using the standard OpenAI message format. Before you start a chat session, run list_models to see every single AI model available on that catalog; it'll save you time figuring out what's even there.

Need to write code? No problem. Just describing the function you need—like 'Write me a Python script that reads this CSV and calculates the average profit for Q2'—is enough. The generate_code tool writes complete, executable blocks of code in languages like Python or JavaScript based only on your natural language description.

When it comes to data, you got options. You can index huge amounts of text by running get_embeddings. This converts any piece of writing into vector embeddings using the dedicated nvidia/nv-embed-v1 model. That's how your agent does semantic search and builds those RAG pipelines.

Ever gotta talk to a database? Don't even bother writing SQL manually. The text_to_sql tool takes any natural language question—say, 'Who hit their quota last week?'—and spits out the precise, runnable SQL query string for your database to execute immediately.

And what about massive documents? If you have a ten-page report or an academic article, running summarize_text condenses all that fluff into a tight, readable summary. Or, if the topic crosses borders, use translate_text. This tool translates text accurately between dozens of supported languages using neural models.

Need to know what people are feeling? Pass any piece of writing through analyze_sentiment. It checks whether the tone is positive, negative, or neutral.

Got a complex question that needs thinking? Don't just rely on general chat. Use the dedicated ask_question tool. This utilizes an advanced reasoning model with 405B parameters to tackle highly complex questions; you can even feed it optional context to sharpen the answer.

This whole setup means your agent doesn't just talk; it acts. It talks to databases, writes working code, processes massive data sets for search, and handles language barriers so you don't have to. You’ll see how fast your workflow gets when all these tools are wrapped up in one API catalog.

How NVIDIA AI MCP Works

1 Subscribe to the NVIDIA AI MCP Server and provide your API Key from build.nvidia.com.
2 Your agent calls a specific tool (e.g., generate_code) via the MCP client, providing input like 'Write a FastAPI endpoint for user data'.
3 The server processes the request using GPU acceleration and returns the structured output—the Python code block or the SQL query—directly to your agent.

The bottom line is: You tell your agent what you need, it calls the right NVIDIA tool, and you get clean, actionable data back instantly.

Who Is NVIDIA AI MCP For?

This server is for developers and data teams who can't afford manual context switching. If you spend too much time copying text from one analysis tool into another just to build a final report, this is for you. You need the raw power of specialized ML tools available in one place.

ML Engineer

Needs to prototype new features quickly. They'll use get_embeddings and analyze_sentiment on streaming data, then pipe the results into a final report using chat_completion.

Data Scientist

Spends time querying relational databases. They rely heavily on text_to_sql to test hypotheses without writing boilerplate SQL and use summarize_text on large CSV reports.

Backend Developer

Needs code generation for repetitive tasks or API wrappers. They'll call generate_code for a FastAPI endpoint, then potentially use the resulting code in their IDE via your agent client.

What Changes When You Connect

Complex Reasoning: Stop trying to prompt the model into doing everything. Use ask_question with its 405B parameter reasoning model for deep, multi-step problem solving that goes beyond simple chat completions.
Data Preparation Speed: Don't manually chunk and vectorize data. Call get_embeddings once to create high-quality vectors across your entire dataset, ready for instant semantic search or clustering.
Structured Output: Need database access? Skip the manual SQL writing. Just ask a question about your schema—the text_to_sql tool gives you clean, executable SQL instantly.
Code Reliability: When prototyping, don't waste hours on boilerplate code. Use generate_code to get functional Python or JS snippets directly from a simple prompt and drop them into your project.
Full Model Visibility: Don't guess what models are available. Run list_models first to see the entire catalog before committing to a specific model via chat_completion.
Universal Language Support: Whether you need to translate an email from Japanese (translate_text) or summarize a French legal document (summarize_text), this server handles diverse global data types.

Real-World Use Cases

Analyzing Customer Feedback for Product Improvement

A PM wants to know if the new feature is working. Instead of reading hundreds of tickets, they send a batch of comments to their agent. The agent runs analyze_sentiment on each comment, then sends the negative results and context to ask_question to generate three specific action items for the product team.

Building an Internal Knowledge Bot

A data scientist needs a chatbot trained on company documents. They first use get_embeddings to vectorize all internal manuals, then pass the user's query to their agent which runs text_to_sql against the metadata and uses the embeddings for RAG retrieval.

Rapid Prototyping of New APIs

A developer needs a backend endpoint quickly. They call generate_code asking for a 'Python FastAPI route to handle user signups.' The code returns instantly, allowing them to copy and paste the functional skeleton into their project.

Market Research & Comparison

A business analyst receives a lengthy competitor report. They use summarize_text to cut it down first. Next, they run translate_text on key paragraphs to compare global market sentiment, and finally use analyze_sentiment on the translated text for quick insights.

The Tradeoffs

Treating the LLM like a Search Engine

Asking chat_completion: 'What were Q3 revenue numbers?' and getting a general, unverified answer. The model just guesses.

→ If you need specific data points from a database, don't ask it in chat. Call the text_to_sql tool instead; it will generate the precise query needed to pull that number.

Bypassing Embedding Generation

Trying to feed raw text into a retrieval system and hoping it works. The search results are always vague or irrelevant.

→ You must first run the input chunk through get_embeddings. This process converts the text into a searchable vector, guaranteeing the right semantic match when you query.

Over-relying on Chat for Code

Asking chat_completion: 'Write me a class.' and receiving incomplete code that requires manual debugging of syntax errors.

→ Use the dedicated generate_code tool. It's specifically designed to take natural language prompts and output runnable, structured code blocks.

When It Fits, When It Doesn't

You use this server if your workflow involves multiple, distinct data transformations: text -> vector -> SQL query; or general chat -> specialized function call (code/translation). It's essential when you need to combine a model's intelligence with structured output. Don't use it if you only need simple chat responses—then chat_completion alone might suffice, but this server gives you the full toolset for reliability. If your primary task is only generating embeddings, calling get_embeddings directly works; but having the whole catalog ensures flexibility when that requirement changes tomorrow.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by NVIDIA. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

How we secure it →

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 9 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

analyze_sentiment ask_question chat_completion generate_code get_embeddings list_models summarize_text text_to_sql translate_text

Analyzing content tone shouldn't require three different APIs and a spreadsheet.

Today, if you get customer feedback through multiple channels—emails, support tickets, chat logs—you have to copy each piece of text into a separate sentiment analysis platform. You then export the results, paste them into Excel, and manually average the scores just to know if things are getting better or worse.

With this MCP server, you simply pipe all your incoming feedback directly to the `analyze_sentiment` tool. It returns clean, structured data—a list of (text, sentiment score)—meaning you get real insights without leaving your agent environment.

NVIDIA AI MCP Server: Turn a question into an executable query.

The manual process for querying a database means writing boilerplate SQL every single time. You have to remember syntax, worry about table names, and manually adjust the `WHERE` clauses just because your business question changed slightly—like asking 'Which users in California' instead of 'which users in Texas'.

Now, you just ask your agent: 'Show me all premium accounts from California who signed up last month.' The `text_to_sql` tool translates that entire request into a perfect SQL query. It’s done. No manual coding required.

Common Questions About NVIDIA AI MCP

Which AI models are available? +

The NVIDIA API Catalog offers Llama 3.1 (8B, 70B, 405B), Mistral, CodeLlama, Gemma, Nemotron, and many more. Use the list_models tool to see all available models.

How do I get an NVIDIA API Key? +

Sign up at build.nvidia.com, go to your account settings, and generate an API key. The Developer Program includes free inference credits.

Can I generate code in specific languages? +

Yes! The generate_code tool lets you specify the programming language (Python, JavaScript, TypeScript, Java, etc.) for better results.

Are there usage limits on the free tier? +

Yes, the NVIDIA Developer Program provides free inference credits. Once exhausted, you can upgrade to a paid plan for higher throughput. Check your usage dashboard at build.nvidia.com.

When using `get_embeddings`, what data structure does the input text need to follow? +

The input must be plain, readable strings. You don't need to worry about complex formatting; simply pass the text you want embedded. This keeps the process efficient and ensures the vector output is accurate for search or clustering.

If I use `text_to_sql` and get an incorrect query, what information do I need to provide? +

You must supply the database schema. The model needs column names, data types, and relationship details for the relevant tables. Providing this context guarantees the generated SQL is syntactically correct and functional.

How does the system ensure high performance when calling `chat_completion`? +

The server leverages dedicated GPU acceleration from NVIDIA hardware. This architecture handles large model inference jobs quickly, allowing you to manage complex chats with powerful models like Llama or Mistral without significant latency.

If the initial answer from `ask_question` is too general, how can I refine the prompt? +

You must narrow your focus and provide constraints. Include specific examples of desired output formats or hard limitations in your query. The 405B-parameter model performs best when given tightly defined parameters.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python

Pydantic AI sdk-python

Vercel AI SDK sdk-typescript