NVIDIA API Catalog MCP. Connect your AI client to enterprise-grade compute power.

Q: How do I check if a model exists before calling nvidiachatcompletion?

You should run nvidialistfoundationmodels first. This tool dumps an array of all accessible LLM paths, letting you confirm the exact model name your agent needs to use.

Q: Does this MCP handle API quota issues?

Yes. You can proactively run nvidiachecktokenquota at the beginning of any workflow. This tells your agent exactly how many credits are left, stopping runs before they fail due to overage.

Q: Can I process images with this MCP?

Yes. Use the nvidiavisioninference tool. It specifically handles multimodal tasks, allowing your agent to run advanced analysis on visual data.

NVIDIA API Catalog MCP connects your AI client directly to a massive array of foundational models running on NVIDIA compute hardware. It lets you discover available LLMs, route complex chat queries, generate embeddings from raw text, and process visual data—all without managing individual vendor APIs.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Give Claude and any AI agent real-world access

Discover available models

List all explicitly hosted LLM and foundation model configurations that are currently accessible.

Route conversational chat queries

Send unstructured text to an active LLM for immediate, contextual answers.

Generate numerical vector embeddings

Convert raw blocks of text into dense arrays that measure semantic meaning, perfect for database searches.

Process visual data and images

Run specialized tasks on image inputs to extract descriptions or run advanced vision analysis.

Check usage credits and limits

Poll the system to confirm current API quota status before running expensive inference jobs.

Ask an AI about this

Waiting for input…

AI Agent

What AI agents can do with NVIDIA API Catalog: 8 Available Tools

These tools give your agent direct access to core capabilities like running LLMs, extracting data from images, checking quotas, and listing available models.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using NVIDIA API Catalog MCP

Nvidia Chat Completion

Sends natural language questions to a hosted LLM and receives direct, generated answers.

Nvidia Check Token Quota

Queries the system to check your current API usage limits and remaining credits for...

Nvidia Generate Embeddings

Takes raw text inputs and converts them into numerical vectors used for semantic...

Nvidia Get Cloud Status

Pings the core NVIDIA compute endpoints to check system latency and operational...

Nvidia List Foundation Models

Retrieves a list of all major LLMs and foundation models that are currently...

Nvidia List Lora Adapters

Checks for fine-tuned model overrides, allowing you to use specialized versions without retraining the whole base model.

Nvidia Summarize Content

Compresses large blocks of text into a shorter summary while retaining key information.

Nvidia Vision Inference

Processes image inputs to perform advanced visual analysis and extract data from...

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for HTTPS Streamable, SSE, and OAuth2—zero messy routing required.

NVIDIA API Catalog MCP is compatible with Claude

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The NVIDIA API Catalog integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "nvidia-api-catalog": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the NVIDIA API Catalog tools with full Vinkius guardrails applied.

NVIDIA API Catalog MCP is compatible with VS Code

VS Code Copilot

⚡

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"nvidia-api-catalog": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

Use the SSEClient in LangChain to connect to the Vinkius managed endpoint:

from langchain_mcp_adapters.client import SSEClient

# Connect to Vinkius
client = SSEClient(url="https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp")
tools = client.get_tools()

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

from crewai import Agent
from mcp_crewai import MCPTool

# Connect securely to Vinkius
vinkius_tools = MCPTool(url="https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp")

# Assign to Agent
researcher = Agent(
    role='Data Researcher',
    tools=vinkius_tools.get_all()
)

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on each call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with NVIDIA API Catalog, then connect any of our 5,200+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,200+ others, all in one place
Add new capabilities to your AI anytime you want
Connections are secured and governed automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog weekly

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by NVIDIA API Catalog. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS CLOUD

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on each call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Managing model access feels like juggling credentials.

Today, to build a single agent capable of everything—from summarizing reports to analyzing pictures—you're probably managing five or six different API keys. Every time you add a new feature, you have to check the documentation for yet another service, write custom error handling for quota issues, and map out completely separate authentication flows.

This MCP changes that. You connect once, and your agent gets access to everything. Instead of managing credentials across five different endpoints, you simply call tools like `nvidia_chat_completion` or `nvidia_vision_inference`. The system handles the routing, the keys, and the complexity for you.

The NVIDIA API Catalog MCP delivers structured data insights.

Manual processes often leave you with raw text output that's hard to act on. You get a summary, but you can't easily search *within* the key points; or you process an image and get back a giant JSON dump that requires manual parsing.

With this MCP, if you run `nvidia_generate_embeddings`, the result is immediately useful. If you use `nvidia_summarize_content`, the output is clean and ready for the next step in your workflow. The data flows naturally from one intelligent operation to the next.

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

model-discovery

llm-proxy

inference-engine

api-catalog

model-routing

foundation-models

What NVIDIA API Catalog MCP does for your AI

Building advanced agent workflows means connecting to dozens of specialized services. This MCP cuts through that complexity. Instead of dealing with separate credentials for every model or endpoint, your AI client talks to this central catalog. It figures out the right foundational model for the job, whether you need simple text compression or complex image analysis.

For instance, if you're building a knowledge retrieval system, your agent can first use tools like nvidia_list_foundation_models to see what's available. Then, it passes raw text through to nvidia_generate_embeddings to create vector representations. Finally, when a user asks a question, the chat completion tool handles the full conversational exchange. This centralized approach means your logic stays clean and portable.

By connecting this MCP via Vinkius, you give your agent access to best-in-class GPU compute power for everything from text summarization to multimodal vision tasks.

Built · Hosted · Managed by Vinkius NVIDIA API Catalog - Model Inference Tools

Server ID 019d75e1-35ae-70cf-91e7-31316ddc2c23

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

Who uses NVIDIA API Catalog MCP

This connector is built for machine learning engineers, generative developers, and AI architects who are constantly integrating diverse models into complex systems. If you're tired of managing dozens of individual API keys just to run basic text analysis or image tagging, this MCP is what you need.

ML Engineer

Uses the catalog to compare different foundational models and select the best one for a specific inference task, optimizing performance.

Generative Developer

Builds complex workflows that chain together multiple model types—like summarizing text first, then generating embeddings, and finally using those vectors to answer questions.

AI Architect

Maps out the entire system architecture, ensuring that resource usage is tracked (nvidia_check_token_quota) across all connected model types before deployment.

Benefits of connecting NVIDIA API Catalog MCP

Stop worrying about model discovery. Use nvidia_list_foundation_models to see every available LLM path in one place, making it easy for your agent to choose the right tool for the job.

Handle complex resource management with nvidia_check_token_quota. Your workflow checks its own credit limits before running a massive inference task, preventing costly failures mid-process.

Need text turned into searchable data? Pass content through nvidia_generate_embeddings to create reliable vector arrays that power your RAG system or semantic search engine.

Vision tasks are now simple. Use nvidia_vision_inference to feed an image and get structured, actionable data back—no manual image processing needed.

Keep your code clean by letting the MCP handle routing. Instead of writing separate logic for summarization vs. chat, just call nvidia_summarize_content, and the backend takes care of the rest.

NVIDIA API Catalog MCP use cases

01 01

Building a document analysis pipeline

A user uploads a 50-page report. The agent first uses nvidia_list_foundation_models to confirm capability, then passes the text to nvidia_summarize_content. Finally, it sends the summary and key sections through nvidia_generate_embeddings, allowing the end-user to search specific concepts within the document later.

02 02

Creating a product QA bot

The user provides an image of a complex appliance. The agent uses nvidia_vision_inference to extract model numbers and component names. It then passes those extracted details to nvidia_chat_completion to generate a tailored troubleshooting guide.

03 03

Automating knowledge base updates

A team uploads 100 new internal articles. The agent iterates through them, using nvidia_generate_embeddings on each one and storing the resulting vectors in a database. This keeps the entire knowledge base fresh for future queries.

04 04

Testing multi-step agent logic

Before deployment, an engineer runs a test suite that calls nvidia_get_cloud_status to verify latency. They then run a simulated chat session using nvidia_chat_completion, ensuring the entire sequence is stable and fast.

NVIDIA API Catalog MCP tradeoffs

What to watch out for, and the recommended way to handle each one.

Using specific vendor APIs directly

Avoid

Writing dozens of functions, each requiring unique API key management and different data structures for every single model or service you want to connect.

Instead

Centralize your connections. Use this MCP as a unified proxy. Your agent calls one standardized function (like nvidia_chat_completion), and the catalog handles the complex routing and authentication underneath.

Ignoring resource constraints

Avoid

Running an intensive, multi-step workflow that fails silently or suddenly cuts off because the API key exceeded its daily token limit.

Instead

Always check quotas first. Call nvidia_check_token_quota at the start of your job flow. This prevents failed runs and saves you time debugging usage limits.

Handling image data manually

Avoid

Writing custom code to preprocess images, resizing them, normalizing pixels, and then calling a separate vision API endpoint with complex payloads.

Instead

Let the tool handle it. Use nvidia_vision_inference. It accepts the raw input and outputs structured results directly, skipping all the manual data preparation steps.

Frequently asked questions about NVIDIA API Catalog MCP

How do I check if a model exists before calling nvidia_chat_completion? +

You should run nvidia_list_foundation_models first. This tool dumps an array of all accessible LLM paths, letting you confirm the exact model name your agent needs to use.

Does this MCP handle API quota issues? +

Yes. You can proactively run nvidia_check_token_quota at the beginning of any workflow. This tells your agent exactly how many credits are left, stopping runs before they fail due to overage.

What is the difference between nvidia_generate_embeddings and chat completion? +

Chat completion generates conversational text responses. Generating embeddings converts unstructured text into dense numerical arrays, which you use for semantic search or clustering, not conversation.

Can I process images with this MCP? +

Yes. Use the nvidia_vision_inference tool. It specifically handles multimodal tasks, allowing your agent to run advanced analysis on visual data.

Give Claude and any AI agent real-world access

What AI agents can do with NVIDIA API Catalog: 8 Available Tools

Nvidia Chat Completion

Sends natural language questions to a hosted LLM and receives direct, generated answers.

Nvidia Check Token Quota

Queries the system to check your current API usage limits and remaining credits for...

Nvidia Generate Embeddings

Takes raw text inputs and converts them into numerical vectors used for semantic...

Nvidia Get Cloud Status

Pings the core NVIDIA compute endpoints to check system latency and operational...

Nvidia List Foundation Models

Retrieves a list of all major LLMs and foundation models that are currently...

Nvidia List Lora Adapters

Checks for fine-tuned model overrides, allowing you to use specialized versions without retraining the whole base model.

Nvidia Summarize Content

Compresses large blocks of text into a shorter summary while retaining key information.

Nvidia Vision Inference

Processes image inputs to perform advanced visual analysis and extract data from...

Security and governance baked right in.

Claude AI

Open Claude Settings

Add Custom Connector

Start a conversation

Claude Code

Open your terminal

Add the MCP Server

Start coding

Cursor

One-Click Install (Recommended)

Open Cursor Settings

Add New Server

Use in Composer

Antigravity

Configure Agent Environment

Bind the Endpoint

Execute

VS Code Copilot

One-Click Install (Recommended)

Open MCP Settings

Add Server Config

Windsurf

One-Click Install (Recommended)

Open Windsurf Settings

Add Server Endpoint

LangChain

Install Dependencies

Connect the Server

CrewAI

Define the Tool

Execute Task

Choose How to Get Started

Build Your Own

Make Your AI Do More

Managing model access feels like juggling credentials.

The NVIDIA API Catalog MCP delivers structured data insights.

model-discovery

llm-proxy

inference-engine

api-catalog

model-routing

foundation-models

What NVIDIA API Catalog MCP does for your AI

How to set up NVIDIA API Catalog MCP

Who uses NVIDIA API Catalog MCP

Benefits of connecting NVIDIA API Catalog MCP

NVIDIA API Catalog MCP use cases

Building a document analysis pipeline

Creating a product QA bot

Automating knowledge base updates

Testing multi-step agent logic

NVIDIA API Catalog MCP tradeoffs

Using specific vendor APIs directly

Ignoring resource constraints

Handling image data manually

When to use NVIDIA API Catalog MCP

Frequently asked questions about NVIDIA API Catalog MCP