Vinkius

NVIDIA API Catalog MCP. Connect your AI client to enterprise-grade compute power.

NVIDIA API Catalog MCP connects your AI client directly to a massive array of foundational models running on NVIDIA compute hardware. It lets you discover available LLMs, route complex chat queries, generate embeddings from raw text, and process visual data—all without managing individual vendor APIs.

Compatible with Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, and Vercel.

See Vinkius in Action

Give Claude and any AI agent real-world access

Discover available models

List all explicitly hosted LLM and foundation model configurations that are currently accessible.

Route conversational chat queries

Send unstructured text to an active LLM for immediate, contextual answers.

Generate numerical vector embeddings

Convert raw blocks of text into dense arrays that measure semantic meaning, perfect for database searches.

Process visual data and images

Run specialized tasks on image inputs to extract descriptions or run advanced vision analysis.

Check usage credits and limits

Poll the system to confirm current API quota status before running expensive inference jobs.

What AI agents can do with NVIDIA API Catalog: 8 Available Tools

These tools give your agent direct access to core capabilities like running LLMs, extracting data from images, checking quotas, and listing available models.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using NVIDIA API Catalog MCP

Nvidia Chat Completion

Sends natural language questions to a hosted LLM and receives direct, generated answers.

Nvidia Check Token Quota

Queries the system to check your current API usage limits and remaining credits for...

Nvidia Generate Embeddings

Takes raw text inputs and converts them into numerical vectors used for semantic...

Nvidia Get Cloud Status

Pings the core NVIDIA compute endpoints to check system latency and operational...

Nvidia List Foundation Models

Retrieves a list of all major LLMs and foundation models that are currently...

Nvidia List Lora Adapters

Checks for fine-tuned model overrides, allowing you to use specialized versions without retraining the whole base model.

Nvidia Summarize Content

Compresses large blocks of text into a shorter summary while retaining key information.

Nvidia Vision Inference

Processes image inputs to perform advanced visual analysis and extract data from...

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing for you to configure.

Claude AI

1. Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

2. Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

3. Start a conversation

Open a new chat. The NVIDIA API Catalog integration is available immediately; no restart is needed.
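If you script your client configuration rather than pasting the URL by hand, the endpoint from step 2 can be assembled programmatically. The sketch below is only an illustration: the helper name `build_endpoint` and the sample token are made up for this example, and the only thing taken from the instructions above is the URL pattern itself.

```python
from urllib.parse import quote

BASE_URL = "https://edge.vinkius.com"
PLACEHOLDER = "[YOUR_TOKEN_HERE]"


def build_endpoint(token: str) -> str:
    """Return the MCP endpoint URL for a token (hypothetical helper)."""
    token = token.strip()
    # Catch the most common mistake: leaving the placeholder in place.
    if not token or token == PLACEHOLDER:
        raise ValueError("replace the placeholder with your real token")
    # Percent-encode the token so it is safe as a single path segment.
    return f"{BASE_URL}/{quote(token, safe='')}/mcp"


# "demo-token" is a made-up value, not a working credential.
print(build_endpoint("demo-token"))
```

Treat the resulting URL like a secret, since the token is embedded in the path.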

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

  • Import from OpenAPI, Swagger, or YAML specs
  • Create Agent Skills with progressive disclosure
  • Deploy to edge with MCPFusion framework
  • Built-in DLP, auth, and compliance on each call
  • Real-time usage dashboard and cost metering
  • Publish to catalog or keep private
Start building

Make Your AI Do More

Start with NVIDIA API Catalog, then connect any of our 5,200+ other servers whenever your AI needs more. One click, no limits.

  • Use this MCP plus 5,200+ others, all in one place
  • Add new capabilities to your AI anytime you want
  • Connections are secured and governed automatically
  • Track usage and costs across all your servers
  • Works with Claude, ChatGPT, Cursor, and more
  • New servers added to the catalog weekly

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by NVIDIA API Catalog. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS CLOUD

  • Cloud Hosted: Managed infra
  • V8 Isolated: Sandboxed per request
  • Zero-Trust Proxy: No stored credentials
  • DLP Enforced: Policy on each call
  • GDPR Compliant: EU data residency
  • Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Managing model access feels like juggling credentials.

Today, to build a single agent capable of everything—from summarizing reports to analyzing pictures—you're probably managing five or six different API keys. Every time you add a new feature, you have to check the documentation for yet another service, write custom error handling for quota issues, and map out completely separate authentication flows.

This MCP changes that. You connect once, and your agent gets access to everything. Instead of managing credentials across five different endpoints, you simply call tools like `nvidia_chat_completion` or `nvidia_vision_inference`. The system handles the routing, the keys, and the complexity for you.
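Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. Your AI client normally builds this for you, so the sketch below is only meant to show the shape of such a call. The argument names `model` and `prompt` and the model name are assumptions for illustration; the real input schema comes from the server's tool listing.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build the body of an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and the model name here are hypothetical placeholders.
request = tool_call_request(
    "nvidia_chat_completion",
    {"model": "example-model", "prompt": "Summarize MCP in one sentence."},
)
print(json.dumps(request, indent=2))
```

Switching to another tool such as `nvidia_vision_inference` changes only the `name` and `arguments` fields; the endpoint and credentials stay the same.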

The NVIDIA API Catalog MCP delivers structured data insights.

Manual processes often leave you with raw text output that's hard to act on. You get a summary, but you can't easily search *within* the key points; or you process an image and get back a giant JSON dump that requires manual parsing.

With this MCP, if you run `nvidia_generate_embeddings`, the result is immediately useful. If you use `nvidia_summarize_content`, the output is clean and ready for the next step in your workflow. The data flows naturally from one intelligent operation to the next.

What NVIDIA API Catalog MCP does for your AI

Building advanced agent workflows means connecting to dozens of specialized services. This MCP cuts through that complexity. Instead of dealing with separate credentials for every model or endpoint, your AI client talks to this central catalog. It figures out the right foundational model for the job, whether you need simple text compression or complex image analysis.

For instance, if you're building a knowledge retrieval system, your agent can first use `nvidia_list_foundation_models` to see what's available. Then, it passes raw text to `nvidia_generate_embeddings` to create vector representations. Finally, when a user asks a question, `nvidia_chat_completion` handles the full conversational exchange. This centralized approach means your logic stays clean and portable.
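The retrieval flow described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the server's actual interface: `call_tool` is a hypothetical callable standing in for whatever your MCP client exposes, the argument names (`text`, `model`, `prompt`) and result shapes are guesses, and `fake_call_tool` is an offline stand-in so the example runs without a network connection.

```python
import math
from typing import Callable, Sequence

# Hypothetical signature: (tool name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, dict], object]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question: str, documents: list, call_tool: ToolCaller, model: str):
    # Step 1: confirm the model is listed before relying on it.
    if model not in call_tool("nvidia_list_foundation_models", {}):
        raise ValueError(f"model {model!r} is not available")
    # Step 2: embed every document and the question.
    doc_vecs = [call_tool("nvidia_generate_embeddings", {"text": d}) for d in documents]
    q_vec = call_tool("nvidia_generate_embeddings", {"text": question})
    # Step 3: pass the most similar document to the chat model as context.
    best = max(range(len(documents)), key=lambda i: cosine(q_vec, doc_vecs[i]))
    prompt = f"Context: {documents[best]}\n\nQuestion: {question}"
    return call_tool("nvidia_chat_completion", {"model": model, "prompt": prompt})


def fake_call_tool(name: str, args: dict):
    """Offline stand-in: keyword counts play the role of embeddings."""
    if name == "nvidia_list_foundation_models":
        return ["example-model"]
    if name == "nvidia_generate_embeddings":
        text = args["text"].lower()
        return [float(text.count("gpu")), float(text.count("invoice"))]
    if name == "nvidia_chat_completion":
        return "echo: " + args["prompt"]
    raise KeyError(name)


docs = ["Invoice policy: pay in 30 days.", "Our cluster uses GPU nodes."]
reply = answer("Which GPU do we use?", docs, fake_call_tool, "example-model")
print(reply)
```

Passing `call_tool` in as a parameter keeps the workflow logic independent of any particular client library, which is what makes it easy to test offline.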

By connecting this MCP via Vinkius, you give your agent access to best-in-class GPU compute power for everything from text summarization to multimodal vision tasks.

Built · Hosted · Managed by Vinkius NVIDIA API Catalog - Model Inference Tools
Server ID 019d75e1-35ae-70cf-91e7-31316ddc2c23
Vinkius Inspector
Compliance Grade A+
Score 100/100

Frequently asked questions about NVIDIA API Catalog MCP

How do I check if a model exists before calling `nvidia_chat_completion`?

Run `nvidia_list_foundation_models` first. This tool returns a list of all accessible models, letting you confirm the exact model name your agent needs to use.
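One way to act on that list is to validate the requested name and suggest near matches when it is misspelled. The sketch below assumes the tool's result can be reduced to a plain list of model-name strings; that shape, the helper name `resolve_model`, and the sample names are all hypothetical.

```python
from difflib import get_close_matches


def resolve_model(requested: str, available: list) -> str:
    """Return the requested name if listed, else raise with suggestions."""
    if requested in available:
        return requested
    # Offer up to three similar names to help correct a typo.
    hints = get_close_matches(requested, available, n=3)
    suffix = f"; did you mean {', '.join(hints)}?" if hints else ""
    raise LookupError(f"unknown model {requested!r}{suffix}")


# Made-up names standing in for the output of nvidia_list_foundation_models.
available_models = ["vendor/example-chat-8b", "vendor/example-embed-1"]
print(resolve_model("vendor/example-chat-8b", available_models))
```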

Does this MCP handle API quota issues?

Yes. You can proactively run `nvidia_check_token_quota` at the beginning of any workflow. This tells your agent how many credits are left, so it can stop a run before it fails for lack of quota.
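A quota guard along these lines can sit at the top of a workflow. It is a sketch only: `call_tool` is a hypothetical callable for invoking tools through your MCP client, and the `remaining_credits` field name is an assumption about the result shape, not documented behavior.

```python
from typing import Callable


def ensure_quota(call_tool: Callable[[str, dict], dict], needed_credits: int) -> int:
    """Return remaining credits, or raise before any inference is attempted."""
    quota = call_tool("nvidia_check_token_quota", {})
    remaining = int(quota["remaining_credits"])  # hypothetical field name
    if remaining < needed_credits:
        raise RuntimeError(
            f"only {remaining} credits left, but this run needs {needed_credits}"
        )
    return remaining


def fake_quota_tool(name: str, args: dict) -> dict:
    """Offline stand-in that always reports 50 credits."""
    return {"remaining_credits": 50}


print(ensure_quota(fake_quota_tool, needed_credits=10))
```

Failing early this way keeps a long multi-step job from stopping halfway through with partial results.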

What is the difference between `nvidia_generate_embeddings` and chat completion?

Chat completion generates conversational text responses. `nvidia_generate_embeddings` converts unstructured text into dense numerical arrays, which you use for semantic search or clustering, not conversation.

Can I process images with this MCP?

Yes. Use the `nvidia_vision_inference` tool. It handles multimodal tasks, allowing your agent to run advanced analysis on visual data.