NVIDIA API Catalog MCP Server
An MCP proxy to NVIDIA's cloud-hosted foundation models, including the Nemotron and Llama 3 families, for chat completions and related inference.
Vinkius AI Gateway supports streamable HTTP and SSE.

Works with every AI agent you already use
…and any MCP-compatible client

NVIDIA API Catalog MCP Server: see your AI Agent in action
Built-in capabilities (8)
nvidia_chat_completion
Run chat completions against LLMs hosted in the NVIDIA API Catalog
nvidia_check_token_quota
Check remaining credit and the rate limits that bound your inference requests
nvidia_generate_embeddings
Generate vector embeddings from unstructured text using hosted embedding models
nvidia_get_cloud_status
Check the status and latency of NVIDIA's hosted inference endpoints
nvidia_list_foundation_models
List the foundation models currently available through the API Catalog
nvidia_list_lora_adapters
List available LoRA adapters, the fine-tuned overlays that can be applied to supported base models
nvidia_summarize_content
Summarize long-form content using a hosted LLM
nvidia_vision_inference
Run multimodal inference on images with vision-capable models (e.g. Llama-Vision)
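From an MCP client's point of view, each capability above is invoked with a standard `tools/call` request. The sketch below builds such a message for `nvidia_chat_completion`; the tool name comes from the list above, but the argument names (`model`, `messages`) are assumptions modeled on typical chat-completion APIs, not confirmed parameter names for this server.

```python
import json

def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical invocation; argument schema is illustrative only.
msg = build_tool_call(1, "nvidia_chat_completion", {
    "model": "meta/llama3-70b-instruct",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
})
print(msg)
```

Any MCP-compatible client (over streamable HTTP or SSE, as noted above) would send this message and receive the tool's result in the matching JSON-RPC response.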
What this connector unlocks
What you can do
Run inference against NVIDIA's hosted endpoints through the API Catalog:
- Discover hosted models: list every foundation model available in the catalog
- Route chat completions: send conversational prompts and get model answers back
- Generate embeddings: turn text into numerical vectors for search and retrieval
- Run multimodal inference: send images to vision-capable models
- Summarize text: compress long documents into concise summaries
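Under the hood, the API Catalog exposes an OpenAI-compatible HTTP endpoint, so the chat-completion capability above boils down to a single POST. This sketch only builds the request (sending it requires a valid key); the endpoint URL follows NVIDIA's published OpenAI-compatible scheme, and the model name is one example from the catalog.

```python
import json
import os
import urllib.request

API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for the API Catalog."""
    body = json.dumps({
        "model": "meta/llama3-70b-instruct",  # example catalog model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(os.environ.get("NVIDIA_API_KEY", "demo"), "Hello!")
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` returns a JSON completion; the connector performs this round trip for you so your agent never handles HTTP directly.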
How it works
1. Provide your credentials: set NVIDIA_API_KEY once and the gateway handles authentication to the API Catalog for you
2. Request inference: call any hosted model without writing SDK-specific integration code
3. Get structured results: completions come back parsed into a standard response format
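Step 3 amounts to pulling fields out of an OpenAI-style completion payload, the response shape NVIDIA's compatible endpoint returns. A minimal sketch with a hard-coded sample response (field names follow the OpenAI chat-completions schema):

```python
import json

# Sample payload in the OpenAI chat-completions response format.
sample_response = json.loads("""
{
  "choices": [
    {"message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}
}
""")

# The connector extracts the assistant's answer and token usage for you.
answer = sample_response["choices"][0]["message"]["content"]
tokens = sample_response["usage"]["total_tokens"]
print(answer, tokens)  # → Hello! 7
```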
Who is this for?
Built for AI engineers, generative-AI integrators, and developers who want direct access to NVIDIA's hosted compute from their agents.
Give your AI agents the power of NVIDIA API Catalog
Access NVIDIA API Catalog and 2,000+ MCP servers — ready for your agents to use, right now. No glue code. No custom integrations. Just plug Vinkius AI Gateway and let your agents work.
More in this category

Jira Cloud
10 tools · Manage projects, search issues, and track tasks via Jira Cloud API.
Google Maps
4 tools · Empower location intelligence via Google Maps — perform geocoding, search millions of places, retrieve rich venue details, and calculate directions directly from any AI agent.

NVIDIA AI
9 tools · Access LLMs, embeddings, code generation, and reasoning via NVIDIA API Catalog.
You might also like

Frontegg
12 tools · Manage B2B identity, provision users, and oversee tenants via AI agents with Frontegg.

Unbounce
4 tools · Automate marketing tasks via Unbounce — retrieve landing pages, fetch captured leads, audit performance stats, and manage test variants easily.
Hugging Face Audio
4 tools · Connect Hugging Face Audio to any AI agent via MCP.
