# NVIDIA API Catalog MCP

> NVIDIA API Catalog MCP connects your AI client directly to a broad range of foundation models running on NVIDIA compute hardware. It lets you discover available LLMs, route complex chat queries, generate embeddings from raw text, and process visual data, all without managing individual vendor APIs.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** model-discovery, llm-proxy, inference-engine, api-catalog, model-routing, foundation-models

## Description

Building advanced agent workflows means connecting to dozens of specialized services. This MCP cuts through that complexity. Instead of dealing with separate credentials for every model or endpoint, your AI client talks to this central catalog. It routes each request to a suitable foundation model, whether you need simple text summarization or complex image analysis.

For instance, if you're building a knowledge retrieval system, your agent can first use tools like `nvidia_list_foundation_models` to see what's available. Then, it passes raw text through to `nvidia_generate_embeddings` to create vector representations. Finally, when a user asks a question, the chat completion tool handles the full conversational exchange. This centralized approach means your logic stays clean and portable. By connecting this MCP via Vinkius, you give your agent access to best-in-class GPU compute power for everything from text summarization to multimodal vision tasks.
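The retrieval flow described above can be sketched roughly as follows. This is a minimal illustration, not the MCP's actual client API: `call_tool` is a hypothetical stand-in for whatever your MCP client exposes, and the argument names (`input`, `model`, `prompt`) and return shapes are assumptions.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
# Your MCP client library supplies the real call mechanism.
ToolCaller = Callable[[str, dict], Any]


def answer_question(call_tool: ToolCaller, documents: list[str], question: str) -> str:
    # Step 1: discover models (assumed to return a list of model-name strings).
    models = call_tool("nvidia_list_foundation_models", {})
    if not models:
        raise RuntimeError("catalog returned no models")

    # Step 2: embed the documents and the question (assumed: one vector per input).
    doc_vectors = call_tool("nvidia_generate_embeddings", {"input": documents})
    query_vector = call_tool("nvidia_generate_embeddings", {"input": [question]})[0]

    # Pick the document whose vector has the highest dot product with the question.
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    best = max(range(len(documents)), key=lambda i: dot(doc_vectors[i], query_vector))

    # Step 3: ask the chat tool, passing the best document as context.
    prompt = f"Context:\n{documents[best]}\n\nQuestion: {question}"
    return call_tool("nvidia_chat_completion", {"model": models[0], "prompt": prompt})
```

Passing `call_tool` in as a parameter keeps the workflow independent of any particular client, so you can swap in a fake during tests.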

## Tools

### nvidia_chat_completion
Sends natural language questions to a hosted LLM and receives direct, generated answers.

### nvidia_check_token_quota
Queries the system to check your current API usage limits and remaining credits for inference jobs.

### nvidia_generate_embeddings
Takes raw text inputs and converts them into numerical vectors used for semantic search.

### nvidia_get_cloud_status
Pings the core NVIDIA compute endpoints to check system latency and operational health.

### nvidia_list_foundation_models
Retrieves a list of all major LLMs and foundation models that are currently available through the catalog.

### nvidia_list_lora_adapters
Lists the available LoRA adapters (fine-tuned model overrides), allowing you to use specialized versions without retraining the whole base model.

### nvidia_summarize_content
Compresses large blocks of text into a shorter summary while retaining key information.

### nvidia_vision_inference
Processes image inputs to perform advanced visual analysis and extract data from pictures.

## Prompt Examples

**Prompt:** 
```
List all the LLMs currently hosted in the NVIDIA catalog.
```

**Response:** 
```
Called `nvidia_list_foundation_models` against the NVIDIA Cloud API. The platform returned 42 models, including Llama 3.
```

**Prompt:** 
```
Run inference with a Nemotron LLM to summarize this content.
```

**Response:** 
```
Called `nvidia_summarize_content`. The engine returned a cleanly formatted summary.
```

**Prompt:** 
```
Generate embeddings for this block of unstructured text and return only the vector arrays.
```

**Response:** 
```
Called `nvidia_generate_embeddings`. The payload returned correctly formatted vector arrays.
```

## Capabilities

### Discover available models
List all hosted LLM and foundation model configurations that are currently accessible.

### Route conversational chat queries
Send unstructured text to an active LLM for immediate, contextual answers.

### Generate numerical vector embeddings
Convert raw blocks of text into dense vectors that capture semantic meaning, well suited to similarity search in a database.

### Process visual data and images
Run specialized tasks on image inputs to extract descriptions or run advanced vision analysis.

### Check usage credits and limits
Poll the system to confirm current API quota status before running expensive inference jobs.

## Use Cases

### Building a document analysis pipeline
A user uploads a 50-page report. The agent first uses `nvidia_list_foundation_models` to confirm capability, then passes the text to `nvidia_summarize_content`. Finally, it sends the summary and key sections through `nvidia_generate_embeddings`, allowing the end-user to search specific concepts within the document later.
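A rough sketch of that pipeline is below. It assumes a hypothetical `call_tool(name, arguments)` helper supplied by your MCP client, and the argument names (`text`, `input`) and return shapes are guesses rather than the documented schema. The section splitter is plain Python and is included only because a 50-page report usually needs to be broken up before embedding.

```python
from typing import Any, Callable

# Hypothetical helper type: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict], Any]


def split_sections(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into sections of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole.
    """
    sections: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if current and len(current) + len(paragraph) + 2 > max_chars:
            sections.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        sections.append(current)
    return sections


def analyze_report(call_tool: ToolCaller, report_text: str, max_chars: int = 2000) -> dict:
    # Confirm the catalog has at least one model before doing any work.
    if not call_tool("nvidia_list_foundation_models", {}):
        raise RuntimeError("no models available")

    # Assumed: the summarizer takes {"text": ...} and returns a string.
    summary = call_tool("nvidia_summarize_content", {"text": report_text})

    # Embed the summary plus each section (assumed: one vector per input).
    texts = [summary] + split_sections(report_text, max_chars)
    vectors = call_tool("nvidia_generate_embeddings", {"input": texts})
    return {"summary": summary, "index": list(zip(texts, vectors))}
```

The returned `index` pairs each text with its vector, which is the shape most vector stores expect on insert.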

### Creating a product QA bot
The user provides an image of a complex appliance. The agent uses `nvidia_vision_inference` to extract model numbers and component names. It then passes those extracted details to `nvidia_chat_completion` to generate a tailored troubleshooting guide.
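Chaining the two tools might look something like this. Treat it as a sketch under assumptions: `call_tool` is a hypothetical client helper, and the `image_base64` and `prompt` argument names are illustrative, since the tool schemas are not specified here.

```python
import base64
from typing import Any, Callable

# Hypothetical helper type: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict], Any]


def troubleshoot_from_photo(call_tool: ToolCaller, image_bytes: bytes, symptom: str) -> str:
    # Encode the image as base64 text (assumed transport format).
    image_b64 = base64.b64encode(image_bytes).decode("ascii")

    # Step 1: extract identifying details from the picture.
    details = call_tool(
        "nvidia_vision_inference",
        {
            "image_base64": image_b64,  # assumed argument name
            "prompt": "List the model number and visible component names.",
        },
    )

    # Step 2: feed the extracted details into a chat request.
    prompt = (
        f"Appliance details extracted from a photo:\n{details}\n\n"
        f"Reported symptom: {symptom}\n"
        "Write a step-by-step troubleshooting guide."
    )
    return call_tool("nvidia_chat_completion", {"prompt": prompt})
```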

### Automating knowledge base updates
A team uploads 100 new internal articles. The agent iterates through them, using `nvidia_generate_embeddings` on each one and storing the resulting vectors in a database. This keeps the entire knowledge base fresh for future queries.
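As one possible shape for that loop, the sketch below stores vectors in SQLite as JSON text. The database choice, the table layout, the hypothetical `call_tool` helper, and the `input` argument name are all assumptions made for illustration; a dedicated vector store would work just as well.

```python
import json
import sqlite3
from typing import Any, Callable

# Hypothetical helper type: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict], Any]


def index_articles(call_tool: ToolCaller, conn: sqlite3.Connection, articles: dict[str, str]) -> int:
    """Embed each article and upsert its vector. Returns the number processed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "article_id TEXT PRIMARY KEY, vector TEXT NOT NULL)"
    )
    for article_id, text in articles.items():
        # Assumed: the tool returns a list with one vector per input string.
        vector = call_tool("nvidia_generate_embeddings", {"input": [text]})[0]
        # INSERT OR REPLACE lets re-uploaded articles overwrite stale vectors.
        conn.execute(
            "INSERT OR REPLACE INTO embeddings (article_id, vector) VALUES (?, ?)",
            (article_id, json.dumps(vector)),
        )
    conn.commit()
    return len(articles)
```

Keying rows on the article ID means re-running the job after an edit refreshes that article's vector instead of duplicating it.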

### Testing multi-step agent logic
Before deployment, an engineer runs a test suite that calls `nvidia_get_cloud_status` to verify latency. They then run a simulated chat session using `nvidia_chat_completion`, ensuring the entire sequence is stable and fast.
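A pre-deployment check along those lines could be sketched as below. The hypothetical `call_tool` helper, the `prompt` argument, and the five-second threshold are illustrative assumptions; latency here is measured on the client side rather than read from the status tool's response.

```python
import time
from typing import Any, Callable

# Hypothetical helper type: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict], Any]


def timed_call(call_tool: ToolCaller, name: str, arguments: dict) -> tuple[Any, float]:
    """Run one tool call and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = call_tool(name, arguments)
    return result, time.perf_counter() - start


def smoke_test(call_tool: ToolCaller, max_seconds: float = 5.0) -> dict:
    """Call the status tool, then a short chat, and record timing for each."""
    steps = [
        ("nvidia_get_cloud_status", {}),
        ("nvidia_chat_completion", {"prompt": "Reply with the word ok."}),
    ]
    report = {}
    for name, arguments in steps:
        result, elapsed = timed_call(call_tool, name, arguments)
        report[name] = {
            "ok": result is not None and elapsed <= max_seconds,
            "seconds": elapsed,
        }
    return report
```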

## Benefits

- Stop worrying about model discovery. Use `nvidia_list_foundation_models` to see every available LLM path in one place, making it easy for your agent to choose the right tool for the job.
- Handle complex resource management with `nvidia_check_token_quota`. Your workflow checks its own credit limits before running a massive inference task, preventing costly failures mid-process.
- Need text turned into searchable data? Pass content through `nvidia_generate_embeddings` to create reliable vector arrays that power your RAG system or semantic search engine.
- Vision tasks are now simple. Use `nvidia_vision_inference` to feed an image and get structured, actionable data back—no manual image processing needed.
- Keep your code clean by letting the MCP handle routing. Instead of writing separate logic for summarization vs. chat, just call `nvidia_summarize_content`, and the backend takes care of the rest.

## How It Works

This MCP handles the entire communication layer between your agent and NVIDIA's hosted compute resources.

1. First, your agent authenticates using the configured NVIDIA API key.
2. Next, you send a request for specific model inference, letting the MCP handle all the underlying hardware mapping and routing.
3. Finally, you receive structured completions or numerical arrays back—the data is ready to be used immediately in your application.
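Under the hood, MCP tool invocations travel as JSON-RPC 2.0 messages using the `tools/call` method. The sketch below shows roughly what steps 2 and 3 look like on the wire. The message envelope follows the MCP specification, but the tool arguments are left to the caller because this MCP's schemas are not documented here, and credential handling (step 1) is assumed to live in the server configuration rather than in each request.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request IDs


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


def extract_text(response: dict) -> str:
    """Join the text items of a `tools/call` response, raising on errors."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown JSON-RPC error"))
    result = response["result"]
    text = "\n".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(text or "tool reported an error")
    return text
```

In practice an MCP client library builds and parses these messages for you; the sketch is only meant to show what "structured completions" look like before your agent consumes them.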

## Frequently Asked Questions

**How do I check if a model exists before calling nvidia_chat_completion?**
Run `nvidia_list_foundation_models` first. This tool returns an array of all accessible model identifiers, letting you confirm the exact model name your agent needs to use.
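A small guard for this check might look like the following. It assumes a hypothetical `call_tool` helper and that the listing tool returns plain model-name strings; the model names in the usage example are made up.

```python
import difflib
from typing import Any, Callable

# Hypothetical helper type: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict], Any]


def resolve_model(call_tool: ToolCaller, wanted: str) -> str:
    """Return the catalog's exact spelling of `wanted`, or raise ValueError."""
    models = call_tool("nvidia_list_foundation_models", {})
    if wanted in models:
        return wanted

    # Fall back to a case-insensitive match before giving up.
    by_lowercase = {name.lower(): name for name in models}
    if wanted.lower() in by_lowercase:
        return by_lowercase[wanted.lower()]

    # Suggest near misses to make typos easier to spot.
    suggestions = difflib.get_close_matches(wanted, models, n=3)
    raise ValueError(f"model {wanted!r} not found; close matches: {suggestions}")
```

For example, `resolve_model(call_tool, "META/LLAMA3-8B")` would return `"meta/llama3-8b"` if the catalog lists it under that spelling.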

**Does this MCP handle API quota issues?**
Yes. You can proactively run `nvidia_check_token_quota` at the beginning of any workflow. This tells your agent exactly how many credits are left, stopping runs before they fail due to overage.
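One way to wire that check into a workflow is sketched below. The `remaining_credits` field name and the hypothetical `call_tool` helper are assumptions, since the quota tool's response format is not documented here; adjust the lookup to match what the tool actually returns.

```python
from typing import Any, Callable

# Hypothetical helper type: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict], Any]


def run_if_affordable(call_tool: ToolCaller, estimated_credits: float, job: Callable[[], Any]) -> Any:
    """Run `job` only when the remaining quota covers the estimate."""
    quota = call_tool("nvidia_check_token_quota", {})
    remaining = quota["remaining_credits"]  # assumed field name
    if remaining < estimated_credits:
        raise RuntimeError(
            f"need about {estimated_credits} credits but only {remaining} remain"
        )
    return job()
```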

**What is the difference between nvidia_generate_embeddings and chat completion?**
Chat completion generates conversational text responses. Generating embeddings converts unstructured text into dense numerical arrays, which you use for semantic search or clustering, not conversation.
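To make the distinction concrete: embeddings are consumed by arithmetic, not read as text. A common way to compare two vectors is cosine similarity, shown here in plain Python. The vectors in the usage note are toy values, not real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: list[float], candidates: dict[str, list[float]]) -> list[str]:
    """Return candidate keys ordered from most to least similar to `query`."""
    return sorted(
        candidates,
        key=lambda key: cosine_similarity(query, candidates[key]),
        reverse=True,
    )
```

With toy vectors, `rank_by_similarity([1.0, 0.0], {"a": [0.0, 1.0], "b": [1.0, 0.0]})` puts `"b"` first because it points in the same direction as the query.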

**Can I process images with this MCP?**
Yes. Use the `nvidia_vision_inference` tool. It specifically handles multimodal tasks, allowing your agent to run advanced analysis on visual data.