# DeepInfra MCP

> DeepInfra provides serverless access to high-end AI models for text, image generation, and vector embeddings. Connect your agent to run state-of-the-art LLMs like Llama 3 or DeepSeek directly. You can generate images from prompts, convert documents into searchable vectors, and handle specialized tasks (OCR, speech-to-text) all through a single connection.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** llm-inference, serverless-ai, text-to-image, embeddings, ai-models

## Description

This MCP connects your AI agent to an extensive library of open-source models without you ever touching GPU infrastructure. It handles everything from complex text generation to visual asset creation. Need to build a semantic search pipeline? You use the embeddings endpoint to convert raw text into high-dimensional vectors. Want to create marketing visuals? Just give it a prompt and get stunning images back, using models like FLUX or Stable Diffusion. And when standard LLM calls don't cut it—say you need to transcribe audio or read text from a photo—the native inference tools step in. By connecting this through Vinkius, your agent gets access to these world-class capabilities, allowing you to build complex workflows entirely within your existing coding environment.

## Tools

### create_embedding
Converts provided text into numerical vectors for semantic search or RAG systems.

### generate_image
Creates a visual image based on an input descriptive text prompt.

### create_chat_completion
Generates text by calling an LLM with specific models and message arrays.

### run_native_inference
Executes specialized models for tasks outside the standard OpenAI API spec, such as OCR or speech-to-text.

## Prompt Examples

**Prompt:** 
```
Generate a chat completion using deepseek-ai/DeepSeek-V3 to explain quantum entanglement.
```

**Response:** 
```
I'll use the `create_chat_completion` tool with the DeepSeek-V3 model to generate a detailed explanation of quantum entanglement for you.
```

**Prompt:** 
```
Create an image of a cyberpunk city at night using black-forest-labs/FLUX-1-schnell.
```

**Response:** 
```
I'm calling the `generate_image` tool with the FLUX-1-schnell model and your cyberpunk prompt. One moment while the image is generated.
```

**Prompt:** 
```
Generate embeddings for the text 'Artificial Intelligence is transforming the world' using BAAI/bge-large-en-v1.5.
```

**Response:** 
```
I'll process that text through the `create_embedding` tool using the BGE model to get the vector representation.
```

## Capabilities

### Generate Conversational Text
Use state-of-the-art models to create long-form text, summaries, or structured responses based on chat prompts.

### Create Visual Assets
Input a descriptive text prompt and receive high-resolution images generated by advanced diffusion models.

### Vectorize Documents for Search
Process any block of text, converting it into numerical vectors suitable for Retrieval-Augmented Generation (RAG) or semantic indexing.

### Handle Specialized Media Tasks
Run niche model deployments—like speech-to-text transcription or OCR—that don't follow standard LLM API formats.

## Use Cases

### Building a Knowledge Chatbot
A data engineer needs a chatbot that answers questions based on proprietary documents. They use `create_embedding` to index the PDFs into vectors, then call `create_chat_completion` with those retrieved context chunks for accurate responses.

### Generating Marketing Content
A content creator needs a visual asset library for a campaign. They use `generate_image` repeatedly in their workflow, feeding it different prompts to maintain brand consistency and speed up production time.

### Transcribing Field Recordings
An operations manager records site interviews. Instead of using a separate service, they call `run_native_inference` to pass the audio file, getting clean text transcription in one step.

## Benefits

- You get high-performance text generation instantly. Use `create_chat_completion` with models like DeepSeek-V3 to build complex conversational logic without managing any infrastructure.
- Image creation is simple. Just provide a prompt and use the `generate_image` tool to populate your application's visual assets directly from your coding environment.
- Building search pipelines becomes straightforward. Use the `create_embedding` function to turn unstructured text into usable vectors, making RAG feasible for any project size.
- Don't worry about model compatibility. The `run_native_inference` tool handles specialized needs—think OCR or Whisper audio transcription—that standard APIs ignore.
- You maintain control over the output. These tools allow you to set parameters like temperature and token counts, ensuring predictable and reliable results.

## How It Works

The bottom line is you get access to multiple specialized AI backends through one predictable connection point.

1. Subscribe to this MCP and provide your DeepInfra API Token.
2. Your AI client handles the connection, allowing your agent to call for specific model operations (e.g., text generation or image creation).
3. The platform routes the request to DeepInfra's serverless endpoints, which executes the task and returns the resulting data payload.

## Frequently Asked Questions

**Which LLM models can I use with the chat tool?**
You can use any model hosted on DeepInfra, such as `deepseek-ai/DeepSeek-V3` or `meta-llama/Llama-3.3-70B-Instruct`, by passing the model name to the `create_chat_completion` tool.

**How do I generate images using FLUX or Stable Diffusion?**
Use the `generate_image` tool. Simply provide the model name (e.g., `black-forest-labs/FLUX-1-schnell`) and your text prompt to receive the generated image URL.

**What is the 'run_native_inference' tool used for?**
It is used for models that don't follow the OpenAI chat/image spec, such as audio transcription (Whisper), specialized OCR models, or your own private model deployments on DeepInfra.

**What do I need to use an API key when running create_chat_completion?**
You must provide a valid DeepInfra API token for authentication. This token verifies your subscription and grants access to the models you're calling.

**How should I handle rate limits when using create_embedding?**
If you hit a rate limit, your agent will receive an error code telling you how long to wait. You just need to implement simple backoff logic in your workflow.

**What is the required input format for the text I pass to create_embedding?**
You must provide plain string(s) of text. The system will handle chunking and processing those inputs into high-dimensional vectors.

**Does run_native_inference support models that don't follow the standard OpenAI spec?**
Yes, that's exactly what it does. This tool lets you access specialized models for tasks like OCR or custom deployments outside of the typical LLM format.

**Can I control the output image size when using generate_image?**
You specify the desired dimensions—like 1024x1024 pixels—as part of the prompt parameters. This ensures your visual assets fit exactly where you need them.