DeepInfra MCP. Run LLMs, images, and embeddings from your agent.

Q: How do I use the createchatcompletion tool with a custom model?

You specify the full model name (e.g., deepseek-ai/DeepSeek-V3) as a parameter when calling the tool. This gives you direct control over which specific LLM you use for the conversation.

Q: Is generateimage the only way to make pictures?

Yes, generateimage is the dedicated tool for creating visuals. You provide a text prompt, and it returns the image data. You can't use the chat completion tool for this.

Q: What is the purpose of createembedding?

The createembedding tool converts raw text into high-dimensional vectors. These vectors allow your agent to perform semantic search, finding information based on meaning rather than just matching keywords.

Q: How do I handle non-standard models with runnativeinference?

You pass the specific model identifier and inputs to runnativeinference. It's designed for specialized tasks like OCR, video generation, or private deployments that don't follow the standard OpenAI format.

Q: Does createembedding support different vector dimensions?

Yes, createembedding processes text into high-dimensional vectors using models like BAAI/bge-large-en-v1.5. The resulting vector size depends on the specific embedding model you choose.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

DeepInfra (Serverless LLM Inference) MCP Server lets your AI agent run large models for text, images, and embeddings. Access state-of-the-art models like DeepSeek-V3 and FLUX-1 without managing GPU infrastructure.

It provides four core tools: `create_chat_completion` for text generation, `generate_image` for visuals, `create_embedding` for vector math, and `run_native_inference` for specialized tasks.

What your AI agents can do

Create chat completion

Generates a conversation response using a specified LLM model and message history.

Create embedding

Converts a given block of text into a numerical vector representation.

Generate image

Creates a visual image based on a detailed text prompt.

+ 1 more capabilities included

Generate conversation text

Uses the create_chat_completion tool to write responses using models like DeepSeek-V3, allowing control over creativity and length.

Create images from text

Uses the generate_image tool to turn a simple text prompt into a high-quality visual asset.

Convert text to vectors

Uses the create_embedding tool to convert any body of text into numerical vectors for advanced search and retrieval.

Run specialized models

Uses run_native_inference for models that don't follow standard specs, such as OCR or speech-to-text.

Ask AI about this MCP

Ask ChatGPT

Ask Claude

Ask Perplexity

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Waiting for input…

AI Agent

DeepInfra MCP Server: 4 Tools for Multi-Modal AI

This server gives your AI client access to four tools for advanced text generation, image creation, vector embedding, and specialized model inference.

create019e5d10

create chat completion

Generates a conversation response using a specified LLM model and message history.

create019e5d10

create embedding

Converts a given block of text into a numerical vector representation.

generate019e5d10

generate image

Creates a visual image based on a detailed text prompt.

run019e5d10

run native inference

Executes specialized models for tasks outside of standard AI specs, like OCR or speech-to-text.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with DeepInfra (Serverless LLM Inference), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Your AI agent can run big language models for text, images, and embeddings using DeepInfra. You don't gotta mess with GPUs or manage infrastructure; it's all serverless. You've got four tools here: create_chat_completion, generate_image, create_embedding, and run_native_inference.

Generating Conversation Text

To get text, you use create_chat_completion. It writes responses using top models like DeepSeek-V3, letting you control how creative it is and how long the response can be. You can make it sound exactly how you want it to.

Creating Images from Text

When you need a visual, you use generate_image. Just give it a text prompt, and it spits out a high-quality picture. You're basically telling it what you want, and it makes the art.

Converting Text to Vectors

For text, you use create_embedding. This tool takes any chunk of writing and turns it into a numerical vector. You need those vectors for advanced search or when you're doing RAG. It's how you make your data searchable by meaning, not just keywords.

Running Specialized Models

run_native_inference lets you access models that don't follow the standard AI playbook. You can run stuff like OCR or speech-to-text with it. It's for those weird, specialized tasks that need a custom engine.

How DeepInfra MCP Works

1 Subscribe to the server and provide your DeepInfra API Token.
2 Your AI client calls a specific tool (e.g., create_chat_completion) and passes the required parameters (model name, messages, etc.).
3 The server executes the model call and returns the result (text, image data, or vector) back to your agent.

The bottom line is you call the tool, and the server handles the complex model running and returns the structured data.

Who Is DeepInfra MCP For?

The developer building internal tools who needs top-tier AI capabilities without running a GPU cluster. It's for the data engineer building RAG pipelines, the content creator needing rapid visual assets, and the developer integrating LLMs into a complex workflow.

Data Engineer

Builds semantic search pipelines by using create_embedding to index documents and retrieving context for LLMs.

Full-Stack Developer

Integrates complex LLM features into a web app by calling create_chat_completion and managing the full workflow.

Marketing Content Creator

Generates large batches of visual assets and text variations by calling generate_image and create_chat_completion directly in their workspace.

What Changes When You Connect

Complex Text Generation: Use create_chat_completion with models like DeepSeek-V3 or Llama-3.3-70B. You get full control over temperature and tokens, making the output reliable for specific use cases.
Visual Asset Pipeline: The generate_image tool lets you turn any text prompt into a high-quality visual asset. You don't need a separate image API or service; it's right here.
Semantic Search Ready: The create_embedding tool converts raw text into vectors. This is the backbone of RAG and semantic search, letting your agent find information based on meaning, not keywords.
Specialized AI Handling: Need something non-standard? run_native_inference handles it. It covers tasks like OCR or Whisper speech-to-text, letting you use models that don't follow typical AI specs.
Zero Infrastructure Overhead: You run world-class AI models without managing GPUs or scaling compute. You just connect the API token and start using the tools.

Real-World Use Cases

Building a Q&A System

A data engineer needs to build a Q&A system over a private document set. They use create_embedding on all documents to create vectors. Then, when a user asks a question, the agent uses the query to search the vector index and feeds the retrieved context into create_chat_completion to generate a precise answer.

Designing Product Mockups

A marketing team needs 20 variations of a product mockup for a launch campaign. They write a base prompt and call generate_image 20 times. The agent iterates through the prompts and collects all the resulting image files for review.

Transcribing and Summarizing Meetings

A user records a meeting and needs to process the audio. They first use run_native_inference (Whisper) to convert the audio to text. Then, they pass that raw transcript into create_chat_completion to generate a concise summary and list action items.

Analyzing Competitor Screenshots

A researcher has a folder of competitor screenshots. They use run_native_inference (OCR) to extract all the visible text from the images. They then feed that structured text into create_embedding to analyze the common themes and patterns across the industry.

The Tradeoffs

Trying to generate images with text chat

Asking create_chat_completion to 'Generate an image of a cat on the moon.' The model will write a descriptive poem or a suggestion, but it won't output a usable picture file. You get text where you needed bytes.

→ You must use the generate_image tool. Pass your prompt directly to it. This is the dedicated path for visual content.

Calling chat completion for vector math

Attempting to use create_chat_completion to figure out the distance between two pieces of text. The model only outputs words; it can't perform the mathematical vector calculations required for semantic search.

→ Use the create_embedding tool. It takes text and reliably returns the mathematical vector representation needed for accurate comparison.

Ignoring specialized models

Assuming the standard LLM tools can handle non-text inputs, like a raw PDF or audio file. They can't; they only process text strings and structured data inputs.

→ Check the run_native_inference tool. It handles models for inputs like speech-to-text or OCR, making it the right place for specialized media tasks.

When It Fits, When It Doesn't

Use this server if your workflow needs more than just basic text generation. You need to combine text (LLMs), images, and data vectors. For example, if you build a Q&A system, you must call create_embedding first, then feed the result into create_chat_completion. Don't use it if you only need to send a simple email or call a basic database function—use a messaging or database tool instead. If your only need is to translate text, a dedicated translation tool is simpler. This server is for complex, multi-modal computation.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by DeepInfra. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

How we secure it →

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 4 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

create_chat_completion create_embedding generate_image run_native_inference

Dealing with disparate AI tasks feels like managing three different API keys.

Before, running an AI workflow meant juggling endpoints. You'd call an LLM API for text, then send the text to a separate image service for visuals, and finally, if you needed search, you'd hit a third vector API. The process was a multi-step chore: copy the output from the first API, paste it into the second, and manually manage tokens and rate limits across three different services.

Now, you call DeepInfra once. Your agent uses the necessary tools—`create_chat_completion`, `generate_image`, or `create_embedding`—all from one place. You get the text, the image, or the vector, and your workflow stays contained. It's just cleaner.

DeepInfra MCP Server: Run specialized model tasks.

You don't have to use standard LLM tools for everything. If you're dealing with audio transcripts (Whisper) or raw document scans (OCR), you used to have to build a custom wrapper around those specific APIs. It was extra work just to get the input into the right format.

Now, the `run_native_inference` tool handles those specialized models. It lets you process raw inputs—like speech or scanned documents—directly without building custom middleware. It's just plug-and-play.

Common Questions About DeepInfra MCP

How do I use the `create_chat_completion` tool with a custom model? +

You specify the full model name (e.g., deepseek-ai/DeepSeek-V3) as a parameter when calling the tool. This gives you direct control over which specific LLM you use for the conversation.

Is `generate_image` the only way to make pictures? +

Yes, generate_image is the dedicated tool for creating visuals. You provide a text prompt, and it returns the image data. You can't use the chat completion tool for this.

What is the purpose of `create_embedding`? +

The create_embedding tool converts raw text into high-dimensional vectors. These vectors allow your agent to perform semantic search, finding information based on meaning rather than just matching keywords.

Can `run_native_inference` handle any AI model? +

It handles models that fall outside the standard OpenAI specifications. This includes specialized tasks like speech-to-text (Whisper) or Optical Character Recognition (OCR).

What kind of models can I use with `create_chat_completion`? +

You can use a massive library of open-source models, including DeepSeek-V3 and Llama 3. This gives you control over the model you use, letting you pick the best fit for your specific task.

How do I handle non-standard models with `run_native_inference`? +

You pass the specific model identifier and inputs to run_native_inference. It's designed for specialized tasks like OCR, video generation, or private deployments that don't follow the standard OpenAI format.

Are there limits on the images I can generate using `generate_image`? +

While usage limits are set by DeepInfra, the tool allows you to generate stunning visuals using models like FLUX-1 or Stable Diffusion. Check the provider's documentation for current rate limits.

Does `create_embedding` support different vector dimensions? +

Yes, create_embedding processes text into high-dimensional vectors using models like BAAI/bge-large-en-v1.5. The resulting vector size depends on the specific embedding model you choose.

Which LLM models can I use with the chat tool? +

You can use any model hosted on DeepInfra, such as deepseek-ai/DeepSeek-V3 or meta-llama/Llama-3.3-70B-Instruct, by passing the model name to the create_chat_completion tool.

How do I generate images using FLUX or Stable Diffusion? +

Use the generate_image tool. Simply provide the model name (e.g., black-forest-labs/FLUX-1-schnell) and your text prompt to receive the generated image URL.

What is the 'run_native_inference' tool used for? +

It is used for models that don't follow the OpenAI chat/image spec, such as audio transcription (Whisper), specialized OCR models, or your own private model deployments on DeepInfra.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python

Pydantic AI sdk-python

Vercel AI SDK sdk-typescript