DeepInfra MCP for AI. Run LLMs, Images, and Embeddings from your agent.

Q: Which LLM models can I use with the chat tool?

You can use any model hosted on DeepInfra, such as deepseek-ai/DeepSeek-V3 or meta-llama/Llama-3.3-70B-Instruct, by passing the model name to the createchatcompletion tool.

Q: How do I generate images using FLUX or Stable Diffusion?

Use the generateimage tool. Simply provide the model name (e.g., black-forest-labs/FLUX-1-schnell) and your text prompt to receive the generated image URL.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

Connect to your AI in seconds.

DeepInfra provides serverless access to high-end AI models for text, image generation, and vector embeddings. Connect your agent to run state-of-the-art LLMs like Llama 3 or DeepSeek directly.

You can generate images from prompts, convert documents into searchable vectors, and handle specialized tasks (OCR, speech-to-text) all through a single connection.

What your AI can do

Create embedding

Converts provided text into numerical vectors for semantic search or RAG systems.

Generate image

Creates a visual image based on an input descriptive text prompt.

Create chat completion

Generates text by calling an LLM with specific models and message arrays.

+ 1 more capabilities included

Generate Conversational Text

Use state-of-the-art models to create long-form text, summaries, or structured responses based on chat prompts.

Create Visual Assets

Input a descriptive text prompt and receive high-resolution images generated by advanced diffusion models.

Vectorize Documents for Search

Process any block of text, converting it into numerical vectors suitable for Retrieval-Augmented Generation (RAG) or semantic indexing.

Handle Specialized Media Tasks

Run niche model deployments—like speech-to-text transcription or OCR—that don't follow standard LLM API formats.

Ask an AI about this

Included with Plan

Waiting for input…

AI Agent

DeepInfra (Serverless LLM Inference) MCP - 4 Tools

Use these four tools to manage the full spectrum of model operations: chat completions, image generation, vector embeddings, and specialized native inference.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using DeepInfra (Serverless LLM Inference) on Vinkius

Create Embedding

Converts provided text into numerical vectors for semantic search or RAG systems.

Generate Image

Creates a visual image based on an input descriptive text prompt.

Create Chat Completion

Generates text by calling an LLM with specific models and message arrays.

Run Native Inference

Executes specialized models for tasks outside the standard OpenAI API spec, such as...

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for HTTPS Streamable, SSE, and OAuth2—zero messy routing required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The DeepInfra integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "deepinfra-serverless-llm-inference": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the DeepInfra tools with full Vinkius guardrails applied.

VS Code Copilot

⚡

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"deepinfra-serverless-llm-inference": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

Use the SSEClient in LangChain to connect to the Vinkius managed endpoint:

from langchain_mcp_adapters.client import SSEClient

# Connect to Vinkius
client = SSEClient(url="https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp")
tools = client.get_tools()

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

from crewai import Agent
from mcp_crewai import MCPTool

# Connect securely to Vinkius
vinkius_tools = MCPTool(url="https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp")

# Assign to Agent
researcher = Agent(
    role='Data Researcher',
    tools=vinkius_tools.get_all()
)

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with DeepInfra (Serverless LLM Inference), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by DeepInfra. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 4 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Handling Specialized Model Calls

Today, if your chatbot needs to read text from a photo or transcribe an uploaded voice memo, you're forced to call three different services. You manage separate credentials for the general LLM, one for image processing, and another just for audio/vision tasks. This adds complexity and latency.

With this MCP, you use `run_native_inference`. It consolidates those specialized endpoints—OCR, Whisper, etc.—under one roof. Your agent calls a single tool, and it gets the result back. It's clean.

Generating Images with DeepInfra

Previously, generating an image required you to switch from your coding IDE over to a separate web UI. You'd copy the prompt, manually adjust the model settings (like aspect ratio), hit generate, and then wait for the asset to download before pasting it into your code.

Now, you call `generate_image` directly. The result is returned as data within your workflow. No context switching, no external UI needed. You just get the image.

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

What your AI can actually do with this

This MCP connects your AI agent to an extensive library of open-source models without you ever touching GPU infrastructure. It handles everything from complex text generation to visual asset creation. Need to build a semantic search pipeline? You use the embeddings endpoint to convert raw text into high-dimensional vectors. Want to create marketing visuals? Just give it a prompt and get stunning images back, using models like FLUX or Stable Diffusion.

And when standard LLM calls don't cut it—say you need to transcribe audio or read text from a photo—the native inference tools step in. By connecting this through Vinkius, your agent gets access to these world-class capabilities, allowing you to build complex workflows entirely within your existing coding environment.

Built · Hosted · Managed by Vinkius DeepInfra MCP - LLMs, Images, Embeddings

Server ID 019e5d11-145b-70a0-9911-dfb2bf1aebfd

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

What Changes When You Connect

You get high-performance text generation instantly. Use create_chat_completion with models like DeepSeek-V3 to build complex conversational logic without managing any infrastructure.

Image creation is simple. Just provide a prompt and use the generate_image tool to populate your application's visual assets directly from your coding environment.

Building search pipelines becomes straightforward. Use the create_embedding function to turn unstructured text into usable vectors, making RAG feasible for any project size.

Don't worry about model compatibility. The run_native_inference tool handles specialized needs—think OCR or Whisper audio transcription—that standard APIs ignore.

You maintain control over the output. These tools allow you to set parameters like temperature and token counts, ensuring predictable and reliable results.

See it in action

01 01

Building a Knowledge Chatbot

A data engineer needs a chatbot that answers questions based on proprietary documents. They use create_embedding to index the PDFs into vectors, then call create_chat_completion with those retrieved context chunks for accurate responses.

02 02

Generating Marketing Content

A content creator needs a visual asset library for a campaign. They use generate_image repeatedly in their workflow, feeding it different prompts to maintain brand consistency and speed up production time.

03 03

Transcribing Field Recordings

An operations manager records site interviews. Instead of using a separate service, they call run_native_inference to pass the audio file, getting clean text transcription in one step.

The honest tradeoffs

Assuming LLMs handle everything

Anti-pattern

Trying to use create_chat_completion for OCR. You'll get a vague failure because the model expects text input, not image data.

The Fix

If you need to read structured data from an image or document, don't rely on general chat tools. Use run_native_inference instead; it has specific endpoints designed for visual and specialized data extraction.

Building a search index manually

Anti-pattern

Writing custom Python scripts to handle text chunking, sending the chunks to an embedding service, and then storing them in a database.

The Fix

Skip the boilerplate. Use create_embedding directly within your agent workflow. It handles the vectorization call for you, keeping the logic clean.

Mixing up model APIs

Anti-pattern

Using one tool for chat and a different system for image generation, forcing multiple credentials and connection management.

The Fix

Keep it centralized. This MCP unifies everything under DeepInfra's infrastructure. Use create_chat_completion, generate_image, or run_native_inference all from the same Vinkius connection.

When It Fits, When It Doesn't

Use this MCP if your project requires a multi-modal pipeline: text generation plus image creation, or vectorization plus specialized data processing. Specifically, if you need to handle anything outside standard LLM chat—like OCR (via run_native_inference) or semantic search (create_embedding)—this is necessary. Don't use it if your task is purely monolithic; for instance, if you only need simple text generation, a dedicated chat-only tool might be lighter weight. But remember: when complexity increases, this MCP handles the routing and resource pooling across all four domains.

Questions you might have

Which LLM models can I use with the chat tool? +

You can use any model hosted on DeepInfra, such as deepseek-ai/DeepSeek-V3 or meta-llama/Llama-3.3-70B-Instruct, by passing the model name to the create_chat_completion tool.

How do I generate images using FLUX or Stable Diffusion? +

Use the generate_image tool. Simply provide the model name (e.g., black-forest-labs/FLUX-1-schnell) and your text prompt to receive the generated image URL.

What is the 'run_native_inference' tool used for? +

It is used for models that don't follow the OpenAI chat/image spec, such as audio transcription (Whisper), specialized OCR models, or your own private model deployments on DeepInfra.

What do I need to use an API key when running create_chat_completion? +

You must provide a valid DeepInfra API token for authentication. This token verifies your subscription and grants access to the models you're calling.

How should I handle rate limits when using create_embedding? +

If you hit a rate limit, your agent will receive an error code telling you how long to wait. You just need to implement simple backoff logic in your workflow.

What is the required input format for the text I pass to create_embedding? +

You must provide plain string(s) of text. The system will handle chunking and processing those inputs into high-dimensional vectors.

Does run_native_inference support models that don't follow the standard OpenAI spec? +

Yes, that's exactly what it does. This tool lets you access specialized models for tasks like OCR or custom deployments outside of the typical LLM format.

Can I control the output image size when using generate_image? +

You specify the desired dimensions—like 1024x1024 pixels—as part of the prompt parameters. This ensures your visual assets fit exactly where you need them.

Connect to your AI in seconds.

Create embedding

Generate image

Create chat completion

DeepInfra (Serverless LLM Inference) MCP - 4 Tools

Make your AI actually useful.

Create Embedding

Generate Image

Create Chat Completion

Run Native Inference

Security and governance baked right in.

Claude AI

Open Claude Settings

Add Custom Connector

Start a conversation

Claude Code

Open your terminal

Add the MCP Server

Start coding

Cursor

One-Click Install (Recommended)

Open Cursor Settings

Add New Server

Use in Composer

Antigravity

Configure Agent Environment

Bind the Endpoint

Execute

VS Code Copilot

One-Click Install (Recommended)

Open MCP Settings

Add Server Config

Windsurf

One-Click Install (Recommended)

Open Windsurf Settings

Add Server Endpoint

LangChain

Install Dependencies

Connect the Server

CrewAI

Define the Tool

Execute Task

Choose How to Get Started

Build Your Own

Make Your AI Do More

Works with Claude, ChatGPT, Cursor, and more

Handling Specialized Model Calls

Generating Images with DeepInfra

What your AI can actually do with this

Here's how it actually works

Who is this actually for?

What Changes When You Connect

See it in action

Building a Knowledge Chatbot

Generating Marketing Content

Transcribing Field Recordings

The honest tradeoffs

Assuming LLMs handle everything

Building a search index manually

Mixing up model APIs

When It Fits, When It Doesn't

Questions you might have