How to Use the Cerebras Inference MCP in Claude

Q: Can my Claude agent use Cerebras Inference to process a file?

Yes. Use the uploadfile tool to send a JSONL file to the server. Then, tell your agent to start a job with createbatch using that file.

Q: What's the difference between createcompletion and createchatcompletion in Cerebras Inference for Claude Desktop?

createcompletion is for straightforward text generation from a single prompt. Use createchatcompletion for multi-turn conversations, where you provide a history of messages for more context-aware replies.

Q: How is my data privacy handled with the Cerebras Inference MCP Server?

Your prompts and any uploaded JSONL files are sent to the Cerebras server for processing. Vinkius manages the server in a zero-trust, ephemeral environment. All connections are encrypted, and your data is only used to fulfill the inference request.

Run Cerebras inference jobs, manage models, and analyze results right from your Claude Desktop chat. No terminal switching needed.

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

MCP Servers - Free for Subscribers

Connect Cerebras Inference MCP to Claude Desktop

Create your Vinkius account to connect Cerebras Inference to Claude Desktop and route execution through our secure gateway. The platform manages server hosting, runtime updates, and security layers. Configuration requires no manual server provisioning.

GDPR Free for Subscribers

Setup Cerebras Inference with Claude Desktop

Ask AI about this MCP

ChatGPT

Claude

Perplexity

Generate Text and Chat Completions

Ask Claude to generate text using the Cerebras Wafer-Scale Engine. The `create_completion` tool handles simple prompts, while `create_chat_completion` is built for structured, multi-turn conversations where context is key. You're just having a conversation in the Claude Desktop app, but the heavy lifting is done by serious hardware. Ask your agent to `list_models`, pick one, and then tell it to generate something. It's a direct line from your chat window to a high-performance inference engine.

Manage Batch Jobs from your Claude MCP Server

Kick off large, asynchronous jobs without tying up your chat. Use the `upload_file` tool to send up a `JSONL` file, then tell Claude to start processing it with `create_batch`. You don't have to sit and wait. Later, just ask, "What's the status of my last batch job?" and Claude uses `get_batch` or `list_batches` to give you an update. This turns your chat client into a control panel for large-scale inference tasks.

Inspect Models and Server Health

Get the specs on any model. Ask Claude for details on a specific one using `get_model`, or see what's available to everyone with `list_public_models`. The entire Cerebras model catalog is now available through chat. This isn't just for running jobs. You can also monitor the server's operational health. The `get_metrics` tool pulls Prometheus-formatted data, so you can ask Claude to check for errors or performance dips without leaving the app.

Setup guide

Set up Cerebras Inference MCP in Claude Web or Desktop

1

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.
2

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL: https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.
3

Start a conversation

Open a new chat. The Cerebras Inference MCP tools are available immediately — no restart needed.

Endpoint URL

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

No configuration file needed — paste the URL directly in the Claude web interface.

Available on Free (1 connector), Pro, Max, Team, and Enterprise plans.

Prerequisites

Claude Desktop installed (macOS or Windows)
Active Vinkius subscription with a valid endpoint token

1

Open Claude Desktop Settings

Click the menu icon at the top-left corner, go to Settings → Developer → Edit Config. This opens claude_desktop_config.json in your default text editor.
2

Paste the Cerebras Inference MCP configuration

Copy the JSON snippet on the right into the mcpServers object. Replace [YOUR_TOKEN_HERE] with your endpoint token from cloud.vinkius.com.
3

Restart Claude Desktop

Close and reopen the application. Claude needs a full restart to load new MCPs — refreshing a conversation is not enough.
4

Verify the connection

Open a new conversation. Click the 🔌 icon at the bottom of the message input. You should see tools listed under cerebras-inference-mcp.

json

{
  "mcpServers": {
    "cerebras-inference-mcp": {
      "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
    }
  }
}

Get your connection token →

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Cerebras Inference. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

Why Choose Vinkius

Vinkius connects your tools to AI with real-time monitoring and automatic cost savings — all from one dashboard.

Connect Cerebras Inference now

Real-time monitoring

Live

visibility into every interaction

Connect your favorite tools to your AI and see exactly what's happening — every request, every response, in real time.

Built-in savings

60%

lower AI costs

Vinkius compresses data between your apps and your AI automatically. Lower bills every month — no configuration required.

Single dashboard

One

place for every integration

Every tool your AI connects to, managed from a single screen. One account, complete control.

Common questions about Cerebras Inference MCP in Claude Desktop

Add the Vinkius MCP Server URL in your Claude settings under Integrations. Once you connect it, the tools are ready to use in your next chat. There's nothing to install locally.

Yes. Use the `upload_file` tool to send a JSONL file to the server. Then, tell your agent to start a job with `create_batch` using that file.

`create_completion` is for straightforward text generation from a single prompt. Use `create_chat_completion` for multi-turn conversations, where you provide a history of messages for more context-aware replies.

Absolutely. Just ask Claude to list the available models. It will use the `list_models` tool to show you everything you can use for inference.

Your prompts and any uploaded `JSONL` files are sent to the Cerebras server for processing. Vinkius manages the server in a zero-trust, ephemeral environment. All connections are encrypted, and your data is only used to fulfill the inference request.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python

Pydantic AI sdk-python

Vercel AI SDK sdk-typescript