How to Use the Cerebras Inference MCP in Claude Code

Q: How do I install the Cerebras Inference MCP Server for my CLI?

Run `claude mcp add --transport http cerebras-inference-mcp https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp` in your terminal, replacing [YOUR_TOKEN_HERE] with your endpoint token. This adds the server to your local MCP configuration file.

Pipe fast wafer-scale completions into your terminal pipelines by linking Claude Code to Cerebras Inference.

Connect Cerebras Inference MCP to Claude Code

Create your Vinkius account to connect Cerebras Inference to Claude Code and route execution through our secure gateway. The platform manages server hosting, runtime updates, and security layers. Configuration requires no manual server provisioning.

Automate batch jobs with this MCP Server

The `create_batch` tool lets you run massive asynchronous jobs directly from your CLI session. Claude Code uploads your JSONL datasets with `upload_file` and starts the batch run. You can pipe the status check from `get_batch` into other shell scripts. Once finished, Claude Code downloads the raw output using `get_file_content` to feed your local data pipelines.
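
Before asking Claude Code to upload a dataset, a quick local check can catch a malformed file early. The sketch below is only a shallow shape check (one JSON object per line), not a JSON validator. The function name `check_jsonl` and the file name in the usage note are made up for illustration, and the request schema each line must follow is defined by the Cerebras batch documentation, not by this script.

```shell
# check_jsonl FILE: shallow pre-flight check for a JSONL batch file.
# Passes only if FILE is non-empty and every non-blank line starts with "{"
# and ends with "}". This does not prove the JSON itself is valid.
check_jsonl() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "error: $file is missing or empty" >&2
    return 1
  fi
  awk '
    NF == 0 { next }                          # skip blank lines
    { count++ }                               # count every request line
    !/^[ \t]*[{].*[}][ \t]*$/ {               # not a single-line object
      printf "line %d does not look like a JSON object\n", NR | "cat 1>&2"
      bad = 1
    }
    END {
      if (bad || count == 0) exit 1
      printf "%d request lines\n", count
    }
  ' "$file"
}
```

For example, `check_jsonl requests.jsonl && echo "ready to upload"` prints the line count and the confirmation only when every line passes.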

Query available models directly from the CLI

The `list_models` tool fetches the list of active models inside your terminal session. Claude Code uses this to verify which models are available before running heavy automation scripts. To check details for a specific model, Claude Code runs `get_model`, for example to confirm its context window limit. This helps keep shell commands from failing because of a model mismatch.
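
A script can apply the same guard itself. The sketch below assumes you have saved the model IDs reported by `list_models` to a text file, one ID per line. That file layout and the names `require_model`, `models.txt`, and `example-model-id` are illustrative assumptions, not part of the MCP server.

```shell
# require_model MODEL_ID FILE: succeed only if MODEL_ID is an exact,
# whole-line match in FILE (assumed layout: one model ID per line).
require_model() {
  model="$1"
  file="$2"
  if [ ! -r "$file" ]; then
    echo "error: cannot read $file" >&2
    return 2
  fi
  # -F: fixed string, -x: whole line must match, -q: no output, status only
  grep -Fxq -- "$model" "$file"
}
```

Placed at the top of an automation script, `require_model example-model-id models.txt || exit 1` stops the run before any heavy work starts if the model is not in the saved list.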

Monitor Cerebras Inference throughput via CLI

The `get_metrics` tool pulls live Prometheus-formatted performance metrics into your terminal. Claude Code reads this data to show you real-time token speeds and queue wait times. This is useful for DevOps engineers who need to monitor API health during automated cron jobs. You can also log these metrics to a local file for later analysis.
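
Because the metrics arrive in the Prometheus text format, a small awk filter is enough to pull out a single value for logging. This is a sketch that assumes the metrics text has been saved to a local file. The function name `metric_value` and the metric names in the usage note are hypothetical, since the actual metric names depend on what `get_metrics` returns.

```shell
# metric_value NAME FILE: print the value of the first sample called NAME in a
# file holding Prometheus text-format metrics. Labels are ignored, and label
# values that themselves contain "}" are not handled.
metric_value() {
  awk -v name="$1" '
    /^#/ || NF == 0 { next }            # skip HELP/TYPE comments and blanks
    {
      line = $0
      sub(/[{][^}]*[}]/, "", line)      # drop the {label="..."} block, if any
      split(line, field, " ")           # field[1] = name, field[2] = value
      if (field[1] == name) { print field[2]; found = 1; exit }
    }
    END { if (!found) exit 1 }          # non-zero status when NAME is absent
  ' "$2"
}
```

To keep a history, a cron job could append a timestamped reading with `printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(metric_value example_tokens_per_second metrics.prom)" >> metrics.log`.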

Setup guide

Set up Cerebras Inference MCP in Claude Code

Prerequisites

Claude Code CLI installed (npm install -g @anthropic-ai/claude-code)
Active Vinkius subscription with a valid endpoint token

1. Run the add command

Open your terminal and run the command shown below. Replace [YOUR_TOKEN_HERE] with your endpoint token from cloud.vinkius.com. Use --scope user to make it available across all projects.

2. Verify the connection

Start a Claude Code session and type /mcp to list connected servers. You should see cerebras-inference-mcp with a green status indicator.

3. Start using tools

Ask Claude Code something like "List the Cerebras Inference models that are currently available." It will automatically discover and invoke the available Cerebras Inference tools.

Terminal

claude mcp add --transport http cerebras-inference-mcp https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp
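
For scripted setups, the same command can be wrapped so the token is read from an environment variable and the server is registered at user scope. This is a sketch: `VINKIUS_TOKEN`, `mcp_url`, and `register_server` are hypothetical names, while `--scope user` and `claude mcp list` are the Claude Code options for user-level registration and for listing configured servers.

```shell
# mcp_url TOKEN: build the endpoint URL used in the command above.
mcp_url() {
  printf 'https://edge.vinkius.com/%s/mcp\n' "$1"
}

# register_server: add the server for all projects, then list configured servers.
# VINKIUS_TOKEN is a hypothetical variable name; export it before calling.
register_server() {
  if [ -z "${VINKIUS_TOKEN:-}" ]; then
    echo "error: set VINKIUS_TOKEN to your endpoint token first" >&2
    return 1
  fi
  claude mcp add --transport http --scope user cerebras-inference-mcp \
    "$(mcp_url "$VINKIUS_TOKEN")" || return 1
  claude mcp list
}
```

Reading the token from the environment keeps it out of the script itself, and the early return avoids registering a server with an empty token in the URL.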

Prerequisites

Claude Code CLI installed
Active Vinkius subscription with a valid endpoint token

1. Open the config file

Create or edit .mcp.json in your project root for project-level scope, or ~/.claude.json for user-level scope.

2. Add the Cerebras Inference MCP

Paste the JSON snippet shown below into the mcpServers object. Replace [YOUR_TOKEN_HERE] with your endpoint token from cloud.vinkius.com.

3. Restart Claude Code

Start a new Claude Code session. Type /mcp to confirm the server is connected. The tools will be automatically available in your conversation.

.mcp.json

{
  "mcpServers": {
    "cerebras-inference-mcp": {
      "type": "http",
      "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
    }
  }
}
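
If you prefer to generate the file from a script, the snippet can be written with a here-document. This is a minimal sketch, assuming a fresh project with no existing .mcp.json: the function name `write_mcp_config` is made up, the function refuses to overwrite an existing file rather than merging into it, and it sets the type to `http` to match the `--transport http` flag used by the CLI command.

```shell
# write_mcp_config TOKEN [TARGET]: write a project-level MCP config file.
# TARGET defaults to .mcp.json in the current directory. An existing file is
# never overwritten, because it may already hold other servers.
write_mcp_config() {
  token="$1"
  target="${2:-.mcp.json}"
  if [ -z "$token" ]; then
    echo "error: endpoint token is required" >&2
    return 1
  fi
  if [ -e "$target" ]; then
    echo "error: $target already exists, edit it by hand instead" >&2
    return 1
  fi
  cat > "$target" <<EOF
{
  "mcpServers": {
    "cerebras-inference-mcp": {
      "type": "http",
      "url": "https://edge.vinkius.com/${token}/mcp"
    }
  }
}
EOF
}
```

Since the generated file contains your endpoint token, consider keeping it out of version control.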

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Cerebras Inference. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

Why Choose Vinkius

Vinkius connects your tools to AI with real-time monitoring and automatic cost savings, all from one dashboard.

Real-time monitoring: live visibility into every interaction

Connect your favorite tools to your AI and see exactly what's happening in real time, request by request and response by response.

Built-in savings: 60% lower AI costs

Vinkius compresses data between your apps and your AI automatically, which lowers monthly bills with no configuration required.

Single dashboard: one place for every integration

Every tool your AI connects to, managed from a single screen. One account, complete control.

Common questions about Cerebras Inference MCP in Claude Code

Batch jobs can be driven entirely from the terminal: you can trigger batch runs, upload JSONL datasets, and track progress, while Claude Code handles the API polling autonomously.

The server connects your terminal agent to ultra-fast hardware that generates completions in milliseconds. This allows complex shell scripts and multi-step terminal tasks to execute without waiting on slow API responses.

A running batch job can also be cancelled. Tell Claude Code to stop the job, and it will execute the cancel tool to terminate the run on the remote cluster.

Your API keys and chat payloads are transmitted directly to the Cerebras API endpoints over TLS. Vinkius hosts the server in a secure, ephemeral sandbox that never caches or logs your credentials.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)
Google ADK (Python)
Pydantic AI (Python)
Vercel AI SDK (TypeScript)