4,500+ servers built on MCP Fusion
Vinkius

RunPod MCP. Manage your entire GPU compute lifecycle via chat.

Claude
ChatGPT
Cursor
Gemini
Windsurf
VS Code
JetBrains
Vercel

Works with every AI agent you already use

…and any MCP-compatible client

Also listed as compatible: Cursor, Claude Desktop, OpenAI Agents SDK, Visual Studio Code, GitHub Copilot, Google Gemini, Lovable, Mistral AI Agents, and Amazon Bedrock.

Just plug in your AI agents and start using Vinkius.

RunPod MCP Server connects your AI client directly to RunPod's cloud infrastructure. It lets you manage GPU pods—creating new instances, checking existing ones, and stopping compute cycles—all through natural language commands.

Need to provision GPUs for a large ML model or check your serverless endpoints? This server manages the entire lifecycle of those high-power compute resources.

What your AI agents can do

Provision GPU Pods

Creates a new, dedicated computing instance (pod) with specified GPU types and Docker images.

Check Instance Status

Retrieves detailed information for a specific, existing GPU pod ID.

List All Resources

Shows every active or paused GPU pod currently registered in the account.

Halt Compute Cycles

Stops a running GPU pod, immediately pausing billable compute activity and saving costs.

Audit Endpoints

Lists all registered serverless endpoints that route containerized inference applications.

Plan Resource Needs

Retrieves available GPU hardware types and saved pod templates for deployment planning.

Free for Subscribers


RunPod MCP Server: 7 Tools for GPU Pod Management

These tools let you manage the entire lifecycle of your compute resources—from listing available GPUs to spinning up new pods and shutting them down.

create_pod

Creates a new GPU pod by specifying the name, required GPU type, and Docker image.

get_pod

Fetches all specific details for one particular GPU pod using its unique ID.

list_endpoints

Lists every serverless endpoint currently configured to handle containerized inference applications.

list_gpu_types

Retrieves a list of all available GPU hardware types you can deploy.

list_pods

Provides an inventory listing of every GPU pod currently in the account.

list_templates

Shows saved and pre-configured templates that can be used for new pod creation.

stop_pod

Stops a running GPU pod, immediately halting compute billing for that pod.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

  • Import from OpenAPI, Swagger, or YAML specs
  • Create Agent Skills with progressive disclosure
  • Deploy to edge with MCPFusion framework
  • Built-in DLP, auth, and compliance on every call
  • Real-time usage dashboard and cost metering
  • Publish to catalog or keep private

Make Your AI Do More

Start with RunPod, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

  • Use this MCP plus 4,700+ others, all in one place
  • Add new capabilities to your AI anytime you want
  • Every connection is secured and compliant automatically
  • Track usage and costs across all your servers
  • Works with Claude, ChatGPT, Cursor, and more
  • New servers added to the catalog every week

What you can do with this MCP connector


create_pod spins up a brand-new GPU pod from the name, GPU type, and Docker image you specify. list_pods gives you an inventory of every GPU pod in your account, running or paused. To check the details of one pod, get_pod fetches everything about it by its unique ID.

To plan what you need to build, list_gpu_types retrieves a clean rundown of every GPU hardware type you can deploy. list_templates shows the saved, pre-configured setups you can use when starting new pods. And list_endpoints lists every serverless endpoint configured to handle containerized inference applications.

When you're done with a pod, stop_pod shuts it down right away, halting billing for its compute cycles. Together, these tools let your agent act like an MLOps engineer: managing computational workloads, spinning up new hardware, and auditing serverless endpoints, all through chat.

Here’s the rundown of what you can do with these tools:

Provisioning Compute: Need to build something? Use create_pod to spin up a dedicated computing instance (a pod). Just give it the name, the GPU type, and the Docker image. If you're planning ahead, checking available hardware with list_gpu_types or reviewing saved configurations via list_templates keeps the build process smooth.

Checking Status & Inventory: You can get a complete picture of what’s going on by running list_pods, which shows every active or paused pod registered to the account. Need details on just one? Use get_pod with its unique ID to pull all specific info. You can also check out all configured endpoints using list_endpoints, seeing exactly where your containerized inference applications are running.

Cost Management: When the job is done, stop the pod fast. Running stop_pod immediately halts a running GPU pod, so billing for its compute cycles stops right there. This keeps your costs under control.

The Lifecycle: Together, the tools cover the whole lifecycle of high-power compute resources. Start a pod with create_pod, check its status with get_pod, and stop it with stop_pod when you're done. Around that core, list_gpu_types shows the hardware available for future builds, list_pods gives the full inventory, list_templates shows pre-set options, and list_endpoints lists your serverless endpoints.
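The lifecycle can be pictured as a small state machine. This Python sketch uses simplified, hypothetical states (`NOT_CREATED`, `RUNNING`, `STOPPED`) rather than RunPod's actual status values, and encodes only what this page describes: create_pod starts a pod, stop_pod stops it, the listing and lookup tools are read-only, and there is no delete tool.

```python
from enum import Enum


class PodState(Enum):
    """Simplified pod states for illustration; real RunPod statuses may differ."""
    NOT_CREATED = "not_created"
    RUNNING = "running"
    STOPPED = "stopped"


# State-changing tools and the transition each one performs.
# There is deliberately no delete tool in this server.
TRANSITIONS = {
    (PodState.NOT_CREATED, "create_pod"): PodState.RUNNING,
    (PodState.RUNNING, "stop_pod"): PodState.STOPPED,
}

# Tools that only read state and never change it.
READ_ONLY_TOOLS = {"get_pod", "list_pods", "list_gpu_types", "list_templates", "list_endpoints"}


def apply_tool(state: PodState, tool: str) -> PodState:
    """Return the pod's next state after calling `tool`, or raise if the call is invalid."""
    if tool in READ_ONLY_TOOLS:
        return state
    try:
        return TRANSITIONS[(state, tool)]
    except KeyError:
        raise ValueError(f"{tool} is not valid for a pod in state {state.value}") from None


state = PodState.NOT_CREATED
for tool in ["list_gpu_types", "create_pod", "get_pod", "stop_pod"]:
    state = apply_tool(state, tool)
    print(f"{tool:>15} -> {state.value}")
```

Modeling it this way makes the one irreversible-looking step explicit: once stopped, the sketch offers no further transition, because none of the seven tools restarts or deletes a pod.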


You're not just reading data here; you're executing changes on a live cloud account. create_pod provisions real, billable hardware and stop_pod halts it, while get_pod, list_pods, list_gpu_types, list_templates, and list_endpoints only read the state of your infrastructure.

How RunPod MCP Works

  1. First, enable the RunPod orchestration integration in your core interface.
  2. Next, sign in to your RunPod cloud console. Go to 'Settings' > 'API Keys', generate a new API key with Read/Write permissions, and copy it.
  3. Finally, paste that secret key into the secure connection module below. Your agent can now run commands like, "List all active GPU pods and point out any that are sitting idle."

The bottom line is: you link your AI client to RunPod using an API key so it can execute infrastructure management tasks directly on your cloud account.

Who Is RunPod MCP For?

This is for the MLOps Specialist who's tired of clicking through three different dashboards just to see which pods are running. It's also for the AI Developer who needs to provision a high-power GPU pod without leaving their coding environment. If your job involves starting or stopping serious hardware, this is for you.

DevOps Engineer

Manages the full lifecycle of compute resources—provisioning pods via create_pod, listing all assets with list_pods, and shutting down costs using stop_pod.

MLOps Specialist

Needs to audit serverless deployments by running list_endpoints or check the latest available GPU hardware types using list_gpu_types before deployment.

AI Architect

Designs and manages complex, multi-stage agent pipelines, and uses create_pod to spin up pods from specific templates for testing model performance.

What Changes When You Connect

  • Cost Control: Need to stop burning money? Just ask the agent to run stop_pod. It halts active hourly billing for a specific pod ID, letting you save cash instantly without logging into the web dashboard.
  • Instant Provisioning: Don't waste time searching documentation. You can use create_pod to spin up an entirely new GPU node—specifying everything from name to image—and it happens right in your chat interface.
  • Resource Planning: Before you deploy, check what hardware is available with list_gpu_types. It gives you a clean list of every GPU type you can deploy, so you can decide whether the model needs an A10G or something else.
  • Full Visibility: Run list_pods to get a complete inventory view of your entire GPU account. You can see every pod—running, paused, etc.—without navigating through multiple tabs.
  • Serverless Auditing: Keep track of all your microservices with list_endpoints. It shows every registered serverless endpoint serving a containerized inference application.
  • Template Reuse: If you built a perfect setup once, don't rebuild it. Use list_templates to see saved configurations, then use them when calling create_pod for faster deployments.
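The Resource Planning step boils down to filtering the list_gpu_types output against what the model needs. A minimal Python sketch, assuming each record carries hypothetical `id` and `memory_gb` fields and that GPU memory is the only constraint (real choices also weigh price and availability); the data below is illustrative, not live RunPod inventory.

```python
from typing import List, Optional


def pick_gpu(gpu_types: List[dict], min_memory_gb: int) -> Optional[str]:
    """Return the ID of the smallest GPU type with at least `min_memory_gb` of memory.

    Assumes hypothetical `id` and `memory_gb` keys in each list_gpu_types record.
    Returns None when nothing is large enough.
    """
    candidates = [g for g in gpu_types if g["memory_gb"] >= min_memory_gb]
    if not candidates:
        return None
    # Smallest card that still fits, to avoid paying for unused memory.
    return min(candidates, key=lambda g: g["memory_gb"])["id"]


gpu_types = [  # illustrative data with placeholder IDs
    {"id": "gpu-24gb", "memory_gb": 24},
    {"id": "gpu-48gb", "memory_gb": 48},
    {"id": "gpu-80gb", "memory_gb": 80},
]
print(pick_gpu(gpu_types, 40))  # gpu-48gb
```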

Real-World Use Cases

01

Stopping Overrun Costs

A team was testing a new LLM and left the development pod running all weekend. Instead of logging in to find and stop the instance manually, they prompt their agent: "Pause pod with ID 'pod_xyz_980' immediately." The agent runs stop_pod, halting compute billing right away.
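The cost of a forgotten pod is simple arithmetic: hours left running times the hourly rate. A minimal Python sketch, using an illustrative $2.00/hour rate (not an actual RunPod price) and assuming flat hourly billing with no separate storage charges:

```python
from datetime import datetime


def idle_cost(start: datetime, end: datetime, rate_per_hour: float) -> float:
    """Cost of a pod left running from `start` to `end` at a flat hourly rate."""
    hours = (end - start).total_seconds() / 3600
    return round(hours * rate_per_hour, 2)


# Illustrative weekend: Friday 18:00 to Monday 09:00 is 63 hours.
start = datetime(2024, 6, 7, 18, 0)  # Friday
end = datetime(2024, 6, 10, 9, 0)    # Monday
rate = 2.00                          # hypothetical $/hour
print(idle_cost(start, end, rate))   # 126.0
```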

02

Scaling Up for Testing

The ML architect needs to test a model on more powerful hardware than the current cluster. They first check available options using list_gpu_types, confirm they need A10G GPUs, and then use create_pod with the required specs to spin up two new nodes.

03

Checking Service Health

The DevOps engineer notices a microservice is failing intermittently. They ask the agent to run list_endpoints. The output lists every registered endpoint, which helps them pin down the one behind the broken service.

04

Auditing Pre-Built Setups

A developer needs a baseline compute environment for testing. Instead of manually setting up Docker images and GPU types, they check list_templates to find an existing 'Llama-3 Base' template and use it with create_pod.

The Tradeoffs

Assuming the agent knows your requirements.

A user needs a resource but just types, "I need a GPU for my model." The agent has no context and can't act.

Don't assume. First, check which GPU types are available by calling list_gpu_types, then call create_pod with the specific specs you verified.

Forgetting to audit running pods.

The team finishes a test but forgets to turn off the pod, leading to continuous billing charges overnight. This wastes money.

Always run list_pods after testing cycles are complete, then immediately execute stop_pod on any idle instances.
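That audit step can be sketched in Python. This is a minimal illustration, assuming list_pods returns records with hypothetical `id`, `status`, and `gpu_util_percent` fields and that stop_pod takes a hypothetical `pod_id` argument; the 5% idle threshold is arbitrary.

```python
from typing import Iterable, List

IDLE_GPU_UTIL_THRESHOLD = 5.0  # percent; an arbitrary cut-off chosen for illustration


def find_idle_pods(pods: Iterable[dict], threshold: float = IDLE_GPU_UTIL_THRESHOLD) -> List[str]:
    """Return IDs of running pods whose GPU utilization is below `threshold`.

    Assumes hypothetical `id`, `status`, and `gpu_util_percent` keys in each
    list_pods record. Pods with no utilization reading are skipped rather
    than guessed at.
    """
    idle = []
    for pod in pods:
        util = pod.get("gpu_util_percent")
        if pod.get("status") == "RUNNING" and util is not None and util < threshold:
            idle.append(pod["id"])
    return idle


def stop_calls(pod_ids: Iterable[str]) -> List[dict]:
    """One stop_pod call per idle pod; the `pod_id` argument name is an assumption."""
    return [{"name": "stop_pod", "arguments": {"pod_id": pid}} for pid in pod_ids]


pods = [  # illustrative data, not real list_pods output
    {"id": "pod_a", "status": "RUNNING", "gpu_util_percent": 0.0},
    {"id": "pod_b", "status": "RUNNING", "gpu_util_percent": 87.5},
    {"id": "pod_c", "status": "EXITED", "gpu_util_percent": 0.0},
]
print(stop_calls(find_idle_pods(pods)))
# [{'name': 'stop_pod', 'arguments': {'pod_id': 'pod_a'}}]
```

Skipping pods without a utilization reading is a conservative choice: it is safer to leave an unknown pod running than to stop a busy one by mistake.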

Over-complicating resource creation.

The user tries to manually construct the entire YAML manifest for a new pod instance every time they deploy. This is slow and error-prone.

Use list_templates first. Then, use create_pod referencing a saved template ID—it's much faster.
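Reusing a template usually means taking its saved settings and overriding only what changes per deployment, such as the pod name. A minimal Python sketch, assuming a template record with hypothetical `id`, `gpu_type`, and `image` fields and assuming create_pod accepts a `template_id`; the server's real parameter names may differ.

```python
from typing import Dict


def create_pod_args_from_template(template: Dict, overrides: Dict) -> Dict:
    """Start from a saved template's settings and apply per-deployment overrides.

    Field names (`name`, `gpu_type`, `image`, `template_id`) are hypothetical.
    """
    args = {
        "template_id": template["id"],
        "gpu_type": template.get("gpu_type"),
        "image": template.get("image"),
    }
    args.update(overrides)  # overrides replace the template's defaults
    return args


# Illustrative template record, e.g. something like a 'Llama-3 Base' setup.
llama_base = {"id": "tpl_123", "gpu_type": "gpu-48gb", "image": "example/llama-3-base:latest"}
print(create_pod_args_from_template(llama_base, {"name": "llama-eval-1"}))
```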

When It Fits, When It Doesn't

You should use this MCP Server if your workflow involves managing physical computational resources: spinning up GPUs for model inference, checking pod status, or shutting down compute environments. The key is that you are dealing with high-cost, stateful hardware.

Don't use it if you just need to read simple data (like fetching a contact list) or run basic messaging tasks. For those things, an API client connected to a database or a chat service works better. This tool is strictly for infrastructure management.

Always remember the sequence: plan with list_gpu_types, pick a saved configuration with list_templates, then execute with create_pod. Never skip planning.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by RunPod API. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: managed infra
V8 Isolated: sandboxed per request
Zero-Trust Proxy: no stored credentials
DLP Enforced: policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction


Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 7 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
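Under the hood, an MCP client invokes each capability with a JSON-RPC 2.0 `tools/call` request, as defined by the Model Context Protocol. A minimal Python sketch of building such a request follows; `build_tool_call` is a hypothetical helper (clients normally build this message for you), and the `create_pod` argument names and values are assumptions based on the tool description, not the server's published schema.

```python
import json
from typing import Optional

# The seven tool names this server exposes, as listed on this page.
RUNPOD_TOOLS = frozenset({
    "create_pod", "get_pod", "list_endpoints", "list_gpu_types",
    "list_pods", "list_templates", "stop_pod",
})


def build_tool_call(request_id: int, tool: str, arguments: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method (hypothetical helper)."""
    if tool not in RUNPOD_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }


# Argument names and values are placeholders for illustration.
request = build_tool_call(
    1,
    "create_pod",
    {"name": "llm-test", "gpu_type": "A100", "image": "example/llm-server:latest"},
)
print(json.dumps(request, indent=2))
```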

Available Capabilities

create_pod, get_pod, list_endpoints, list_gpu_types, list_pods, list_templates, stop_pod

Checking compute status shouldn't require a dashboard deep-dive.

Right now, checking on your cluster means clicking into the RunPod web console. You jump to the 'Pods' tab, then maybe you have to filter by status (Running/Paused). If you want to know which pods are idle and costing money unnecessarily, it’s a three-click ordeal just to get basic information.

With this MCP Server, you just ask your agent: "List all active GPU pods and point out any that are sitting idle." The system runs `list_pods` and filters the output for you. You get an immediate list of exactly what's wasting cycles.

RunPod MCP Server gives you full control over your compute lifecycle.

Gone are the manual steps: no more logging into a separate dashboard just to see which pods are running, and no more switching context between a deployment tool and an infrastructure manager. Now everything, from listing available GPUs (`list_gpu_types`) to spinning up a full-scale pod via `create_pod`, happens in one conversation thread. It's that simple.

Common Questions About RunPod MCP

How do I use the RunPod MCP Server to list my currently running pods?

You use the list_pods tool. This command immediately gives you an inventory of every GPU pod in your account, showing their status and basic details.

What is the best way to save a GPU setup using RunPod MCP Server?

Use list_templates first to see what's already saved. None of the seven tools creates templates, so if you build a configuration you want to keep, save it as a template in RunPod itself; it will then appear in list_templates and can be reused in later create_pod calls.

Should I call `get_pod` or `list_pods`?

If you know the exact ID of the pod (e.g., 'pod-123'), use get_pod. If you want a general overview of everything in your account, run list_pods.
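That rule of thumb fits in a few lines. A minimal Python sketch; `lookup_call` is a hypothetical helper and the `pod_id` argument name for get_pod is an assumption.

```python
from typing import Optional, Tuple


def lookup_call(pod_id: Optional[str]) -> Tuple[str, dict]:
    """Pick get_pod when a pod ID is known, otherwise fall back to list_pods."""
    if pod_id:
        return "get_pod", {"pod_id": pod_id}
    return "list_pods", {}


print(lookup_call("pod-123"))  # ('get_pod', {'pod_id': 'pod-123'})
print(lookup_call(None))       # ('list_pods', {})
```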

How do I stop billing using the RunPod MCP Server?

You must call the stop_pod tool and provide the specific pod ID. This action halts the compute cycle immediately and saves you money.

When using `create_pod` or `get_pod`, what API key permissions are required?

The agent needs Read and Write access to function. This setup allows it to both read pod status details and write new resources, like provisioning a machine.

If my attempt to `create_pod` fails, what's the most common issue?

Usually, the Docker image or GPU type isn't found. Check those specific parameters in your request before retrying the command.
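Those parameters can be checked before the call is even sent. A minimal Python sketch, assuming hypothetical `name`, `gpu_type`, and `image` argument names and a list of GPU type IDs collected from an earlier list_gpu_types call; it cannot confirm that the Docker image actually exists in its registry.

```python
from typing import Dict, Iterable, List


def validate_create_pod_args(args: Dict, known_gpu_types: Iterable[str]) -> List[str]:
    """Return problems to fix before calling create_pod (empty list if none found).

    Checks for missing fields and for a GPU type that list_gpu_types did not
    report. Argument names are hypothetical.
    """
    problems = []
    for field in ("name", "gpu_type", "image"):
        if not args.get(field):
            problems.append(f"missing {field}")
    gpu = args.get("gpu_type")
    if gpu and gpu not in set(known_gpu_types):
        problems.append(f"unknown GPU type: {gpu}")
    return problems


known = ["gpu-24gb", "gpu-48gb"]  # placeholder IDs from a list_gpu_types call
print(validate_create_pod_args({"name": "test", "gpu_type": "gpu-99gb", "image": ""}, known))
# ['missing image', 'unknown GPU type: gpu-99gb']
```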

How do I see what pod configurations are pre-approved using `list_templates`?

list_templates shows saved blueprints for pods. You can reuse these templates to ensure consistent setups without having to manually define every parameter.

What does the `list_endpoints` tool show me about my serverless apps?

It lists all registered containerized inference applications. This lets you audit and review your active, serverless endpoints that are running in production.

Can the AI forcefully terminate or delete critical production endpoint fleets on demand?

No. This module only lets the AI stop (pause) and manage running instances. Destructive actions, like permanently deleting a pod, are intentionally left out of the tooling to protect your critical compute resources from unintended loss.

Can the AI provision large GPU arrays automatically?

Yes, within limits. The AI can query the available hardware models (such as A100 or H100) with list_gpu_types and then launch new pods with create_pod, optionally starting from a saved template found via list_templates, which simplifies scaling actions considerably.

Will the AI know the billing state or the real-time cost of running each endpoint?

No. The current RunPod module focuses on operational control and orchestration, such as spotting inactive pods and booting new instances. Deep billing analytics and invoice extraction are not among the commands exposed to the AI at this time.


Built & Managed by Vinkius | 30s setup | 7 tools

We've already built the connector for RunPod.

No hosting. No infrastructure. No complex setup.
All 7 tools are live and waiting. You're up and running in seconds.


Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.

Zero hosting required. Full MCP catalog included. Enterprise-grade security. Auto-updated by Vinkius.

Built, hosted, and secured by Vinkius. You just connect and go.