RunPod MCP. Manage your entire GPU compute lifecycle via chat.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
RunPod MCP Server connects your AI client directly to RunPod's cloud infrastructure. It lets you manage GPU pods—creating new instances, checking existing ones, and stopping compute cycles—all through natural language commands.
Need to provision a large-scale ML model or check serverless endpoints? This tool manages the entire lifecycle of high-power computational resources.
What your AI agents can do
Create pod
Creates a new GPU pod by specifying the name, required GPU type, and Docker image.
Get pod
Fetches all specific details for one particular GPU pod using its unique ID.
List endpoints
Lists every serverless endpoint currently configured to handle containerized inference applications.
- create_pod: Creates a new, dedicated computing instance (pod) with the specified GPU type and Docker image.
- get_pod: Retrieves detailed information for a specific, existing GPU pod by ID.
- list_pods: Shows every active or paused GPU pod currently registered in the account.
- stop_pod: Stops a running GPU pod, immediately pausing billable compute activity and saving costs.
- list_endpoints: Lists all registered serverless endpoints that route containerized inference applications.
- list_gpu_types and list_templates: Retrieve available GPU hardware types and saved pod templates for deployment planning.
RunPod MCP Server: 7 Tools for GPU Pod Management
These tools let you manage the entire lifecycle of your compute resources—from listing available GPUs to spinning up new pods and shutting them down.
create_pod
Creates a new GPU pod by specifying the name, required GPU type, and Docker image.
get_pod
Fetches all specific details for one particular GPU pod using its unique ID.
list_endpoints
Lists every serverless endpoint currently configured to handle containerized inference applications.
list_gpu_types
Retrieves a list of all available GPU hardware types you can deploy.
list_pods
Provides an inventory listing of every GPU pod currently in the account.
list_templates
Shows saved and pre-configured templates that can be used for new pod creation.
stop_pod
Stops a running GPU pod, immediately halting billing operations for compute cycles.
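For readers curious about what sits under the chat interface: MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `create_pod`. The argument names (`name`, `gpu_type`, `image`) and the example values are assumptions for illustration only; the server's real input schema is whatever it reports to the client, and your AI client normally builds these messages for you.

```python
import json

# The seven tool names listed above.
TOOLS = {
    "create_pod", "get_pod", "list_endpoints", "list_gpu_types",
    "list_pods", "list_templates", "stop_pod",
}

def build_tool_call(tool, arguments, request_id=1):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Argument names and values are hypothetical, not the server's confirmed schema.
request = build_tool_call(
    "create_pod",
    {"name": "demo-pod", "gpu_type": "EXAMPLE_GPU", "image": "example/image:latest"},
)
print(json.dumps(request, indent=2))
```

Rejecting unknown tool names up front mirrors the fact that this server exposes only these seven tools.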
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with RunPod, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
This server connects your AI client straight into RunPod’s cloud infrastructure. You can manage GPU pods—creating new instances, checking existing ones, and shutting down compute cycles—all through natural language commands.
create_pod spins up a new GPU pod from a name, the GPU type you need, and the Docker image it will run. list_pods returns an inventory of every GPU pod in your account, whether active or paused. To check the details of a single pod, get_pod fetches them using the pod's unique ID.
For planning, list_gpu_types retrieves every GPU hardware type you can deploy, and list_templates shows the saved and pre-configured setups you can use when starting new pods. list_endpoints lists every serverless endpoint set up to handle containerized inference applications.
When you're done with a pod, stop_pod shuts it down, halting billing on those compute cycles. Together, the tools let your agent act like an MLOps engineer: managing computational workloads, spinning up new hardware instances, and auditing serverless endpoints, all through chat.
Here’s the rundown of what you can do with these tools:
Provisioning Compute: Use create_pod to spin up a dedicated computing instance (a pod). Just tell it the name, the GPU type required, and which Docker image you want. If you're planning ahead, check available hardware types with list_gpu_types or review saved configurations via list_templates before you build.
Checking Status & Inventory: Get a complete picture by running list_pods, which shows every active or paused pod registered to the account. Need details on just one? Use get_pod with its unique ID. You can also review all configured endpoints using list_endpoints to see where your containerized inference applications are running.
Cost Management: When the job is done, shut it down fast. stop_pod halts a running GPU pod immediately, so billing for compute cycles stops right there. This keeps your costs under control.
The Lifecycle: The tools manage the whole lifecycle of high-power computational resources. You use create_pod to start it up, check its status with get_pod, and when you’re done, you hit stop_pod. If you want to see what hardware is available for future builds, run list_gpu_types. To keep tabs on everything, list_pods gives the full inventory.
You can also review pre-set options using list_templates, or check out all your serverless endpoints with list_endpoints.
You're not just reading data here; you're executing changes on a live cloud account. The agent uses create_pod to build nodes, using specified GPU types and Docker images. It checks the status of existing instances with get_pod via a unique ID. To see all resources, it runs list_pods. When you're done with compute cycles, stop_pod shuts everything down to save money.
Planning requires checking available hardware types using list_gpu_types or reviewing saved setups through list_templates. The server also audits your infrastructure by listing all configured endpoints via list_endpoints.
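The lifecycle described above (create, check, stop, then confirm the inventory) can be sketched with an in-memory stand-in. This is not the server's implementation: the `FakePodTools` class, the field names, and the `RUNNING`/`PAUSED` status strings are all assumptions chosen to show the order of calls.

```python
import itertools

class FakePodTools:
    """In-memory stand-in for create_pod, get_pod, list_pods, and stop_pod."""

    def __init__(self):
        self._pods = {}
        self._ids = itertools.count(1)

    def create_pod(self, name, gpu_type, image):
        pod_id = f"pod-{next(self._ids)}"
        self._pods[pod_id] = {
            "id": pod_id, "name": name, "gpu_type": gpu_type,
            "image": image, "status": "RUNNING",
        }
        return dict(self._pods[pod_id])

    def get_pod(self, pod_id):
        return dict(self._pods[pod_id])

    def list_pods(self):
        return [dict(p) for p in self._pods.values()]

    def stop_pod(self, pod_id):
        # Stopping pauses the pod; it stays in the inventory.
        self._pods[pod_id]["status"] = "PAUSED"
        return dict(self._pods[pod_id])

tools = FakePodTools()
pod = tools.create_pod("demo-pod", "EXAMPLE_GPU", "example/image:latest")
status_before = tools.get_pod(pod["id"])["status"]
tools.stop_pod(pod["id"])
status_after = tools.get_pod(pod["id"])["status"]
inventory = tools.list_pods()
print(status_before, "->", status_after, "| pods in account:", len(inventory))
```

Note that the stopped pod still appears in the inventory, consistent with list_pods reporting both active and paused pods.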
How RunPod MCP Works
1. First, you enable the RunPod orchestration integration in your core interface.
2. Next, sign into your RunPod cloud console. Go to 'Settings' > 'API Keys' and generate a new API Key with Read/Write permissions; copy this key.
3. Finally, paste that secret key into the secure connection module below. Your agent can now run commands like, "List all active GPU pods and point out any that are sitting idle without active usage."
The bottom line is: you link your AI client to RunPod using an API key so it can execute infrastructure management tasks directly on your cloud account.
Who Is RunPod MCP For?
This is for the MLOps Specialist who's tired of clicking through three different dashboards just to check compute costs. It’s also for the AI Developer who needs to provision a high-power inference endpoint without leaving their coding environment. If your job involves running or stopping serious hardware, this is for you.
- Manages the full lifecycle of compute resources: provisioning pods via create_pod, listing all assets with list_pods, and cutting costs with stop_pod.
- Needs to audit serverless deployments by running list_endpoints or check the latest available GPU hardware types using list_gpu_types before deployment.
- Designs and manages complex, multi-stage agent pipelines; they use this to spin up pods from specific templates (create_pod) for testing model performance.
What Changes When You Connect
- Cost Control: Need to stop burning money? Just ask the agent to `stop_pod`. It halts active hourly billing for a specific pod ID, letting you save cash instantly without logging into the web dashboard.
- Instant Provisioning: Don't waste time searching documentation. You can use `create_pod` to spin up an entirely new GPU node, specifying everything from name to image, and it happens right in your chat interface.
- Resource Planning: Before you deploy, check what hardware is available with `list_gpu_types`. This tool gives you a clean list of all available GPUs, so you know if the model needs an A10G or something else.
- Full Visibility: Run `list_pods` to get a complete inventory view of your entire GPU account. You can see every pod (running, paused, etc.) without navigating through multiple tabs.
- Serverless Auditing: Keep track of all your microservices with `list_endpoints`. This shows you exactly which containerized inference applications are active and where their data is routed.
- Template Reuse: If you built a perfect setup once, don't rebuild it. Use `list_templates` to see saved configurations, then use them when calling `create_pod` for faster deployments.
Real-World Use Cases
Stopping Overrun Costs
A team was testing a new LLM and left the development pod running all weekend. Instead of logging in to find and stop the instance manually, they prompt their agent: "Pause pod with ID 'pod_xyz_980' immediately." The agent runs stop_pod, halting billing cycles right away.
Scaling Up for Testing
The ML architect needs to test a model on more powerful hardware than the current cluster. They first check available options using list_gpu_types, confirm they need A10G GPUs, and then use create_pod with the required specs to spin up two new nodes.
Checking Service Health
The DevOps engineer notices a microservice is failing intermittently. They ask the agent to run list_endpoints. The output shows exactly which registered endpoint isn't routing traffic, pointing them straight to the broken service.
Auditing Pre-Built Setups
A developer needs a baseline compute environment for testing. Instead of manually setting up Docker images and GPU types, they check list_templates to find an existing 'Llama-3 Base' template and use it with create_pod.
The Tradeoffs
Assuming the pod is running.
A user needs a resource but just types, "I need a GPU for my model." The agent has no context and can't act.
→ Don't assume. First, check capacity by calling list_gpu_types, then use create_pod with the specific specs you verified.
Forgetting to audit running pods.
The team finishes a test but forgets to turn off the pod, so it keeps billing overnight.
→ Always run list_pods after testing cycles are complete, then immediately execute stop_pod on any idle instances.
Over-complicating resource creation.
The user tries to manually construct the entire YAML manifest for a new pod instance every time they deploy. This is slow and error-prone.
→ Use list_templates first. Then use create_pod referencing a saved template ID, which is much faster.
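The "check first, then create" advice above can be sketched as a small pre-flight step. This is a minimal illustration, assuming list_gpu_types and list_templates return lists of records with `id` and `name` fields and that create_pod accepts a `template_id` argument; none of those shapes are confirmed by this page.

```python
def plan_create_pod(name, wanted_gpu, template_name, gpu_types, templates):
    """Return create_pod arguments, or raise if the plan cannot be satisfied."""
    available = {g["id"] for g in gpu_types}
    if wanted_gpu not in available:
        raise ValueError(
            f"GPU type {wanted_gpu!r} not available; choose from {sorted(available)}"
        )
    matches = [t for t in templates if t["name"] == template_name]
    if not matches:
        raise ValueError(f"no saved template named {template_name!r}")
    return {"name": name, "gpu_type": wanted_gpu, "template_id": matches[0]["id"]}

# Hypothetical tool outputs; real responses may be shaped differently.
gpu_types = [{"id": "A10G"}, {"id": "A100"}]
templates = [{"id": "tpl-1", "name": "Llama-3 Base"}]

args = plan_create_pod("test-node", "A10G", "Llama-3 Base", gpu_types, templates)
print(args)
```

Failing before the create call keeps a vague or mistyped request from turning into a billable pod.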
When It Fits, When It Doesn't
You should use this MCP Server if your workflow involves managing GPU compute resources: spinning up GPUs for model inference, checking the status of existing pods, or shutting down compute environments. The key is that you are dealing with high-cost, stateful hardware.
Don't use it if you just need to read simple data (like fetching a contact list) or run basic messaging tasks. For those things, an API client connected to a database or a chat service works better. This tool is strictly for infrastructure management.
Always remember the sequence: Plan with list_gpu_types > Review saved configurations with list_templates > Execute with create_pod. Never skip planning.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by RunPod API. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 7 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Checking compute status shouldn't require a dashboard deep-dive.
Right now, checking on your cluster means clicking into the RunPod web console. You jump to the 'Pods' tab, then maybe you have to filter by status (Running/Paused). If you want to know which pods are idle and costing money unnecessarily, it’s a three-click ordeal just to get basic information.
With this MCP Server, you just ask your agent: "List all active GPU pods and point out any that are sitting idle." The system runs `list_pods` and filters the output for you. You get an immediate list of exactly what's wasting cycles.
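How the idle check might work on the agent's side can be sketched as a simple filter over the list_pods output. The `status` and `gpu_utilization` fields and the 5% threshold are assumptions; this page does not say which fields list_pods returns, and in practice the AI client may reason over the raw output instead.

```python
def find_idle_pods(pods, max_utilization=5.0):
    """Return running pods whose GPU utilization is at or below the threshold."""
    return [
        p for p in pods
        if p["status"] == "RUNNING" and p["gpu_utilization"] <= max_utilization
    ]

# Hypothetical list_pods output.
pods = [
    {"id": "pod-a", "status": "RUNNING", "gpu_utilization": 87.0},
    {"id": "pod-b", "status": "RUNNING", "gpu_utilization": 0.0},
    {"id": "pod-c", "status": "PAUSED", "gpu_utilization": 0.0},
]

idle_ids = [p["id"] for p in find_idle_pods(pods)]
print(idle_ids)  # ['pod-b']
```

Paused pods are skipped because they are already stopped; only running pods with little or no usage are candidates for stop_pod.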
RunPod MCP Server gives you full control over your compute lifecycle.
Gone are the manual steps: no more logging into a separate billing dashboard just to see active costs. No more needing to switch context between a deployment tool and an infrastructure manager. Now, everything—from listing available GPUs (`list_gpu_types`) to spinning up a full-scale pod via `create_pod`—happens in one conversation thread. It's that simple.
Common Questions About RunPod MCP
How do I use the RunPod MCP Server to list my currently running pods?
You use the list_pods tool. This command immediately gives you an inventory of every GPU pod in your account, showing their status and basic details.
What is the best way to save a GPU setup using RunPod MCP Server?
Use list_templates first to see what's already available. None of the seven tools saves a template, so if you build a perfect configuration, save it as a template in RunPod itself and reuse it in later calls to create_pod.
Should I call `get_pod` or `list_pods`?
If you know the exact ID of the pod (e.g., 'pod-123'), use get_pod. If you want a general overview of everything in your account, run list_pods.
How do I stop billing using the RunPod MCP Server?
You must call the stop_pod tool and provide the specific pod ID. This action halts the compute cycle immediately and saves you money.
When using `create_pod` or `get_pod`, what API key permissions are required?
The agent needs Read and Write access to function. This setup allows it to both read pod status details and write new resources, like provisioning a machine.
If my attempt to `create_pod` fails, what's the most common issue?
Usually, the Docker image or GPU type isn't found. Check those specific parameters in your request before retrying the command.
How do I see what pod configurations are pre-approved using `list_templates`?
list_templates shows saved blueprints for pods. You can reuse these templates to ensure consistent setups without having to manually define every parameter.
What does the `list_endpoints` tool show me about my serverless apps?
It lists all registered containerized inference applications. This lets you audit and review your active, serverless endpoints that are running in production.
Can the AI forcefully terminate or delete critical production endpoint fleets on demand?
No. This module only allows the AI to pause and manage running instances. Destructive deletion actions (like completely erasing a pod) are intentionally excluded from the tooling to protect your critical compute resources from unintended loss.
Can the AI provision large GPU arrays automatically?
Yes. The AI can query the available hardware models (such as A100 or H100) with list_gpu_types, then use create_pod to launch new pods based on existing community templates, which simplifies complex DevOps scaling actions.
Will the AI know the billing state or the real-time cost of running each endpoint?
No. The current RunPod module focuses on operational control and orchestration, such as discovering inactive pods and booting new instances. Deep billing analytics and invoice extraction are not available in the commands exposed to the AI at this time.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.