MCP Server

Vast.ai GPU MCP for AI. Search, deploy, and manage your entire compute fleet from chat.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

How this MCP server connects to your AI agent

Vast.ai GPU Rental Cloud API connects your AI client to the world's largest marketplace for high-performance GPUs. Use this server to search hardware like RTX 4090 or A100, spin up Docker containers instantly, and manage the entire compute lifecycle—from deployment to termination—without leaving your chat window.

What AI agents can do with Vast.ai (GPU Rental Cloud API) Automation

Delete instance

Stops and deletes a specific GPU instance you have rented on Vast.ai.

List instances

Retrieves a list of all your currently active or paused GPU instances on Vast.ai.

Rent instance

Initiates the process of renting a new, live GPU instance on Vast.ai using defined parameters.

+ 1 more capability included

Discover GPU Offers

Search the Vast.ai marketplace using specific hardware names or criteria to find available pricing and offers.

Provision Compute Instances

Rent a full GPU instance by selecting an offer ID, specifying a container image, and setting disk size.

Manage Active Fleet Status

List all current instances you've rented to check their live status, cost per hour, and network details.

Decommission Resources

Terminate and delete running GPU instances immediately when your work is finished.

What AI agents can do with Vast.ai (GPU Rental Cloud API): 4 Tools for Compute Management

These four tools let you search the marketplace, deploy containers, monitor your GPU fleet status, and clean up resources via natural language commands.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Vast.ai (GPU Rental Cloud API) on Vinkius

Delete Instance

Stops and deletes a specific GPU instance you have rented on Vast.ai.

List Instances

Retrieves a list of all your currently active or paused GPU instances on Vast.ai.

Rent Instance

Initiates the process of renting a new, live GPU instance on Vast.ai using defined...

Search Offers

Queries the marketplace to find available GPUs and their current pricing by...

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so no custom routing is required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Vast.ai GPU integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "vastai-gpu-rental-cloud-api": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Vast.ai GPU tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"vastai-gpu-rental-cloud-api": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

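Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius; the server label "vastai" is arbitrary
client = MultiServerMCPClient({"vastai": {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable_http",
}})
# get_tools() is a coroutine in current adapter releases
tools = asyncio.run(client.get_tools())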

CrewAI

Define the Tool

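Load the Vinkius MCP tools into your CrewAI agents (this assumes the MCP extra is installed with pip install 'crewai-tools[mcp]'):

from crewai import Agent
from crewai_tools import MCPServerAdapter

# Connect securely to Vinkius over Streamable HTTP
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# Assign to Agent; the adapter closes the connection when the block exits
with MCPServerAdapter(server_params) as vinkius_tools:
    researcher = Agent(
        role='Data Researcher',
        goal='Find and manage GPU instances on Vast.ai',
        backstory='An assistant that handles compute provisioning.',
        tools=vinkius_tools,
    )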

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Vast.ai (GPU Rental Cloud API), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Built on the Model Context Protocol (MCP) for Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 4 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Web consoles make you copy-paste Offer IDs all day. Solved with Vinkius AI Gateway

Right now, if you want an A100 GPU, the process is tedious. You open the console, filter by hardware name, note down a promising Offer ID, then switch to your coding environment. Then, you have to copy that ID into another form just to start the rental—it's clicks and clipboard management.

With this MCP server, you skip all of that. You tell your agent: 'I need an A100.' Your AI client runs `search_offers`, gives you a list, and when you pick one, it executes the whole deployment (`rent_instance`) in one go. It's immediate.

Vast.ai GPU Rental Cloud API: Full Instance Control

The biggest manual steps that disappear are monitoring and cleanup. You don't have to manually check if the job is still running, or worry about remembering to delete it when you walk away from your desk.

Now, you just ask: 'List my instances.' The agent runs `list_instances`, giving you a clean status report. Need to stop billing? Ask it to run `delete_instance`. It's that simple.

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

gpu-rental

deep-learning

docker-deployment

cloud-computing

ai-infrastructure

What your AI can actually do with this

Vast.ai GPU Rental API - Manage Cloud Compute

Look, forget navigating some clunky web console just to spin up a few GPUs for your deep learning model. This server connects your AI client straight into Vast.ai's massive marketplace. You can search for hardware like an RTX 4090 or an A100, deploy containers in seconds, and manage the whole compute life cycle—from when you start paying to when you shut it down—all without leaving your chat window.

What This Server Does

search_offers lets you query the marketplace. You give it specific hardware criteria using JSON format, and it spits back available GPUs and their current pricing structure. You ain't gotta guess what's out there; you just tell your agent what specs you need, and it checks all the offers for you.

Need to know if that A100 you want is actually available today? Run search_offers with your criteria. It tells you exactly which models are listed right now and how much they're costing per hour. This keeps you from wasting time looking at dead ends or deals that went stale five minutes ago.

When you find the perfect setup, rent_instance handles the deployment. You tell it the specific offer ID you want to use, what container image your code needs (like PyTorch or TensorFlow), and how much disk space you require for your data sets. It immediately initiates the process of renting a full GPU instance on Vast.ai.

You don't click buttons; you just send the command, and your compute environment is getting spun up.

Once it's live, list_instances gives you the rundown. You can list every GPU instance you currently have rented, whether it's running hard right now or paused in a holding pattern. This tool shows each instance's live status, its hourly cost, and the necessary network details like IP addresses.

It’s your central dashboard for keeping tabs on your whole fleet.
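As a rough illustration of what you could do with that rundown, the sketch below summarizes a fleet by status and totals the hourly rate of running machines. The record shape (instance_id, status, ip_address, cost_per_hour) is hypothetical; the field names the tool actually returns are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical record shape; the real list_instances fields may differ.
    instance_id: int
    status: str           # e.g. "running" or "paused"
    ip_address: str
    cost_per_hour: float  # USD per hour


def fleet_summary(instances: list[Instance]) -> dict:
    """Count instances by status and total the hourly rate of running ones."""
    by_status: dict[str, int] = {}
    for inst in instances:
        by_status[inst.status] = by_status.get(inst.status, 0) + 1
    running_cost = sum(i.cost_per_hour for i in instances if i.status == "running")
    return {"by_status": by_status, "running_cost_per_hour": round(running_cost, 4)}


# Sample data for illustration only (documentation-range IP addresses).
fleet = [
    Instance(101, "running", "203.0.113.10", 0.42),
    Instance(102, "paused", "203.0.113.11", 0.35),
    Instance(103, "running", "203.0.113.12", 1.10),
]
summary = fleet_summary(fleet)
print(summary)
```

In practice your agent does this kind of aggregation for you; the sketch only shows the shape of the reasoning.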

When your job is done, delete_instance steps in. You run this command, specifying the exact GPU instance you wanna kill, and it immediately terminates and deletes that resource. This isn't just a suggestion; it stops billing for that machine. It keeps your environment clean and makes sure you don't get accidentally charged when you walk away from your laptop.

You use search_offers to find what you need, then you use rent_instance to bring it online with its specific container image and disk size requirements. Once everything is running, you check the status of all your machines using list_instances. When the work’s done, you hit it with delete_instance to shut down the whole thing immediately.
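The four-step flow above can be sketched with an in-memory stand-in for the tools. FakeVastTools is purely illustrative and is not the real server: its method signatures, the "price" field, and the status values are assumptions made so the sequence can run offline.

```python
class FakeVastTools:
    """In-memory stand-in for the four tools; not the real Vast.ai server."""

    def __init__(self, offers):
        self._offers = {o["id"]: o for o in offers}
        self._instances = {}
        self._next_id = 1

    def search_offers(self, query):
        # Supports only the {"field": {"eq": value}} form used in this document.
        return [
            o for o in self._offers.values()
            if all(o.get(field) == cond["eq"] for field, cond in query.items())
        ]

    def rent_instance(self, offer_id, image, disk_gb=10):
        if offer_id not in self._offers:
            raise ValueError(f"unknown offer_id: {offer_id}")
        instance_id = self._next_id
        self._next_id += 1
        self._instances[instance_id] = {
            "id": instance_id, "offer_id": offer_id,
            "image": image, "disk_gb": disk_gb, "status": "running",
        }
        return instance_id

    def list_instances(self):
        return list(self._instances.values())

    def delete_instance(self, instance_id):
        # True if the instance existed and was removed.
        return self._instances.pop(instance_id, None) is not None


tools = FakeVastTools([
    {"id": 7, "gpu_name": "RTX 4090", "price": 0.40},
    {"id": 8, "gpu_name": "A100", "price": 1.20},
])
offers = tools.search_offers({"gpu_name": {"eq": "RTX 4090"}})      # 1. search
iid = tools.rent_instance(offers[0]["id"], "pytorch/pytorch", 50)   # 2. rent
active_before = len(tools.list_instances())                         # 3. check
deleted = tools.delete_instance(iid)                                # 4. clean up
active_after = len(tools.list_instances())
print(active_before, deleted, active_after)
```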

This system means you never have to switch contexts or log into a separate portal. You keep doing your research and coding right here, letting your agent manage the entire hardware lifecycle from finding an offer to powering down the machine. It’s fast. It's clean. And it saves you cash.

Built · Hosted · Managed by Vinkius Vast.ai GPU Rental API - Manage Cloud Compute

Server ID 019e5d64-f0aa-73f0-bffa-693fcb9ac2fb

Vinkius Inspector

Compliance Grade F

Score 3.6/100

Report View Report ↗

Here's how it actually works

The bottom line is: your AI client handles all API calls—searching for hardware, deploying containers, and cleaning up after itself—all through natural conversation.

Subscribe to the server and provide your Vast.ai API Key.

Ask your AI client to execute a search using search_offers (e.g., 'Find RTX 4090 offers').

Once you select an Offer ID, tell the agent to run rent_instance with the required Docker image and disk size.
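Under the hood, an MCP client turns each of these requests into a JSON-RPC 2.0 tools/call message. The sketch below builds that envelope for the search and rent steps. The envelope follows the MCP specification; the argument keys "query", "image", and "disk" are assumptions about this server's input schema, so check the tool definitions your client discovers before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing request IDs


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument keys below are assumed, not taken from the server's schema.
search = tool_call("search_offers", {"query": {"gpu_name": {"eq": "RTX 4090"}}})
rent = tool_call(
    "rent_instance",
    {"offer_id": 12345, "image": "pytorch/pytorch", "disk": 50},
)
print(json.dumps(search, indent=2))
```

You normally never write these messages by hand; your AI client generates them from the conversation.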

What Changes When You Connect

Speed: Instead of navigating multiple web consoles to find the best price-to-performance hardware, search_offers finds it instantly. You get immediate GPU discovery right where you are working.

Efficiency: Deploying a container is simple. Using rent_instance, you tell your agent exactly what image and disk size you need, spinning up a functional environment in seconds.

Visibility: Never wonder if your job is still running or what it costs. list_instances gives you real-time status, IP addresses, and cost tracking for all active deployments.

Cost Control: The biggest win is cleanup. When the task finishes, run delete_instance. This ensures you stop paying immediately, avoiding expensive orphaned resources.

Focus: Your AI client handles the API calls—the complex JSON queries, ID management, and lifecycle commands—so you just focus on your code.

See it in action

01

Training a New LLM Model

The ML Engineer needs an A100 GPU for training. Instead of manually checking the Vast.ai dashboard, they tell their agent: 'Search for A100s and rent one using the latest PyTorch image.' The agent runs search_offers, gets the cheapest ID, and executes rent_instance. When done, it calls delete_instance to stop billing.
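The "gets the cheapest ID" step amounts to a minimum over the returned offers. A minimal sketch, assuming each offer is a dict with an hourly price under the key dph_total (an assumed field name; adjust it to whatever the search results actually contain):

```python
def cheapest_offer(offers, price_key="dph_total"):
    """Return the offer with the lowest hourly price, or None if none are priced."""
    priced = [o for o in offers if price_key in o]
    return min(priced, key=lambda o: o[price_key], default=None)


# Sample search results for illustration only.
offers = [
    {"id": 1, "gpu_name": "A100", "dph_total": 1.35},
    {"id": 2, "gpu_name": "A100", "dph_total": 0.98},
    {"id": 3, "gpu_name": "A100", "dph_total": 1.10},
]
best = cheapest_offer(offers)
print(best["id"])
```

Price alone may not be the right criterion for every job; VRAM, reliability, and location can matter too.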

02

Comparing GPU Performance

A Data Scientist needs to test several hardware types (for example, RTX 4090 vs. A100). They run search_offers once per type, noting the price and VRAM for each. Then, they deploy a small container on each using rent_instance so they can benchmark them side-by-side before committing to a full deployment.

03

Monitoring Stale Resources

The DevOps team finishes a weekend test run but forgets which instances are still active. They simply ask their agent: 'What GPUs am I currently running?' The agent runs list_instances, showing them the IDs and statuses, allowing them to immediately call delete_instance on anything forgotten.

04

Debugging a Deployment Failure

A service fails because its GPU environment is unstable. Instead of logging into multiple portals, the engineer asks their agent to list all running jobs via list_instances. This reveals the instance ID and status (e.g., 'failed'), allowing them to target that specific resource with a cleanup command.

The honest tradeoffs

Manual Console Management

Anti-pattern

Spending 30 minutes clicking through the Vast.ai website, copying Offer IDs, and manually starting/stopping services in different web tabs.

The Fix

Use your AI agent to handle the whole flow: first run search_offers to find the best ID; then tell it to run rent_instance; and finally, make sure you call delete_instance when finished. It's all conversational.

Assuming Instances are Off

Anti-pattern

Completing a job and assuming that just because the script exited, billing has stopped. This is often not true for cloud resources.

The Fix

Always verify status using list_instances. If you don't need it, explicitly call delete_instance with the instance ID to guarantee zero charges.
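A verify-then-delete pass could look like the sketch below: take whatever list_instances reports, keep only the IDs you still need, and queue a delete_instance call for the rest. The instance dict shape is hypothetical; instance_id is the parameter name mentioned in the FAQ on this page.

```python
def instances_to_delete(instances, keep_ids=()):
    """Return the IDs of every listed instance not explicitly marked to keep."""
    keep = set(keep_ids)
    return [inst["id"] for inst in instances if inst["id"] not in keep]


# Hypothetical list_instances output for illustration.
listed = [
    {"id": 201, "status": "running"},
    {"id": 202, "status": "paused"},
    {"id": 203, "status": "running"},
]
stale = instances_to_delete(listed, keep_ids=[203])

# One delete_instance call per leftover instance.
calls = [
    {"name": "delete_instance", "arguments": {"instance_id": i}}
    for i in stale
]
print(calls)
```

Paused instances are included on purpose here, since the point is to delete anything you no longer need rather than only what is actively running.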

Vague Searching

Anti-pattern

Asking the agent generally for 'a good GPU'. This leads to vague or unspecific results because hardware requirements are complex.

The Fix

Be specific in your search. Use search_offers and provide structured JSON criteria, like {"gpu_name":{"eq":"RTX 4090"}}, to nail down the exact hardware you need.

Questions you might have

How do I find the best GPU price using search_offers?

The agent uses search_offers with a JSON query like {"gpu_name":{"eq":"RTX 4090"}}. The results provide multiple offers, letting you compare prices and VRAM to pick the optimal one.

What information does list_instances give me?

list_instances gives you a summary of all your rented GPUs. You get the Instance ID, current status (running/paused), IP address, and hourly cost estimate for each one.

Can I rent an instance without knowing the specific offer ID?

No. You must first use search_offers to find a valid Offer ID from the marketplace before you can tell the agent to run rent_instance. The ID ties your deployment to a live source.

Is calling delete_instance permanent?

Yes, calling delete_instance terminates the compute job and frees up the resource immediately. This stops billing for that specific instance ID.

How does the `search_offers` tool validate my API Key?

The server uses your provided Vast.ai API Key for all operations. If the key is invalid or lacks permission to view offers, the tool immediately returns an authentication error code. Always verify your credentials first.

When using `rent_instance`, what are the technical requirements for the Docker image?

You must provide a valid container registry path for the Docker image (e.g., pytorch/pytorch or tensorflow/tensorflow). The system uses this path to pull and deploy your compute environment.

Can I refine my GPU search using `search_offers` beyond just the hardware name?

Yes, you can structure the JSON query to include multiple constraints, such as minimum VRAM or maximum hourly cost. You need to provide a detailed JSON object for advanced filtering.
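A multi-constraint query might be assembled like this. Only gpu_name and the eq operator appear on this page; the field names gpu_ram and dph_total, the MB unit for VRAM, and the gte/lte operators are assumptions modeled on the existing example and should be checked against the tool's schema.

```python
import json


def build_query(gpu_name=None, min_vram_mb=None, max_price_per_hour=None):
    """Compose a search_offers filter from whichever constraints are given."""
    query = {}
    if gpu_name is not None:
        query["gpu_name"] = {"eq": gpu_name}
    if min_vram_mb is not None:
        query["gpu_ram"] = {"gte": min_vram_mb}            # assumed field and unit
    if max_price_per_hour is not None:
        query["dph_total"] = {"lte": max_price_per_hour}   # assumed field name
    return query


query = build_query("RTX 4090", min_vram_mb=24000, max_price_per_hour=0.60)
print(json.dumps(query))
```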

If I see an old instance via `list_instances`, what is the fastest way to stop charges?

Calling delete_instance terminates the compute resources instantly. This action stops all usage and prevents further billing from Vast.ai immediately.

How can I find a specific GPU model like an RTX 4090?

Use the search_offers tool with a query like {"gpu_name": {"eq": "RTX 4090"}}. The agent will return a list of available offers matching that hardware.

What information do I need to rent a new GPU instance?

You need an offer_id (from search results) and a Docker image name (e.g., 'pytorch/pytorch'). You can also optionally specify the disk size in GB using the rent_instance tool.
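Those inputs can be checked before the call is made. The helper below is a hypothetical pre-flight check, not part of the server: it assumes offer_id is a positive integer and that the optional disk size is passed under a key named "disk", neither of which is confirmed by this page.

```python
def rent_arguments(offer_id, image, disk_gb=None):
    """Validate and assemble arguments for a rent_instance call."""
    if not isinstance(offer_id, int) or offer_id <= 0:
        raise ValueError("offer_id must be a positive integer from search results")
    if not image or any(ch.isspace() for ch in image):
        raise ValueError("image must be a non-empty registry path without whitespace")
    args = {"offer_id": offer_id, "image": image}
    if disk_gb is not None:
        if disk_gb <= 0:
            raise ValueError("disk_gb must be positive")
        args["disk"] = disk_gb  # assumed key name for disk size in GB
    return args


minimal = rent_arguments(12345, "pytorch/pytorch")
with_disk = rent_arguments(12345, "pytorch/pytorch", disk_gb=50)
print(minimal, with_disk)
```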

How do I stop an instance to avoid further charges?

Simply use the delete_instance tool with the specific instance_id. This will terminate the instance and release the GPU back to the marketplace.
