# CoreWeave MCP

> CoreWeave (AI GPU Cloud) MCP lets you manage specialized, high-performance GPU infrastructure using natural language. You can provision entire clusters, set up secure network boundaries with VPCs, and orchestrate model inference gateways—all without leaving your AI agent environment.

## Overview
- **Category:** cloud-infrastructure
- **Price:** Free
- **Tags:** gpu-cloud, kubernetes, inference, vpc, ai-infrastructure

## Description

Need to run a large ML job or deploy a new model? This MCP lets you manage the underlying hardware (GPU compute and networking) directly through conversation. You can tell your agent to set up an isolated network for your test models, spin up a dedicated cluster for training, and then route traffic to the finished service through an inference gateway. The tools cover the whole workflow, from creating a VPC to querying logs and metrics, so you can keep your AI stack running reliably at scale. The MCP is hosted on Vinkius, which gives you access to this cloud control panel alongside hundreds of other specialized MCP tools.

## Tools

### get_vpc
Fetches the current configuration and status of a specified VPC.

### create_capacity_claim
Requests a new reservation of capacity for running inference models.

### create_cluster
Builds and provisions a new Kubernetes cluster optimized for AI workloads.

### update_cluster
Changes configurations for a live Kubernetes cluster, like scaling node counts.

### update_deployment
Adjusts parameters of an existing inference deployment, such as the model version.

### update_gateway
Modifies rules or routing policies on an inference gateway.

### update_vpc
Changes network settings, like CIDR blocks, for a VPC.

### create_deployment
Sets up a new instance where an already trained model can receive traffic (inference).

### create_gateway
Establishes a new entry point to route and verify access for AI services.

### create_vpc
Creates a brand new, secure network boundary (VPC) for your resources.

### delete_capacity_claim
Removes an existing request for inference capacity.

### delete_cluster
Decommissions and removes a Kubernetes cluster.

### delete_deployment
Takes down a deployed AI model endpoint.

### delete_gateway
Removes an inference gateway, stopping all traffic routing through it.

### delete_vpc
Permanently deletes a Virtual Private Cloud network boundary.

### get_cluster
Retrieves specific details about an existing Kubernetes cluster.

### list_capacity_claims
Lists all active requests for inference capacity.

### list_clusters
Retrieves a list of all managed Kubernetes clusters.

### list_deployments
Shows a catalog of all currently running model deployments.

### list_gateways
Lists every inference gateway configured for traffic routing.

### list_vpcs
Retrieves a summary of all existing network VPCs.

### query_logs
Queries historical logs from the system for debugging purposes.

### query_metrics
Retrieves performance metrics data points (e.g., CPU, GPU utilization).

### update_capacity_claim
Modifies the size or parameters of an existing capacity claim.

## Prompt Examples

**Prompt:** 
```
List all my active CoreWeave clusters.
```

**Response:** 
```
I've retrieved your CKS clusters. You have 2 active clusters: 'production-gpu-1' (ID: cks-7721) and 'research-test-bed' (ID: cks-8832). Would you like details on either of them?
```

**Prompt:** 
```
Show me the details for VPC ID vpc-99402.
```

**Response:** 
```
Fetching VPC details... VPC 'vpc-99402' is currently active with CIDR block 10.0.0.0/16. It is configured for high-bandwidth interconnects between your GPU nodes.
```

**Prompt:** 
```
List all inference deployments and gateways currently configured.
```

**Response:** 
```
I've compiled the list. You have 3 active inference gateways routing traffic to 5 deployments, including your 'llama-3-70b-prod' service. All gateways are reporting healthy status.
```

## Capabilities

### Provision AI Clusters
Create and manage dedicated GPU clusters optimized for intensive machine learning and large-scale AI workloads.

### Isolate Network Segments
Build secure Virtual Private Clouds (VPCs) to keep your compute resources separated from other networks.

### Deploy Model Endpoints
Set up and manage inference gateways that handle traffic routing and authentication for deployed AI models.

### Monitor Infrastructure Health
Check the status of clusters, deployments, and network resources to ensure everything is running optimally.

### Manage Resource Lifecycles
Perform full operations—creation, updating, and deletion—across all core compute and networking components.

## Use Cases

### The Model Promotion Pipeline
A research team finishes training a model on an existing cluster. They need to move it into production. The agent first uses `list_clusters` to find the target environment, then calls `create_deployment` using the new model artifact, and finally establishes traffic via `create_gateway`. It’s fully automated.
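
The three-step promotion flow can be sketched as follows. This is a minimal illustration, not the MCP's documented interface: the `call_tool` callable stands in for whatever your MCP client exposes, and the argument names (`cluster_id`, `model_uri`, `deployment_id`), the response shapes, and the model URI are all assumptions.

```python
from typing import Callable

# Hypothetical client hook: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def promote_model(call_tool: ToolCaller, cluster_name: str, model_uri: str, service: str) -> dict:
    """Find the target cluster, deploy the model there, then expose it via a gateway."""
    clusters = call_tool("list_clusters", {})["clusters"]  # response shape is assumed
    cluster = next((c for c in clusters if c["name"] == cluster_name), None)
    if cluster is None:
        raise LookupError(f"no cluster named {cluster_name!r}")
    deployment = call_tool(
        "create_deployment",
        {"cluster_id": cluster["id"], "name": service, "model_uri": model_uri},
    )
    return call_tool(
        "create_gateway",
        {"name": f"{service}-gw", "deployment_id": deployment["id"]},
    )


calls: list[str] = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP client; records calls and returns canned data."""
    calls.append(name)
    if name == "list_clusters":
        return {"clusters": [{"id": "cks-7721", "name": "production-gpu-1"}]}
    return {"id": f"{name}-demo", **arguments}


gateway = promote_model(
    fake_call_tool, "production-gpu-1", "s3://example-bucket/model-v2", "example-prod"
)
print(calls)
```

Passing the client in as a callable keeps the sequencing logic testable without touching real infrastructure.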

### Network Audit and Compliance
Compliance requires proving that all development environments are isolated. The agent uses `list_vpcs` to map out every network segment, then calls `get_vpc` on each one to confirm the required security rules are in place.
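
A rough sketch of that audit loop is shown below. The rule being checked (every VPC's CIDR block must sit inside an approved range) is only a stand-in for whatever your compliance policy requires, and the `vpc_id` argument, the `id` and `cidr` fields, and the `vpc-demo` entry are hypothetical.

```python
import ipaddress
from typing import Callable

# Hypothetical client hook: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def vpcs_outside_range(call_tool: ToolCaller, allowed_cidr: str) -> list[str]:
    """Return the IDs of VPCs whose CIDR block is not inside `allowed_cidr`."""
    allowed = ipaddress.ip_network(allowed_cidr)
    offenders = []
    for summary in call_tool("list_vpcs", {})["vpcs"]:  # response shape is assumed
        detail = call_tool("get_vpc", {"vpc_id": summary["id"]})
        if not ipaddress.ip_network(detail["cidr"]).subnet_of(allowed):
            offenders.append(detail["id"])
    return offenders


# Canned data for the demo; a real run would query the MCP server instead.
FAKE_VPCS = {
    "vpc-99402": {"id": "vpc-99402", "cidr": "10.0.0.0/16"},
    "vpc-demo": {"id": "vpc-demo", "cidr": "192.168.0.0/24"},
}


def fake_call_tool(name: str, arguments: dict) -> dict:
    if name == "list_vpcs":
        return {"vpcs": [{"id": vpc_id} for vpc_id in FAKE_VPCS]}
    return FAKE_VPCS[arguments["vpc_id"]]


print(vpcs_outside_range(fake_call_tool, "10.0.0.0/8"))  # ['vpc-demo']
```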

### Post-Mortem Failure Analysis
A live AI service suddenly slows down. The ops engineer uses `list_deployments` to find the service, then immediately runs `query_metrics` and `get_cluster` to determine whether the bottleneck is compute- or network-related.

### Environment Teardown
A project concludes, leaving behind numerous resources. The agent systematically uses `delete_gateway`, `delete_deployment`, `delete_cluster`, and `delete_vpc`, removing dependent resources first, so that nothing is left running or billing.
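
One way to derive a safe teardown sequence is to describe which resource types depend on which, then delete in the reverse of creation order. The dependency map below is an assumption (gateways front deployments, deployments run on clusters, clusters live in a VPC); adjust it to match how your resources are actually wired together.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each resource type maps to the types it needs to exist.
DEPENDS_ON = {
    "vpc": set(),
    "cluster": {"vpc"},
    "deployment": {"cluster"},
    "gateway": {"deployment"},
}


def teardown_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return delete tool names ordered so dependents are removed first."""
    # static_order() yields prerequisites first, which is the creation order.
    creation_order = list(TopologicalSorter(depends_on).static_order())
    return [f"delete_{kind}" for kind in reversed(creation_order)]


print(teardown_order(DEPENDS_ON))
# ['delete_gateway', 'delete_deployment', 'delete_cluster', 'delete_vpc']
```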

## Benefits

- You don't have to manually jump between consoles. You can ask your agent to list all clusters, then check metrics for a specific deployment, and finally create a new VPC—all in one chat session.
- The toolset gives you full lifecycle control. Need to scale? Use `update_cluster`. Finished with a test environment? Delete its gateways, deployments, and clusters, then run `delete_vpc` to remove the network itself and prevent cloud waste.
- Debugging is faster. If an endpoint fails, use `query_logs` and `query_metrics` immediately after listing the deployments via `list_deployments`. You get context instantly.
- Security configuration becomes simpler. Instead of starting with manual firewall rules, you tell your agent to `create_vpc`, which gives sensitive models their own isolated network boundary.
- Deployment complexity drops. From creating a new VPC to setting up a gateway and deploying the model, every step is handled with one set of commands.

## How It Works

The bottom line is you get full control over complex cloud infrastructure by simply talking to it.

1. Connect your CoreWeave account credentials to the MCP via your AI client.
2. Use natural language prompts to define necessary resources, such as creating a VPC or listing existing clusters.
3. The agent executes the required sequence of calls and returns structured data detailing the deployed infrastructure.
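
For context on step 3: MCP clients send each tool invocation as a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request envelope. The `vpc_id` argument name is an assumption, since this page does not document each tool's input schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request envelope (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# `vpc_id` is a hypothetical argument name; check the tool's real input schema.
request = tool_call("get_vpc", {"vpc_id": "vpc-99402"})
print(json.dumps(request, indent=2))
```

In practice your AI client constructs and sends these requests for you; the sketch only shows what a call to one of the tools listed above might look like on the wire.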

## Frequently Asked Questions

**How do I list my active clusters using the `list_clusters` tool?**
Use the agent to call `list_clusters`. This immediately returns a comprehensive catalog of all managed Kubernetes clusters, letting you know exactly what compute power you have available.

**What is the difference between `create_vpc` and `update_vpc`?**
Use `create_vpc` when you need a new network boundary. Use `update_vpc` only if you need to change an existing VPC's parameters, like widening its IP range.

**Should I use `list_gateways` or `get_gateway` first?**
Start with `list_gateways`. It gives you a high-level view of all your entry points, including their IDs. Note that the tool list on this page has no `get_gateway` entry, so if you need details on one gateway, ask the agent for it by its specific ID and let it work from the `list_gateways` result.
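
Because the tool list on this page has no `get_gateway` entry, one way to get details on a single gateway is to filter the `list_gateways` result by ID. A minimal sketch, assuming a hypothetical `call_tool` client hook, a `gateways` key in the response, and made-up gateway IDs:

```python
from typing import Callable, Optional

# Hypothetical client hook: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def find_gateway(call_tool: ToolCaller, gateway_id: str) -> Optional[dict]:
    """Return the gateway with the given ID from `list_gateways`, or None."""
    gateways = call_tool("list_gateways", {})["gateways"]  # response shape is assumed
    return next((g for g in gateways if g["id"] == gateway_id), None)


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP client; returns canned gateway data."""
    return {
        "gateways": [
            {"id": "gw-demo-1", "status": "healthy"},
            {"id": "gw-demo-2", "status": "healthy"},
        ]
    }


print(find_gateway(fake_call_tool, "gw-demo-2"))
```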

**Does `create_deployment` handle scaling?**
Not on its own. The `create_deployment` tool sets up the endpoint. If you need to scale it later due to increased traffic, use `update_deployment` to adjust its capacity instead.

**When should I use `create_vpc` versus just relying on my existing network setup?**
You create a VPC when you need strict, isolated networking for specific compute resources. This ensures your GPU nodes and services are separated from other traffic by a defined CIDR block. It’s the first step if security isolation is your top priority.

**I'm debugging performance issues; what kind of data do I get when running `query_metrics`?**
The query returns detailed Prometheus metrics, allowing you to monitor real-time resource utilization and latency. You can check specific endpoints or aggregate usage across your cluster fleet to identify bottlenecks.

**If a project finishes, what’s the proper sequence for cleanup using `delete_cluster`?**
Delete resources in the reverse of the order you created them: first gateways and deployments, then the cluster itself, and finally the VPC that contained it. Working through the teardown systematically helps prevent orphaned network rules or billing issues.

**Do I need to run `list_capacity_claims` before attempting to update a deployment?**
It's smart practice to list claims first. This lets you verify your current resource reservation status and ensure that the updated deployment still falls within an available, budgeted capacity claim.
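
That check-then-update habit might look like the sketch below. Everything about the data shapes is an assumption: the `claims` key, the `gpus_reserved` and `gpus_in_use` fields, and the `deployment_id` and `add_gpus` arguments are hypothetical names chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical client hook: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def spare_gpus(call_tool: ToolCaller) -> int:
    """Sum the unused GPUs across all capacity claims (field names assumed)."""
    claims = call_tool("list_capacity_claims", {})["claims"]
    return sum(c["gpus_reserved"] - c["gpus_in_use"] for c in claims)


def scale_if_covered(call_tool: ToolCaller, deployment_id: str, extra_gpus: int) -> Optional[dict]:
    """Call `update_deployment` only when existing claims cover the increase."""
    if extra_gpus > spare_gpus(call_tool):
        return None  # not enough reserved capacity; adjust a claim first
    return call_tool(
        "update_deployment",
        {"deployment_id": deployment_id, "add_gpus": extra_gpus},
    )


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP client; returns canned data."""
    if name == "list_capacity_claims":
        return {"claims": [{"id": "claim-demo", "gpus_reserved": 8, "gpus_in_use": 6}]}
    return {"updated": True, **arguments}


print(scale_if_covered(fake_call_tool, "deploy-demo", 4))  # None
print(scale_if_covered(fake_call_tool, "deploy-demo", 2))
```

When the check fails, `update_capacity_claim` or `create_capacity_claim` would be the natural next step before retrying the deployment update.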

**Can I list all my active Kubernetes clusters across the CoreWeave infrastructure?**
Yes. By using the `list_clusters` tool, your agent will retrieve a complete list of all bare-metal Kubernetes clusters (CKS) managed under your account.

**How do I check the specific network configuration of a VPC?**
You can use the `get_vpc` tool by providing the specific VPC ID. The agent will return detailed information about network isolation and configuration for that resource.

**Is it possible to create a new inference gateway via the AI agent?**
Absolutely. Use the `create_gateway` tool with the required specification JSON. This allows you to set up routing and authentication for your AI model traffic programmatically.