# Lambda Labs (GPU Cloud) MCP

> Lambda Labs (GPU Cloud) MCP connects your AI client directly to high-performance GPU infrastructure. Use natural conversation to launch H100 or A100 virtual machines, monitor ML workloads, check pricing, and manage secure SSH keys without touching a dashboard.

## Overview
- **Category:** superpower
- **Price:** Free
- **Tags:** gpu-cloud, machine-learning, infrastructure-as-code, virtual-machines, ai-training, ssh-management

## Description

This MCP gives you control over cloud compute resources through conversation. Instead of logging into a separate web portal and clicking through menus to provision hardware, your agent handles the workflow. You can ask it to launch specific GPU types for training or fine-tuning, for example an H100 box in us-east-1. Need to check which shared file systems are available across multiple workers? Just ask. If you need to shut down a running job to stop billing, the agent terminates the instance right away. It also lists your globally managed SSH keys and helps map persistent storage volumes for multi-node setups. When you connect this MCP through Vinkius, your AI client becomes an infrastructure expert, making complex resource management feel like chatting with a teammate.

## Tools

### list_instances
Retrieves a list of every GPU instance currently running on your Lambda Cloud account.

### get_instance
Pulls detailed information and the specific SSH connection string for one chosen instance.

### launch_instance
Provisions a brand-new GPU virtual machine, like an H100 box, ready for secure access over port 22.

### terminate_instances
Permanently and immediately destroys running GPU instances to stop billing and clean up resources.

### list_instance_types
Shows the catalog of available GPU node types, their specs, pricing, and current regional availability.

### list_ssh_keys
Lists all globally managed SSH public keys within your Lambda infrastructure for auditing purposes.

### list_filesystems
Maps out persistent, shared NAS volumes available for mounting across multiple compute nodes.

## Prompt Examples

**Prompt:** 
```
List all my running GPU instances in Lambda Cloud
```

**Response:** 
```
I've found 2 active instances: 1) 'training-node-01' (1x H100, Region: us-east-1, Status: active), 2) 'inference-test' (1x A100, Region: us-west-1, Status: booting). Would you like the SSH connection strings for either of these?
```

**Prompt:** 
```
Launch a 1x H100 instance in us-east-1 with my 'default-key' SSH key
```

**Response:** 
```
Launching GPU instance… I've triggered the provisioning of a 1x H100 box in us-east-1 with your 'default-key' attached. The Instance ID is 'inst-98765'. It should be ready for SSH access in approximately 2 minutes.
```

**Prompt:** 
```
What are the available instance types and their current pricing?
```

**Response:** 
```
I've retrieved the latest catalog. Highlights include: gpu_1x_h100 ($2.49/hr), gpu_8x_h100 ($19.92/hr), and gpu_1x_a100_pcie ($1.10/hr). Availability is currently highest in the us-east-1 and us-south-1 regions. Would you like to check for a specific hardware setup?
```
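To turn hourly rates like these into a budget figure, a minimal sketch follows. It reuses the rates from the sample response above purely as placeholders; real prices change, so in practice the values would come from a live `list_instance_types` call. The function name `estimate_cost` is hypothetical and not part of the MCP.

```python
from decimal import Decimal

# Placeholder rates copied from the sample response; not live pricing.
SAMPLE_RATES_USD_PER_HOUR = {
    "gpu_1x_h100": Decimal("2.49"),
    "gpu_8x_h100": Decimal("19.92"),
    "gpu_1x_a100_pcie": Decimal("1.10"),
}


def estimate_cost(instance_type: str, hours: int, count: int = 1) -> Decimal:
    """Return rate * hours * count in USD, rounded to cents."""
    rate = SAMPLE_RATES_USD_PER_HOUR[instance_type]
    return (rate * Decimal(hours) * count).quantize(Decimal("0.01"))


# One 8x H100 node for a full day at the sample rate.
print(estimate_cost("gpu_8x_h100", hours=24))  # 478.08
```

`Decimal` is used instead of floats so that cent-level totals do not pick up binary rounding noise.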

## Capabilities

### Provisioning Compute Resources
Launch new GPU virtual machines (H100/A100) and manage their entire lifecycle from start to finish.

### Monitoring Instance Status
List all currently running instances and retrieve key details like hardware specs, public IPs, and Jupyter Lab tokens.

### Inventory and Cost Planning
Discover available GPU node types across different regions and check their current pricing to plan budgets.

### Secure Access Management
View the globally stored SSH public keys used for secure access over port 22.

### Shared Storage Mapping
Discover persistent shared NAS volumes available to mount across multiple worker nodes simultaneously.

## Use Cases

### Scaling up for a large model run
A Machine Learning Engineer needs 10 A100 GPUs in us-east-1. Instead of manually checking capacity and clicking 'launch' multiple times, they ask their agent to launch the required instances. The agent provisions them with launch_instance, then calls list_instances to confirm they are up.
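The launch-then-confirm flow in this use case can be sketched as below. This is an illustration under stated assumptions, not the MCP's actual interface: `call_tool` is a hypothetical callable standing in for however your client invokes a tool, and the argument names (`instance_type`, `region`, `ssh_key`) and result fields (`instance_id`, `id`) are guesses. An in-memory fake is included so the sketch runs without an account.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], Any]


def launch_and_confirm(call_tool: ToolCaller, instance_type: str, region: str,
                       ssh_key: str, count: int) -> dict:
    """Launch `count` instances one at a time, then check each ID via list_instances."""
    launched_ids = []
    for _ in range(count):
        result = call_tool("launch_instance", {
            "instance_type": instance_type,  # assumed argument name
            "region": region,                # assumed argument name
            "ssh_key": ssh_key,              # assumed argument name
        })
        launched_ids.append(result["instance_id"])  # assumed result field

    listed_ids = {inst["id"] for inst in call_tool("list_instances", {})}
    return {instance_id: instance_id in listed_ids for instance_id in launched_ids}


def make_fake_caller() -> ToolCaller:
    """In-memory stand-in for the MCP; it only mimics the two tools used here."""
    instances = []

    def call_tool(name: str, arguments: dict) -> Any:
        if name == "launch_instance":
            instance_id = f"inst-{len(instances) + 1}"
            instances.append({"id": instance_id, **arguments})
            return {"instance_id": instance_id}
        if name == "list_instances":
            return list(instances)
        raise ValueError(f"unknown tool: {name}")

    return call_tool


confirmation = launch_and_confirm(
    make_fake_caller(), "gpu_1x_a100_pcie", "us-east-1", "default-key", count=10
)
print(confirmation)
```

Returning a per-ID confirmation map, rather than a single boolean, makes it easy to see which launches did not show up in the listing.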

### Auditing security access
An Ops Specialist needs to verify which SSH keys grant access across all clusters. They use list_ssh_keys, which enumerates every globally managed public key so the list can be checked against compliance and zero-trust policies.

### Debugging a failed job
A Data Scientist finds an instance stuck running old code and wants to stop it before more charges accrue. They ask the agent to call terminate_instances, which immediately stops the compute and releases the resource.

### Planning a multi-region deployment
A team lead needs to know if they can deploy their model training across two different geographical areas. They use list_instance_types to get the full pricing matrix and check physical availability in both regions.
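A minimal sketch of that two-region check follows. The catalog shape (a mapping from type name to an entry with `price_cents_per_hour` and `regions_with_capacity_available`) is an assumption about what `list_instance_types` returns, and the sample data is illustrative only. The helper names are hypothetical.

```python
# Illustrative data in an ASSUMED shape; real output may differ.
SAMPLE_CATALOG = {
    "gpu_1x_h100": {
        "price_cents_per_hour": 249,
        "regions_with_capacity_available": [{"name": "us-east-1"}, {"name": "us-south-1"}],
    },
    "gpu_1x_a100_pcie": {
        "price_cents_per_hour": 110,
        "regions_with_capacity_available": [{"name": "us-east-1"}],
    },
}


def regions_with_capacity(catalog: dict, instance_type: str) -> set:
    """Names of regions that currently report capacity for the given type."""
    entry = catalog.get(instance_type, {})
    return {region["name"] for region in entry.get("regions_with_capacity_available", [])}


def missing_regions(catalog: dict, instance_type: str, wanted: list) -> list:
    """Regions from `wanted` that do NOT report capacity, sorted for stable output."""
    return sorted(set(wanted) - regions_with_capacity(catalog, instance_type))


wanted = ["us-east-1", "us-south-1"]
for name in SAMPLE_CATALOG:
    gaps = missing_regions(SAMPLE_CATALOG, name, wanted)
    print(name, "ok in both regions" if not gaps else f"no capacity in {gaps}")
```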

## Benefits

- Launch powerful machines on demand. Instead of manually going through a dashboard to provision compute, you can ask the agent to launch an H100 instance instantly.
- Stop wasting money immediately. Use the termination tool to destroy compute nodes and stop billing with just a simple command, preventing accidental charges.
- Know your options before you start. You can use list_instance_types to discover every GPU node type and check its current pricing across various regions for budget planning.
- Maintain secure access easily. The agent lets you audit SSH keys by listing all globally managed public keys without having to log into a separate key management system.
- Keep your data centralized. Use list_filesystems to map shared NAS volumes, ensuring that every worker node can mount the same persistent storage for training data.

## How It Works

You use conversational prompts to manage infrastructure tasks that would otherwise require manual API calls and dashboard navigation.

1. Subscribe to this MCP and provide your Lambda Labs API Key.
2. Connect the credentials to your preferred AI client (Claude, Cursor, etc.).
3. Give your agent requests in natural language, such as 'Launch a 1x H100 instance in us-east-1 with key X' or 'List all running GPU instances.'
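Behind step 3, an MCP client turns the request into a JSON-RPC 2.0 `tools/call` message. The sketch below builds such a message for `launch_instance`. The envelope follows the Model Context Protocol, but the argument names (`instance_type`, `region`, `ssh_key`) are assumptions, since the tool's input schema is not documented here; the helper `build_tool_call` is hypothetical.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


message = build_tool_call(
    1,
    "launch_instance",
    {
        "instance_type": "gpu_1x_h100",  # assumed argument name
        "region": "us-east-1",           # assumed argument name
        "ssh_key": "default-key",        # assumed argument name
    },
)
print(json.dumps(message, indent=2))
```

In normal use the AI client builds and sends this message for you; the sketch only shows what a conversational request resolves to.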

## Frequently Asked Questions

**How do I find out what GPU types are available using Lambda Labs (GPU Cloud) MCP?**
You use list_instance_types. This tool shows you the full catalog, including hardware specifications, regional availability, and current pricing matrices so you can plan your training budget.

**Can I launch a new GPU machine using Lambda Labs (GPU Cloud) MCP?**
Yes, use the launch_instance tool. You tell your agent what size and type of box you need, like an H100 or A100, and it handles the provisioning process.

**Does list_instances show me which machine I should connect to?**
list_instances shows a current list of all active compute nodes. If you need the exact connection string for one of those machines, ask the agent to run get_instance.

**How do I ensure my team can access files across multiple machines?**
You use list_filesystems to map out all persistent shared NAS volumes. This ensures that data stored in one location can be mounted simultaneously by every worker node your model uses.

**Is terminating an instance permanent and safe?**
It is permanent: terminate_instances destroys the GPU machine, and any attached ephemeral drives are erased immediately, so copy off anything you need first. It is also the fastest way to stop billing.
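
Because termination cannot be undone, it can help to resolve instance names to IDs and reject unknown names before anything is destroyed. The sketch below shows one way to do that; it assumes each `list_instances` entry carries `name` and `id` fields, and the IDs shown are made up. `plan_termination` is a hypothetical helper, not part of the MCP.

```python
def plan_termination(instances: list, names_to_stop: list) -> list:
    """Map instance names to IDs, refusing to proceed if any name is unknown."""
    ids_by_name = {inst["name"]: inst["id"] for inst in instances}
    unknown = sorted(set(names_to_stop) - ids_by_name.keys())
    if unknown:
        # Fail before any destructive call is made.
        raise ValueError(f"no running instance named: {unknown}")
    return [ids_by_name[name] for name in names_to_stop]


# Names taken from the sample listing earlier; IDs are illustrative.
running = [
    {"name": "training-node-01", "id": "inst-001"},
    {"name": "inference-test", "id": "inst-002"},
]

ids = plan_termination(running, ["inference-test"])
print(ids)  # these IDs would then be passed to terminate_instances
```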