Modal MCP for AI. Control your serverless AI infrastructure via chat.
Works with every AI agent you already use
…and any MCP-compatible client
Connect to your AI in seconds.
Modal MCP Server connects your AI agent directly to a high-performance serverless compute backend. It lets you audit active apps, check GPU deployments, and track persistent storage volumes using natural conversation.
Need to manage complex ML infrastructure without touching the CLI? This is it.
What your AI can do
List apps
Lists all Modal app contexts, both currently running and stopped.
Get app
Pulls specific details for one Modal App ID you provide.
Stop app
Forces the immediate termination of an active Modal App execution using its ID.
Checks the status and context of all running Modal application instances.
Terminates an actively running Modal App using its ID, preventing further billing charges.
Retrieves a list of all promoted, long-running service deployments and their endpoint details.
Lists the disk network block volumes attached to your Modal account for storage visibility.
Retrieves a list of secret dictionary references and associated environment variable mappings.
Pulls detailed JSON metadata for any single App or Deployment ID you reference.
Modal (Serverless AI Infrastructure) MCP Server: 7 Tools
These tools let you audit app status, manage deployments, and control resources on your Modal platform using structured function calls.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Modal (Serverless AI Infrastructure) on Vinkius
List Apps
Lists all Modal app contexts, both currently running and stopped.
Get App
Pulls specific details for one Modal App ID you provide.
Stop App
Forces the immediate termination of an active Modal App execution using its ID.
List Secrets
Lists every configured secret dictionary reference in your account for auditing...
List Volumes
Shows a list of all persistent disk network block volumes attached to your project.
List Deployments
Provides a list of all actively managed, promoted service deployments on the platform.
Get Deployment
Retrieves detailed status information for a single tracked deployment.
Security and governance baked right in.
Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2. Zero messy routing required.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Modal (Serverless AI Infrastructure), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Modal. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 7 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
Checking infrastructure status shouldn't require five different dashboards and three tabs of copy-pasting.
Today, checking if a job finished or what resources were used means logging into the platform, finding the App ID in one tab, opening the billing dashboard in another, and manually cross-referencing volumes on a third. It's slow, it's error-prone, and you always feel like you missed something.
With this MCP server, your agent handles all that complexity. You just ask: 'What's the status of my latest model run?' The agent runs `list_apps`, pulls details using `get_app`, and gives you a single, actionable answer right in the chat window.
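The chain behind that answer can be sketched in a few lines of Python. This is a minimal illustration, not the server's implementation: `call_tool` stands in for whatever your MCP client uses to invoke a tool, and the `app_id`, `state`, and `created_at` fields are assumed response shapes, so check the real tool output before relying on them.

```python
from typing import Any, Callable

# Hypothetical signature: (tool name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], Any]


def latest_app_status(call_tool: ToolCaller) -> dict:
    """Chain list_apps -> get_app to report on the most recent app."""
    apps = call_tool("list_apps", {})
    if not apps:
        return {"found": False}
    # ISO 8601 timestamps in the same format sort correctly as strings.
    latest = max(apps, key=lambda app: app["created_at"])
    details = call_tool("get_app", {"app_id": latest["app_id"]})
    return {"found": True, "app_id": latest["app_id"], "state": details["state"]}


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stand-in for a real MCP client, returning canned data."""
    apps = [
        {"app_id": "ap-111", "state": "stopped", "created_at": "2024-01-01T10:00:00Z"},
        {"app_id": "ap-222", "state": "running", "created_at": "2024-01-02T09:30:00Z"},
    ]
    if name == "list_apps":
        return apps
    if name == "get_app":
        return next(app for app in apps if app["app_id"] == arguments["app_id"])
    raise ValueError(f"unexpected tool: {name}")


print(latest_app_status(fake_call_tool))
```

In practice your agent decides on this sequence by itself; the sketch only shows why one question can turn into two tool calls.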
The Modal MCP Server makes resource control simple.
Before, if an experimental job ran wild, you had to find its specific App ID and then navigate a separate console page just to kill it. This was often confusing, leading to over-billing or missed endpoints.
Now, you tell your agent: 'Kill the app with ID ap-123.' It runs `stop_app` instantly. The job stops, the billing cycle ends, and you get confirmation—no clicking required.
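Under the hood, an MCP client sends a tool invocation as a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `stop_app`. The `app_id` argument name is an assumption, since this page doesn't show the tool's input schema.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "app_id" is an assumed argument name; check the server's tool schema.
request = build_tool_call(1, "stop_app", {"app_id": "ap-123"})
print(request)
```

You never write this message by hand. Your AI client produces it when the agent decides to call the tool.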
What your AI can actually do with this
Modal MCP Server - Audit GPU & Compute Infrastructure
Look, you gotta manage complex ML infrastructure without opening a terminal and running twenty lines of ugly CLI commands. This server connects your AI client straight to Modal's backend. It lets your agent handle the heavy lifting: checking active apps, auditing GPU deployments, and tracking persistent storage volumes—all just by talking to it.
You'll get full control over high-performance, serverless compute resources through natural conversation. Here’s what you can do with these tools:
- Checking Status: You can use `list_apps` to pull a rundown of every Modal application context, whether it’s running right now or just finished up. Need to know where your services are? Use `list_deployments` to get a list of all promoted, long-running service deployments on the platform.
- Storage and Security: You gotta see what's attached to your account, so you can use `list_volumes` to show every persistent disk network block volume connected to your project. For security checks, run `list_secrets`; it gives you a list of all configured secret dictionary references in your account.
- Deep Dives: If you need the nitty-gritty on one specific thing, you've got two ways to drill down. Use `get_app` and give it an App ID; it pulls precise JSON metadata about that single running application instance. Same deal with deployments: use `get_deployment` and reference a Deployment ID to get its detailed status information.
- Controlling the Compute: Sometimes you gotta hit the kill switch. If an app’s running too long and racking up bills, use `stop_app`. You feed it the App ID, and it forces the immediate termination of that active Modal App execution, stopping billing charges right away.
Basically, if you need to audit resources or check a status without typing out a single command, this is your ride. It gives you visibility into what's running, where it's stored, and lets you shut down runaway processes instantly.
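One way to keep the seven tools straight is by what each one needs from you: the list tools take nothing, while the get and stop tools need an ID. Here's a rough sketch of that split. The tool names come from this page, but the argument names (`app_id`, `deployment_id`) are assumptions, not the server's documented schema.

```python
# Required arguments per tool. Tool names are from this page;
# the argument names are assumed for illustration.
REQUIRED_ARGS = {
    "list_apps": [],
    "list_deployments": [],
    "list_volumes": [],
    "list_secrets": [],
    "get_app": ["app_id"],
    "get_deployment": ["deployment_id"],
    "stop_app": ["app_id"],
}


def missing_arguments(tool: str, arguments: dict) -> list:
    """Return the required argument names that are absent or empty."""
    if tool not in REQUIRED_ARGS:
        raise KeyError(f"unknown tool: {tool}")
    return [name for name in REQUIRED_ARGS[tool] if not arguments.get(name)]


print(missing_arguments("list_apps", {}))
print(missing_arguments("stop_app", {}))
```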
Here's how it actually works
The bottom line is you get full visibility into complex AI infrastructure by talking to your agent instead of writing boilerplate CLI commands.
Subscribe to this server and provide your Modal Token ID and Secret.
Your AI client uses the tokens to authenticate with your compute environment.
You ask your agent a question (e.g., 'What apps are running?'), and it runs the necessary tool calls, returning the structured data.
Who is this actually for?
This is for ML Engineers and DevOps folks who are sick of spending hours clicking through multiple dashboards or running verbose modal commands just to check if a GPU job failed. If your job involves managing stateful, high-cost cloud compute resources, you need this.
Checks the status of training jobs using list_apps and verifies endpoint details with get_deployment before merging code.
Manages resource cleanup by running stop_app when a test run is finished, or audits credentials using list_secrets.
Checks persistent data integrity by listing volumes with list_volumes, ensuring datasets are mounted correctly before training begins.
What Changes When You Connect
Stop unexpected bills immediately. Instead of logging into the console to find a rogue job, use stop_app to force-terminate an active App execution with a single command.
Keep track of deployed services without manual lookups. Use list_deployments to get all web endpoints and serving configurations in one shot.
Audit your data security instantly. Running list_secrets lets you verify every stored secret reference, which is crucial before deploying sensitive models.
Manage massive datasets easily. list_volumes shows all persistent disk volumes, letting you know exactly where your training data lives across the cluster.
Get deep state info on demand. Use get_app or get_deployment to pull precise JSON metadata for any resource ID, bypassing vague status messages.
See everything at once. Running list_apps gives a clear snapshot of all running and historical compute contexts.
See it in action
The runaway GPU job
A data scientist kicks off an experimental model training run, forgets to monitor it, and gets hit with a massive bill. They ask their agent: 'What apps are running?' The agent uses list_apps to identify the rogue App ID, which the user then passes to stop_app, stopping the billing cycle immediately.
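Here's a small sketch of the "identify the rogue app" step. It assumes the `list_apps` output has been parsed into dictionaries with `app_id`, `state`, and `started_at` (epoch seconds) fields. Those field names are hypothetical, so adapt them to what the tool actually returns.

```python
def find_runaway_apps(apps: list, now: float, max_runtime_seconds: float) -> list:
    """Return IDs of running apps whose runtime exceeds the budget."""
    return [
        app["app_id"]
        for app in apps
        if app["state"] == "running"
        and now - app["started_at"] > max_runtime_seconds
    ]


# Canned data standing in for a parsed list_apps response.
apps = [
    {"app_id": "ap-aaa", "state": "running", "started_at": 1_000},
    {"app_id": "ap-bbb", "state": "running", "started_at": 9_000},
    {"app_id": "ap-ccc", "state": "stopped", "started_at": 500},
]

runaways = find_runaway_apps(apps, now=10_000, max_runtime_seconds=3_600)
print(runaways)
```

The function only reports candidates. Passing an ID to stop_app stays a separate, deliberate step, which matches the scenario above.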
Checking pre-launch readiness
A DevOps engineer is deploying a new microservice. Before merging, they use their agent: 'List all deployments and check the credentials.' The agent runs list_deployments to verify endpoints and then list_secrets to confirm necessary keys are available.
Tracing dataset location
A new ML engineer needs to know where the historical data lives. They ask: 'Show me all stored datasets.' The agent runs list_volumes, providing a list of named persistent disks, allowing the user to confirm which volume ID holds the source files.
Debugging an app failure
An AI engineer finds that a specific application (app-xyz) is failing. They ask: 'What's wrong with this app?' The agent runs get_app and returns the full JSON metadata, letting the user pinpoint if the issue is resource allocation or configuration.
The honest tradeoffs
Treating it like a simple database query
The user assumes they can just ask, 'Give me all my app data.' The agent fails because the system needs explicit context about what state they want (running vs. historical).
You need to guide your agent's scope. Start by running list_apps to see the available IDs, then use get_app [ID] for specific data points.
Over-relying on a single list call
The user runs list_volumes and sees 10 volumes. They assume this means all data is fine, but they don't know which volume holds the active model weights.
Always cross-reference storage with deployment details. Use list_deployments first, then check the associated resource metadata via get_deployment [ID] to pinpoint critical volumes.
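That cross-reference might look like the sketch below. It assumes `get_deployment` metadata includes a `volumes` list and that deployments are keyed by a `deployment_id` field. Neither is confirmed by this page, and `call_tool` is a hypothetical stand-in for your MCP client.

```python
from typing import Any, Callable


def volumes_in_use(call_tool: Callable[[str, dict], Any]) -> dict:
    """Map each volume name to the deployment IDs that reference it."""
    usage: dict = {}
    for deployment in call_tool("list_deployments", {}):
        deployment_id = deployment["deployment_id"]
        details = call_tool("get_deployment", {"deployment_id": deployment_id})
        # "volumes" is an assumed metadata field.
        for volume in details.get("volumes", []):
            usage.setdefault(volume, []).append(deployment_id)
    return usage


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Canned responses standing in for the real server."""
    details = {
        "dp-1": {"volumes": ["model-weights"]},
        "dp-2": {"volumes": ["model-weights", "eval-data"]},
    }
    if name == "list_deployments":
        return [{"deployment_id": key} for key in details]
    if name == "get_deployment":
        return details[arguments["deployment_id"]]
    raise ValueError(f"unexpected tool: {name}")


print(volumes_in_use(fake_call_tool))
```

Any volume from list_volumes that doesn't show up in this map is a candidate for a closer look before you assume it's safe to ignore.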
Manually tracking state changes
The user sees an app is running and waits for it to finish, wasting time and accruing unnecessary costs while monitoring a dashboard.
If you don't need the job to run, use stop_app [ID] immediately. This terminates the app and stops further billing for it.
When It Fits, When It Doesn't
Use this server if your infrastructure relies on complex resource management: GPU clusters, persistent storage (volumes), or ephemeral, high-cost compute runs. You need to monitor the lifecycle of applications and deployments over time.
Don't use it if you just need simple data—like checking a static API key or reading user profiles. If your needs are limited to basic CRUD operations on non-compute resources, a standard database connector is better.
If you have an active app that costs money and isn't doing useful work, run stop_app to stop the charges. If you need to know why it failed, use get_app [ID] for the full metadata dump. This tool is your primary operational check-up.
Questions you might have
How do I find out what apps are running using list_apps? +
Run list_apps. This gives you a clear rundown of all active and historical Modal app contexts. It's the first place to check if something is running unexpectedly.
Can I stop an app with stop_app using just the name? +
No, stop_app requires the specific App ID. Run list_apps first to get the exact identifier before you terminate the app.
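If you only remember the name, the lookup step is easy to sketch. The example assumes each entry from list_apps carries `name` and `app_id` fields, which is a guess at the response shape rather than documented behavior.

```python
def resolve_app_id(apps: list, name: str) -> str:
    """Find the single App ID for a given app name, or fail loudly."""
    matches = [app["app_id"] for app in apps if app.get("name") == name]
    if not matches:
        raise LookupError(f"no app named {name!r}")
    if len(matches) > 1:
        # Refuse to guess: stopping the wrong app is worse than asking again.
        raise LookupError(f"{len(matches)} apps named {name!r}; pick an ID explicitly")
    return matches[0]


# Canned data standing in for a parsed list_apps response.
apps = [
    {"app_id": "ap-111", "name": "api"},
    {"app_id": "ap-222", "name": "trainer"},
    {"app_id": "ap-333", "name": "api"},
]

print(resolve_app_id(apps, "trainer"))
```

Raising on duplicate names is a design choice for the sketch: with a destructive tool like stop_app, an explicit ID beats a best guess.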
What is the difference between list_volumes and getting app details? +
list_volumes shows all disk volumes attached to your account. get_app provides state-specific information about a single running application, which might reference those volumes.
Do I need list_secrets if I just want to check my app? +
No. get_app is enough to check an app. Run list_secrets when you suspect credential issues; it lists every secret dictionary reference configured in your account.
What happens if I use list_secrets with expired or wrong credentials? +
The server immediately fails and returns a specific authentication error code. You must ensure your Modal Token ID and Secret are current before running this tool.
If I run stop_app on an App ID that is already terminated, will it cause an error? +
No, the system handles this gracefully. It returns a status message confirming the app is already inactive and takes no action, preventing unnecessary billing cycles.
Does list_volumes show real-time network usage or just static storage details? +
It only lists the persistent disk block volumes. You get information on size and mount paths for the connected data stores, not active bandwidth metrics.
What if I try to use get_deployment with an ID that was never promoted? +
The tool returns a clear error stating that no tracked deployment exists for that specific ID. This confirms whether the resource is managed by Modal's promotion system.
Can I stop a running Modal app through my agent to save costs? +
Yes. Use the stop_app tool with an active App ID. Your agent sends a termination command to Modal, which stops the app and prevents further billing for that specific execution.
How do I check which web endpoints are active for my deployments? +
Use the list_deployments and get_deployment tools. Your agent will report the public URL endpoints and serving metadata associated with your long-running Modal deployments.
Can my agent audit the secrets and persistent volumes in my workspace? +
Absolutely. Use the list_secrets and list_volumes tools to monitor your infrastructure assets. Your agent will report the names and references for your stored secrets and network block storage mounts attached to your compute instances.
We've already built the connector for Modal. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 7 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.