LiteLLM MCP. Control key lifecycle, routing, and spending across all LLMs.

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

LiteLLM Proxy & Spend Tracking manages your entire LLM gateway through a single proxy layer. It lets you generate isolated API keys, audit spending down to the user and team level, and programmatically manage complex model fallback paths (like OpenAI -> Anthropic).

Stop guessing about costs; control everything from one place.

What your AI agents can do

Audit API spending by user and team

You get precise, real-time records of who consumed tokens and how much it cost in USD.

Manage model fallbacks and routing endpoints

Check the exact sequence of models used when a primary provider fails (e.g., OpenAI -> Anthropic).

Isolate API keys for microservices

Generate unique sub-keys, each with its own budget and rate limits, so one runaway or compromised service can't exhaust the budget shared by everything else.

Enforce organizational spending boundaries

Create dedicated team profiles that track cost ceilings specific to a department or division.

Maintain infrastructure stability

Delete broken model deployments (delete_model) to stop runtime 500 errors, or revoke compromised API keys (delete_key) to block unauthorized use.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

LiteLLM (LLM Proxy & Spend Tracking): 10 Tools

Use these tools to audit spending, manage API keys, and programmatically control model routing paths across multiple LLM providers.

create_model

Adds a new model deployment (API endpoint) to the proxy runtime for a specific LLM model.

create_team

Creates an isolated team profile that tracks cost limits and operational boundaries for billing.

create_user

Registers a specific end-user identity so their token consumption can be tracked in the proxy logs.

delete_key

Removes an existing LLM proxy API key entirely, preventing further use and any further spend against it.

delete_model

Deactivates a specific LLM deployment that is causing errors (500s) in the routing path.

generate_key

Creates a new, distinct proxy API key for a microservice or team, applying defined budget limits.

get_key_info

Retrieves the current configuration and budget limits for a given API key.

get_model_info

Returns a list of all configured fallback paths, showing which models route to which providers.

get_team_info

Retrieves the configuration and cost boundaries associated with a specific team ID.

get_user_info

Returns precise usage data for an end-user, including total tokens consumed and calculated USD cost.

Make Your AI Do More

Start with LiteLLM (LLM Proxy & Spend Tracking), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

LiteLLM Proxy & Spend Tracking: You run the whole show.

The LiteLLM Proxy gives you a single control layer for your entire LLM gateway. You ditch guessing about costs and manage everything—from API keys to model fallbacks—in one place using your AI client. Here's how it works.

Cost Management: Knowing Exactly Who's Spending What

You gotta track spending down to the user level, period. When you use create_user, you register a specific end-user identity in the proxy logs, so your agent can track that user's token consumption as it happens. Later, with get_user_info, you pull precise metrics for that end-user: their total tokens consumed and exactly what that costs in USD.

For departmental tracking, you use create_team to set up an isolated profile. This tracks cost limits and operational boundaries specifically for a department or division. You can then call get_team_info to review the internal logic and established cost ceilings tied to that specific team ID.
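To make the "tokens to USD" step concrete, here is a minimal sketch of the arithmetic behind a per-call cost figure. The function name and the per-1K-token prices are hypothetical and chosen only for illustration; the real figures returned by get_user_info come from the gateway, not from this code.

```python
from decimal import Decimal


def call_cost_usd(prompt_tokens, completion_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Return the USD cost of one call from token counts and per-1K prices.

    Decimal is used so that small per-token prices do not pick up
    floating-point rounding noise when many calls are summed.
    """
    input_cost = Decimal(prompt_tokens) / 1000 * Decimal(input_price_per_1k)
    output_cost = Decimal(completion_tokens) / 1000 * Decimal(output_price_per_1k)
    return input_cost + output_cost


# Hypothetical prices: $0.0025 per 1K input tokens, $0.0100 per 1K output tokens.
example_cost = call_cost_usd(1200, 300, "0.0025", "0.0100")
print(example_cost)
```

Summing this value over every call tagged with a user ID gives the kind of per-user total the section describes.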

Access Control & Isolation: Keys, Limits, and Boundaries

The proxy lets you build rock-solid isolation layers. If one microservice goes sideways, it shouldn't take down the whole stack. You generate unique sub-keys using generate_key. Each of these new proxy API keys can be configured with its own defined budget limits and rate controls, which is killer for multi-tenant setups.

When you need to check what key you're dealing with, or verify its current cost bounds, just run get_key_info against any specific API key. If a key gets compromised, don't sweat it; you can instantly eliminate it using delete_key, which removes the existing proxy API key entirely and prevents any further spend against it.
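As a rough sketch of what a per-service key request could carry, the helper below builds a payload with a budget and a rate limit. The field names (key_alias, max_budget, rpm_limit, team_id) are assumptions modeled on LiteLLM's key-generation parameters; the arguments that the generate_key tool actually accepts may differ, and the helper itself is hypothetical.

```python
def build_key_request(service_name, max_budget_usd, rpm_limit, team_id=None):
    """Build a key-generation payload for one microservice.

    Field names are assumptions for illustration, not a confirmed schema.
    """
    if max_budget_usd <= 0:
        raise ValueError("max_budget_usd must be positive")
    if rpm_limit <= 0:
        raise ValueError("rpm_limit must be positive")

    payload = {
        "key_alias": service_name,           # human-readable label for the key
        "max_budget": float(max_budget_usd),  # hard spend ceiling in USD
        "rpm_limit": int(rpm_limit),          # requests per minute
    }
    if team_id is not None:
        payload["team_id"] = team_id          # optional: attach key to a team
    return payload


billing_key = build_key_request("billing-service", 25, 60, team_id="payments")
print(billing_key)
```

Validating the budget and rate limit before the request is sent means a key without limits is never created by accident.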

Reliability & Fallbacks: Keeping Things From Crashing

Model failures are inevitable, but this setup handles them. You define complex model fallback paths—for instance, if OpenAI bails out, it automatically tries Anthropic or Groq. You check the exact sequence of models used when a primary provider fails using get_model_info, which returns a list of all configured routing fallbacks and shows exactly which models route to which providers.

The proxy also lets you manage the core model endpoints themselves. If a specific deployment is throwing 500 errors, you don't have to wait for it to fix itself; you use delete_model to deactivate that problematic LLM deployment immediately, keeping your routing path clean. You can also inject entirely new API endpoints into the proxy runtime for an untested or fresh model using create_model.
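The fallback behavior described above can be pictured as walking an ordered chain until a healthy deployment is found. This is a simplified sketch of that idea, not the proxy's actual routing code; the model names and the shape of the fallback map are hypothetical.

```python
def resolve_route(primary, fallbacks, unhealthy):
    """Return the first healthy model in the chain, or None if all are down.

    primary   -- name of the preferred model
    fallbacks -- dict mapping a model name to its ordered list of fallbacks
    unhealthy -- set of model names currently failing
    """
    chain = [primary] + list(fallbacks.get(primary, []))
    for model in chain:
        if model not in unhealthy:
            return model
    return None


# Hypothetical chain: OpenAI -> Anthropic -> Groq
fallback_map = {"openai-primary": ["anthropic-backup", "groq-backup"]}

route_when_openai_down = resolve_route(
    "openai-primary", fallback_map, unhealthy={"openai-primary"}
)
print(route_when_openai_down)
```

Removing a failing deployment with delete_model corresponds to dropping it from the chain, and create_model to appending a new entry.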

Operational Tools: The Full Control Panel

This setup lets you manage the infrastructure itself. By generating isolated keys and assigning them to specific teams, you enforce strict organizational spending boundaries right at the source. When your agent uses these tools, it's managing resources programmatically: It creates a team profile with cost tracking (create_team), registers individual users for auditing (create_user), generates highly restricted API access points (generate_key), and ensures that all operational costs are tied back to specific user or team IDs.

You get complete visibility into who consumed tokens, how many total tokens were used, and the corresponding USD cost, giving you full financial accountability over every single call made through your gateway.

How LiteLLM MCP Works

1. Subscribe to the server and provide your LiteLLM API URL along with a Master Key.
2. Your AI client connects through this proxy, allowing you to manage LLM calls conversationally.
3. You issue commands (e.g., 'What did user X spend?') which run tools like get_user_info against the live gateway.
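Under the hood, step 3 becomes an MCP tool call. The sketch below builds the JSON-RPC 2.0 envelope that the Model Context Protocol uses for tools/call; the argument name user_id is an assumption, since the tool's input schema is not documented here.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request IDs


def tool_call_request(tool_name, arguments):
    """Build an MCP 'tools/call' request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# 'user_id' is a hypothetical argument name used for illustration.
spend_request = tool_call_request("get_user_info", {"user_id": "user-x"})
print(json.dumps(spend_request, indent=2))
```

Your MCP client normally builds and sends this message for you; it is shown only to make the chat-to-tool translation visible.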

The bottom line is: it puts your entire complex LLM setup behind a single, auditable control panel accessible via chat or code.

Who Is LiteLLM MCP For?

It's for platform engineers who are sick of manual key rotation and cost auditing, and for the MLOps specialist facing unpredictable cloud bills from multiple models. If your team runs LLMs across multiple providers, you need this.

MLOps Specialist

Uses get_model_info to audit fallback paths and adjusts model deployments using create_model when a provider changes its endpoint.

Platform Engineer

Manages global gateway configuration, generating new sub-keys with rate limits via generate_key for every new microservice.

Backend Developer

Uses the server to generate dedicated user identities (create_user) and test model routing availability without impacting production costs.

What Changes When You Connect

Pinpoint exactly who spent the money. Use get_user_info to track total USD consumption per individual user ID—no more guessing where budget overruns come from.
Stop cascading failures with dynamic model control. Audit routing paths with get_model_info before an outage so you know failover will work, and use create_model to add a backup deployment when a provider goes down.
Enforce true organizational separation. With generate_key and create_team, you can assign hard budget limits per department, keeping costs accountable at the division level.
Maintain uptime by proactively cleaning up. If a model deployment breaks or a key is leaked, use delete_model or delete_key right away to stop downstream 500 errors and unauthorized spend.
Audit everything in one chat session. Instead of hopping between billing dashboards and API logs, simply ask the agent: 'What was the cost for Team X last week?'
Build reliable microservices. Generate isolated sub-keys using generate_key so your new service can't accidentally drain the budget allocated to another team.

Real-World Use Cases

Billing mystery: Identifying cost centers.

A product manager notices costs spiked last month. Instead of blaming a 'team,' they ask their agent: 'Show me all spending for the Marketing department.' The agent runs get_team_info and identifies that a new, unbudgeted service deployed by an individual developer was responsible for 70% of the overrun.

Service deployment failure.

A backend team deploys a new LLM endpoint (e.g., AWS Bedrock Llama 4). Before connecting it, they use create_model to inject the fresh routing path and verify its availability via the agent. They prevent an outage before any live traffic hits the system.

Dealing with model instability.

The primary OpenAI endpoint starts throwing rate limit errors. The engineer runs get_model_info to see the full fallback chain (OpenAI -> Anthropic). They realize they need a new path and use create_model to inject a temporary Groq endpoint, restoring service immediately.

Security cleanup.

A developer accidentally shares their high-privilege API key. The security team uses the agent to run delete_key instantly and permanently, preventing unauthorized access or massive billing charges before the key can be exploited.

The Tradeoffs

Treating keys as global resources

Giving all microservices one main API key because 'it's easier.' This means a leak in Service A drains the budget for Service B.

→ Use generate_key to create distinct sub-keys for every service. Then, use get_key_info to verify that each new key has its own hard budget limit.
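A quick audit of this kind can be sketched as a filter over key records. The record shape (key_alias, max_budget) is an assumption about what get_key_info might return, used here only for illustration.

```python
def keys_missing_budget(key_records):
    """Return the aliases of keys that have no positive budget ceiling.

    A missing, None, or zero max_budget is treated as 'no limit set'.
    """
    return [
        record["key_alias"]
        for record in key_records
        if not record.get("max_budget")
    ]


# Hypothetical records, one per microservice key.
audited_keys = [
    {"key_alias": "service-a", "max_budget": 50.0},
    {"key_alias": "service-b", "max_budget": None},
    {"key_alias": "service-c"},
]
unbudgeted = keys_missing_budget(audited_keys)
print(unbudgeted)
```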

Assuming model failover works.

The code assumes every call to Anthropic succeeds. If Anthropic is down, the request fails with no fallback and no clear error message.

→ Use get_model_info to map out your current fallbacks. If they aren't comprehensive (e.g., missing Groq), use create_model to build in that redundancy.
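One way to check whether fallbacks are comprehensive is to flag every model that has too few usable backups. This is a sketch under assumed data shapes: the list of model names and the fallback map are hypothetical stand-ins for whatever get_model_info reports.

```python
def fallback_gaps(models, fallbacks, min_fallbacks=1):
    """Return models with fewer than min_fallbacks usable fallbacks.

    A fallback only counts if it names a different model that is
    itself configured; dangling names are ignored.
    """
    configured = set(models)
    gaps = []
    for model in models:
        usable = [
            name for name in fallbacks.get(model, [])
            if name in configured and name != model
        ]
        if len(usable) < min_fallbacks:
            gaps.append(model)
    return gaps


configured_models = ["openai-primary", "anthropic-backup", "groq-backup"]
configured_fallbacks = {
    "openai-primary": ["anthropic-backup"],
    "anthropic-backup": ["retired-model"],  # points at a model that no longer exists
}
models_without_cover = fallback_gaps(configured_models, configured_fallbacks)
print(models_without_cover)
```

Any model in the result is a single point of failure and a candidate for a new backup via create_model.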

Ignoring user accountability.

The monthly bill comes, and you have no idea which department or employee caused the spike. You just see 'Total Consumption: $1200.'

→ Always use create_user for new consumers and run get_user_info to break down that total cost into specific, attributable user accounts.
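Turning a single total into attributable shares is simple arithmetic once per-user spend is available. The user IDs and amounts below are invented for illustration and sum to the $1200 figure above; in practice the values would come from get_user_info.

```python
def spend_breakdown(user_spend):
    """Return (total, rows) where rows are (user_id, spend, percent_of_total),
    sorted from the biggest spender to the smallest."""
    total = sum(user_spend.values())
    rows = []
    for user_id, spend in sorted(
        user_spend.items(), key=lambda item: item[1], reverse=True
    ):
        share = round(spend / total * 100, 1) if total else 0.0
        rows.append((user_id, spend, share))
    return total, rows


# Hypothetical per-user spend in USD.
monthly_total, monthly_rows = spend_breakdown(
    {"bob": 240.0, "alice": 840.0, "ci-bot": 120.0}
)
for user_id, spend, share in monthly_rows:
    print(f"{user_id}: ${spend:.2f} ({share}%)")
```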

When It Fits, When It Doesn't

Use this server if your LLM architecture involves multiple providers (OpenAI, Anthropic, etc.), has distinct teams or services, or if unpredictable costs are a major pain point. You need it when you can't easily answer: 'Which department caused the last budget spike?'

Don't use it if all you do is call one model from one provider with a single API key and never worry about cost attribution. If your primary concern is simply making an API call, a direct SDK integration works fine. But if governance, auditing, or reliability matters, and the system needs to survive provider outages, this proxy layer is non-negotiable.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by LiteLLM. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

create_model create_team create_user delete_key delete_model generate_key get_key_info get_model_info get_team_info get_user_info

Debugging LLM Costs: The Old Way

Today, tracking usage means logging into three separate dashboards. You check the main billing portal for total costs, then switch to a database table to see user IDs, and finally cross-reference another dashboard just to guess which model was running at what time. It’s slow, it's messy, and you always find a gap in the audit trail.

With this MCP Server, that process disappears. You tell your agent: 'What did the Dev team spend on embedding tokens last Tuesday?' The server runs `get_team_info` against the gateway, spits out the exact cost, and tells you which model was responsible, all from one chat command.

LiteLLM Proxy: Model Fallback Paths

Before this proxy, if your main API key pointed to OpenAI, and OpenAI had an outage, your application simply broke. You'd have to manually update your code base with conditional logic like 'if openai fails, try anthropic...'. That’s a huge maintenance drag.

Now, you tell the agent: 'What are my fallback paths?' It runs `get_model_info` and shows you the full chain (OpenAI -> Anthropic -> Groq). You manage this routing intelligence programmatically. The service stays up even if one provider goes dark.

Common Questions About LiteLLM MCP

How do I track who is using my LLM tokens with get_user_info?

The agent tracks total USD consumed and token count per user ID. You simply provide the target user ID, and get_user_info returns a detailed breakdown of that account's usage.

What is the difference between generate_key and create_team?

generate_key creates an isolated key for one microservice. create_team groups multiple keys/services together, allowing you to apply a single budget ceiling across the whole division.

Can I use delete_model if my model is just having temporary issues?

Yes. If a deployment fails consistently (500 errors), running delete_model removes it from the active routing path, immediately stabilizing your service until you fix the underlying issue.

How does get_model_info help me with model reliability?

get_model_info lists all defined fallback paths. This lets you audit exactly what happens when a primary LLM provider fails, ensuring your redundancy is configured correctly.

How do I check the operational budget and rate limit boundaries using get_key_info?

It returns the key's specific configuration and hard budget limits. This function lets you verify if a given API Key is nearing its cost cap or hitting predefined usage rates, preventing unexpected service overruns.
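Checking whether a key is nearing its cap comes down to comparing spend with the budget. Here is a minimal sketch; the 80% warning threshold and the status labels are arbitrary choices for illustration, and the spend and budget values are assumed to come from get_key_info.

```python
def budget_status(spend_usd, max_budget_usd, warn_ratio=0.8):
    """Classify a key's spend against its budget ceiling.

    Returns 'unlimited' when no budget is set, otherwise one of
    'ok', 'warning', or 'exceeded'.
    """
    if max_budget_usd is None:
        return "unlimited"
    if max_budget_usd <= 0:
        raise ValueError("max_budget_usd must be positive")

    ratio = spend_usd / max_budget_usd
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


status_near_cap = budget_status(spend_usd=45.0, max_budget_usd=50.0)
print(status_near_cap)
```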

If I suspect a compromised service, how fast can I revoke its credentials using delete_key?

It immediately vaporizes the specified LLM proxy key. This action completely removes the credential from the system, instantly stopping any unauthorized calls and mitigating potential data leaks or financial misuse.

How do I update a production model endpoint without service interruption using create_model?

You inject fresh routing endpoints directly into the proxy runtime. This allows you to swap out old deployments for new ones—like updated Bedrock or Azure models—without ever taking the live system offline.

What specific data does get_team_info return regarding a team ID's operational scope?

It returns the configuration and cost boundaries tied to that team ID. You can see exactly which users and services are governed under that team profile, helping you audit cost allocation across different organizational divisions.

Can I check the budget and rate limits for a specific proxy key?

Yes. Use the get_key_info tool with the specific Key ID. Your agent will retrieve the exact rate limits, budget constraints, and current RPM usage associated with that key.

How do I see the model fallback paths configured in my proxy?

The get_model_info tool allows your agent to extract the global model directory. You'll see the exact fallback chains (e.g., if OpenAI fails, use Anthropic) and the physical endpoints assigned to each model name.

Can my agent create a new team to track specific division costs?

Absolutely. Use the create_team tool and provide a JSON payload defining the team name and optional budget limits. Your agent will provision the new team identity in LiteLLM, allowing for precise organizational cost tracking.
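For illustration, a team payload might be assembled like this. The field names team_alias and max_budget are assumptions modeled on LiteLLM's team parameters; the exact schema that create_team expects is not specified here, and the helper is hypothetical.

```python
import json


def team_payload(team_name, max_budget_usd=None):
    """Serialize a team definition to JSON; the budget is optional."""
    payload = {"team_alias": team_name}
    if max_budget_usd is not None:
        payload["max_budget"] = max_budget_usd  # USD ceiling for the whole team
    return json.dumps(payload)


marketing_team_json = team_payload("marketing", max_budget_usd=500)
print(marketing_team_json)
```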

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)
Google ADK (Python)
Pydantic AI (Python)
Vercel AI SDK (TypeScript)