# LiteLLM MCP

> LiteLLM Proxy & Spend Tracking manages your entire LLM gateway through a single proxy layer. It lets you generate isolated API keys, audit spending down to the user and team level, and programmatically manage complex model fallback paths (like OpenAI -> Anthropic). Stop guessing about costs; control everything from one place.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** llm-gateway, load-balancing, spend-tracking, model-routing, api-key-management, proxy

## Description

**LiteLLM Proxy & Spend Tracking: You run the whole show.** 

The LiteLLM Proxy gives you a single control layer for your entire LLM gateway. You stop guessing about costs and manage everything, from API keys to model fallbacks, in one place using your AI client. Here's how it works.

### Cost Management: Knowing Exactly Who's Spending What

You need to track spending down to the user level, period. Calling `create_user` registers a specific end-user identity with the proxy, so that user's token consumption is recorded in the proxy logs. Later, `get_user_info` pulls the metrics for that end-user: total tokens consumed and the calculated cost in USD. For departmental tracking, use `create_team` to set up an isolated profile that tracks cost limits and operational boundaries for a department or division. You can then call `get_team_info` to review the configuration and cost ceilings tied to that team ID.
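As a rough sketch of what the agent sends under the hood: MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The tool names below come from this page; the `user_id` argument name is an assumption for illustration, since the tool schemas are not listed here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0) for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names; check the server's tool schema for the real ones.
register = tool_call("create_user", {"user_id": "alex_dev"})
lookup = tool_call("get_user_info", {"user_id": "alex_dev"})

print(json.dumps(lookup, indent=2))
```

In practice your AI client builds these requests for you; the sketch only shows that each tool invocation is a named call with a small argument object.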

### Access Control & Isolation: Keys, Limits, and Boundaries

The proxy lets you build rock-solid isolation layers. If one microservice goes sideways, it shouldn't take down the whole stack. You generate unique sub-keys using `generate_key`. Each new proxy API key can be configured with its own budget limits and rate controls, which is killer for multi-tenant setups. When you need to check which key you're dealing with, or verify its current cost bounds, just run `get_key_info` against that API key. If a key gets compromised, don't sweat it; you can revoke it instantly using `delete_key`, which removes the proxy API key entirely so it can't be used or rack up further costs.
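For illustration, here is one way the settings for an isolated key could be assembled before calling `generate_key`. The field names (`key_alias`, `team_id`, `max_budget`, `budget_duration`, `rpm_limit`) follow LiteLLM's key-generation request body as commonly documented; whether this server's `generate_key` tool accepts exactly these arguments is an assumption, and `key_settings` is a hypothetical helper.

```python
from typing import Optional


def key_settings(
    alias: str,
    max_budget_usd: float,
    team_id: Optional[str] = None,
    budget_duration: Optional[str] = None,
    rpm_limit: Optional[int] = None,
) -> dict:
    """Assemble budget and rate-limit settings for one isolated proxy key."""
    if max_budget_usd <= 0:
        raise ValueError("max_budget_usd must be positive")
    settings = {"key_alias": alias, "max_budget": max_budget_usd}
    # Only include optional limits that were actually set.
    optional = {
        "team_id": team_id,
        "budget_duration": budget_duration,
        "rpm_limit": rpm_limit,
    }
    settings.update({k: v for k, v in optional.items() if v is not None})
    return settings


# A $50 budget that resets every 30 days, capped at 60 requests per minute.
# The alias and team ID are illustrative values.
payload = key_settings(
    "customer-service-bot",
    50.0,
    team_id="team-987",
    budget_duration="30d",
    rpm_limit=60,
)
```

Keeping the limits in one small payload per key is what makes the isolation useful: each microservice gets its own cap, so an overrun in one cannot consume another's budget.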

### Reliability & Fallbacks: Keeping Things From Crashing

Model failures are inevitable, but this setup handles them. You define model fallback paths: for instance, if OpenAI fails, the proxy automatically tries Anthropic or Groq. To check the exact sequence of models tried when a primary provider fails, use `get_model_info`, which returns a list of all configured routing fallbacks and shows exactly which models route to which providers. The proxy also lets you manage the core model endpoints themselves. If a specific deployment is throwing 500 errors, you don't have to wait for it to fix itself; you use `delete_model` to deactivate that problematic LLM deployment immediately, keeping your routing path clean. You can also inject new API endpoints into the proxy runtime for a new or not-yet-tested model using `create_model`.
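To make "fallback path" concrete, the sketch below looks up the order in which models would be tried for a given primary. The table shape (a list of one-entry mappings from a model name to its fallbacks) mirrors how LiteLLM's router configuration expresses fallbacks; the actual output format of `get_model_info` is not shown on this page, and `resolve_chain` is a hypothetical helper.

```python
def resolve_chain(model: str, fallbacks: list) -> list:
    """Return the models to try, in order, starting with the primary."""
    table = {}
    for entry in fallbacks:
        table.update(entry)  # flatten [{"a": [...]}, {"b": [...]}] into one dict
    chain = [model]
    for candidate in table.get(model, []):
        if candidate not in chain:  # ignore duplicates in the config
            chain.append(candidate)
    return chain


# Illustrative fallback table using model names from the prompt examples.
fallbacks = [
    {"gpt-4": ["claude-3-opus"]},
    {"llama-3": ["mistral-large"]},
]

print(resolve_chain("gpt-4", fallbacks))  # ['gpt-4', 'claude-3-opus']
```

A model with no entry in the table simply has a chain of length one, which is a quick way to spot deployments that have no redundancy configured.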

### Operational Tools: The Full Control Panel

This setup lets you manage the infrastructure itself. By generating isolated keys and assigning them to specific teams, you enforce strict organizational spending boundaries right at the source. When your agent uses these tools, it's managing resources programmatically: It creates a team profile with cost tracking (`create_team`), registers individual users for auditing (`create_user`), generates highly restricted API access points (`generate_key`), and ensures that all operational costs are tied back to specific user or team IDs. You get complete visibility into who consumed tokens, how many total tokens were used, and the corresponding USD cost, giving you full financial accountability over every single call made through your gateway.

## Tools

### create_model
Injects entirely new, fresh API endpoints into the proxy runtime for a specific LLM model.

### create_team
Creates an isolated team profile that tracks cost limits and operational boundaries for billing.

### create_user
Registers a specific end-user identity to track their unique token consumption against proxy logs.

### delete_key
Removes an existing LLM proxy API key entirely, preventing its use and associated costs.

### delete_model
Deactivates a specific LLM deployment that is causing errors (500s) in the routing path.

### generate_key
Creates a new, distinct proxy API key for a microservice or team, applying defined budget limits.

### get_key_info
Retrieves the current configuration and established budget bounds for any given API key.

### get_model_info
Returns a list of all configured fallback paths, showing which models route to which providers.

### get_team_info
Retrieves the internal logic and cost boundaries associated with a specific team ID.

### get_user_info
Returns precise usage data for an end-user, including total tokens consumed and calculated USD cost.

## Prompt Examples

**Prompt:** 
```
List all active model fallback paths in LiteLLM
```

**Response:** 
```
I've retrieved your model configurations. Your current fallback paths include: 'gpt-4' -> 'claude-3-opus', 'text-embedding-3-small' -> 'voyage-2', and 'llama-3' -> 'mistral-large'. Would you like to check the latency for any of these providers?
```

**Prompt:** 
```
Generate a new API key for the 'Customer-Service' team with a $50 monthly budget
```

**Response:** 
```
Generating key… Done. I've successfully provisioned a new sub-key for the 'Customer-Service' team (ID: team-987) with a hard budget limit of $50.00. The new key is: `sk-litellm-abc123...`. You can now use this for their microservice deployment.
```

**Prompt:** 
```
How much has user 'alex_dev' spent on LLM tokens today?
```

**Response:** 
```
Retrieving data for user 'alex_dev'… So far today, this user has consumed 12,450 total tokens across 45 requests, resulting in a total cost of $0.85 USD. Their most used model is 'gpt-3.5-turbo'. Would you like to see their remaining daily budget?
```

## Capabilities

### Audit API spending by user and team
You get precise, real-time records of who consumed tokens and how much it cost in USD.

### Manage model fallbacks and routing endpoints
Check the exact sequence of models used when a primary provider fails (e.g., OpenAI -> Anthropic).

### Isolate API keys for microservices
Generate unique sub-keys, each with its own budget and rate limits, so one service failing doesn't crash the whole system.

### Enforce organizational spending boundaries
Create dedicated team profiles that track cost ceilings specific to a department or division.

### Maintain infrastructure stability
Instantly delete broken model deployments (`delete_model`) to stop runtime 500 errors, or compromised API keys (`delete_key`) to block unauthorized use.

## Use Cases

### Billing mystery: Identifying cost centers.
A product manager notices costs spiked last month. Instead of blaming a 'team,' they ask their agent: 'Show me all spending for the Marketing department.' The agent runs `get_team_info` and identifies that a new, unbudgeted service deployed by an individual developer was responsible for 70% of the overrun.
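The arithmetic behind that kind of finding is simple. Given per-key spend for a team, each key's share of the total points to the cost center. The figures and key aliases below are made up, and the exact shape of the `get_team_info` response is an assumption.

```python
def spend_shares(spend_by_key: dict) -> list:
    """Return (key alias, share of total spend) pairs, largest first."""
    total = sum(spend_by_key.values())
    if total == 0:
        return []
    shares = [(alias, usd / total) for alias, usd in spend_by_key.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Made-up monthly spend in USD per key alias.
marketing = {
    "newsletter-bot": 120.0,
    "seo-drafts": 180.0,
    "adcopy-experiment": 700.0,
}

top_alias, top_share = spend_shares(marketing)[0]
print(f"{top_alias}: {top_share:.0%} of team spend")  # adcopy-experiment: 70% of team spend
```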

### Service deployment failure.
A backend team deploys a new LLM endpoint (e.g., AWS Bedrock Llama 4). Before connecting it, they use `create_model` to inject the fresh routing path and verify its availability via the agent. They prevent an outage before any live traffic hits the system.

### Dealing with model instability.
The primary OpenAI endpoint starts throwing rate limit errors. The engineer runs `get_model_info` to see the full fallback chain (OpenAI -> Anthropic). They realize they need a new path and use `create_model` to inject a temporary Groq endpoint, restoring service immediately.

### Security cleanup.
A developer accidentally shares their high-privilege API key. The security team uses the agent to run `delete_key` instantly and permanently, preventing unauthorized access or massive billing charges before the key can be exploited.

## Benefits

- Pinpoint exactly who spent the money. Use `get_user_info` to track total USD consumption per individual user ID—no more guessing where budget overruns come from.
- Stop cascading failures with dynamic model control. If one provider hits an outage, add a replacement endpoint with `create_model`, and audit routing paths with `get_model_info` to ensure failover works when you need it.
- Enforce true organizational separation. With `generate_key` and `create_team`, you can assign hard budget limits per department, keeping costs accountable at the division level.
- Maintain uptime by proactively cleaning up. If a model deployment breaks, use `delete_model` to prevent downstream 500 errors; if a key is leaked, use `delete_key` to revoke it instantly.
- Audit everything in one chat session. Instead of hopping between billing dashboards and API logs, simply ask the agent: 'What was the cost for Team X last week?'
- Build reliable microservices. Generate isolated sub-keys using `generate_key` so your new service can't accidentally drain the budget allocated to another team.

## How It Works

The bottom line is: it puts your entire complex LLM setup behind a single, auditable control panel accessible via chat or code.

1. Subscribe to the server and provide your LiteLLM API URL along with a Master Key.
2. Your AI client connects through this proxy, allowing you to manage LLM calls conversationally.
3. You issue commands (e.g., 'What did user X spend?') which run tools like `get_user_info` against the live gateway.
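
For readers who prefer code to chat, step 3 can be approximated by calling the gateway directly. This sketch only builds the request and does not send it. It assumes the server's `get_user_info` tool wraps LiteLLM's `GET /user/info` endpoint and that the Master Key is sent as a bearer token; the environment variable names and the placeholder defaults are hypothetical.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request


def user_info_request(base_url: str, master_key: str, user_id: str) -> Request:
    """Build (but do not send) a GET request for one user's usage record."""
    query = urlencode({"user_id": user_id})
    return Request(
        f"{base_url.rstrip('/')}/user/info?{query}",
        headers={"Authorization": f"Bearer {master_key}"},
        method="GET",
    )


# Hypothetical variable names; placeholders keep the sketch runnable locally.
base_url = os.environ.get("LITELLM_API_URL", "http://localhost:4000")
master_key = os.environ.get("LITELLM_MASTER_KEY", "sk-placeholder")

req = user_info_request(base_url, master_key, "alex_dev")
# Send with urllib.request.urlopen(req) once pointed at a real gateway.
```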

## Frequently Asked Questions

**How do I track who is using my LLM tokens with get_user_info?**
The agent tracks total USD consumed and token count per user ID. You simply provide the target user ID, and `get_user_info` returns a detailed breakdown of that account's usage.

**What is the difference between generate_key and create_team?**
`generate_key` creates an isolated key for one microservice. `create_team` groups multiple keys/services together, allowing you to apply a single budget ceiling across the whole division.

**Can I use delete_model if my model is just having temporary issues?**
Yes. If a deployment keeps failing (500 errors), running `delete_model` removes it from the active routing path so traffic stops hitting it. Once the underlying issue is fixed, you can add the deployment back with `create_model`.

**How does get_model_info help me with model reliability?**
`get_model_info` lists all defined fallback paths. This lets you audit exactly what happens when a primary LLM provider fails, ensuring your redundancy is configured correctly.

**How do I check the operational budget and rate limit boundaries using get_key_info?**
It returns the key's specific configuration and hard budget limits. This function lets you verify if a given API Key is nearing its cost cap or hitting predefined usage rates, preventing unexpected service overruns.
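
One way to act on that answer is to compare spend against the cap. The sketch assumes the key record exposes `spend` and `max_budget` fields in USD, which matches LiteLLM's key records as commonly documented but is not confirmed for this tool's output. The 80% warning threshold is arbitrary, and `budget_status` is a hypothetical helper.

```python
from typing import Optional


def budget_status(spend: float, max_budget: Optional[float], warn_at: float = 0.8) -> str:
    """Classify a key as 'ok', 'warning', 'exceeded', or 'unlimited'."""
    if max_budget is None:
        return "unlimited"  # no cap configured for this key
    if spend >= max_budget:
        return "exceeded"
    if spend >= warn_at * max_budget:
        return "warning"
    return "ok"


key_info = {"spend": 42.5, "max_budget": 50.0}  # illustrative values
print(budget_status(key_info["spend"], key_info["max_budget"]))  # warning
```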

**If I suspect a compromised service, how fast can I revoke its credentials using delete_key?**
It deletes the specified LLM proxy key immediately. This removes the credential from the system, so any further calls made with it are rejected, which limits potential data leaks or financial misuse.

**How do I update a production model endpoint without service interruption using create_model?**
You inject fresh routing endpoints directly into the proxy runtime. This allows you to swap out old deployments for new ones—like updated Bedrock or Azure models—without ever taking the live system offline.

**What specific data does get_team_info return regarding a Team UUID's operational scope?**
It returns the configuration and cost boundaries recorded for that Team UUID. You can see exactly which users and services are governed under that team profile, helping you audit cost allocation across different organizational divisions.

**Can I check the budget and rate limits for a specific proxy key?**
Yes. Use the `get_key_info` tool with the specific key. Your agent will retrieve the exact rate limits, budget constraints, and current RPM usage associated with that key.

**How do I see the model fallback paths configured in my proxy?**
The `get_model_info` tool allows your agent to extract the global model directory. You'll see the exact fallback chains (e.g., if OpenAI fails, use Anthropic) and the physical endpoints assigned to each model name.

**Can my agent create a new team to track specific division costs?**
Absolutely. Use the `create_team` tool and provide a JSON payload defining the team name and optional budget limits. Your agent will provision the new team identity in LiteLLM, allowing for precise organizational cost tracking.