# LLM Inference MCP Servers
Discover 8 MCP servers tagged with LLM Inference on the Vinkius App Catalog.
Groq MCP Server
8 tools
Empower LLM applications via Groq. Perform ultra-fast LPU-accelerated chat completions, handle audio transcription and translation, and use JSON mode directly from any AI agent.
Groq MCP Server
10 tools
Run large language models at unprecedented speed with custom LPU hardware that delivers real-time AI inference at massive scale.
Cerebras Inference MCP
15 tools
Access lightning-fast AI inference via the Cerebras Wafer-Scale Engine. Generate chat completions, manage models, and run batch jobs at record speeds.
Anyscale MCP
7 tools
Orchestrate your Anyscale infrastructure. Manage LLM queries, vectors, services, and cluster batch jobs directly from your AI agent.
Fireworks AI MCP Server
6 tools
Empower LLM applications via Fireworks AI. Perform ultra-fast chat completions, generate embeddings and images, and transcribe audio directly from any AI agent.
LocalAI
19 tools
Run LLMs, generate images, and process audio locally through an OpenAI-compatible API on your own hardware.
DeepInfra (Serverless LLM Inference)
4 tools
Run top-tier LLMs, image generation, and embeddings via DeepInfra's serverless infrastructure directly from your AI agent.
SambaNova (AI Inference)
3 tools
High-speed AI inference for Llama 3, DeepSeek, and MiniMax models on SambaNova's SN40L chips.