# Groq MCP

> Groq MCP delivers ultra-fast LLM inference by leveraging LPU hardware acceleration directly through your AI client. It lets you run chat completions on models like Llama 3 and Mixtral with blazing speed, while also handling complex media tasks. You can transcribe audio streams into text, translate non-English speech immediately to English, or force the output into rigid JSON formats for system integration.

## Overview
- **Category:** superpower
- **Price:** Free
- **Tags:** llm-inference, lpu-acceleration, ai-latency, audio-transcription, generative-ai, high-performance-computing

## Description

Connect this MCP to your preferred AI client to run high-speed generative AI and audio workflows from one place. Everything from simple text generation to audio processing runs on Groq's LPU architecture, so responses come back with low latency. You can instruct the agent to transcribe an audio file with transcribe_audio, or, if the recording is not in English, pass the same file to translate_audio to get English text directly. Need data for a database? Use structured_output to constrain the AI response to a JSON structure you define, which removes messy parsing steps later on. You also don't have to guess about model compatibility: list_models and get_model show exactly which models are available before you run your main chat completions or create embeddings for context.

## Tools

### chat_completion
Generates a response using Llama, Mixtral, or Gemma models with ultra-fast inference speed.

### list_models
Retrieves a list of all available high-speed language models you can use.

### get_model
Fetches specific metadata and details about any particular model.

### create_embedding
Converts text into vector embeddings, which allows your AI agent to understand relationships between pieces of text.

### transcribe_audio
Takes an audio file and converts the spoken word into a written transcript.

### translate_audio
Converts non-English audio files into English text translations.

### moderate_content
Checks any given content to determine if it violates safety guidelines.

### structured_output
Forces the AI model to generate output that strictly adheres to a predefined JSON data structure.

## Prompt Examples

**Prompt:** 
```
Ask llama3-70b: 'Write a python function to scrape a website.'
```

**Response:** 
```
Inference complete! Llama 3 response: 'Here is a simple python function using BeautifulSoup and requests to scrape data...' [Blazing-fast response delivered via Groq LPU].
```

**Prompt:** 
```
Transcribe this audio meeting: https://example.com/meeting.mp3
```

**Response:** 
```
Transcription started! I'm using the Groq-optimized Whisper large-v3 model to parse your meeting audio. I'll provide the full timestamped text for you in just a few seconds.
```

**Prompt:** 
```
Get model info for 'mixtral-8x7b-32768'
```

**Response:** 
```
Retrieving model metadata... Mixtral-8x7b-32768 is a high-performance LLM with a context window of 32,768 tokens. It supports chat completions and tool-calling on Groq's LPU architecture.
```

## Capabilities

### Execute Ultra-Fast Conversational AI
Run text generation with chat_completion against LPU-accelerated endpoints serving Llama, Mixtral, and Gemma models.

### Process Audio to Text
Transcribe audio files into accurate language transcripts using the transcribe_audio tool.

### Translate Spoken Language
Take non-English audio and get an English text translation via translate_audio. English is the only output language.

### Generate Structured Data
Constrain AI inference to output only valid JSON format using structured_output, perfect for automating data pipelines.

### Embed Text Data
Create high-quality text embeddings using create_embedding for advanced retrieval and context building.

### Manage Model Instances
Check available models or retrieve detailed metadata about specific LLMs through list_models and get_model.

## Use Cases

### Analyzing international meeting transcripts
An operations team member records an audio meeting in Mandarin. They ask their agent to transcribe the file with transcribe_audio to keep a Mandarin record, and to run translate_audio on the same recording to get actionable English notes. Both tools take the audio file as input, so the translation does not depend on the transcript.

### Building a structured knowledge base
A data scientist uploads 10 research papers. They use create_embedding to index the content. Later, they ask their agent a question and retrieve the answer using chat_completion, grounded by the indexed context.
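
The retrieval step in this workflow can be sketched as a cosine-similarity search over stored vectors. This is a minimal illustration, not the MCP's own implementation: the three-dimensional vectors below are made-up stand-ins for create_embedding output, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k document ids whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy vectors (assumption): real embeddings have hundreds of dimensions.
index = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.0, 1.0, 0.0],
    "paper_c": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]

print(top_k(query, index, k=2))  # ['paper_a', 'paper_c']
```

The text of the top-ranked documents would then be placed in the chat_completion prompt as grounding context.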

### Automating form submission data
A developer needs an agent to process user input text about a new product. They use structured_output to force the AI to return a clean JSON object containing specific fields like 'product name,' 'price,' and 'category' for immediate API insertion.
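
Even with a constrained response, it is worth checking the object before inserting it into an API. The sketch below assumes the response arrives as a JSON string and that the three fields are named `product_name`, `price`, and `category`; those key names and the `parse_product` helper are illustrative assumptions, not part of the MCP.

```python
import json

# Expected fields and their Python types (assumed field names).
REQUIRED_FIELDS = {
    "product_name": str,
    "price": (int, float),
    "category": str,
}

def parse_product(raw):
    """Parse a JSON string and check the expected fields and types are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, expected in REQUIRED_FIELDS.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name!r} has the wrong type")
    return data

product = parse_product(
    '{"product_name": "Desk Lamp", "price": 24.5, "category": "lighting"}'
)
print(product["product_name"], product["price"])
```

A check like this fails loudly on a malformed object instead of letting it reach the database.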

### Testing model capabilities pre-launch
A product team needs to know if their new agent can handle different models. They use get_model to check the metadata and context window size of Mixtral before running a final, high-stakes chat completion test.
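
One concrete pre-launch check is whether a planned request fits the model's context window. The sketch below assumes get_model returns metadata with a `context_window` field; that key name and the `fits_context` helper are assumptions for illustration, and the token counts are invented.

```python
def fits_context(model_info, prompt_tokens, max_output_tokens):
    """Return True if prompt plus requested output fits in the context window."""
    window = model_info["context_window"]  # assumed metadata key
    return prompt_tokens + max_output_tokens <= window

# Stand-in for a get_model result (shape is an assumption).
mixtral = {"id": "mixtral-8x7b-32768", "context_window": 32768}

print(fits_context(mixtral, prompt_tokens=30000, max_output_tokens=2000))  # True
print(fits_context(mixtral, prompt_tokens=31000, max_output_tokens=2000))  # False
```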

## Benefits

- You get immediate results when generating text. Using chat_completion means you're not stuck waiting on slow endpoints; responses arrive almost instantly, letting you build real-time applications.
- Your data pipelines become more reliable. Instead of hoping the AI gives readable output, using structured_output constrains it to JSON that matches the structure you define, which removes most ad hoc parsing code.
- You handle global content without friction. If you need to process audio from a non-English speaker, run translate_audio on the recording to get English text, and use transcribe_audio when you also want the original-language transcript.
- Context retrieval is fast. By running create_embedding first, your agent can pull relevant knowledge from large datasets quickly, which helps the LLM respond with specific, grounded information.
- Model management happens in context. You don't guess which model works best; you use list_models to check availability before initiating a complex workflow.

## How It Works

Instead of managing separate APIs for speed, media, or structure, everything runs through one unified, fast connection point.

1. First, subscribe to this MCP and enter your Groq API Key. You'll find the key in your Groq Cloud Dashboard under API Keys.
2. Next, connect it to your AI client, such as Cursor or Claude, through Vinkius. Your agent now sees all available high-speed tools.
3. Finally, you prompt your agent with a complex request, and it executes the necessary actions (e.g., transcribe_audio and translate_audio on the same recording) using accelerated hardware.
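
Under the hood, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request that names the tool and passes its arguments. The sketch below builds the two requests an agent might send for a meeting recording. The tool names come from this page, but the `audio_url` argument name and the helper functions are assumptions; check the tool schemas your client reports for the real parameter names.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests each need a unique id

def tool_call(name, arguments):
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def build_meeting_requests(audio_url):
    """Requests for a transcript and an English translation of one recording."""
    # "audio_url" is a hypothetical argument name used for illustration.
    return [
        tool_call("transcribe_audio", {"audio_url": audio_url}),
        tool_call("translate_audio", {"audio_url": audio_url}),
    ]

for request in build_meeting_requests("https://example.com/meeting.mp3"):
    print(json.dumps(request))
```

In practice your AI client constructs and sends these messages for you; the sketch only shows what a chained request looks like on the wire.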

## Frequently Asked Questions

**Does Groq MCP support multiple file types?**
Yes, this MCP handles both text and audio files. You can use transcribe_audio on an MP3 or WAV file and then process the resulting text.

**How do I make sure Groq MCP output is usable in my database?**
Use the structured_output tool. By defining a strict JSON schema, you constrain the AI response to the format your database expects.

**Can Groq MCP handle audio translation and transcription together?**
Yes. Both tools work on the audio file itself, so you can run them on the same recording: transcribe_audio produces a transcript in the spoken language, and translate_audio produces an English translation.

**Why should I use Groq MCP for embeddings instead of another service?**
Groq's low-latency inference lets create_embedding index and refresh your knowledge base quickly, so it stays searchable and your agents stay responsive.

**What models can chat_completion access on Groq MCP?**
The chat_completion tool supports several high-performance open-source models, including Llama 3, Mixtral, and Gemma, all optimized for speed.