# Language Detector Engine MCP

> Language Detector Engine provides deterministic language detection for any text, supporting over 400 languages. Instead of relying on a general AI's probabilistic guess, this MCP uses N-gram analysis to classify text into ISO 639-3 codes. When the input is too short or ambiguous to classify with confidence, it returns 'und' instead of guessing.

## Overview
- **Category:** customer-support
- **Price:** Free
- **Tags:** n-gram-analysis, language-detection, deterministic-logic, text-processing, localization, data-validation

## Description

When an agent receives a ticket like 'O produto não chegou,' it can't afford to guess the language and route the ticket incorrectly. Instead of letting your AI client make a probabilistic call on which language the text is written in, this engine calculates the language using deterministic N-gram analysis. The calculation returns ISO 639-3 codes for over 400 languages, which matters when a wrong guess sends a ticket to the wrong team.

You also get granular control. You can restrict detection to a specific list of languages, such as Spanish and Portuguese, by passing a whitelist. If the text is too ambiguous to categorize with confidence, the engine returns 'und', which prevents your system from hallucinating a language code. Connecting this through Vinkius makes the engine available across any MCP-compatible client, so your routing decisions rest on calculated results.

## Tools

### detect_language
Analyzes text using N-gram math to return a precise ISO 639-3 language code for over 400 languages.

## Prompt Examples

**Prompt:** 
```
Detect the language of this support ticket: 'Não consigo acessar minha conta desde ontem'.
```

**Response:** 
```
Detected Language: 'por' (Portuguese). 100% confidence.
```

**Prompt:** 
```
We only support English and Spanish. Detect the language of 'Hola como estas' using the whitelist.
```

**Response:** 
```
Detected Language: 'spa' (Spanish) from the allowed list ['eng', 'spa'].
```

**Prompt:** 
```
Get the top 3 language probabilities for this ambiguous name: 'Alejandro'.
```

**Response:** 
```
Top Candidates: 1. spa (Spanish): 100% | 2. glg (Galician): 82% | 3. cat (Catalan): 64%
```

## Capabilities

### Classify text into ISO 639-3 codes
Determines the exact language of a given string using N-gram analysis and returns standard three-letter codes (e.g., 'por', 'eng').

### Force language scope via whitelisting
Restricts detection to the languages you specify, so the returned code always belongs to a known set.

### Get probability scores for multiple candidates
Calculates and returns an array of all possible matches along with their precise confidence percentages.

## Use Cases

### Misrouted Support Tickets
A customer sends a ticket in Portuguese that your general AI client mistakenly routes to the Spanish queue. The agent wastes time, and the SLA is missed. By calling `detect_language`, you get the code 'por' from the N-gram calculation, so the ticket goes straight to the right team.

### Validating Content for a New Market
You need to verify that all content uploaded by a partner is only English and French. You use `detect_language` with a whitelist, confirming any text outside ['eng', 'fra'] fails immediately, preventing data contamination.

### Analyzing Ambiguous User Names
A user provides a short, ambiguous name like 'Alejandro.' Instead of getting one guess, you call `detect_language` to get the top 3 probability candidates (e.g., Spanish: 100%, Galician: 82%), giving your team context for manual review.
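As a rough illustration of how a candidate list like this could feed a manual-review decision, the sketch below flags inputs whose top two scores are close together. The `(code, score)` tuple shape, the `needs_manual_review` helper, and the `margin` threshold are assumptions made for this example; the tool's actual response format may differ.

```python
from typing import Sequence, Tuple

def needs_manual_review(
    candidates: Sequence[Tuple[str, float]],
    margin: float = 0.25,  # hypothetical threshold; tune for your own data
) -> bool:
    """Return True when the top candidate is not clearly ahead of the runner-up.

    `candidates` is an assumed shape: (ISO 639-3 code, score in 0..1) pairs
    in any order.
    """
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    # No candidates, or an undetermined result, always goes to a human.
    if not ranked or ranked[0][0] == "und":
        return True
    if len(ranked) == 1:
        return False
    # A small gap between first and second place signals ambiguity.
    return ranked[0][1] - ranked[1][1] < margin
```

With the 'Alejandro' scores above (1.00, 0.82, 0.64), the gap between first and second place is 0.18, so this sketch would send the record to manual review under the default margin.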

### International Data Pipeline
Your data pipeline processes millions of records from diverse sources. You use `detect_language` to categorize the text, ensuring that downstream systems only process language-specific data streams based on deterministic results.

## Benefits

- Accurate routing: You eliminate the risk of misrouting customer tickets because `detect_language` uses N-gram analysis, not general LLM probability, providing reliable ISO 639-3 codes every time.
- Control over inputs: Need to know if text is Spanish or Portuguese? Pass a whitelist to force evaluation. This prevents false positives from unexpected languages.
- Handle ambiguity safely: When the input is too unclear, the engine doesn't guess; it returns 'und', so your agent can handle the case explicitly instead of failing silently.
- Deep insights via probability: By using the `all` flag in `detect_language`, you get a full list of potential language matches and their exact confidence scores for complex data points.
- Scalable detection: With support for 400+ languages, this engine handles everything from common global tongues like English (eng) to niche ones like Zulu (zul).

## How It Works

You get a calculated language code, not an educated guess.

1. Provide the text you need to analyze, ensuring you include as much text as possible for better accuracy.
2. The engine runs the N-gram analysis and checks it against any configured whitelists or blacklists.
3. You get back the language code (e.g., 'spa') or a list of top candidates with their exact probability scores.
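The steps above can be illustrated with a toy version of rank-based trigram matching, using the classic "out-of-place" distance between N-gram profiles. This is a sketch of the general technique only, not the engine's actual implementation: the three sample sentences, the `detect` function, the `MISSING_PENALTY` constant, and the `min_length` cutoff are all invented for the example.

```python
from collections import Counter
from typing import Dict, List, Optional

# Tiny illustrative training samples. A real engine ships profiles built
# from large corpora for hundreds of languages.
SAMPLES = {
    "eng": "the product did not arrive and i cannot access my account since yesterday",
    "por": "o produto não chegou e não consigo acessar minha conta desde ontem",
    "spa": "el producto no llegó y no puedo acceder a mi cuenta desde ayer",
}
MISSING_PENALTY = 300  # cost for a trigram absent from a language profile

def trigrams(text: str) -> List[str]:
    """Lowercase, collapse whitespace, pad with spaces, slice into 3-grams."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def profile(text: str) -> Dict[str, int]:
    """Map each trigram to its frequency rank (0 = most frequent)."""
    counts = Counter(trigrams(text))
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return {gram: rank for rank, gram in enumerate(ranked)}

PROFILES = {code: profile(sample) for code, sample in SAMPLES.items()}

def distance(doc: Dict[str, int], ref: Dict[str, int]) -> int:
    """Out-of-place distance: sum of rank differences, with a fixed penalty
    for trigrams the reference profile has never seen."""
    return sum(
        abs(rank - ref[gram]) if gram in ref else MISSING_PENALTY
        for gram, rank in doc.items()
    )

def detect(text: str, only: Optional[List[str]] = None, min_length: int = 3) -> str:
    """Return the closest language code, or 'und' when the input is too short."""
    if len(text.strip()) < min_length:
        return "und"
    candidates = [c for c in (only or sorted(PROFILES)) if c in PROFILES]
    if not candidates:
        return "und"
    doc = profile(text)
    # The same input always yields the same code: no sampling is involved.
    return min(candidates, key=lambda code: (distance(doc, PROFILES[code]), code))
```

Calling `detect("não consigo acessar minha conta")` on these toy profiles selects 'por', because every trigram of the input appears in the Portuguese sample while most are missing from the other two.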

## Frequently Asked Questions

**Why is this better than asking Claude to detect the language?**
LLMs often hallucinate languages for short strings or names. They also struggle to provide standardized ISO codes reliably. This engine uses mathematical N-gram analysis, a long-established technique for statistical language identification, to deterministically map text to one of 400+ ISO 639-3 codes.

**What does it mean if it returns 'und'?**
'und' is the ISO 639-3 code for 'undetermined'. It means the text is too short, mostly numbers, or too ambiguous to confidently map to a single language. This is a feature: it prevents your routing logic from making false assumptions.
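One way to act on this in routing logic is to treat 'und' as an explicit branch. The sketch below assumes a hypothetical mapping from language codes to queue names; `QUEUES`, `FALLBACK_QUEUE`, and `route_ticket` are illustrative names, not part of the tool.

```python
# Hypothetical queue names, for illustration only.
QUEUES = {"por": "support-pt", "spa": "support-es", "eng": "support-en"}
FALLBACK_QUEUE = "manual-triage"

def route_ticket(language_code: str) -> str:
    """Pick a queue for a detected ISO 639-3 code.

    'und' and any code without a dedicated queue both go to manual triage,
    so an ambiguous ticket is never silently sent to the wrong team.
    """
    if language_code == "und":
        return FALLBACK_QUEUE
    return QUEUES.get(language_code, FALLBACK_QUEUE)
```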

**Can I force it to choose between specific languages?**
Yes. Pass an array of ISO 639-3 codes to the 'only' parameter (e.g., ['eng', 'por', 'spa']). The engine will only calculate probabilities within that subset.
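For clients that build tool calls by hand, a request might look like the sketch below. The `tools/call` envelope follows the MCP JSON-RPC convention; the `text` argument name is an assumption (only `only` and the `all` flag are named in this document), and `build_detect_request` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional, Sequence

def build_detect_request(
    text: str,
    only: Optional[Sequence[str]] = None,
    all_candidates: bool = False,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` payload for `detect_language`."""
    arguments: Dict[str, Any] = {"text": text}  # "text" is an assumed argument name
    if only:
        arguments["only"] = list(only)  # whitelist of ISO 639-3 codes
    if all_candidates:
        arguments["all"] = True  # request the full candidate list
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "detect_language", "arguments": arguments},
    }

if __name__ == "__main__":
    request = build_detect_request("Hola como estas", only=["eng", "spa"])
    print(json.dumps(request, indent=2, ensure_ascii=False))
```

Most MCP clients assemble this envelope for you, so in practice you would usually only supply the arguments object.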

**When I run `detect_language` on large batches of text, are there any rate limits I should know about?**
The MCP handles standard API rate limiting. For high-volume processing, you'll need to implement exponential backoff in your agent logic. Vinkius manages the overall throughput, but sustained, massive requests require thoughtful throttling on your end.
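A minimal sketch of exponential backoff with jitter is shown below. It assumes your client raises some exception when it is throttled; `RateLimitError` here is a hypothetical stand-in for that exception, and `call_with_backoff` is an illustrative helper rather than part of the MCP or Vinkius.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimitError(Exception):
    """Hypothetical error raised by your client when the server throttles it."""

def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the delay cap each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: a random wait up to the cap spreads out retries
            # from many workers so they do not all hit the server at once.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

You would wrap each `detect_language` call in a zero-argument function and pass it as `call`; the injectable `sleep` parameter exists so the retry logic can be tested without real waiting.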

**What happens if I pass an empty string or null data to `detect_language`?**
The tool is designed to handle non-text input gracefully. If you send nothing, it won't crash; instead, it will return a defined error status indicating that no text was provided for analysis.

**Are the language codes returned by `detect_language` reliable enough for direct database lookups?**
Yes. The output uses standard ISO 639-3 codes (like 'eng' or 'por'), which are industry standards. This means they map directly and reliably to existing localization fields in most modern databases.

**Do I need any special setup when connecting my agent to the Language Detector Engine via MCP?**
No specialized setup is required beyond standard Vinkius authentication. Once your AI client connects through the MCP, you just call `detect_language` using its native API structure.

**Can I pass multiple text segments to `detect_language` at once for comparison?**
You can include multiple distinct texts in a single prompt. The tool will process each segment individually, returning separate detection results and confidence scores for every piece of input you provide.