# String Metrics Analyzer MCP

> String Metrics Analyzer handles text auditing that LLMs fail at. It gives you absolute counts—exact character length, word count, and specific substring occurrences—using pure string mathematics. Need to ensure your meta description is exactly 160 characters or count how many times 'error' appears in a document? Use this server for deterministic text metrics.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** string-analysis, character-counting, tokenization-bypass, text-metrics, deterministic-logic

## Description

Listen, you know how large language models see tokens instead of actual characters? That difference is huge when you're enforcing copy constraints or doing any kind of precise auditing. This server handles that problem by giving you pure string math. It lets your AI client run deterministic text metrics, so you get hard counts, not estimates.

The `analyze_string_metrics` tool gives you absolute control over how you measure text. You can use it to count exact characters and words, audit specific substrings, or calculate mathematical scores that tell you exactly how similar two pieces of writing are. It's built for jobs where approximating a number isn't gonna cut it.

When you need to know the absolute length of any piece of text—including every single space and newline character—you call the tool to get an exact count. This tells you the total character length, period. If your meta description needs to be precisely 160 characters for SEO purposes, this is what you use. It provides that raw, verifiable number.
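As a minimal sketch of what an exact length means in JavaScript, assuming the server reports the native string `length` (the FAQ describes the result as a JavaScript string length): `length` counts UTF-16 code units, so every space and newline counts as one, while a character outside the Basic Multilingual Plane, such as many emoji, counts as two. The helper names below are illustrative and not part of the server's API.

```javascript
// Illustrative helpers; these names are assumptions, not the server's API.

// Length in UTF-16 code units: what JavaScript's native `length` reports.
function codeUnitLength(text) {
  return text.length;
}

// Length in Unicode code points: spreading a string yields whole code points.
function codePointLength(text) {
  return [...text].length;
}

const sample = "Hi there\n";
console.log(codeUnitLength(sample)); // 9: eight letters and spaces plus the newline

const withEmoji = "ok \u{1F44D}"; // "ok " followed by a thumbs-up emoji
console.log(codeUnitLength(withEmoji));  // 5: the emoji takes two code units
console.log(codePointLength(withEmoji)); // 4: the emoji is one code point
```

For plain ASCII copy such as a typical meta description, the two measures agree, so the distinction only matters once emoji or rare scripts appear in the text.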

For word counts, it's equally direct. You pass in a block of text and get a deterministic count of the words inside. It doesn't guess; it applies a fixed string-based rule, so the same input always yields the same number. This makes it perfect for content audits where every single word matters to your usage limits or client requirements.
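The server's exact word-splitting rule isn't documented here, so the following is only a sketch under one common assumption: a word is a maximal run of non-whitespace characters. The function name `countWords` is hypothetical.

```javascript
// Assumption: a "word" is a maximal run of non-whitespace characters.
// The server's actual rule may differ (for example, around punctuation).
function countWords(text) {
  const matches = text.match(/\S+/g); // null when there are no words
  return matches === null ? 0 : matches.length;
}

console.log(countWords("  The quick\tbrown fox\n"));     // 4
console.log(countWords("well-known state-of-the-art"));  // 2: hyphens do not split
console.log(countWords(""));                             // 0
```

Whatever rule is used, the point is that it is fixed: hyphenated terms, tabs, and trailing newlines are handled the same way on every call.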

When you need to audit specific keywords, the tool lets you pass in a main string and a search term, and it gives you an exact count of how many times that term appears within the text. If you're tracking compliance violations or counting instances of a proper name across thousands of documents, this feature is critical.
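A substring audit of this kind can be sketched with `String.prototype.indexOf`. This version assumes case-sensitive, non-overlapping matching; the server may make different choices, and `countOccurrences` is an illustrative name rather than its API.

```javascript
// Assumptions: matching is case-sensitive and occurrences do not overlap.
function countOccurrences(text, term) {
  if (term.length === 0) return 0; // an empty term is treated as zero matches

  let count = 0;
  let from = 0;
  while (true) {
    const at = text.indexOf(term, from);
    if (at === -1) break;
    count += 1;
    from = at + term.length; // resume the search after this match
  }
  return count;
}

console.log(countOccurrences("error: disk error; Error ignored", "error")); // 2
console.log(countOccurrences("aaaa", "aa"));                                // 2
```

Note that the capitalized `Error` is not counted in the first call. If you need a case-insensitive audit, lowercasing both arguments before the call is one option.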

To check for fuzzy matches or deduplicate content, you use the advanced metrics available via `analyze_string_metrics`. It computes several mathematical scores to tell you how far apart two strings are. For instance, it calculates the Levenshtein distance. This metric counts the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one string into another. A low score means they're pretty close; a high score means they're way off.
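For reference, here is a sketch of the standard dynamic-programming computation of Levenshtein distance, keeping only two rows of the table at a time. It is not the server's source code, and it compares UTF-16 code units, which is an assumption about how the server treats characters.

```javascript
// Classic two-row dynamic-programming Levenshtein distance.
// Compares UTF-16 code units (an assumption about the server's behavior).
function levenshtein(a, b) {
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first i-1 units of a and the first j of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i units of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution, or a free match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(levenshtein("flaw", "lawn"));      // 2
```

The raw distance depends on string length, so when comparing documents of different sizes it can help to divide by the length of the longer string before applying a threshold.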

Another metric it provides is the Jaccard index. You pass in two pieces of text, and the tool calculates their similarity as the number of shared elements divided by the number of unique elements across both. A score of 1 means the two sets are identical; 0 means they share nothing. Because it compares shared elements rather than meaning, it tells you how much two documents overlap in wording, not whether they express the same idea in different phrasing. It's a quick way to gauge content overlap.
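The Jaccard index is |A ∩ B| / |A ∪ B| for two sets A and B. The sketch below builds those sets from lowercased, whitespace-separated words. That tokenization is an assumption, since the server could equally use characters or n-grams, and the function names are hypothetical.

```javascript
// Assumption: the sets are built from lowercased, whitespace-separated words.
function wordSet(text) {
  return new Set(text.toLowerCase().match(/\S+/g) || []);
}

// Jaccard index: size of the intersection divided by size of the union.
function jaccardIndex(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) return 1; // convention: two empty sets match

  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) shared += 1;
  }
  const unionSize = setA.size + setB.size - shared;
  return shared / unionSize;
}

console.log(jaccardIndex("red cotton shirt", "blue cotton shirt")); // 0.5
console.log(jaccardIndex("same text", "same text"));                // 1
```

In the first call the texts share two words (`cotton`, `shirt`) out of four unique words overall, which gives 2 / 4 = 0.5.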

These metrics let your agent perform deep text analysis, whether you're trying to see how similar two product descriptions are for potential duplication checks or just need a reliable word count for billing purposes. You never have to worry about an LLM hallucinating a count; this tool gives pure string math results every time.

## Tools

### analyze_string_metrics
Pass strings and get exact character, word, and substring counts, plus Levenshtein distance and Jaccard index for deduplication or fuzzy matching.

## Prompt Examples

**Prompt:** 
```
Analyze this blog text and calculate exactly how many times the substring 'Stripe' appears.
```

**Response:** 
```
✅ **String Metrics Analyzed:** The exact occurrence count for 'Stripe' is 14.
```

**Prompt:** 
```
Count the absolute character length of this SEO description, including whitespaces.
```

**Response:** 
```
✅ **Length:** The string has exactly 168 characters natively measured in V8.
```

**Prompt:** 
```
Does this meta title exceed the recommended 60 character threshold?
```

**Response:** 
```
✅ **Evaluation:** Yes. The native count is 72, which exceeds the 60-character limit.
```

## Capabilities

### Count exact characters
It returns the absolute character length of any given text block, including spaces.

### Audit specific substrings
You pass a string and a search term, and it counts exactly how many times that term appears.

### Calculate similarity scores
It computes mathematical metrics (like Levenshtein distance) to determine how similar or different two strings are.

### Get word count
The tool provides a deterministic count of the words in your text block.

## Use Cases

### SEO title length enforcement
A content manager writes a meta title that's supposed to fit in 60 characters, but it keeps failing validation. They ask their agent to run the String Metrics Analyzer, which confirms the native count is 72 characters, so they trim the copy until it fits.

### Content deduplication
An analyst has two versions of a product description. They use `analyze_string_metrics` to calculate the Jaccard index. The score is 0.92, meaning the two versions share 92% of their unique elements, which confirms that one is a lightly edited copy of the other.

### Billing compliance auditing
A billing agent needs to prove exactly how many times a specific service tag was mentioned across hundreds of customer tickets. They run the String Metrics Analyzer, which returns an exact count (e.g., 45 instances), giving them irrefutable data for reporting.

### Ad copy constraints
A marketer drafts three ad headlines and needs to know their absolute character length including all spaces. They feed the text into the String Metrics Analyzer, which confirms one headline is 168 characters, making it instantly unusable for a strict 150-character limit.

## Benefits

- Avoid tokenization errors. When you need to know the *real* character length, this server runs pure math, giving you accurate counts that AI models can't guarantee.
- Pinpoint specific instances. Need to audit a document for every mention of 'API key' or a unique product code? Use the analyzer to get an undeniable count.
- Handle SEO limits perfectly. You can test ad copy and meta descriptions against strict character thresholds, knowing if they pass before publishing anything.
- Check for near-duplicates. By running Levenshtein distance via `analyze_string_metrics`, you determine if content is slightly different but functionally the same.
- Verify structural integrity. Quickly get exact word counts or overall string lengths to maintain consistency across large bodies of technical documentation.

## How It Works

The bottom line is you get math that doesn't rely on an AI model guessing what your text means.

1. You send the String Metrics Analyzer your source text and what you need to measure (e.g., 'count the word X' or 'get character length').
2. The server runs pure JavaScript string math, ignoring LLM tokenization rules, to calculate the precise metrics.
3. It returns a definitive count or score—for example, 'The exact occurrence count is 14,' or 'The length is 168 characters.'
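To make step 1 concrete, here is a sketch of the JSON-RPC message an MCP client sends to call a tool. The `tools/call` method and the `name`/`arguments` shape come from the MCP specification; the argument keys `text` and `substring` are hypothetical, since this page does not list the tool's input schema. Most clients build this message for you.

```javascript
// Builds an MCP "tools/call" request for analyze_string_metrics.
// The argument keys passed in `args` are hypothetical: check the tool's
// published input schema for the real parameter names.
function buildToolCall(id, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "analyze_string_metrics",
      arguments: args,
    },
  };
}

const request = buildToolCall(1, {
  text: "Disk error detected. Retrying after error.", // hypothetical key
  substring: "error",                                 // hypothetical key
});

console.log(JSON.stringify(request, null, 2));
```

The server's reply arrives as a JSON-RPC response with the same `id`, carrying the count or score described in step 3.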

## Frequently Asked Questions

**How does String Metrics Analyzer work around LLM tokenization limits?**
It uses pure string mathematics instead of language model tokens. This means it counts actual characters and letters directly, bypassing the way an AI client normally breaks text into chunks for processing.

**Can I use analyze_string_metrics to find how many times a word appears?**
Yes. You pass the source string and the specific substring (the word or tag) you're looking for, and it returns an exact count of every occurrence.

**Is String Metrics Analyzer better than standard NLP libraries for counting?**
For pure character counts and strict auditing, yes. Standard NLP libraries often abstract away the raw string layer; this tool operates directly on the characters to guarantee accuracy.

**What kind of similarity scores can analyze_string_metrics calculate?**
It computes common metrics like Levenshtein distance (edit distance) and Jaccard index, which are standard ways to quantify how mathematically close two pieces of text are.

**How does String Metrics Analyzer handle text encoding and special characters?**
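It counts every character, including symbols and non-Latin scripts, as a measurable unit of length; nothing is skipped or normalized away. If the count is the native JavaScript string length, as the answers here describe, the unit is a UTF-16 code unit: most characters count as one, while characters outside the Basic Multilingual Plane, such as many emoji, count as two.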

**What are the rate limits for running analyze_string_metrics on large documents?**
While we handle high volumes, please monitor the usage dashboard for specific throughput caps. For massive batch processing, it's best to chunk your data and run separate calls to avoid hitting temporary rate limits.

**Does String Metrics Analyzer support metrics across different languages?**
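Yes, it calculates deterministic string metrics regardless of the language used. Hindi or Japanese text is measured by the same rule as English, so results are repeatable. Keep in mind that scripts with combining marks, such as Devanagari, can produce a higher unit count than the number of characters a reader sees.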

**What programming context should I use to connect String Metrics Analyzer?**
Since this server runs via MCP, your connected agent calls the `analyze_string_metrics` tool through its MCP client. You don't need to install any libraries beyond an MCP-compatible client.

**Why not just ask the LLM to count?**
Because LLMs process text in chunks called 'tokens', not individual characters, so any count they report is an estimate rather than a measurement.

**Does it count whitespace?**
Yes, it provides an exact JavaScript string length, which includes spaces and newlines.

**Can it find how many times a word appears?**
Yes, substring occurrence counting is fully supported.