# Deterministic Text Summarizer MCP

> The Deterministic Text Summarizer & Extractor MCP provides purely mathematical text analysis with no external API calls and no risk of hallucinated text. It extracts key information by pulling exact sentences and phrases directly from the source material using Term Frequency (TF) algorithms. Use it to reliably find core concepts, analyze keyword density, and condense long documents into actionable data points.

## Overview
- **Category:** knowledge-management
- **Price:** Free
- **Tags:** extractive-summarization, term-frequency, keyword-extraction, text-analysis, nlp

## Description

When you need to extract facts, not interpretations, this MCP is the right tool. Most large language models generate 'abstractive' summaries: they write new text based on what they *think* the source means. That process is prone to hallucination and consumes a lot of tokens. This MCP takes the opposite approach. It uses Term Frequency analysis to identify the statistically most important parts of a document and returns those exact, unmodified sentences.

You can also analyze text structure directly. Need to find recurring themes or tune SEO content? Use `extract_top_bigrams` to pinpoint the top two-word phrases (bigrams), or `extract_top_keywords` to count core vocabulary. Connected through Vinkius, your agent gets a repeatable way to process complex documents for strict data extraction: the same input always produces the same output.

## Tools

### extract_top_bigrams
Pulls the top N most common two-word phrases from a text, ideal for mapping out SEO topics or semantic links.
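
As a rough illustration of what counting two-word phrases involves, here is a minimal sketch. It is not the MCP's actual implementation: the function name `topBigrams`, the tokenizer, and the tie-breaking rule are all assumptions made for this example. The sketch also does not filter stop words or respect sentence boundaries, since this document does not say whether the real tool does.

```typescript
// Hypothetical sketch of bigram counting; not the MCP's real code.
function topBigrams(text: string, topN: number): Array<[string, number]> {
  // Lowercase, then keep runs of Unicode letters, digits, and apostrophes.
  const tokens: string[] = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];

  // Count every pair of adjacent tokens.
  const counts = new Map<string, number>();
  for (let i = 0; i + 1 < tokens.length; i++) {
    const bigram = `${tokens[i]} ${tokens[i + 1]}`;
    counts.set(bigram, (counts.get(bigram) ?? 0) + 1);
  }

  // Highest count first; ties broken by code-point order so output is deterministic.
  return Array.from(counts)
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1))
    .slice(0, topN);
}

const bigramSample =
  "Cloud computing lowers costs. Cloud computing scales well. " +
  "Data security matters in cloud computing.";
console.log(topBigrams(bigramSample, 1)); // → [["cloud computing", 3]]
```

The explicit tie-break matters for a tool that advertises determinism: without it, phrases with equal counts could come back in an order that depends on insertion order or the sort implementation.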

### extract_top_keywords
Calculates and returns the top N keywords based on term frequency, filtering out meaningless stop words.
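
To make the idea concrete, the following sketch counts term frequency after dropping stop words. Treat it as an approximation under stated assumptions: `topKeywords`, the tiny `SAMPLE_STOP_WORDS` list, and the output shape are hypothetical, and the MCP's built-in dictionary is larger than what is shown here.

```typescript
// Illustrative stop-word list only; the MCP's real dictionary is not shown in this document.
const SAMPLE_STOP_WORDS = new Set(["the", "a", "an", "and", "for", "of", "to", "in", "is", "it"]);

// Hypothetical sketch of term-frequency keyword extraction.
function topKeywords(text: string, topN: number): Array<{ word: string; count: number }> {
  const tokens: string[] = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];

  // Count each token that is not a stop word.
  const counts = new Map<string, number>();
  for (const token of tokens) {
    if (SAMPLE_STOP_WORDS.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }

  // Highest count first; ties broken by code-point order for stable output.
  return Array.from(counts, ([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count || (a.word < b.word ? -1 : 1))
    .slice(0, topN);
}

const keywordSample = "The cache stores the result. A cache hit is fast, and a cache miss is slow.";
console.log(topKeywords(keywordSample, 2));
// → [{ word: "cache", count: 3 }, { word: "fast", count: 1 }]
```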

### extractive_summary
Runs a Term Frequency scoring algorithm to select the most important sentences from a document and combine them, unmodified, into a condensed summary.
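
One common way to build a frequency-based extractive summary is sketched below. The scoring rule (average document-wide frequency of a sentence's non-stop words), the naive sentence splitter, and the choice to return sentences in their original order are assumptions for illustration; the MCP's exact algorithm is not documented here. The parameter name `sentenceCount` follows the prompt example in this document.

```typescript
// Illustrative stop-word list; an assumption, not the MCP's dictionary.
const SUMMARY_STOP_WORDS = new Set(["the", "a", "an", "and", "of", "to", "in", "is", "it", "for"]);

// Lowercased tokens of a string with stop words removed.
function contentWords(s: string): string[] {
  const tokens: string[] = s.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];
  return tokens.filter((w) => !SUMMARY_STOP_WORDS.has(w));
}

// Hypothetical sketch of TF-based extractive summarization.
function extractiveSummary(text: string, sentenceCount: number): string[] {
  // Naive split: break on whitespace that follows '.', '!' or '?'.
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Term frequencies across the whole document.
  const tf = new Map<string, number>();
  for (const w of contentWords(text)) tf.set(w, (tf.get(w) ?? 0) + 1);

  // Score each sentence by the average frequency of its content words.
  const scored = sentences.map((sentence, index) => {
    const ws = contentWords(sentence);
    const total = ws.reduce((sum, w) => sum + (tf.get(w) ?? 0), 0);
    return { sentence, index, score: ws.length > 0 ? total / ws.length : 0 };
  });

  return scored
    .sort((a, b) => b.score - a.score || a.index - b.index) // best first
    .slice(0, sentenceCount)
    .sort((a, b) => a.index - b.index) // restore source order
    .map((s) => s.sentence); // sentences are returned verbatim
}

const summarySample =
  "Term frequency counts words. Term frequency ranks sentences by counts. " +
  "The weather was pleasant today.";
console.log(extractiveSummary(summarySample, 2));
// → ["Term frequency counts words.", "Term frequency ranks sentences by counts."]
```

Averaging rather than summing keeps long sentences from winning purely on length; a real implementation might reasonably choose differently.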

## Prompt Examples

**Prompt:** 
```
Create a 3-sentence extractive summary of this long article.
```

**Response:** 
```
Using the extractive_summary tool (sentenceCount=3): Here are the 3 most mathematically relevant sentences extracted exactly from the source.
```

**Prompt:** 
```
What are the top 10 keywords in this SEO text?
```

**Response:** 
```
Using the extract_top_keywords tool (topN=10): The JSON array shows the top frequency counts, ignoring standard stop words.
```

**Prompt:** 
```
Find the top 5 bigrams (two-word phrases) repeated in this transcript.
```

**Response:** 
```
Using the extract_top_bigrams tool: The most repeated bigram is 'machine learning' with 14 occurrences.
```

## Capabilities

### Generate factual summaries
Selects the most mathematically important sentences from a document and compiles them into an extractive summary.

### Identify core concepts by frequency
Counts the top recurring keywords in a text using Term Frequency analysis, ignoring common stop words like 'the' or 'a'.

### Model topical relationships
Finds and counts the most frequently occurring two-word phrases (bigrams), useful for understanding semantic connections.

## Use Cases

### Summarizing a large legal filing
A paralegal needs to synthesize key points from a 50-page deposition transcript. Instead of asking an agent for a general overview, they use `extractive_summary` with the request: 'Extract the top 5 most mathematically relevant sentences.' The resulting output contains only sentences that appear verbatim in the transcript, giving them the exact text needed for cross-referencing.

### Analyzing competitive blog content
An SEO manager collects ten competitor articles. To understand their core focus areas, they use `extract_top_bigrams` on all texts. This reveals patterns like 'cloud computing' or 'data security,' allowing them to target topic gaps that the competition overlooked.

### Mining academic literature for a review
A student has fifty research papers and needs to write an introduction section. They process each paper individually using `extract_top_keywords` to pull out the most frequent non-stop words. This builds a solid foundation of technical vocabulary before writing the draft.

## Benefits

- Accuracy: Because the `extractive_summary` tool only pulls existing text, you eliminate the risk of hallucinations common with abstractive models. Every sentence in the output is taken verbatim from the input document.
- Granularity: Instead of just knowing a topic, the `extract_top_bigrams` tool lets you see exactly which two words appear together most often. This is critical for deep SEO topic modeling or identifying specialized technical phrases.
- Efficiency: The `extract_top_keywords` tool counts core vocabulary using TF analysis, saving you from sifting through massive documents manually just to find the main themes.
- Speed: The architecture runs on a pure JavaScript runtime. This means fast processing of text data without loading bloated NLP packages into your agent's environment.
- Control: You control the output depth. By defining 'top N' for keywords or sentences, you maintain precise control over how much detail is included in the final summary.

## How It Works

You get deterministic, verifiable text features without relying on interpretive language generation. The flow has three steps:

1. You provide your AI client with the source text you want analyzed.
2. Your agent calls one of the dedicated tools, like `extract_top_keywords`, specifying what kind of analysis is needed (e.g., top 10 keywords).
3. The MCP returns a structured data array containing only the requested elements: the exact sentences, keyword counts, or bigrams.
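
For readers curious what step 2 looks like on the wire, the sketch below builds a tool call in the JSON-RPC 2.0 shape that the Model Context Protocol uses for `tools/call`. In practice your AI client or an MCP SDK constructs this for you. The helper `buildToolCall` is hypothetical, and the argument name `text` is an assumption; `topN` is taken from the prompt example in this document.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that assembles the request object.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

const keywordRequest = buildToolCall(1, "extract_top_keywords", {
  text: "Paste or load the source text here.", // assumed argument name
  topN: 10, // parameter name as it appears in the prompt examples
});

console.log(JSON.stringify(keywordRequest, null, 2));
```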

## Frequently Asked Questions

**What is the difference between Extractive and Abstractive summarization?**
Abstractive summarization (what ChatGPT does) writes a completely new text based on its understanding. Extractive summarization (what this tool does) selects the most mathematically important sentences directly from the original text without changing a single word. Every sentence in the output can be found verbatim in the source, so the summary cannot introduce claims the source does not make.

**Does the keyword extraction ignore simple connection words?**
Yes. It has a built-in cross-language 'Stop Words' dictionary (supporting English, Portuguese, and Spanish) to ensure words like 'the', 'and', 'for', 'uma' are completely ignored during Term Frequency calculations.
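
A cross-language dictionary of this kind can be pictured as per-language word lists merged into one lookup set. The sketch below is only a guess at the structure: the names `STOP_WORDS_BY_LANGUAGE` and `removeStopWords` are hypothetical, and each list holds a handful of sample words rather than the MCP's actual dictionary.

```typescript
// Tiny illustrative samples per language; not the MCP's real lists.
const STOP_WORDS_BY_LANGUAGE: Record<string, string[]> = {
  en: ["the", "and", "for", "a"],
  pt: ["uma", "um", "de", "para"],
  es: ["el", "la", "y", "para"],
};

// Merge every list into a single set for constant-time lookups.
const ALL_STOP_WORDS = new Set<string>();
for (const wordList of Object.values(STOP_WORDS_BY_LANGUAGE)) {
  for (const word of wordList) ALL_STOP_WORDS.add(word);
}

// Drop any token that is a stop word in any of the languages.
function removeStopWords(tokens: string[]): string[] {
  return tokens.filter((t) => !ALL_STOP_WORDS.has(t.toLowerCase()));
}

console.log(removeStopWords(["Uma", "casa", "for", "the", "team"]));
// → ["casa", "team"]
```

One consequence of a merged set is that a word treated as a stop word in one language is removed from text in every language, which is usually acceptable for frequency counting but worth knowing when results look sparse.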

**Why use this tool instead of just asking an AI to summarize?**
If you have a massive 50-page document, passing the entire text into an AI context window is expensive and slow. Running an algorithmic extraction first condenses the text dramatically while keeping the highest-scoring sentences word for word.

**Do I need to connect any external API keys for `extractive_summary` to work?**
No, you don't. This MCP uses a purely mathematical algorithm that runs in the JavaScript runtime. It never requires connecting to paid APIs or external services.

**What determines the performance of the Deterministic Text Summarizer & Extractor?**
It's fast because it runs on a pure JavaScript runtime. The system counts term frequencies directly and avoids loading resource-intensive NLP packages.

**How does using `extract_top_bigrams` differ from running `extract_top_keywords`?**
Keywords find single words based on their individual frequency. Bigrams, however, look for pairs of adjacent words that appear together often, giving you a deeper semantic view.

**Does the Deterministic Text Summarizer & Extractor have limits when running `extract_top_keywords`?**
The tool is designed to handle large volumes of text efficiently. Term frequency counting generally needs only a pass over the words in the text, so it scales better than model-based approaches.

**Is the Deterministic Text Summarizer & Extractor suitable for non-English content using `extractive_summary`?**
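Stop word handling is tuned for a fixed set of languages (English, Portuguese, and Spanish), but the core is a language-independent Term Frequency (TF) algorithm. You can run it on other languages to extract important phrases, though common function words in those languages may not be filtered out.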