# Statistics Engine MCP

> The Statistics Engine is a local MCP server that runs statistical calculations inside your own environment, with no network round trip. It computes key descriptive statistics such as mean, median, mode, standard deviation, and percentiles on any numeric dataset. Because it never sends data over the network, your data stays private, and because every function is deterministic, the same input always yields the same result.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** statistical-analysis, math-engine, data-processing, mean-median-mode, standard-deviation

## Description

**Listen up. The problem with relying on large language models for math is that they're unreliable.** When you gotta crunch numbers, like metrics, finances, or sensor data, you can't trust an LLM to handle the statistical heavy lifting. LLMs can make small arithmetic errors when they aggregate a dataset. This engine fixes that whole mess. It gives your agent access to a computational core that runs the math locally within your own environment. That means you stop trusting AI models with anything involving arrays or precise numbers and start using deterministic functions instead. Best of all? Your sensitive data never leaves your infrastructure. No external API calls are needed because the calculations happen right where the data lives.

**Central Tendency: Finding the Core Number**

When you need to know what a dataset is centered around, this engine gives you three ways to look at it. You can use `calculate_mean` if you want the arithmetic average of every number in your set; that's simple enough. But sometimes, one huge outlier throws off the mean, right? For instance, if you measure employee salaries and the CEO makes five times what everyone else does, the mean gets pulled upward fast. That's where `calculate_median` comes into play. It finds the middle value when all your numbers are sorted, so a few extreme outliers barely move it. If you're just trying to pinpoint the most common data point, the number that shows up most often, you call `calculate_mode`. Together, these tools tell you where your data is centered.
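
To make the difference concrete, here is a minimal JavaScript sketch of the three measures. The functions `mean`, `median`, and `mode` are illustrative stand-ins, not the engine's actual source, and the salary figures are made up for the example.

```javascript
// Illustrative stand-ins for calculate_mean, calculate_median, calculate_mode.
function mean(values) {
  // Sum divided by count; an empty array gives 0 / 0, which is NaN.
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values) {
  if (values.length === 0) return NaN;
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even count: average the two middle values.
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function mode(values) {
  // Count occurrences and track the value with the highest count so far.
  // On a tie, the value that reached the top count first wins (a choice
  // made for this sketch only).
  const counts = new Map();
  let best = NaN;
  let bestCount = 0;
  for (const v of values) {
    const count = (counts.get(v) ?? 0) + 1;
    counts.set(v, count);
    if (count > bestCount) {
      best = v;
      bestCount = count;
    }
  }
  return best;
}

// Hypothetical salaries: four employees plus one much higher CEO salary.
const salaries = [50000, 52000, 55000, 58000, 250000];
console.log(mean(salaries));   // 93000, pulled up by the outlier
console.log(median(salaries)); // 55000, the middle value

const ratings = [5, 4, 5, 5, 3, 2, 5, 4, 4];
console.log(mode(ratings));    // 5
```

The mean lands far above what four of the five people earn, while the median stays at a typical salary.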

**Measuring Data Spread: How Wild Is It?**

Knowing the average isn't enough. You gotta know if your numbers are clumped together tight or if they're flying all over the place. That's where `calculate_standard_deviation` steps in. It measures the amount of variation, or dispersion, of your dataset around the mean. A low standard deviation means your data points are grouped close to the average; a high one tells you the data is spread out. It's wild. This gives you a quantitative measure of how consistent your metrics are.
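
As a rough illustration, the sketch below computes the population standard deviation, the variant named in the Capabilities section (divide by the number of values, not by one less). The function name `populationStandardDeviation` and the sample arrays are assumptions for the example, not the engine's code.

```javascript
// Population standard deviation: square root of the mean squared
// distance from the mean.
function populationStandardDeviation(values) {
  const n = values.length;
  if (n === 0) return NaN;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  return Math.sqrt(variance);
}

const clustered = [2, 4, 4, 4, 5, 5, 7, 9]; // mean 5, mostly near the mean
const spread = [1, 1, 9, 9];                // mean 5, every value far from it

console.log(populationStandardDeviation(clustered)); // 2
console.log(populationStandardDeviation(spread));    // 4
```

Both arrays share the same mean of 5, yet the second has twice the standard deviation, which is exactly the information the average hides.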

**Pinpointing Distribution: Finding Specific Spots**

Sometimes the middle of the data isn't what you care about. You might need to know where the bulk of your data falls, or what the high end looks like without one weird number dominating the picture. That's why `calculate_percentile` is crucial. It lets you calculate a specific point, like the 95th percentile (p95): the value that 95% of your data falls at or below, with only the top 5% above it. That is essential for understanding upper bounds or judging how extreme an outlier actually is. You can use this to understand what 'normal' looks like within the full range.
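
Percentiles can be defined in several ways. The sketch below assumes linear interpolation between the two closest sorted values, which is one common convention; the document does not spell out which method the engine uses, so treat `percentile` here as a hypothetical stand-in.

```javascript
// k-th percentile via linear interpolation between closest ranks.
// Assumption: this convention is chosen for illustration only.
function percentile(values, p) {
  if (values.length === 0 || p < 0 || p > 100) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  // Fractional zero-based position of the percentile in the sorted data.
  const rank = (p / 100) * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  const fraction = rank - lower;
  return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
}

const latencies = [102, 105, 110, 150, 400, 108, 112];
// Rounded for display: floating-point arithmetic can leave the raw
// result a tiny fraction away from the exact value.
console.log(percentile(latencies, 95).toFixed(2)); // "325.00"
console.log(percentile(latencies, 50));            // 110
```

Rounding for display is a deliberate choice: the interpolation involves fractions like 0.95 that binary floating point cannot represent exactly.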

**The Bottom Line**

This engine means you get deterministic, reproducible results every single time. Because everything runs locally, your data stays private. It gives your agent a reliable way to perform rigorous statistical analysis without sending anything over the network. You stop guessing and start knowing exactly what those numbers mean.

## Tools

### calculate_mean
Finds the mathematical average of all numbers in your dataset.

### calculate_median
Determines the middle value when all numbers are sorted, which makes it resistant to extreme outliers.

### calculate_mode
Identifies the number that appears most often in the dataset.

### calculate_percentile
Calculates the k-th percentile: the value at or below which k% of the data falls.

### calculate_standard_deviation
Measures the amount of variation or dispersion in the dataset from the mean.

## Prompt Examples

**Prompt:** 
```
Here is the latency data for our server today. Calculate the 95th percentile (p95): [102, 105, 110, 150, 400, 108, 112].
```

**Response:** 
```
Using the calculate_percentile tool (percentile=95): The 95th percentile latency is 325ms.
```
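
One way to arrive at 325 by hand, assuming linear interpolation between the two closest sorted values (an assumption about the method, using zero-based positions $x_0, \dots, x_6$):

```latex
\begin{aligned}
\text{sorted data} &= [102,\ 105,\ 108,\ 110,\ 112,\ 150,\ 400], \quad n = 7 \\
r &= \tfrac{95}{100}\,(n - 1) = 0.95 \times 6 = 5.7 \\
P_{95} &= x_5 + 0.7\,(x_6 - x_5) = 150 + 0.7 \times (400 - 150) = 325
\end{aligned}
```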

**Prompt:** 
```
What is the standard deviation for the daily active users this week: [1500, 1520, 1490, 1550, 2100, 1510, 1480]?
```

**Response:** 
```
Using the calculate_standard_deviation tool: The population standard deviation is approximately 208.10 users.
```

**Prompt:** 
```
Identify the mode (most common value) from this array of rating scores: [5, 4, 5, 5, 3, 2, 5, 4, 4].
```

**Response:** 
```
Using the calculate_mode tool: The most frequent score is 5.
```

## Capabilities

### Calculate Central Tendency
Determine the average (mean), middle value (median), or most common point (mode) of a dataset.

### Measure Data Spread
Quantify how spread out your data is using population standard deviation.

### Determine Distribution Points
Find specific points in the dataset, such as the 95th percentile (p95), to understand outliers and upper bounds.

## Use Cases

### Debugging System Latency
An Ops Engineer is looking at server logs and sees that average latency is low. But they suspect the occasional huge spikes are messing up the service. They run `calculate_percentile` (p95) on the raw data array to show that while the mean looks fine, the slowest 5% of requests are far worse than the average suggests.

### Analyzing Customer Ratings
A Product Manager needs to know if a bad review skewed their average rating. They use `calculate_median` on all rating scores. If the median is high, but the mean is dragged down by one or two low numbers, they have proof that outliers are skewing the data.

### Identifying Peak Usage Patterns
A Marketing Analyst needs to know what the single most common interaction was last month. Instead of calculating a mean usage time, they use `calculate_mode` on event logs, with each action encoded as a numeric ID because the engine only accepts numbers, to pinpoint which specific action (e.g., 'checkout') happened most frequently.

### Assessing Workforce Consistency
HR needs to gauge how consistent employee completion times are across a team. They run `calculate_standard_deviation` on the dataset of completion times (durations, not raw timestamps). A low standard deviation means high consistency; a high one may signal that some team members need training.

## Benefits

- **Absolute Privacy:** Because the computation runs locally, your financial or user telemetry data never leaves your machine. Zero API calls are required for analysis.
- **Mathematical Certainty:** Stop trusting LLMs with stats. Use `calculate_mean`, `calculate_median`, and others to get deterministic results: the same input produces the same output every time, to floating-point precision.
- **Outlier Resistance:** When data has extreme outliers, the mean lies. Use `calculate_median` or `calculate_mode` instead to find a more honest representation of central tendency.
- **Understanding Spread:** Don't just report averages. Run `calculate_standard_deviation` to show how much your metrics actually fluctuate over time.
- **Pinpoint Performance:** Forget general averages. Use `calculate_percentile` to determine the 95th percentile latency, which gives you a true picture of the tail-end user experience: 95% of requests were at least this fast.

## How It Works

The bottom line is: you give it raw numbers, call a specific function, and get back the mathematically correct result without waiting for external APIs.

1. Provide the server with a clean array of numbers you want to analyze.
2. Your agent calls the specific tool (e.g., `calculate_median`) and passes it the data array.
3. The local core runs the calculation instantly, returning the precise statistical value.
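
The three steps above could look roughly like this in code. This is a sketch under stated assumptions: the tool names come from this document, but the `tools` registry, the `callTool` helper, and the `{ data }` argument shape are hypothetical and only illustrate the flow.

```javascript
// Hypothetical local registry mapping tool names to plain functions.
const tools = new Map([
  ["calculate_mean", ({ data }) => data.reduce((sum, v) => sum + v, 0) / data.length],
  ["calculate_median", ({ data }) => {
    const sorted = [...data].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  }],
]);

// Step 2: the agent names a tool and passes the data array.
function callTool(name, args) {
  const tool = tools.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  // Step 3: the calculation runs in-process; nothing is sent anywhere.
  return tool(args);
}

// Step 1: a clean array of numbers.
const latencies = [102, 105, 110, 150, 400, 108, 112];
console.log(callTool("calculate_median", { data: latencies })); // 110
```

Keeping each tool a pure function of its input is what makes the result reproducible: there is no hidden state and no remote service in the path.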

## Frequently Asked Questions

**How is calculate_median different from calculate_mean?**
The median finds the middle value; the mean calculates the average. If your data has extreme outliers (very high or very low numbers), the mean gets pulled toward those outliers, making the median a more honest measure of what's typical.

**Can I use calculate_percentile to find my 90th percentile latency?**
Yes. Calling `calculate_percentile` with `90` as the percentile parameter returns the value that 90% of your measurements fall at or below, which is a far better basis for a service guarantee than the mean alone.

**Does calculate_standard_deviation account for different data types?**
No. This engine is designed only for numerical datasets. You must pass arrays of numbers to `calculate_standard_deviation` or any other statistical tool; it won't process text.
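
If your data comes from a loosely typed source, a caller-side check before invoking any tool can catch problems early. The helper below is a hypothetical pre-check, not part of the engine; how the engine itself reacts to non-numeric input is not specified here.

```javascript
// Hypothetical caller-side guard: accept only arrays of finite numbers.
function assertNumericArray(data) {
  if (!Array.isArray(data)) {
    throw new TypeError("Expected an array of numbers");
  }
  for (let i = 0; i < data.length; i++) {
    const v = data[i];
    // Rejects strings, null, undefined, NaN, and Infinity.
    if (typeof v !== "number" || !Number.isFinite(v)) {
      throw new TypeError(`Element ${i} is not a finite number: ${String(v)}`);
    }
  }
  return data;
}

assertNumericArray([1, 2.5, 3]); // passes and returns the array

try {
  assertNumericArray([1, "2", 3]); // the string "2" is rejected
} catch (err) {
  console.log(err.message); // Element 1 is not a finite number: 2
}
```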

**Is the calculation done securely using calculate_mean?**
Yes, absolutely. All calculations run locally within your environment (vurb). This means your data never leaves your local infrastructure and isn't sent to a third-party API for processing.

**What format should the data be in for calculate_mode to work?**
The input must be a flat array of numbers. The engine accepts standard JavaScript number arrays, so you just pass it a list like `[1, 2, 3, 5]`; the values don't need to be sorted. It is built for one-dimensional datasets.

**Does calculate_standard_deviation handle very large data sets?**
Yes, within the limits of your machine. Because the calculation runs locally on a JavaScript core, large arrays are processed without network lag or the payload-size limits a cloud API might impose. The practical ceiling is the memory available to the local process.

**What happens if I use calculate_mean on an empty dataset?**
If you pass an empty array, the tool returns `NaN` (Not a Number). This predictable sentinel value lets your agent immediately catch invalid inputs and prompt for correct data.
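
A minimal sketch of how a caller might handle that case. The local `mean` function stands in for `calculate_mean`, and `describeMean` is a hypothetical wrapper for illustration.

```javascript
// Stand-in for calculate_mean: an empty array gives 0 / 0, which is NaN.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function describeMean(values) {
  const result = mean(values);
  // NaN is never equal to itself, so test with Number.isNaN, not ===.
  if (Number.isNaN(result)) {
    return "No data: please provide at least one number.";
  }
  return `Mean: ${result}`;
}

console.log(describeMean([]));     // No data: please provide at least one number.
console.log(describeMean([2, 4])); // Mean: 3
```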

**How is the privacy of my data maintained when using calculate_percentile?**
The calculation never leaves your machine. The entire process runs locally, meaning your sensitive metrics or user telemetry are processed entirely on your local computational core. Zero API calls means zero data leaving your network.

**Why use this instead of asking the AI to analyze the dataset directly?**
LLMs can get calculations wrong because they predict text token by token rather than executing arithmetic. This MCP gives the AI a deterministic tool, so it offloads the actual number-crunching to a strict JavaScript engine.

**Is my data sent to any external service?**
No. The entire engine runs in your local environment. It is "Privacy First" by design, requiring no external APIs or network access.

**How does the percentile calculation work?**
The tool sorts your dataset and interpolates between neighboring values to find the boundary value below which a given percentage of observations fall. Perfect for p95 or p99 SLA reporting.