# Outlier Detection Engine MCP

> Outlier Detection Engine runs deterministic statistical analysis on massive datasets. It uses Z-Score and IQR methods to flag data points that deviate mathematically, stopping your AI client from guessing what's wrong. Get exact scores for every anomaly found.

## Overview
- **Category:** artificial-intelligence
- **Price:** Free
- **Tags:** statistical-analysis, anomaly-detection, z-score, iqr, data-cleaning, math-engine

## Description

**Outlier Detection Engine - Find Data Anomalies by Math**

The `detect_outliers` tool removes guesswork at the point where an AI client is most likely to fail: large datasets. Instead of letting an LLM run out of context and *guess* what looks unusual, the engine runs deterministic statistical analysis to flag data points that deviate from the rest of the column. Every anomaly comes with its exact score; no gut feelings required.

Your agent doesn't rely on pattern matching or superficial visual cues. The engine calculates means, standard deviations, and quartiles across your selected columns, then flags specific records using established statistical bounds: Z-Score or IQR. The result is pure math, nothing else.

***

**Using Z-Score Outliers**

When you need to know how far a data point is from the average, the engine calculates the Z-Score for every record in that column. The Z-Score tells you how many standard deviations a value lies from the column mean. If the tool flags a row using this method, the value is statistically distant: its Z-Score exceeds the threshold you specified. This approach works best when your data roughly follows a normal distribution.
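
The Z-Score check described above can be sketched in a few lines of Python. This is an illustration, not the engine's actual code: the function name `zscore_outliers`, the use of the population standard deviation, and the two-sided test on the absolute Z-Score are all assumptions.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for every value whose |z| exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing can deviate
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, round(z, 2)))
    return flagged

# 30 ordinary temperature readings near 20 °C plus one extreme reading
readings = [20.0 + 0.1 * (i % 5) for i in range(30)] + [98.5]
print(zscore_outliers(readings, threshold=3.0))
```

Only the final reading (index 30) is flagged; every other value sits well within one standard deviation of the mean.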

**Detecting IQR Outliers**

The engine also supports the Interquartile Range (IQR) method, which you should use if your dataset isn't normally distributed, for example when it is skewed or asymmetrical. The IQR method measures the spread of the middle 50% of your data (from the first quartile, Q1, to the third, Q3) and flags values that fall far outside it. This makes detection stable, because quartiles are barely moved by a single extreme value, unlike the mean. Outliers are pinpointed relative to the spread of the core data.
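
A minimal sketch of the IQR approach, using only the Python standard library. The function name `iqr_outliers` is hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption; the engine may compute quartiles differently, which can shift the bounds slightly.

```python
from statistics import quantiles

def iqr_outliers(values, multiplier=1.5):
    """Return flagged (index, value) pairs plus the lower and upper fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # spread of the middle 50% of the data
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    flagged = [(i, v) for i, v in enumerate(values) if v < lower or v > upper]
    return flagged, lower, upper

prices = [10, 12, 12, 13, 14, 15, 15, 16, 18, 120]
flagged, lower, upper = iqr_outliers(prices)
print(flagged, lower, upper)  # [(9, 120)] 7.0 21.0
```

Here Q1 = 12.25 and Q3 = 15.75, so the fences are 7.0 and 21.0 and only the price of 120 falls outside them.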

**Controlling Sensitivity with Custom Thresholds**

The system gives you granular control over what counts as an 'outlier.' You don't have to accept a default setting. By applying custom thresholds, you set your own sensitivity level: for example, treating any Z-Score greater than 3 as anomalous, or flagging values that lie more than 1.5 times the IQR below Q1 or above Q3. This lets you control precisely which records get flagged as deviations.
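
To illustrate how the threshold changes sensitivity, the sketch below counts how many values a Z-Score check flags at three different cutoffs. The helper `count_flagged` and the sample latencies are made up for this example.

```python
from statistics import mean, pstdev

def count_flagged(values, threshold):
    """Count values whose absolute Z-Score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0
    return sum(1 for v in values if abs(v - mu) / sigma > threshold)

latencies = [40, 42, 41, 39, 43, 40, 41, 42, 38, 40, 41, 39, 70, 95]
for t in (1.0, 3.0, 3.5):
    print(t, count_flagged(latencies, t))
# 1.0 2
# 3.0 1
# 3.5 0
```

A loose threshold of 1.0 flags both 70 and 95, the common cutoff of 3.0 keeps only 95, and 3.5 flags nothing.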

When you run `detect_outliers`, the tool processes your data column by column and returns a list of all flagged records. For each anomaly, it provides the statistic that caused the flag: the Z-Score or the IQR bounds that the value crossed. With this evidence you can judge whether an unusually high price is genuinely outside the normal range or merely at the high end of it.

The engine handles the heavy lifting by running these calculations on a local machine, so your AI client gets reliable results without worrying about context window limits or hallucination. The same data and settings always produce the same flags.

## Tools

### detect_outliers
Stops guesswork by deterministically identifying statistical outliers in any dataset using Z-Score or IQR methods.

## Prompt Examples

**Prompt:** 
```
Find all rows where the 'Temperature' reading is a statistical outlier using Z-Score > 3.
```

**Response:** 
```
Found 4 outliers. The most extreme is row 142 with Temperature 98.5°C (Z-Score = 4.1), followed by row 87 (Z = 3.8), row 201 (Z = 3.4), and row 15 (Z = 3.1).
```

**Prompt:** 
```
Check the 'Price' column for anomalies using the robust IQR method with a 1.5 multiplier.
```

**Response:** 
```
Using an IQR threshold of 1.5, I identified 12 items priced significantly above the upper bound of $450 (Q3 + 1.5×IQR). These appear to be luxury or premium-tier products.
```

**Prompt:** 
```
Are there any abnormal network latency values in this monitoring dataset?
```

**Response:** 
```
Yes. Using Z-Score analysis, 3 network requests had ping times exceeding 3 standard deviations (Z > 3): rows 44 (Z=3.9), 128 (Z=3.5), and 302 (Z=3.2).
```

## Capabilities

### Determine Z-Score Outliers
Calculates how many standard deviations each data point falls from the mean, flagging records outside a specified threshold.

### Detect IQR Outliers
Identifies anomalies using the Interquartile Range (IQR) method, which is best for datasets that aren't normally distributed.

### Apply Custom Thresholds
Allows you to set specific sensitivity levels, such as Z > 3 or IQR × 1.5, controlling what counts as an 'outlier'.
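
For reference, the two rules behind these thresholds can be written as follows. The symbols $t$ (Z-Score threshold) and $k$ (IQR multiplier) are notation introduced here, and the two-sided form of the Z-Score test is an assumption about how the engine applies it.

```latex
z_i = \frac{x_i - \mu}{\sigma},
\qquad \text{flag } x_i \text{ if } |z_i| > t \quad (\text{e.g. } t = 3)

\mathrm{IQR} = Q_3 - Q_1,
\qquad \text{flag } x_i \text{ if } x_i < Q_1 - k\,\mathrm{IQR}
\ \text{ or } \ x_i > Q_3 + k\,\mathrm{IQR} \quad (\text{e.g. } k = 1.5)
```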

## Use Cases

### Detecting Fraudulent Transactions
A fraud analyst needs to check a list of 50,000 transactions for unusually large amounts. They ask their agent to run `detect_outliers` on the 'Transaction Amount' column using IQR with a 1.5 multiplier. The tool isolates every transaction whose amount exceeds the upper bound (Q3 + 1.5×IQR), letting the analyst focus only on high-risk data points.

### Monitoring System Latency Spikes
A DevOps engineer gets alerts that network latency is spiking but can't pinpoint the source. They use `detect_outliers` to run Z-Score analysis on their monitoring dataset. The tool flags specific rows (like row 44) with high Z-Scores, pointing to the exact requests whose latency was abnormal so the engineer knows where to start investigating.

### Quality Control for Manufacturing Data
A quality control manager receives temperature logs from a machine. They use `detect_outliers` to check the 'Temperature' column with Z-Score > 3. The tool tells them precisely which readings are statistically extreme, allowing maintenance staff to investigate possible sensor failures before they cause major downtime.

### Validating Financial Modeling Inputs
A financial modeler is prepping quarterly reports and suspects some input data was manually entered incorrectly. They run `detect_outliers` on the 'Price' column using IQR. The tool pinpoints all items whose prices fall outside expected ranges, preventing bad numbers from corrupting the final forecast.

## Benefits

- Stops LLMs from hallucinating outliers. Instead of having a general AI 'feel' like something is off, `detect_outliers` gives you mathematical proof and an exact Z-Score or IQR boundary for every flagged point.
- Handles massive datasets quickly. You don't have to chunk your data; the engine scans thousands of rows instantly on your local machine, regardless of context window limits.
- Flexible analysis methods. Choose between Z-Score (best if data is normal) or IQR (better if data is skewed). This keeps your validation flexible for different industries.
- High precision output. When `detect_outliers` runs, you get the actual statistical metrics—the score itself—not just a binary 'yes/no' flag. You know exactly *why* it flagged something.
- Saves time on data cleaning. Instead of spending hours cross-referencing reports to find the cause of an anomaly, you run one command and isolate only the records that fail mathematical validation.

## How It Works

You get hard statistical evidence, not an educated guess, about your data's integrity. The process has three steps:

1. You feed the engine a data column and specify your detection method (Z-Score or IQR) and threshold.
2. The MCP server runs the statistical calculation against the entire dataset, determining which records violate the established mathematical boundaries.
3. It returns a precise list of flagged records. Each record includes its calculated score (e.g., Z=3.9), showing exactly how far it deviated.
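
The three steps above can be sketched end to end as follows. This is a hypothetical stand-in, not the server's implementation: the function name `detect_outliers_sketch`, its parameters, and the shape of the returned records are assumptions made for illustration.

```python
from statistics import mean, pstdev, quantiles

def detect_outliers_sketch(rows, column, method="zscore", threshold=3.0):
    """Flag rows in `column` and attach the statistic that justifies each flag."""
    values = [row[column] for row in rows]  # step 1: pick the column
    flagged = []
    if method == "zscore":  # step 2: run the chosen calculation
        mu, sigma = mean(values), pstdev(values)
        for i, v in enumerate(values):
            z = (v - mu) / sigma if sigma else 0.0
            if abs(z) > threshold:
                flagged.append({"row": i, "value": v, "z_score": round(z, 2)})
    elif method == "iqr":
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        spread = q3 - q1
        lower, upper = q1 - threshold * spread, q3 + threshold * spread
        for i, v in enumerate(values):
            if v < lower or v > upper:
                flagged.append({"row": i, "value": v, "lower": lower, "upper": upper})
    else:
        raise ValueError(f"unknown method: {method}")
    return flagged  # step 3: flagged records with their evidence

rows = [{"Price": p} for p in [10, 12, 12, 13, 14, 15, 15, 16, 18, 120]]
result = detect_outliers_sketch(rows, "Price", method="iqr", threshold=1.5)
print(result)  # [{'row': 9, 'value': 120, 'lower': 7.0, 'upper': 21.0}]
```

In this sketch the `threshold` argument doubles as the Z-Score cutoff or the IQR multiplier depending on the method, which keeps the call signature small at the cost of a less explicit default.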

## Frequently Asked Questions

**How does Outlier Detection Engine handle non-numeric data?**
It only processes numeric columns. You must select a quantitative column (like 'Temperature' or 'Price') before running `detect_outliers`. The engine can't calculate statistics on text fields.

**Is Outlier Detection Engine better than just using the AI client?**
Yes. Your AI client is great for interpretation, but it's bad at calculation. `detect_outliers` runs pure math, guaranteeing that every flagged point has a verifiable Z-Score or IQR boundary.

**What if my data isn't normally distributed?**
Use the Interquartile Range (IQR) method instead of Z-Score. The IQR approach is designed for skewed data and gives you more reliable boundaries than standard deviation calculations do in those cases.

**Can I change the sensitivity of `detect_outliers`?**
Absolutely. You control the threshold. If you want to ignore minor deviations, raise the Z-Score cutoff (e.g., Z > 3.5). To catch more borderline values, lower it.

**How quickly does running `detect_outliers` process very large datasets?**
It scans thousands of rows instantly because it runs locally. Since the calculation is deterministic, performance doesn't rely on LLM context limits; you get fast results even when processing huge data inputs.

**Does Outlier Detection Engine keep my dataset private or is it cloud-based?**
The calculations happen entirely on your machine. Your datasets never leave the local environment when you call `detect_outliers`, meaning all of your private data stays secure and confidential.

**What is the maximum size of data I can pass into `detect_outliers`?**
There are no hard context limits like those found in standard LLMs. The engine processes raw data streams, allowing you to analyze datasets far exceeding typical AI prompt token counts.

**What AI clients work with Outlier Detection Engine via MCP?**
It connects to any client that supports the Model Context Protocol (MCP). You can call it from clients such as Claude, Cursor, or VS Code, or through your preferred agent framework.

**What is the difference between Z-Score and IQR?**
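Z-Score assumes data is roughly normally distributed and is sensitive to extreme values, because those values shift the mean and inflate the standard deviation it is measured against. IQR is based on percentiles (25th and 75th), making it robust and better suited to skewed or non-normal data.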
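
A small illustration of that difference, using made-up prices and the Python standard library (the quartile rule `method="inclusive"` is an assumption). Two extreme prices inflate the standard deviation enough that neither exceeds Z > 3, while the IQR fences still catch both.

```python
from statistics import mean, pstdev, quantiles

prices = [10, 12, 12, 13, 14, 15, 15, 16, 110, 120]

# Z-Score: mean and standard deviation are both pulled up by 110 and 120
mu, sigma = mean(prices), pstdev(prices)
z_flagged = [p for p in prices if abs(p - mu) / sigma > 3]

# IQR: quartiles come from the middle of the data and barely move
q1, _, q3 = quantiles(prices, n=4, method="inclusive")
lower = q1 - 1.5 * (q3 - q1)
upper = q3 + 1.5 * (q3 - q1)
iqr_flagged = [p for p in prices if p < lower or p > upper]

print("z-score flags:", z_flagged)   # z-score flags: []
print("IQR flags:", iqr_flagged)     # IQR flags: [110, 120]
```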

**Can I customize the outlier sensitivity threshold?**
Yes! You set the threshold parameter: typically 3 for Z-Score (flagging values beyond 3 standard deviations) or 1.5 for IQR (the standard Tukey fence multiplier).

**Does it automatically remove the outliers?**
No. The engine flags the outliers and provides their exact Z-Scores or IQR bounds so the AI can report them to you. The decision to drop or keep them remains with you.