# Confusion Matrix Engine MCP

> Confusion Matrix Engine calculates True Positives, False Negatives, Precision, Recall, F1-Score, and Accuracy from classification arrays. It offloads model evaluation metrics to a deterministic JavaScript runtime, so you get computed numbers instead of LLM guesses when your data science results need to be exact.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** machine-learning, model-evaluation, data-science, metrics, statistical-analysis

## Description

Running model evaluations can be tricky. When you ask an AI agent to calculate standard metrics like the F1-Score or Precision/Recall from actual versus predicted labels, it often estimates the math instead of computing it. The Confusion Matrix Engine solves that problem by running the calculation locally in a deterministic JavaScript environment. You feed it arrays of actual and predicted class labels, and the MCP computes exact counts and metrics: True Positives, False Negatives, overall Accuracy, and more. Data scientists get model metrics they can check and trust. By connecting this Engine through Vinkius, your AI agent hands statistical calculations to code instead of relying on language generation. The results are deterministic and verifiable.

## Tools

### calculate_confusion_matrix
Takes arrays of actual and predicted labels to compute the full confusion matrix and accuracy score mathematically.

## Prompt Examples

**Prompt:** 
```
Here are my actual labels: ['cat','dog','cat']. And predictions: ['cat','cat','cat']. Calculate the exact accuracy and confusion matrix.
```

**Response:** 
```
The computation has been executed with mathematical precision. All results are exact and ready for review.
```
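
As a hand check on this first prompt, the sketch below tallies the three label pairs in TypeScript. It is a minimal illustration, not the engine's source: the function name `buildConfusionMatrix` is hypothetical. It assumes that rows are actual labels, columns are predicted labels, and classes are sorted alphabetically. None of these conventions is documented for the engine's output.

```typescript
// Minimal sketch (not the engine's implementation): rows are actual labels,
// columns are predicted labels (assumed orientation).
function buildConfusionMatrix(actual: string[], predicted: string[]) {
  // Classes are the sorted union of labels seen in either array (assumption).
  const labels: string[] = [];
  for (const label of actual.concat(predicted)) {
    if (labels.indexOf(label) === -1) labels.push(label);
  }
  labels.sort();

  // Start every cell at zero.
  const matrix: number[][] = labels.map(() => labels.map(() => 0));

  // Count each (actual, predicted) pair, matched index by index.
  let correct = 0;
  for (let i = 0; i < actual.length; i++) {
    const row = labels.indexOf(actual[i]);
    const col = labels.indexOf(predicted[i]);
    matrix[row][col] += 1;
    if (row === col) correct += 1;
  }
  return { labels, matrix, accuracy: correct / actual.length };
}

const catDog = buildConfusionMatrix(["cat", "dog", "cat"], ["cat", "cat", "cat"]);
// catDog.labels   -> ["cat", "dog"]
// catDog.matrix   -> [[2, 0], [1, 0]]
// catDog.accuracy -> 0.6666666666666666 (2 of 3 correct)
```

With this layout, the single error (a `dog` predicted as `cat`) appears in the bottom-left cell, and accuracy is 2 out of 3.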

**Prompt:** 
```
I have 100 binary predictions (1s and 0s) and their actual outcomes. Can you generate the confusion matrix to find the False Positives?
```

**Response:** 
```
The computation has been executed with mathematical precision. All results are exact and ready for review.
```
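
In the binary case, choosing a positive class reduces the matrix to four counts. The following minimal sketch uses a small made-up sample and assumes `1` is the positive class. The engine's convention for naming the positive class is not documented here, and `binaryCounts` is a hypothetical name.

```typescript
// Minimal sketch: count the four binary outcomes, treating `positive`
// (default 1, an assumption) as the positive class.
function binaryCounts(actual: number[], predicted: number[], positive = 1) {
  let tp = 0, fp = 0, fn = 0, tn = 0;
  for (let i = 0; i < actual.length; i++) {
    const isActualPos = actual[i] === positive;
    const isPredPos = predicted[i] === positive;
    if (isActualPos && isPredPos) tp++;       // correctly flagged positive
    else if (!isActualPos && isPredPos) fp++; // false alarm
    else if (isActualPos && !isPredPos) fn++; // missed positive
    else tn++;                                // correctly rejected negative
  }
  return { tp, fp, fn, tn };
}

const sampleCounts = binaryCounts([1, 0, 0, 1, 0], [1, 1, 0, 0, 0]);
// sampleCounts -> { tp: 1, fp: 1, fn: 1, tn: 2 }
```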

**Prompt:** 
```
Run these actual values and predicted values through the confusion matrix tool and tell me if the model is biased toward class A.
```

**Response:** 
```
The computation has been executed with mathematical precision. All results are exact and ready for review.
```
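
One common way to check for a lean toward class A has two parts. First, compare how often the model predicts A (A's column total) with how often A actually occurs (A's row total). Second, count how many samples from other classes land in A's column. The sketch below is illustrative: `biasTowards` is a hypothetical helper, the counts are made up, and it assumes rows are actual labels and columns are predicted labels.

```typescript
// Hypothetical helper: summarize how strongly predictions drift toward one class.
// Assumes matrix[row][col] counts samples whose actual label is labels[row]
// and whose predicted label is labels[col].
function biasTowards(labels: string[], matrix: number[][], target: string) {
  const k = labels.indexOf(target);
  if (k === -1) throw new Error(`Unknown label: ${target}`);
  let actualCount = 0;    // row total: how often the target really occurs
  let predictedCount = 0; // column total: how often the model predicts the target
  for (let j = 0; j < labels.length; j++) {
    actualCount += matrix[k][j];
    predictedCount += matrix[j][k];
  }
  // Off-diagonal cells in the target column are other classes mislabeled as the target.
  const absorbedErrors = predictedCount - matrix[k][k];
  return { actualCount, predictedCount, absorbedErrors };
}

// Rows and columns ordered A, B, C (made-up counts).
const summaryA = biasTowards(["A", "B", "C"], [
  [8, 1, 1],
  [4, 5, 1],
  [3, 0, 7],
], "A");
// summaryA -> { actualCount: 10, predictedCount: 15, absorbedErrors: 7 }
```

In this example, the model predicts A 15 times even though A occurs only 10 times, and 7 samples from B and C were labeled A. That pattern points to a lean toward A. Whether the lean matters in practice is a judgment the numbers alone cannot make.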

## Capabilities

### Calculate full classification breakdown
Generates the complete confusion matrix and overall model accuracy from pairs of actual and predicted labels.

### Determine specific error types
Pinpoints False Positives (FP) and False Negatives (FN), allowing you to understand exactly where your model fails.

### Measure classification quality
Provides core metrics like Precision, Recall, and the F1-Score for an in-depth performance assessment.
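
For reference, these are the standard textbook definitions in terms of the confusion-matrix counts. For a multi-class matrix they apply one class at a time (one-vs-rest), with TP, FP, FN, and TN counted for that class. This page does not say how the engine averages these values across classes or how it handles a zero denominator.

$$
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
$$

$$
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}
$$

$$
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad \text{(binary)}, \qquad
\mathrm{Accuracy} = \frac{\sum_i C_{ii}}{\sum_{i,j} C_{ij}} \quad \text{(multi-class matrix } C\text{)}
$$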

### Verify mathematical integrity
Guarantees that all calculated values are based on deterministic JavaScript computation, eliminating probabilistic errors.

## Use Cases

### Model A vs Model B comparison
A data scientist needs to decide between two classification models. Instead of asking an agent to compute metrics for both (risking errors), they run the MCP's `calculate_confusion_matrix` tool once per model. Both sets of Precision and Recall scores are then computed the same way and can be compared with confidence.

### Debugging a production failure
An ML engineer suspects their model is biased against one class. They feed the actual and predicted labels into the engine to pinpoint the exact False Negative rate for each class, which guides them toward data correction rather than guesswork.
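
The per-class check in this scenario works as follows: a class's false negatives are its row total minus its diagonal cell, and dividing by the row total gives its miss rate (1 - recall). This illustrative TypeScript sketch uses made-up counts, and `falseNegativeRates` is a hypothetical name. It assumes rows are actual labels. A class that never occurs is assigned a miss rate of 0, which is a choice made for this sketch.

```typescript
// Illustrative helper: false negatives and miss rate (1 - recall) per class.
// Assumes matrix[row][col] = count of actual labels[row] predicted as labels[col].
function falseNegativeRates(labels: string[], matrix: number[][]) {
  return labels.map((label, i) => {
    const rowTotal = matrix[i].reduce((sum, n) => sum + n, 0);
    const fn = rowTotal - matrix[i][i]; // samples of this class the model missed
    // Guard against a class that never occurs in the actual labels.
    const missRate = rowTotal === 0 ? 0 : fn / rowTotal;
    return { label, fn, missRate };
  });
}

const missRates = falseNegativeRates(["ham", "spam"], [
  [90, 10],
  [30, 20],
]);
// missRates -> [{ label: "ham", fn: 10, missRate: 0.1 },
//               { label: "spam", fn: 30, missRate: 0.6 }]
```

A spam miss rate of 0.6 against 0.1 for ham is the kind of asymmetry that points to the under-served class.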

### Validating academic research
A statistician has a new dataset and needs an objective measure. They use the MCP to run `calculate_confusion_matrix` against their ground truth labels, generating verifiable metrics that withstand peer review scrutiny.

## Benefits

- Stops hallucination. You don't have to worry about the AI guessing your True Positives or F1-Score; this MCP computes them in a deterministic runtime, so the same input always produces the same numbers.
- Deep dive into errors. Instead of a single accuracy percentage, you get the full breakdown (FP/FN), showing exactly which classes your model confuses.
- Compare models reliably. You can run `calculate_confusion_matrix` on several models or datasets and compare the results side by side.
- Better pipeline integration. ML engineers use this MCP to ensure that any agent output used for metrics passes through a verified, deterministic calculation step.
- Focus on insight, not arithmetic. By offloading the counting and division, your agent can focus on interpreting *why* the model performs poorly, rather than just reporting numbers.

## How It Works

In short, it turns two simple lists of labels into verifiable model metrics.

1. You provide the MCP with two sets of data: an array of actual labels and a matching array of predicted labels.
2. The engine processes these arrays in a deterministic local runtime, tallying the per-class counts (TP, TN, FP, FN) step by step and deriving the metrics from them.
3. Your agent receives a precise output containing the exact True Positives, False Negatives, Accuracy, and other key performance indicators.
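
The three steps can be sketched end to end in TypeScript. This is an illustration under stated assumptions, not the engine's code. `evaluateLabels`, `ClassMetrics`, and `safeDivide` are hypothetical names. The sketch also assumes rows are actual labels and columns are predicted labels, per-class counts are one-vs-rest, and a 0/0 ratio is reported as 0.

```typescript
// Minimal sketch of the three steps (not the engine's actual code).
interface ClassMetrics {
  label: string;
  tp: number;
  fp: number;
  fn: number;
  tn: number;
  precision: number;
  recall: number;
  f1: number;
}

// Report 0/0 as 0 (an assumed convention; the engine's is not documented).
function safeDivide(num: number, den: number): number {
  return den === 0 ? 0 : num / den;
}

function evaluateLabels(actual: string[], predicted: string[]) {
  // Step 1: the arrays must pair up index by index.
  if (actual.length === 0 || actual.length !== predicted.length) {
    throw new Error("actual and predicted must be non-empty and equal in length");
  }

  // Step 2: tally the matrix (rows = actual, columns = predicted).
  const labels: string[] = [];
  for (const label of actual.concat(predicted)) {
    if (labels.indexOf(label) === -1) labels.push(label);
  }
  labels.sort();
  const matrix = labels.map(() => labels.map(() => 0));
  for (let i = 0; i < actual.length; i++) {
    matrix[labels.indexOf(actual[i])][labels.indexOf(predicted[i])] += 1;
  }

  // Step 3: derive one-vs-rest counts and metrics for each class.
  const total = actual.length;
  const perClass: ClassMetrics[] = labels.map((label, k) => {
    let rowTotal = 0; // samples whose actual label is this class
    let colTotal = 0; // samples predicted as this class
    for (let j = 0; j < labels.length; j++) {
      rowTotal += matrix[k][j];
      colTotal += matrix[j][k];
    }
    const tp = matrix[k][k];
    const fp = colTotal - tp;
    const fn = rowTotal - tp;
    const tn = total - tp - fp - fn; // everything outside this class's row and column
    return {
      label, tp, fp, fn, tn,
      precision: safeDivide(tp, tp + fp),
      recall: safeDivide(tp, tp + fn),
      f1: safeDivide(2 * tp, 2 * tp + fp + fn),
    };
  });

  // Accuracy: diagonal (correct predictions) over all samples.
  const correct = labels.reduce((sum, _label, k) => sum + matrix[k][k], 0);
  return { labels, matrix, perClass, accuracy: correct / total };
}

const catDogReport = evaluateLabels(["cat", "dog", "cat"], ["cat", "cat", "cat"]);
// catDogReport.accuracy    -> 0.6666666666666666
// catDogReport.perClass[0] -> cat: tp 2, fp 1, fn 0, tn 0, recall 1, f1 0.8
// catDogReport.perClass[1] -> dog: tp 0, fp 0, fn 1, tn 2, precision 0 (0/0), f1 0
```

Deriving TN as the total minus the other three counts avoids a second pass over the matrix. The `dog` row shows why a zero-denominator convention is needed: the model never predicts `dog`, so its precision is 0/0.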

## Frequently Asked Questions

**Why not let Claude/GPT calculate the accuracy?**
LLMs operate on tokens and probability distributions. If you give them 500 predictions, they might summarize or estimate the F1-Score rather than calculating it exactly. This engine counts every pair and applies the formulas directly, so the result is computed rather than estimated.

**Does it support multi-class classification?**
Yes. The engine automatically detects the unique labels across both arrays and constructs an N-by-N confusion matrix, where N is the number of distinct labels, so the same tool handles both binary and multi-class evaluations.
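
Label detection can be illustrated with a small multi-class sketch. Building the class list from the union of both arrays keeps the matrix square even when a class appears on only one side. `multiclassMatrix` is a hypothetical name, and sorting the labels alphabetically is an assumption about ordering.

```typescript
// Illustrative sketch: take the class list from BOTH arrays so that a label
// that is never predicted, or only ever predicted, still gets a row and a column.
function multiclassMatrix(actual: string[], predicted: string[]) {
  const labels: string[] = [];
  for (const label of actual.concat(predicted)) {
    if (labels.indexOf(label) === -1) labels.push(label);
  }
  labels.sort(); // alphabetical order is an assumption, not documented behavior

  // N x N grid of zeros, where N is the number of distinct labels.
  const matrix = labels.map(() => labels.map(() => 0));
  for (let i = 0; i < actual.length; i++) {
    // Row = actual label, column = predicted label (assumed orientation).
    matrix[labels.indexOf(actual[i])][labels.indexOf(predicted[i])] += 1;
  }
  return { labels, matrix };
}

const birdCatDog = multiclassMatrix(["bird", "cat", "cat"], ["cat", "cat", "dog"]);
// birdCatDog.labels -> ["bird", "cat", "dog"]
// birdCatDog.matrix -> [[0, 1, 0], [0, 1, 1], [0, 0, 0]]
```

Here `bird` is never predicted and `dog` appears only in the predictions, yet both still get a row and a column, giving a 3-by-3 matrix.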

**Is there a limit to the array size?**
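The practical limit is the model's context window, because both arrays are transmitted as JSON in the tool call. For arrays beyond roughly 100k items, consider splitting the data into chunks or aggregating the counts locally (for example, with a script over a CSV export) before involving the agent.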
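
If you do split a large evaluation, note that confusion matrices are additive: sum the per-chunk counts, then compute the metrics once from the total. Averaging per-chunk accuracy or F1-Score gives the wrong answer when chunks differ in size. The following minimal sketch assumes every chunk's matrix uses the same label order. `mergeMatrices` and `accuracyOf` are hypothetical helpers, and the counts are made up.

```typescript
// Hypothetical helper: merge per-chunk confusion matrices by summing cell counts.
// Assumes every matrix uses the same label order and dimensions.
function mergeMatrices(chunks: number[][][]): number[][] {
  if (chunks.length === 0) throw new Error("no chunks to merge");
  const merged = chunks[0].map((row) => row.map(() => 0));
  for (const chunk of chunks) {
    if (chunk.length !== merged.length) throw new Error("label sets differ between chunks");
    for (let i = 0; i < chunk.length; i++) {
      for (let j = 0; j < chunk[i].length; j++) {
        merged[i][j] += chunk[i][j];
      }
    }
  }
  return merged;
}

// Accuracy = diagonal sum / total, computed once on the merged counts.
function accuracyOf(matrix: number[][]): number {
  let correct = 0;
  let total = 0;
  for (let i = 0; i < matrix.length; i++) {
    for (let j = 0; j < matrix[i].length; j++) {
      total += matrix[i][j];
      if (i === j) correct += matrix[i][j];
    }
  }
  return correct / total;
}

const chunkA = [[9, 1], [0, 0]]; // 10 samples, 90% accurate
const chunkB = [[0, 0], [2, 0]]; // 2 samples, 0% accurate
const overallAccuracy = accuracyOf(mergeMatrices([chunkA, chunkB]));
// overallAccuracy -> 0.75 (9 of 12); averaging 0.9 and 0 would give 0.45
```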

**What input structure does `calculate_confusion_matrix` require?**
It requires two separate, equally sized arrays: one for the actual labels and one for the predicted labels. The elements must match index-by-index to ensure accurate pairing of true vs. predicted results.

**How does `calculate_confusion_matrix` guarantee mathematical accuracy?**
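The tool runs in a deterministic, local JavaScript runtime. Unlike a language model, which can produce plausible-looking but wrong decimals, the engine applies the standard formulas directly. Counts such as TP and FN are exact integers. Ratios such as Precision and F1-Score are computed in double-precision floating point, so they are accurate to roughly 15 significant digits and identical on every run. Values like 2/3 are still stored as the nearest representable binary fraction.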
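
Because the underlying counts are integers, a metric can also be reported as an exact fraction alongside its floating-point value, which makes it easy to audit by hand. This page does not document whether the engine exposes fractions. The following illustrative sketch uses hypothetical names and relies on the identity F1 = 2TP / (2TP + FP + FN).

```typescript
// Illustrative sketch: report F1 as a reduced fraction of integer counts.
// Greatest common divisor via Euclid's algorithm (non-negative integers).
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

function f1AsFraction(tp: number, fp: number, fn: number) {
  const num = 2 * tp;
  const den = 2 * tp + fp + fn;
  // No positives predicted or present: report 0 (an assumed convention).
  if (den === 0) return { num: 0, den: 1, value: 0 };
  const g = gcd(num, den);
  return { num: num / g, den: den / g, value: num / den };
}

const f1Example = f1AsFraction(2, 1, 0);
// f1Example -> { num: 4, den: 5, value: 0.8 }
```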

**Can `calculate_confusion_matrix` process categorical strings or only numbers?**
It accepts label arrays as strings. As long as the actual and predicted arrays name each category consistently, the tool counts every class correctly, whether the categories are written as words (such as `cat`) or as numbers (such as `0` and `1`).
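
Mixed label types can cause a subtle mismatch. In JavaScript, the number `1` and the string `"1"` are not equal under strict equality (`===`), so any `===`-based comparison treats them as different classes unless both arrays are normalized first. This page does not document how the engine handles mixed types. The sketch below simply converts every label to a string before comparing; `normalizeLabels` is a hypothetical helper.

```typescript
// Illustrative sketch: normalize labels to strings so that 1 and "1"
// are treated as the same class before any counting happens.
function normalizeLabels(labels: Array<string | number>): string[] {
  return labels.map((label) => String(label));
}

const rawActual: Array<string | number> = [1, 0, "1"];
const rawPredicted: Array<string | number> = ["1", "0", 1];

// Without normalization, strict equality finds no matching pairs.
const rawMatches = rawActual.filter((label, i) => label === rawPredicted[i]).length;
// rawMatches -> 0

const actualLabels = normalizeLabels(rawActual);
const predictedLabels = normalizeLabels(rawPredicted);
const normalizedMatches = actualLabels.filter((label, i) => label === predictedLabels[i]).length;
// normalizedMatches -> 3
```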

**What should I do if my input data has missing or null values?**
The function expects clean, non-null labels. If an array contains missing data points, the MCP will throw a specific error indicating incomplete inputs. You must pre-process your data to remove those gaps before running `calculate_confusion_matrix`.
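
A pre-processing step along these lines removes the gaps before the call. It is a sketch, not the engine's validation logic, and `dropIncompletePairs` is a hypothetical helper. It treats `null`, `undefined`, and empty strings as missing (an assumption). It drops the whole pair so that the two arrays stay aligned index by index.

```typescript
// Illustrative pre-processing: drop any index where either label is missing,
// keeping the two arrays aligned so pairs stay matched index by index.
type RawLabel = string | number | null | undefined;

function isMissing(label: RawLabel): boolean {
  // Treat null, undefined, and empty strings as missing (an assumption).
  return label === null || label === undefined || label === "";
}

function dropIncompletePairs(actual: RawLabel[], predicted: RawLabel[]) {
  if (actual.length !== predicted.length) {
    throw new Error(`Length mismatch: ${actual.length} actual vs ${predicted.length} predicted`);
  }
  const cleanActual: string[] = [];
  const cleanPredicted: string[] = [];
  let dropped = 0;
  for (let i = 0; i < actual.length; i++) {
    if (isMissing(actual[i]) || isMissing(predicted[i])) {
      dropped++;
      continue;
    }
    cleanActual.push(String(actual[i]));
    cleanPredicted.push(String(predicted[i]));
  }
  return { actual: cleanActual, predicted: cleanPredicted, dropped };
}

const cleaned = dropIncompletePairs(["cat", null, "dog", "cat"], ["cat", "dog", undefined, "dog"]);
// cleaned -> { actual: ["cat", "cat"], predicted: ["cat", "dog"], dropped: 2 }
```

Dropping pairs shrinks the sample. If many labels are missing, consider reporting the dropped count alongside the metrics.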

**Does using `calculate_confusion_matrix` require any external dependencies?**
No, it operates within a standard local JavaScript runtime (V8). The MCP handles all necessary computation locally. You won't need to worry about installing or managing extra libraries in your workflow.