# Chi-Square Test Engine MCP

> Chi-Square Test Engine runs chi-square tests of independence on categorical data tables locally. You input observed counts, and this MCP returns deterministically computed chi² statistics and p-values for rigorous statistical analysis, without relying on an LLM's math.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** statistics, data-analysis, categorical-data, hypothesis-testing, math-engine, data-science

## Description

You need to know whether two categorical variables are actually related, or whether the differences you see are just random chance. Asking a large language model to calculate expected frequencies and sum the (O − E)²/E contributions is risky: it can return results that look authoritative but are arithmetically wrong. This MCP fixes that. It computes the full statistical test deterministically on your CPU. You send it an observed frequency matrix, and it calculates everything: the expected counts, the chi² statistic, the degrees of freedom, and the p-value. The whole thing runs locally, so your sensitive survey or business data never leaves your environment. Connecting to this MCP through Vinkius gives your agent dependable numbers for any categorical analysis, so it can focus on interpretation instead of calculation.

## Tools

### calculate_chi_square
Performs chi-square tests of independence on categorical data tables with deterministic computation, eliminating math hallucinations from LLMs.

## Prompt Examples

**Prompt:** 
```
Is there a statistically significant relationship between user gender and subscription tier?
```

**Response:** 
```
The Chi-Square test returns chi² = 8.42, df = 2, p-value = 0.015. Since p < 0.05, there is a statistically significant relationship between gender and subscription tier.
```

**Prompt:** 
```
Check if the distribution of customer complaints varies by product category.
```

**Response:** 
```
The p-value is 0.32, so we cannot reject the null hypothesis of independence: there is no evidence that the complaint distribution depends on product category.
```

**Prompt:** 
```
Run a chi-square test on this survey data to see if education level affects voting preference.
```

**Response:** 
```
Chi² = 15.8, df = 6, p-value = 0.015. The result is statistically significant at the 0.05 level: education level and voting preference are not independent.
```

## Capabilities

### Determine Statistical Independence
Calculates the chi-square statistic to test whether two categorical variables are associated or consistent with independence.

### Generate Expected Counts
Builds the entire expected frequency matrix internally based on your observed data input.
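
Under the independence hypothesis, each expected count is the product of its row and column totals divided by the grand total. This is the standard Pearson construction; the page doesn't describe the tool's internals, so read it as the underlying math rather than the tool's code:

$$
E_{ij} = \frac{R_i \, C_j}{N}
$$

Here $R_i$ is the total of row $i$, $C_j$ the total of column $j$, and $N$ the grand total. For a hypothetical 2x2 table with row totals 60 and 40, column totals 70 and 30, and $N = 100$, the top-left expected count is $E_{11} = 60 \times 70 / 100 = 42$.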

### Calculate P-Values
Provides the p-value: the probability of seeing a chi² statistic at least this large if the two variables were truly independent. A small p-value indicates a pattern unlikely to be due to chance alone.
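
The p-value is the upper tail of the chi-square distribution with df degrees of freedom, evaluated at the observed statistic. Written with the regularized upper incomplete gamma function $Q$, together with the closed form that holds when df is even:

$$
p = P\left(\chi^2_{\mathrm{df}} \ge \chi^2_{\mathrm{obs}}\right) = Q\!\left(\frac{\mathrm{df}}{2}, \frac{\chi^2_{\mathrm{obs}}}{2}\right),
\qquad
p\,\big|_{\mathrm{df}=2k} = e^{-\chi^2_{\mathrm{obs}}/2} \sum_{m=0}^{k-1} \frac{\left(\chi^2_{\mathrm{obs}}/2\right)^m}{m!}
$$

As a check on the example responses above: with df = 2 and chi² = 8.42, $p = e^{-4.21} \approx 0.0148$; with df = 6 and chi² = 15.8, $p = e^{-7.9}\left(1 + 7.9 + 7.9^2/2\right) \approx 0.0149$. Both round to the reported 0.015.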

### Handle Any Table Size
Supports contingency tables of any size, from simple 2x2 matrices up to tables with many categories per variable.

## Use Cases

### Determining if education level affects voting preference
A researcher inputs a matrix of vote counts by education level. The MCP returns a p-value below 0.05, so the researcher can reject independence and conclude that education level and voting preference are associated.

### Checking if complaint distribution varies by product line
A support manager feeds in customer complaint counts across different products. The MCP returns a high p-value, meaning there is no evidence that the complaint distribution depends on product category, so this data doesn't support product-specific action.

### Validating A/B test results for feature adoption
You feed in user counts comparing two groups (A vs B) across two outcomes (clicked vs didn't click). The MCP calculates the chi² statistic to see if the difference is genuine or just chance.
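
For the 2x2 case described here, the Pearson statistic has a well-known shortcut, and with df = 1 the p-value follows from the complementary error function alone. The sketch below illustrates that math under stated assumptions and is not the tool's code: the function name `chi_square_2x2` and the click counts are hypothetical, all four margins are assumed positive, and no continuity correction is applied (the page doesn't say whether the tool uses one).

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows are groups A and B; columns are clicked / didn't click:
        [[a, b],
         [c, d]]
    Returns (chi2, p_value); the degrees of freedom are always 1.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    # Shortcut for the 2x2 Pearson statistic: N(ad - bc)^2 / product of margins.
    chi2 = n * (a * d - b * c) ** 2 / margins
    # With df = 1 the statistic is a squared standard normal, so
    # P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: A had 120 clicks out of 1,000 users, B had 150 out of 1,000.
chi2, p = chi_square_2x2(120, 880, 150, 850)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The shortcut gives the same value as summing (O − E)²/E over the four cells; it just keeps the arithmetic easy to check by hand.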

### Assessing gender and subscription tier relationships
You run a test on user counts comparing gender groups against premium/basic subscriptions. The result is statistically significant, indicating that gender and subscription tier are associated.

## Benefits

- Stop trusting AI math. The `calculate_chi_square` tool runs the entire statistical test locally and deterministically, so the same input always yields the same chi² statistic and p-value.
- Analyze complex relationships without worrying about data privacy. Your survey or business tables are processed locally on your CPU, so sensitive information is never sent to an external cloud endpoint.
- The engine automatically builds the full expected matrix for any size table (2x2, 3x3, etc.). You just provide the observed counts; it handles the rest of the heavy lifting.
- Move past 'maybe' conclusions. Use the p-value output to tell stakeholders with confidence whether a relationship is statistically significant or consistent with chance.
- This MCP processes categorical data directly, making it perfect for A/B test results and survey cross-tabulations where independence testing is key.

## How It Works

The bottom line is that you get deterministic, reproducible results about data relationships, every time.

1. You feed the MCP an observed frequency matrix showing counts across two categorical variables.
2. The engine runs a deterministic calculation locally on your CPU, generating expected frequencies and computing all necessary statistical metrics.
3. Your agent receives a clean report containing the chi² statistic, degrees of freedom, and the associated p-value.
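
The three steps above can be sketched end to end in dependency-free Python. This is a minimal illustration, not the tool's implementation: the FAQ says the engine uses jstat, whereas this sketch computes the p-value with its own regularized incomplete gamma function (the standard series plus continued-fraction evaluation popularized by Numerical Recipes). The names `chi_square_test` and `upper_regularized_gamma` are hypothetical, and the sketch assumes every row and column total is positive.

```python
import math


def upper_regularized_gamma(s: float, x: float) -> float:
    """Q(s, x) = Gamma(s, x) / Gamma(s), the upper tail used for chi-square p-values."""
    if x <= 0:
        return 1.0
    # log of x^s * e^-x / Gamma(s), shared by both branches
    log_prefactor = s * math.log(x) - x - math.lgamma(s)
    if x < s + 1:
        # Power series for the lower function P(s, x); Q = 1 - P.
        term = total = 1.0 / s
        denom = s
        for _ in range(1000):
            denom += 1
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(s, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - s
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - s)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_test(observed: list[list[float]]) -> dict:
    """Pearson chi-square test of independence for an r x c table of counts."""
    # Step 1: marginal totals from the observed frequency matrix.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Step 2: expected counts under independence, then the statistic and df.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Step 3: p-value = upper tail of the chi-square(df) distribution.
    p_value = upper_regularized_gamma(df / 2, chi2 / 2)
    return {"chi2": chi2, "df": df, "p_value": p_value, "expected": expected}


# Hypothetical 2x3 table, so df = (2 - 1) * (3 - 1) = 2.
result = chi_square_test([[10, 20, 30], [20, 20, 20]])
print(result["chi2"], result["df"], result["p_value"])
```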

## Frequently Asked Questions

**What is a contingency table?**
It's a matrix showing the frequency distribution of two categorical variables (e.g., rows = Gender, columns = Subscription Tier). The AI will automatically convert your raw data into this format.
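
A minimal sketch of that conversion, assuming each raw record is a dict of category values. The helper `contingency_table` and the field names `gender` and `tier` are hypothetical, not part of the tool's interface:

```python
from collections import Counter


def contingency_table(records, row_key, col_key):
    """Cross-tabulate raw records into (row_labels, col_labels, counts)."""
    # Count how often each (row category, column category) pair occurs.
    pairs = Counter((rec[row_key], rec[col_key]) for rec in records)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # Counter returns 0 for pairs that never occur, so empty cells become 0.
    counts = [[pairs[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, counts


records = [
    {"gender": "F", "tier": "basic"},
    {"gender": "F", "tier": "premium"},
    {"gender": "M", "tier": "basic"},
    {"gender": "M", "tier": "basic"},
]
rows, cols, counts = contingency_table(records, "gender", "tier")
print(rows, cols, counts)  # ['F', 'M'] ['basic', 'premium'] [[1, 1], [2, 0]]
```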

**Does it handle expected frequencies below 5?**
The engine computes the result regardless, but the AI is instructed to warn you when expected frequencies are low, as the chi² approximation becomes less reliable in those cases.
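
A sketch of the kind of check this answer describes: compute each expected count and flag cells below the threshold of 5. The helper name `low_expected_cells` is hypothetical, and the page doesn't document how the agent actually triggers or words its warning.

```python
def low_expected_cells(observed, threshold=5.0):
    """Return (row, col, expected) for each cell whose expected count is below threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n  # expected count under independence
            if expected < threshold:
                flagged.append((i, j, expected))
    return flagged


# Hypothetical sparse table: two cells fall below 5.
print(low_expected_cells([[2, 8], [3, 37]]))  # [(0, 0, 1.0), (1, 0, 4.0)]
```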

**Can it test more than two variables at once?**
This engine performs a single pairwise independence test per execution. For multi-variable analysis, the AI can chain multiple calls to test different variable pairs sequentially.

**How does `calculate_chi_square` ensure that my sensitive survey data stays private?**
The calculation runs locally on your CPU. Your observed frequency matrix and resulting statistics never leave your secure environment, keeping your business data confidential.

**Is the result from `calculate_chi_square` deterministic and reliable compared to LLM math?**
Yes. It uses the jstat library for the statistical computation, so the chi² statistic and p-value are calculated deterministically rather than generated by the model, eliminating the risk of mathematical hallucinations.

**What specific metrics does `calculate_chi_square` provide when I run a test?**
The engine returns three key values: the chi² statistic, the degrees of freedom (df), and the corresponding p-value. These are essential for determining statistical significance.
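
For reference, these are the standard Pearson definitions of the first two values, where $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$, column $j$ of an $r \times c$ table:

$$
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \mathrm{df} = (r - 1)(c - 1)
$$

A 2x2 table therefore has df = 1, and a 3x3 table has df = 4.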

**What format does `calculate_chi_square` require when I provide it with my data?**
It requires an observed frequency matrix, which is a structured representation of your contingency table (e.g., rows and columns detailing counts). The tool builds the expected frequencies internally.
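
The page doesn't document the exact input schema, so the following is a hypothetical pre-flight check, assuming the matrix is passed as a list of rows of integer counts. The name `validate_observed` is illustrative. It rejects inputs for which the test is undefined: fewer than two rows or columns (df would be 0), ragged rows, negative or non-integer cells, and any all-zero row or column (its expected counts would be 0, so (O − E)²/E is undefined).

```python
def validate_observed(matrix):
    """Raise ValueError if matrix can't serve as an observed-count table (hypothetical check)."""
    if len(matrix) < 2:
        raise ValueError("need at least two rows")
    width = len(matrix[0])
    if width < 2:
        raise ValueError("need at least two columns")
    for i, row in enumerate(matrix):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} cells, expected {width}")
        for j, value in enumerate(row):
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"cell ({i}, {j}) must be a non-negative integer count")
    if any(sum(row) == 0 for row in matrix):
        raise ValueError("a row total is zero, so its expected counts would be zero")
    if any(sum(col) == 0 for col in zip(*matrix)):
        raise ValueError("a column total is zero, so its expected counts would be zero")


validate_observed([[120, 880], [150, 850]])  # passes silently
```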

**How does `calculate_chi_square` handle different sizes of contingency tables?**
It supports matrices of any size, from simple 2x2 tables up to 3x3 and larger. The table always describes two variables, but each variable can have as many categories as your data needs.