# Curie Measurement Prover MCP

> The Curie Measurement Prover forces empirical rigor into complex claims. Instead of accepting vague statements like 'improved performance' or 'better reliability,' this MCP demands quantified evidence across five dimensions: baseline measurements, single-variable isolation, multi-context validation, systematic persistence, and calculated risk impact. It turns gut feelings into defensible data points.

## Overview
- **Category:** complex-reasoning
- **Price:** Free
- **Tags:** measurement, empirical-validation, performance-testing, cross-domain, marie-curie, risk-assessment, isolation

## Description

You know the problem. Your team delivers a presentation full of positive adjectives: 'significantly improved,' 'greatly enhanced,' 'highly reliable.' They changed five things at once—the supplier, the software, the schedule—and call it an improvement. But those words mean nothing without numbers. This MCP makes sure you don't fall for that. It forces a scientific discipline onto your data analysis.

It treats every claim like a complex experiment: If you say something is faster, we need to know the old baseline time and the new cycle time, with a clear percentage delta. If you change three things simultaneously (like upgrading software *and* changing staff *and* moving offices), this tool forces you to separate those variables. You have to prove which single change actually caused the result.

This isn't just another data check; it's a structured way of thinking about causality and risk. By using Vinkius, you connect your agent to this MCP and ensure that every major operational decision is measured against established baselines, validated across multiple real-world environments, and quantified for inherent risks.

## Tools

### validate_curie_measurement
This tool systematically checks complex claims against five scientific pivots: measurement delta, single-variable isolation, cross-domain validation, systematic persistence documentation, and quantifiable risk assessment.

## Prompt Examples

**Prompt:** 
```
We switched from the old process to the new one and operations feel significantly faster. Performance is greatly improved.
```

**Response:** 
```
MEASUREMENT_ABSENT — 'Feels significantly faster' and 'greatly improved' are adjectives, not measurements. Curie did not say radium 'seems more radioactive' — she measured its activity at 2 million times that of uranium. Where is YOUR baseline? What was the old process cycle time under normal demand? What is the new process cycle time under the same conditions? What is the delta? 'Faster' measured HOW — time study? controlled comparison? subjective impression? Without numbers, you have an opinion, not evidence.
```

**Prompt:** 
```
Measurement: Old process baseline: 847 minutes per batch under standard conditions (timed over 5 consecutive batches, same team, same materials). New process: 312 minutes per batch (same team, same materials, same conditions). Delta: 535 minutes reduction (63.2%). BUT — we also changed the raw material supplier and restructured the team schedule in the same transition.
```

**Response:** 
```
ISOLATION_INCOMPLETE — Measurement is rigorous: 847min → 312min, 63.2% reduction, same team and materials. Excellent baseline discipline. But isolation fails: three variables changed simultaneously — new process, new supplier, new schedule. The new supplier alone might have delivered higher-quality inputs requiring less rework. The schedule restructuring might have eliminated bottleneck shift changes. You do not know if the new process contributed 90% of the improvement or 10%. Curie did not test the entire rock — she separated it into fractions. Suggestion: test the new process with the old supplier and old schedule first (isolate process variable), then introduce one change at a time.
```

**Prompt:** 
```
Measurement: 847→312min (63.2%). Isolation: tested new process with old supplier and old schedule = 624min. Then new process with new supplier and old schedule = 480min. Then all three changes = 312min. Attribution: process change accounts for 223min (42%), supplier accounts for 144min (27%), schedule accounts for 168min (31%). Reproducible: 3 runs, variance ±8min. Cross-domain: tested in main facility and satellite facility. Peak conditions: not yet tested under peak season demand, we assume it transfers.
```

**Response:** 
```
CROSS_DOMAIN_UNVALIDATED — Measurement and isolation are excellent: 847→312min, process 42%, supplier 27%, schedule 31%, 3 runs with ±8min variance. Rigorous isolation across two facilities. But cross-domain fails: 'not yet tested under peak season demand, we assume it transfers.' Curie's measurements were reproducible in EVERY laboratory in Europe. Normal conditions are NOT peak conditions: peak season has different volumes, different material quality, different team fatigue levels, different equipment utilization rates. Run the new process through one peak season cycle, measure for a full cycle under real conditions, then compare.
```

## Capabilities

### Validate Process Improvements
It checks if performance gains are based on measurable deltas by requiring both a baseline metric and an after-value.

### Prove Variable Attribution
You test changes one variable at a time, proving that only the specific change caused the observed result.

### Stress Test Across Environments
The system validates results by requiring evidence from multiple operational contexts (e.g., main office vs. satellite branch).

### Document Systematic Effort
It tracks the full history of investigation, noting specific attempts and measured outcomes over time.

### Quantify Business Risk
The MCP forces you to assign probability, financial impact, and mitigation plans for every potential failure point.

## Use Cases

### Post-Deployment Feature Review
The PM team reports that the new checkout flow improved conversion by 15%. The agent uses `validate_curie_measurement` to challenge this, requiring proof that the lift wasn't just due to a concurrent marketing campaign (isolation) or limited to only desktop users (cross-domain).

### Optimizing Warehouse Logistics
The Operations team claims faster packing times. The agent uses `validate_curie_measurement` to require proof that the speed increase came from a new conveyor belt (variable) and not just from adding more staff during the test period (confounding variable).

### Model Performance Auditing
The Data Science team claims their predictive model is 'much better.' The agent uses `validate_curie_measurement` to force a comparison against historical data sets, ensuring the performance lift isn't just an artifact of the current limited training batch.

## Benefits

- Eliminate vague adjectives from reports. The system forces you to provide a clear baseline, an after-value, and the precise percentage change (delta).
- Prove causality instead of correlation. You must test process changes one variable at a time, ensuring that the improvement came from the specific factor you introduced.
- Guarantee findings apply everywhere. It flags results that only worked in your primary lab but fail when scaled to different user groups or peak demand periods.
- Avoid premature conclusions. Instead of stopping after two failed attempts, it demands documentation showing systematic effort over time.
- Account for risk upfront. You must quantify every danger—not just state 'the risk is minimal.' This adds probability and impact metrics.

## How It Works

The bottom line is that you get an objective score showing exactly which assumptions in your plan are unsupported by verifiable data.

1. You provide the agent with a claim of improvement or success (e.g., 'Our new process is 20% faster').
2. The MCP prompts you to structure your evidence by providing baseline metrics, detailing which single variables changed, confirming validation across different environments, and mapping out all associated risks.
3. You receive a verdict: either the claim passes rigorous empirical proof or it fails at one of the five key pivots (e.g., MEASUREMENT_ABSENT).

## Frequently Asked Questions

**Is this only for performance optimization?**
No. Curie's method applies to any domain requiring empirical validation — process improvement (measure before/after cycle times, isolate each change), vendor evaluation (measure cost/quality/reliability, not 'it seems better'), risk assessment (quantify probability and impact, not 'the risk is minimal'), method selection (benchmark each candidate in isolation), controlled experiments (single variable, controlled conditions). Anywhere you would say 'better' or 'faster' or 'more reliable,' replace the adjective with a number.

**What if isolation is impractical?**
Sometimes variables are genuinely coupled — changing the supplier requires changing the delivery schedule. The engine does not demand artificial isolation. It demands AWARENESS of what was changed together and WHY isolation was impractical. Document: 'We changed X and Y together because X requires Y. We cannot isolate their effects. We accept that the 67% improvement is from X+Y combined, with Y alone contributing approximately 15% based on a separate controlled test.' Honest documentation of coupled changes is acceptable. Pretending 3 changes are one is not.

**How does it differ from the Watt Efficiency Prover?**
Watt validates EFFICIENCY ENGINEERING — finding waste, instrumenting baselines, designing feedback loops, isolating bottlenecks, quantifying improvements. It asks 'where is the bottleneck?' Curie validates EMPIRICAL RIGOR — measuring instead of claiming, isolating variables, cross-domain validation, persistence, risk quantification. It asks 'where is the number?' Watt finds WHERE to optimize. Curie proves THAT you optimized. Use Watt to identify bottlenecks. Use Curie to prove your fix actually worked — with numbers, not adjectives.

**What happens if I provide incomplete data when running `validate_curie_measurement`?**
The tool will reject the input immediately, flagging exactly which of the five pivots are missing. You must supply sufficient detail for measurement, isolation, cross-domain testing, persistence documentation, and risk quantification to get a verdict.

**Is there a rate limit when I use `validate_curie_measurement` frequently?**
The MCP enforces standard usage limits. If you hit the cap, your AI client will receive an appropriate error code. You'll need to build in a brief delay or implement an exponential backoff strategy into your workflow.

**How does `validate_curie_measurement` handle data coming from different formats (e.g., spreadsheets vs. databases)?**
The tool only requires structured, quantifiable inputs for its five checks. As long as you've extracted the necessary values—baselines, deltas, and risk probabilities—the format of the original source doesn't matter.

**Does running `validate_curie_measurement` affect data security or modify any external systems?**
No. This MCP is read-only regarding your input context. It processes and analyzes the variables you provide; it doesn't write to, alter, or require access permissions for any of your underlying databases.

**What are the prerequisites for calling `validate_curie_measurement`?**
You need a clear empirical claim that needs rigorous validation. The input must contain specific numerical data and documented methodologies, not just subjective adjectives or general feelings of improvement.