# Spec Prover MCP

> Spec Prover forces AI agents to validate formulas against real data inputs. It checks for boundary errors—like negative time values or midnight wraps—and ensures all defined constants are actually used in the logic, catching bugs that abstract review always misses.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** specification-verification, structured-reasoning, trace-validation, edge-case-detection, multi-agent, requirements-engineering, formula-proving, agentic-pipeline

## Description

**Spec Prover MCP Server - Validate Formula Logic**

You know how easy it is for an agent writing a spec to miss something obvious? They write out this beautiful, complex formula, but they forget that zero matters, or that time wraps around at midnight. Spec Prover fixes those blind spots. It's built so your AI client can force a mathematical proof against real-world data inputs using the `prove_spec_function` tool. This isn't just syntax checking; it forces the agent to prove the logic works across every single possible scenario.

When you use this server, your agent doesn't just assume the math is sound; it has to *show* that it's sound. It executes a mathematical proof using concrete inputs to verify the formula’s logical consistency and detect any arithmetic errors before they even make it into code. This process goes way beyond basic review.

The server forces testing of boundary conditions—the edge cases you never think about but always fail on. You can force identification of failures related to zero, negative numbers, or maximum wraps like midnight transitions. These are the spots where formulas break under normal review, and Spec Prover catches them hard. It checks if your spec handles these boundaries correctly, demanding that the definition explicitly accounts for what happens when things go sideways.

The tool also makes sure you haven't left any junk in there. When it runs a constant usage check, it confirms that every single declared domain constant is actually referenced in the formula. If an agent declares three variables but only uses two of them, Spec Prover flags that unused variable immediately; those unused constants just clutter up your spec and look messy.

It's also critical for financial or scientific logic: detecting precision loss. When calculations involve floating-point numbers—the kind you see when dealing with money or complex measurements—you can run checks to verify the math and prevent rounding errors from silently breaking your entire system. The `prove_spec_function` tool doesn't just check if the final number is right; it tracks every intermediate step, verifying that the calculation remains logically consistent from start to finish.

Because of this deep validation, Spec Prover validates logical consistency across multiple vectors: you can force the agent to trace a formula with concrete inputs and verify its behavior by detecting arithmetic errors. It forces verification of output against defined expectations. You're not relying on guesswork; you’re running a full audit that proves the logic works under pressure. This gives your entire pipeline confidence, knowing that the specification itself is battle-tested.

## Tools

### prove_spec_function
Forces an agent to validate a mathematical formula by tracing it with concrete inputs and verifying its behavior across defined edge cases.

## Prompt Examples

**Prompt:** 
```
I wrote a spec for calculateBedtime(wakeTime, cycles). The formula is: totalMinutes = wakeHour×60 - (cycles×90 + 15). Prove this function works.
```

**Response:** 
```
Verdict: SPEC_INCOMPLETE. Happy path (cycles=2, wake=08:00) traces to 04:45 ✅. Edge case (cycles=6, wake=08:00) traces to -75 minutes → '-02:-15' — your spec has no handling for negative time values. Add: 'if totalMinutes < 0, totalMinutes = ((totalMinutes % 1440) + 1440) % 1440'.
```

**Prompt:** 
```
After fixing the midnight wrap-around, prove all 3 functions in my spec are ready for the developer.
```

**Response:** 
```
Spec Readiness Report — 3 functions: 3 SPEC_PROVEN, 0 wrong, 0 incomplete. Recommendation: SPEC_READY. All functions proven with happy path + edge case traces. Safe to pass to the developer.
```

**Prompt:** 
```
My spec declares GRAVITY=9.81 and DRAG_COEFFICIENT=0.47 as constants for calculateTrajectory, but only GRAVITY appears in the formula. What should I do?
```

**Response:** 
```
Verdict: ORPHAN_CONSTANTS. DRAG_COEFFICIENT=0.47 is declared but never referenced in any computation step. Either integrate it into the trajectory formula (if drag affects the physics model) or remove it from the constants list to avoid confusing the developer.
```

## Capabilities

### Prove Formula Logic
Executes a mathematical proof using concrete inputs to verify the formula's logical consistency and detect arithmetic errors.

### Identify Edge Case Failures
Forces testing of boundary conditions—like zero, negative numbers, or maximum wraps—that fail under normal review.

### Validate Constant Usage
Checks if every declared constant is actually referenced in the formula, flagging unused variables that clutter the specification.

### Detect Precision Loss
Verifies calculations involving floating-point numbers to prevent rounding errors from breaking financial or scientific logic.

## Use Cases

### Calculating Time Off Work
A PM writes a function to calculate bedtime based on cycles. They assume simple subtraction works. The agent calls `prove_spec_function` with an edge case (going from 08:00 to negative time). The tool fails the proof, forcing the PM to add explicit modular arithmetic for midnight wrap-around.

### Financial Discounting
The system calculates a discount on product price. If the spec uses standard floating-point math, it can fail due to IEEE 754 errors (e.g., $0.10 + $0.20). Calling `prove_spec_function` forces a precision check, requiring the PM to mandate integer cents or decimal libraries.

### Average Rating Calculation
The logic calculates average ratings (sum/count). If the input is an empty array of reviews, standard math results in NaN. `prove_spec_function` catches this undefined division edge case and forces the PM to define a specific return value (e.g., null or 'No data').

### Physics Trajectory Modeling
A spec uses constants like GRAVITY. The agent calls `prove_spec_function`, but it flags an orphan constant: DRAG_COEFFICIENT is declared but never used in the formula, telling the PM they either need to integrate it or delete it.

## Benefits

- Catches time math errors. Spec Prover validates complex calculations—like subtracting sleep duration from wake time—handling midnight wraps and negative results automatically.
- Guarantees logical completeness. The tool verifies that every constant you declare is actually used in the formula, preventing spec noise and gaps.
- Stops floating-point bugs. If your logic involves money or ratios, Spec Prover forces precision checks to prevent tiny rounding errors from failing comparisons.
- Minimizes developer guessing. By forcing explicit handling of boundaries (zero division, empty arrays), you eliminate ambiguity that leads developers down wrong paths.
- Supports multi-agent reliability. It acts as a single truth source for logic; if the spec fails here, no subsequent agent should proceed.

## How It Works

The bottom line is, you get mathematical certainty that your specification handles every possible input state before writing a single line of code.

1. You feed the Spec Prover your complete specification, including all formulas and declared domain constants.
2. The tool executes a series of mandatory checks: it forces you to trace the formula with concrete inputs, run defined edge cases (e.g., midnight wrap), and verify constant usage.
3. It returns a verdict: either `SPEC_PROVEN` (meaning the logic withstands scrutiny) or detailed rejection reports pointing out exactly which boundary condition failed.

## Frequently Asked Questions

**How do I use Spec Prover with my time calculations?**
You pass the formula and the boundary parameters to `prove_spec_function`. The tool will specifically look for midnight wraps. If it fails, you must update your spec to include explicit modular arithmetic handling.

**Can Spec Prover check if I used all my variables?**
Yes. Running `prove_spec_function` checks for 'orphan constants.' If a variable is declared but never appears in the calculation steps, the tool tells you to either use it or delete it.

**What if I get an error with Spec Prover?**
The rejection report from `prove_spec_function` doesn't just say 'Error.' It pinpoints the exact failure mode—like a negative result or undefined division—and tells you which part of the spec needs fixing.

**Is Spec Prover better than traditional unit testing?**
Yes. Unit tests run on code, assuming the spec is correct. Spec Prover runs *before* coding, validating the mathematical rules themselves, catching errors at the source where they cause maximum damage.

**How does Spec Prover handle floating-point precision loss during calculations?**
Spec Prover forces you to define a clear precision strategy. When running prove_spec_function, the tool demands that your trace uses actual arithmetic methods—like integer cents or Decimal.js—rather than idealized math assumptions. This prevents silent errors common in floating-point comparisons.

**What types of inputs can I provide when using the prove_spec_function?**
You must supply concrete, typed data for your proof. The tool requires specific variable assignments (e.g., `array = [1, 2, 3]`, `rate = 0.15`) so it can perform a full step-by-step trace. It works best with structured inputs that represent real-world domain values.

**Is Spec Prover slow to run on large or complex specifications?**
No, running the proof saves exponentially more time than debugging later. Because it catches errors at the source—like boundary conditions or orphaned constants—it prevents massive cascading failures in your codebase. The initial validation is quick, but the payoff is huge.

**How do I integrate Spec Prover into an existing multi-agent workflow?**
You connect Spec Prover via the Model Context Protocol (MCP) to ensure it runs as a mandatory pre-step. This forces every agent responsible for creating logic to pass the proof step before development begins, guaranteeing logical consistency across teams.

**Does Spec Prover compute or verify the arithmetic itself?**
No. Spec Prover performs zero computation. It forces the AI agent to structure its own reasoning into traceable steps, then validates that the reasoning is logically consistent. If the agent says the output matches the trace but also says the spec is wrong, the tool rejects the contradiction. The agent does all the math — the tool enforces honesty.

**What happens when the tool rejects my proof?**
The tool returns a detailed consistency error explaining exactly which Decision Pivot contradicts your verdict. For example, if you mark `outputMatchesTrace: true` but choose `SPEC_WRONG`, the rejection will explain that if the output matches your trace, the formula cannot be wrong — re-examine your trace arithmetic. Fix the contradiction and call the tool again with `isRevision: true`.

**What kind of edge cases should I trace?**
The tool requires edge case inputs that differ from your normal inputs. Focus on boundaries: zero values (0 cycles), negative results (subtraction below zero), maximum values (24 hours, 1440 minutes), wrap-around conditions (midnight crossover), and empty/null inputs. The tool rejects edge cases that are identical to normal inputs — a second normal case is not an edge case.