# Context Engineering Prover MCP

> Context Engineering Prover validates and structures prompts before they run. This MCP forces your agent to audit context for relevance, structure it with priority delimiters, calculate token budgets, ground instructions in evidence, and define measurable quality metrics. Stop feeding your AI client noise; prove your context works.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** context-engineering, prompt-optimization, token-budget, attention-decay, context-window, evidence-based-prompting, quality-metrics, structured-reasoning

## Description

It's tempting to dump every document, schema, and conversation history into a prompt on the theory that more context means better results. It doesn't. Unreferenced data dilutes the model's attention on what actually matters. This MCP addresses that structural problem: instead of just sending context, you run a validation pass first.

The pass forces your agent to prove five things: which parts of the context are actually needed (the removal test), how those parts are prioritized and delimited, how many tokens each one takes up (token budgeting), what evidence supports each instruction (evidence grounding), and what the success metric will be (quantified measurement). You stop guessing at good prompts and get a clear verdict on whether your context setup is ready for production use. It acts as a mandatory quality gate for any complex AI task, so that when your client connects through Vinkius, it receives only high-fidelity instructions.

## Tools

### validate_context_engineering
This tool audits a prompt's context through five required checks: proving every block is relevant, ordering blocks by priority with delimiters, setting token budgets and waste ratios, citing evidence for instructions, and defining measurable quality metrics.

## Prompt Examples

**Prompt:** 
```
I'm building a chatbot. I'll include the entire codebase, all docs, and the full conversation history in every prompt. The context window is 128K, so it all fits.
```

**Response:** 
```
CONTEXT_IRRELEVANT — 'It fits' is not a context strategy. 128K tokens with 80% unreferenced content means 102K tokens of noise degrading attention. Apply the removal test: remove a file, if the output doesn't change, that file is noise. Audit each block: what is its purpose? What breaks if removed? If removal changes nothing, the block degrades quality.
```

**Prompt:** 
```
Context audit: Block 1 — schema.prisma (2,400 tokens, model needs table structure for SQL generation, removal causes hallucinated column names). Block 2 — README.md conventions (800 tokens, naming patterns, removal causes inconsistent naming). Block 3 — 3 few-shot examples (1,800 tokens, reinforces output format). Budget: 8,192 total, 792 response headroom. Structure: <SYSTEM> → <SCHEMA> → <EXAMPLES>. Evidence: delimiters improved accuracy 23% on 50 eval cases. Metric: SQL accuracy on 50 test cases, baseline 62%, target 85%.
```

**Response:** 
```
CONTEXT_PROVEN — Context engineering validated. Relevance: each block justified with removal test. Structure: priority-ordered with semantic delimiters. Bounds: 8,192 tokens allocated per block with 792 response headroom. Grounding: delimiters validated by A/B test (23% accuracy improvement). Measurement: SQL accuracy on 50 test cases, baseline 62%, target 85%. Production-ready context construction.
```

**Prompt:** 
```
I structured my context with delimiters and ordered by priority. I think the structure helps based on best practice. Token budget: it fits in the window. Quality: the outputs look better than before.
```

**Response:** 
```
CONTEXT_UNGROUNDED — Structure and ordering pass, but the grounding fails. 'I think it helps' and 'best practice' are not evidence. Cite a test result: 'delimiters improved accuracy by X% on N eval cases.' Cite a documented pattern: 'diminishing returns beyond 3 examples.' Also: 'it fits' is not a token budget — allocate per block with waste ratio. 'Looks better' is not a metric — define baseline, target, and measurement method.
```

## Capabilities

### Audit Context Relevance
It runs a removal test on every context block to ensure the information is critical and not just filler noise.
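
The removal test can be sketched outside the tool as a simple ablation loop. This is a minimal illustration, not the MCP's implementation: `removal_test` and `toy_evaluate` are hypothetical names, and the scoring function is a stand-in for whatever eval suite you actually run. A block counts as needed here only if dropping it lowers the score.

```python
from typing import Callable


def removal_test(
    blocks: dict[str, str],
    evaluate: Callable[[dict[str, str]], int],
) -> dict[str, bool]:
    """Return, per block, whether removing it lowers the eval score."""
    baseline = evaluate(blocks)
    needed = {}
    for name in blocks:
        # Rebuild the context without this one block and re-score it.
        reduced = {k: v for k, v in blocks.items() if k != name}
        needed[name] = evaluate(reduced) < baseline
    return needed


def toy_evaluate(context: dict[str, str]) -> int:
    """Hypothetical stand-in for an eval run: passing cases out of 50."""
    passed = 31
    if "schema" in context:
        passed += 8
    if "examples" in context:
        passed += 4
    return passed


blocks = {"schema": "...", "examples": "...", "changelog": "..."}
result = removal_test(blocks, toy_evaluate)
print(result)
```

In this toy setup the `changelog` block changes nothing when removed, so it would be flagged as noise.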

### Structure with Priority Delimiters
The MCP orders your context blocks from most important to least, wrapping them in semantic tags so the model knows what it's reading.
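
As a rough sketch of what priority ordering with semantic tags might look like, the snippet below sorts blocks and wraps each in a delimiter. The `build_context` helper, the numeric priorities, and the closing-tag format are assumptions for illustration; the tag names follow the SYSTEM, SCHEMA, EXAMPLES structure from the prompt example above.

```python
def build_context(blocks: list[tuple[int, str, str]]) -> str:
    """Order (priority, tag, content) blocks and wrap each in its tag.

    Priority 1 is the most important and is emitted first.
    """
    ordered = sorted(blocks, key=lambda block: block[0])
    return "\n".join(
        f"<{tag}>\n{content}\n</{tag}>" for _, tag, content in ordered
    )


# Blocks are passed in arbitrary order; the helper restores priority order.
context = build_context([
    (3, "EXAMPLES", "Q: ... A: ..."),
    (1, "SYSTEM", "You generate SQL."),
    (2, "SCHEMA", "model User { id Int }"),
])
print(context)
```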

### Calculate Token Budgets and Waste Ratio
It specifies total token limits, allocates tokens per block, and quantifies how much of the context is unreferenced waste.
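
The arithmetic behind a budget and waste ratio is small enough to sketch directly. The `Block` class, the `budget_report` helper, and all token counts below are illustrative assumptions, not output from the tool; waste is taken here to mean tokens in blocks that failed the removal test, divided by total tokens used.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    tokens: int
    referenced: bool  # True if the block passed the removal test


def budget_report(blocks: list[Block], total_budget: int) -> dict:
    """Summarize usage, response headroom, and waste for context blocks."""
    used = sum(b.tokens for b in blocks)
    wasted = sum(b.tokens for b in blocks if not b.referenced)
    return {
        "used": used,
        "headroom": total_budget - used,
        "waste_ratio": wasted / used if used else 0.0,
    }


blocks = [
    Block("schema", 2400, referenced=True),
    Block("conventions", 800, referenced=True),
    Block("examples", 1800, referenced=True),
    Block("changelog", 1000, referenced=False),  # hypothetical filler block
]
report = budget_report(blocks, total_budget=8192)
print(report)
```

Here one sixth of the used tokens are waste, which is the kind of number the budget check asks you to state instead of "it fits."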

### Ground Instructions in Evidence
You must cite test results or documented patterns to justify every major instruction given to your agent.

### Define Quantifiable Quality Metrics
It requires you to set a specific metric, a baseline performance number, and an achievable target for the output.
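
One way to make a metric concrete is to write it down as data before running anything. The `QualityMetric` class below is a hypothetical sketch, not part of the MCP; the numbers reuse the SQL accuracy example above (50 test cases, 62% baseline, 85% target).

```python
from dataclasses import dataclass


@dataclass
class QualityMetric:
    name: str
    baseline: float  # measured accuracy before the change, 0..1
    target: float    # accuracy the change must reach, 0..1
    n_cases: int     # size of the eval set

    def is_defined(self) -> bool:
        """A usable metric has test cases and a target above the baseline."""
        return self.n_cases > 0 and self.target > self.baseline

    def met(self, passed: int) -> bool:
        """Check a run's pass count against the target."""
        return passed / self.n_cases >= self.target


metric = QualityMetric("SQL accuracy", baseline=0.62, target=0.85, n_cases=50)
print(metric.is_defined(), metric.met(43), metric.met(42))
```

With 50 cases, 43 passes (86%) clears the target and 42 passes (84%) does not.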

## Use Cases

### Debugging an unreliable customer service chatbot
The agent keeps hallucinating answers because the prompt includes too much outdated documentation. You run the Prover, which flags irrelevant docs and forces you to prune the context down to only the last 3 versions of the policy manual.

### Building a financial data extraction pipeline
You need your AI client to pull specific fields from PDFs. Before running it, you use the Prover to enforce schema delimiters and allocate token budgets based on the expected length of the source documents.

### Improving complex code generation tasks
The model fails because it gets lost in a massive codebase dump. You run the Prover, which forces you to structure the context by component priority and only include files that pass the removal test.

## Benefits

- Eliminate 'Context Dumping': The removal test ensures that every piece of data you include is critical, saving compute time by cutting out unreferenced noise.
- Guaranteed Structure: By requiring priority ordering and semantic delimiters, the MCP keeps critical instructions at the front of the context, where they are least likely to be diluted by lower-priority material.
- Financial Control: It forces a token budget calculation and waste ratio analysis, letting you know exactly how much of your prompt is pure filler before you hit send.
- Accountability Check: You can't rely on 'best practices.' This MCP demands that every major instruction be backed by test results or documented patterns.
- Measurable Results: Instead of accepting vague feedback like 'it looks better,' it forces you to define a baseline, target metric, and measurement method for true quality control.

## How It Works

The Prover turns 'I think this helps' into an objective, measurable pass/fail verdict on your context engineering.

1. Give your agent all the context blocks and instructions you plan to use, then call the validate_context_engineering tool.
2. The MCP forces a structured reflection process: it runs relevance tests, orders the content with delimiters, calculates token usage, demands evidence citations, and sets success metrics.
3. You receive a verdict (CONTEXT_PROVEN or one of five failure modes) telling you exactly what structural flaw degrades performance.
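
The verdict step above can be sketched as a mapping from failed checks to verdict names. This is a hedged illustration only: the `Audit` class and `failing_verdicts` function are hypothetical, the pairing of CONTEXT_UNBOUNDED with the token budget check is an assumption, and only the three failure verdicts named in this document are modeled. The remaining two failure modes, and the order in which the real tool reports failures, are not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Audit:
    unjustified_blocks: list[str] = field(default_factory=list)
    has_per_block_budget: bool = False
    evidence: list[str] = field(default_factory=list)


def failing_verdicts(audit: Audit) -> list[str]:
    """List every failed axis, or CONTEXT_PROVEN if none failed.

    Structure and measurement checks are omitted because their
    failure verdict names are not given in this document.
    """
    failures = []
    if audit.unjustified_blocks:
        failures.append("CONTEXT_IRRELEVANT")
    if not audit.has_per_block_budget:
        failures.append("CONTEXT_UNBOUNDED")  # assumed name for budget failures
    if not audit.evidence:
        failures.append("CONTEXT_UNGROUNDED")
    return failures or ["CONTEXT_PROVEN"]


vague = Audit()  # "it fits" and "best practice": no budget, no evidence
proven = Audit(
    has_per_block_budget=True,
    evidence=["delimiters improved accuracy 23% on 50 eval cases"],
)
print(failing_verdicts(vague), failing_verdicts(proven))
```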

## Frequently Asked Questions

**Why do I need the Context Engineering Prover MCP?**
You need it because simply including information doesn't mean the AI uses it effectively. This MCP forces you to prove relevance, structure, and budget before running any complex task.

**Does validate_context_engineering write my prompt for me?**
No, it acts as a mandatory quality check on your existing context setup. It doesn't write the content; it audits the structure and effectiveness of the content you provide.

**What if validate_context_engineering fails? What does that mean?**
It means there is a structural flaw, like too much unreferenced noise or missing metrics. The output will tell you the exact axis (e.g., CONTEXT_UNBOUNDED) that needs fixing.

**Is this better than just using a larger context window?**
Yes. A bigger window only gives you more room for noise. This MCP makes you use the space efficiently by forcing token budgeting and waste ratio quantification.