# Task Completion Enforcer Prover MCP

> Task Completion Enforcer Prover provides a rigorous, multi-stage audit for AI outputs. When you run this tool, it forces your agent to execute five checks: listing every original requirement; providing specific code evidence (file/line numbers); identifying all work gaps; completing the missing work immediately; and comparing the final output line-by-line against the initial request. The agent cannot declare 'done' until everything is proven.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** task-completion, qa, verification, llm-enforcement, checklist, requirement-tracking, anti-placeholder

## Description

This server's job is simple: it makes sure your AI agent actually finishes what you asked for. It runs a five-axis audit that refuses to let your agent declare 'done' until everything's proven and verified. You won't get vague summaries here; you only get verifiable proof.

The `validate_task_completion` tool forces your agent through a rigorous, multi-stage process every single time it runs. First, it makes the agent list every original requirement from your prompt as a numbered checklist. This step ensures that nothing gets missed because of oversight or assumption.

For each item on that mandatory list, the server demands proof of implementation. It won't accept 'I did it'; it needs specific artifacts—file paths, line numbers, or function calls pointing directly to the work. If your agent just says it wrote a class, this tool makes it cite exactly where that class lives and what lines are involved.
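
To make the evidence rule concrete, here is a minimal sketch of how a citation such as `routes/users.ts:15-42` could be parsed into a structured record. The `path:start-end` format is taken from the prompt examples in this document; the `Evidence` class and the `parse_evidence` function are hypothetical names used for illustration, not the server's actual API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed citation format: "<path>:<start>-<end>" or "<path>:<line>".
_EVIDENCE_RE = re.compile(r"^(?P<path>[^\s:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


@dataclass(frozen=True)
class Evidence:
    """A file path plus an inclusive, 1-based line range."""
    path: str
    start: int
    end: int


def parse_evidence(citation: str) -> Optional[Evidence]:
    """Return a structured Evidence, or None if the citation is too vague."""
    match = _EVIDENCE_RE.match(citation.strip())
    if match is None:
        return None  # e.g. "I did it" carries no path or line number
    start = int(match.group("start"))
    end = int(match.group("end") or start)  # a single line is a range of one
    if start < 1 or end < start:
        return None  # reject impossible ranges such as 10-5
    return Evidence(match.group("path"), start, end)
```

Under these assumptions, `parse_evidence("routes/users.ts:15-42")` yields a record with the path and both line numbers, while a bare claim like `"I did it"` yields `None` and would count as missing evidence.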

If anything is missing, the system flags it immediately. It automatically finds and reports any requirement for which there's no evidence, or any piece of work the agent skipped over entirely. Furthermore, it shuts down the output if your agent leaves in placeholders like `TODO`, `FIXME`, or 'TBD'; those have to be replaced with actual, working code before anything moves forward.
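
The placeholder check can be pictured as a simple text scan. The sketch below is an assumption about one way to do it, not the server's implementation: it looks only for the three markers named above, and `find_placeholders` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Only the markers named in the text; a real checker may recognise more (assumption).
_PLACEHOLDER_RE = re.compile(r"\b(TODO|FIXME|TBD)\b")


def find_placeholders(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) for every placeholder marker found in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in _PLACEHOLDER_RE.finditer(line):
            hits.append((number, match.group(1)))
    return hits


draft = "def verify_email(user):\n    # TODO: send verification email\n    return True\n"
print(find_placeholders(draft))  # [(2, 'TODO')]
```

A purely lexical scan like this would miss softer placeholders, such as a stub that always returns true or a note saying "implement similar logic", so it is best read as the cheapest first pass.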

The process keeps running until every single gap is closed. The agent must fix everything the tool flags, then call the tool again to re-verify its own work. This isn't a suggestion; it's mandatory for the task to proceed.

Finally, the server compares the resulting code or document against your initial prompt. It checks that the final output addresses only what you asked for and doesn't drift into an adjacent but incorrect problem area. If even one line of the final product falls outside the scope of your original request, the whole thing fails the audit, and the loop starts over.

## Tools

### validate_task_completion
Forces the AI agent to prove task completion by running a five-axis audit: extracting requirements, providing evidence, identifying gaps, closing those gaps, and performing final verification against the original request.

## Prompt Examples

**Prompt:** 
```
The user asked me to create 5 endpoints. I built 3 with full validation. The tests cover the main happy path. Documentation was mentioned but I focused on the code. I believe this is complete.
```

**Response:** 
```
REQUIREMENT_AMNESIA — 3 of 5 endpoints built (60%). Tests incomplete (happy path only, no error cases). Documentation not started. You forgot 40% of the requirements. Re-read the original request. List ALL 5 endpoints. Build the missing 2. Add error case tests. Write the documentation. Call again.
```

**Prompt:** 
```
Requirements: 1. POST /users ✅ at routes/users.ts:15-42. 2. GET /users/:id ✅ at routes/users.ts:44-58. 3. PUT /users/:id ✅ at routes/users.ts:60-82. 4. DELETE /users/:id ✅ at routes/users.ts:84-95. 5. GET /users (list) ✅ at routes/users.ts:97-115. Tests: 15 cases at tests/users.test.ts:1-180, all passing. Docs: OpenAPI updated at docs/api.yaml:200-340. Gaps: 0 — all 5 endpoints implemented with evidence. Verified against original request line by line: 5/5 requirements match.
```

**Response:** 
```
DELIVERY_PROVEN — All 5 requirements extracted, each mapped to specific file and line evidence, zero gaps identified, verification complete against original request. This task is genuinely complete.
```

**Prompt:** 
```
Built the authentication system. Login works. Registration has a TODO for email verification. Password reset says 'implement similar logic as login'. The middleware is a stub that returns true. Tests exist but 2 are skipped. This should be everything the user asked for.
```

**Response:** 
```
PLACEHOLDER_INFECTION — Four placeholders found: 'TODO for email verification' (not done), 'implement similar logic' (not done), middleware stub returning true (not done), 2 skipped tests (not done). Replace EVERY placeholder with actual implementation. A skeleton is not a building. Do the work, then call again.
```
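
The three responses above each open with a verdict label. As a rough sketch, the mapping from audit counts to a label might look like the following. The labels come from the examples; the function `choose_verdict`, its parameters, and the order in which conditions are checked are assumptions made for illustration.

```python
def choose_verdict(total_requirements: int, proven: int, placeholders: int) -> str:
    """Map audit counts to one of the verdict labels shown in the examples.

    Checking missing requirements before placeholders is an assumed precedence.
    """
    if total_requirements <= 0:
        raise ValueError("at least one requirement is needed")
    if not 0 <= proven <= total_requirements:
        raise ValueError("proven must be between 0 and total_requirements")
    if placeholders < 0:
        raise ValueError("placeholders cannot be negative")
    if proven < total_requirements:
        return "REQUIREMENT_AMNESIA"    # some requirement has no evidence
    if placeholders > 0:
        return "PLACEHOLDER_INFECTION"  # everything cited, but stubs remain
    return "DELIVERY_PROVEN"
```

With these assumptions, the first example (3 of 5 endpoints proven) maps to `REQUIREMENT_AMNESIA`, and the second (5 of 5 proven, no placeholders) maps to `DELIVERY_PROVEN`.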

## Capabilities

### Audit All Requirements
It forces your agent to extract and list every single requirement from the initial user prompt as a numbered checklist.

### Prove Implementation Evidence
The tool requires specific artifacts for each requirement, naming file paths, line numbers, or function calls instead of just saying 'I did it'.

### Flag Missing Work (Gaps)
It automatically finds and reports any requirements that lack evidence or which the agent skipped over.
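
One way to picture gap detection is as a comparison between the numbered checklist and the evidence attached to each number. This is a minimal sketch under that assumption; `find_gaps` and the data shapes are hypothetical, and the sample paths are borrowed from the prompt examples.

```python
from typing import Dict, List


def find_gaps(requirements: List[str], evidence: Dict[int, List[str]]) -> List[int]:
    """Return the 1-based numbers of requirements with no evidence attached."""
    return [
        number
        for number in range(1, len(requirements) + 1)
        if not evidence.get(number)  # a missing key or an empty list both count as a gap
    ]


requirements = [
    "POST /users",
    "GET /users/:id",
    "PUT /users/:id",
    "DELETE /users/:id",
    "GET /users (list)",
]
evidence = {
    1: ["routes/users.ts:15-42"],
    2: ["routes/users.ts:44-58"],
    3: ["routes/users.ts:60-82"],
}
print(find_gaps(requirements, evidence))  # [4, 5]
```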

### Prevent Placeholder Submission
The system rejects outputs containing placeholders like `TODO`, `FIXME`, or 'TBD' until they are replaced with actual, working code.

### Verify Scope Adherence
It checks that the final output only addresses the original prompt and doesn't veer off into an adjacent, but incorrect, problem area.

## Use Cases

### Building a Microservice API
A developer asks their agent to create 5 endpoints and write tests. The agent finishes 3 and calls 'done.' Running the Task Completion Enforcer Prover immediately flags Requirement Amnesia, forcing the agent to build the two missing endpoints before it can proceed.

### Updating Compliance Documentation
You ask an agent to update a 50-page compliance manual with three specific regulatory changes. The agent writes the sections but leaves placeholders like `[Pending Review]` and fails to reference two mandatory source documents. The Prover catches Placeholder Infection, forcing the agent to replace the placeholders and cite the real sources.

### Refactoring Legacy Codebase
You task your AI agent with modernizing a module's authentication flow (Login, Register, Reset). The agent implements Login, then wanders off to solve a more complex 'user profile update' problem instead of finishing Register and Reset. The Prover catches Scope Drift, forcing the agent to stick strictly to the original auth requirements.

### Completing Multi-Step Onboarding
You ask an agent to set up user roles (Admin, Editor, View Only) across three different systems. The agent implements Admin and Editor but fails on the third system's API calls. The Prover identifies the gap during final verification, ensuring all three roles are accounted for on all three systems.

## Benefits

- Eliminates Requirement Amnesia. You don't have to manually check if the agent forgot one of the five items asked for; the tool forces an explicit, numbered checklist against the original prompt.
- Stops Premature Completion. The server won't let your agent call 'done' early. It keeps looping until there is verified evidence for every single component, preventing summary-based false positives.
- Catches Placeholder Infection. If your agent leaves a `TODO` or `FIXME`, the tool finds it and forces the agent to write actual code instead of submitting a skeleton.
- Guarantees Scope Adherence. It prevents scope drift by comparing every line of the output back to the original request, ensuring the AI solved *that* problem, not a related one.
- Provides Full Audit Trail. The process generates an audit trail—the five axes (Requirements, Evidence, Gaps, Continuation, Verification)—giving you undeniable proof of completeness.

## How It Works

The bottom line is: it turns an AI's self-declaration of completion into a structured, auditable, and verifiable engineering deliverable.

1. Start by feeding the tool your initial request. It immediately extracts a numbered checklist of all requirements.
2. Your agent then works through the task and calls the tool repeatedly, providing concrete evidence (file/line numbers) for each requirement until zero gaps are reported.
3. The process concludes only when the system confirms that every line of the original prompt matches verifiable output.
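
The steps above can be sketched as a loop that audits, fixes, and audits again. Everything here is a stand-in: `audit` plays the role of the tool, `close_gap` plays the role of the agent doing the missing work, and `max_rounds` is a safety limit added for the sketch, since the text describes the loop as continuing until every gap is closed.

```python
from typing import Callable, Dict, List


def audit(requirements: List[str], evidence: Dict[int, str]) -> List[int]:
    """Stand-in for the tool: return requirement numbers that lack evidence."""
    return [n for n in range(1, len(requirements) + 1) if not evidence.get(n)]


def run_until_proven(
    requirements: List[str],
    evidence: Dict[int, str],
    close_gap: Callable[[int], str],
    max_rounds: int = 10,  # assumed safety limit, not part of the described tool
) -> int:
    """Repeat audit -> fix -> re-audit until no gaps remain; return audit rounds used."""
    for round_number in range(1, max_rounds + 1):
        gaps = audit(requirements, evidence)
        if not gaps:
            return round_number  # every requirement now has evidence
        for number in gaps:
            # The agent does the missing work now and reports where it lives.
            evidence[number] = close_gap(number)
    raise RuntimeError("gaps remain after max_rounds")


requirements = ["Login", "Register", "Reset"]
evidence = {1: "src/auth/login.py:1-40"}  # hypothetical path
rounds = run_until_proven(requirements, evidence, lambda n: f"src/auth/step_{n}.py:1-20")
print(rounds)  # 2
```

The second audit is the point of the design: work done in response to a flagged gap is not trusted until the tool has seen evidence for it.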

## Frequently Asked Questions

**How does the Task Completion Enforcer Prover know what requirements I asked for?**
It starts by analyzing your original request and automatically generating a numbered, specific checklist of every actionable requirement. This list becomes the absolute standard against which all subsequent work is measured.

**Does Task Completion Enforcer Prover just check if I mentioned five endpoints?**
No. It checks for physical evidence. For each endpoint, it requires specific artifacts—a file path, a line number, or a test result—proving the code actually exists and works.

**What happens if my agent leaves a TODO comment after using Task Completion Enforcer Prover?**
The tool detects this as Placeholder Infection. It stops execution and forces the agent to replace every placeholder with actual, working code before allowing the process to continue.

**Is Task Completion Enforcer Prover useful for general knowledge retrieval?**
No. This tool is built specifically for enforcing complex task completion in technical outputs (code, documentation). It won't help you find a random fact; it only verifies work against a defined scope.

**If `validate_task_completion` identifies gaps, does it just stop or force me to finish the work?**
It forces continuous work until all axes pass. If any gap remains after a check, the tool will not declare completion. You must fix the specific missing pieces and call the function again.

**What kind of evidence does Task Completion Enforcer Prover accept for proof of work?**
It requires concrete artifacts like file paths, line numbers, or test results. General statements or summaries are not enough; it demands specific proof that the requirement was met.

**Does using Task Completion Enforcer Prover require me to keep the original user request visible?**
Yes. The tool must re-read and compare every line of your output against the original prompt multiple times. Keeping the source material accessible is critical for verification.

**Can Task Completion Enforcer Prover handle massive, multi-domain requirements in one go?**
It handles complex tasks by breaking them into five verifiable axes. For extremely large requests, splitting the work into smaller, domain-specific chunks often yields the most reliable results.

**Why do LLMs forget requirements?**
As the output grows, the original requirements make up a shrinking share of the context, and the model tends to attend to them less. A 10-step task stated around token 200 competes with 2,000 tokens of generated output for attention. The model can lose track of requirement #7 while implementing requirement #3. The fix: force a re-read of ALL requirements before declaring completion.

**What counts as 'completion evidence'?**
Not 'I implemented the function.' Evidence means: 'Requirement 1: POST /users endpoint at src/routes/users.ts lines 15-42, validates email/name/role via Zod schema, returns 201 with user object.' File path, line number, specific behavior. If you cannot point to the exact artifact, it is not done.

**What happens when gaps are found?**
The LLM MUST close them immediately — not later, not in a follow-up. Do the remaining work NOW. Then call this tool AGAIN to verify the gaps are actually closed. The loop continues until EVERY requirement has concrete evidence. 'I will do it later' is rejected. 'I just did it, here is the evidence' is accepted.