# Reversibility Architect Prover MCP

> Reversibility Architect Prover forces your AI agent to prove an architectural change won't break production systems. It runs a mandatory 6-pivot validation, mapping data rollbacks, blast radius containment, and precise abort criteria before any migration or deployment executes.

## Overview
- **Category:** devops
- **Price:** Free
- **Tags:** reversibility, sre, rollbacks, canary, multilingual, deployment-safety

## Description

**validate_reversibility** forces your AI agent to prove that any architectural change you propose won't wreck production when it deploys. This isn't just a warning system; it runs a mandatory six-pivot validation before *anything* (migrations, code changes, or feature flips) can execute against live systems.

You gotta treat your AI agent like an actual Site Reliability Engineer (SRE) and make it show its work. The tool doesn't accept vague assurances; it demands concrete proof across six critical pivots so you don't ship broken crap.

First off, when dealing with data changes, the tool mandates that you prove the migration is reversible. You map out the rollbacks using additive patterns, which means you guarantee no information gets lost if things go sideways and you have to revert. It ensures that even if the new code fails, the old system can still read and process everything without a hitch.
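
As a rough illustration of the additive pattern, here is a minimal sketch using Python's built-in `sqlite3`. The `users` table and the `username` / `display_name` columns are hypothetical; the point is only that the expand step adds and backfills a new column while the old one stays readable, so reverting the application code loses nothing.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    """Expand step: add the new column and backfill it. The old column is untouched."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.execute(
        "UPDATE users SET display_name = username WHERE display_name IS NULL"
    )


# Hypothetical schema and data, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.execute("INSERT INTO users (username) VALUES ('ada'), ('grace')")

expand(conn)

# Old code path: still reads the original column after the migration.
old_view = [row[0] for row in conn.execute("SELECT username FROM users ORDER BY id")]
# New code path: reads the added column.
new_view = [row[0] for row in conn.execute("SELECT display_name FROM users ORDER BY id")]
```

Dropping `username` would be a separate, later change, made only once nothing reads it anymore.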

Next up are the fail-safes. The agent requires you to define measurable rollback triggers, setting quantifiable thresholds like specific error rate spikes or sudden latency jumps. If these metrics cross your predetermined line (say, the error rate hits 4% for two minutes straight), the system automatically triggers an abort sequence before the failure spreads to more users.
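
A trigger like "4% for two minutes straight" can be sketched as a sustained-threshold check. This is an illustrative assumption about how such a rule might be evaluated, not the tool's actual logic; `RollbackTrigger` and `should_abort` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RollbackTrigger:
    threshold: float      # e.g. 0.04 for a 4% error rate
    sustain_seconds: int  # how long the breach must last, e.g. 120


def should_abort(samples: Iterable[Tuple[int, float]], trigger: RollbackTrigger) -> bool:
    """samples: (timestamp_seconds, error_rate) pairs in time order.

    Returns True once the rate has stayed at or above the threshold
    continuously for at least sustain_seconds.
    """
    breach_start = None
    for timestamp, rate in samples:
        if rate >= trigger.threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= trigger.sustain_seconds:
                return True
        else:
            # Any healthy sample resets the clock.
            breach_start = None
    return False


trigger = RollbackTrigger(threshold=0.04, sustain_seconds=120)
sustained = [(t, 0.05) for t in range(0, 150, 30)]          # breached from 0s to 120s
blip = [(0, 0.05), (30, 0.01), (60, 0.05), (90, 0.05)]      # breach interrupted at 30s
```

Requiring the breach to be sustained keeps a single noisy sample from aborting a healthy rollout.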

Failure containment is crucial too. You prove this by having the tool calculate the potential failure size using canary strategies. Instead of hitting all users at once and watching everything crash, it restricts rollouts to small segments first (say 1%, then 10%), letting you slowly ramp up traffic while monitoring for issues.
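
The ramp can be sketched as a small stage table plus a blast-radius calculation. The stage values and function names below are assumptions for illustration; a real rollout would take its health signal from monitoring.

```python
# Hypothetical ramp schedule, in percent of traffic.
CANARY_STAGES = (1, 10, 50, 100)


def next_traffic_percent(current: int, healthy: bool, stages=CANARY_STAGES) -> int:
    """Advance to the next stage if healthy; drop to 0% (full rollback) if not."""
    if not healthy:
        return 0
    for stage in stages:
        if stage > current:
            return stage
    return stages[-1]  # already at the final stage


def blast_radius(total_users: int, percent: int) -> int:
    """Number of users exposed at a given traffic percentage."""
    return total_users * percent // 100
```

With 50,000 daily users, `blast_radius(50_000, 1)` is 500: a failure at the first stage touches 500 users instead of all of them.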

The downtime estimate isn't optional either. The tool makes you test your timing against production-scale data volumes to gauge potential downtime accurately. That testing catches the write locks and service outages that only show up when you hit gigabytes of real data, not some dev sandbox dummy set.

Every new user-facing feature must be protected by a kill switch: a feature flag. The validation confirms that the feature is deployed behind this flag, guaranteeing an instant, non-disruptive rollback at any point in time by simply toggling it off. It's your emergency brake, and you gotta prove it works.
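
Here is a minimal sketch of what such a kill switch might look like, assuming an in-process flag with percentage bucketing. `FeatureFlag` is a hypothetical class, not part of the tool; production systems usually back this with a flag service.

```python
import hashlib


class FeatureFlag:
    """Percentage rollout flag with a kill switch (illustrative only)."""

    def __init__(self, name: str, rollout_percent: int = 0) -> None:
        self.name = name
        self.rollout_percent = rollout_percent

    def kill(self) -> None:
        """Emergency brake: turn the feature off for everyone, no redeploy."""
        self.rollout_percent = 0

    def is_enabled(self, user_id) -> bool:
        # Hash flag name + user id so each user lands in a stable bucket 0..99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < self.rollout_percent


flag = FeatureFlag("dashboard_v2", rollout_percent=100)
enabled_before = all(flag.is_enabled(user) for user in range(1000))
flag.kill()
enabled_after = any(flag.is_enabled(user) for user in range(1000))
```

Hashing the user ID keeps each user in the same bucket across requests, so ramping the percentage up or down doesn't flip the feature on and off for the same person.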

Finally, before deployment, you run a post-mortem simulation. This doesn't just check the code; it simulates failure happening at 2 AM and maps out the exact recovery steps for your on-call team. You prove readiness—that the team can recover from major failures without needing manual intervention or waking up the whole damn staff.

If you can't pass all six of these rigorous checks, the tool stops everything dead in its tracks. Period.

## Tools

### validate_reversibility
Runs a structured validation check, forcing analysis of data migration reversibility, rollback criteria, blast radius, downtime estimation, feature flags, and post-mortem simulation for any proposed change.

## Prompt Examples

**Prompt:** 
```
Data: drop column. Rollback: restore backup.
```

**Response:** 
```
Verdict: DATA_CORRUPTION_RISK. A backup restore implies data loss between the backup and the rollback. You must use additive migrations only.
```

**Prompt:** 
```
Data: Additive only. Rollback: On error. Blast: Whole app. Down: None. Flag: Deploy all.
```

**Response:** 
```
Verdict: ALL_OR_NOTHING_DEPLOY. You failed to use feature flags or a canary strategy.
```

**Prompt:** 
```
Data: Additive migration. Rollback: 5xx > 1%. Blast: Search service only. Down: Zero. Flag: Canary 1%. Sim: Fails if redis times out.
```

**Response:** 
```
Verdict: REVERSIBILITY_PROVEN. Architecture is safe to deploy.
```

## Capabilities

### Prove Data Migration Safety
Maps data rollbacks using additive migration patterns to ensure no information is lost when reverting database changes.

### Define Measurable Rollback Triggers
Sets quantifiable thresholds (e.g., latency spikes, error rates) that automatically trigger an abort sequence during a deployment.

### Contain Failure Scope (Canary)
Calculates the potential failure size by restricting rollouts to small user segments before expanding the traffic slowly.

### Estimate Production Downtime
Tests time estimates against production-scale data volumes, preventing unexpected write locks or service outages.

### Implement Kill Switches (Feature Flags)
Verifies that all new features are deployed behind a flag, allowing instant, non-disruptive rollback at any point.

### Simulate Failure Scenarios
Runs simulated post-mortems to confirm the on-call team can recover from failure without manual intervention or waking up the whole staff.

## Use Cases

### Refactoring an old API endpoint
The team needs to update a core payment service. Instead of deploying, the agent runs `validate_reversibility`. The tool forces them to prove they added new fields before retiring old ones and that the rollback path supports legacy clients until migration is complete.

### Deploying a major redesign
The UX team finishes a beautiful dashboard v2. Before deployment, running `validate_reversibility` forces them to wrap it in feature flags and mandate a canary rollout (e.g., 1% of users first). This prevents the entire user base from seeing a slow or buggy interface.

### Fixing a database schema bug
An engineer proposes dropping an old, unused column. The agent immediately flags this as high risk, forcing the use of additive migrations and proving that no current reports rely on that specific data point before allowing the change.

### Preparing for a major traffic increase
The company is launching into a new market. The agent runs `validate_reversibility` to calculate the maximum acceptable blast radius and confirms that auto-rollback criteria are defined as concrete thresholds, so an unforeseen spike triggers a rollback without manual intervention.

## Benefits

- **Data integrity is guaranteed.** The tool forces additive migrations, meaning you can't just `ALTER TABLE users RENAME COLUMN username TO display_name`. You must add the new column first, move readers and writers over to it, and drop the old column only later. This prevents data loss during rollback.
- **Rollback criteria are measurable.** Instead of 'if something goes wrong,' you define metrics like `Error rate > 4% for > 2 minutes`. This removes human guesswork from critical, high-stress decisions at 2 AM.
- **Blast radius is contained.** The tool mandates canary deployment strategies (1% → 10% → 50%). You detect and fix issues when only a tiny fraction of users are affected, not all 50,000 daily users.
- **Downtime estimates are accurate.** It forces you to test timing against production-scale data. Don't trust development database timings; the tool tells you if an index creation will lock your tables for hours.
- **Feature flags become mandatory.** For user-facing changes, it demands a kill switch strategy. This means instant rollback by setting a flag back to 0%, bypassing slow redeployments entirely.

## How It Works

The bottom line is that it forces your AI client to stop making assumptions and start providing engineering proof before touching production code.

1. You feed your AI client the proposed change: a database migration script, deployment plan, or architectural update.
2. The agent runs `validate_reversibility`, which forces the analysis across the six mandatory pivots (data reversibility, blast radius, etc.).
3. It returns a final verdict: either 'REVERSIBILITY_PROVEN' with a green light to proceed, or a failure verdict (e.g., 'ALL_OR_NOTHING_DEPLOY') with the risks found and the mitigations required.
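
The flow above can be sketched as a function over the six pivots. This is a hedged illustration only: the tool's real input schema and decision logic aren't documented here, so `ChangePlan`, its field names, `validate`, and the `MISSING_PIVOTS` verdict are all assumptions. The other three verdict names come from the prompt examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangePlan:
    """Hypothetical structured form of the six pivots."""
    additive_migration: bool
    rollback_criterion: Optional[str]   # e.g. "5xx > 1%"
    blast_radius: Optional[str]         # e.g. "Search service only"
    downtime_estimate: Optional[str]    # e.g. "Zero"
    staged_rollout: bool                # canary or feature flag in place
    failure_simulation: Optional[str]   # e.g. "Fails if redis times out"


def validate(plan: ChangePlan) -> str:
    # Data safety is checked first: lost data can't be rolled back.
    if not plan.additive_migration:
        return "DATA_CORRUPTION_RISK"
    if not plan.staged_rollout:
        return "ALL_OR_NOTHING_DEPLOY"
    required = {
        "rollback_criterion": plan.rollback_criterion,
        "blast_radius": plan.blast_radius,
        "downtime_estimate": plan.downtime_estimate,
        "failure_simulation": plan.failure_simulation,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        # Hypothetical verdict for pivots left blank.
        return "MISSING_PIVOTS: " + ", ".join(missing)
    return "REVERSIBILITY_PROVEN"
```

Fed the three prompt examples, this sketch returns the same verdict names they show, though the real tool also judges the quality of each answer, which a presence check like this doesn't attempt.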

## Frequently Asked Questions

**How does validate_reversibility handle schema changes?**
The tool mandates additive migrations. It won't approve dropping or renaming a column unless the plan proves that data can be read and written in both the old and new formats during the transition period.

**Is validate_reversibility only for database changes?**
No, it covers all architectural risk. It analyzes things like feature flag strategies and blast radius isolation, not just SQL scripts. You feed it the overall plan.

**What if I can't define clear rollback criteria?**
The tool will fail the validation immediately. It forces you to quantify 'something going wrong,' demanding specific metrics like error rates or latency thresholds for an automated abort trigger.

**Does validate_reversibility account for global scale?**
Yes. The blast radius and downtime estimation pivots force consideration of geographical scaling, ensuring failures are contained to small, manageable regions first (canary deployments).

**What input format does validate_reversibility need to provide a comprehensive assessment?**
You must supply details for all six validation pivots: data migration, rollback criteria, blast radius isolation, downtime estimates, feature flag strategy, and post-mortem simulations. Missing any step prevents the tool from giving a complete verdict.

**If validate_reversibility rejects my architecture proposal, how should I interpret the output?**
The verdict points directly to the specific failed pivot (e.g., ALL_OR_NOTHING_DEPLOY). The accompanying text explains which safety rule was violated and what specific mechanism you need to implement.

**Does validate_reversibility only process code changes, or can it handle procedural updates?**
It handles both. While its core is technical, the tool forces your AI agent to map out the entire lifecycle impact—operational procedures, manual steps, and data flow—not just lines of code.

**Is the architectural information I input into validate_reversibility kept secure?**
Yes. Vinkius processes all inputs confidentially within your workspace scope. The tool's purpose is to generate safety reports for internal review; your data remains private.

**Why force a post-mortem simulation?**
Optimism bias. Forcing the AI to explain WHY the deployment failed before it happens exposes edge cases it ignored.

**Why is data corruption the first pivot?**
Code can be rolled back. Data loss is permanent. If data isn't safe, the architecture is invalid.

**What counts as a rollback criterion?**
Measurable SLA violations, like '5xx errors > 1%' or 'Latency > 200ms'.