# Regex Toolkit MCP

> The Regex Toolkit MCP Server forces your AI client to use strict mathematical patterns for data handling. It pulls every unique email, URL, or phone number from a text block into a clean JSON array using `extract_pattern`. You can also run `mask_sensitive_data` to redact PII instantly, or `validate_pattern` to confirm whether user input exactly matches the required format.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** regex, data-redaction, input-validation, pattern-matching, data-sanitization, security

## Description

Listen up, 'cause most general-purpose LLMs are sloppy when it comes to pulling structured data out of a big chunk of text. They guess where an email ends or they hallucinate phone numbers when they summarize something—it's garbage. This Regex Toolkit MCP fixes that mess by forcing every single piece of data through strict mathematical patterns. You never get guesswork here; you get pure, reliable structure.

This server gives your agent three highly specific tools for handling contact details and validation. When you need to pull all the good stuff out of a massive transcript or report, use `extract_pattern`. It pulls every unique email address, URL, or phone number from whatever text block you feed it and spits the results back as a clean, predictable JSON array. If you dump a whole meeting transcript into this, you know exactly what you're gonna get: a structured list of every contact that matches the pattern, with duplicates removed.

If your goal is to make that data public—like for a report or a knowledge base entry—and you gotta redact anything sensitive first, use `mask_sensitive_data`. This tool scans the entire text blob and instantly replaces every piece of PII it finds—emails, phone numbers, and URLs—with standardized `[REDACTED]` tags. It's quick, clean sanitization for public-facing documents.

When you need to confirm that a user input *is* actually in the right format before processing anything else, run `validate_pattern`. You feed it a single string, and this tool checks if that string perfectly matches the mathematical structure required for an email, URL, or phone number. It's a hard pass/fail check; there's no gray area here.

Think about the sheer volume of unstructured text you deal with. Whether you’re processing a batch of customer feedback containing dozens of contacts, or whether you just need to verify if that single piece of data someone typed in is actually valid—you got your answer right here. The `extract_pattern` tool handles entire passages and reliably pulls out all unique contact patterns into a usable JSON list. The `mask_sensitive_data` tool ensures that when you're sharing data, the PII is stripped clean by replacing emails, phones, and URLs with those standard `[REDACTED]` tags. You use `validate_pattern` to make sure an input string adheres perfectly to the mathematical structure of a phone number, URL, or email before your client runs any other process on it.

These tools let you treat data validation like code—it's absolute and precise. Your agent doesn't guess; it executes based on rules. You feed `extract_pattern` that massive text dump, and it spits back a structured JSON array of unique emails, URLs, or phone numbers. When you need to sanitize for public consumption, `mask_sensitive_data` scans the whole thing and replaces all sensitive data with standardized `[REDACTED]` tags. If you're just checking one piece of info—say, an email address passed in a form—you run `validate_pattern`, which confirms if that single input matches the perfect mathematical format for emails, URLs, or phone numbers. This server guarantees that whatever contact pattern extraction, sanitization, or validation task you give it, you're getting machine-enforced accuracy every time.

## Tools

### extract_pattern
Pulls all unique emails, URLs, or phone numbers from a large body of text into a structured JSON array.
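
Here's a rough idea of what that extraction looks like, as a minimal Python sketch. The regular expressions, the `PATTERNS` table, and the `pattern_type` parameter name are assumptions made for illustration; the server's actual patterns and signature aren't documented here.

```python
import json
import re

# Illustrative patterns only (assumed, not the server's real expressions).
PATTERNS = {
    "email": re.compile(
        r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
    ),
    "url": re.compile(r"https?://[^\s<>]+"),
}


def extract_pattern(text: str, pattern_type: str) -> str:
    """Return a JSON array of unique matches, in first-seen order."""
    matches = PATTERNS[pattern_type].findall(text)
    # dict.fromkeys drops duplicates while preserving insertion order.
    unique = list(dict.fromkeys(matches))
    return json.dumps(unique)


sample = "Mail user@gmail.com or admin@company.net; again user@gmail.com."
print(extract_pattern(sample, "email"))
# ["user@gmail.com", "admin@company.net"]
```

Notice the trailing period after the last address doesn't end up in the result, because the pattern only accepts a dot when letters follow it.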

### mask_sensitive_data
Scans and redacts sensitive PII (emails, phones, URLs) in text by replacing them with `[REDACTED]` tags.
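
To picture the redaction step, here's a minimal Python sketch. The three patterns and the function body are assumptions for illustration (the phone pattern, for instance, assumes a leading country code); only the `[REDACTED]` tag comes from the description above.

```python
import re

# Illustrative patterns only (assumed, not the server's real expressions).
URL = r"https?://[^\s<>]+"
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
PHONE = r"\+\d{1,3}[\s-]?\(?\d{2,4}\)?[\s-]?\d{3,5}[\s-]?\d{4}"

# One combined pattern so the text is scanned in a single pass.
SENSITIVE = re.compile("|".join([URL, EMAIL, PHONE]))


def mask_sensitive_data(text: str) -> str:
    """Replace every matched email, URL, or phone number with [REDACTED]."""
    return SENSITIVE.sub("[REDACTED]", text)


print(mask_sensitive_data(
    "Contact help@example.com or call +1 (555) 123-4567 for support."
))
# Contact [REDACTED] or call [REDACTED] for support.
```

Only the matched spans change; the surrounding sentence stays intact, so the document keeps its context.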

### validate_pattern
Checks if a single input string perfectly matches the mathematical format of an email, URL, or phone number.
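
The key difference from extraction is that the whole string has to match, not just a piece of it. A minimal Python sketch of that idea, assuming illustrative patterns and a hypothetical `pattern_type` parameter name:

```python
import re

# Illustrative patterns only (assumed, not the server's real expressions).
VALIDATORS = {
    "email": re.compile(
        r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
    ),
    "url": re.compile(
        r"https?://[A-Za-z0-9.-]+(?::\d+)?(?:/[^\s?#]*)?(?:\?[^\s#]*)?(?:#\S*)?"
    ),
}


def validate_pattern(value: str, pattern_type: str) -> bool:
    """Pass only if the ENTIRE string matches the chosen pattern."""
    return VALIDATORS[pattern_type].fullmatch(value) is not None


print(validate_pattern("https://www.vinkius.com/dashboard?user=123", "url"))
# True
print(validate_pattern("user@gmail", "email"))
# False
```

Using `fullmatch` rather than `search` is what makes this a hard pass/fail check: a valid address buried in extra text still fails.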

## Prompt Examples

**Prompt:** 
```
Extract all email addresses from this massive support ticket transcript.
```

**Response:** 
```
Using the extract_pattern tool (type='email'): I found 3 unique emails: ["user@gmail.com", "admin@company.net", "billing@startup.io"].
```

**Prompt:** 
```
Mask all sensitive phone numbers and emails in this document before I save it to the public database.
```

**Response:** 
```
Using the mask_sensitive_data tool: The sanitized text is: 'Contact [REDACTED] or call [REDACTED] for support.'
```

**Prompt:** 
```
Verify if 'https://www.vinkius.com/dashboard?user=123' is a mathematically valid URL.
```

**Response:** 
```
Using the validate_pattern tool (type='url'): Yes, it is a perfectly valid URL structure.
```

## Capabilities

### Extracting Contact Details
It pulls every unique email, URL, and phone number from any large text passage into a predictable JSON list.

### Sanitizing PII for Public Use
It instantly scans text and replaces all sensitive data (emails, phones, URLs) with standardized `[REDACTED]` tags.

### Verifying Data Structure Integrity
It checks if a single input string perfectly follows the mathematical structure of an email, URL, or phone number format.

### Batch Pattern Extraction
You feed it a massive transcript, and it reliably pulls out all contact patterns without missing any.

## Use Cases

### Processing Support Ticket Logs
A support manager needs to analyze 500 transcripts for every unique user email and URL mentioned. Instead of writing complex Python scripts, the agent calls `extract_pattern` on the transcript text. It returns a clean JSON list of all matching contacts, saving hours of manual data parsing.

### Preparing Data for Public Reporting
You are drafting an annual report that references client interactions but must remain compliant. You pipe the raw text through `mask_sensitive_data`. The tool ensures all emails, phone numbers, and URLs become `[REDACTED]`, making it safe to distribute without losing context.

### API Input Sanitization
Before submitting a user-provided link or contact number to an external CRM via API, you run the input through `validate_pattern`. If the tool returns false, your agent halts execution and prompts for correction, preventing broken records.

### Data Pipeline Cleanup
You receive a large chunk of text from multiple sources. You use `extract_pattern` to pull out all contacts, then pass that JSON array through a validation check (e.g., only keep emails) before passing the clean data set to your database write tool.
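
The "only keep emails" step can be sketched in a few lines of Python. This assumes the extraction result arrives as a JSON array string; the `keep_emails` helper and the email pattern are hypothetical, written just for this example.

```python
import json
import re

# Illustrative email pattern (assumed, not the server's real expression).
EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def keep_emails(extracted_json: str) -> list[str]:
    """Parse a JSON array of contacts and keep only full email matches."""
    contacts = json.loads(extracted_json)
    return [c for c in contacts if EMAIL.fullmatch(c)]


mixed = '["a@b.io", "https://b.io", "+1 555 123 4567"]'
print(keep_emails(mixed))
# ['a@b.io']
```

The filtered list is what you'd hand to your database write tool.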

## Benefits

- Guaranteed Accuracy: Forget LLM guesswork. `extract_pattern` forces the extraction of only mathematically correct emails, URLs, and phone numbers, returning a clean JSON array every time.
- Compliance Ready: Use `mask_sensitive_data` to scrub client data before it hits public reports or non-secure databases. It replaces PII with `[REDACTED]` tags instantly.
- Pre-flight Validation: Never send malformed input downstream again. `validate_pattern` confirms if a single string is structurally perfect against specific formats, blocking errors at the source.
- Structured Output: Instead of wading through paragraphs, `extract_pattern` gives you a clean JSON array containing only the unique patterns found in massive text blocks.
- Privacy Focus: The regex engine runs entirely locally within your infrastructure. Your data never leaves your system boundary.

## How It Works

The bottom line is that you get deterministic results. The tool uses fixed rules, not linguistic inference, so you know exactly what data gets extracted or scrubbed.

1. Feed the text or data point to your AI client, specifying which pattern type you need (e.g., 'Extract all emails').
2. Your agent calls the specific tool—for instance, `extract_pattern` for bulk extraction or `mask_sensitive_data` for redaction.
3. The server executes the regex engine locally and returns a structured output: either a clean JSON array of patterns or the sanitized text.
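
Under the hood, step 2 is a Model Context Protocol `tools/call` request, which is plain JSON-RPC 2.0. Here's a minimal Python sketch of building one. The `text` argument name is an assumption, and `type` is inferred from the `type='email'` shown in the prompt examples; check the tool's published input schema for the real names.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Argument names below are hypothetical placeholders.
request = build_tool_call(
    1,
    "extract_pattern",
    {"text": "Reach me at user@gmail.com", "type": "email"},
)
print(request)
```

Your AI client normally builds this message for you; the sketch just shows what travels over the wire.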

## Frequently Asked Questions

**Why use this instead of asking the AI to find the emails?**
Because LLMs predict text probabilistically. They might miss emails surrounded by stray punctuation (like `contact@company.com.` with a trailing dot) or hallucinate non-existent addresses. Regex applies the same fixed rule every time.

**Does the PII masking send data to the cloud?**
Never. The `mask_sensitive_data` tool runs exclusively on your local JavaScript engine (V8). It acts as a local firewall, ensuring sensitive strings are redacted before any external processing happens.

**What format of phone numbers are supported?**
The regex captures international formats with country codes (e.g., +1, +55), optional parentheses for area codes, and spacing/hyphens commonly used globally.
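
For a feel of what such a pattern might look like, here's a minimal Python sketch. It is an assumption built from the description above, not the server's actual expression; in particular, this version requires the country code and fixes the last group at four digits.

```python
import re

# Illustrative phone pattern (assumed, not the server's real expression).
PHONE = re.compile(
    r"""
    \+\d{1,3}        # country code, e.g. +1 or +55
    [\s-]?           # optional space or hyphen
    \(?\d{2,4}\)?    # area code, parentheses optional
    [\s-]?
    \d{3,5}          # first subscriber group
    [\s-]?
    \d{4}            # last four digits
    """,
    re.VERBOSE,
)

for candidate in ["+1 (555) 123-4567", "+55 11 91234-5678", "12345"]:
    print(candidate, bool(PHONE.fullmatch(candidate)))
# +1 (555) 123-4567 True
# +55 11 91234-5678 True
# 12345 False
```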

**If I use the `validate_pattern` tool on a string that is close but incorrect, what specific error details does it return?**
It returns a pass/fail result along with the pattern type that was checked. If your input doesn't match a standard email, URL, or phone structure, the tool reports `false` and specifies which pattern type failed validation.

**Are there rate limits when I use `extract_pattern` on extremely large documents (e.g., 50MB+ transcripts)?**
No, Vinkius manages the throughput for you. The service is built for high-volume processing and handles massive text inputs without imposing artificial usage limits.

**How does the `mask_sensitive_data` tool ensure that only PII gets replaced?**
It uses defined, strict regex groups to identify patterns. When it finds a match for an email, URL, or phone number, it replaces *only* those specific text segments with `[REDACTED]`.

**If I need to extract data that isn't an email, URL, or phone (like product SKUs), can the Regex Toolkit handle it?**
No. The toolkit is designed exclusively for standard PII patterns: emails, URLs, and phones. For custom formats like SKUs, you must use a different specialized pattern extraction service.

**Do I need to worry about setting up authentication or credentials when connecting the Regex Toolkit via MCP?**
No extra setup is required beyond your standard Vinkius API key. The Model Context Protocol handles the secure connection, letting your AI client access the tools immediately.