# CPF/CNPJ Batch Processor MCP

> CPF/CNPJ Batch Processor handles large lists of Brazilian identification numbers. It checks each record against the official validation rules, flags duplicates, applies the standard display masks for human readers, and strips all punctuation to leave pure digits for database entry.

## Overview
- **Category:** utility
- **Price:** Free
- **Tags:** brazil, cpf, cnpj, batch, formatter, validator

## Description

Working with Brazilian client data means dealing with messy identifiers. Some IDs in a list are clean, some have dots, some are a jumble of characters, and some are duplicates. This MCP processes entire batches through three steps. First, it strips away all punctuation and non-digit characters, giving you raw digits ready for a database field. Next, if you need to show the IDs to a person, it applies the standard formatting masks (dots, dashes, and the CNPJ slash). Finally, it verifies everything in the batch against the official rules, telling you exactly which numbers are valid and whether any are duplicated. You don't have to copy, paste, and manually scrub hundreds of records; just feed the list into your agent via Vinkius, and it handles the cleanup.

## Tools

### clean_batch
This tool strips dots, dashes, slashes, and any other non-numeric characters from a group of identifiers.
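
To illustrate the kind of transformation described here, the following is a minimal local sketch in Python. It is not the tool's actual implementation; the function name `clean_batch_sketch` is hypothetical, and the sketch assumes that "non-numeric" means anything other than the ASCII digits 0 to 9.

```python
import re


def clean_batch_sketch(identifiers):
    """Return each identifier with every non-digit character removed.

    Hypothetical helper for illustration only; the real tool may behave
    differently on edge cases such as empty strings.
    """
    return [re.sub(r"[^0-9]", "", str(item)) for item in identifiers]


print(clean_batch_sketch(["123.456.789-01", "00.000.000/0001-91"]))
# ['12345678901', '00000000000191']
```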

### format_batch
This tool applies the standard masks (`XXX.XXX.XXX-XX` for CPF, `XX.XXX.XXX/XXXX-XX` for CNPJ) to a batch of raw numbers so they are readable.
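
The masking step can be sketched locally as follows. This is an illustration under assumptions, not the tool's code: the name `format_batch_sketch` is hypothetical, the input is assumed to be digits only, and the choice to return unrecognized lengths unchanged is a guess about reasonable behavior.

```python
def format_identifier_sketch(digits):
    """Apply the CPF or CNPJ display mask based on the number of digits."""
    if len(digits) == 11:  # CPF mask: XXX.XXX.XXX-XX
        return f"{digits[:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9:]}"
    if len(digits) == 14:  # CNPJ mask: XX.XXX.XXX/XXXX-XX
        return (
            f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}"
            f"/{digits[8:12]}-{digits[12:]}"
        )
    return digits  # assumption: other lengths are passed through untouched


def format_batch_sketch(identifiers):
    """Hypothetical batch wrapper around format_identifier_sketch."""
    return [format_identifier_sketch(item) for item in identifiers]


print(format_batch_sketch(["12345678901", "00000000000191"]))
# ['123.456.789-01', '00.000.000/0001-91']
```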

### validate_batch
This tool checks a list of identifiers against official rules, telling you if they are valid and detecting duplicates.
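
For numeric CPF and CNPJ values, the validity rule is a pair of modulo-11 check digits. The sketch below shows that calculation for reference. It is a local illustration, not the tool's implementation: all function names and the shape of the result are hypothetical, rejecting repeated-digit sequences such as `11111111111` is a common convention assumed here, and duplicate detection is left out.

```python
import re

CNPJ_WEIGHTS_1 = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
CNPJ_WEIGHTS_2 = [6] + CNPJ_WEIGHTS_1


def _check_digit(digits, weights):
    """Weighted sum modulo 11; remainders 0 and 1 map to check digit 0."""
    total = sum(int(d) * w for d, w in zip(digits, weights))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(digits):
    if len(digits) != 11 or digits == digits[0] * 11:
        return False  # wrong length or a repeated-digit sequence
    first = _check_digit(digits[:9], range(10, 1, -1))    # weights 10..2
    second = _check_digit(digits[:10], range(11, 1, -1))  # weights 11..2
    return digits[9:] == f"{first}{second}"


def is_valid_cnpj(digits):
    if len(digits) != 14 or digits == digits[0] * 14:
        return False
    first = _check_digit(digits[:12], CNPJ_WEIGHTS_1)
    second = _check_digit(digits[:13], CNPJ_WEIGHTS_2)
    return digits[12:] == f"{first}{second}"


def validate_batch_sketch(identifiers):
    """Hypothetical report: one dict per input with a 'valid' flag."""
    results = []
    for raw in identifiers:
        digits = re.sub(r"[^0-9]", "", str(raw))  # ignore punctuation
        valid = is_valid_cpf(digits) or is_valid_cnpj(digits)
        results.append({"input": raw, "valid": valid})
    return results


print(validate_batch_sketch(["123.456.789-01", "00.000.000/0001-91"]))
# [{'input': '123.456.789-01', 'valid': False},
#  {'input': '00.000.000/0001-91', 'valid': True}]
```

Anything that is neither 11 nor 14 digits long after cleaning fails both checks, which matches the idea that non-CPF/CNPJ inputs are reported as invalid.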

## Prompt Examples

**Prompt:** 
```
Check if these numbers are valid: ['123.456.789-01', '00.000.000/0001-91']
```

**Response:** 
```
The validation results are: 123.456.789-01 is invalid, and 00.000.000/0001-91 is valid.
```

**Prompt:** 
```
Format these raw numbers: ['12345678901', '00000000000191']
```

**Response:** 
```
The formatted numbers are: 123.456.789-01 and 00.000.000/0001-91.
```

**Prompt:** 
```
Clean this list: ['123.456.789-01', '00.000.000/0001-91']
```

**Response:** 
```
The cleaned identifiers are: 12345678901 and 00000000000191.
```

## Capabilities

### Check ID validity
It verifies a group of CPF or CNPJ numbers to confirm that they are valid under the official rules and to detect duplicates.

### Format identifiers for display
It applies standard masking (dots and dashes) to raw ID numbers, making them easy for people to read.

### Strip non-numeric characters
It removes all punctuation (dots, dashes, slashes) from a batch of identifiers, leaving only pure digits for systems.

## Use Cases

### Migrating old client records
A company needs to move 10,000 archived customer files into a new system. The IDs are stored in varied formats: some have dots, some don't. Instead of writing custom regex code, your agent can process the whole list using `clean_batch` and then load the pure digits into the database.

### Onboarding a new department
A branch office is receiving new client lists from various departments. The IDs are inconsistent and some might be invalid. You feed the list to your agent, which runs `validate_batch` and flags every bad ID so the team knows exactly what needs manual correction.

### Creating a public-facing report
You need to show a client's IDs in a summary document for review. The data is currently pure digits. You run `format_batch` on the clean list, and the output is ready to print as-is.

### Auditing internal records
You suspect that over time, the same client ID might have been entered twice by different staff members. You run `validate_batch` on your entire database backup to check for duplicates and ensure data integrity.

## Benefits

- Stop manual scrubbing. By using `clean_batch`, you strip away all non-numeric characters from entire lists in one go, preparing the identifiers for database ingestion.
- Keep your users happy with readable data. The `format_batch` tool automatically applies the standard dot and dash masks, so staff don't have to guess where one group of digits ends and the next begins.
- Avoid compliance headaches. Running `validate_batch` confirms that every ID in your batch is valid under the official rules, saving you from bad client records.
- Speed up migration. You can process thousands of IDs at once, making data transfers simple and fast without writing complex scripts.
- Consistency across the stack. Whether an ID is read by a human or written to a database, it always follows the correct structural rules.

## How It Works

The bottom line is: you start with messy data, run it through structured processing steps, and finish with clean, verifiable results.

1. Give the MCP your raw list of Brazilian IDs. The system first runs it through the cleaning stage to strip away all punctuation and other non-digit characters.
2. Next, you can send that cleaned data through the formatting tool. This step applies standard masks so the numbers look correct for a user interface.
3. Finally, the output is passed to the validation checker. You get back a report detailing which IDs are valid and if any duplicates were found.

## Frequently Asked Questions

**How do I clean a batch of CPF numbers using `clean_batch`?**
`clean_batch` strips all non-numeric characters from your list, leaving only the pure digits. This is what you use when loading data into a database that doesn't accept formatting.

**Can I validate CNPJ numbers in bulk with `validate_batch`?**
Yes, `validate_batch` handles both CPF and CNPJ identifiers. You give it the list, and it returns a clear report on which are valid and whether there are duplicates.

**What is the difference between `clean_batch` and `format_batch`?**
`clean_batch` removes structure (it strips the dots and dashes). `format_batch` adds structure back (it inserts the dots and dashes) to make the data human-readable.

**Do I need to run all three tools?**
Not always. If your source data is already clean and formatted, you might only need `validate_batch`. But for maximum safety, running them in sequence is best practice.

**If I use `validate_batch` with identifiers that aren't CPF or CNPJ, what does it do?**
The function returns validation failures for those inputs. It strictly checks against known Brazilian ID formats and will flag any non-conforming numbers immediately.

**Are there rate limits when running `clean_batch` on a very large dataset?**
Yes, high volume usage is subject to standard API rate controls. For massive datasets, we recommend breaking your input into smaller, manageable batches.
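
One way to break a large input into smaller batches is sketched below. The helper name `chunked` and the batch size of 500 are illustrative assumptions; the actual rate limits and any recommended batch size are not stated here.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical usage: split 1,200 IDs into batches of 500.
all_ids = [f"{n:011d}" for n in range(1200)]
batches = list(chunked(all_ids, 500))
print([len(batch) for batch in batches])
# [500, 500, 200]
```

Each batch can then be passed to the agent separately, with whatever pause between requests the rate controls require.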

**What specific mask does `format_batch` apply when formatting CPF numbers?**
It applies the standard `XXX.XXX.XXX-XX` mask, so `12345678901` becomes `123.456.789-01`. This ensures that every number you receive is instantly readable by a human.

**How does `validate_batch` report if there are duplicate IDs in my list?**
The response object explicitly flags any identifier found more than once. You get immediate confirmation of duplicates right alongside the validation status.
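
The exact shape of the response object is not shown here. As a rough local equivalent, duplicates can be found by normalizing each identifier to digits and counting occurrences. The function name `find_duplicates_sketch` and the returned dictionary are hypothetical, and the sketch assumes that two entries count as duplicates when they match after punctuation is removed.

```python
import re
from collections import Counter


def find_duplicates_sketch(identifiers):
    """Map each digits-only identifier seen more than once to its count."""
    normalized = [re.sub(r"[^0-9]", "", str(item)) for item in identifiers]
    counts = Counter(normalized)
    return {digits: n for digits, n in counts.items() if n > 1}


print(find_duplicates_sketch(
    ["123.456.789-09", "12345678909", "00.000.000/0001-91"]
))
# {'12345678909': 2}
```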