# SMS GSM-7 Sanitizer MCP

> SMS GSM-7 Sanitizer cleans draft text by stripping emojis and zero-width characters and transliterating accented letters and other non-ASCII symbols. The output is pure 7-bit ASCII, which can be sent with GSM-7 encoding at 160 characters per segment instead of the 70 allowed once a message falls back to UCS-2. This keeps a stray emoji from splitting your campaign into extra parts and running up unnecessary telecom charges.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** text-encoding, gsm7, sms-optimization, string-sanitization, telecom-billing, unicode-conversion

## Description

You know how it is with marketing copy generated by an AI agent. The thing spits out this fancy-pants text, loaded up with emojis and characters that look cool but absolutely wreck your SMS campaign. Those little flourishes (the zero-width spaces, the hearts, the complex Unicode symbols) push the whole message out of GSM-7 and into UCS-2, which drops the per-segment limit from 160 characters to 70. When that happens, the messaging service doesn't chop the message off; it splits it into multiple segments and bills you for each one, which adds up fast across a campaign.

That's where **`sanitize_gsm7`** comes in. It's not some vague suggestion box tool; it's a dedicated sanitizer that runs aggressive cleaning straight through your draft text. You feed it whatever mess you've got, whether it's riddled with accent marks, fancy quotes, or obscure Unicode symbols, and what you get back is clean 7-bit ASCII copy. ASCII text can be sent with GSM-7 encoding, so a message of up to 160 characters fits into a single segment.

The tool works by systematically identifying non-compliant characters. First up, it strips all emojis and every zero-width character from your text payload. No exceptions: any symbol without an ASCII equivalent gets wiped out. This step alone eliminates the visual clutter and the encoding risk posed by modern Unicode symbols, leaving plain, readable text.
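
To see which characters in a draft this step would target, a small detection pass can help. The sketch below is an illustration only, not the tool's own code; `find_non_ascii` is a hypothetical helper built on Python's standard `unicodedata` module.

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# A draft hiding a zero-width space and carrying a rocket emoji.
draft = "Hi\u200bthere 🚀"
for index, ch, name in find_non_ascii(draft):
    print(index, f"U+{ord(ch):04X}", name)
# 2 U+200B ZERO WIDTH SPACE
# 9 U+1F680 ROCKET
```

Listing the offenders by name makes invisible characters such as zero-width spaces easy to spot before anything is removed.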

Next, it handles accents. If your source text uses characters like é, ü, or ñ, the tool doesn't fail and doesn't drop the letter; it transliterates it. Every accented letter is converted into its basic 7-bit ASCII equivalent (for example, 'é' becomes 'e'). This keeps the text compatible with older, stricter SMS systems that don't handle diacritics reliably.
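
As a rough illustration of accent transliteration, the sketch below decomposes each letter with Unicode NFKD normalization and drops the combining marks. This is a standard-library stand-in chosen for the example, not necessarily what the tool does internally, and `transliterate_accents` is a hypothetical name.

```python
import unicodedata


def transliterate_accents(text: str) -> str:
    """Replace accented letters with their base letters."""
    # NFKD splits 'é' into 'e' followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # combining() is non-zero for accent marks, so this keeps only base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(transliterate_accents("résumé in São Paulo"))  # resume in Sao Paulo
```

Letters with no Unicode decomposition, such as 'ø' or 'ß', pass through this sketch unchanged; a lookup-table library is usually needed to cover those.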

This whole mechanism guarantees the output is strictly 7-bit ASCII. Because of this filtering and conversion, the message can be encoded as GSM-7, so the single-segment limit is 160 characters rather than 70. Sanitizing does not shorten the text, so a draft longer than 160 characters will still be split. What you avoid is paying for extra segments just because some agent got too creative with a rocket emoji.

The tool processes content from various sources; it doesn't care if your copy came from an LLM draft or a legacy system. It simply cleans the payload until nothing but plain ASCII is left. You write the message, you run it through `sanitize_gsm7`, and you get back text that's ready to send without an emoji quietly multiplying your segment count. It's simple, direct cleanup for predictable delivery and billing.

## Tools

### sanitize_gsm7
Strips emojis and complex Unicode from draft text, converting it to pure 7-bit ASCII. The result can be sent with GSM-7 encoding, which allows up to 160 characters in a single SMS segment.

## Prompt Examples

**Prompt:** 
```
Sanitize this SMS draft: `Hey John! 🚀 Your order is on the way. The "coffee" is hot!`
```

**Response:** 
```
SMS GSM-7 Sanitization: `Hey John! Your order is on the way. The "coffee" is hot!`
```

**Prompt:** 
```
Strip all Unicode from this alert message before I send it to Twilio.
```

**Response:** 
```
SMS GSM-7 Sanitization: Message is safe to send.
```

**Prompt:** 
```
Make sure this marketing broadcast is strictly 7-bit ASCII.
```

**Response:** 
```
SMS GSM-7 Sanitization: Sanitization applied.
```

## Capabilities

### Strip Emojis and Special Characters
The tool removes all emojis and zero-width characters from the text, leaving only plain, readable ASCII characters.

### Convert Complex Accents
It converts accented letters (like é or ü) into their basic 7-bit English equivalents, ensuring compatibility with older SMS systems.

### Guarantee GSM-7 Encoding
The output text is guaranteed to be 7-bit ASCII, which can be sent with GSM-7 encoding and so qualifies for the 160-character single-segment limit.

## Use Cases

### The AI Draft Went Too Fancy
An agent drafts a campaign: 'Check out our new ☕️ feature! Don't forget to call us today!' The emoji forces the whole message into UCS-2, where a segment holds only 70 characters, so any longer variant of this copy gets split and billed as multiple parts. Instead, the developer runs the draft through `sanitize_gsm7`, which cleans it up to: 'Check out our new feature! Don't forget to call us today!', and the message goes out GSM-7 encoded with the full 160-character segment available.

### Handling International Copy
A marketing team has copy with many accented characters (e.g., 'résumé', 'São Paulo'). Sent raw, a letter like 'ã' falls outside the GSM-7 alphabet and pushes the whole message into UCS-2, and some carriers may mangle the text. They use `sanitize_gsm7` to reduce these letters to plain ASCII, improving delivery consistency across different regional SMS gateways.

### Cleaning Up Raw User Input
A system accepts user feedback via chat and needs to send a summarized alert SMS. The raw input includes emojis and complex symbols. Running the payload through `sanitize_gsm7` cleans up the data first, ensuring that the resulting automated summary is clean and compliant for immediate broadcast.

### Validating AI Output Before Publish
Before publishing a high-volume alert, developers pipe every draft message through `sanitize_gsm7`. This acts as a mandatory final gate check. If the tool returns a clean string, they know it's safe for mass distribution and won't trigger billing overruns.
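
A gate check along these lines might look like the sketch below. It is a hypothetical helper (`single_segment_problems` is not part of the tool) and makes two assumptions worth stating: a handful of ASCII characters such as `[` and `~` sit in the GSM-7 extension table and cost two septets each, and the backtick is ASCII but has no GSM-7 code at all.

```python
# ASCII characters encoded through the GSM-7 extension table (two septets each).
GSM7_TWO_SEPTET = set("^{}\\[~]|")


def single_segment_problems(message: str) -> list[str]:
    """Return reasons the message may not fit one GSM-7 segment (empty if none)."""
    problems = []
    non_ascii = sorted({ch for ch in message if ord(ch) > 127})
    if non_ascii:
        problems.append(f"non-ASCII characters: {non_ascii}")
    if "`" in message:
        problems.append("backtick is ASCII but not in the GSM-7 alphabet")
    # Count septets rather than characters so extension characters weigh double.
    septets = sum(2 if ch in GSM7_TWO_SEPTET else 1 for ch in message)
    if septets > 160:
        problems.append(f"{septets} septets exceeds the 160-septet single-segment limit")
    return problems


for draft in ("Your code is 4821.", "Launch day 🚀", "x" * 161):
    issues = single_segment_problems(draft)
    print("OK" if not issues else issues)
```

Returning a list of reasons rather than a bare boolean lets the pipeline log exactly why a draft was held back.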

## Benefits

- Stops unexpected billing charges. Since the `sanitize_gsm7` tool enforces pure 7-bit ASCII, your message is billed at the GSM-7 rate of 160 characters per segment rather than the 70-character UCS-2 rate. No more surprise fees caused by a stray emoji.
- Works with any AI client. Whether your agent is running in Claude Desktop or VS Code Copilot, connecting this MCP ensures that the text data passed for SMS always meets GSM-7 compliance standards.
- Saves time on debugging failures. Don't waste hours figuring out why a message was cut off mid-sentence. Run it through `sanitize_gsm7` first to validate payload integrity before sending.
- Handles complex inputs reliably. The tool aggressively strips emojis and zero-width characters, meaning you don't have to worry about the LLM adding flair that breaks your campaign copy.
- Universal compatibility. It doesn't matter if your source text has smart quotes (`”`) or fancy accents; `sanitize_gsm7` converts it all into basic ASCII, making the message universally compatible.

## How It Works

The bottom line: you get back a single ASCII text payload that any SMS API can send with GSM-7 encoding, so a stray symbol never pushes you into the pricier 70-character UCS-2 segments.

1. Pass your draft message (the raw, unsanitized text) into the `sanitize_gsm7` tool.
2. The engine strips all emojis and transliterates complex Unicode characters to pure 7-bit ASCII.
3. You receive the clean, compliant string, ready for immediate sending via SMS APIs.
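
The three steps above can be approximated in a few lines of standard-library Python. This is a sketch under stated assumptions, not the tool's actual implementation: `sanitize_sketch` and `PUNCTUATION_MAP` are hypothetical names, the smart-quote table is a minimal sample, and collapsing the double space left behind by a removed emoji is an assumption made to match the sample responses in this document.

```python
import re
import unicodedata

# Minimal sample mapping of typographic punctuation to ASCII (assumed, not exhaustive).
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def sanitize_sketch(text: str) -> str:
    """Transliterate what has an ASCII equivalent, then drop everything else."""
    text = text.translate(PUNCTUATION_MAP)
    # Transliterate before stripping; stripping first would delete accented letters.
    text = unicodedata.normalize("NFKD", text)
    # Drops combining marks, emojis, zero-width characters, and other non-ASCII.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse the gap left where a symbol was removed between two spaces.
    return re.sub(r" {2,}", " ", text).strip()


print(sanitize_sketch('Hey John! 🚀 Your order is on the way. The "coffee" is hot!'))
# Hey John! Your order is on the way. The "coffee" is hot!
```

Order matters here: running the transliteration step first preserves the base letter of 'é' or 'ñ', while the final ASCII filter removes only what has no equivalent.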

## Frequently Asked Questions

**Does SMS GSM-7 Sanitizer strip out useful emojis?**
Yes, it strips them. That's how it works. Emojis are outside the GSM-7 alphabet and force the message into UCS-2 encoding; stripping them is necessary to keep the 160-character single-segment limit.

**How do I use sanitize_gsm7 in a Python agent?**
Call the `sanitize_gsm7` tool from your MCP-enabled agent with the draft text as its argument. The tool returns the cleaned string, which you can then send via any messaging API.

**What if my message is already clean? Does sanitize_gsm7 break it?**
No. If your text is already pure 7-bit ASCII and contains no complex Unicode, the tool passes the content through unchanged. It only modifies what needs fixing.

**Is this better than using a basic regex clean-up?**
Yes. A simple regex won't handle transliteration (like converting `é` to `e`) or complex Unicode symbols. This tool handles the full spectrum of encoding compliance.

**When I run sanitize_gsm7 with accented characters like é or ñ, what happens to them?**
It transliterates them. The tool doesn't simply delete accented letters; it maps each one to its closest 7-bit ASCII equivalent, so 'é' becomes 'e' and 'ñ' becomes 'n'. Your message stays readable while remaining GSM-7 compatible.

**If I need to send thousands of messages, does sanitize_gsm7 handle rate limiting or batch processing?**
The server is built for high throughput and continuous message flow. It's designed to sanitize large volumes efficiently, so every message in a big batch gets the same treatment, but you should always check the Vinkius Marketplace documentation for current usage quotas.

**When I pass sensitive text to sanitize_gsm7, how is my input data handled?**
Data processing occurs via the MCP standard. Your AI client sends the text directly for sanitization. The provider and Vinkius adhere to strict protocols; the service only reads and modifies the string as needed for GSM-7 compliance.

**What happens if my SMS draft contains highly corrupted or unknown Unicode characters?**
The tool is designed to be robust. It uses aggressive regular expressions that detect unsupported sequences and remove them entirely instead of failing. This means you always receive a clean, sendable string.

**Why does an emoji cost more?**
Emojis force the telecom provider to use UCS-2 encoding, which cuts the character limit from 160 down to 70 per message segment.
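
The arithmetic can be sketched as follows. This is an approximation for illustration (`sms_segments` is a hypothetical helper): it uses the default GSM-7 alphabet plus its extension table, counts extension characters as two septets, and applies the usual concatenated-message sizes of 153 characters per GSM-7 part and 67 per UCS-2 part, which are smaller because each part carries a header. It ignores edge cases such as national language shift tables.

```python
import math

# Default GSM-7 alphabet (GSM 03.38 basic character set).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets.
GSM7_EXTENSION = set("^{}\\[~]|€")


def sms_segments(text: str) -> tuple[str, int]:
    """Return the encoding a message needs and how many segments it occupies."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENSION for ch in text):
        septets = sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)
        return "GSM-7", 1 if septets <= 160 else math.ceil(septets / 153)
    # UCS-2 counts 16-bit units, so an emoji outside the BMP takes two.
    units = len(text.encode("utf-16-be")) // 2
    return "UCS-2", 1 if units <= 70 else math.ceil(units / 67)


plain = "a" * 100
print(sms_segments(plain))         # ('GSM-7', 1)
print(sms_segments(plain + "🚀"))  # ('UCS-2', 2)
```

In this example, adding a single emoji to a 100-character message doubles the number of billed segments.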

**What happens to accented letters?**
Using `unidecode` transliteration, letters like 'ç' become 'c' and 'ã' become 'a', ensuring strict ASCII compatibility.

**Is this needed for WhatsApp?**
No, WhatsApp handles Unicode natively. This is specifically for SMS sent through providers like Twilio and Amazon SNS.