# Public Suffix Extractor MCP

> Public Suffix Extractor uses the official Mozilla Public Suffix List to parse any URL accurately. It reliably separates hostnames into their true root domain, TLD, and subdomain. Stop guessing if a domain is `.uk` or `.co.uk`. This tool handles complex structures like `.com.br`, AWS cloud suffixes, and thousands of other global domains with 100% accuracy.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** url-parsing, tld-extraction, domain-analysis, data-normalization, browser-standards

## Description

Listen, if your AI agent is dealing with web analytics or any kind of data normalization that involves URLs, you know how messy it gets. You can't just split a hostname by dots and expect to get clean data. That simple trick fails every time you run into a multi-part suffix under a country code TLD (ccTLD), like `.co.uk`, because the suffix isn't a single segment; it's two labels that need treating as one unit.
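
To see why the dot-split trick breaks, here's a minimal sketch. The helper name `naive_root` is made up for illustration and isn't part of this server.

```python
def naive_root(hostname: str) -> str:
    """Naive approach: assume the last two labels are the root domain."""
    return ".".join(hostname.split(".")[-2:])


print(naive_root("www.example.com"))    # example.com (happens to be right)
print(naive_root("app.vinkius.co.uk"))  # co.uk (wrong: that's the suffix, not the root)
```

The second call returns the suffix itself, so every `.co.uk` site would collapse into one bucket in your data.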

That's where `extract_domain` comes in. This tool uses the official Mozilla Public Suffix List logic, which is what major browsers use. It takes any full hostname you throw at it and figures out exactly how it's structured. You don't have to guess if a domain belongs under `.uk`, or if it’s something more complex like `.com.br`. This thing handles that with one hundred percent accuracy.

The function breaks down the entire process into three distinct, reliable parts: deconstructing hostnames, identifying complicated TLD structures, and correctly processing cloud suffixes.

When you run `extract_domain` on a hostname, it doesn't just give you a mess of strings. It reliably separates the components using PSL logic. First, it breaks down any full hostname into its constituent parts: the true domain root, the TLD, and the subdomain. You get those three pieces separated cleanly.

Think about multi-part country domains. The tool handles them flawlessly. If you feed it a URL that uses `.com.br` or `.co.uk`, you don't run into parsing errors in your data pipeline. It accurately determines these complex, multi-segment TLDs, which is huge when you’re dealing with global datasets.

It also handles cloud provider suffixes correctly. If a hostname uses an AWS suffix or some other cloud platform suffix, this tool identifies that suffix as a proper PSL entry instead of assuming it's part of your main domain root, and that difference is everything for clean data.

Using `extract_domain` lets your agent handle every structure you can imagine, from the simplest domains to complicated regional or cloud setups, no matter how deep the naming convention goes. It always returns the correct segmentation.

If you pass it a full hostname, you'll get three specific outputs: 

*   **The Domain:** This is the true registrable root domain (for example, if the input was `app.vinkius.co.uk`, this output gives you `vinkius.co.uk`).
*   **The TLD:** This is the full public suffix, which can span more than one label (for `app.vinkius.co.uk`, that's `co.uk`, not just `uk`).
*   **The Subdomain:** This captures any host information that came before the main domain root (like getting `app` from `app.vinkius.co.uk`).
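
If you're curious how that three-way split can work, here's a rough sketch of longest-suffix matching. It's an illustration under assumptions, not this server's actual implementation: `SAMPLE_SUFFIXES`, `Parts`, and `split_hostname` are hypothetical names, the suffix set is a tiny hand-picked subset, and the real PSL's wildcard and exception rules are left out.

```python
from typing import NamedTuple

# Tiny hand-picked subset for illustration only. The real PSL has thousands
# of rules, including wildcard and exception rules this sketch ignores.
SAMPLE_SUFFIXES = frozenset(
    {"com", "uk", "co.uk", "br", "com.br", "s3.amazonaws.com"}
)


class Parts(NamedTuple):
    domain: str     # registrable domain, e.g. "vinkius.co.uk"
    tld: str        # matched public suffix, e.g. "co.uk"
    subdomain: str  # everything left of the domain, e.g. "app"


def split_hostname(hostname: str, suffixes=SAMPLE_SUFFIXES) -> Parts:
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest; first hit wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                # The hostname is itself a public suffix: nothing registrable.
                return Parts("", candidate, "")
            domain = ".".join(labels[i - 1:])
            subdomain = ".".join(labels[:i - 1])
            return Parts(domain, candidate, subdomain)
    # No listed suffix matched: treat the last label as the suffix.
    domain = ".".join(labels[-2:]) if len(labels) > 1 else ""
    return Parts(domain, labels[-1], ".".join(labels[:-2]))


print(split_hostname("app.vinkius.co.uk"))
print(split_hostname("mybucket.s3.amazonaws.com"))
```

Checking the longest candidate first is what keeps `co.uk` from being cut down to `uk`, and it's also why a bucket name under `s3.amazonaws.com` ends up as part of the domain rather than as a subdomain.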

This mechanism ensures your agent doesn't fail when it hits tricky corner cases in real-world data. You just throw a URL at it, and you get structured, clean components back every single time. It’s the definitive way to parse domains because it uses industry-standard rules, not guesswork.

## Tools

### extract_domain
Takes any hostname and returns the structured Domain, TLD, and Subdomain using the official Mozilla Public Suffix List logic.

## Prompt Examples

**Prompt:** 
```
What is the root domain of app.vinkius.co.uk?
```

**Response:** 
```
Domain: vinkius.co.uk | TLD: co.uk | Subdomain: app
```

**Prompt:** 
```
Extract the registrable domain from https://shop.example.com.br/products?id=1
```

**Response:** 
```
Domain: example.com.br | TLD: com.br | Subdomain: shop
```

**Prompt:** 
```
Is mybucket.s3.amazonaws.com a registrable domain or a cloud subdomain?
```

**Response:** 
```
Domain: mybucket.s3.amazonaws.com | TLD: s3.amazonaws.com (PSL-listed cloud suffix)
```

## Capabilities

### Deconstruct Hostnames
The tool breaks down any full hostname into its constituent parts: the true domain root, the TLD, and the subdomain.

### Identify Complex TLDs
It accurately determines multi-part country code domains (like .com.br or .co.uk), solving common parsing errors in data pipelines.

### Process Cloud Suffixes
The tool correctly handles cloud provider suffixes, identifying them as specific PSL entries rather than assuming they are part of the main domain root.

## Use Cases

### Normalizing Web Crawl Data
A web analytics developer gets a list of URLs scraped from various regions (e.g., `shop.example.com.br`, `site.test.co.uk`). They ask their agent to run `extract_domain` on the batch. The tool consistently returns the correct, registrable domains (`example.com.br`, `test.co.uk`), allowing the developer's pipeline to group traffic correctly without messy manual cleaning.
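
Once you have registrable domains back, the grouping step is simple. Here's a small sketch; `group_by_domain` is a hypothetical helper, and `CANNED_RESULTS` is a made-up lookup standing in for real `extract_domain` responses.

```python
from collections import defaultdict

# Canned answers standing in for real extract_domain results (hypothetical).
CANNED_RESULTS = {
    "shop.example.com.br": "example.com.br",
    "blog.example.com.br": "example.com.br",
    "site.test.co.uk": "test.co.uk",
}


def group_by_domain(hostnames, extract):
    """Bucket hostnames under whatever registrable domain `extract` returns."""
    groups = defaultdict(list)
    for host in hostnames:
        groups[extract(host)].append(host)
    return dict(groups)


groups = group_by_domain(CANNED_RESULTS, CANNED_RESULTS.get)
print(groups)
```

In a real pipeline you'd swap the lookup for a function that calls the tool and reads the Domain field from its response.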

### Validating Cloud Endpoints
A backend engineer is building a system that accepts storage bucket endpoints (`mybucket.s3.amazonaws.com`). They need to know if the full string is valid or what its true components are. The agent uses `extract_domain` to classify and correctly report the TLD as an AWS cloud suffix, preventing routing errors.

### Building a Domain Hierarchy
A data scientist needs to categorize websites based on their root domain (e.g., distinguishing between `google.com` and its subdomain `mail.google.com`). They feed the agent mixed URLs, and `extract_domain` reliably separates the true root (`google.com`) from the subdomain (`mail`), enabling proper data classification.

### Parsing Redirect Chains
A system tracks user redirects that often involve complex paths like `staging.sub-company.globaldomain.net/page`. The agent uses `extract_domain` to extract the clean, registrable domain (`globaldomain.net`) and its TLD, ignoring temporary or staging subdomains for accurate reporting.

## Benefits

- Stop guessing the TLD. The `extract_domain` tool uses the official Mozilla Public Suffix List, ensuring you never confuse `.uk` with `.co.uk` again. This is critical for accurate data normalization across all regions.
- Handle complex domains effortlessly. Whether it's a cloud service like an AWS bucket or a regional site like `example.com.br`, the tool correctly separates the root domain and TLD components.
- It works on any hostname. You don't need to write custom regex for every possible country code or sub-domain structure. Just pass the URL, and it gets parsed into clean parts.
- Accuracy over assumptions. Instead of relying on simple string splits that fail instantly with multi-part TLDs, this server processes hostnames using proven browser standards.
- Predictable data output. Your agent always receives a consistent object structure: Domain, TLD, and Subdomain. This makes writing downstream processing code much simpler.

## How It Works

The bottom line is: you get reliable, standardized parsing of any URL, regardless of how complex its domain structure is.

1. Pass a full hostname or URL string to the `extract_domain` tool.
2. The server applies the Mozilla Public Suffix List rules to work out where the public suffix ends and the registrable domain begins.
3. It returns a structured object containing the Domain, TLD, and Subdomain components.
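
For step 1, MCP clients invoke tools with a JSON-RPC `tools/call` request. The sketch below builds such a payload by hand. The argument key `hostname` is an assumption (this page doesn't document the input schema, so check what the server reports via `tools/list`), and `build_tool_call` is a hypothetical helper.

```python
import json


def build_tool_call(hostname: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request that calls the extract_domain tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "extract_domain",
            # Assumed argument name; the real schema may differ.
            "arguments": {"hostname": hostname},
        },
    }
    return json.dumps(payload)


print(build_tool_call("app.vinkius.co.uk"))
```

Most of the time your MCP client library builds this for you; the sketch just shows what goes over the wire.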

## Frequently Asked Questions

**How does Public Suffix Extractor handle .com.br?**
It correctly identifies `example.com.br` as the registrable domain. It separates the TLD accurately, recognizing that `.com.br` is a single public suffix, not just `.br`.

**Is Public Suffix Extractor better than regex for domains?**
Yes. Regex struggles with the sheer complexity and variability of global TLDs. This server uses the actual Mozilla Public Suffix List logic, which is a much more robust standard.

**What if my domain has an AWS cloud suffix?**
The tool handles this. It will correctly identify components like `mybucket.s3.amazonaws.com` and classify the TLD portion according to PSL standards, preventing misclassification.

**Does Public Suffix Extractor only work on full URLs?**
No. You don't need a full URL. You can pass just the hostname (e.g., `app.vinkius.co.uk`), and it will still return the structured domain components.

**How does the `extract_domain` function handle bare hostnames instead of full URLs?**
It processes bare hostnames just fine. You don't need a protocol or path to use it.
The tool focuses only on the hostname provided, correctly identifying the root domain and TLD without needing the surrounding URL structure.

**Are there rate limits when calling `extract_domain`?**
Vinkius manages typical usage rates. If you exceed the defined limit for a given time period, your AI client will receive an appropriate HTTP 429 error.
You'll need to implement back-off logic in your workflow to respect these limits.
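
Here's one way back-off logic could look: exponential delays with a bit of jitter. Everything named here is an assumption for illustration. `RateLimited` stands for whatever error your client raises on a 429, and `call_with_backoff` is a hypothetical wrapper.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a client wrapper might raise on HTTP 429."""


def call_with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Exponential delay plus jitter so retries don't arrive in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so you can test the retry path without actually waiting.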

**Does `extract_domain` use the most current Public Suffix List data?**
Yes, it uses the latest version of Mozilla’s PSL.
We integrate updates regularly, ensuring that domain parsing remains accurate even when new TLDs or cloud suffixes emerge.

**Can I process a large list of domains with `extract_domain`?**
The API accepts lists of strings for efficient batch processing. This saves you from making multiple sequential calls.
Just provide an array of hostnames, and the tool will return structured results for each entry.

**Why can't I just split the domain by dots?**
Because public suffixes like `.co.uk`, `.com.br`, and `.org.au` have multiple parts. Taking the last two labels of `app.vinkius.co.uk` would give you `co.uk` instead of the real root domain, `vinkius.co.uk`. The PSL has 9,000+ entries.

**Does it handle cloud provider domains?**
Yes. Cloud suffixes such as `s3.amazonaws.com`, `azurewebsites.net`, and `cloudfront.net` are in the PSL and handled correctly.

**Can I pass a full URL with protocol and path?**
Yes. The engine automatically strips the protocol (http/https), path, and query parameters before parsing.
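
If you'd rather normalize input yourself before calling the tool, here's a sketch of that stripping step using Python's standard library. `to_hostname` is a hypothetical helper and isn't necessarily how the engine does it internally.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def to_hostname(value: str) -> str:
    """Reduce a URL or bare hostname to a lowercase hostname."""
    value = value.strip()
    if not _SCHEME.match(value):
        # No scheme: prefix "//" so urlsplit reads the text as a network location.
        value = "//" + value
    # .hostname drops the port, path, and query, and lowercases the result.
    return urlsplit(value).hostname or ""


print(to_hostname("https://shop.example.com.br/products?id=1"))
print(to_hostname("app.vinkius.co.uk"))
```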