# Robots.txt Generator MCP

> The Robots.txt Generator creates well-formed instructions for web crawlers, telling search engine bots which parts of your site they can and cannot crawl. It lets you define rules per user-agent, set crawl delays, and list your sitemaps so crawlers such as Googlebot and Bingbot can discover your content.

## Overview
- **Category:** automation
- **Price:** Free
- **Tags:** robotstxt, seo, web-crawler, sitemap, automation, web-development

## Description

Controlling how web crawlers interact with your site is crucial for SEO, and this MCP handles that job. You use it to write the robots.txt file, which gives search engine bots like Googlebot specific instructions about navigating your domain. Need to block access to temporary directories while still allowing product images to be crawled? This tool lets you set those rules precisely. It manages directives like 'Allow', 'Disallow', and 'Crawl-delay' for different bot types; note that 'Crawl-delay' is a non-standard directive that some crawlers, including Googlebot, ignore. You can also check the syntax first, so every directive follows the standard protocol before you publish. When you connect this MCP through Vinkius, your agent handles the complexity; you just tell it what rules you need to enforce.

## Tools

### generate_robots_txt
Generates a complete robots.txt file based on agent rules.

### get_configuration_summary
Provides an overview of your current robots.txt setup and active directives.

### validate_rule_syntax
Checks the syntax of any specified robots.txt paths to prevent formatting errors.

## Prompt Examples

**Prompt:** 
```
Generate a robots.txt file for all bots that disallows the '/private/' directory and includes a sitemap at https://mysite.com/sitemap.xml.
```

**Response:** 
```
User-agent: *
Disallow: /private/

Sitemap: https://mysite.com/sitemap.xml
```
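
One way to sanity-check a generated file like the one above is Python's standard-library `urllib.robotparser`. This is a minimal sketch, not part of the MCP itself: the `AnyBot` name and the test URLs are made up for illustration, and `site_maps()` assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The generated file from the response above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://mysite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "AnyBot" is a hypothetical crawler name; the wildcard group applies to it.
private_ok = parser.can_fetch("AnyBot", "https://mysite.com/private/report.html")
public_ok = parser.can_fetch("AnyBot", "https://mysite.com/blog/post.html")
sitemaps = parser.site_maps()

print(private_ok, public_ok, sitemaps)
```

Here the private URL should be reported as blocked, the blog URL as allowed, and the sitemap URL should come back in the list.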

**Prompt:** 
```
Create rules for Googlebot to allow '/images/' but disallow '/temp/', with a crawl delay of 10 seconds.
```

**Response:** 
```
User-agent: Googlebot
Disallow: /temp/
Allow: /images/
Crawl-delay: 10
```
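
To see how rules scoped to one user-agent behave, the response above can be fed to Python's standard-library `urllib.robotparser`. This is a minimal sketch, assuming Python 3.6 or later for `crawl_delay()`; the `example.com` URLs and the `Bingbot` comparison are illustrative additions, not output from the MCP.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-specific rules from the response above.
GOOGLEBOT_RULES = """\
User-agent: Googlebot
Disallow: /temp/
Allow: /images/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(GOOGLEBOT_RULES.splitlines())

temp_ok = parser.can_fetch("Googlebot", "https://example.com/temp/cache.html")
images_ok = parser.can_fetch("Googlebot", "https://example.com/images/logo.png")
google_delay = parser.crawl_delay("Googlebot")

# A bot with no matching group (and no wildcard group) is unrestricted.
bing_temp_ok = parser.can_fetch("Bingbot", "https://example.com/temp/cache.html")
bing_delay = parser.crawl_delay("Bingbot")

print(temp_ok, images_ok, google_delay, bing_temp_ok, bing_delay)
```

Because the file has no `User-agent: *` group, this parser treats every bot other than Googlebot as unrestricted. Add a wildcard group if the rules should apply more broadly.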

**Prompt:** 
```
Check if my rules for 'BadBot' are valid: agentName: 'BadBot', disallowedPaths: ['/api/v1/'], allowedPaths: []
```

**Response:** 
```
The syntax for the rules provided is valid.
```
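
The MCP's actual validation logic is not documented here. As a rough illustration of what such a check might cover, the sketch below uses a hypothetical `check_rule_syntax` helper whose parameters mirror the prompt above. The specific checks (non-empty agent name, paths starting with `/`, no whitespace) are assumptions.

```python
def check_rule_syntax(agent_name, disallowed_paths, allowed_paths):
    """Hypothetical helper: return a list of problems (empty means none found)."""
    problems = []
    # Assumed check: the user-agent token is non-empty and has no whitespace.
    if not agent_name or any(ch.isspace() for ch in agent_name):
        problems.append(f"invalid user-agent: {agent_name!r}")
    for directive, paths in (("Disallow", disallowed_paths), ("Allow", allowed_paths)):
        for path in paths:
            # Assumed check: paths are rooted at '/' and contain no whitespace.
            if not path.startswith("/"):
                problems.append(f"{directive} path must start with '/': {path!r}")
            elif any(ch.isspace() for ch in path):
                problems.append(f"{directive} path contains whitespace: {path!r}")
    return problems


valid = check_rule_syntax("BadBot", ["/api/v1/"], [])
invalid = check_rule_syntax("BadBot", ["api/v1/"], ["/ok path/"])
print(valid)
print(invalid)
```

The first call mirrors the prompt and reports no problems. The second shows the kind of mistakes (a missing leading slash, a stray space) a syntax check is meant to catch.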

## Capabilities

### Generate robots.txt file
Creates a complete, correctly formatted robots.txt file based on the specific crawling instructions you provide.

### Validate rule syntax
Checks your proposed rules and paths to ensure they are syntactically correct before publishing the file to avoid crawler errors.

### Get configuration summary
Provides an audit report on your current robots.txt setup, helping you understand all active directives across your site.
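
The format of the real summary is not specified in this document. As an assumption-laden sketch of what an audit of active directives could look like, the hypothetical `summarize_robots_txt` function below groups directives by user-agent and collects sitemap URLs.

```python
def summarize_robots_txt(text):
    """Hypothetical audit: group directives by user-agent and list sitemaps."""
    groups = {}       # user-agent -> list of (directive, value)
    sitemaps = []
    agents = []       # user-agents of the group currently being read
    in_rules = False  # True once the current group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
            groups.setdefault(value, [])
        elif field == "sitemap":
            sitemaps.append(value)
        elif agents:
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return {"groups": groups, "sitemaps": sitemaps}


SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /temp/
Allow: /images/

Sitemap: https://mysite.com/sitemap.xml
"""

summary = summarize_robots_txt(SAMPLE)
for agent, rules in summary["groups"].items():
    print(agent, rules)
print(summary["sitemaps"])
```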

## Use Cases

### A staging site leaks private content
The dev team needs to keep all bots out of the `/staging/` directory immediately. They ask their agent to generate a robots.txt file that uses only 'Disallow: /staging/' for all user-agents, so compliant crawlers skip the build until it goes public.

### Site structure changes often
After migrating product lines, the SEO team needs confirmation that no critical directories were accidentally blocked. They run `get_configuration_summary` to audit all current rules and confirm the paths are open for search engines.

### Adding a new content type
A marketing campaign launches thousands of articles, requiring specific crawl delays. The manager uses the MCP to generate the correct file structure, including 'Crawl-delay' rules, and publishes it instantly.

### Testing custom bot behavior
The developer wants crawlers kept away from a new API endpoint and also wants to check that the path syntax for that block rule is valid. They run `validate_rule_syntax` first, then generate the file.

## Benefits

- Stop guessing about indexing rules. Use the `get_configuration_summary` tool to audit your existing setup and confirm every bot directive is working as intended.
- Get a compliant file from the `generate_robots_txt` function. Just tell it which paths need blocking or allowing, and it produces a file ready for deployment.
- Catch syntax errors before they hurt SEO. The `validate_rule_syntax` tool checks your rules instantly, preventing manual mistakes that waste crawl budget.
- Manage multiple directives (Allow, Disallow, Crawl-delay) in one place. You don't need to memorize the entire standard protocol; just describe what you want done.
- Control bot behavior by user-agent. You can write specific rules for Googlebot versus a specialized analytics crawler, optimizing resource usage.

## How It Works

You describe the rules you want, and the MCP returns a standards-compliant file that manages bot access, so you don't need to know the web crawling standards by heart.

1. First, tell the MCP which bot (user-agent) needs specific rules and what paths it should follow.
2. Second, run the appropriate tool to validate that all syntax is clean or to generate a summary of existing rules.
3. Finally, use the generation function to output the finished robots.txt content for immediate deployment.
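
The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the MCP's implementation: `build_robots_txt` and the rule dictionary keys (`agent`, `disallow`, `allow`, `crawl_delay`) are hypothetical names chosen for this example.

```python
def build_robots_txt(groups, sitemaps=()):
    """Hypothetical generator: validate each rule group, then render the file."""
    lines = []
    for group in groups:
        paths = group.get("disallow", []) + group.get("allow", [])
        # Step 2 (validate): assumed minimal check that paths are rooted at '/'.
        for path in paths:
            if not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path!r}")
        # Step 3 (generate): one block per user-agent.
        lines.append(f"User-agent: {group['agent']}")
        lines.extend(f"Disallow: {p}" for p in group.get("disallow", []))
        lines.extend(f"Allow: {p}" for p in group.get("allow", []))
        if group.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line between groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"


# Step 1 (describe): all bots, block /private/, list one sitemap.
robots_txt = build_robots_txt(
    [{"agent": "*", "disallow": ["/private/"]}],
    sitemaps=["https://mysite.com/sitemap.xml"],
)
print(robots_txt, end="")
```

Validating before rendering means a malformed path stops the run early instead of ending up in a published file.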

## Frequently Asked Questions

**How do I use robots.txt Generator to block a folder?**
To block a folder, specify 'Disallow: /folder/path/' when generating the file with `generate_robots_txt`. This tells compliant bots not to crawl that directory. Robots.txt is advisory, so it does not restrict access on its own.

**Does robots.txt Generator work for all search engines?**
Yes, it handles rules for multiple user-agents (like Googlebot or Bingbot), so your instructions reach the major web crawlers that honor robots.txt.

**What is the best way to check if my robots.txt file has errors?**
Run the `validate_rule_syntax` tool first. This function checks for protocol and syntax mistakes, giving you confidence before deployment.

**Can I use get_configuration_summary with this MCP?**
Yes, running `get_configuration_summary` audits your current settings. It provides an overview of all the directives active on your site.