Welcome
to the Vinkius
Open Data Initiative.
We are opening access to the Vinkius Model Context Protocol (MCP) catalog. Automatically updated documentation for 5,200+ unique MCP servers.
Why this matters
This highly structured corpus is designed for AI researchers, data scientists, and language model developers. No one else has compiled, structured, and published the MCP ecosystem as a proper open dataset. Vinkius is the first to treat MCP metadata as what it is: critical infrastructure data that belongs in the open.
5,200+
MCP Servers
Fully documented — the largest structured MCP dataset ever published
4
Global Platforms
GitHub · Hugging Face · Kaggle · ModelScope — simultaneously
12h
Auto-Sync
Refreshed every 12 hours by autonomous pipelines
CC0
Public Domain
Use, modify, redistribute for any purpose — zero restrictions
Global distribution
One dataset.
Four continents.
We didn't just publish a CSV. We mirrored the entire registry — with native integrations — across the four platforms where the world's researchers, data scientists, and AI engineers actually work.
Run notebooks directly against the data. Kaggle kernels, versioned snapshots, community discussions, and instant ML pipeline integration — the dataset is ready for analysis the moment you open it.
Used by data scientists researching MCP adoption patterns and building recommendation systems for AI tool selection.
Streaming-compatible dataset on the Hub. Load it with a single line of Python using the datasets library. Preview it in the dataset viewer. Integrate directly with transformers and fine-tuning pipelines.
Ideal for training LLMs to understand and recommend MCP servers — the foundation for agentic tool discovery.
The source of truth. Clone, fork, or contribute. Full documentation, schema definitions, automation scripts, and CI/CD pipelines that keep downstream mirrors in sync.
Open-source infrastructure — audit the pipeline, propose additions, or build your own mirror. The entire system is transparent.
Full-parity mirror for the Chinese developer ecosystem. Native integration with Alibaba Cloud, PAI, and the ModelScope model hub — bringing the MCP catalog to 10 million+ developers in China.
The only structured MCP dataset accessible behind the Great Firewall — bridging two ecosystems.
Schema reference
13 columns.
One CSV. Ready to query.
Each row captures the full profile of an MCP server — from metadata and tool inventory to quality grades assigned by the Vinkius Debugger. UTF-8 encoded, ~3 MB, updated every 12 hours.
slug
Unique server identifier
title
Server display name
category
Primary classification (e.g., Development, Data Analytics, Communication)
tags
Comma-separated descriptive keywords
short_description
One-line capability summary
description
Full capability and integration details
tools_count
Number of tools exposed by this server
tool_names
Comma-separated list of tool names
prompt_examples
Real-world usage prompts designed for this server
debugger_grade
Automated quality grade (A+ through F)
debugger_score
Numeric reliability score assigned by the Vinkius Debugger
created_at
Listing creation timestamp
url
Direct link to the server page (vinkius.com/mcp/{slug})
Research & training applications
Built for AI research,
product development, and market intelligence.
When the entire MCP ecosystem is structured, searchable, and open — researchers, builders, and autonomous agents can work with real data instead of guesswork.
LLM Fine-Tuning
Real-world schemas for training models in advanced function calling and tool utilization. Use tool_names and prompt_examples as a structured corpus for teaching LLMs how to select and invoke MCP servers.
Ecosystem Intelligence
Map the growth trajectory of MCP server categories over time. Identify which tool types — database, API, file system, communication — are expanding fastest. Track new server registrations as a proxy for ecosystem adoption.
RAG & Fine-Tuning
Feed the registry into retrieval-augmented generation systems. Train an LLM that understands available servers, their tools, and how to recommend the right integration for any task.
Quality & Reliability Analysis
Explore the distribution of debugger_grade and debugger_score to identify patterns in high-quality vs. poorly maintained servers. Correlate tools_count with reliability scores to measure complexity vs. stability trade-offs.
Machine Learning
Multi-label classification — predict category or tags from description using NLP models. Recommendation systems — build tool recommenders based on tag similarity or embedding distance. Anomaly detection — flag servers with abnormal score patterns.
Agentic Frameworks
Operational metadata for studying multi-agent orchestration and system bridging. Analyze how external software platforms are mapped to natural language interfaces across 5,200+ real-world implementations.
Open invitation
We opened the catalog.
Now it's yours.
CC0 1.0 — Public Domain. Use, modify, and distribute for any purpose — commercial or non-commercial — without restrictions. Citation is appreciated, not required.
Citation
@dataset{vinkius_mcp_registry,
title = {Vinkius MCP Registry — Global Model Context Protocol Dataset},
author = {Vinkius},
year = {2026},
url = {https://www.kaggle.com/datasets/renato2marinho/vinkius-mcp-registry},
note = {Updated every 12 hours. 5,200+ MCP servers indexed.}
}