Compatible with every major AI agent and IDE
What is the CERN Open Data MCP Server?
Connect to the CERN Open Data Portal and access the world's largest repository of open particle physics data — over 66,000 datasets from the Large Hadron Collider and LEP experiments.
What you can do
- Dataset Discovery — Search across 66,000+ records with powerful filters for experiment (CMS, ATLAS, ALICE, LHCb, DELPHI, OPERA), collision type (pp, e+e−, Pb-Pb), collision energy (7–13.6 TeV), and physics category
- Physics Categories — Browse datasets by research topic including Higgs boson, Exotica (Dark Matter, Gravitons, Extra Dimensions, Leptoquarks), B physics, heavy-ion collisions, and more
- Record Intelligence — Retrieve complete metadata for any record: abstracts, authors with ORCID, DOI, event counts, file listings with ROOT/EOS URIs, and processing configurations
- Portal Analytics — Get comprehensive statistics across all facets: experiments, collision types, energies, file formats, years, and event count distributions
- Physics Glossary — Search 1,000+ glossary entries for definitions of particle physics terms, detector components, and analysis techniques
- Software & Documentation — Find analysis frameworks, reconstruction software, guides, and supplementary materials needed to reproduce published results
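As a rough illustration of how the discovery filters above combine into one query, the sketch below builds a search URL with the standard library. The base URL and the parameter names (`q`, `type`, `experiment`, `collision_type`) are assumptions about the portal's public REST endpoint, not something this page specifies, and the MCP server may expose the filters under different names.

```python
from urllib.parse import urlencode

# Assumed public records endpoint; verify against the portal before relying on it.
BASE_URL = "https://opendata.cern.ch/api/records/"


def build_search_url(query, experiment=None, collision_type=None, page=1, size=10):
    """Compose a dataset search URL from a text query and optional filters."""
    params = {"q": query, "type": "Dataset", "page": page, "size": size}
    # Only add a filter when the caller supplied it (parameter names are assumed).
    if experiment:
        params["experiment"] = experiment
    if collision_type:
        params["collision_type"] = collision_type
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("Higgs boson", experiment="CMS", collision_type="pp")
print(url)
```

When you go through the MCP server, the agent fills in these filters from a natural-language request, so a helper like this is mainly useful for checking results by hand.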
How it works
- Subscribe to this server
- No API key required — the CERN Open Data Portal is a fully public service
- Start querying particle physics data from Claude, Cursor, or any MCP-compatible client
Your AI agent becomes a particle physics research assistant with direct access to LHC collision data. All data is sourced from the official CERN Open Data Portal powered by InvenioRDM.
Who is this for?
- Particle Physicists — discover and access collision datasets, reconstruction configurations, and analysis software without navigating complex web interfaces
- Data Scientists & ML Researchers — find labeled physics datasets for machine learning applications in particle identification, anomaly detection, and event classification
- Educators & Students — access curated educational datasets and physics glossary entries for teaching and learning particle physics
- Science Communicators — retrieve real data from Higgs boson discoveries, Dark Matter searches, and other landmark physics results for accurate reporting
Built-in capabilities (16)
Verify CERN Open Data API connectivity and portal status
Use this to verify the integration is working correctly before performing data queries. The API uses the InvenioRDM REST framework.
Search the CERN particle physics glossary for term definitions
Returns term names, definitions, and associated experiments. Covers fundamental particles, detector components, analysis techniques, and physics phenomena. Use this to explain technical physics terms like "luminosity", "transverse momentum", "pseudorapidity", "b-tagging", or "muon spectrometer". Invaluable for science communication and educational contexts.
Get comprehensive CERN Open Data portal statistics and facets
…), record types (Dataset, Documentation, Software, Glossary, Supplementaries), data-taking years, keywords, availability status, and event count distributions. This is the single most informative endpoint for understanding the scope and composition of available CERN data.
Get detailed metadata for a specific CERN Open Data record
Returns the full title, abstract, experiment, authors with ORCID identifiers, collision parameters, publication dates, DOI, file distribution summary (number of files, events, size), usage instructions, and a direct link. Use this after finding a record via search to obtain complete details. Example: recid "1" returns the CMS BTau primary dataset.
Resolve a DOI to a CERN Open Data record
Returns the resolved record ID, title, experiment, type, and direct link if found. Useful when you have a DOI from a publication or reference and need to find the corresponding open dataset. DOIs follow the format "10.7483/OPENDATA.CMS.XXX". Returns a "not found" result if the DOI does not match any record.
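Before calling the DOI resolver, it can help to check that a string looks like a portal DOI at all. The sketch below is a minimal pre-check based only on the "10.7483/OPENDATA.CMS.XXX" shape quoted above; the allowed characters in the suffix are an assumption, and the example DOI is made up for illustration.

```python
import re

# Assumed pattern: prefix and "OPENDATA.<EXPERIMENT>." come from the format shown
# above; the suffix character set is a guess.
DOI_PATTERN = re.compile(r"^10\.7483/OPENDATA\.([A-Z0-9]+)\.([A-Z0-9.]+)$")


def parse_opendata_doi(text):
    """Return the DOI parts, or None if the text does not match the assumed shape."""
    doi = text.strip()
    # Accept common ways of writing a DOI, such as a resolver URL or a "doi:" prefix.
    for prefix in ("https://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    match = DOI_PATTERN.match(doi.upper())
    if match is None:
        return None
    return {"doi": match.group(0), "experiment": match.group(1), "suffix": match.group(2)}


# Hypothetical DOI, used only to exercise the pattern.
parsed = parse_opendata_doi("https://doi.org/10.7483/OPENDATA.CMS.ABCD.1234")
print(parsed)
print(parse_opendata_doi("10.1000/182"))
```

A string that passes this check can still come back as "not found" from the resolver; the pattern only filters out obviously unrelated identifiers.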
List all physics categories and subcategories with dataset counts
Returns category names and dataset counts. Categories span the full range of particle physics research: Higgs boson searches, exotic particles (Dark Matter, Extra Dimensions, Gravitons), B physics, heavy-ion collisions, and more. Subcategories within Exotica and Higgs Physics provide finer granularity.
List all available CERN experiments and their dataset counts
Currently includes CMS (the largest contributor with ~52,000 datasets), DELPHI (LEP era), ATLAS, ALICE, LHCb, OPERA (neutrino physics), TOTEM, JADE, and PHENIX. Use this as a starting point to understand what data is available before drilling into specific experiments.
List all data files in a CERN Open Data record
Returns filename, size in bytes, checksum, ROOT/EOS URI for direct data access, and file format. Useful for understanding what data is available in a dataset before downloading. Large datasets may contain hundreds of ROOT files. Example: record 1 contains AOD format files from CMS BTau data.
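Because a record can hold hundreds of files, it is often worth totalling the listing before downloading anything. The sketch below assumes each entry is a dictionary with `filename`, `size`, `checksum`, `uri`, and `format` keys, mirroring the fields described above; the exact key names and the sample entries are hypothetical.

```python
from collections import Counter


def summarize_files(files):
    """Summarize a file listing: file count, total size in GiB, and files per format."""
    total_bytes = sum(entry["size"] for entry in files)
    formats = Counter(entry["format"] for entry in files)
    return {
        "count": len(files),
        "total_gib": round(total_bytes / 2**30, 2),
        "formats": dict(formats),
    }


# Made-up listing for illustration; real URIs and checksums will differ.
sample_files = [
    {"filename": "part_0001.root", "size": 2 * 2**30, "checksum": "adler32:00000001",
     "uri": "root://example.invalid//data/part_0001.root", "format": "root"},
    {"filename": "part_0002.root", "size": 2**30, "checksum": "adler32:00000002",
     "uri": "root://example.invalid//data/part_0002.root", "format": "root"},
    {"filename": "file_index.txt", "size": 0, "checksum": "adler32:00000003",
     "uri": "root://example.invalid//data/file_index.txt", "format": "txt"},
]

summary = summarize_files(sample_files)
print(summary)
```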
Search datasets filtered by physics category
Major categories include: Exotica (~13,000 datasets, including Dark Matter, Extra Dimensions, Gravitons, Heavy Fermions, Leptoquarks), Higgs Physics (~10,400, Standard Model and Beyond Standard Model), Higgs (~10,700), Beyond 2 Generations (~1,600), 2 Fermion (~1,200), B physics and Quarkonia (~500), 4 Fermion (~380), Heavy-Ion Physics (~220). Some categories have subcategories; use the subcategory parameter for more precise filtering.
Search datasets filtered by collision energy
Available energies include: 13TeV (~50,500 datasets, LHC Run 2), 181-210 GeV (~11,700, LEP2), 7TeV (~1,100, LHC Run 1), 8TeV (~900, LHC Run 1), 5.02TeV (~310, heavy-ion), 2.76TeV (~120, heavy-ion), 130-140 GeV (~120, LEP), 13.6TeV (LHC Run 3). The vast majority of data comes from 13 TeV proton-proton collisions at the LHC.
Search datasets filtered by particle collision type
Available collision types: pp (proton-proton, ~52,000 datasets), e+e- (electron-positron, ~12,700), Pb-Pb (lead-lead, ~140), pPb (proton-lead, ~140). Proton-proton collisions from the LHC dominate the dataset. Electron-positron data comes primarily from the LEP era (DELPHI). Use this to focus on a specific collision topology.
Search datasets filtered by a specific LHC experiment
Available experiments include CMS (~52,000 datasets), DELPHI (~12,700), ATLAS (~160), ALICE (~150), LHCb (~108), OPERA (~900), and TOTEM. Combine with a text query for targeted searches within an experiment. This is the fastest way to scope results to a single collaboration.
Search CERN Open Data datasets with full-text query and filters
Supports full-text queries combined with filters for experiment, collision type, collision energy, physics category, file type, and year. Returns paginated results with metadata including record ID, title, abstract, event counts, file sizes, and direct links. Use this as the primary discovery tool for finding specific physics data. Example queries: "Higgs boson", "dark matter", "top quark pair production".
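Since search results are paginated, collecting a full result set means walking the pages until one comes back short or empty. The sketch below shows that loop in isolation. `fetch_page` is a hypothetical callable standing in for whatever actually performs the search (an MCP tool call or an HTTP request), and the in-memory `fake_fetch` exists only so the example runs without network access.

```python
def iter_records(fetch_page, page_size=2, max_pages=50):
    """Yield records page by page until a short or empty page signals the end."""
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, page_size)
        if not batch:
            return
        yield from batch
        # A page smaller than requested means there is nothing left to fetch.
        if len(batch) < page_size:
            return


# Stand-in data source: five made-up record IDs served two at a time.
FAKE_RESULTS = [{"recid": n} for n in (101, 102, 103, 104, 105)]


def fake_fetch(page, size):
    start = (page - 1) * size
    return FAKE_RESULTS[start:start + size]


recids = [record["recid"] for record in iter_records(fake_fetch)]
print(recids)
```

The `max_pages` cap is a deliberate safety limit, so a broad query cannot turn into an unbounded crawl.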
Search CERN guides, policies, and documentation
Returns document titles, abstracts, subtypes (Guide, Policy, About, Activities, Authors, Report, Help, Stripping), and direct links. Use this to find instructions on how to use specific datasets, understand detector configurations, or learn about data processing workflows.
Search CERN analysis software, frameworks, and tools
Returns software title, description, associated experiment, and subtypes (Analysis, Framework, Tool, Validation, Workflow). Use this to find reconstruction software, analysis frameworks like CMSSW, or specific analysis code associated with published physics results.
Search CERN supplementary materials and configurations
These ~5,900 records provide the technical context needed to reproduce physics analyses. Filter by subtype to find specific configuration types. Essential for researchers reproducing or extending published analyses.
Why Pydantic AI?
Pydantic AI validates every CERN Open Data tool response against typed schemas, catching data inconsistencies before they reach your application. Connect 16 tools through Vinkius and switch between OpenAI, Anthropic, or Gemini without changing your integration code. You get full type safety, structured output guarantees, and dependency injection for testable agents.
- Full type safety: every MCP tool response is validated against Pydantic models, catching data inconsistencies before they reach your application
- Model-agnostic architecture: switch between OpenAI, Anthropic, or Gemini without changing your CERN Open Data integration code
- Structured output guarantee: Pydantic AI ensures tool results conform to defined schemas, eliminating runtime type errors
- Dependency injection: cleanly separates your CERN Open Data connection logic from agent behavior for testable, maintainable code
CERN Open Data in Pydantic AI
CERN Open Data and 4,000+ other MCP servers. One platform. One governance layer.
Teams that connect CERN Open Data to Pydantic AI through Vinkius don't need to source, host, or maintain individual MCP servers. Every tool call runs inside a hardened runtime with credential isolation, DLP, and a signed audit chain.
| | Raw MCP | Vinkius |
|---|---|---|
| Server catalog | Find and host yourself | 4,000+ managed |
| Infrastructure | Self-hosted | Sandboxed V8 isolates |
| Credential handling | Plaintext in config | Vault + runtime injection |
| Data loss prevention | None | Configurable DLP policies |
| Kill switch | None | Global instant shutdown |
| Financial circuit breakers | None | Per-server limits + alerts |
| Audit trail | None | Ed25519 signed logs |
| SIEM log streaming | None | Splunk, Datadog, Webhook |
| Honeytokens | None | Canary alerts on leak |
| Custom domains | Not applicable | DNS challenge verified |
| GDPR compliance | Manual effort | Automated purge + export |
Why teams choose Vinkius for CERN Open Data in Pydantic AI
The CERN Open Data MCP Server runs on Vinkius-managed infrastructure inside AWS — a purpose-built runtime with per-request V8 isolates, Ed25519 signed audit chains, and sub-40ms cold starts. All 16 tools execute in hardened sandboxes optimized for native MCP execution.
Your AI agents in Pydantic AI only access the data you authorize, with DLP that blocks sensitive information from ever reaching the model, a kill switch for instant shutdown, and up to 60% token savings. Enterprise-grade infrastructure, zero maintenance.

How Vinkius secures CERN Open Data for Pydantic AI
Every tool call from Pydantic AI to the CERN Open Data MCP Server is protected by DLP redaction, cryptographic audit chains, V8 sandbox isolation, kill switch, and financial circuit breakers.
Frequently asked questions
Do I need an API key to use this server?
No. The CERN Open Data Portal API is completely public and requires no authentication. Simply subscribe to this server and enter any placeholder value in the API key field to start querying particle physics datasets immediately.
What kind of data can I access from CERN?
You can access over 66,000 datasets from major LHC experiments (CMS, ATLAS, ALICE, LHCb) and legacy experiments (DELPHI, OPERA). This includes real collision data, Monte Carlo simulations, derived datasets, analysis software, physics glossary entries, and detailed documentation. Data covers Higgs boson searches, Dark Matter studies, exotic particle searches, heavy-ion physics, and more.
Can I use CERN data for machine learning projects?
Absolutely. CERN provides labeled datasets specifically designed for ML applications, including particle identification, jet classification, event reconstruction, and anomaly detection. Use the search tools with queries like 'machine learning' or filter by file type 'csv' or 'nanoaodsim' to find ML-ready formats. The CMS experiment alone has published thousands of simulated datasets with known physics labels.
How does Pydantic AI discover MCP tools?
Create an MCPServerHTTP instance with the server URL. Pydantic AI connects, discovers all tools, and generates typed Python interfaces automatically.
Does Pydantic AI validate MCP tool responses?
Yes. When you define result types as Pydantic models, every tool response is validated against the schema. Invalid data raises a clear error instead of silently corrupting your pipeline.
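To make the fail-fast behaviour concrete, here is a minimal sketch of the same idea using only standard-library dataclasses. It is a stand-in, not Pydantic itself, chosen so the example runs without extra dependencies. The `RecordSummary` shape and its field names are hypothetical; in a real integration a Pydantic model would take the place of the hand-written checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSummary:
    """Hypothetical shape of a record lookup result (field names are assumed)."""
    recid: int
    title: str
    experiment: str


def parse_record(payload):
    """Build a RecordSummary, or raise ValueError naming the offending field."""
    expected = {"recid": int, "title": str, "experiment": str}
    for name, expected_type in expected.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        value = payload[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(
                f"field {name!r} should be {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return RecordSummary(**{name: payload[name] for name in expected})


good = parse_record({"recid": 1, "title": "Example dataset", "experiment": "CMS"})

try:
    parse_record({"recid": "one", "title": "Example dataset", "experiment": "CMS"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(good)
print(error_message)
```

The malformed payload is rejected at the boundary with a message that names the field, so it never travels further into the pipeline.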
Can I switch LLM providers without changing MCP code?
Absolutely. Pydantic AI abstracts the model layer, so your CERN Open Data MCP integration works identically with OpenAI, Anthropic, Google, or any supported provider.
MCPServerHTTP not found
If the MCPServerHTTP import fails, upgrade the package: pip install --upgrade pydantic-ai