CERN Open Data MCP for AI. Access 66,000+ Particle Physics Datasets Instantly
Works with every AI agent you already use
…and any MCP-compatible client

Connect to your AI in seconds.
CERN Open Data connects your agent directly to over 66,000 particle physics datasets and research documents from the Large Hadron Collider and other CERN experiments.
You can query by experiment, collision energy, or a physics topic such as Dark Matter, and retrieve full metadata, file listings, and glossary definitions.
What your AI can do
Check CERN Open Data status
Verifies that the connection to the CERN Open Data Portal is active and operational.
Get glossary
Searches the official particle physics glossary for definitions of technical terms, components, or phenomena.
Get portal statistics
Retrieves high-level statistics on the entire data portal's scope, including record counts and available file formats.
Locate specific records using filters like collision energy (e.g., 13 TeV) or particle collision type (e+e-).
Fetch complete details for any dataset, including authors' ORCID IDs and the DOI.
List major CERN collaborations such as CMS, ATLAS, and ALICE, along with the number of datasets each has published.
Filter the data pool by broad research topics, such as Exotica or B physics.
Access a specialized glossary to define terms like pseudorapidity or luminosity for reports or presentations.
CERN Open Data with 16 Tools
These tools allow you to query the full spectrum of particle physics data, from high-level statistics to individual file URIs.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look up the datasets, records, and documentation you would otherwise dig out by hand.
Start using CERN Open Data on Vinkius
Check CERN Open Data Status
Verifies that the connection to the CERN Open Data Portal is active and operational.
Get Glossary
Searches the official particle physics glossary for definitions of technical terms, components, or phenomena.
Get Portal Statistics
Retrieves high-level statistics on the entire data portal's scope, including record counts and available file formats.
Get Record By DOI
Finds the corresponding open data record when you provide a digital object...
Get Record
Fetches comprehensive metadata for a specific dataset ID, detailing authors...
List Categories
Lists all available physics research categories and their associated dataset counts.
List Experiments
Provides an inventory of active CERN collaborations, like CMS or ATLAS, along with the number of datasets each has published.
List Record Files
Lists every file associated with a specific dataset record, providing size and...
Search By Category
Searches the entire repository using physics research categories to narrow down the...
Search By Collision Energy
Filters datasets based on the specific collision energy used during the experiment...
Search By Collision Type
Narrows results by the particle interaction type, such as proton-proton (pp) or...
Search By Experiment
Focuses searches exclusively on data generated by one specific collaboration, like ALICE.
Search Datasets
Performs a broad search across all available fields using keywords plus multiple filters for maximum precision.
Search Documentation
Locates user guides, policies, and technical documentation related to the data or...
Search Software
Finds analysis frameworks, reconstruction tools, and specialized code used in...
Search Supplementaries
Retrieves technical context documents essential for reproducing published scientific...
Security and governance baked right in.
Pick your AI client, create a Vinkius account, and subscribe; you're up and running right away. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there's no routing to configure.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with CERN Open Data, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by CERN Open Data. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: managed infra
- V8 Isolated: sandboxed per request
- Zero-Trust Proxy: no stored credentials
- DLP Enforced: policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 16 tools that work directly with Claude, ChatGPT, Cursor, and other MCP-compatible AI platforms. No custom integration required.
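At the protocol level, "using a tool" means the client sends the server a JSON-RPC 2.0 request whose method is `tools/call`, naming the tool and passing its arguments. The sketch below only builds that message and opens no connection. The tool name `get_glossary` comes from this page; the `term` argument is a hypothetical placeholder, because the real input schema is whatever the server advertises in its `tools/list` response.

```python
import json
from itertools import count

# JSON-RPC 2.0 requests carry an id so responses can be matched to requests.
_request_ids = count(1)


def tools_call_message(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 string."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "term" is a hypothetical argument name; check the server's tools/list schema.
print(tools_call_message("get_glossary", {"term": "pseudorapidity"}))
```

Your AI client normally writes and sends these messages for you; the sketch only shows what a tool call looks like on the wire.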
The tedious part is compiling a bibliography that actually works.
Currently, if you need to reference a dataset for a paper—say, one related to Dark Matter searches—you must navigate through multiple academic portals. You find the abstract, then cross-reference the DOI on another page, and finally go to a third site just to see if the raw file links are listed. It’s a painful cycle of copy/pasting IDs across disparate web pages.
With this MCP, you ask your agent for the record using its identifier or the DOI. The agent calls `get_record` (or `get_record_by_doi`) to pull the complete metadata, including the abstract and the authors' identifiers, then calls `list_record_files` to list every associated file. You get the record and its files back as structured data in a single exchange.
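The DOI-to-files lookup is a two-step chain, sketched below as a small helper. It assumes a `call(tool_name, arguments)` function supplied by your MCP client, that the tools accept `doi` and `recid` arguments, and that a resolved record exposes its ID under `recid`. All of these names are illustrative assumptions, not the server's documented schema.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for your MCP client's tool invocation: takes a tool
# name and an arguments dict, returns the parsed tool result.
ToolCaller = Callable[[str, dict], Any]


def record_and_files(call: ToolCaller, doi: str) -> Optional[dict]:
    """Resolve a DOI to an open data record, then list that record's files."""
    # Argument names ("doi", "recid") are assumptions, not the server's schema.
    record = call("get_record_by_doi", {"doi": doi})
    if not record:
        return None  # the DOI is not linked to an open data record
    files = call("list_record_files", {"recid": record["recid"]})
    return {"record": record, "files": files}
```

The second call depends on the first one's result, which is why the agent cannot fetch the file list until the DOI has been resolved.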
The record details are now just one prompt away.
You no longer have to manually track down the specific file formats or check whether a dataset is even associated with an experiment. Instead, you ask for the full metadata using `get_record`, and it returns the publication date, the collision parameters, and a summary of the record's files.
The difference isn't just convenience; it's rigor. You get the same structured metadata every time, straight from the portal. It lets you focus on the physics, not the web interface.
What your AI can actually do with this
Need access to high-energy physics data? This MCP gives your agent direct read access to the CERN Open Data Portal, a massive repository of scientific research. Forget navigating complex web forms just to check an event count or find a specific analysis framework. You query for 'Higgs boson' or 'ATLAS experiment,' and you get metadata right back.
It’s designed for those who need raw data details—full abstracts, author ORCID identifiers, file URIs—without the clicks. Vinkius hosts this connection, making it available to any MCP-compatible client. Your agent instantly becomes a particle physics research assistant, giving you immediate access to datasets and documentation spanning decades of collision history.
Here's how it actually works
The bottom line is that your agent treats the entire CERN Open Data Portal as an immediate, queryable knowledge base.
Subscribe to this MCP. Since the CERN portal is public, no API key is required.
Instruct your AI client to perform a query; specify if you need datasets filtered by collision type or records resolved via a DOI.
The agent returns structured data containing metadata, file links, and abstract summaries for review.
Who is this actually for?
Anyone who needs to move beyond academic abstracts and actually work with raw scientific data. This means particle physicists, quantitative data scientists, or even science communicators needing hard numbers for accurate reporting.
Particle physicists need to locate specific datasets from a collaboration (like CMS) and pull the necessary metadata, including reconstruction configurations, to reproduce published results.
Data scientists need labeled physics data for training models, using tools like search_by_category to find specific dataset subsets and file listings to understand data size.
Science communicators must define niche technical terms and pull verifiable facts (like Higgs boson discovery parameters) directly from the glossary and record metadata.
What Changes When You Connect
Precision filtering saves time. Instead of browsing general results, you can narrow the search immediately by collision energy using search_by_collision_energy or particle type with search_by_collision_type.
Reproducibility is built in. Need to understand how a result was achieved? Use get_record for full metadata or run list_record_files to see the exact files available for analysis.
No jargon left unexplained. The dedicated get_glossary tool lets you define obscure physics terms instantly, which is critical when writing technical reports.
The scope is visible upfront. Before deep diving, use list_experiments to understand the sheer volume and variety of data contributed by major collaborations like CMS (52k datasets).
Full traceability means confidence. If you have a publication DOI, run get_record_by_doi. It resolves that reference directly into an open dataset record, skipping manual searches.
Beyond just numbers: Use search_supplementaries to find the technical configuration details and guides necessary to actually replicate published research.
See it in action
Tracking historical data gaps
The user needs to compare LEP-era results with modern LHC runs. They first use list_experiments to confirm that both DELPHI and CMS have data on the portal, then combine search_by_collision_type (e+e- for DELPHI, pp for CMS) with get_portal_statistics to gauge the historical scope of the available data.
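To put the LEP-era and LHC-era search results side by side, it helps to tally what comes back. The sketch below assumes each returned record is a dict with an `experiment` field (a string or a list of strings) and a `collision_information` object holding `type` and `energy`. That layout is an assumption about the record format, so adjust the keys to whatever the server actually returns.

```python
from collections import Counter
from typing import Iterable


def tally_by_collision(records: Iterable[dict]) -> Counter:
    """Count records per (experiment, collision type, energy) combination."""
    counts: Counter = Counter()
    for record in records:
        # Assumed field: a single experiment name or a list of names.
        experiments = record.get("experiment", "unknown")
        if isinstance(experiments, str):
            experiments = [experiments]
        # Assumed field: collision metadata with "type" and "energy" keys.
        collision = record.get("collision_information", {})
        key_tail = (collision.get("type", "?"), collision.get("energy", "?"))
        for experiment in experiments:
            counts[(experiment, *key_tail)] += 1
    return counts
```

For example, `tally_by_collision(delphi_hits + cms_hits).most_common()`, where the two hypothetical lists hold the results of the e+e- and pp searches, shows at a glance where each era's data is concentrated.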
Recreating a complex analysis
A researcher finds an abstract but needs the underlying files. They use get_record_by_doi first, then run list_record_files to get file URIs and checksums, finally checking search_supplementaries for the specific analysis configuration needed.
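Once list_record_files has returned a checksum, you can confirm that a download is intact before analysing it. This sketch assumes the checksum arrives as a string of the form `adler32:<hex digits>`; treat that format as an assumption and check it against the actual tool output. It streams the file so that large files never need to fit in memory.

```python
import zlib
from pathlib import Path


def adler32_of(path: Path, chunk_size: int = 1 << 20) -> int:
    """Stream a file through Adler-32 and return the checksum as an integer."""
    value = 1  # Adler-32's defined starting value
    with path.open("rb") as handle:
        # Read 1 MiB at a time so multi-gigabyte files never sit in memory.
        while chunk := handle.read(chunk_size):
            value = zlib.adler32(chunk, value)
    return value


def matches_listing(path: Path, listed: str) -> bool:
    """Compare a local file with an assumed 'adler32:<hex>' checksum string."""
    algorithm, _, expected = listed.partition(":")
    if algorithm.lower() != "adler32":
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    # Compare numerically so leading zeros and letter case do not matter.
    return adler32_of(path) == int(expected, 16)
```

If the listing uses a different algorithm, the function raises an error rather than silently reporting a mismatch.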
Understanding a niche term
The user encounters 'pseudorapidity' in an article. They immediately use the get_glossary tool to get a precise definition, ensuring their report is technically accurate before proceeding with dataset queries.
Finding analysis code for a specific topic
A student wants to build a model for Dark Matter. They use search_by_category and filter by 'Exotica,' then run search_software to find the appropriate reconstruction frameworks before they even touch the raw data.
The honest tradeoffs
Treating it like a general search engine
The user simply types 'Higgs boson' into a generic search field, getting thousands of unrelated documents and datasets mixed together.
Start by using search_datasets with the keyword, but always combine that with filters. For instance, filter by both 'physics category: Higgs Physics' AND 'collision energy: 13 TeV'. This provides actionable results.
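A client-side guard can enforce the "keyword plus filters" habit. The sketch builds arguments for search_datasets and refuses a bare keyword search. The argument names (`query`, `category`, `energy`, `experiment`) are hypothetical; map them to the input schema the server actually advertises.

```python
from typing import Optional


def filtered_search_args(
    query: str,
    category: Optional[str] = None,
    energy: Optional[str] = None,
    experiment: Optional[str] = None,
) -> dict:
    """Build search_datasets arguments, refusing a bare keyword search."""
    # Hypothetical filter names; not the server's documented schema.
    filters = {"category": category, "energy": energy, "experiment": experiment}
    # Drop filters the caller did not set so the request stays minimal.
    active = {name: value for name, value in filters.items() if value}
    if not active:
        raise ValueError("add at least one filter; a bare keyword search is too broad")
    return {"query": query, **active}
```

For example, `filtered_search_args("Higgs boson", category="Higgs Physics", energy="13 TeV")` produces the combined query described above.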
Assuming a DOI is enough
The user has an old publication reference and assumes the dataset record exists simply because they have the DOI, but doesn't know if it's linked.
First, use get_record_by_doi to check for direct linkage. If that fails, broaden the search using search_datasets, combining the publication year and keywords from the reference.
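The DOI-first, keyword-fallback strategy can be sketched like this. It assumes a hypothetical `call(tool_name, arguments)` function from your MCP client, that an unlinked DOI comes back as an empty or `None` result, and that search_datasets accepts a free-text `query` argument. All three are assumptions, not the server's documented behavior.

```python
from typing import Any, Callable, Iterable

# Hypothetical stand-in for your MCP client's tool invocation.
ToolCaller = Callable[[str, dict], Any]


def find_by_doi_or_keywords(
    call: ToolCaller, doi: str, year: int, keywords: Iterable[str]
) -> list:
    """Try the DOI first; if it is not linked, fall back to a keyword search."""
    # Assumes an unlinked DOI yields an empty or None result.
    record = call("get_record_by_doi", {"doi": doi})
    if record:
        return [record]
    # Fold the publication year into the free-text query with the keywords.
    query = " ".join([*keywords, str(year)])
    return list(call("search_datasets", {"query": query}) or [])
```

The fallback returns a list of candidates rather than a single record, because a keyword search can match several datasets that you still need to review.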
Needing files without knowing the record
The user finds an abstract but doesn't know which specific dataset ID (the 'record') it belongs to, so they can't get file links.
Use search_datasets to find a promising candidate, then call get_record on it. The record metadata supplies the record ID that list_record_files needs.
When It Fits, When It Doesn't
Use this MCP if you need to prove where scientific data comes from or how it was analyzed. For general questions such as 'What is particle physics?', you don't need a dataset search; the get_glossary tool handles that kind of question better. If you know the exact experiment and energy range, run search_by_experiment combined with search_by_collision_energy. If you only have a keyword (e.g., 'Heavy Fermions'), run list_categories first to define the scope, then call search_datasets with the keyword and the matching category as a filter. It's also worth checking get_portal_statistics early; it gives you immediate context on the scale of the data available.
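The rules of thumb in this paragraph can be condensed into a tiny planner. It only returns tool names in a suggested order and calls nothing. The branching is an illustrative reading of the guidance here, not logic the server itself applies.

```python
from typing import Optional


def plan_tools(
    experiment: Optional[str] = None,
    energy: Optional[str] = None,
    keyword: Optional[str] = None,
) -> list[str]:
    """Suggest a tool order from what the user already knows (illustrative)."""
    plan = ["get_portal_statistics"]  # get a sense of scale first
    if experiment and energy:
        # Both constraints are known: use the targeted search tools.
        plan += ["search_by_experiment", "search_by_collision_energy"]
    elif keyword:
        # Only a keyword: scope by category, then run the filtered search.
        plan += ["list_categories", "search_datasets"]
    else:
        # No data constraints at all: likely a definitional question.
        plan.append("get_glossary")
    return plan
```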
Questions you might have
How do I search for a specific experiment like ALICE using search_datasets?
You combine search_datasets with the 'experiment' filter. This lets you scope your full-text query specifically to data from that collaboration, giving you highly targeted results.
I found a publication DOI; how do I get the data record using get_record_by_doi?
You pass the DOI directly to get_record_by_doi. This tool resolves the reference ID and returns the dataset's title, type, and direct link if one exists.
What is the best way to find all available physics research topics?
Run list_categories first. It provides a master list of every major topic, like Exotica or B physics, along with an immediate count of datasets for each.
Can I check if the data portal connection is working before querying?
Yes, run check_cern_opendata_status. This simple tool verifies the API connectivity and overall status of the entire CERN Open Data system.
I want to know what specific files are inside a record; how do I use list_record_files?
It returns the filename, size in bytes, checksum, and direct data URI for every file linked to that dataset. This tool is essential because it lets you verify exactly what you'll download before pulling large datasets into your analysis.
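The filename and byte size in each entry are enough to estimate a download before you start it. This sketch assumes each entry of the tool's output is a dict with `filename` and `size` keys; those key names are assumptions. It totals sizes per file extension, which also shows how much of a record is, for example, `.root` versus `.csv`.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_files(files: list[dict]) -> dict[str, int]:
    """Total the listed sizes (in bytes) per file extension."""
    totals: dict[str, int] = defaultdict(int)
    for entry in files:
        # Assumed keys: "filename" and "size" (in bytes).
        suffix = PurePosixPath(entry["filename"]).suffix.lower() or "(none)"
        totals[suffix] += int(entry["size"])
    return dict(totals)


def human_bytes(n: int) -> str:
    """Render a byte count with binary prefixes, e.g. 1536 -> '1.5 KiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```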
How do I find out the overall scope of all available physics data using get_portal_statistics?
It provides comprehensive statistics across every facet: record types, years, keywords, and event count distributions. This is the best way to gauge the total volume and composition of the entire CERN dataset repository.
I need instructions on how to use a specific dataset or understand detector setups; should I use search_documentation?
Yes, it searches for guides, policies, and documentation. You'll find titles and abstracts that point you toward usage instructions, detector configurations, or data processing workflows needed for reproduction.
I know the specific collision energy I need; how does search_by_collision_energy help me scope my results?
It filters datasets by collision energy (such as 13 TeV or 7 TeV). This lets you quickly narrow the portal's 66,000+ records down to only those matching your experimental conditions.
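Collision energies show up both with and without a space ("13 TeV" and "13TeV"). If you pass energies programmatically, normalizing them first keeps a filter from silently missing because of formatting. This sketch assumes the filter expects the compact form such as `13TeV`. That is an assumption about the tool's input, so compare it with the energy values the server reports in its results before relying on it.

```python
import re

# Accepts "13 TeV", "13TeV", "0.9 tev", "91.2 GeV", with optional whitespace.
_ENERGY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(tev|gev)\s*$", re.IGNORECASE)


def normalize_energy(text: str) -> str:
    """Canonicalize a collision-energy label to the compact '13TeV' form."""
    match = _ENERGY.match(text)
    if not match:
        raise ValueError(f"unrecognized collision energy: {text!r}")
    value, unit = match.groups()
    # Restore the conventional casing of the unit.
    canonical_unit = "TeV" if unit.lower() == "tev" else "GeV"
    return f"{value}{canonical_unit}"
```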
Do I need an API key to use this server?
No. The CERN Open Data Portal API is completely public and requires no authentication. Simply subscribe to this server and enter any placeholder value in the API key field to start querying particle physics datasets immediately.
What kind of data can I access from CERN?
You can access over 66,000 records from major LHC experiments (CMS, ATLAS, ALICE, LHCb) and legacy experiments (DELPHI, OPERA). This includes real collision data, Monte Carlo simulations, derived datasets, analysis software, physics glossary entries, and detailed documentation. Data covers Higgs boson searches, Dark Matter studies, exotic particle searches, heavy-ion physics, and more.
Can I use CERN data for machine learning projects?
Absolutely. CERN provides labeled datasets specifically designed for ML applications, including particle identification, jet classification, event reconstruction, and anomaly detection. Use the search tools with queries like 'machine learning' or filter by file type 'csv' or 'nanoaodsim' to find ML-ready formats. The CMS experiment alone has published thousands of simulated datasets with known physics labels.
We've already built the connector for CERN Open Data. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 16 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.