CERN Open Data MCP. Query 66k LHC Datasets by Topic, DOI, or Experiment
Works with every AI agent you already use, and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
CERN Open Data MCP Server connects your AI client directly to the world's largest particle physics dataset repository. Access 66,000+ datasets from the Large Hadron Collider and LEP experiments by querying specific metadata, searching physics categories (like Dark Matter), or resolving data records by their DOI.
What your AI agents can do
The agent returns a list of datasets filtered by scientific category, such as Higgs boson or Exotic particles.
You receive complete metadata—abstracts, authors, file counts, and usage instructions—for any specific data record ID.
The server finds the corresponding open dataset when you provide a publication's Digital Object Identifier (DOI).
You can query the CERN glossary to get definitions for technical terms, which is useful for science communication.
The agent lists all files associated with a record, providing filenames, sizes, and direct ROOT/EOS URIs.
You retrieve high-level stats on the entire dataset repository, broken down by experiment and data type.
CERN Open Data MCP Server: 16 Tools for Physics Research
Orchestrate data searches and metadata retrieval across the entire CERN dataset repository using these specialized tools.
check_cern_opendata_status
Verifies the connection to the CERN Open Data API is working correctly before you run any data queries.
get_glossary
Searches the official CERN particle physics glossary for definitions of technical terms like 'luminosity' or 'b-tagging'.
get_portal_statistics
Retrieves overall statistics about the entire dataset portal, showing how many records exist by type and year.
get_record
Pulls all detailed metadata for a specific data record ID, including abstracts, authors, file counts, and direct links.
get_record_by_doi
Finds the associated CERN Open Data record when you only have a publication's DOI number.
list_categories
Lists all available physics research categories and subcategories, along with dataset counts for each one.
list_experiments
Lists every major experiment (CMS, ATLAS, ALICE, etc.) and how many datasets are attributed to it.
list_record_files
Returns a list of all physical files within a specific record, including size, checksum, and the URI needed for download.
search_by_category
Filters datasets based on major physics categories like 'Exotica' or 'Higgs Physics', providing dataset counts.
search_by_collision_energy
Narrows down the search results by specific collision energies, such as 13 TeV or 7 TeV.
search_by_collision_type
Filters datasets based on the type of particle collision, like proton-proton (pp) or electron-positron (e+e-).
search_by_experiment
Focuses the search results down to a single specific collaboration, such as CMS or ALICE.
search_datasets
Performs a full-text query across all datasets, allowing combined filters for experiment, energy, and collision type.
search_documentation
Finds technical guides, policies, or help documents related to the dataset structure or analysis workflow.
search_software
Locates reconstruction software and specific analysis frameworks used with the data.
search_supplementaries
Searches supplementary materials and configuration files needed to reproduce published scientific analyses.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with CERN Open Data, then connect any of our 4,500+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,500+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
CERN Open Data MCP Server connects your AI client straight to the world's biggest particle physics dataset repository. Forget searching through messy web portals; you get direct access to 66,000+ datasets from Large Hadron Collider and LEP experiments. This isn't just a search engine—it's a full data workflow toolset.
First things first: make sure the connection is good before running anything big. Use check_cern_opendata_status to verify that your AI client can reach the CERN Open Data API, so you don't waste time on dead ends.
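For readers curious what that status check looks like underneath the agent, here is a minimal sketch. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The helper name `build_tool_call` is hypothetical, and the assumption that this tool takes no arguments is ours, not something the tool list above confirms.

```python
import json


def build_tool_call(name, arguments=None, request_id=1):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Assumption: tools without inputs accept an empty arguments object.
        "params": {"name": name, "arguments": arguments or {}},
    }


status_request = build_tool_call("check_cern_opendata_status")
print(json.dumps(status_request, indent=2))
```

In practice your MCP client builds and sends this message for you; the sketch only shows the shape of the call.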
To get oriented in this massive data jungle, start broad. You can run get_portal_statistics to see high-level stats across the whole repository, checking how many records exist by type and year. If you need context for your report, get_glossary pulls up definitions for technical jargon—you know, terms like 'luminosity' or 'b-tagging.' You can also get a map of the available research areas using list_categories, which lists every physics category and subcategory while telling you exactly how many datasets are in each bucket.
To see which major collaborations contributed data, run list_experiments to list every experiment like CMS, ATLAS, or ALICE, along with their dataset counts.
When you're ready to drill down into the actual data, you have a few ways to filter your search results. You can focus on specific physics topics using search_by_category, filtering by things like 'Exotica' or 'Higgs Physics.' If your research hinges on the collision parameters, narrow it down first with search_by_collision_type (like proton-proton 'pp' or electron-positron 'e+e-'), then use search_by_collision_energy to specify a precise energy like 13 TeV or 7 TeV.
You can also limit your search by targeting one specific collaboration using search_by_experiment. For the deepest dive, run search_datasets; this performs a full-text query across all datasets and lets you combine filters for experiment, energy, and collision type simultaneously.
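As a rough illustration of combining filters in one search_datasets call, the sketch below assembles the tool arguments and drops any filter you leave unset. The argument names (`query`, `experiment`, `collision_energy`, `collision_type`) and the value formats are assumptions made for illustration; check the tool's published input schema for the real ones.

```python
def search_datasets_arguments(query, experiment=None,
                              collision_energy=None, collision_type=None):
    """Assemble arguments for search_datasets, omitting unset filters."""
    # Assumption: these keys are hypothetical stand-ins for the real schema.
    candidate = {
        "query": query,
        "experiment": experiment,
        "collision_energy": collision_energy,
        "collision_type": collision_type,
    }
    return {key: value for key, value in candidate.items() if value is not None}


request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_datasets",
        "arguments": search_datasets_arguments(
            "Higgs", experiment="CMS", collision_energy="13TeV"
        ),
    },
}
print(request["params"]["arguments"])
```

Leaving out unset filters keeps the request minimal, so the server only sees the constraints you actually asked for.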
Finding specific records is super easy. If you only have a publication's Digital Object Identifier (DOI), use get_record_by_doi to find the associated open dataset record immediately. Alternatively, if you have a record ID, get_record pulls all the detailed metadata for that dataset—you get abstracts, authors, file counts, and direct links right off the bat.
Once you've got the metadata, you can check exactly what files are in play using list_record_files; this returns a list of every physical file associated with the record, giving you the size, checksum, and the specific URI you need for download.
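Since list_record_files returns a checksum for each file, one natural use is verifying a download. The sketch below assumes the checksum arrives as a string of the form `adler32:<hex>`; that format is an assumption, so adjust the parsing if the tool reports something different. The function name `adler32_matches` is hypothetical.

```python
import zlib


def adler32_matches(data: bytes, checksum: str) -> bool:
    """Compare downloaded bytes against a checksum like 'adler32:0a1b2c3d'."""
    algorithm, _, expected_hex = checksum.partition(":")
    if algorithm.lower() != "adler32":
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    actual = zlib.adler32(data) & 0xFFFFFFFF
    return actual == int(expected_hex, 16)


# Placeholder bytes standing in for a downloaded file.
payload = b"example file contents"
listed_checksum = f"adler32:{zlib.adler32(payload):08x}"

print(adler32_matches(payload, listed_checksum))
print(adler32_matches(payload + b"!", listed_checksum))
```

For multi-gigabyte files you would feed the data to `zlib.adler32` in chunks, passing the running value as the second argument, rather than loading everything into memory.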
When you're writing up your project, you won't just be dealing with data records. You can use search_documentation to find technical guides and policies about the dataset structure or analysis workflow itself. Need code? Run search_software to locate reconstruction software and specific analysis frameworks used alongside this data. If you need to reproduce a published study, search_supplementaries finds the supplementary materials and configuration files required for accurate reproduction.
And don't forget context. get_portal_statistics gives you an overall picture of the repository, with record counts broken down by experiment, record type, and year. If you need the details of a specific dataset and have a record ID rather than a DOI, get_record returns all the metadata you need to get started.
How CERN Open Data MCP Works
- 1 First, tell your AI client exactly what you're looking for. For example: 'Find all Dark Matter datasets from CMS.'
- 2 Next, the agent runs a targeted search (like search_by_category or search_by_experiment) and returns a list of candidate record IDs.
- 3 Finally, use get_record with one of those IDs to pull the full metadata, file structure, and author details you need.
The bottom line is... your AI agent handles the complex filtering logic across 66k datasets so you don't have to navigate a web portal manually.
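The three steps above can be sketched as a small search-then-fetch routine. Everything here is illustrative: `call_tool` stands for whatever function your MCP client exposes for invoking a tool, the argument names (`category`, `record_id`) and the result shape are assumptions, and `fake_call_tool` returns placeholder data rather than real CERN records.

```python
def find_and_fetch(call_tool, category, limit=3):
    """Search by category, then fetch full metadata for the first few hits."""
    hits = call_tool("search_by_category", {"category": category})
    record_ids = [hit["id"] for hit in hits[:limit]]
    return [call_tool("get_record", {"record_id": rid}) for rid in record_ids]


def fake_call_tool(name, arguments):
    """Stand-in transport returning placeholder data (not real records)."""
    if name == "search_by_category":
        return [{"id": 1}, {"id": 2}, {"id": 3}]
    if name == "get_record":
        return {"id": arguments["record_id"], "title": "placeholder"}
    raise ValueError(f"unknown tool: {name}")


records = find_and_fetch(fake_call_tool, "Dark Matter", limit=2)
print([record["id"] for record in records])  # [1, 2]
```

Passing the transport in as a parameter keeps the workflow testable without a live connection.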
Who Is CERN Open Data MCP For?
This is for the data scientist or physicist who gets frustrated trying to find specific, labeled physics datasets. You need to cut through months of institutional documentation and click-heavy portals to get directly to the metadata and file URIs.
You use it to quickly check if a required dataset (e.g., specific collision energy or experiment) exists, bypassing complex web forms.
You feed the agent target datasets using search_by_category and then use get_record to pull necessary metadata for labeling ML training data.
You run get_glossary first, then search for records by topic, ensuring you have accurate definitions and verifiable source links for your article.
What Changes When You Connect
- Find exact file locations: Using list_record_files gives you the checksum and direct URI for every single data file in a record. No guessing what to download.
- Pinpoint your search immediately: Use search_by_experiment (e.g., CMS) or search_by_category (e.g., Dark Matter) instead of running vague full-text searches.
- Understand the context: Run get_glossary to define terms like 'transverse momentum' before you even start writing your report, ensuring technical accuracy.
- Trace data back to publications: If you have a DOI, use get_record_by_doi. This immediately links published work back to its raw dataset source.
- Get the big picture: The get_portal_statistics tool gives you an immediate statistical map of the whole repository's composition by year or data type.
Real-World Use Cases
Need to reproduce a published result
A postdoc finds a paper mentioning 'CMS BTau primary dataset.' They run get_record_by_doi with the DOI. The agent returns the record ID and links, then they use list_record_files on that ID to get all necessary URIs for the raw data required by the publication.
Need a definition for a report
A science communicator needs to explain 'lepton number' in an article. They use get_glossary first, getting a precise definition and context from CERN before writing anything. This avoids using Wikipedia definitions.
Starting a general search for Dark Matter
A data scientist wants to start exploring 'Dark Matter.' They use search_by_category first, which gives them an estimated count (e.g., 13k datasets). Then they run search_by_experiment with CMS to narrow the search scope.
Investigating a specific collision type
A researcher is only interested in electron-positron collisions (e+e-) from the LEP era. They use search_by_collision_type and then combine that result with search_by_experiment to focus solely on DELPHI data.
The Tradeoffs
Using only full-text search
Running a simple query like 'Dark Matter collision' via the general search tool. This returns thousands of irrelevant results because it mixes metadata, documentation, and actual data points.
→
First, use search_by_category to limit the scope to relevant datasets (e.g., Exotic Particles). Then, refine with search_by_experiment or search_by_collision_energy for maximum precision.
Assuming data is available
Writing a script and hitting the download function without checking if the dataset exists. This results in connection errors and wasted compute time.
→
Always check status first with check_cern_opendata_status, then use list_experiments to confirm the collaboration, and finally use get_record on a specific ID.
Copying/pasting DOIs blindly
Just pasting a DOI into a general search field. The system might fail because it needs the dedicated tool endpoint.
→
Always use the get_record_by_doi tool. It's designed specifically to resolve that format and return the correct record ID.
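DOIs often get copied with a resolver URL or a `doi:` prefix attached, so it can help to clean them up before handing them to get_record_by_doi. The sketch below is one way to do that. Whether the tool itself tolerates URL-form DOIs is unknown, the helper name `normalize_doi` is hypothetical, and the DOI in the example is a made-up placeholder, not a real record.

```python
import re

# Loose shape check: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and common prefixes, returning a bare DOI."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


# Placeholder DOI for illustration only.
print(normalize_doi(" https://doi.org/10.7483/OPENDATA.CMS.EXAMPLE "))
```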
When It Fits, When It Doesn't
Use this MCP Server if your primary need is scientific data retrieval: you must locate specific datasets, validate complex technical terms, or trace raw files back to a publication. It is not the right tool for browsing general physics news. If all you need is a quick definition, get_glossary on its own covers that. But if you have a specific scientific problem (e.g., 'I need all CMS datasets at 13 TeV related to Higgs bosons'), this server is a strong fit because it combines search filters (search_by_experiment, search_by_collision_energy) with deep record access (get_record).
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by CERN Open Data. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 16 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Finding the right dataset shouldn't require three different portals.
Today, if you need a specific data set—say, CMS records from 13 TeV for Dark Matter—you might start on one portal for experiments, leave to another site to check collision energy parameters, and then use a third resource just to find the official glossary definition. You end up copying IDs between three different tabs, losing context every time.
With this MCP server, you run one query through your agent: 'Get all Dark Matter datasets from CMS at 13 TeV.' The agent handles the cross-referencing using `search_by_experiment`, `search_by_collision_energy`, and `search_by_category`. You get a single, filtered list of records; no tab switching required.
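How the agent actually cross-references the three searches is not documented here. One plausible approach, sketched below, is to intersect the record IDs each search returns. The function name `intersect_record_ids` is hypothetical and the IDs are placeholders, not real records.

```python
def intersect_record_ids(*result_lists):
    """Return IDs present in every list, in the order of the first list."""
    if not result_lists:
        return []
    common = set(result_lists[0])
    for ids in result_lists[1:]:
        common &= set(ids)
    return [rid for rid in result_lists[0] if rid in common]


# Placeholder IDs standing in for the three search results.
by_experiment = [101, 102, 103, 104]   # e.g. search_by_experiment
by_energy = [102, 103, 105]            # e.g. search_by_collision_energy
by_category = [103, 102, 999]          # e.g. search_by_category

print(intersect_record_ids(by_experiment, by_energy, by_category))  # [102, 103]
```

If search_datasets accepts all three filters in a single call, that is likely simpler than intersecting separate result sets.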
CERN Open Data MCP Server: Get the complete file manifest for any record.
Manually downloading data is worse than it seems. You find a record ID, and then you have to guess which files are relevant—is it the AOD format? The ROOT files? Do you need the checksums for integrity checks? You're stuck looking at vague file listings.
Now, running `list_record_files` gives you everything in one shot: filename, size, checksum, and the direct URI. It's a complete inventory that lets your agent prepare all necessary data paths for download or analysis.
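To give a feel for what you can do with that inventory, the sketch below totals file counts and bytes per file extension. The entry keys (`filename`, `size`) are assumptions about the result shape, and the sample entries are placeholders rather than real files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_manifest(files):
    """Group file entries by extension, counting files and summing bytes."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for entry in files:
        extension = PurePosixPath(entry["filename"]).suffix.lower() or "(none)"
        totals[extension]["count"] += 1
        totals[extension]["bytes"] += entry["size"]
    return dict(totals)


# Placeholder manifest for illustration only.
manifest = [
    {"filename": "events_1.root", "size": 1000},
    {"filename": "events_2.root", "size": 2500},
    {"filename": "file_index.txt", "size": 120},
]

print(summarize_manifest(manifest))
```

A summary like this lets you estimate disk space before downloading anything.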
Common Questions About CERN Open Data MCP
How do I find datasets by experiment using search_by_experiment?
You just specify the experiment name, like 'CMS' or 'ATLAS'. The tool returns a list of all related dataset IDs and their counts, letting you focus only on the collaboration you need.
What if I only have a DOI? Should I use get_record_by_doi?
Yes, absolutely. Use get_record_by_doi immediately. It resolves the published reference (the DOI) directly to the corresponding open data record ID, skipping all manual searching.
Which tool should I use for general dataset discovery?
Start with search_datasets. This is your primary tool because it accepts a full-text query and lets you combine that text search with filters like collision type or energy.
Can I get definitions for physics terms using the get_glossary tool?
Yes. get_glossary is built specifically to define complex particle physics concepts—like 'luminosity'—giving you a precise, authoritative definition.
Before I run a major search query, how do I verify connectivity using `check_cern_opendata_status`?
You use it to confirm the API link is live before querying. This tool verifies CERN Open Data's connection status against the InvenioRDM framework. It tells you if the portal is accepting requests, so you don't waste time on failed queries.
What can I learn about the overall scope of datasets using `get_portal_statistics`?
It returns high-level statistics across all facets. This lets you see counts by experiment, data format, and years available. You get a full picture of what types of records exist in the repository.
After I find a record ID, how do I use `list_record_files` to check the actual file formats?
It lists every physical data component associated with that dataset. You get filenames, byte sizes, and crucial ROOT/EOS URIs for direct access. This shows you exactly what you're downloading.
I need to reproduce a published result; where do I look for guides using `search_documentation`?
This tool retrieves official guides, policies, and technical instructions. Use it to find out how specific datasets were generated or what analysis workflows are required. It’s essential reading for reproduction.
Do I need an API key to use this server?
No. The CERN Open Data Portal API is completely public and requires no authentication. Simply subscribe to this server and enter any placeholder value in the API key field to start querying particle physics datasets immediately.
What kind of data can I access from CERN?
You can access over 66,000 datasets from major LHC experiments (CMS, ATLAS, ALICE, LHCb) and legacy experiments (DELPHI, OPERA). This includes real collision data, Monte Carlo simulations, derived datasets, analysis software, physics glossary entries, and detailed documentation. Data covers Higgs boson searches, Dark Matter studies, exotic particle searches, heavy-ion physics, and more.
Can I use CERN data for machine learning projects?
Absolutely. CERN provides labeled datasets specifically designed for ML applications, including particle identification, jet classification, event reconstruction, and anomaly detection. Use the search tools with queries like 'machine learning' or filter by file type 'csv' or 'nanoaodsim' to find ML-ready formats. The CMS experiment alone has published thousands of simulated datasets with known physics labels.