How to Use the CERN Open Data MCP in LlamaIndex

Q: How do I use LlamaIndex to compare datasets from different experiments in CERN Open Data?

First, use the `search_by_experiment` tool for each experiment you care about (e.g., 'ATLAS', 'CMS') and index the results. Then, you can ask natural language questions to compare the indexed metadata, like 'Compare the number of 13 TeV datasets between CMS and ATLAS'.
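For a count-style comparison like the 13 TeV example, it can help to tally the stored metadata directly rather than rely on retrieval alone. The following is a minimal sketch, assuming you kept one dictionary per record; the field names (`experiment`, `collision_energy`) and the sample rows are hypothetical placeholders, not the schema returned by `search_by_experiment`.

```python
from collections import Counter

# Hypothetical metadata rows with placeholder titles. The field names are
# assumptions for this sketch, not the MCP server's actual schema.
records = [
    {"experiment": "CMS", "title": "Example dataset A", "collision_energy": "13TeV"},
    {"experiment": "CMS", "title": "Example dataset B", "collision_energy": "8TeV"},
    {"experiment": "ATLAS", "title": "Example dataset C", "collision_energy": "13TeV"},
    {"experiment": "CMS", "title": "Example dataset D", "collision_energy": "13TeV"},
]


def count_by_experiment(rows, energy):
    """Count records per experiment whose collision energy matches `energy`."""
    # Normalize spacing and case so "13 TeV" and "13TeV" compare equal.
    wanted = energy.replace(" ", "").lower()
    counts = Counter()
    for row in rows:
        value = str(row.get("collision_energy", "")).replace(" ", "").lower()
        if value == wanted:
            counts[row["experiment"]] += 1
    return dict(counts)


counts_13tev = count_by_experiment(records, "13 TeV")
print(counts_13tev)
```

An exact tally like this complements the vector index: the index answers fuzzy questions about content, while a plain count avoids asking the LLM to do arithmetic over retrieved chunks.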

Q: Is it possible to build a LlamaIndex RAG app over CERN Open Data that explains physics concepts?

Absolutely. Your agent can use `get_record` to fetch dataset abstracts and index them. When you ask a question, it can also use the `get_glossary` tool to pull in definitions for technical terms, grounding its answer in both the dataset context and the official glossary.

Q: How can I find the original paper for a dataset using LlamaIndex?

After your agent finds a dataset, it should use the `get_record` tool to get its DOI. Then, it can call `get_record_by_doi` to confirm the link or even use another tool to fetch the paper from an academic search engine.

Build RAG apps with LlamaIndex that index and query live particle physics data from CERN.

Connect CERN Open Data MCP to LlamaIndex

Create your Vinkius account to connect CERN Open Data to LlamaIndex and route execution through our secure gateway. The platform manages server hosting, runtime updates, and security layers. Configuration requires no manual server provisioning.

Index CERN Data as a Knowledge Base

Don't just call an API; index its output. With LlamaIndex, you can run `search_datasets` for 'dark matter' and feed the results directly into a vector index. Those dataset titles and abstracts then become part of a searchable knowledge base your agent can query. The same approach works for any tool in this MCP server: run `list_experiments` or `get_portal_statistics` and index the results. You're building a local, queryable snapshot of the CERN portal's structure, grounded in the data the portal returned at the time you fetched it.
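Before tool output can go into a vector index, it usually has to be reshaped into text plus metadata. The sketch below shows one way to do that with the standard library only. The payload shape (`hits`, `recid`, `title`, `abstract`) is an assumption made for illustration; the real output of `search_datasets` may be structured differently.

```python
import json

# Hypothetical payload with placeholder content. The real shape of the
# `search_datasets` result may differ from this assumed structure.
raw_result = json.dumps({
    "hits": [
        {"recid": 1, "title": "Example dataset A",
         "abstract": "Placeholder abstract about dark matter searches."},
        {"recid": 2, "title": "Example dataset B", "abstract": ""},
    ]
})


def to_index_records(payload):
    """Turn a JSON tool result into text/metadata records ready for indexing."""
    records = []
    for hit in json.loads(payload).get("hits", []):
        title = hit.get("title", "").strip()
        abstract = hit.get("abstract", "").strip()
        # Join whatever text is available; skip hits with nothing to embed.
        text = "\n\n".join(part for part in (title, abstract) if part)
        if not text:
            continue
        records.append({"text": text, "metadata": {"recid": hit.get("recid")}})
    return records


index_records = to_index_records(raw_result)
print(len(index_records), index_records[0]["metadata"])
```

Each record maps onto a `Document(text=..., metadata=...)` from `llama_index.core`, and `VectorStoreIndex.from_documents(...)` can then build the index once an embedding model is configured. Keeping the record ID in the metadata lets the agent follow up with live tool calls later.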

Query Your Indexed Physics Data

Here's the difference: instead of asking your agent to *find* a dataset again, you just ask a question. 'Which CMS experiment datasets mention top quark pair production?' LlamaIndex turns your question into a vector search against the data you already indexed. This gets you answers based on the actual contents of record abstracts you've fetched with tools like `get_record`. It's faster and stops your agent from making redundant API calls for information it's already seen.
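To make the idea of "a vector search against the data you already indexed" concrete, here is a toy sketch. It swaps real embeddings for a bag-of-words vector and uses cosine similarity for ranking, so it is a stand-in for the technique, not what LlamaIndex does internally. The record IDs and abstracts are placeholders.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Placeholder abstracts standing in for text fetched earlier with `get_record`.
indexed = {
    101: "Simulated top quark pair production events in the CMS detector",
    102: "Muon calibration data recorded during a cosmic ray run",
    103: "Search for dark matter in events with missing transverse energy",
}
# Embed once at indexing time; queries reuse these vectors.
vectors = {recid: embed(text) for recid, text in indexed.items()}


def search(question, top_k=1):
    """Rank already-indexed records against the question; no API call involved."""
    q = embed(question)
    ranked = sorted(vectors, key=lambda recid: cosine(q, vectors[recid]), reverse=True)
    return ranked[:top_k]


best = search("Which CMS datasets mention top quark pair production?")
print(best)
```

The point of the sketch is the division of labor: embedding happens once when records are indexed, and each question only costs a local similarity ranking.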

Augment Queries with Real-Time Tools

LlamaIndex combines indexed knowledge with live tool calls. Your agent might find a relevant record ID from its index, then use the `list_record_files` tool to get a fresh list of the data files inside that record right now. It can also use tools to enrich its answers. If a query result contains a term like 'leptoquark', the agent can automatically call `get_glossary` to provide a definition alongside the dataset information. You get a complete picture, mixing stored knowledge with live API data.
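The mix of stored knowledge and live calls can be sketched as a small merge step. In the example below the two `fake_*` functions are hypothetical stand-ins for the `list_record_files` and `get_glossary` tools, and the record, file name, and definition are placeholders. In a real agent the LLM decides when to call the tools; this only illustrates how the pieces combine.

```python
import re


# Hypothetical stand-ins for live MCP tool calls; they return placeholder data.
def fake_list_record_files(recid):
    return [f"record-{recid}-file-1.root"]


def fake_get_glossary(term):
    definitions = {"leptoquark": "placeholder definition"}
    return definitions.get(term.lower())


def enrich(hit, glossary_terms, list_files, define):
    """Merge a stored index hit with fresh file listings and glossary entries."""
    words = set(re.findall(r"[a-z]+", hit["text"].lower()))
    # Only look up glossary terms that actually occur in the stored text.
    found = sorted(term for term in glossary_terms if term.lower() in words)
    return {
        "recid": hit["recid"],
        "text": hit["text"],                # from the local index
        "files": list_files(hit["recid"]),  # fetched live
        "definitions": {term: define(term) for term in found},  # fetched live
    }


stored_hit = {"recid": 7, "text": "Search for leptoquark pair production"}
answer = enrich(stored_hit, ["leptoquark", "pileup"],
                fake_list_record_files, fake_get_glossary)
print(answer["files"], answer["definitions"])
```

Passing the tool calls in as arguments keeps the merge logic testable offline and makes it obvious which parts of the answer come from the snapshot and which are fetched fresh.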

Setup guide

Set up CERN Open Data MCP in LlamaIndex

Prerequisites

Python 3.10+ installed
llama-index-tools-mcp package
Active Vinkius subscription with a valid endpoint token

1. Install dependencies
Run `pip install llama-index-tools-mcp llama-index-llms-openai`. The MCP tools package provides `BasicMCPClient` and `McpToolSpec`.

2. Connect with BasicMCPClient
Point `BasicMCPClient` to your Vinkius endpoint URL. Replace `[YOUR_TOKEN_HERE]` with your token from cloud.vinkius.com. The client supports SSE and Streamable HTTP transports.

3. Convert to LlamaIndex tools
Call `mcp_tool_spec.to_tool_list_async()` to convert all CERN Open Data MCP tools into native `FunctionTool` objects that any LlamaIndex agent can use.

4. Run with any LLM
Create a `FunctionAgent` with the tools and your preferred LLM. Swap OpenAI for Anthropic, Gemini, or any LlamaIndex-supported provider.

agent.py

import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def main():
    # Connect to the MCP server through the Vinkius endpoint
    mcp_client = BasicMCPClient(
        "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
    )
    mcp_tool_spec = McpToolSpec(client=mcp_client)

    # Convert MCP tools to LlamaIndex FunctionTool objects
    tools = await mcp_tool_spec.to_tool_list_async()

    # Create and run the agent (OpenAI reads OPENAI_API_KEY from the environment)
    agent = FunctionAgent(
        tools=tools,
        llm=OpenAI(model="gpt-4o"),
        system_prompt="You have access to CERN Open Data tools.",
    )
    response = await agent.run("List recent CERN Open Data datasets")
    print(response)


# `await` only works inside a coroutine, so run main() with asyncio
if __name__ == "__main__":
    asyncio.run(main())
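Since the endpoint URL embeds your token, you may prefer to keep the token out of source control. A minimal sketch, assuming the URL pattern shown above; the environment variable name `VINKIUS_TOKEN` and the helper `vinkius_endpoint` are hypothetical names chosen for this example, not part of the platform.

```python
import os


def vinkius_endpoint(token=None):
    """Build the MCP endpoint URL, reading the token from the environment by default."""
    # VINKIUS_TOKEN is a hypothetical variable name chosen for this sketch.
    token = token or os.environ.get("VINKIUS_TOKEN", "")
    # Reject a missing token or the unreplaced placeholder from the docs.
    if not token or token == "[YOUR_TOKEN_HERE]":
        raise ValueError("Set VINKIUS_TOKEN to your Vinkius endpoint token")
    return f"https://edge.vinkius.com/{token}/mcp"


url = vinkius_endpoint("example-token")
print(url)
```

The returned string can be passed to `BasicMCPClient` in place of the hardcoded URL, and the check catches the common mistake of leaving the placeholder in.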

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by CERN Open Data. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

Why Choose Vinkius

Vinkius connects your tools to AI with real-time monitoring and automatic cost savings — all from one dashboard.

Real-time monitoring
Live visibility into every interaction. Connect your favorite tools to your AI and see exactly what's happening: every request, every response, in real time.

Built-in savings
60% lower AI costs. Vinkius compresses data between your apps and your AI automatically. Lower bills every month, with no configuration required.

Single dashboard
One place for every integration. Every tool your AI connects to, managed from a single screen. One account, complete control.

Common questions about CERN Open Data MCP in LlamaIndex

Yes. You can set up a LlamaIndex agent to periodically call `search_documentation` and ingest the titles and abstracts into a vector index. This makes all the guides, policies, and reports semantically searchable.

The requests to the MCP server, such as your search terms or DOI lookups, pass through a dedicated V8 Isolate. This sandboxed environment handles the API call and is destroyed immediately afterward, ensuring none of your query data is stored or analyzed.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)
Google ADK (Python)
Pydantic AI (Python)
Vercel AI SDK (TypeScript)