Vinkius
Ragas

Ragas MCP for AI. Run RAG evaluations and track metrics from your chat.

Claude
ChatGPT
Cursor
Gemini
Windsurf
VS Code
JetBrains
Vercel

Works with every AI agent you already use

…and any MCP-compatible client


Connect to your AI in seconds.

Ragas lets your AI client manage professional RAG evaluation and tracking directly inside your chat or IDE. It provides specialized tools to list datasets, run evaluations against LLM pipelines, fetch detailed metrics like faithfulness, and track experiment versions without needing a separate dashboard.

What your AI can do

List available datasets

The agent calls list_datasets to retrieve the names and IDs of all evaluation datasets configured in your Ragas project.

Get specific dataset details

You use get_dataset to pull metadata for a single dataset, checking its schema or required parameters before an evaluation run.

Start a new RAG pipeline evaluation

The agent executes run_evaluation, kicking off the scoring process against a specified dataset and model configuration.

Find experiment history

You ask the client to run list_experiments to see all past evaluation runs associated with a given dataset ID.

Retrieve final test scores

The agent calls get_results to pull the summarized, aggregate performance score for a completed experiment.

List all measurable metrics

You use list_metrics to check which scoring dimensions (e.g., faithfulness, answer relevancy) are available for reporting.


Ragas MCP Server: 7 Tools for RAG Evaluation

These tools let your agent handle the full lifecycle of RAG testing: listing data, running tests, and retrieving verifiable performance metrics.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.


List Datasets

Lists all available datasets used for RAG testing in your project.

Get Dataset

Retrieves specific metadata for one evaluation dataset ID.

List Experiments

Shows a list of past experiments tied to a specific dataset ID.

Get Experiment

Gets detailed information about a single, recorded experiment run.

Run Evaluation

Initiates a new Ragas evaluation run on the specified dataset ID.

List Metrics

Outputs every scoring dimension available for RAG evaluation (e.g., faithfulness, relevancy).

Get Results

Retrieves the final scoring metrics and outcomes from a completed evaluation run.
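The Ragas library typically summarizes a run as the per-metric mean of its per-sample scores. The sketch below shows that kind of aggregation, assuming scores arrive as a list of dicts keyed by metric name and that an unscored sample is `None` or NaN and gets skipped. The exact shape `get_results` returns is not documented on this page, so treat the input format as hypothetical.

```python
from math import isnan
from typing import Optional

def aggregate(samples: list[dict[str, Optional[float]]]) -> dict[str, float]:
    """Average each metric over the samples that have a score for it.

    Assumption: missing (None) or NaN scores are skipped, not counted as 0.
    """
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for sample in samples:
        for metric, score in sample.items():
            if score is None or isnan(score):
                continue  # unscored sample: leave it out of the mean
            totals[metric] = totals.get(metric, 0.0) + score
            counts[metric] = counts.get(metric, 0) + 1
    return {metric: totals[metric] / counts[metric] for metric in totals}

# Hypothetical per-sample scores for a three-question run.
scores = [
    {"faithfulness": 1.0, "answer_relevancy": 0.9},
    {"faithfulness": 0.5, "answer_relevancy": None},
    {"faithfulness": 0.75, "answer_relevancy": 0.7},
]
print(aggregate(scores))
```

Skipping unscored samples, rather than treating them as zero, keeps one failed judgment from dragging down the whole run's average.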

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so no manual routing is required.

Claude AI


1. Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

2. Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

3. Start a conversation

Open a new chat. The Ragas integration is available immediately — no restart needed.
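If you script your client configuration instead of pasting the endpoint URL from step 2 by hand, a small helper can assemble it and catch the most common mistake: leaving the placeholder in place. This is a minimal sketch. Percent-encoding the token is a defensive assumption, since this page does not describe the token's character set, and `VINKIUS_TOKEN` is a hypothetical environment variable name.

```python
import os
from urllib.parse import quote

# Endpoint template from the setup steps; only the token segment changes.
ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"
PLACEHOLDER = "[YOUR_TOKEN_HERE]"

def build_endpoint(token: str) -> str:
    """Return the Vinkius MCP endpoint URL for a token, refusing the placeholder."""
    token = token.strip()
    if not token or token == PLACEHOLDER:
        raise ValueError("Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com")
    # Percent-encode the token so it always forms a single, safe path segment.
    return ENDPOINT_TEMPLATE.format(token=quote(token, safe=""))

if __name__ == "__main__":
    # VINKIUS_TOKEN is a hypothetical variable name; any secret store works.
    print(build_endpoint(os.environ.get("VINKIUS_TOKEN", "example-token")))
```

Reading the token from the environment keeps it out of files you might commit.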

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

  • Import from OpenAPI, Swagger, or YAML specs
  • Create Agent Skills with progressive disclosure
  • Deploy to edge with MCPFusion framework
  • Built-in DLP, auth, and compliance on every call
  • Real-time usage dashboard and cost metering
  • Publish to catalog or keep private

Make Your AI Do More

Start with Ragas, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

  • Use this MCP plus 5,100+ others, all in one place
  • Add new capabilities to your AI anytime you want
  • Every connection is secured and compliant automatically
  • Track usage and costs across all your servers
  • Works with Claude, ChatGPT, Cursor, and more
  • New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Ragas. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

  • Cloud Hosted: managed infra
  • V8 Isolated: sandboxed per request
  • Zero-Trust Proxy: no stored credentials
  • DLP Enforced: policy on every call
  • GDPR Compliant: EU data residency
  • Token Compression: ~60% cost reduction

Your data is protected.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
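At the protocol level, that standardization is JSON-RPC 2.0. A client discovers what a server offers by sending a `tools/list` request and reading the `name` of each tool in the result. The following is a minimal sketch of both halves using only the standard library. The sample response is illustrative. It is trimmed to the fields used here (real entries also carry a description and an `inputSchema`) and lists the seven tool names from this page.

```python
import json

def tools_list_request(request_id: int) -> str:
    # MCP messages are JSON-RPC 2.0; "tools/list" asks the server for its tools.
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})

def tool_names(response_text: str) -> list[str]:
    # A successful response carries result.tools, each entry with a "name".
    response = json.loads(response_text)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    return [tool["name"] for tool in response["result"]["tools"]]

# Illustrative response, trimmed to the fields this sketch reads.
SAMPLE_RESPONSE = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": n} for n in [
        "list_datasets", "get_dataset", "list_experiments", "get_experiment",
        "run_evaluation", "list_metrics", "get_results",
    ]]},
})

print(tools_list_request(1))
print(tool_names(SAMPLE_RESPONSE))
```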

This connection provides 7 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Testing LLMs shouldn't require context switching or boilerplate code.

Today, running a proper RAG evaluation means navigating to a separate dashboard. You upload the dataset there, click 'Run Test,' wait for it to process, then download a CSV of scores. If you want to compare two models, you repeat that entire cycle—copying IDs, remembering which score is faithfulness, and pasting everything into a spreadsheet.

With this MCP server, your agent handles all the tedious parts. You simply talk to your client: 'Run Model B against the Legal Q1 Test.' The agent executes `run_evaluation`, pulls back the detailed metrics via tools like `get_results`, and shows you the clean numbers right in your chat window. No dashboard hops required.
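Under the hood, an MCP client turns a request like that into a JSON-RPC 2.0 `tools/call` message that names the tool and passes its arguments as a JSON object. The following is a minimal sketch of that message. The argument names `dataset_id` and `model` and their values are hypothetical, because this page does not document `run_evaluation`'s input schema. The real names come from the `inputSchema` the server advertises.

```python
import json
from typing import Any

def tools_call_request(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    # MCP "tools/call": params carry the tool name and its arguments object.
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical arguments for "Run Model B against the Legal Q1 Test".
request = tools_call_request(
    7,
    "run_evaluation",
    {"dataset_id": "legal-q1-test", "model": "model-b"},
)
print(request)
```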

Ragas MCP Server: Get structured RAG evaluation results.

Manual testing means running scripts locally, then updating a central tracking sheet by hand with the final score and model version. It's slow, prone to human error, and makes comparing multiple runs tedious.

Now, your client controls this entire process. You use `list_datasets` for discovery, trigger tests with `run_evaluation`, and finally retrieve structured data using `get_results`. The whole measurement chain is automated, verifiable, and right where you work.

What your AI can actually do with this

Ragas gives your AI client professional-grade Retrieval-Augmented Generation (RAG) evaluation and tracking right inside your chat or IDE. It's built to let you manage datasets and measure how well your LLM pipelines actually perform, all without a separate dashboard. You don't have to leave your workflow just to check scores.

If you need to get started, the first thing your agent calls is list_datasets. This action shows you every dataset ID configured for RAG testing in your project. Once you know which data pool you're working with, you can use get_dataset to pull specific metadata for a single ID; this lets you check things like schema details or required parameters before you kick off any evaluation run.

When it comes time for the test itself, you first need to know which metrics to measure. Call list_metrics to get every scoring dimension Ragas can report on, such as faithfulness and answer relevancy. Your agent then executes run_evaluation, which starts the full scoring process against a specific dataset ID and model configuration.

Once the evaluation finishes, you use get_results to pull the summary: it gives you the final, aggregate performance score for that entire run. But if you need to track how your models change over time, you can ask the client to look at experiment history. By calling list_experiments, you see a record of every past evaluation run tied back to a specific dataset ID.

If you're digging into the specifics of one of those old tests, get_experiment pulls all the detailed information about that single recorded run.

Basically, when you check on your RAG process, you use these tools in sequence: list the datasets that exist, check a dataset's parameters, list the metrics available for scoring, start the evaluation run, and then grab the final scores or dig into the history of past runs.
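That sequence can also be written as plain control flow. In this sketch, `call_tool` stands in for whatever MCP client you use. `fake_call_tool` returns canned, made-up data so the example runs offline. Every argument and result field name (`datasets`, `id`, `dataset_id`, `metrics`, `experiment_id`) is an assumption, not the server's documented schema.

```python
from typing import Any, Callable

# A tool caller takes a tool name plus arguments and returns the parsed result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]

def evaluate_first_dataset(call_tool: ToolCaller) -> dict[str, Any]:
    """Discover, inspect, pick metrics, run, then fetch results."""
    datasets = call_tool("list_datasets", {})["datasets"]
    dataset_id = datasets[0]["id"]  # first dataset, purely for illustration
    call_tool("get_dataset", {"dataset_id": dataset_id})  # inspect before running
    metrics = call_tool("list_metrics", {})["metrics"]
    run = call_tool("run_evaluation", {"dataset_id": dataset_id, "metrics": metrics})
    return call_tool("get_results", {"experiment_id": run["experiment_id"]})

calls: list[str] = []  # records the order in which tools were invoked

def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Offline stand-in for an MCP client; every response here is made up."""
    calls.append(name)
    canned = {
        "list_datasets": {"datasets": [{"id": "ds-1", "name": "Legal Q1 Test"}]},
        "get_dataset": {"id": arguments.get("dataset_id")},
        "list_metrics": {"metrics": ["faithfulness", "answer_relevancy"]},
        "run_evaluation": {"experiment_id": "exp-1"},
        "get_results": {"experiment_id": arguments.get("experiment_id")},
    }
    return canned[name]

print(evaluate_first_dataset(fake_call_tool))
print(calls)
```

If `run_evaluation` returns before scoring is finished, a real workflow would check the run with `get_experiment` until it completes before calling `get_results`. This page does not say which behavior the server uses.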

Built · Hosted · Managed by Vinkius

Ragas MCP Server: Evaluate RAG Models & Metrics
Server ID 019d75fc-3898-7169-9831-0da3f7c25d5a
Vinkius Inspector: Compliance Grade A+, Score 100/100

Questions you might have

How do I check if my dataset list is up to date using list_datasets?

You call the list_datasets tool. This command retrieves all current datasets associated with your project ID, letting you confirm which versions are available for testing.

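I need to compare two models. Do I use get_results or list_experiments?

Use list_datasets first to find the dataset. Then run each model separately with run_evaluation. Finally, call get_results for each run's ID to pull the final scores and compare them. Use list_experiments to find those run IDs later, and get_experiment if you need the full details of either run.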
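Once you have the final scores for both runs, the comparison itself is a per-metric difference. This is a sketch. It assumes each run's scores come back as a flat mapping from metric name to number, which is a guess, because the `get_results` payload shape is not documented here. Only metrics that both runs report are compared, and the scores are made up.

```python
def compare_runs(run_a: dict[str, float], run_b: dict[str, float]) -> dict[str, float]:
    """Return run_b minus run_a for every metric both runs report."""
    shared = sorted(run_a.keys() & run_b.keys())
    return {metric: run_b[metric] - run_a[metric] for metric in shared}

# Hypothetical aggregate scores for two models on the same dataset.
model_a = {"faithfulness": 0.75, "answer_relevancy": 0.5}
model_b = {"faithfulness": 0.875, "answer_relevancy": 0.25, "context_recall": 0.5}

for metric, delta in compare_runs(model_a, model_b).items():
    print(f"{metric}: {delta:+.3f}")
```

For metrics such as faithfulness and answer relevancy, where higher is better, a positive delta means model B scored higher.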

What is the difference between get_results and list_experiments?

list_experiments shows you a history of runs (the metadata). get_results pulls the actual, final calculated scores for one specific run ID.

Can I use list_metrics to see which metrics are available before I run an evaluation?

Yes. Running list_metrics shows every scoring dimension (like faithfulness) that Ragas can calculate, helping you know exactly what numbers to look for in the final report.
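To show what one of those numbers means, the Ragas documentation defines faithfulness as the share of claims in a generated answer that the retrieved context supports:

```latex
\text{Faithfulness} = \frac{\lvert \{\text{claims in the answer supported by the retrieved context}\} \rvert}{\lvert \{\text{claims in the answer}\} \rvert} \in [0, 1]
```

For example, an answer that makes three claims, two of which the context backs up, scores 2/3 ≈ 0.67.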

How do I authenticate my AI agent before using `list_datasets`?

You must provide your Ragas Application URL and a generated token. The client uses these credentials to validate access immediately, ensuring the agent has proper permissions for any read operation like listing datasets.

If I run an evaluation with `run_evaluation` and it fails, how do I debug the error?

The system response includes a detailed stack trace or specific error code. Check this output first; it points directly to input data issues or configuration problems within your Ragas setup that need correcting.
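Under the Model Context Protocol, a failure can come back in one of two places. Protocol-level problems, such as an unknown tool or a malformed request, arrive as a JSON-RPC `error` object. If the tool itself fails, the server returns a normal result with `isError` set to true and the message in its `content`. The following is a minimal sketch that extracts a readable message from either shape. The sample payloads are made up.

```python
import json
from typing import Optional

def error_message(response_text: str) -> Optional[str]:
    """Return a human-readable error from an MCP tools/call response, or None."""
    response = json.loads(response_text)
    if "error" in response:  # JSON-RPC protocol error
        err = response["error"]
        return f"protocol error {err.get('code')}: {err.get('message')}"
    result = response.get("result", {})
    if result.get("isError"):  # the tool ran but reported a failure
        texts = [
            item.get("text", "")
            for item in result.get("content", [])
            if item.get("type") == "text"
        ]
        return "tool error: " + " ".join(texts)
    return None  # no error in this response

tool_failure = json.dumps({
    "jsonrpc": "2.0", "id": 3,
    "result": {"isError": True, "content": [{"type": "text", "text": "dataset not found"}]},
})
unknown_tool = json.dumps({
    "jsonrpc": "2.0", "id": 4,
    "error": {"code": -32602, "message": "Unknown tool: run_evaluatoin"},
})
print(error_message(tool_failure))
print(error_message(unknown_tool))
```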

When using `get_dataset`, are there specific document formats required for optimal performance?

The system handles standard text inputs, but structured data performs best. Make sure your source documents include clear metadata fields (like 'source' or 'date') so Ragas can accurately attribute scores when you later use the results.

Is there a rate limit for how many evaluations I can run using `run_evaluation`?

While specific limits vary by subscription tier, running multiple evaluations is generally fine. If you hit an API call threshold error, check the server logs; they will flag whether you've exceeded usage quotas.

How do I get an App Token for Ragas?

Log in to your Ragas dashboard. In your project's settings or dedicated security section, you can generate a new Application Token. Copy it immediately, as it may only be shown once.

What format is required to upload a dataset?

The MCP wrapper accepts datasets as arrays of records. When passing data, the AI maps each record's question, ground_truth, and contexts fields directly onto the columns Ragas expects.
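The sketch below shows records in that shape, along with a check that each record has the three fields named above before you hand the data to the agent. The field types (a string question and ground truth, plus a list of context strings) follow common Ragas conventions but are an assumption about what this MCP wrapper accepts. `validate_records` is a hypothetical helper that is not part of Ragas or the server, and the records are made up.

```python
REQUIRED_FIELDS = ("question", "ground_truth", "contexts")

def validate_records(records: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every record looks usable.

    Assumption: contexts is a list of strings. Metrics such as faithfulness
    also need the generated answer, so add an "answer" field if your setup requires it.
    """
    problems = []
    for i, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
            continue
        contexts = record["contexts"]
        if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
            problems.append(f"record {i}: contexts must be a list of strings")
    return problems

records = [
    {
        "question": "What is the notice period in the lease?",
        "ground_truth": "Sixty days.",
        "contexts": ["Either party may terminate with sixty (60) days' written notice."],
    },
    {"question": "Who signed the lease?", "contexts": "Signed by the tenant."},
]
print(validate_records(records))
```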

Does the server evaluate prompts automatically during testing?

Yes. When you trigger an evaluation, Ragas computes its own metrics (such as faithfulness and answer relevancy) internally. The MCP server simply passes the resulting reports back to your chat.

Built & Managed by Vinkius · 30s setup · 7 tools

We've already built the connector for Ragas. Just plug in your AI agents and start using Vinkius.

No hosting. No infrastructure. No complex setup.
All 7 tools are live and waiting. You're up and running in seconds.


Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.

  • Zero hosting required
  • Full MCP catalog included
  • Enterprise-grade security
  • Auto-updated by Vinkius

Built, hosted, and secured by Vinkius. You just connect and go.