Vinkius
Braintrust

Braintrust MCP for AI. Prove model quality with systematic evaluation.

Claude
ChatGPT
Cursor
Gemini
Windsurf
VS Code
JetBrains
Vercel

Works with every AI agent you already use

…and any MCP-compatible client

  • Braintrust MCP on Cursor AI Code Editor
  • Braintrust MCP on Claude Desktop App
  • Braintrust MCP on OpenAI Agents SDK
  • Braintrust MCP on Visual Studio Code
  • Braintrust MCP on GitHub Copilot AI Agent
  • Braintrust MCP on Google Gemini AI
  • Braintrust MCP on Lovable AI Development
  • Braintrust MCP on Mistral AI Agents
  • Braintrust MCP on Amazon AWS Bedrock

Connect to your AI in seconds.

Braintrust helps developers systematically test and validate LLMs. You manage projects, track prompt versions, run complex benchmark experiments, and query structured 'Ground Truth' data—all within one place.

Stop guessing if your model works; prove it.

What your AI can do

Create experiment

Records a new historical experiment trace to track LLM pipeline tests.

Create project

Sets up a new project environment for tracking AI evaluations and data sets.

List datasets

Lists available 'Ground Truth' text banks used for automated evaluation scoring.

+ 7 more capabilities included
Track Model Performance

Run formal experiments that record and compare LLM outputs against historical runs.

Manage Test Data Sets

Query accurate, structured 'Ground Truth' data sets to score model responses automatically.

Version Prompt Templates

Securely grab and compare specific versions of system prompts without touching the core code base.

Organize Evaluation Scope

Create isolated projects to keep different model test runs separate and clean.

Included with Plan

Braintrust: 10 Tools for Evaluation

These tools let you build a complete testing pipeline, allowing you to define projects, retrieve data sets, version prompts, and track every single test run result.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Braintrust on Vinkius

Create Experiment

Records a new historical experiment trace to track LLM pipeline tests.

Create Project

Sets up a new project environment for tracking AI evaluations and data sets.

List Datasets

Lists available 'Ground Truth' text banks used for automated evaluation scoring.

List Env Vars

Checks the Braintrust AI Gateway configurations, showing model API keys securely.

List Experiments

Retrieves all recorded evaluation experiments, mapping out model test scores and...

Get Dataset

Retrieves a specific dataset containing structured schemas that bound LLM outputs.

Get Prompt

Grabs the exact variable contexts and literal text templates used in a prompt.

Insert Dataset Row

Adds new test cases into an existing dataset matrix for specific evaluations.

List Projects

Lists all existing AI evaluation projects configured in Braintrust.

List Prompts

Retrieves a list of system prompts that are explicitly version-controlled inside...

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you are up and running. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so you do not need to configure any routing yourself.

Claude AI

1. Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

2. Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

3. Start a conversation

Open a new chat. The Braintrust integration is available immediately; no restart is needed.
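
If you script your client configuration instead of pasting the URL by hand, the endpoint from step 2 can be assembled from your token. This is a minimal sketch: the `build_endpoint` helper and its validation rule are illustrative assumptions, not part of any Vinkius SDK. Only the URL pattern comes from the step above.

```python
from urllib.parse import quote

# URL pattern taken from the setup step above.
ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"


def build_endpoint(token: str) -> str:
    """Return the MCP endpoint URL for a Vinkius token (hypothetical helper)."""
    token = token.strip()
    if not token or token == "[YOUR_TOKEN_HERE]":
        raise ValueError("replace the placeholder with your real token")
    # Percent-encode the token so it is safe inside a URL path segment.
    return ENDPOINT_TEMPLATE.format(token=quote(token, safe=""))
```

Rejecting the literal placeholder catches the most common copy-paste mistake before the URL ever reaches a client.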

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

  • Import from OpenAPI, Swagger, or YAML specs
  • Create Agent Skills with progressive disclosure
  • Deploy to edge with MCPFusion framework
  • Built in DLP, auth, and compliance on every call
  • Real time usage dashboard and cost metering
  • Publish to catalog or keep private
Start building

Make Your AI Do More

Start with Braintrust, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

  • Use this MCP plus 5,100+ others, all in one place
  • Add new capabilities to your AI anytime you want
  • Every connection is secured and compliant automatically
  • Track usage and costs across all your servers
  • Works with Claude, ChatGPT, Cursor, and more
  • New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Braintrust. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 10 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
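
Under the hood, MCP clients talk to servers with JSON-RPC 2.0 messages, using the `tools/list` method to discover tools and `tools/call` to invoke one. Your AI client normally builds these for you, so the sketch below is only meant to show the shape of the exchange. The helper names are hypothetical, and whether `list_projects` accepts any arguments is an assumption.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope as used by MCP."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def call_tool(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` request for a named tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments or {}})


# Discover the tools the server exposes, then call one of them.
discover = jsonrpc_request("tools/list")
# `list_projects` is named on this page; passing no arguments is an assumption.
invoke = call_tool("list_projects")

print(json.dumps(discover))
print(json.dumps(invoke))
```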

Testing AI outputs feels like guesswork right now.

Today, when your model fails, you usually end up in a messy cycle: checking the input logs, opening another tab to look at the prompt template, cross-referencing it with a separate spreadsheet of 'known good' answers. You spend half your time just trying to collect enough data points to figure out *why* it went wrong.

With this MCP, you stop guessing. The platform handles that messy process. You define the scope using projects and datasets; then, when the model outputs a response, the system scores it against your 'Ground Truth' immediately. What you get is clean, measurable data about performance.
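
To make "scoring against Ground Truth" concrete, here is a minimal sketch of the idea using a simple exact-match scorer. It is not Braintrust's scoring implementation; the function names, the normalization rule, and the sample data are all assumptions chosen for illustration.

```python
def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected answer, else 0.0."""
    def normalize(text: str) -> str:
        # Ignore case and collapse runs of whitespace.
        return " ".join(text.lower().split())

    return 1.0 if normalize(output) == normalize(expected) else 0.0


def score_run(outputs: dict, ground_truth: dict) -> dict:
    """Score each output against its ground-truth answer and average the result."""
    scores = {
        case_id: exact_match(outputs.get(case_id, ""), expected)
        for case_id, expected in ground_truth.items()
    }
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return {"scores": scores, "mean": mean}


# Hypothetical test cases: one correct answer, one wrong answer.
ground_truth = {"q1": "Paris", "q2": "4"}
outputs = {"q1": "  paris ", "q2": "5"}
result = score_run(outputs, ground_truth)
print(result["mean"])  # 0.5
```

Real evaluations usually need softer scorers than exact match (for example similarity or model-graded checks), but the loop of output, expected value, and score stays the same.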

Braintrust gives you full control over model evaluation.

You no longer have to rely on vague metrics or manual spot-checks. You can use `list_projects` to see every test environment, and then run specific comparisons by retrieving prompt templates using `get_prompt`. This gives you an audit trail of everything.

The difference is control. You move from 'I hope this works' to 'Here are the metrics proving it works.' It’s a fundamental shift in how you build reliable AI.

What your AI can actually do with this

Building reliable AI models means more than just writing a single good prompt. It demands rigorous testing across multiple variables. This MCP lets you set up formal evaluation pipelines right from your agent, giving you full visibility into exactly how the model behaves under pressure. You can track specific variable distributions and compare outputs against historical benchmarks without ever leaving your chat window.

Need to check if a new feature broke an old response pattern? Use this MCP. If you're building anything complex for production, connecting it through the Vinkius catalog is the right move. It lets you turn vague model performance anxiety into concrete data points.

Built · Hosted · Managed by Vinkius
Braintrust MCP - Systematic AI Model Benchmarking
Server ID 019d7562-3c18-72ce-8b34-c4fc9e9f37ad
Vinkius Inspector
Compliance Grade A+
Score 100/100

Questions you might have

How do I start testing my model with Braintrust using `create_project`?

You first call create_project to establish the boundaries for your tests. This gives you a clean, isolated environment that prevents new test runs from contaminating existing project data.

What is the difference between `get_dataset` and `list_datasets`?

list_datasets shows you all available 'Ground Truth' text banks. You then use get_dataset to pull a specific, structured dataset for active testing.

How do I track changes to my prompt templates with Braintrust?

Use the list_prompts tool to see all version-controlled prompts. You can then call get_prompt to retrieve a specific template ID, ensuring you test against an exact version.
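
Once you have retrieved two versions of a template, a plain text diff is often enough to see what changed. The sketch below uses Python's standard `difflib`; the two template strings are hypothetical stand-ins for what `get_prompt` might return, since the actual response shape is not documented here.

```python
import difflib


def diff_prompts(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> str:
    """Return a unified diff between two prompt template versions."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(lines)


# Hypothetical template texts standing in for two get_prompt results.
v1 = "You are a support agent.\nAnswer briefly."
v2 = "You are a support agent.\nAnswer briefly and cite the source."
print(diff_prompts(v1, v2))
```

An empty diff confirms that two versions are textually identical, which is a quick sanity check before rerunning an experiment.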

Can I add custom failed tests using Braintrust?

Yes. After running a batch of tests, you use insert_dataset_row to manually append new failure cases or specific edge-case inputs into your dataset matrix for future runs.

How do I check which API keys are configured for Braintrust using `list_environments_vars`?

It shows you all the current gateway configuration variables. This is how your agent accesses the necessary model API keys securely without needing manual setup.

If I want to review previous test runs, what does `list_experiments` retrieve?

list_experiments retrieves a comprehensive map of all past evaluation attempts. This lets you check historical metrics and model scores across various run IDs.
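
With scores from two runs in hand, spotting a regression is a small comparison. This is a sketch under assumptions: the metric names, the score dictionaries, and the tolerance are invented for illustration, and the real `list_experiments` response may be shaped differently.

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.02) -> dict:
    """Return metrics whose score dropped by more than `tolerance` between two runs."""
    regressions = {}
    for metric, old_score in baseline.items():
        new_score = candidate.get(metric)
        # Skip metrics missing from the candidate run; flag only real drops.
        if new_score is not None and old_score - new_score > tolerance:
            regressions[metric] = round(new_score - old_score, 4)
    return regressions


# Hypothetical per-experiment scores; the real response shape may differ.
baseline = {"accuracy": 0.91, "faithfulness": 0.88}
candidate = {"accuracy": 0.84, "faithfulness": 0.89}
print(find_regressions(baseline, candidate))  # {'accuracy': -0.07}
```

The tolerance keeps small run-to-run noise from being reported as a regression; tune it to the variance you actually observe.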

Can I use `insert_dataset_row` to append just a single test case into my matrix?

Yes, that's exactly what it does. You can target a specific dataset and inject new evaluation data row by row without having to build an entire master sheet first.
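
A single test case might be assembled like the sketch below before it is handed to `insert_dataset_row`. The argument names (`dataset_id`, `row`, `input`, `expected`, `metadata`) are assumptions for illustration; check the tool's actual input schema in your client before relying on them.

```python
import json


def build_dataset_row(dataset_id: str, input_text: str, expected: str, tags=None) -> dict:
    """Assemble arguments for an insert_dataset_row call (field names are assumed)."""
    if not input_text.strip():
        raise ValueError("a test case needs a non-empty input")
    return {
        "dataset_id": dataset_id,
        "row": {
            "input": input_text,
            "expected": expected,
            "metadata": {"tags": list(tags or [])},
        },
    }


row = build_dataset_row(
    "my-dataset-id",  # placeholder, not a real identifier
    "Refund request with no order number",
    "Ask the customer for the order number",
    tags=["edge-case"],
)
print(json.dumps(row, indent=2))
```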

Before starting a new project, how do I use `list_projects` to see current evaluations?

list_projects gives you the list of all existing AI evaluation containers. This helps you confirm your scope and choose the right environment for your next test.

Can I insert new test data dynamically?

Yes. With insert_dataset_row, you can add new test cases as structured rows directly into an existing dataset, where they are used in later evaluations.

Can it retrieve the original prompt definitions stored in Braintrust?

Yes. get_prompt returns the version-controlled prompt as stored in Braintrust, including its variables and literal template text.

How deeply can it inspect test regressions or scores?

With list_experiments, you can retrieve recorded experiments and compare scores across model versions and runs to spot performance regressions.

Built & Managed by Vinkius · 30s setup · 10 tools

We've already built the connector for Braintrust. Just plug in your AI agents and start using Vinkius.

No hosting. No infrastructure. No complex setup.
All 10 tools are live and waiting. You're up and running in seconds.

Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.

  • Zero hosting required
  • Full MCP catalog included
  • Enterprise-grade security
  • Auto-updated by Vinkius

Built, hosted, and secured by Vinkius. You just connect and go.