Braintrust MCP. Stop guessing if your model broke after an update.

Works with every AI agent you already use

Claude
ChatGPT
Cursor
Gemini
Windsurf
VS Code
JetBrains
Vercel

…and any MCP-compatible client

Braintrust MCP on:

  • Cursor AI Code Editor (MCP client)
  • Claude Desktop App (MCP integration)
  • OpenAI Agents SDK (MCP compatible)
  • Visual Studio Code (MCP extension client)
  • GitHub Copilot AI Agent (MCP integration)
  • Google Gemini AI (MCP integration)
  • Lovable AI Development (MCP client)
  • Mistral AI Agents (MCP compatible)
  • Amazon AWS Bedrock (MCP support)

Just plug in your AI agents and start using Vinkius.

Braintrust lets you stop guessing if your model broke. Connect this MCP to run structured tests, track prompt changes, and benchmark AI logic against specific ground truth datasets.

It's for developers who need proof that their LLM output meets strict quality standards every single time.

What your AI agents can do

Set Up Evaluation Projects

Create new containers to organize and track multiple related AI testing efforts.

Run Model Benchmarks

Execute isolated test runs, appending unique scores and metrics for every model iteration.

Manage Ground Truth Data

Query datasets or append rows to them. These datasets define the expected output your models are measured against.

Version Prompt Templates

Save and track exact prompt text templates so you can compare older versions without changing core code.

Review Test History

Retrieve comprehensive lists of past experiments, showing which metrics were tracked across different model runs.

Supported MCP Clients

OAuth 2.0 Compatible
Vinkius runs on Claude
Vinkius runs on ChatGPT
Vinkius runs on Cursor
Vinkius runs on Gemini
Vinkius runs on VS Code
Vinkius runs on JetBrains
Vinkius runs on Vercel
Vinkius runs on Zendesk
+ other MCP clients
Free for Subscribers

Braintrust: 10 Tools for AI Evaluation

These tools let you manage the entire lifecycle of a model test, from setting up a project to running benchmarks and scoring results.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Braintrust on Vinkius
create experiment

Sets up a new historical experiment trace to record specific LLM pipeline tests and metrics.

create project

Initializes an isolated project environment for tracking AI evaluations and related datasets.

get dataset

Fetches a specific dataset containing predefined schemas for bounding LLM outputs.

get prompt

Retrieves the exact variable contexts and literal text templates used in a given prompt version.

insert dataset row

Adds new test cases into an existing dataset matrix targeting specific evaluations for scoring.

list datasets

Lists all isolated Ground Truth text banks used specifically for automated evaluation scoring.

list env vars

Probes the Braintrust AI Gateway configurations, showing model API keys and setup variables securely.

list experiments

Retrieves a list of all evaluation experiments, detailing historical model test scores and metrics.

list projects

Gets the complete list of all AI evaluation projects currently running in Braintrust.

list prompts

Retrieves a record of explicitly version-controlled system prompts isolated within Braintrust.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

  • Import from OpenAPI, Swagger, or YAML specs
  • Create Agent Skills with progressive disclosure
  • Deploy to edge with MCPFusion framework
  • Built in DLP, auth, and compliance on every call
  • Real time usage dashboard and cost metering
  • Publish to catalog or keep private
Start building

Make Your AI Do More

Start with Braintrust, then connect any of our 4,800+ other servers whenever your AI needs more. One click, no limits.

  • Use this MCP plus 4,800+ others, all in one place
  • Add new capabilities to your AI anytime you want
  • Every connection is secured and compliant automatically
  • Track usage and costs across all your servers
  • Works with Claude, ChatGPT, Cursor, and more
  • New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Braintrust. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

  • Cloud Hosted: managed infra
  • V8 Isolated: sandboxed per request
  • Zero-Trust Proxy: no stored credentials
  • DLP Enforced: policy on every call
  • GDPR Compliant: EU data residency
  • Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
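
As a rough illustration of what such a standardized connection looks like on the wire, MCP clients invoke a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message and sends nothing. The `name` argument passed to `create_project` is a hypothetical parameter, since the tool's actual input schema is not documented here.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique within a session.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "name" is an assumed argument; check the server's tool schema before relying on it.
request = tools_call_request("create_project", {"name": "support-bot-evals"})
print(json.dumps(request, indent=2))
```

In practice the MCP client (Claude, Cursor, and so on) builds and sends this message for you; the sketch is only meant to show that every tool call has the same small, inspectable shape.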

Manually validating model logic is a nightmare of tabs and spreadsheets.

Right now, checking your model's behavior feels like forensic accounting. You copy the prompt into an agent, run it, and check the output. Then you manually update a spreadsheet with the expected result, the Ground Truth. If you change one variable, you repeat the whole process 50+ times, copying everything and hoping you didn't miss a corner case.

With this MCP, that manual grind disappears. You call `create_project` to set up an isolated project and define your entire test suite inside it. Your agent handles the repetition: it runs dozens of varied tests automatically and gives you one clean report showing where each model run failed to meet the standard.

Get Exact Model Benchmarks with Braintrust

You eliminate the need for manual data entry by using `list_datasets` to find existing test banks and then appending new failure points via `insert_dataset_row`. You never have to guess what inputs matter; you just add them.
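
One way to picture the "append new failure points" step is a small helper that turns observed failures into rows and skips inputs the dataset already covers. This is a minimal sketch: the `input` and `expected` field names and both helper functions are assumptions for illustration, not the documented schema of `insert_dataset_row`.

```python
def failure_to_row(prompt_input: str, expected_output: str) -> dict:
    """Turn one observed failure into a ground-truth row (assumed field names)."""
    if not prompt_input.strip() or not expected_output.strip():
        raise ValueError("a row needs both an input and an expected output")
    return {"input": prompt_input, "expected": expected_output}


def new_rows(existing_inputs: set, failures: list) -> list:
    """Return rows for failures whose input is not already in the dataset."""
    seen = set(existing_inputs)
    rows = []
    for prompt_input, expected_output in failures:
        if prompt_input in seen:
            continue  # already covered, do not insert a duplicate test case
        seen.add(prompt_input)
        rows.append(failure_to_row(prompt_input, expected_output))
    return rows


# Invented sample data: one input already exists, one is new.
existing = {"What is your refund window?"}
failures = [
    ("What is your refund window?", "30 days from delivery."),
    ("Can I return an opened item?", "Yes, within 14 days."),
]
for row in new_rows(existing, failures):
    print(row)  # each row would become the payload of one insert_dataset_row call
```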

What's different now is certainty. Instead of having a gut feeling about model quality, you have verifiable scores across every single dimension you care about.

What you can do with this MCP connector

This connector gives you a platform to observe, test, and debug your AI models in isolation. Instead of just running prompts and hoping they work, you establish formal projects where you define the inputs (the prompt templates) and the expected outputs (the ground truth dataset). You run structured experiments that execute model variations against this known standard, generating detailed performance traces.

This capability is critical for catching regressions, such as a minor update to your code or prompt that causes a large drop in output quality. Whether you're testing how different versions of a prompt affect tone or checking whether two models handle edge cases differently, you get recorded scores for each run instead of impressions. You can also build up and version those core datasets over time.
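
To make the idea of a regression concrete, here is a minimal sketch that compares per-case scores from two experiment runs and flags the cases that dropped by more than a tolerance. The score values, case names, and the 0.05 tolerance are invented for illustration; real scores would come from your own experiments.

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.05) -> list:
    """Return the test cases whose score dropped by more than `tolerance`."""
    shared_cases = baseline.keys() & candidate.keys()
    return sorted(
        case for case in shared_cases
        if baseline[case] - candidate[case] > tolerance
    )


# Invented sample scores in the range 0.0 to 1.0, keyed by test case name.
baseline = {"refund_policy": 0.90, "edge_empty_input": 0.80, "tone_formal": 0.70}
candidate = {"refund_policy": 0.92, "edge_empty_input": 0.40, "tone_formal": 0.68}

print(find_regressions(baseline, candidate))  # ['edge_empty_input']
```

Comparing case by case rather than by overall average matters here: a single edge case can collapse while the average barely moves.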

When you run these evaluations through Vinkius, your AI agent doesn't just send data; every tool call is recorded in a cryptographically signed audit trail. This means that when you review the results, you know exactly which inputs caused which failures, giving you full visibility into what happened across the entire test run.
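
The page does not describe how Vinkius signs its audit trail, so the following is only a generic sketch of the idea: each log entry is authenticated together with the signature of the entry before it, so any later edit breaks verification. HMAC-SHA256 with a shared key stands in for whatever scheme the platform actually uses, and every name below is hypothetical.

```python
import hashlib
import hmac
import json


def sign_entry(key: bytes, previous_signature: str, tool: str, arguments: dict) -> dict:
    """Authenticate one tool call together with the previous entry's signature."""
    payload = json.dumps(
        {"prev": previous_signature, "tool": tool, "arguments": arguments},
        sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"prev": previous_signature, "tool": tool,
            "arguments": arguments, "signature": signature}


def verify_trail(key: bytes, trail: list) -> bool:
    """Recompute every signature in order; any edit or reordering fails."""
    previous = ""
    for entry in trail:
        expected = sign_entry(key, previous, entry["tool"], entry["arguments"])
        if entry["prev"] != previous or not hmac.compare_digest(
            expected["signature"], entry["signature"]
        ):
            return False
        previous = entry["signature"]
    return True


key = b"demo-key-not-a-real-secret"
first = sign_entry(key, "", "create_project", {"name": "demo"})
second = sign_entry(key, first["signature"], "list_datasets", {})
print(verify_trail(key, [first, second]))  # True

second["tool"] = "insert_dataset_row"  # simulate tampering with a recorded call
print(verify_trail(key, [first, second]))  # False
```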

Built · Hosted · Managed by Vinkius
Braintrust MCP - Benchmark AI Model Performance
Server ID: 019d7562-3c18-72ce-8b34-c4fc9e9f37ad
Vinkius Inspector: Compliance Grade F, Score 3.6/100

Common Questions About Braintrust MCP

How do I start tracking new evaluations with the Braintrust MCP?

Start by calling `create_project` to establish a dedicated scope for your work. This container keeps all related tests and datasets isolated from other projects.

What is Ground Truth in the context of `list_datasets`?

Ground Truth refers to the definitive, correct answers or expected outputs used as a benchmark. The `list_datasets` tool helps you find these core repositories for your model testing.

Can I compare two different prompts using Braintrust MCP?

Yes. Use `get_prompt` to retrieve both templates, then call `create_experiment` once per template so that each runs against the same dataset and the results can be compared side by side.
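
The comparison described in this answer can be sketched as an ordered plan of tool calls. Only the tool names come from this page; the argument names (`prompt_id`, `project_id`, `dataset_id`, `name`) and the `comparison_plan` helper are assumptions for illustration and may not match the real tool schemas.

```python
def comparison_plan(prompt_ids: list, project_id: str, dataset_id: str) -> list:
    """List the tool calls needed to compare prompts against one shared dataset."""
    plan = []
    # Step 1: fetch every prompt template that will be compared.
    for prompt_id in prompt_ids:
        plan.append(("get_prompt", {"prompt_id": prompt_id}))
    # Step 2: one experiment per prompt, all pointing at the same dataset
    # so that score differences come from the prompt and not from the data.
    for prompt_id in prompt_ids:
        plan.append((
            "create_experiment",
            {
                "project_id": project_id,
                "dataset_id": dataset_id,
                "name": f"prompt-{prompt_id}",
            },
        ))
    return plan


for tool, arguments in comparison_plan(["v1", "v2"], "proj-demo", "ds-demo"):
    print(tool, arguments)
```

Holding the dataset constant across both experiments is the design choice that makes the two sets of scores comparable.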

Do I need to manually manage API keys for this MCP?

No. The platform handles credential management via a zero-trust proxy. You only need to confirm your environment variables using `list_env_vars`, and Vinkius manages the secure transit of those credentials.

How do I check my Braintrust Gateway settings using `list_env_vars`?

Call `list_env_vars`. It probes the secure configuration variables for your gateway, so you can confirm which model API keys and parameters are active without exposing sensitive credentials.

If I find a gap in my test data, how do I add new examples using `insert_dataset_row`?

Call `insert_dataset_row` to append new records directly to your dataset. This lets you feed specific, high-value test cases to the evaluation system without manually updating the source file.

I'm starting a totally different product line; how do I use `create_project`?

Call `create_project` to establish a completely isolated workspace for the new effort. This ensures that testing, datasets, and metrics for Project A won't accidentally mix with or affect Project B.

What if I need to see all saved versions of a prompt? Can `list_prompts` help?

Yes. `list_prompts` retrieves a list of every version-controlled system prompt. This is useful for auditing and for tracking how your core instructions have evolved over time.

Built & Managed by Vinkius · 30s setup · 10 tools

We've already built the connector for Braintrust. Just plug in your AI agents and start using Vinkius.

No hosting. No infrastructure. No complex setup.
All 10 tools are live and waiting. You're up and running in seconds.

Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.

Zero hosting required · Full MCP catalog included · Enterprise-grade security · Auto-updated by Vinkius

Built, hosted, and secured by Vinkius. You just connect and go.