Gradient AI MCP. Build AI pipelines: Train models, embed data, and extract knowledge.


Works with every AI agent you already use: Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

Just plug in your AI agents and start using Vinkius.

Gradient AI (LLM API & Finetuning) connects your AI client to enterprise-grade LLM infrastructure. This server lets your agent manage custom fine-tuned models, generate high-quality text completions, and process text with specialized tools like sentiment analysis and entity extraction.

You can train models on proprietary data, generate vector embeddings, and perform complex NLP tasks without leaving your chat window.

What your AI agents can do

Analyze sentiment

Determines the emotional tone (positive, negative, neutral) of a provided document.

Answer question

Reads a source document and answers a specific question based only on the provided text.

Complete model

Generates continuous text based on a starting prompt and selected model.

+ 16 more capabilities included

Analyze document sentiment

The analyze_sentiment tool determines if a given text expresses positive, negative, or neutral sentiment.

Query documents for answers

The answer_question tool reads a source document and generates a direct answer to a specific user question.

Generate text completions

The complete_model tool creates text based on a provided prompt, supporting advanced context and retrieval parameters.

Manage custom models

Tools like list_models, create_model, get_model, and delete_model let you list, create, inspect, and remove your custom fine-tuned AI models.

Create vector search collections

The create_rag_collection tool sets up a dedicated knowledge base for Retrieval Augmented Generation (RAG) operations.

Extract structured data

The extract_entity tool pulls specific pieces of structured data (like names, dates, or IDs) from a document based on a predefined schema.

Process PDFs and documents

You can use extract_pdf to pull text and data from a PDF file, or upload_file to prepare any document for subsequent analysis.



Free for Subscribers


Gradient AI (LLM API & Finetuning): 19 Tools for LLM Operations

These tools let your AI agent manage the full lifecycle of advanced LLM workflows: ingesting data, training models, generating vectors, and extracting specific knowledge.

analyze_sentiment: Determines the emotional tone (positive, negative, neutral) of a provided document.
answer_question: Reads a source document and answers a specific question based only on the provided text.
complete_model: Generates continuous text based on a starting prompt and selected model.
create_model: Initializes a new, blank instance for a custom fine-tuned AI model.
create_rag_collection: Sets up a new collection designed for retrieving information from multiple sources (RAG).
create_transcription: Starts an asynchronous job to convert an audio file into text.
delete_model: Permanently removes a specific fine-tuned AI model instance.
extract_entity: Pulls structured data like names, dates, and IDs from a document according to a specific schema.
extract_pdf: Extracts raw text and data from a PDF file into usable formats.
fine_tune_model: Trains an existing model using a custom set of labeled samples to improve its performance on niche tasks.
generate_embeddings: Converts input text or documents into numerical vectors for similarity search and indexing.
get_model: Retrieves detailed metadata and status for a specific fine-tuned model.
get_transcription: Checks the status and retrieves the completed text result of a transcription job.
list_embeddings: Lists all foundational models available for creating vector embeddings.
list_models: Retrieves a list of all available models, both foundational and custom-trained.
list_rag_collections: Lists all existing RAG knowledge bases within the workspace.
personalize_document: Tailors the tone and content of a document so it speaks effectively to a defined audience.
summarize_document: Generates a concise summary of a large document while retaining key facts.
upload_file: Uploads any file type (PDF, DOCX, etc.) into the workspace to make it available for other tools.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Gradient AI (LLM API & Finetuning), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Look, this server gives your AI client the keys to enterprise-grade LLM muscle. You're not just calling some basic API; you're building a whole AI pipeline right from your chat window. It lets your agent manage custom models, spit out high-quality text, and run specialized text processing—like figuring out sentiment or pulling out names and dates.

You'll train models on your own data, generate vector embeddings, and handle complex NLP tasks without ever leaving your chat.

How Gradient AI MCP Works

1. First, you bring in the source material: upload a document with upload_file, or pull the text out of a PDF with extract_pdf.
2. Next, you feed that source material into a preparation step, such as running generate_embeddings or creating a knowledge base with create_rag_collection.
3. Finally, you call the desired tool, like answer_question or complete_model, which uses the prepared data to give you a specific, actionable result.

The bottom line is, you move from raw data to actionable insight using a staged process: ingest, prepare, and execute.
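As a rough sketch of the ingest, prepare, execute sequence, the Python below wires the five tool calls together through a caller function. The `call_tool` signature and every argument name passed to the tools (`path`, `file`, `inputs`, `vectors`, `collection`, `question`) are hypothetical placeholders, not the server's documented schema; in practice your MCP client issues these calls, so check each tool's input schema before relying on them.

```python
from typing import Any, Callable

# Hypothetical: any function that invokes an MCP tool by name with a dict of
# arguments and returns its result (e.g. a thin wrapper around your MCP client).
ToolCaller = Callable[[str, dict], Any]


def run_pipeline(call_tool: ToolCaller, pdf_path: str, question: str) -> Any:
    """Ingest a PDF, index it for RAG, then ask a grounded question.

    All tool argument names here are illustrative assumptions.
    """
    # 1. Ingest: make the file available, then pull its raw text.
    file_ref = call_tool("upload_file", {"path": pdf_path})
    text = call_tool("extract_pdf", {"file": file_ref})

    # 2. Prepare: embed the text and index the vectors in a RAG collection.
    vectors = call_tool("generate_embeddings", {"inputs": [text]})
    collection = call_tool("create_rag_collection", {"vectors": vectors})

    # 3. Execute: query the collection instead of sending an isolated prompt.
    return call_tool("answer_question", {"collection": collection, "question": question})


if __name__ == "__main__":
    # A stand-in caller that only prints the call order, so the sketch runs offline.
    def fake_call(name: str, args: dict) -> str:
        print(f"-> {name}({sorted(args)})")
        return f"<{name} result>"

    print(run_pipeline(fake_call, "specs.pdf", "What is the maximum load?"))
```

Passing a recording stand-in like `fake_call` is also a cheap way to unit-test the call order without touching the live server.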

Who Is Gradient AI MCP For?

This is for the AI Engineer who needs to test model limits quickly, the Data Scientist who can’t afford complex local setups, or the Developer integrating LLMs into a production app. If your job involves turning unstructured text into structured data or building proprietary chatbots, this server is for you.

ML Engineer

Uses fine_tune_model to train models on proprietary datasets, then uses list_models to find the new custom version and compares its complete_model output against the base model.

Data Scientist

Uses generate_embeddings and create_rag_collection to build vector stores from research papers, then uses answer_question to query the knowledge base.

Software Developer

Integrates complete_model into an application backend, using the server's model management tools to select the optimal model for a given endpoint.

What Changes When You Connect

Structured Data Extraction: Instead of manually reading and copying data from a PDF, use extract_pdf and then run extract_entity. This pulls out specific fields—like invoice numbers or names—and formats them into a clean JSON object you can use immediately.
Custom Intelligence: Don't rely on general-purpose LLMs. Use fine_tune_model to train a model specifically on your company's internal documentation. Then, use that model ID in complete_model so generated responses stay much closer to your company's voice and technical jargon.
Knowledge Base Building: Building a RAG system is complex. Start by running generate_embeddings on your corpus, then use create_rag_collection to hold the vectors. Finally, query the collection with answer_question to get answers grounded in your private data.
Workflow Visibility: You can't troubleshoot what you can't see. Use list_models and list_rag_collections to get a full inventory of every model and knowledge base you've created. This prevents version control headaches when debugging complex pipelines.
Media Input: The server handles more than just text. Use create_transcription to process audio files, and then pass the resulting transcript to summarize_document or analyze_sentiment for immediate analysis.
Audience Targeting: If a document is for internal use but needs to be presented to clients, use personalize_document. This tool adjusts the complexity and tone of the text, making it instantly usable for a different audience.
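The Media Input step depends on create_transcription being asynchronous: you start the job, then poll get_transcription until the text is ready. A minimal polling loop is sketched below. It assumes get_transcription returns a dict whose `status` eventually reads `"completed"` (with the transcript under `text`) or `"failed"`; those field names and values are assumptions, so adapt them to the actual response.

```python
import time
from typing import Any, Callable


def wait_for_transcript(
    get_transcription: Callable[[str], dict[str, Any]],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a transcription job until it completes, fails, or the timeout elapses.

    `status`, `text`, "completed", and "failed" are assumed response fields/values.
    """
    waited = 0.0  # approximate: counts sleep time only, not request latency
    while waited <= timeout:
        result = get_transcription(job_id)
        status = result.get("status")
        if status == "completed":
            return result["text"]
        if status == "failed":
            raise RuntimeError(f"transcription {job_id} failed")
        sleep(interval)
        waited += interval
    raise TimeoutError(f"transcription {job_id} not ready after {timeout}s")
```

The injectable `sleep` argument keeps the loop testable; once the text comes back, it can go straight into summarize_document or analyze_sentiment.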

Real-World Use Cases

Triage customer support tickets

A support agent receives a ticket (PDF attachment). They use extract_pdf to get the text, then extract_entity to pull out the account ID and product name. Finally, they run analyze_sentiment on the text. The agent gets a clean JSON object containing the ID, product, and a 'Negative' sentiment score, allowing them to route it instantly to the correct Tier 2 team.
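Once extract_entity and analyze_sentiment have run, the routing decision itself is ordinary application logic. The sketch below assumes the entity output has already been parsed into an account ID and product name, and that the sentiment comes back as a label such as `"Negative"`. The `Ticket` type, the queue names, and the `TIER2_QUEUES` table are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical routing table: product name -> Tier 2 queue.
TIER2_QUEUES = {"Billing": "tier2-billing", "API": "tier2-platform"}


@dataclass
class Ticket:
    account_id: str  # from extract_entity (assumed already parsed)
    product: str     # from extract_entity
    sentiment: str   # label from analyze_sentiment, e.g. "Negative" (assumed format)


def route(ticket: Ticket) -> str:
    """Send negative tickets to the product's Tier 2 queue, everything else to Tier 1."""
    if ticket.sentiment.strip().lower() == "negative":
        return TIER2_QUEUES.get(ticket.product, "tier2-general")
    return "tier1"
```

Keeping the rules in a plain lookup table means the support team can change routing without touching the tool calls.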

Building a specialized chatbot

A developer needs a chatbot that only answers questions about the company's latest product specs. They use upload_file to ingest the specs, then run generate_embeddings and create_rag_collection. The chatbot's logic relies entirely on answer_question, ensuring every response is sourced from the private, verified documentation.

Analyzing competitor claims

A marketing team wants to know the overall tone of competitor press releases. They use extract_pdf to grab multiple documents, then analyze_sentiment on each. They can also run summarize_document on the results to quickly identify common themes of praise or criticism.
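To see the overall tone across many press releases, you can tally the per-document labels returned by analyze_sentiment. This sketch assumes each call yields a single label string (for example `"Positive"` or `"Negative"`); the `sentiment_shares` helper and the file names are hypothetical.

```python
from collections import Counter


def sentiment_shares(labels_by_doc: dict[str, str]) -> dict[str, float]:
    """Return the fraction of documents per sentiment label (case-insensitive)."""
    counts = Counter(label.strip().lower() for label in labels_by_doc.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical labels, one per competitor press release.
    labels = {"release-a.pdf": "Positive", "release-b.pdf": "Negative", "release-c.pdf": "positive"}
    print(sentiment_shares(labels))
```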

Onboarding new compliance staff

A compliance officer needs to train a model on hundreds of pages of regulatory guidelines. They use fine_tune_model with the guidelines. After training, they use the resulting model ID in complete_model to generate compliance summaries that follow the language and requirements of the guidelines it was trained on.

The Tradeoffs

Treating all text as simple input

Passing a raw PDF file directly to complete_model and expecting a structured output. The model will hallucinate or fail because it can't read the layout, tables, or headers.

→ Always preprocess the document first. Use upload_file to bring the PDF into the workspace and extract_pdf to turn it into raw text, then run extract_entity to enforce structure. Only feed the structured output to complete_model.

Building RAG without indexing

Running answer_question using only a single, isolated prompt. The system won't know where to find the source material, leading to generic, ungrounded answers that don't reference the source document.

→ You must first establish a knowledge base. Use generate_embeddings on your source documents, then use create_rag_collection to index them. Finally, run answer_question against the collection.

Skipping model version control

Relying on the default, foundational model (llama3-8b) for critical tasks, even when a custom, fine-tuned model exists. The default model might drift in tone or accuracy, causing inconsistencies.

→ Always use list_models to confirm your custom model ID. Then, specify that ID when calling complete_model or answer_question. This ensures you are always running the most up-to-date, trained version.
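Confirming the custom model ID can be automated. The sketch below picks the newest fine-tuned model whose name matches a prefix and falls back to a base model otherwise. It assumes list_models returns dicts with `id`, `name`, `type`, and an ISO-8601 `created_at` string; those field names and the `"fine-tuned"` type value are assumptions, not the server's documented output.

```python
from typing import Any


def pick_model(models: list[dict[str, Any]], name_prefix: str, fallback: str) -> str:
    """Return the newest fine-tuned model ID whose name starts with name_prefix, else fallback.

    Field names (`id`, `name`, `type`, `created_at`) are assumed, not documented.
    """
    custom = [
        m for m in models
        if m.get("type") == "fine-tuned" and str(m.get("name", "")).startswith(name_prefix)
    ]
    if not custom:
        return fallback
    # ISO-8601 timestamps in the same format and timezone sort correctly as strings.
    newest = max(custom, key=lambda m: m.get("created_at", ""))
    return newest["id"]
```

The returned ID is what you would then pass explicitly to complete_model or answer_question, rather than letting the call fall through to the default model.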

When It Fits, When It Doesn't

Use this server if your workflow requires moving from messy, unstructured data (PDFs, audio, raw text) to highly specific, actionable insights. You need to know what the text says (sentiment via analyze_sentiment), who is mentioned (entities via extract_entity), or what the document means (answers via answer_question).

You don't need this server's full pipeline if you just want simple chat conversation or basic summarization; for pure summarization, summarize_document on its own is enough. But if you need the summary and the sentiment and the key entities, this server lets you chain those calls together. If your primary need is just to talk to an LLM without proprietary data, a simpler, non-finetuning server might suffice. But if your data is the product, this is your tool.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Gradient AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: managed infra
V8 Isolated: sandboxed per request
Zero-Trust Proxy: no stored credentials
DLP Enforced: policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction


Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 19 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

analyze_sentiment, answer_question, complete_model, create_model, create_rag_collection, create_transcription, delete_model, extract_entity, extract_pdf, fine_tune_model, generate_embeddings, get_model, get_transcription, list_embeddings, list_models, list_rag_collections, personalize_document, summarize_document, upload_file

Copying data from PDFs and reports is a slow, error-prone mess.

Today, you open a compliance report, then you manually copy the key dates into a spreadsheet. Next, you have to copy the names into a database. If the report is formatted weirdly—like it has a table of contents—you end up missing critical data points or having to spend time cleaning up formatting.

With this MCP server, you run `extract_pdf` and it pulls all the raw text and data into a clean format. Then, you use `extract_entity` to grab only the dates and names into a structured JSON. You never touch a spreadsheet again.

The Gradient AI (LLM API & Finetuning) MCP Server. Use `complete_model`.

Manually prompting an LLM in a chat box and hoping it maintains your company's specific terminology and compliance rules is a gamble. The model might use outdated jargon or misinterpret your internal acronyms, forcing you to edit the output every time.

By using `fine_tune_model` and then running `complete_model` with the resulting model ID, you get output that follows your corporate vocabulary far more consistently. The model learns to speak your language.

Common Questions About Gradient AI MCP

How do I use `generate_embeddings` in a workflow?

generate_embeddings converts any text into a vector. You run this first on your source data, then you pass those vectors to create_rag_collection. The resulting collection is what you query with answer_question.

What is the difference between `summarize_document` and `answer_question`?

summarize_document gives you a high-level overview of the whole text. answer_question is surgical; it finds the specific passage that answers your question and only returns that answer.

Can I use `extract_entity` on a PDF file?

You need to pre-process the PDF first. Run extract_pdf to get the raw text, and then pass that text output to extract_entity to pull out the structured data.
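Because extract_entity fills a schema you define, it is worth validating its output before anything downstream consumes it. This sketch assumes the tool returns a JSON object keyed by field name; the invoice schema and the `validate_entities` helper are hypothetical examples.

```python
import json
from typing import Any

# Hypothetical schema: field name -> accepted Python type(s) after JSON parsing.
INVOICE_SCHEMA: dict[str, Any] = {
    "invoice_number": str,
    "customer_name": str,
    "total": (int, float),  # JSON numbers may parse as int or float
}


def validate_entities(raw_json: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse extract_entity output (assumed JSON) and keep only schema fields, checking types."""
    data = json.loads(raw_json)
    missing = [field for field in schema if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong = [field for field, expected in schema.items() if not isinstance(data[field], expected)]
    if wrong:
        raise TypeError(f"unexpected types for: {wrong}")
    return {field: data[field] for field in schema}
```

Failing loudly on a missing field is usually preferable to silently writing a half-empty record into a database.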

Which tool do I use to check available models?

Use list_models to see all available models (both foundational and custom). Use list_rag_collections to see what knowledge bases you've already built.

Do I need to upload files before I can use `analyze_sentiment`?

No. If the text is already in your prompt, you can run analyze_sentiment immediately. You only need upload_file if the text is coming from an external, un-pasted source.

How do I use `fine_tune_model` on a large, proprietary dataset?

You provide the training samples directly to the fine_tune_model tool. This process trains a new model instance on your specific data, improving its performance for niche tasks.
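For a large dataset, one practical pattern is to convert your records into training samples and submit them in fixed-size batches, in case the tool caps how many samples a single call accepts. This sketch assumes samples shaped like `{"inputs": "..."}`; the sample shape, the prompt template, and the batch size are hypothetical, and whether repeated fine_tune_model calls keep training the same model is an assumption to verify against Gradient's documentation.

```python
from typing import Iterator


def to_samples(records: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Turn (question, answer) pairs into training samples using a hypothetical template."""
    return [
        {"inputs": f"### Instruction: {question}\n\n### Response: {answer}"}
        for question, answer in records
    ]


def batches(samples: list[dict[str, str]], size: int) -> Iterator[list[dict[str, str]]]:
    """Yield consecutive slices of at most `size` samples."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

Each batch would then become one fine_tune_model call with the same model ID.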

What is the purpose of `create_rag_collection` in my workflow?

The create_rag_collection tool sets up a dedicated knowledge base for Retrieval Augmented Generation (RAG). This allows your AI client to answer questions using only the context you provide.

When should I use `get_model` versus `list_models`?

Use list_models to see all foundational and fine-tuned models available in your workspace. Use get_model when you already know the ID of a specific model and need its detailed metadata.

How can I start training a custom model with my own data?

You can use the fine_tune_model tool. Simply provide the model ID and an array of training samples. The agent will handle the submission to Gradient's training infrastructure.

Can I use RAG (Retrieval Augmented Generation) with this server?

Yes! The complete_model tool includes an optional rag parameter, allowing you to provide context or collection IDs to ground the model's responses in specific data.
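When you already have relevant passages in hand (for example, the top hits from a similarity search), you can pass them through complete_model's optional rag parameter. The sketch below packs passages, assumed to be ordered by relevance, into a context string under a character budget. The `model`, `prompt`, and `rag.context` argument names are assumptions about the tool's schema, not confirmed fields.

```python
from typing import Any


def build_rag_args(model_id: str, prompt: str, passages: list[str],
                   max_chars: int = 2000) -> dict[str, Any]:
    """Build hypothetical complete_model arguments with passages packed into rag context."""
    chosen: list[str] = []
    used = 0
    for passage in passages:
        # Stop at the first passage that would exceed the budget (separators not counted).
        if used + len(passage) > max_chars:
            break
        chosen.append(passage)
        used += len(passage)
    args: dict[str, Any] = {"model": model_id, "prompt": prompt}
    if chosen:
        args["rag"] = {"context": "\n\n".join(chosen)}
    return args
```

Omitting `rag` entirely when nothing fits keeps the call a plain completion instead of sending an empty context.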

How do I generate vector embeddings for my documents?

Use the generate_embeddings tool by specifying a model slug (like 'bge-large') and a list of text inputs. It will return the high-dimensional vectors for your text.
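The vectors are only useful once you compare them. A common approach is cosine similarity: embed the query with the same model, then rank stored vectors by the cosine of the angle to it. The toy 3-dimensional vectors below stand in for real embeddings, which have far more dimensions; the `top_k` helper is illustrative, and a RAG collection presumably handles this ranking for you server-side.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the keys of the k corpus vectors most similar to the query."""
    ranked = sorted(corpus, key=lambda key: cosine(query, corpus[key]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy stand-ins for generate_embeddings output.
    corpus = {
        "pricing.md": [0.9, 0.1, 0.0],
        "security.md": [0.0, 1.0, 0.2],
        "billing-faq.md": [0.8, 0.3, 0.1],
    }
    print(top_k([1.0, 0.2, 0.0], corpus, k=2))
```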

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)
Google ADK (Python)
Pydantic AI (Python)
Vercel AI SDK (TypeScript)