Lingyi Wanwu MCP for AI. Orchestrate chat, embeddings, and usage metrics for Yi models.

Q: How do I check my token usage using the getusage tool?

Call getusage() in your agent workflow. It will return a JSON object detailing your current consumption and remaining balance for the Yi models.

Q: listmodels tool: does it list all LLMs?

No, listmodels only lists the available Yi models. For a complete picture of every model on the market, you'll need to consult external documentation.

Q: Can I use checkmoderation before running chatcompletions?

Yes. It’s best practice to run a user prompt through checkmoderation first. If the output is flagged, you stop the workflow and prevent the chat call from ever happening.

Q: If I use an outdated model name in chatcompletions, how does listmodels help?

The listmodels tool provides the definitive, currently active names and versions of all supported Yi models. Run this first to guarantee you are using the correct identifier before submitting any chat request.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

Connect to your AI in seconds.

Lingyi Wanwu connects your AI agent directly to the Yi LLM ecosystem. This MCP handles chat completions, generates semantic embeddings for RAG pipelines, and provides real-time account usage monitoring.

You get a single point of control over high-performance bilingual models like Yi-Large.

What your AI can do

List models

Fetches a list of all accessible Yi model names and their technical specifications.

Chat completions

Sends a prompt message to one of the Yi models and returns the generated response.

Check moderation

Runs content through policy filters, flagging any text that violates usage guidelines.

+ 1 more capabilities included

Generate conversation responses

Send prompts to Yi models (like chat-34B or Yi-Large) and receive structured text outputs, maintaining context across turns.

Create semantic vectors

Take any piece of text and generate a high-dimensional embedding vector for use in search indexes and RAG systems.

Check content compliance

Pass outgoing prompts or generated responses through the moderation tool to check for policy violations before they are sent.

View available models

List all Yi model versions and retrieve their specific technical details, helping you choose the right model for the job.

Track token usage

Retrieve current account statistics, including consumed tokens and remaining balance, keeping your operational costs clear.

Ask an AI about this

Included with Plan

Waiting for input…

AI Agent

Lingyi Wanwu: 5 Tools for Model Operations

Use these tools to manage the entire lifecycle of your LLM integration—from model selection and conversation running to cost tracking.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Lingyi Wanwu on Vinkius

List Models

Fetches a list of all accessible Yi model names and their technical specifications.

Chat Completions

Sends a prompt message to one of the Yi models and returns the generated response.

Check Moderation

Runs content through policy filters, flagging any text that violates usage...

Get Embeddings

Takes input text and generates a numerical vector representing its semantic meaning.

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for HTTPS Streamable, SSE, and OAuth2—zero messy routing required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Lingyi Wanwu integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "lingyi-wanwu": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Lingyi Wanwu tools with full Vinkius guardrails applied.

VS Code Copilot

⚡

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"lingyi-wanwu": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

Use the SSEClient in LangChain to connect to the Vinkius managed endpoint:

from langchain_mcp_adapters.client import SSEClient

# Connect to Vinkius
client = SSEClient(url="https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp")
tools = client.get_tools()

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

from crewai import Agent
from mcp_crewai import MCPTool

# Connect securely to Vinkius
vinkius_tools = MCPTool(url="https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp")

# Assign to Agent
researcher = Agent(
    role='Data Researcher',
    tools=vinkius_tools.get_all()
)

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Lingyi Wanwu, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Lingyi Wanwu. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 4 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Manually checking model availability and usage stats is a time sink.

Today, if you want to know what models are available or how much money you’ve spent, you open three separate dashboards. One for billing, one for the model catalog, and another just for running tests. You copy IDs here, paste them there, and manually track tokens across spreadsheets.

With this MCP Server, all that data is exposed via simple tools. Run `list_models` to see every version available, then run `get_usage` to check your budget—all in the same agent workflow. It keeps the complexity visible, not hidden in a dashboard.

Using Lingyi Wanwu MCP Server for Chat Completions

Without this server, every time you want to update your chat logic—say, going from Yi-Large to Yi-34B—you have to write new boilerplate code and manually manage context window sizes.

Now, your agent handles the model switching. You just call `chat_completions` with the desired model ID, and it executes the logic. It makes model selection a simple function call, not a rewrite of your application's core logic.

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

What your AI can actually do with this

Lingyi Wanwu connects your AI agent right into the whole Yi LLM ecosystem. You're getting a single point of control over high-performance, bilingual models like Yi-Large. This MCP handles everything you need—from running chats to generating vectors and keeping tabs on what you spend.

You can use the chat_completions tool to send any prompt message to one of the available Yi models; it'll return a generated response while maintaining context across multiple turns in the conversation. Before sending or receiving text, you can pass content through check_moderation. This tool runs your prompts and responses against policy filters, flagging anything that violates usage guidelines so you know your output is clean.

When you need to power up an advanced search index or build out a Retrieval Augmented Generation (RAG) system, use get_embeddings. It takes any piece of text you throw at it and generates a high-dimensional numerical vector representing the semantic meaning. For model selection, the list_models tool lets you fetch a complete list of all accessible Yi models, giving you their specific technical specs so you know exactly what you're working with.

Lastly, tracking costs is simple. The get_usage tool retrieves your current account metrics. It shows you how many tokens you've consumed and what your remaining balance is. You can keep an eye on your operational spending without having to check a dashboard manually.

Built · Hosted · Managed by Vinkius Lingyi Wanwu MCP Server - Yi LLM API & Embeddings

Server ID 019d8454-2612-7344-ac66-7c9d803e1830

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

Who is this actually for?

This is for the ML Engineer who needs predictable model access. It's also for the Data Architect building RAG pipelines that need reliable embeddings, and the Backend Developer responsible for monitoring API costs in production. If you deal with high-volume LLM calls, this saves you time.

ML Engineer

Needs to swap between models (e.g., chat-34B for general chat vs. specialized model) and needs programmatic access to embeddings.

Data Architect

Builds RAG pipelines that require consistently generated semantic vectors, using the get_embeddings tool before querying a database.

Backend Developer

Must monitor token consumption in production services to prevent unexpected billing spikes; they use the get_usage tool constantly.

What Changes When You Connect

Stop guessing which model to use. Use list_models first to see every available version of the Yi LLM, then select the specific one you need for the task.

Control your spend right from your agent. The get_usage tool lets you pull real-time token counts and balances before running expensive chat completions.

Build better search systems. Instead of basic keyword matching, use get_embeddings to convert company documents into semantic vectors, making RAG searches far more accurate.

Keep your outputs clean. Run any generated text through check_moderation immediately after the call. This stops policy violations from ever reaching the user.

Manage complex conversations easily. The chat_completions tool handles persistent context, so you don't have to resend the entire chat history with every follow-up message.

See it in action

01 01

Building a secure internal knowledge bot

A user needs an agent that answers questions based on private documents. First, they run get_embeddings on their 100 PDFs to create vectors. Then, when a question comes in, the agent uses those embeddings for retrieval and finally executes chat_completions to synthesize the answer. The whole flow is contained and verifiable.

02 02

Developing an automated content moderation pipeline

A platform needs to filter all user-submitted comments before saving them. The agent first runs a pre-check using check_moderation. If clean, it proceeds with the main task via chat_completions; otherwise, it flags the failure and stops.

03 03

Optimizing cost for an enterprise app

A developer suspects their service is running out of budget. They immediately call get_usage to see the current token count. This insight guides them to use a cheaper model found via list_models instead of defaulting to Yi-Large.

04 04

Integrating new LLM features into a client app

A team needs to test an entirely new feature that requires complex chat logic. Before writing any code, they call list_models to confirm the model ID exists and then use chat_completions in a sandboxed environment.

The honest tradeoffs

Ignoring cost visibility

Anti-pattern

The developer just calls chat_completions repeatedly without checking the billing. They run 50,000 tokens in an hour and get a massive surprise bill later.

The Fix

Always sandwich your main function call with checks. Before running chat completions, use get_usage. After you're done, run it again to confirm consumption. This keeps cost management visible.

Assuming model availability

Anti-pattern

The code fails because the developer hardcoded an old model name (like 'yi-v1') that no longer exists or is deprecated.

The Fix

Never assume. Always start by calling list_models to get the current, active list of models. This prevents runtime failures and lets you use the right ID.

Sending raw user input directly

Anti-pattern

A malicious or inappropriate prompt gets passed straight into the chat completion engine without review.

The Fix

Always filter first. Insert a call to check_moderation immediately before calling chat_completions. This gatekeeper tool keeps bad content out of your system.

When It Fits, When It Doesn't

Use this MCP Server if: You need reliable orchestration across multiple, distinct LLM functions (Chat, Embeddings, Moderation) and you must control the flow using API calls. Specifically, if cost tracking (get_usage) or model selection (list_models) is part of your core logic, this server gives you that necessary gatekeeping.

Don't use it if: You only need to run a simple chat query in isolation and don't care about moderation or usage. In those cases, a dedicated, single-purpose API might be simpler. Also, if your primary requirement is integrating with a platform other than the Yi ecosystem, you should look at a general-purpose LLM gateway that supports multiple vendors.

Questions you might have

How do I check my token usage using the `get_usage` tool? +

Call get_usage() in your agent workflow. It will return a JSON object detailing your current consumption and remaining balance for the Yi models.

What is the difference between chat completions and embeddings? +

Chat completions generate text based on prompts (like having a conversation). Embeddings (get_embeddings) convert text into numerical vectors, which are used by search engines to find semantic matches.

`list_models` tool: does it list all LLMs? +

No, list_models only lists the available Yi models. For a complete picture of every model on the market, you'll need to consult external documentation.

Can I use `check_moderation` before running `chat_completions`? +

Yes. It’s best practice to run a user prompt through check_moderation first. If the output is flagged, you stop the workflow and prevent the chat call from ever happening.

How do I handle rate limits when running `chat_completions`? +

The service manages standard API rate limits. If your agent exceeds the quota, it will receive a specific HTTP error code that tells you exactly how long to wait before retrying. You must implement exponential backoff in your workflow logic.

Does `get_embeddings` handle bilingual text, specifically Chinese characters? +

Yes, the embedding model is optimized for both English and Mandarin (EN/CN). You can pass combined English and Chinese texts together; it generates a single semantic vector that properly accounts for both language inputs.

If I use an outdated model name in `chat_completions`, how does `list_models` help? +

The list_models tool provides the definitive, currently active names and versions of all supported Yi models. Run this first to guarantee you are using the correct identifier before submitting any chat request.

What is the expected input format when running `check_moderation`? +

The tool expects either a single string or an array of strings in the payload. It checks all provided text elements against policy rules and returns a status flag for every item you send.

Which Yi model is best for complex reasoning? +

For complex reasoning and high-quality outputs, yi-large is recommended. For faster response times and cost efficiency, yi-medium or yi-spark are excellent alternatives.

Can I automatically retrieve my remaining account balance? +

Yes! Use the get_balance tool. Your agent will connect to the Lingyi Wanwu billing service and return your current remaining credits.

How do I list all the technical specs for the Yi models? +

Use the list_models tool. Your agent will retrieve a list of all models currently available on the platform, along with their IDs and capabilities.

Connect to your AI in seconds.

List models

Chat completions

Check moderation

Lingyi Wanwu: 5 Tools for Model Operations

Make your AI actually useful.

List Models

Chat Completions

Check Moderation

Get Embeddings

Security and governance baked right in.

Claude AI

Open Claude Settings

Add Custom Connector

Start a conversation

Claude Code

Open your terminal

Add the MCP Server

Start coding

Cursor

One-Click Install (Recommended)

Open Cursor Settings

Add New Server

Use in Composer

Antigravity

Configure Agent Environment

Bind the Endpoint

Execute

VS Code Copilot

One-Click Install (Recommended)

Open MCP Settings

Add Server Config

Windsurf

One-Click Install (Recommended)

Open Windsurf Settings

Add Server Endpoint

LangChain

Install Dependencies

Connect the Server

CrewAI

Define the Tool

Execute Task

Choose How to Get Started

Build Your Own

Make Your AI Do More

Works with Claude, ChatGPT, Cursor, and more

Manually checking model availability and usage stats is a time sink.

Using Lingyi Wanwu MCP Server for Chat Completions

What your AI can actually do with this

Here's how it actually works

Who is this actually for?

What Changes When You Connect

See it in action

Building a secure internal knowledge bot

Developing an automated content moderation pipeline

Optimizing cost for an enterprise app

Integrating new LLM features into a client app

The honest tradeoffs

Ignoring cost visibility

Assuming model availability

Sending raw user input directly

When It Fits, When It Doesn't

Questions you might have