Kling AI MCP. Generate cinematic media without leaving your chat.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

Just plug in your AI agents and start using Vinkius.

Kling AI (Generative Video & Image) MCP Server gives you full control over high-fidelity media creation. Use text-to-video, image-to-video, and virtual try-on to generate cinematic videos and images through natural conversation.

It handles everything from animating static frames to synchronizing speech onto AI avatars, letting you manage complex generative jobs right from your agent.

What your AI agents can do

Generate video from text

Takes a text prompt and generates a cinematic, high-fidelity video using the Kling V3 engine.

Animate an image into video

Takes a static image and creates a video by defining motion trajectories and dynamics.

Visualize clothing on models

Blends digital garment images onto target human photos for virtual try-on.

Create lip-sync video

Synchronizes an audio file's speech to a video portrait, creating a realistic talking avatar.

Generate multiple images at once

Creates up to four high-fidelity images based on a text prompt using the Kolors architecture.

Manage complex job status

Polls for and retrieves final URLs for every generation type (video, image, try-on, lip-sync).

Free for Subscribers

Kling AI MCP Server: 10 Tools for Generative Media

These tools let your AI agent manage the entire media creation pipeline, from generating raw images and videos to performing virtual try-on and lip-syncing.

get_image_task

Checks the status of a Kling image task and returns the generated image URLs if successful.

get_lipsync_task

Checks the status of an AI Lip-Sync task and retrieves the final MP4 URL if successful.

get_tryon_task

Checks the status of an AI Virtual Try-On task and returns the final composite image URL if successful.

get_video_task

Checks the status of a Kling video task and returns the final MP4 URLs if successful.

image_to_video

Starts the process of animating a static image into a video using Kling AI and returns a Task ID for polling.

lip_sync_video

Starts a task to synchronize speech to a video portrait, returning a Task ID for polling.

list_video_tasks

Lists all recent video generation tasks completed by Kling AI.

text_to_image

Generates up to four images from text prompts using the Kolors AI model and returns a Task ID for polling.

text_to_video

Generates a cinematic video from a text prompt using Kling V3 and returns a Task ID for polling.

virtual_try_on

Blends a source garment image naturally onto a target person's photo and returns a Task ID for polling.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Kling AI (Generative Video & Image), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Listen up. With the Kling AI MCP Server, you're getting total control over making high-fidelity videos and images. You can run text-to-video, image-to-video, and virtual try-on right through your agent. It handles everything, from animating a still shot to syncing speech onto an AI avatar, so you don't have to jump between a dozen tools.

Text-to-Video
You just feed it a text prompt, and it spits out a cinematic, high-fidelity video using the Kling V3 engine. You start the job with text_to_video and get a Task ID; you'll need to check that ID later.

Animating Stills
Need to bring a static image to life? Use image_to_video to animate a picture into a video by defining the motion and dynamics. It returns a Task ID so you can track the progress.

Virtual Try-On
Want to visualize clothes on a model? virtual_try_on blends a digital garment image naturally onto a target person's photo and gives you a Task ID to poll.

Lip-Sync Avatars
Make an AI avatar talk. lip_sync_video synchronizes an audio file's speech to a video portrait, giving you a realistic talking head. This also returns a Task ID.

Multiple Images at Once
Need a bunch of visuals? text_to_image generates up to four high-fidelity images from a text prompt using the Kolors AI model, and you get a Task ID back.

Tracking Your Jobs
Managing all this stuff means you gotta know how to track the jobs. You can check the status of any task, whether it's a video, image, try-on, or lip-sync, using get_video_task, get_image_task, get_tryon_task, or get_lipsync_task. You'll get the final MP4 or image URLs when the job's done. You can also list your recent video tasks with list_video_tasks.

The Workflow
When you run a task, you get a Task ID. You pass that ID to the matching getter tool (get_video_task, get_image_task, get_tryon_task, or get_lipsync_task) to poll the job. Once the job finishes, you retrieve the final URLs. If you need images, text_to_image works with the Kolors architecture. If you need video from text, use text_to_video. If you need video from a picture, use image_to_video.

If you're doing a try-on, use virtual_try_on. If you're syncing speech, use lip_sync_video. You'll use get_video_task for video results, get_image_task for images, get_tryon_task for try-on results, and get_lipsync_task for lip-sync results. That's how you run it.
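
The pairing between submission tools and status tools can be written down as a small lookup table. This is a minimal sketch in Python: the tool names are the ones listed on this page (get_lipsync_task comes from the capability list), while the `GETTER_FOR` table and the `getter_for` helper are hypothetical names made up for illustration.

```python
# Submission tool -> status tool, using the tool names listed on this page.
# The table and helper below are illustrative, not part of the server.
GETTER_FOR = {
    "text_to_video": "get_video_task",
    "image_to_video": "get_video_task",
    "text_to_image": "get_image_task",
    "virtual_try_on": "get_tryon_task",
    "lip_sync_video": "get_lipsync_task",
}


def getter_for(submit_tool: str) -> str:
    """Return the status tool that matches a submission tool."""
    try:
        return GETTER_FOR[submit_tool]
    except KeyError:
        raise ValueError(f"unknown submission tool: {submit_tool!r}") from None


print(getter_for("image_to_video"))  # get_video_task
```

Both video-producing tools share get_video_task, which is why the table is keyed by submission tool rather than by media type.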

How Kling AI MCP Works

1. Subscribe to the server and enter your Kling Access Key and Secret Key.
2. Use your agent to submit a generation request (e.g., text_to_video or virtual_try_on).
3. The agent tracks the Task ID, polls the status until the task succeeds, and then retrieves the final MP4 or image URL.

The bottom line is, your agent handles the entire workflow: submission, polling for completion, and final asset retrieval.

Who Is Kling AI MCP For?

Kling AI MCP is for video editors, creative directors, and e-commerce merchandisers who need high-fidelity visual assets fast. If you spend time in manual rendering software or clicking through multiple asset pipelines, this is for you. It lets you generate complex media sequences and iterate on concepts directly through conversation.

Video Editor

Generates B-roll and cinematic sequences by prompting the agent, eliminating the need to switch to manual rendering software.

Creative Director

Rapidly tests visual concepts and storyboards by commanding the agent to generate varied image and video styles.

E-commerce Merchandiser

Uses the virtual_try_on tool to visualize new garment collections on different models efficiently for marketing assets.

What Changes When You Connect

Generate B-roll and cinematic sequences from a prompt. Instead of manually rendering footage in After Effects, you just prompt your agent to run text_to_video and get the MP4 once the task finishes.
Iterate on visual concepts faster. Use text_to_image to generate up to four visual variations in one request, letting you decide on the best look without running separate prompts.
Visualize products accurately. Use virtual_try_on to map digital garments onto various models, giving e-commerce teams high-fidelity mockups without expensive photoshoots.
Create professional avatars. lip_sync_video synchronizes audio to a video portrait, making it easy to produce professional, talking spokesperson videos for training or marketing.
Keep track of all your jobs. The get_video_task, get_image_task, and get_tryon_task tools let you manage, poll, and retrieve the final assets for every job, no matter how complex the workflow is.
Animate visuals easily. Need to bring a still photo to life? Run image_to_video to give static frames consistent, AI-generated motion dynamics.

Real-World Use Cases

Producing a series of cinematic background shots.

The user needs multiple B-roll clips for a documentary. They ask their agent to run text_to_video multiple times with different prompts (e.g., 'foggy mountain pass,' 'busy market street'). The agent manages the sequence of jobs and retrieves all the final MP4s, saving the user hours of manual rendering.

Launching a new clothing line.

The e-commerce team needs to show a jacket on five different body types. Instead of hiring models and taking photos, they run virtual_try_on five times, feeding the jacket and the model images. The agent collects the final, composited images for the product catalog.

Creating a spokesperson video.

A corporate trainer needs a video of the CEO giving a presentation. They run lip_sync_video, providing the CEO's video portrait and the audio track. The agent waits for the task to finish and delivers the final lip-synced MP4.

Rapid storyboarding for a film.

A creative director is visualizing a scene. They use text_to_image to generate four different visual interpretations of the same prompt. They compare the four results immediately, quickly locking down the visual style before starting any video work.

The Tradeoffs

Trying to combine everything in one prompt

The user asks, 'Make a video of a futuristic city, and put a jacket on a model in it, and make it talk.' The agent fails because the tools require separate, sequential calls, and the model can't combine modalities in one shot.

→ Break the task down. First, use text_to_video for the background footage. Next, use virtual_try_on to generate the garment on a model image. Finally, use lip_sync_video to add the talking avatar, managing the task IDs for each step.
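
The breakdown above amounts to three independent jobs run one after another. The sketch below shows that shape in Python. It assumes a hypothetical `run_job` callable that submits one tool call, waits for it, and returns the result URL; the argument names (`prompt`, `person_image`, `garment_image`, `video`, `audio`) and the example.com URLs are placeholders, not the server's documented parameters.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical: submits one tool call, waits for it, returns the result URL.
RunJob = Callable[[str, Dict[str, Any]], str]


def run_sequence(run_job: RunJob, steps: List[Tuple[str, Dict[str, Any]]]) -> Dict[str, str]:
    """Run each step as its own job, in order, and keep one result URL per tool."""
    results: Dict[str, str] = {}
    for tool, args in steps:
        results[tool] = run_job(tool, args)
    return results


# One entry per modality; argument names are assumptions for illustration.
steps = [
    ("text_to_video", {"prompt": "a futuristic city at dusk"}),
    ("virtual_try_on", {"person_image": "https://example.com/model.jpg",
                        "garment_image": "https://example.com/jacket.jpg"}),
    ("lip_sync_video", {"video": "https://example.com/portrait.mp4",
                        "audio": "https://example.com/speech.mp3"}),
]


def fake_run_job(tool: str, args: Dict[str, Any]) -> str:
    # Stand-in for a real client so the sketch runs offline.
    return f"https://example.com/results/{tool}"


assets = run_sequence(fake_run_job, steps)
for tool, url in assets.items():
    print(tool, "->", url)
```

Each step yields its own asset; the sketch does not merge them into a single file.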

Forgetting to check job status

The agent submits text_to_video and the user assumes the video is ready immediately. The task is running asynchronously, and the user gets an error or an incomplete file.

→ After submitting any task (e.g., text_to_video), you must call the corresponding getter tool (get_video_task) in a loop until the task reports success. Only then can you retrieve the final MP4 URL.
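
A polling loop of this kind might look like the following Python sketch. Several things here are assumptions rather than documented behavior: the `call_tool` callable stands in for however your MCP client invokes a tool, the `task_id`, `status`, and `urls` keys are guessed field names, and the `failed` label is an assumed failure status that this page does not list.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical shape of "call an MCP tool": (tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_task(
    call_tool: CallTool,
    getter: str,
    task_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `getter` until the task succeeds, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = call_tool(getter, {"task_id": task_id})  # "task_id" key is assumed
        status = str(result.get("status", "")).lower()    # "status" key is assumed
        if status in ("succeed", "succeeded"):
            return result
        if status == "failed":  # assumed failure label, not taken from this page
            raise RuntimeError(f"task {task_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)


# Stand-in for a real client: reports Processing twice, then Succeed.
calls = {"count": 0}


def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    calls["count"] += 1
    if calls["count"] < 3:
        return {"status": "Processing"}
    return {"status": "Succeed", "urls": ["https://example.com/video.mp4"]}


done = wait_for_task(fake_call_tool, "get_video_task", "task-123",
                     interval=0.0, sleep=lambda _seconds: None)
print(done["urls"])  # ['https://example.com/video.mp4']
```

The timeout keeps a stuck job from blocking the conversation forever, and injecting `sleep` makes the loop easy to exercise without waiting.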

Using the wrong image generation tool

The user tries to animate a photo using the text_to_image tool. This fails because text_to_image only accepts text prompts, not a source image.

→ If you have a static image and need to animate it, use the image_to_video tool. If you just need new images from scratch, use text_to_image.

When It Fits, When It Doesn't

Use this server if your workflow requires multiple, specialized media outputs in sequence: video, images, or product visualization. It fits when you need to combine different types of assets (e.g., video background + try-on image + talking head). The core value is the ability to orchestrate these specialized tools, like combining text_to_video with lip_sync_video, from a single chat interface.

Don't use this if you just need simple text-to-text summaries or basic data retrieval. If your task is text-only (e.g., 'just write a script'), this server is overkill. If you only need simple image generation, text_to_image alone covers it; the rest of the server is there for when you need the video pipeline.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Kling AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_image_task, get_lipsync_task, get_tryon_task, get_video_task, image_to_video, lip_sync_video, list_video_tasks, text_to_image, text_to_video, virtual_try_on

Manual media production requires switching between 5+ specialized applications.

Right now, if you need a single piece of content, say a product video, you have to jump through hoops. You're in Photoshop to generate a mockup. Then you switch to After Effects to animate it. Then you move to Premiere to compile it. You're copying assets and managing render queues across several different, expensive-to-license programs.

With the Kling AI MCP Server, you keep the generation steps in your agent chat. You tell your agent, 'I need a video of the jacket on the model.' The agent runs `virtual_try_on` for the visual asset, then runs `text_to_video` for the background, and returns the URL for each finished asset.

Kling AI (Generative Video & Image) MCP Server: Video and Visual Assets

You no longer need to manually manage render queues for text-to-video or image-to-video. You simply submit the prompt, and the agent manages the asynchronous job, polling using `get_video_task` until the MP4 is ready. The whole process is invisible to you.

It's not just about generating files; it's about building a full pipeline. You can generate the video (`text_to_video`), then feed the resulting video into a separate `lip_sync_video` task along with an audio file. This interconnectedness is what changes the game.

Common Questions About Kling AI MCP

How do I check the status of a video generated by the `text_to_video` tool?

You check the status using the get_video_task tool. You pass the Task ID returned when you run text_to_video. The tool tells you if the job succeeded and, if so, provides the final MP4 link.

Can I use the `virtual_try_on` tool with multiple garments?

The tool processes one garment at a time. You'll need to call virtual_try_on sequentially for each garment and then collect the final results using get_tryon_task for each task ID.
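
One way to handle several garments is to submit one job per garment, remember which Task ID belongs to which garment, and then check the tasks in rounds. The Python sketch below shows that pattern under stated assumptions: `call_tool` is a hypothetical stand-in for your MCP client, and the `person_image`, `garment_image`, `task_id`, `status`, and `url` field names are guesses, not the server's documented schema.

```python
from typing import Any, Callable, Dict, List

# Hypothetical shape of "call an MCP tool": (tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def try_on_all(call_tool: CallTool, person_url: str, garment_urls: List[str],
               max_rounds: int = 60,
               pause: Callable[[], None] = lambda: None) -> Dict[str, str]:
    """Submit one virtual_try_on job per garment, then collect every result."""
    pending: Dict[str, str] = {}  # task ID -> garment URL
    for garment in garment_urls:
        submitted = call_tool("virtual_try_on",
                              {"person_image": person_url, "garment_image": garment})
        pending[submitted["task_id"]] = garment

    finished: Dict[str, str] = {}  # garment URL -> composite image URL
    for _ in range(max_rounds):
        for task_id in list(pending):
            result = call_tool("get_tryon_task", {"task_id": task_id})
            if str(result.get("status", "")).lower() in ("succeed", "succeeded"):
                finished[pending.pop(task_id)] = result["url"]
        if not pending:
            return finished
        pause()  # wait between rounds; a real client would sleep here
    raise TimeoutError(f"{len(pending)} try-on task(s) still running")


class FakeServer:
    """Offline stand-in: each task reports Processing once, then Succeed."""

    def __init__(self) -> None:
        self.checks: Dict[str, int] = {}

    def __call__(self, tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if tool == "virtual_try_on":
            task_id = f"task-{len(self.checks) + 1}"
            self.checks[task_id] = 0
            return {"task_id": task_id}
        self.checks[args["task_id"]] += 1
        if self.checks[args["task_id"]] < 2:
            return {"status": "Processing"}
        return {"status": "Succeed", "url": f"https://example.com/{args['task_id']}.png"}


results = try_on_all(FakeServer(), "https://example.com/model.jpg",
                     ["https://example.com/jacket.jpg", "https://example.com/coat.jpg"])
for garment, image in results.items():
    print(garment, "->", image)
```

Submitting everything first lets the jobs run side by side on the service; a task that never succeeds ends the run with a timeout after `max_rounds` rounds.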

Does `text_to_video` support all types of prompts?

It generates cinematic videos from scenic descriptions. The prompt needs to be descriptive enough to guide the Kling V3 engine toward a specific, high-fidelity visual outcome.

What is the difference between `text_to_image` and `text_to_video`?

They handle different media types. text_to_image creates static visuals, generating up to four high-fidelity images from a text prompt using Kolors AI. text_to_video builds cinematic clips, turning a text description into a video using Kling V3.

How do I animate a picture using the `image_to_video` tool?

You pass the image URL to the image_to_video tool. It returns a Task ID, which you then monitor with the get_video_task tool until the final video is available.

How do I check the status of my virtual try-on task using `get_tryon_task`?

You pass your unique task ID to get_tryon_task. The tool returns the final composited image URL if the job succeeds. If it's still running, it reports the current status.

Does `lip_sync_video` require specific audio formats?

The tool requires an audio file containing the speech to synchronize. That audio drives the mouth movements so the video portrait matches the speech.

Can I check the progress of my video generation task?

Yes. Use the get_video_task tool with your Task ID. Your agent will poll the Kling API and report the current status (Submitted, Processing, or Succeed). Once finished, it will provide the direct MP4 download URLs.
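
If you script against these statuses, it can help to normalize the label before branching on it. The Python sketch below maps the three labels named above onto an enum. The `TaskState` enum and both helpers are hypothetical, and accepting "succeeded" as an alias for "Succeed" is a defensive assumption, not documented behavior.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    PROCESSING = "processing"
    SUCCEEDED = "succeed"
    UNKNOWN = "unknown"  # anything this page does not list


# Lower-cased label -> state. "succeeded" is an assumed alias.
_ALIASES = {
    "submitted": TaskState.SUBMITTED,
    "processing": TaskState.PROCESSING,
    "succeed": TaskState.SUCCEEDED,
    "succeeded": TaskState.SUCCEEDED,
}


def parse_state(raw: str) -> TaskState:
    """Map a status label to a TaskState, ignoring case and surrounding spaces."""
    return _ALIASES.get(raw.strip().lower(), TaskState.UNKNOWN)


def is_finished(raw: str) -> bool:
    """True only when the task has succeeded and its URLs can be read."""
    return parse_state(raw) is TaskState.SUCCEEDED


for label in ("Submitted", "Processing", "Succeed"):
    print(label, "->", parse_state(label).name, "finished:", is_finished(label))
```

Unrecognized labels fall through to UNKNOWN instead of raising, so a status this page does not mention will not crash a polling loop.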

How does the AI Virtual Try-On work through my agent?

Use the virtual_try_on tool and provide public URLs for a photo of the target person and a garment image. Your agent will submit the job to Kolors AI, which naturally blends the clothing onto the person. You can then retrieve the final image by passing the Task ID to get_tryon_task.

Can I synchronize audio to a video portrait using my agent?

Absolutely. The lip_sync_video tool allows you to submit a portrait video and a driving audio file. Your agent will trigger the AI lip-sync process to align the mouth movements to the speech, perfect for creating professional avatars.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)
Google ADK (Python)
Pydantic AI (Python)
Vercel AI SDK (TypeScript)