AI Captions

Generate accurate captions with word-level timestamps and 6 professional presets.

AI Caption Generator

Captions aren’t optional anymore. They’re how most people watch video. ChatCut generates accurate, styled captions with word-level timing and gives you real control over how they look.

Skip the menus. Type what you need. Tell the AI “Add Netflix-style captions” and they’re on your timeline in seconds, perfectly synced.

20+ Properties
6 Style presets
2 Engines
Word-level timing

Dual-Engine Transcription

ChatCut runs two transcription engines to handle different languages at their best:

  • AssemblyAI – optimized for English and European languages, high accuracy on conversational speech
  • Huoshan – purpose-built for Chinese (Mandarin, Cantonese), handles tonal languages and CJK character segmentation correctly

The right engine is selected automatically based on your content’s language. You’ll get accurate transcription without configuring anything.
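
For readers who want a mental model of the automatic selection, here is a minimal sketch of language-based routing. The engine names come from the list above; the language codes, the fallback, and the `pick_engine` function are illustrative assumptions, not ChatCut's actual implementation or API.

```python
# Hypothetical routing table. Engine names are from the text above;
# the language codes and the fallback choice are assumptions for illustration.
ENGINE_BY_LANGUAGE = {
    "en": "AssemblyAI",
    "fr": "AssemblyAI",
    "de": "AssemblyAI",
    "es": "AssemblyAI",
    "zh": "Huoshan",   # Mandarin
    "yue": "Huoshan",  # Cantonese
}

DEFAULT_ENGINE = "AssemblyAI"  # assumed fallback, not documented behavior


def pick_engine(language_tag: str) -> str:
    """Return an engine name for a language tag such as 'zh-CN' or 'en-US'."""
    # Only the primary subtag ("zh" in "zh-CN") is used for the lookup.
    primary = language_tag.split("-")[0].lower()
    return ENGINE_BY_LANGUAGE.get(primary, DEFAULT_ENGINE)


print(pick_engine("en-US"))  # AssemblyAI
print(pick_engine("zh-CN"))  # Huoshan
```

In this sketch the detected language of the audio would be passed in as the tag; how ChatCut detects the language is not described here.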


6 Professional Presets

Start with a look that works, then customize from there:

  • Netflix – clean white text, semi-transparent background, industry-standard positioning
  • Minimal – no background, subtle drop shadow, stays out of the way
  • Vox – bold, colorful word-by-word highlights (Vox Media style)
  • Focus – highlights the current word, dims surrounding text
  • TikTok – large, centered, high-contrast, built for vertical video
  • YouTube – readable at any size, optimized for 16:9 content

Each preset is a starting point, and every visual property is adjustable.

  1. Add your video – Import footage or use content already on your timeline
  2. Generate captions – The AI transcribes with word-level timestamps and speaker identification
  3. Pick a preset – Choose from 6 professional styles: Netflix, Minimal, Vox, Focus, TikTok, or YouTube
  4. Customize anything – Adjust 20+ properties: font, size, color, position, animation, background, and more


20+ Customizable Properties

This is where ChatCut pulls ahead of basic caption tools. You’re not limited to font and color, and you can pair captions with AI voiceover narration or text-based editing for a complete spoken-word workflow. The full property list includes:

  • Font family, weight, and size
  • Text color, stroke color, stroke width
  • Background color and opacity
  • Position (x, y) and alignment
  • Line height and letter spacing
  • Word highlight color and animation
  • Shadow properties
  • Maximum lines and characters per line
  • Animation style (fade, pop, slide)

Every property updates in real time in your preview, with no re-rendering and no guessing.
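
One way to picture "preset plus adjustments" is a style record with per-property overrides. The sketch below is only an illustration: the `CaptionStyle` fields mirror a few properties from the list above, while the field names, default values, preset values, and the `apply_preset` helper are all hypothetical and not ChatCut's real data model.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CaptionStyle:
    # Field names and defaults are assumptions chosen for illustration.
    font_family: str = "Arial"
    font_size: int = 48
    text_color: str = "#FFFFFF"
    background_color: str = "#000000"
    background_opacity: float = 0.0
    highlight_color: Optional[str] = None
    max_lines: int = 2
    animation: str = "fade"


# Hypothetical preset values; the real presets are not published here.
PRESETS = {
    "netflix": CaptionStyle(background_opacity=0.6),
    "minimal": CaptionStyle(background_opacity=0.0),
}


def apply_preset(name: str, **overrides) -> CaptionStyle:
    """Copy a preset and change only the named properties."""
    # dataclasses.replace raises TypeError for an unknown property name.
    return replace(PRESETS[name], **overrides)


base = PRESETS["netflix"]
style = apply_preset(
    "netflix",
    font_size=base.font_size + 6,  # "slightly larger"
    highlight_color="#3B82F6",     # a blue for the active word
)
print(style)
```

Because the record is frozen, the preset itself is never modified; each customization produces a new style that can be swapped back to the preset at any time.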

Try this prompt
Add captions to my interview video, use the Netflix preset, but make the font slightly larger and use a blue highlight for the current word
Result

Captions generated with word-level timestamps, Netflix styling applied, font size increased, active word highlighted in blue, all synced to timeline


Word-Level Timestamps

ChatCut doesn’t just timestamp sentences; it timestamps every word. This enables:

  • Per-word highlighting – the active word lights up as it’s spoken
  • Precise trimming – cut to the exact word boundary
  • Text-based editing – delete a word from the transcript, and the corresponding video is removed
  • Accurate sync – captions never drift, even in fast speech
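
To make per-word highlighting concrete, here is a minimal sketch of how a player could find the active word from word-level timestamps. The `Word` record, the sample timings, and the `active_word_index` function are assumptions for illustration; ChatCut's internal format is not documented here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the clip
    end: float


def active_word_index(words: List[Word], t: float) -> Optional[int]:
    """Return the index of the word spoken at time t, or None during a pause."""
    starts = [w.start for w in words]
    # Last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i].end:
        return i
    return None


# Made-up sample timings, sorted by start time.
words = [
    Word("Captions", 0.00, 0.42),
    Word("never", 0.50, 0.78),
    Word("drift", 0.80, 1.20),
]

print(active_word_index(words, 0.60))  # 1
print(active_word_index(words, 0.45))  # None
```

The same timestamps give exact word boundaries for trimming: a cut at `words[i].start` or `words[i].end` lands between words rather than inside one.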

Speaker Identification

According to Wistia’s research, captioned videos see significantly higher engagement. ChatCut handles multi-speaker content automatically: the transcription engine identifies different speakers and labels them. This means:

  • Interview captions show who’s talking
  • Podcast episodes with multiple hosts are properly attributed
  • Panel discussions don’t get confusing
  • You can style different speakers with different colors

CJK Language Support

Most caption tools treat Chinese, Japanese, and Korean as afterthoughts. ChatCut doesn’t. The Huoshan engine provides:

  • Proper character segmentation (no mid-word breaks)
  • Intelligent line breaking that respects grammar
  • Correct punctuation handling
  • Natural reading flow for vertical and horizontal text

If you’re creating content in Chinese or for Chinese-speaking audiences, this is the caption tool that actually works.

Feature | ChatCut | Descript
Customizable properties | 20+ visual properties | Basic font, color, position
Style presets | 6 professional presets | Limited preset options
CJK language support | Dedicated engine with intelligent line breaking | Basic support, frequent segmentation issues
Word-level timestamps | Yes, with per-word highlighting | Yes
Speaker identification | Automatic with color coding | Automatic

Describe What You Want in Plain English. ChatCut Handles the Rest.

You don’t need to manually position text boxes or fiddle with timing. Tell the AI agent what style you want, and it configures everything. Want to change the look later? Just describe the change.

“Make the captions bigger, move them to the top third, and use a bold font.” Done.

“Switch to TikTok style but keep my custom colors.” Done.

The AI understands context and applies changes across all caption segments at once.

Try this prompt
Remove all filler words from the captions and make the remaining text use the Focus preset with a yellow highlight
Result

Filler words ('um', 'uh', 'like', 'you know') removed from transcript and timeline, Focus preset applied with yellow word highlights
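
Removing fillers from both the transcript and the timeline follows naturally from word-level timestamps. The sketch below is a simplified illustration under stated assumptions: it matches only the single-token fillers "um" and "uh" (words like "like" or "you know" need context to classify), and the `strip_fillers` function and tuple format are hypothetical, not ChatCut's implementation.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start seconds, end seconds)

# Only unambiguous single-token fillers; context-dependent ones are out of scope.
FILLERS = {"um", "uh"}


def strip_fillers(words: List[Word]) -> Tuple[List[Word], List[Tuple[float, float]]]:
    """Split a transcript into kept words and the time ranges to cut."""
    kept: List[Word] = []
    cuts: List[Tuple[float, float]] = []
    for text, start, end in words:
        # Ignore case and trailing punctuation when matching a filler.
        if text.lower().strip(".,!?") in FILLERS:
            cuts.append((start, end))
        else:
            kept.append((text, start, end))
    return kept, cuts


# Made-up sample transcript.
transcript = [
    ("So", 0.00, 0.20),
    ("um,", 0.25, 0.60),
    ("captions", 0.65, 1.10),
    ("uh", 1.15, 1.40),
    ("matter", 1.45, 1.90),
]

kept, cuts = strip_fillers(transcript)
print(" ".join(text for text, _, _ in kept))  # So captions matter
print(cuts)  # [(0.25, 0.6), (1.15, 1.4)]
```

The kept words drive the caption text, and each cut range marks a span of video to remove from the timeline.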


When to Use AI Captions

  • Social media – most social media content is watched on mute, so captions are essential
  • YouTube – burned-in captions improve watch time and accessibility
  • Interviews and podcasts – speaker identification keeps talking-head and multi-person content clear
  • Educational content – word-level highlighting aids comprehension
  • International content – dual-engine transcription handles English and Chinese natively


Less editing. More creating.

It's time you had a superhuman editor on your side. ChatCut handles everything between recording and exporting.

Try it for free