Text-Based Video Editing
Edit video by editing text. Delete words, reorder paragraphs, and the timeline follows.
What if editing video were as simple as editing a document? In ChatCut, it is. Your video is transcribed into text, and every edit you make to that text (deleting a word, removing a paragraph, reordering sections) instantly updates the video timeline.
Don’t click through menus. Just tell ChatCut what you want. Say “Remove all filler words” and the AI edits both the transcript and timeline in one pass.

How It Works
ChatCut transcribes your video with word-level timestamps, then presents the transcript as an editable document. Edit the text, and the corresponding video segments are automatically cut, moved, or removed. Changes sync to your timeline in milliseconds through the Zero real-time engine.
There’s no scrubbing through footage. No marking in-points and out-points. Read the transcript, make your edits, done.
Import your video
Add footage to your project: interviews, vlogs, podcasts, lectures, anything with speech
Generate transcript
Dual-engine transcription creates word-level timestamps with speaker identification
Edit the text
Delete words, remove paragraphs, reorder sections, close gaps, just like editing a document
Timeline syncs instantly
Every text edit updates the video timeline in real time via the Zero engine
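To make the idea concrete, here is a minimal sketch of how word-level timestamps can turn a text deletion into timeline cuts. It is an illustration under stated assumptions, not ChatCut's actual data model: the `Word` class and the `kept_ranges` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical transcript word with source timestamps in seconds."""
    text: str
    start: float
    end: float
    speaker: str = "A"


def kept_ranges(words, deleted):
    """Merge surviving words into contiguous (start, end) source ranges.

    `deleted` is a set of word indices removed in the text editor.
    """
    ranges = []
    prev_kept = None
    for i, w in enumerate(words):
        if i in deleted:
            continue
        if prev_kept is not None and prev_kept == i - 1:
            # Directly follows the previous kept word: extend the current range.
            ranges[-1] = (ranges[-1][0], w.end)
        else:
            # First kept word, or first kept word after a deletion: new range.
            ranges.append((w.start, w.end))
        prev_kept = i
    return ranges


words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.8),
    Word("welcome", 0.8, 1.4),
    Word("back", 1.4, 1.8),
]
segments = kept_ranges(words, deleted={1})
print(segments)  # [(0.0, 0.3), (0.8, 1.8)]
```

Deleting "um" leaves two source ranges, which is exactly what a cut on the timeline looks like.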
7 Editing Operations
The text editor supports the operations that matter for video editing:
Delete Words
Select a word or phrase in the transcript and delete it. The corresponding audio and video are removed from the timeline. Use this to clean up stutters, repeated words, or unwanted phrases.
Delete Paragraphs
Remove entire sections at once. Select a paragraph, and it’s gone from both the transcript and the timeline. Fast way to cut segments that don’t belong.
Split
Split the transcript (and timeline) at any word boundary. Useful for dividing long takes into segments for rearranging.
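A split at a word boundary can be sketched as follows. This is only an illustration: the `(text, start, end)` tuple layout and the `split_at_word` helper are assumptions made for the example, not ChatCut's API.

```python
def split_at_word(words, index):
    """Split a take into two clips just before the word at `index`.

    `words` is a list of (text, start, end) tuples, times in seconds.
    Each returned clip carries its words and its source time range.
    """
    if not 0 < index < len(words):
        raise ValueError("split point must fall between two words")

    def clip(ws):
        return {"words": ws, "start": ws[0][1], "end": ws[-1][2]}

    return clip(words[:index]), clip(words[index:])


take = [
    ("First", 0.0, 0.4),
    ("point.", 0.4, 0.9),
    ("Second", 1.2, 1.7),
    ("point.", 1.7, 2.2),
]
left, right = split_at_word(take, 2)
print(left["start"], left["end"])    # 0.0 0.9
print(right["start"], right["end"])  # 1.2 2.2
```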
Reorder
Drag transcript sections to rearrange them, and the video follows. You re-sequence content by moving paragraphs around instead of shuffling timeline clips.
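One way to picture reordering: each paragraph keeps its source range, and the timeline is rebuilt by laying the paragraphs back to back in the new order. The sketch below assumes a simple `(label, source_start, source_end)` tuple per paragraph; the `layout` function is a hypothetical helper, not part of ChatCut.

```python
def layout(paragraphs, order):
    """Place paragraphs back to back on the timeline in the given order.

    Each paragraph is (label, source_start, source_end) in seconds.
    Returns (label, timeline_start, timeline_end, source_start) tuples.
    """
    timeline = []
    cursor = 0.0
    for idx in order:
        label, src_start, src_end = paragraphs[idx]
        duration = src_end - src_start
        timeline.append((label, cursor, cursor + duration, src_start))
        cursor += duration
    return timeline


paragraphs = [
    ("intro", 0.0, 10.0),
    ("case study", 10.0, 40.0),
    ("conclusion", 40.0, 55.0),
]
# Move the conclusion before the case study.
reordered = layout(paragraphs, [0, 2, 1])
for row in reordered:
    print(row)
```

The total runtime is unchanged; only the position of each source range on the timeline differs.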
Close Gap
After deleting content, gaps may remain on the timeline. Close gap removes the empty space, pulling subsequent content forward.
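Closing gaps is essentially a ripple operation. A minimal sketch, assuming clips are plain `(timeline_start, timeline_end)` pairs and that the first clip stays where it is (both assumptions for illustration only):

```python
def close_gaps(clips):
    """Pull clips forward so each one starts where the previous one ends.

    Each clip is (timeline_start, timeline_end) in seconds. Durations and
    order are preserved; the first clip keeps its original start.
    """
    ordered = sorted(clips)
    if not ordered:
        return []
    closed = []
    cursor = ordered[0][0]
    for start, end in ordered:
        duration = end - start
        closed.append((cursor, cursor + duration))
        cursor += duration
    return closed


clips = [(0.0, 4.0), (6.5, 9.0), (12.0, 15.0)]
tightened = close_gaps(clips)
print(tightened)  # [(0.0, 4.0), (4.0, 6.5), (6.5, 9.5)]
```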
Change Speaker
Reassign speaker labels when automatic identification needs correction. Keeps multi-speaker content properly attributed.
Edit Text
Modify the transcript text itself without changing the video. It’s useful for correcting transcription errors before generating captions.
AI agent identified and removed 47 filler words across the transcript. Timeline updated: 23 seconds of dead air removed, gaps closed automatically.
Dual-Engine Transcription
Just like ChatCut’s caption system, text-based editing uses two transcription engines:
- AssemblyAI – optimized for English and European languages
- Huoshan – purpose-built for Chinese with proper character segmentation
Word-level timestamps mean every edit is frame-accurate. Delete a single word, and only that word’s audio is removed, not the surrounding sentence.
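To illustrate what frame accuracy involves, the sketch below snaps a word's timestamps to the nearest frame. The 30 fps rate, the millisecond integer timestamps, and the round-to-nearest rule are all assumptions chosen for the example; how ChatCut actually snaps cuts is not described here.

```python
def to_frame(ms, fps=30):
    """Round a millisecond timestamp to the nearest frame index (integer math)."""
    return (ms * fps + 500) // 1000


def frame_range(word_start_ms, word_end_ms, fps=30):
    """Frames [first, last) covered by a word after snapping to the frame grid."""
    return to_frame(word_start_ms, fps), to_frame(word_end_ms, fps)


first, last = frame_range(1234, 1580)
print(first, last)  # 37 47
```

At 30 fps a frame lasts about 33 ms, so a cut snapped this way lands within roughly 17 ms of the word boundary.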
Real-Time Sync via Zero Engine
This is the technical backbone that makes text-based editing feel instant. ChatCut uses Zero (by Rocicorp) for real-time data synchronization. When you delete a word from the transcript:
- The transcript update is written
- Zero propagates the change
- The timeline reflects the edit
This happens in milliseconds. You don’t wait for re-rendering or re-syncing. The timeline updates as fast as you can edit text. As Wistia’s research shows, tighter edits lead to higher retention, so speed matters.
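The write, propagate, reflect sequence can be pictured with a generic publish/subscribe store. This sketch does not use or imitate Zero's API; `TranscriptStore` and `on_change` are hypothetical names, and everything runs in a single process purely to show the flow.

```python
class TranscriptStore:
    """Hypothetical in-memory store standing in for a synced data layer."""

    def __init__(self, words):
        self._words = list(words)
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)
        listener(tuple(self._words))  # deliver the current state immediately

    def delete_word(self, index):
        del self._words[index]            # 1. the transcript update is written
        snapshot = tuple(self._words)
        for listener in self._listeners:  # 2. the change is propagated
            listener(snapshot)


timeline_view = []


def on_change(words):
    # 3. the timeline reflects the edit
    timeline_view[:] = words


store = TranscriptStore(["So", "um", "welcome"])
store.subscribe(on_change)
store.delete_word(1)
print(timeline_view)  # ['So', 'welcome']
```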
AI Agent Integration
Text-based editing becomes even more powerful with the AI agent, and you can pair it with AI captions for a complete subtitle workflow. Instead of manually selecting and deleting content, describe what you want:
- “Remove all filler words” – the agent identifies and deletes every um, uh, like, you know, basically, actually, and similar fillers
- “Cut the section where I talk about pricing” – the agent finds the relevant paragraph and removes it
- “Move the conclusion before the case study” – the agent reorders the transcript sections
- “Remove all pauses longer than 2 seconds” – the agent tightens the pacing throughout
The AI agent performs text-based edits programmatically, handling bulk operations that would’ve taken minutes to do manually.
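As a rough illustration of a bulk edit like "Remove all filler words", here is a naive list-based matcher. It is a sketch, not the agent's method: the `FILLERS` set and the `find_fillers` helper are assumptions, and a plain word list cannot tell a filler "like" from a meaningful one, which is where a language-aware agent would do better.

```python
import string

# Hypothetical filler list; multi-word fillers are tuples of several tokens.
FILLERS = {("um",), ("uh",), ("you", "know"), ("basically",), ("actually",)}


def normalize(token):
    """Lowercase a token and strip surrounding ASCII punctuation."""
    return token.lower().strip(string.punctuation)


def find_fillers(tokens, fillers=FILLERS):
    """Return sorted indices of tokens that belong to a filler phrase."""
    norm = [normalize(t) for t in tokens]
    max_len = max(len(f) for f in fillers)
    hits = set()
    for i in range(len(norm)):
        # Only try phrase lengths that fit inside the remaining tokens.
        for n in range(1, min(max_len, len(norm) - i) + 1):
            if tuple(norm[i:i + n]) in fillers:
                hits.update(range(i, i + n))
    return sorted(hits)


tokens = "So, um, you know, this is basically the demo".split()
hits = find_fillers(tokens)
cleaned = " ".join(t for i, t in enumerate(tokens) if i not in hits)
print(hits)     # [1, 2, 3, 6]
print(cleaned)  # So, this is the demo
```

The returned indices are what a text-based editor would then delete, with the timeline following each deletion.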
Filler words removed (31 instances), long pauses trimmed (12 gaps closed), closing statement moved to after intro. Total runtime reduced from 8:42 to 6:15.
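Trimming long pauses, as in the summary above, can be approximated by looking at the silence between consecutive words. A minimal sketch, assuming `(text, start, end)` tuples sorted by start time and a 2-second threshold; `find_pauses` is a hypothetical helper:

```python
def find_pauses(words, threshold=2.0):
    """Return (gap_start, gap_end) silences longer than `threshold` seconds.

    `words` is a list of (text, start, end) tuples sorted by start time.
    """
    pauses = []
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        if next_start - prev_end > threshold:
            pauses.append((prev_end, next_start))
    return pauses


words = [
    ("Right", 0.0, 0.5),
    ("so", 3.0, 3.25),
    ("anyway", 3.5, 4.0),
    ("next", 7.0, 7.5),
]
pauses = find_pauses(words)
print(pauses)  # [(0.5, 3.0), (4.0, 7.0)]
```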
| Feature | ChatCut | Descript |
|---|---|---|
| AI agent automation | Natural language commands execute bulk edits | Manual transcript editing |
| Filler word removal | AI agent removes all fillers in one command | Manual or semi-automated |
| Chinese language support | Dedicated engine with intelligent segmentation | Basic CJK support |
| Real-time sync | Millisecond sync via Zero engine | Sync after processing |
| Bulk operations | Describe the edit, AI executes across entire transcript | Section-by-section manual editing |

| Feature | ChatCut | CapCut |
|---|---|---|
| Text-based editing | Full transcript editing with 7 operations | Auto captions only, no transcript editing |
| Delete by word | Delete any word, video updates instantly | Not available |
| Reorder by text | Drag paragraphs to rearrange video | Not available |
| AI agent | Natural language bulk editing commands | No agent-based editing |
| Speaker identification | Automatic with reassignment | Limited |
You Describe the Edit. ChatCut Executes It.
The combination of text-based editing and AI agent control creates a workflow that’s fundamentally different from traditional video editing. You’re not manipulating a timeline; you’re describing what the final video should be, and the system makes it happen.
“Keep only the parts where the guest talks about machine learning, remove everything else, and close all gaps.”
That’s a complex edit. In a traditional editor, it’s 15 minutes of scrubbing, marking, cutting, and ripple-deleting. In ChatCut, it’s one sentence.
Identified section from 12:34 to 15:08 matching the described tangent. Removed from transcript and timeline. Gap closed. Runtime reduced by 2:34.
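The runtime figure in that summary is simple timecode arithmetic. The helpers below are a small sketch for checking it, assuming `m:ss` timecodes; the function names are made up for this example.

```python
def to_seconds(timecode):
    """Convert an 'm:ss' timecode to whole seconds."""
    minutes, seconds = timecode.split(":")
    return int(minutes) * 60 + int(seconds)


def to_timecode(total_seconds):
    """Convert whole seconds back to an 'm:ss' timecode."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


removed = to_seconds("15:08") - to_seconds("12:34")
print(to_timecode(removed))  # 2:34
```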
When to Use Text-Based Editing
- Interviews – cut questions, trim rambling answers, rearrange topics by dragging paragraphs
- Podcasts – remove filler words, tighten pacing, cut tangents, all through the transcript
- Lectures and courses – reorganize content flow, remove mistakes, split into chapters
- Vlogs – delete off-topic sections, clean up natural speech patterns
- Meetings and webinars – extract key segments, remove small talk, create highlight reels
- Any talking-head video – if there’s speech, text-based editing is faster than timeline editing