🎨Voice Design
No reference audio required. Describe the target voice (gender, age, tone, emotion, pace, and so on) in Control Instruction, and VoxCPM2 will craft a unique voice from that description alone.
🎛️Controllable Cloning
Upload a reference audio clip, then optionally use Control Instruction to steer emotion, pace, and style while preserving the original timbre.
🎙️Ultimate Cloning
Enable Ultimate Cloning Mode and provide the reference audio's transcript (auto-filled via ASR). The model treats the clip as a spoken prefix and continues from it, reproducing every vocal nuance faithfully.
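The three modes differ only in which inputs are supplied, so the choice can be sketched as a small decision function. This is an illustrative sketch, not the WisVid or VoxCPM2 API: the `VoiceRequest` class, its field names, and `pick_mode` are all hypothetical, and the rule that a missing transcript is an error is an assumption (on the page, the transcript is auto-filled via ASR).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceRequest:
    """Hypothetical container mirroring the form fields described above."""
    target_text: str
    control_instruction: Optional[str] = None   # e.g. "warm, slow female voice"
    reference_audio: Optional[str] = None       # path to an uploaded clip
    reference_transcript: Optional[str] = None  # text spoken in the clip
    ultimate_cloning: bool = False              # the Ultimate Cloning Mode toggle


def pick_mode(req: VoiceRequest) -> str:
    """Return which of the three modes a request falls into."""
    if req.reference_audio is None:
        # No clip: the voice comes from the Control Instruction alone.
        if req.ultimate_cloning:
            raise ValueError("Ultimate Cloning needs a reference audio clip")
        return "voice_design"
    if req.ultimate_cloning:
        # Assumption: this sketch requires the transcript explicitly.
        if not req.reference_transcript:
            raise ValueError("Ultimate Cloning needs the clip's transcript")
        return "ultimate_cloning"
    # Clip supplied, toggle off: Control Instruction is optional steering.
    return "controllable_cloning"
```

For example, a request with only `target_text` and `control_instruction` maps to Voice Design, while adding `reference_audio` without the toggle maps to Controllable Cloning.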
Reference audio: upload a WAV, MP3, M4A, FLAC, or OGG file.
🗣️ English Accent & Dialect Tips
Write Target Text in the target dialect's vocabulary and expressions, then describe the accent in Control Instruction.
✅ Correct (Southern American): Y'all fixin' to head out? I reckon we oughta grab some sweet tea before we go, bless your heart.
❌ Wrong (neutral Standard American): Are you all about to leave? I think we should get some iced tea before we go.
Not sure how to write a regional dialect? Ask ChatGPT or Claude to rewrite your text in that accent first, then paste it in.
© 2026 WisVid · Powered by VoxCPM2