Audio models

Audio is where a lot of AI-generated video falls apart. The voice shifts between scenes. The music doesn’t match the mood. The ambient sound drops out on a cut. CreatorStudio treats audio as a first-class stage in the Movie Maker pipeline, with persistent voice casting through the Director Memory Graph.

All audio models are accessed through Ra. The creator picks a voice once for each character; Ra holds that voice across every scene, every story, forever.

What audio generation covers

Dialogue. Character voices for scripted lines. Every character has a voice in their profile; Ra casts the voice that’s already locked.
Voice-over / narration. Documentary, explainer, and narration formats. The channel’s house voice lives in Memory and renders automatically.
Background music. Per-scene scoring that matches mood and pacing. Set once in the scene editor’s Audio tab.
Ambient sound. Room tone, weather, crowd. The layer that makes a scene feel real rather than rendered.
Sound effects. Foley, hits, stingers. Can be added directly or pulled from the library.
Voice effects. Processing on top of a voice (echo, whisper, telephone, etc.) when a scene calls for it.

Supported audio models

Model	Provider	Strengths	Typical use
ElevenLabs	ElevenLabs	Lifelike voice cloning, multi-language, emotion control	Dialogue, voice-over, character voices

ElevenLabs powers dialogue and voice-over end-to-end. Voice cloning is used to lock a creator’s house voice into Memory so every narration across every video sounds like the same person. Additional TTS and music models are routed in as they ship.

How voice casting works

When you create a character, you set their voice once:

Pick from stock voices. ElevenLabs ships a library of pre-built voices by gender, age, and register.
Use a custom voice. Clone a voice from a short audio sample, then use it across every line that character speaks.
Set per-line mood. Inside the scene editor’s Dialogues tab, each line can carry a mood tag (neutral, worried, angry, calm, etc.) that Ra passes through to the model.

Once set, the voice lives on the character forever. Re-casting a character is a single action; re-running dialogue is a single action. No re-prompting. Ever.
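The casting model above can be sketched as a small data structure. This is a minimal illustration, not CreatorStudio's or ElevenLabs' actual API: the names `Character`, `DialogueLine`, `build_voice_request`, and `recast` are all hypothetical. The point it shows is that the voice is stored on the character, while the mood tag is stored on each line.

```python
from dataclasses import dataclass

# Hypothetical data model for illustration only; not the CreatorStudio or ElevenLabs API.


@dataclass
class Character:
    name: str
    voice_id: str  # chosen once: a stock voice or a cloned custom voice


@dataclass
class DialogueLine:
    character: Character
    text: str
    mood: str = "neutral"  # per-line mood tag, passed through unchanged


def build_voice_request(line: DialogueLine) -> dict:
    """Assemble the payload a routing layer might hand to the voice model."""
    return {
        "voice_id": line.character.voice_id,  # always read from the character
        "text": line.text,
        "mood": line.mood,
    }


def recast(character: Character, new_voice_id: str) -> None:
    """Change the voice in one place; existing lines pick it up on the next run."""
    character.voice_id = new_voice_id
```

Because lines hold a reference to the character rather than a copy of the voice, re-casting touches one field and no line needs to be rewritten.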

Working with audio in the app

The scene editor’s Audio tab surfaces every audio layer for a scene: dialogue volume, background music, ambient sound, voice-over, sound effects, and a master output with normalize + limiter controls. Each layer has filters (remastering, EQ) applied on top.
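To make the master output controls concrete, here is a rough sketch of what mixing layers, normalizing, and limiting mean at the sample level. It is an assumption-laden simplification, not how CreatorStudio processes audio: the function names are hypothetical, samples are plain floats in the range -1.0 to 1.0, all layers are assumed to have the same length, and the limiter is a hard clip rather than a production limiter with attack and release.

```python
from typing import Dict, List

# Illustrative only: hypothetical helpers, not CreatorStudio's audio engine.


def mix_layers(layers: Dict[str, List[float]], gains: Dict[str, float]) -> List[float]:
    """Sum equally long layers sample by sample, scaling each by its gain (default 1.0)."""
    scaled = [[s * gains.get(name, 1.0) for s in samples] for name, samples in layers.items()]
    return [sum(column) for column in zip(*scaled)]


def peak_normalize(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale the signal so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence stays silence
    scale = target_peak / peak
    return [s * scale for s in samples]


def hard_limit(samples: List[float], ceiling: float = 1.0) -> List[float]:
    """Clamp every sample into [-ceiling, ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

A typical order would be mix first, then normalize, then limit, so the limiter only catches whatever still exceeds the ceiling.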

Dialogue is generated on the scene editor’s Dialogues tab, one line at a time or in batch via Generate All Audio. Every line can be regenerated without touching the video, and every generated clip is written to Assets with full provenance.
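The batch-then-regenerate workflow can be sketched as follows. Everything here is hypothetical: `Clip`, `generate_all_audio`, `regenerate_line`, and the provenance fields are invented for illustration, and `synth` stands in for whatever call actually produces audio. The sketch only shows that each line is generated independently and that every clip carries a record of how it was made.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical sketch; not the CreatorStudio API or its real provenance schema.


@dataclass
class Clip:
    line_id: str
    audio: bytes
    provenance: dict  # assumed fields: model, source text, take number


def generate_all_audio(
    lines: Dict[str, str], synth: Callable[[str], bytes], model: str
) -> Dict[str, Clip]:
    """Generate one clip per dialogue line in a single batch."""
    return {
        line_id: Clip(line_id, synth(text), {"model": model, "text": text, "take": 1})
        for line_id, text in lines.items()
    }


def regenerate_line(
    clips: Dict[str, Clip], line_id: str, text: str,
    synth: Callable[[str], bytes], model: str,
) -> Clip:
    """Replace a single clip; all other clips (and the video) are left untouched."""
    take = clips[line_id].provenance["take"] + 1
    clips[line_id] = Clip(line_id, synth(text), {"model": model, "text": text, "take": take})
    return clips[line_id]
```

Keying clips by line is what lets one line be re-run without invalidating the rest of the scene.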

Related

Character cast walks through locking voices into Memory.
Subtitle Studio converts dialogue into per-platform captions and multilingual dubs automatically.
Orchestration covers the routing layer above every model stage.
