
Audio models

Audio is where a lot of AI-generated video falls apart. The voice shifts between scenes. The music doesn’t match the mood. The ambient sound drops out on a cut. CreatorStudio treats audio as a first-class stage in the Movie Maker pipeline, with persistent voice casting through the Director Memory Graph.

All audio models are accessed through Ra. The creator picks a voice once for each character; Ra holds that voice across every scene, every story, forever.

  • Dialogue. Character voices for scripted lines. Every character has a voice in their profile; Ra casts the voice that’s already locked.
  • Voice-over / narration. Documentary, explainer, and narration formats. The channel’s house voice lives in Memory and renders automatically.
  • Background music. Per-scene scoring that matches mood and pacing. Set once in the scene editor’s Audio tab.
  • Ambient sound. Room tone, weather, crowd. The layer that makes a scene feel real rather than rendered.
  • Sound effects. Foley, hits, stingers. Can be added directly or pulled from the library.
  • Voice effects. Processing on top of a voice (echo, whisper, telephone, etc.) when a scene calls for it.

| Model | Provider | Strengths | Typical use |
| --- | --- | --- | --- |
| ElevenLabs | ElevenLabs | Photoreal voice cloning, multi-language, emotion control | Dialogue, voice-over, character voices |

ElevenLabs powers dialogue and voice-over end-to-end. Voice cloning is used to lock a creator’s house voice into Memory so every narration across every video sounds like the same person. Additional TTS and music models are routed in as they ship.

When you create a character, you set their voice once:

  • Pick from stock voices. ElevenLabs ships a library of pre-built voices by gender, age, and register.
  • Use a custom voice. Clone a voice from a short audio sample and use it across every line that character speaks.
  • Set per-line mood. Inside the scene editor’s Dialogues tab, each line can carry a mood tag (neutral, worried, angry, calm, etc.) that Ra passes through to the model.

Once set, the voice lives on the character forever. Re-casting a character is a single action; re-running dialogue is a single action. No re-prompting. Ever.
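
The casting model described above can be pictured as a small data structure: the voice lives on the character, and each dialogue line carries only a speaker and a mood tag. The following is a minimal sketch under that reading, not CreatorStudio's actual schema. Every name in it (`Voice`, `Character`, `DialogueLine`, `build_request`, `recast`, and the request fields) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """A locked voice. `source` is "stock" or "custom" (hypothetical field)."""
    voice_id: str
    source: str


@dataclass
class Character:
    """A character profile that owns exactly one voice."""
    name: str
    voice: Voice


@dataclass(frozen=True)
class DialogueLine:
    """A scripted line. It names a speaker and a mood, never a voice."""
    character: str
    text: str
    mood: str = "neutral"


def build_request(cast: dict[str, Character], line: DialogueLine) -> dict:
    """Resolve the speaker's locked voice and pass the mood tag through."""
    character = cast[line.character]
    return {
        "voice_id": character.voice.voice_id,
        "text": line.text,
        "mood": line.mood,
    }


def recast(cast: dict[str, Character], name: str, new_voice: Voice) -> None:
    """Re-casting is one assignment; existing lines need no changes."""
    cast[name].voice = new_voice
```

Because lines never reference a voice directly, swapping the voice on the character is enough for every later render of that character's lines to pick up the new voice.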

The scene editor’s Audio tab surfaces every audio layer for a scene: dialogue volume, background music, ambient sound, voice-over, sound effects, and a master output with normalize + limiter controls. Each layer has filters (remastering, EQ) applied on top.
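
To make the master-output controls concrete, here is a rough sketch of what per-layer volume, peak normalization, and a limiter usually mean when applied to raw samples. It shows the general technique only and is an assumption about how such controls behave, not a description of CreatorStudio's processing. The function names, the `target_peak` and `ceiling` parameters, and the simple hard-clip limiter are all illustrative.

```python
def mix_layers(layers: dict[str, list[float]], gains: dict[str, float]) -> list[float]:
    """Sum all layers sample by sample, scaling each by its gain (default 1.0)."""
    length = max((len(samples) for samples in layers.values()), default=0)
    mixed = [0.0] * length
    for name, samples in layers.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed


def normalize(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale the whole signal so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence stays silence
    scale = target_peak / peak
    return [s * scale for s in samples]


def limit(samples: list[float], ceiling: float = 1.0) -> list[float]:
    """Hard-clip anything beyond +/- ceiling (the simplest possible limiter)."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def master(layers: dict[str, list[float]], gains: dict[str, float]) -> list[float]:
    """Mix, then normalize, then limit, in that order."""
    return limit(normalize(mix_layers(layers, gains)))
```

A production limiter would use gain reduction with attack and release times rather than hard clipping; the clip is used here only to keep the sketch short.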

Dialogue is generated on the scene editor’s Dialogues tab, one line at a time or in batch via Generate All Audio. Every line can be regenerated without touching the video, and every generated clip is written to Assets with full provenance.
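
One way to picture batch generation with provenance is a loop that renders each line and stores the clip alongside a record of how it was made. The sketch below is illustrative only: the `synthesize` callback stands in for whatever model call Ra actually makes, and `AudioAsset`, `generate_line`, `generate_all`, the `model` label, and the provenance fields are all hypothetical names, not CreatorStudio's asset format.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

# Hypothetical signature: (text, voice_id) -> raw audio bytes.
Synthesizer = Callable[[str, str], bytes]


@dataclass(frozen=True)
class AudioAsset:
    """A generated clip plus the record of how it was produced."""
    line_id: str
    audio: bytes
    provenance: dict


def generate_line(line_id: str, text: str, voice_id: str,
                  synthesize: Synthesizer,
                  model: str = "hypothetical-tts") -> AudioAsset:
    """Render one line and attach provenance metadata to the result."""
    audio = synthesize(text, voice_id)
    provenance = {
        "line_id": line_id,
        "voice_id": voice_id,
        "model": model,
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return AudioAsset(line_id, audio, provenance)


def generate_all(lines: dict[str, tuple[str, str]],
                 synthesize: Synthesizer,
                 assets: dict[str, list[AudioAsset]]) -> dict[str, list[AudioAsset]]:
    """Batch mode: render every line, appending a new version per line."""
    for line_id, (text, voice_id) in lines.items():
        asset = generate_line(line_id, text, voice_id, synthesize)
        assets.setdefault(line_id, []).append(asset)
    return assets
```

Regenerating a single line in this sketch is one more `generate_line` call appended to that line's version list. Nothing here touches video data, which mirrors the separation the page describes.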

  • Character cast walks through locking voices into Memory.
  • Subtitle Studio converts dialogue into per-platform captions and multilingual dubs automatically.
  • Orchestration covers the routing layer above every model stage.