Writing briefs
Writing a brief is the single most important skill on CreatorStudio. Every render, every edit, every voiceover is downstream of the words you type. If the brief is loose, Ra guesses. If the brief is sharp, Ra executes.
This guide teaches one principle and four techniques. The principle is direct, don’t describe. The techniques are time-coded multi-shot structure, reference tags, the five heuristics that make a prompt compile, and a production stance that turns all of it into output: volume over perfection.
Treat this page as the craft manual. Everything else in the docs assumes you’ve internalized it.
Direct, don’t describe
The master principle: prompt like a director giving a shot brief, not a writer describing a scene.
A describer writes a word painting and hopes the model translates it. A director gives the model actionable instructions: camera, movement, composition, lighting, audio. The same subject produces two very different results.
Describer:
    A beautiful sunset over the ocean with golden light reflecting on
    the waves and a person walking on the beach looking contemplative.
Director:
    Wide establishing shot. Golden hour. Silhouette figure walks
    camera-left along wet shoreline. Slow dolly forward. Ambient
    wave sound under.
Both are about the same scene. Only one tells the model what to do. The director version names shot type, camera movement, subject blocking, time of day, and audio bed. The describer version names vibes. Models compile the first and hallucinate the second.
This is why CS’s Cinematic Intelligence thinks in shot types, camera movement, transitions, and pacing. While other tools generate “clips,” CS generates directed scenes. The grammar of film is the grammar of the brief.
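One way to keep yourself in director mode is to fill in the five elements the director version names before writing any prose. The sketch below is a drafting aid only: the `ShotBrief` class and its field names are hypothetical and not part of CreatorStudio.

```python
from dataclasses import dataclass


@dataclass
class ShotBrief:
    """One directed shot. Field names are illustrative, not a CreatorStudio schema."""
    shot_type: str     # e.g. "Wide establishing shot"
    time_of_day: str   # e.g. "Golden hour"
    blocking: str      # what the subject does, and where
    camera_move: str   # e.g. "Slow dolly forward"
    audio_bed: str     # e.g. "Ambient wave sound under"

    def render(self) -> str:
        """Join the five elements into short, imperative sentences."""
        parts = [self.shot_type, self.time_of_day, self.blocking,
                 self.camera_move, self.audio_bed]
        if any(not p.strip() for p in parts):
            raise ValueError("every element of the shot brief must be filled in")
        return " ".join(p.strip().rstrip(".") + "." for p in parts)


sunset = ShotBrief(
    shot_type="Wide establishing shot",
    time_of_day="Golden hour",
    blocking="Silhouette figure walks camera-left along wet shoreline",
    camera_move="Slow dolly forward",
    audio_bed="Ambient wave sound under",
)
print(sunset.render())
```

Rendering `sunset` reproduces the director brief above. An empty field raises an error, which is the point: a missing camera move or audio bed is a decision you have left to the model.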
Five heuristics
Five compilable rules, hard-earned from running hundreds of shots through modern video models. Use them as a checklist before you hit render.
Five micro-actions beat one adjective. Instead of “a tense chase scene,” write the shot: figure glances over shoulder, stumbles on loose gravel, grabs chain-link fence, pushes off, accelerates into shadow. Five concrete micro-actions give the model a storyboard, not a mood board. Adjectives are a bet. Actions are instructions.
Emotional specificity over adjectives. “She feels the weight of a promise she can’t keep” outperforms “she looks sad.” Don’t say “act sad.” Say “your character just realized she’ll never see her daughter graduate.” Specific emotional subtext activates more coherent model behavior than surface descriptors, every time.
Objects over faces for continuity. Faces are the hardest thing for current video models to hold across shots. Objects are reliable. A red scarf, a dented briefcase, a cracked watch, a pair of scuffed white sneakers. Props thread the visual narrative through cuts the model would otherwise break. Pick one object per character, keep it in frame.
Chain shots, don’t batch them. Generate shots sequentially, not in parallel. Each shot’s output informs the next shot’s prompt. The director watches the dailies before calling the next shot. Batch production applies to projects, not to shots within a project. CS’s rendering pipeline is built around this: feedback loops, not bulk parallel passes.
Pain beats spectacle. Close-up on trembling hands outperforms wide shot of explosion. Internal conflict, vulnerability, and emotional pain generate more engagement than visually loud but emotionally empty frames. When Ra offers a choice between the spectacular beat and the intimate one, bias toward the intimate one. It’s almost always the hook that holds.
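The chaining heuristic is a loop: render a shot, look at it, revise the next prompt, then render again. Here is a minimal sketch of that loop, assuming you supply your own `render` and `revise` callables. Both are hypothetical stand-ins, and the stubs below exist only so the example runs without a renderer.

```python
from typing import Callable, List, Optional


def chain_shots(prompts: List[str],
                render: Callable[[str], str],
                revise: Callable[[str, str], str]) -> List[str]:
    """Render shots one at a time, revising each prompt from the previous output.

    `render` and `revise` are hypothetical callables supplied by the caller;
    neither is a CreatorStudio function.
    """
    outputs: List[str] = []
    previous: Optional[str] = None
    for prompt in prompts:
        if previous is not None:
            # Watch the dailies before calling the next shot.
            prompt = revise(prompt, previous)
        previous = render(prompt)
        outputs.append(previous)
    return outputs


def fake_render(prompt: str) -> str:
    """Stub renderer: returns a label instead of a video."""
    return f"<clip of: {prompt}>"


def carry_prop(prompt: str, previous_output: str) -> str:
    """Toy revision rule: keep the continuity object in frame after shot one."""
    return prompt + " Red scarf stays in frame."


clips = chain_shots(
    ["Wide shot, figure enters alley.", "Close-up, trembling hands."],
    fake_render,
    carry_prop,
)
```

The first prompt goes through untouched, and every later prompt passes through `revise` with the previous output in hand. A parallel batch cannot do this, because no shot has an output to learn from when the prompts are written.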
Time-coded multi-shot structure
Don’t let the model improvise cuts. Direct each shot with an explicit timestamp, camera language, and a named transition.
    [0-4s]: wide establishing shot, static camera, misty bamboo forest at dawn
    [4-9s]: medium shot, slow push-in, the fighter steps forward
    [9-15s]: close-up, orbit shot, the fighter strikes, slow motion
Per shot, specify three things:
- Camera position. Wide / medium / close-up / extreme close-up, plus angle.
- Subject action. What happens, ideally 3 to 5 micro-actions.
- Lighting state. So the model doesn’t reset lighting at every cut.
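Time codes are easy to get wrong by hand once you start reordering shots. A small helper, sketched here as a drafting aid (the `time_code` function is hypothetical, not a CreatorStudio feature), can derive contiguous timestamps from per-shot durations:

```python
from typing import List, Tuple


def time_code(shots: List[Tuple[int, str]]) -> str:
    """Turn (duration_seconds, direction) pairs into contiguous time-coded lines."""
    lines = []
    start = 0
    for duration, direction in shots:
        if duration <= 0:
            raise ValueError("each shot needs a positive duration")
        end = start + duration
        lines.append(f"[{start}-{end}s]: {direction}")
        start = end  # the next shot begins exactly where this one ends
    return "\n".join(lines)


brief = time_code([
    (4, "wide establishing shot, static camera, misty bamboo forest at dawn"),
    (5, "medium shot, slow push-in, the fighter steps forward"),
    (6, "close-up, orbit shot, the fighter strikes, slow motion"),
])
print(brief)
```

With durations of 4, 5, and 6 seconds this prints the same three lines as the example above. Because each start is computed from the previous end, there are no gaps or overlaps for the model to fill in.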
Transition verbs
Don’t describe a dissolve; name the cut. These verbs are director language, and models trained on film data respond to them cleanly:
    hard cut to
    snap cut to
    seamless morph into
    whip pan to
    dolly forward into
    match cut to
Naming transitions is how you get filmic rhythm. Leaving them blank is how you get dissolve slop.
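A quick pre-render check can catch cuts you forgot to name. The sketch below assumes a convention the docs do not prescribe: each shot after the first names its transition on its own line. The function names are hypothetical.

```python
from typing import List

# The six transition verbs listed above.
TRANSITION_VERBS = (
    "hard cut to", "snap cut to", "seamless morph into",
    "whip pan to", "dolly forward into", "match cut to",
)


def named_transitions(text: str) -> List[str]:
    """Return the listed transition verbs that appear in the text."""
    lowered = text.lower()
    return [verb for verb in TRANSITION_VERBS if verb in lowered]


def unnamed_cuts(shot_lines: List[str]) -> List[str]:
    """Return every shot after the first that does not name how it is entered."""
    return [line for line in shot_lines[1:] if not named_transitions(line)]


shots = [
    "[0-4s]: wide establishing shot, static camera",
    "[4-9s]: hard cut to medium shot, slow push-in",
    "[9-15s]: close-up, orbit shot",
]
flagged = unnamed_cuts(shots)
```

Here `flagged` holds only the third line, the one cut left for the model to improvise.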
The Wide to ECU four-beat
The reliable 15-second structure maps classical film grammar onto short-form video. It’s four beats in a fixed rhythm:
| Beat | Duration | Shot | Purpose |
|---|---|---|---|
| 1 | 0 to 4s | Wide establishing | Where are we? |
| 2 | 4 to 8s | Medium | Who’s doing what? |
| 3 | 8 to 12s | Close-up | What do they feel? |
| 4 | 12 to 15s | Extreme close-up | The detail that lands |
Where, Who, Feel, Detail. This is hook, setup, beat, payoff, compressed into 15 seconds and shot-listed at the same time. Use it as a scaffold until you outgrow it. Most creators don’t outgrow it.
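The table above can double as a fill-in-the-blanks template. Here is a sketch, with a hypothetical `four_beat` helper, that pairs your four answers with the fixed timings and shot sizes:

```python
from typing import List

# (start, end, shot, question the beat answers), copied from the table above.
FOUR_BEAT = [
    (0, 4, "wide establishing shot", "Where are we?"),
    (4, 8, "medium shot", "Who's doing what?"),
    (8, 12, "close-up", "What do they feel?"),
    (12, 15, "extreme close-up", "The detail that lands"),
]


def four_beat(answers: List[str]) -> str:
    """Build a time-coded 15-second brief from one answer per beat."""
    if len(answers) != len(FOUR_BEAT):
        raise ValueError("the scaffold needs exactly four answers: where, who, feel, detail")
    lines = []
    for (start, end, shot, _question), answer in zip(FOUR_BEAT, answers):
        lines.append(f"[{start}-{end}s]: {shot}, {answer}")
    return "\n".join(lines)


scaffold = four_beat([
    "misty bamboo forest at dawn",
    "the fighter steps forward",
    "jaw set, breath held",
    "fingers tightening on a red scarf",
])
print(scaffold)
```

The helper refuses anything other than four answers, so a skipped beat is an error rather than a silent gap.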
Reference tags
When you attach assets to a brief, refer to them by numbered tags inside the prose. One prompt, multiple channels.
    [Image1], [Image2], ... [Image9]    up to 9 image references
    [Video1], [Video2], [Video3]        up to 3 video clips
    [Audio1], [Audio2], [Audio3]        up to 3 audio files
Each channel carries what it’s best at:
| Channel | Best for |
|---|---|
| [ImageN] | Character anchor. Scene composition, color palette, style, consistency across shots. |
| [VideoN] | Motion reference. Camera movement, gesture, pacing pulled from an existing clip. |
| [AudioN] | Rhythm reference. Dialogue lip-sync, cut-to-beat timing, ambient tone. |
Example:
    [Image2] is in the interior of [Image1], where he keeps the style
    of [Image2], but the realism of [Image1] remains. He says [Audio1].
One brief pulls composition from one image, style from another, dialogue from an audio file. Text still runs the show. Tags are arguments. The prose prompt is the function call.
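Because tags are numbered and capped per channel, a typo such as a fourth video tag is easy to catch before rendering. Here is a minimal sketch of such a check, using the limits quoted above. The `check_tags` function is hypothetical, and how CreatorStudio itself handles an out-of-range tag is not documented here.

```python
import re
from typing import List

# Per-channel limits quoted above: 9 images, 3 videos, 3 audio files.
TAG_LIMITS = {"Image": 9, "Video": 3, "Audio": 3}
TAG_PATTERN = re.compile(r"\[(Image|Video|Audio)(\d+)\]")


def check_tags(brief: str) -> List[str]:
    """Return one message per distinct out-of-range tag (empty list if all are fine)."""
    problems = []
    # dict.fromkeys drops repeated tags while keeping first-seen order.
    for kind, number in dict.fromkeys(TAG_PATTERN.findall(brief)):
        limit = TAG_LIMITS[kind]
        if not 1 <= int(number) <= limit:
            problems.append(f"[{kind}{number}] is outside 1..{limit}")
    return problems


ok = check_tags("[Image2] is in the interior of [Image1]. He says [Audio1].")
bad = check_tags("[Video4] sets the pacing. [Video4] again.")
```

`ok` is an empty list, and `bad` reports `[Video4]` once even though the tag appears twice.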
Volume over perfection
The closing principle, and the one most creators get wrong.
100 attempts at 70% beats 3 attempts at 95%. When the cost of a shot drops below the learning threshold, shipping more at good-enough quality generates more total value than shipping less at perfect quality. The math is ruthless: 50 carefully perfected shots give you 50 data points. 561 rough ones give you 561, and somewhere in there is a success rate that climbs from 71% to 90% because you kept going.
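The opening comparison is simple arithmetic. The sketch below works it through under one loud assumption: the percentage is read as the chance that any single attempt is usable. That reading is ours for illustration, not a definition from CreatorStudio.

```python
def expected_keepers(attempts: int, success_percent: int) -> float:
    """Expected number of usable shots.

    Assumes success_percent is a per-attempt success rate (illustrative only).
    """
    return attempts * success_percent / 100


volume = expected_keepers(100, 70)  # 100 attempts at 70%
polish = expected_keepers(3, 95)    # 3 attempts at 95%
print(volume, polish)
```

Under that assumption the volume approach yields 70 expected keepers against 2.85 for the polished one. It also yields 100 observations of what works instead of 3.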
Traditional creative culture worships craft. Spend weeks on one piece. Polish until it shines. Never ship anything you wouldn’t put in a portfolio. AI-native creative culture inverts this. The cost of a mediocre piece is small. The cost of not shipping is the compounding data you never collected, the viral window you didn’t catch, the audience signal you never heard.
This does not mean quality doesn’t matter. It means the path to quality runs through volume, not through perfectionism. The 90% success rate comes from producing 561 shots, not from studying prompting theory. The viral hit happens because you were publishing when the trend hit, not because you spent three weeks on the piece.
Ship the 70%. Keep writing briefs. Let the data tell you what’s good.