Writing briefs

Writing a brief is the single most important skill on CreatorStudio. Every render, every edit, every voiceover is downstream of the words you type. If the brief is loose, Ra guesses. If the brief is sharp, Ra executes.

This guide teaches one principle and four techniques. The principle: direct, don’t describe. The techniques are time-coded multi-shot structure, reference tags, the five heuristics that make a prompt compile, and a production stance that turns all of it into output: volume over perfection.

Treat this page as the craft manual. Everything else in the docs assumes you’ve internalized it.

Direct, don’t describe

The master principle: prompt like a director giving a shot brief, not a writer describing a scene.

A describer writes a word painting and hopes the model translates it. A director gives the model actionable instructions: camera, movement, composition, lighting, audio. The same subject produces two very different results.

Describer:

A beautiful sunset over the ocean with golden light reflecting on
the waves and a person walking on the beach looking contemplative.

Director:

Wide establishing shot. Golden hour. Silhouette figure walks
camera-left along wet shoreline. Slow dolly forward. Ambient
wave sound under.

Both are about the same scene. Only one tells the model what to do. The director version names shot type, camera movement, subject blocking, time of day, and audio bed. The describer version names vibes. Models compile the first and hallucinate the second.

This is why CS’s Cinematic Intelligence thinks in shot types, camera movement, transitions, and pacing. While other tools generate “clips,” CS generates directed scenes. The grammar of film is the grammar of the brief.

Five heuristics

Five compilable rules, hard-earned from running hundreds of shots through modern video models. Use them as a checklist before you hit render.

Five micro-actions beat one adjective. Instead of “a tense chase scene,” write the shot: figure glances over shoulder, stumbles on loose gravel, grabs chain-link fence, pushes off, accelerates into shadow. Five concrete micro-actions give the model a storyboard, not a mood board. Adjectives are a bet. Actions are instructions.

Emotional specificity over adjectives. “She feels the weight of a promise she can’t keep” outperforms “she looks sad.” Don’t say “act sad.” Say “your character just realized she’ll never see her daughter graduate.” Specific emotional subtext tends to produce more coherent model behavior than surface descriptors.

Objects over faces for continuity. Faces are the hardest thing for current video models to hold across shots. Objects are reliable. A red scarf, a dented briefcase, a cracked watch, a pair of scuffed white sneakers. Props thread the visual narrative through cuts the model would otherwise break. Pick one object per character, keep it in frame.

Chain shots, don’t batch them. Generate shots sequentially, not in parallel. Each shot’s output informs the next shot’s prompt. The director watches the dailies before calling the next shot. Batch production applies to projects, not to shots within a project. CS’s rendering pipeline is built around this: feedback loops, not bulk parallel passes.

Pain beats spectacle. A close-up on trembling hands outperforms a wide shot of an explosion. Internal conflict, vulnerability, and emotional pain generate more engagement than visually loud but emotionally empty frames. When Ra offers a choice between the spectacular beat and the intimate one, bias toward the intimate one. It’s almost always the hook that holds.
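The "chain shots, don’t batch them" heuristic can be sketched as a loop. This is a minimal illustration, not CS’s pipeline: `chain_shots`, `fake_render`, and `carry_prop` are hypothetical names, and the render step is a stand-in for whatever actually produces a shot.

```python
from typing import Callable, List, Optional


def chain_shots(
    prompts: List[str],
    render: Callable[[str], str],
    revise: Callable[[str, str], str],
) -> List[str]:
    """Render shots one at a time so each result can inform the next prompt."""
    outputs: List[str] = []
    previous: Optional[str] = None
    for prompt in prompts:
        if previous is not None:
            # "Watch the dailies": adjust the next prompt using the last output.
            prompt = revise(prompt, previous)
        previous = render(prompt)
        outputs.append(previous)
    return outputs


# Hypothetical stand-ins so the sketch runs without any real renderer.
def fake_render(prompt: str) -> str:
    return f"rendered<{prompt}>"


def carry_prop(prompt: str, previous_output: str) -> str:
    # A real revise step would inspect previous_output; this one only
    # re-asserts the continuity object.
    return f"{prompt} Keep the red scarf in frame."


results = chain_shots(["Wide shot.", "Close-up."], fake_render, carry_prop)
```

The point of the shape is that `render` for shot two cannot start until shot one exists, which is exactly what a parallel batch gives up.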

Time-coded multi-shot structure

Don’t let the model improvise cuts. Direct each shot with an explicit timestamp, camera language, and a named transition.

[0-4s]:  wide establishing shot, static camera, misty bamboo forest at dawn
[4-9s]:  medium shot, slow push-in, the fighter steps forward
[9-15s]: close-up, orbit shot, the fighter strikes, slow motion

Per shot, specify three things:

Camera position. Wide / medium / close-up / extreme close-up, plus angle.
Subject action. What happens, ideally 3 to 5 micro-actions.
Lighting state. So the model doesn’t reset lighting at every cut.
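If you draft shot lists in a script before pasting them into a brief, the three per-shot fields map onto a small record. A minimal sketch, assuming the bracketed timestamp format shown above; `Shot` and `render_shot_list` are hypothetical helpers, not part of CS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    start: int     # seconds from the start of the clip
    end: int       # seconds, exclusive end of the shot
    camera: str    # camera position and movement
    action: str    # subject action, ideally 3 to 5 micro-actions
    lighting: str  # lighting state, restated so it survives the cut


def render_shot_list(shots: List[Shot]) -> str:
    """Format shots as time-coded lines, rejecting gaps and overlaps."""
    lines = []
    previous_end = 0
    for shot in shots:
        if shot.start != previous_end or shot.end <= shot.start:
            raise ValueError(
                f"shot {shot.start}-{shot.end}s leaves a gap or overlap"
            )
        lines.append(
            f"[{shot.start}-{shot.end}s]: "
            f"{shot.camera}, {shot.action}, {shot.lighting}"
        )
        previous_end = shot.end
    return "\n".join(lines)


example = render_shot_list([
    Shot(0, 4, "wide establishing shot, static camera",
         "mist drifts through bamboo forest", "soft dawn light"),
    Shot(4, 9, "medium shot, slow push-in",
         "the fighter steps forward", "soft dawn light"),
])
```

Repeating the lighting string on every shot is deliberate: it is the field that keeps the model from resetting lighting at a cut.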

Transition verbs

Name the cut. Don’t describe a dissolve; say which transition you want. These verbs are director language, and models trained on film data respond to them cleanly:

hard cut to
snap cut to
seamless morph into
whip pan to
dolly forward into
match cut to

Naming transitions is how you get filmic rhythm. Leaving them blank is how you get dissolve slop.
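A quick way to catch blank transitions before rendering is to scan the shot lines for the verbs listed above. This is a sketch under one assumption that the guide does not state: that each shot after the first carries its transition verb somewhere in its own line. `missing_transitions` is a hypothetical helper.

```python
from typing import List

# The transition verbs listed in this guide.
TRANSITIONS = (
    "hard cut to",
    "snap cut to",
    "seamless morph into",
    "whip pan to",
    "dolly forward into",
    "match cut to",
)


def missing_transitions(shot_lines: List[str]) -> List[str]:
    """Return every shot line after the first that names no transition."""
    unnamed = []
    for line in shot_lines[1:]:  # the opening shot has nothing to cut from
        lowered = line.lower()
        if not any(verb in lowered for verb in TRANSITIONS):
            unnamed.append(line)
    return unnamed


draft = [
    "[0-4s]: wide establishing shot, static camera",
    "[4-9s]: hard cut to medium shot, slow push-in",
    "[9-15s]: close-up, orbit shot, slow motion",
]
unnamed = missing_transitions(draft)
```

Here the third line comes back as unnamed, which is the cue to pick a verb for it rather than leave the cut to the model.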

The Wide to ECU four-beat

The reliable 15-second structure maps classical film grammar onto short-form video. It’s four beats in a fixed rhythm:

Beat	Duration	Shot	Purpose
1	0 to 4s	Wide establishing	Where are we?
2	4 to 8s	Medium	Who’s doing what?
3	8 to 12s	Close-up	What do they feel?
4	12 to 15s	Extreme close-up	The detail that lands

Where, Who, Feel, Detail. This is hook, setup, beat, payoff, compressed into 15 seconds and shot-listed at the same time. Use it as a scaffold until you outgrow it. Most creators don’t outgrow it.
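The four-beat scaffold is regular enough to template. A minimal sketch, using the durations and shot types from the table above; `FOUR_BEAT` and `four_beat_brief` are hypothetical names, and the output format is assumed to match the time-coded lines used earlier in this guide.

```python
# (start, end, shot type, question the beat answers), taken from the table.
FOUR_BEAT = [
    (0, 4, "wide establishing shot", "Where are we?"),
    (4, 8, "medium shot", "Who's doing what?"),
    (8, 12, "close-up", "What do they feel?"),
    (12, 15, "extreme close-up", "The detail that lands"),
]


def four_beat_brief(where: str, who: str, feel: str, detail: str) -> str:
    """Fill the Where / Who / Feel / Detail scaffold with one line per beat."""
    contents = [where, who, feel, detail]
    lines = [
        f"[{start}-{end}s]: {shot}, {content}"
        for (start, end, shot, _purpose), content in zip(FOUR_BEAT, contents)
    ]
    return "\n".join(lines)


brief = four_beat_brief(
    where="misty bamboo forest at dawn",
    who="the fighter steps forward",
    feel="jaw tightens, breath slows",
    detail="fingers close around the hilt",
)
```

The template only fixes timing and shot type. Camera movement, lighting, and transitions still need to be written into each beat by hand.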

Reference tags

When you attach assets to a brief, refer to them by numbered tags inside the prose. One prompt, multiple channels.

[Image1], [Image2], ... [Image9]   up to 9 image references
[Video1], [Video2], [Video3]       up to 3 video clips
[Audio1], [Audio2], [Audio3]       up to 3 audio files

Each channel carries what it’s best at:

Channel	Best for
`[ImageN]`	Character anchor. Scene composition, color palette, style, consistency across shots.
`[VideoN]`	Motion reference. Camera movement, gesture, pacing pulled from an existing clip.
`[AudioN]`	Rhythm reference. Dialogue lip-sync, cut-to-beat timing, ambient tone.

Example:

The character from [Image2] stands in the interior of [Image1]. He keeps
the style of [Image2], while the scene keeps the realism of [Image1].
He says [Audio1].

One brief pulls composition from one image, style from another, dialogue from an audio file. Text still runs the show. Tags are arguments. The prose prompt is the function call.
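The tag limits above are easy to check mechanically before submitting a brief. A minimal sketch, assuming tags are written exactly as `[Image1]`, `[Video2]`, `[Audio3]`; `check_tags` is a hypothetical helper, and it only checks numbering, not whether an asset is actually attached.

```python
import re
from typing import List

# Per-channel limits stated in this guide.
TAG_LIMITS = {"Image": 9, "Video": 3, "Audio": 3}
TAG_PATTERN = re.compile(r"\[(Image|Video|Audio)(\d+)\]")


def check_tags(brief: str) -> List[str]:
    """Return one problem message per out-of-range tag; empty means OK."""
    problems = []
    # dict.fromkeys removes repeated tags while keeping first-seen order.
    for kind, number in dict.fromkeys(TAG_PATTERN.findall(brief)):
        limit = TAG_LIMITS[kind]
        if not 1 <= int(number) <= limit:
            problems.append(f"[{kind}{number}] is outside 1-{limit}")
    return problems


ok = check_tags("The character from [Image2] stands in [Image1]. He says [Audio1].")
bad = check_tags("[Image10] moves like [Video4], then [Video4] again.")
```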

Volume over perfection

The closing principle, and the one most creators get wrong.

100 attempts at 70% beats 3 attempts at 95%. When the cost of a shot drops below the learning threshold, shipping more at good-enough quality generates more total value than shipping less at perfect quality. The math is ruthless: 50 carefully perfected shots give you 50 data points. 561 rough ones give you 561, and somewhere in there is a success rate that climbs from 71% to 90% because you kept going.
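The opening comparison can be made concrete with a line of arithmetic. This sketch reads "70%" and "95%" as the chance that a single attempt is usable, which is an interpretation rather than something the guide states; `expected_usable` is a hypothetical helper.

```python
def expected_usable(attempts: int, success_percent: int) -> float:
    """Expected number of usable shots at a given per-attempt success rate."""
    return attempts * success_percent / 100


many_rough = expected_usable(100, 70)   # 70.0 usable shots expected
few_polished = expected_usable(3, 95)   # 2.85 usable shots expected
```

Under that reading, the rough run yields roughly 70 usable shots to the polished run’s 3 or fewer, before counting anything learned along the way.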

Traditional creative culture worships craft. Spend weeks on one piece. Polish until it shines. Never ship anything you wouldn’t put in a portfolio. AI-native creative culture inverts this. The cost of a mediocre piece is small. The cost of not shipping is the compounding data you never collected, the viral window you didn’t catch, the audience signal you never heard.

This does not mean quality doesn’t matter. It means the path to quality runs through volume, not through perfectionism. The 90% success rate comes from producing 561 shots, not from studying prompting theory. The viral hit happens because you were publishing when the trend hit, not because you spent three weeks on the piece.

Ship the 70%. Keep writing briefs. Let the data tell you what’s good.