promptdojo_›phase 06 · applied builds›ch 28 · ai video generation

lesson 2 of 3 · camera control and motion — what separates slop from craftstep 2 / 8

Shot vocabulary — the words that mean something to the model

The first axis to specify in any prompt is what is in the frame and how close. This is "shot type." It has a vocabulary that predates AI by 100 years and every modern model has been trained on enough labeled film footage to recognize it.

The nine shots you actually need

Shot	Abbrev	Used for
Extreme Wide	EWS	Geography. Subject tiny or absent. Opening of a scene.
Wide	WS	Whole subject + environment. Establishing the relationship between them.
Medium Wide	MWS	Subject knees-up. Common for action.
Medium	MS	Subject waist-up. Conversational distance.
Medium Close	MCU	Subject chest-up. News-anchor frame. Default for dialogue.
Close	CU	Head and shoulders. Emotional weight.
Extreme Close	ECU	A single feature — eyes, lips, hands, an object.
Over-the-Shoulder	OTS	Two subjects, one's shoulder framing the other.
POV	POV	Camera IS the subject. First-person.

Almost every shot in every piece of professional video is one of these nine. AI models, including all the heavyweights, have been trained on labeled examples. Use the labels.

Why this matters for video prompts

A prompt that names the shot type is grounding the model in something concrete:

Bad: "a man at a coffee shop, cinematic, beautiful."

Good: "Medium close-up of a man sipping coffee at a window seat, shallow depth of field, warm light from frame-left."

The good version specifies:

Shot type (MCU)
Action (sipping coffee)
Setting (window seat)
Depth of field (shallow)
Lighting direction (frame-left)
Lighting quality (warm)

The bad version specifies one thing: "make it pretty." The model has no choice but to invent the other five and the inventions are unstable across retakes. You get a different shot every time, none of them what you imagined.

The two most-overused shots

In AI video specifically, two shots get overused:

The slow drift wide. Default Sora 2 / Veo 3 output when the prompt is vague. Reads as "AI nature documentary." Use only when you want a literal landscape moment.
The locked-off medium. Default for any prompt that mentions a person without specifying the shot. Reads as stock-photo-in-motion.

If your shot list has more than two of either, the piece will read as AI-generated even if every individual shot is technically fine. The variety of shot types is what makes a sequence read as professional.

The 30-shot test

Take any 30-second piece of professional video. Count the shot types. You will find 5-10 different shot types in the sequence. Each shot transition is a deliberate change of distance or angle.

Now take any 30-second piece of AI video that "looks like AI video." Count the shot types. You will find 1-2 shot types repeated through the sequence — usually a slow drift wide and a locked-off medium, alternating.

The fix is shot-list discipline before generation. Decide the shot types up front. Generate against the list. The output will read as a real piece of video, not a tech demo.

How to write a shot type in a prompt

The format the heavyweights respond to most consistently:

[Shot type] of [subject] [action], [framing detail],
[depth of field], [lighting].

Examples that work:

"Medium close-up of a woman laughing at a phone screen, three-quarter angle, shallow depth of field, soft window light."
"Extreme close-up of fingers typing on a mechanical keyboard, side angle, deep focus, neon underglow."
"Wide shot of a runner cresting a hill at dawn, low horizon, deep focus, backlit sun."

Each of these prompts specifies the same five things every time. Each one is reproducible. Each one cuts together with the others because the variety is structural, not accidental.

This is the first axis. The next read covers the second: camera moves — how the camera travels through the scene.

⌘↵ runs the editor.read, then continue.

promptdojo_›phase 06 · applied builds›ch 28 · ai video generation

lesson 2 of 3 · camera control and motion — what separates slop from craftstep 2 / 8

Shot vocabulary — the words that mean something to the model

The nine shots you actually need

Shot	Abbrev	Used for
Extreme Wide	EWS	Geography. Subject tiny or absent. Opening of a scene.
Wide	WS	Whole subject + environment. Establishing the relationship between them.
Medium Wide	MWS	Subject knees-up. Common for action.
Medium	MS	Subject waist-up. Conversational distance.
Medium Close	MCU	Subject chest-up. News-anchor frame. Default for dialogue.
Close	CU	Head and shoulders. Emotional weight.
Extreme Close	ECU	A single feature — eyes, lips, hands, an object.
Over-the-Shoulder	OTS	Two subjects, one's shoulder framing the other.
POV	POV	Camera IS the subject. First-person.

Almost every shot in every piece of professional video is one of these nine. AI models, including all the heavyweights, have been trained on labeled examples. Use the labels.

Why this matters for video prompts

A prompt that names the shot type is grounding the model in something concrete:

Bad: "a man at a coffee shop, cinematic, beautiful."

Good: "Medium close-up of a man sipping coffee at a window seat, shallow depth of field, warm light from frame-left."

The good version specifies:

Shot type (MCU)
Action (sipping coffee)
Setting (window seat)
Depth of field (shallow)
Lighting direction (frame-left)
Lighting quality (warm)

The two most-overused shots

In AI video specifically, two shots get overused:

The slow drift wide. Default Sora 2 / Veo 3 output when the prompt is vague. Reads as "AI nature documentary." Use only when you want a literal landscape moment.
The locked-off medium. Default for any prompt that mentions a person without specifying the shot. Reads as stock-photo-in-motion.

The 30-shot test

Take any 30-second piece of professional video. Count the shot types. You will find 5-10 different shot types in the sequence. Each shot transition is a deliberate change of distance or angle.

The fix is shot-list discipline before generation. Decide the shot types up front. Generate against the list. The output will read as a real piece of video, not a tech demo.

How to write a shot type in a prompt

The format the heavyweights respond to most consistently:

[Shot type] of [subject] [action], [framing detail],
[depth of field], [lighting].

Examples that work:

"Medium close-up of a woman laughing at a phone screen, three-quarter angle, shallow depth of field, soft window light."
"Extreme close-up of fingers typing on a mechanical keyboard, side angle, deep focus, neon underglow."
"Wide shot of a runner cresting a hill at dawn, low horizon, deep focus, backlit sun."

Each of these prompts specifies the same five things every time. Each one is reproducible. Each one cuts together with the others because the variety is structural, not accidental.

This is the first axis. The next read covers the second: camera moves — how the camera travels through the scene.

⌘↵ runs the editor.read, then continue.

Camera control and motion — what separates slop from craft — step 2 of 8

Shot vocabulary — the words that mean something to the model

The nine shots you actually need

Why this matters for video prompts

The two most-overused shots

The 30-shot test

How to write a shot type in a prompt

Camera control and motion — what separates slop from craft — step 2 of 8

Shot vocabulary — the words that mean something to the model

The nine shots you actually need

Why this matters for video prompts

The two most-overused shots

The 30-shot test

How to write a shot type in a prompt