Arena Fight is the theme for a real showdown. Two characters you pick go head-to-head in a non-stop one-on-one fight on a raised theater stage, with the setting you type (a graveyard, a circus, a neon temple, anything) built as the backdrop. A live crowd watches and films from the front rows, like a Las Vegas show.
The look is a candid, handheld phone shot from the audience: wide and zoomed out, the fighters small and distant up on the full stage, dark-haired heads silhouetted across the foreground. It reads like a clip someone actually captured at a live event. This guide covers how to set one up and the single most important habit for clean, consistent results.
What the Arena Fight Theme Does
Pick two characters and a setting, and Arena Fight stages a fight between them. Each fighter keeps their own identity — face, outfit, body type, weapons, and signature powers — and the two are kept distinct the whole time, never merged into one character and never turned into a talking or singing performance.
Every scene is pure mid-combat: punches, blocks, spinning kicks, dodges, throws, and special moves connecting. There is no walking up, posing, or dialogue — it opens already fighting and escalates to a finish across the clips.
Two clips, one continuous fight
An Arena Fight runs as two chained 10-second clips (about 20 seconds total). The first clip is the opening exchange and the second continues straight from it into the decisive finish, so it plays as one fight rather than two separate shots.
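The timing is simple arithmetic: two 10-second clips laid end to end, with the second one picking up where the first stops. The Python sketch below models only that layout. `Clip`, `plan_fight_clips`, and `continues_from` are hypothetical names. The guide does not say how the second clip is conditioned on the first, so that link is represented by a plain index rather than real video handling.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

CLIP_SECONDS = 10  # each chained clip is 10 seconds long, per the guide


@dataclass
class Clip:
    index: int
    role: str                      # which part of the fight this clip covers
    start: int                     # offset in the combined video, in seconds
    duration: int                  # length of this clip, in seconds
    continues_from: Optional[int]  # hypothetical: clip this one picks up from


def plan_fight_clips(roles=("opening exchange", "decisive finish")) -> list[Clip]:
    """Place the clips back to back so they play as one continuous fight."""
    clips = []
    for i, role in enumerate(roles):
        clips.append(
            Clip(
                index=i,
                role=role,
                start=i * CLIP_SECONDS,
                duration=CLIP_SECONDS,
                # Every clip after the first continues straight from the
                # previous one instead of starting a new, separate shot.
                continues_from=i - 1 if i > 0 else None,
            )
        )
    return clips


if __name__ == "__main__":
    clips = plan_fight_clips()
    for clip in clips:
        print(f"{clip.start:>2}s-{clip.start + clip.duration}s  {clip.role}")
    print("total:", sum(c.duration for c in clips), "seconds")
```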
The One Rule: Build the Stage Scene First, Then Add the Fighters
This is the tip that makes or breaks the look. Behind the scenes, Arena Fight does NOT try to imagine the stage, the crowd, and both fighters all at once. It builds the empty stage scene first — the raised stage, your setting as the backdrop, the spotlights, and the live audience packed across the foreground — with nobody on the stage yet.
Only after that backdrop is locked does it composite your two characters onto the stage. That ordering is the whole trick: the crowd and setting are decided once and held fixed, so they show up the same way in every shot instead of the audience vanishing or the camera zooming in on the characters and losing the stage.
Stage first = consistent crowd every shot
When an image model has to invent fighters AND a crowd AND a stage simultaneously, it often drops the crowd or zooms in. Generating the empty stage scene first and then placing the characters into that fixed plate guarantees the audience and setting are present in every scene.
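The ordering can be modeled in a few lines of Python. This is a minimal sketch, assuming a plate and a scene can be treated as plain data: `StagePlate`, `Scene`, `build_stage_plate`, and `composite_fighters` are hypothetical stand-ins for the image-generation and compositing steps, which the guide does not detail. What it shows is the structure: the plate is created once, and every beat is built on that same plate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StagePlate:
    """Hypothetical: the empty stage scene, decided once and held fixed."""
    setting: str
    elements: tuple[str, ...]  # everything baked into the plate


@dataclass(frozen=True)
class Scene:
    plate: StagePlate          # shared backdrop, never regenerated
    fighters: tuple[str, str]  # composited on top of the plate
    beat: int


def build_stage_plate(setting: str) -> StagePlate:
    # Step 1: stage, backdrop, lights, and crowd, with nobody on stage yet.
    return StagePlate(
        setting=setting,
        elements=("raised stage", f"{setting} backdrop", "spotlights",
                  "live crowd in foreground"),
    )


def composite_fighters(plate: StagePlate, a: str, b: str, beat: int) -> Scene:
    # Step 2: place both fighters onto the existing plate for one beat.
    return Scene(plate=plate, fighters=(a, b), beat=beat)


def stage_first(setting: str, a: str, b: str, beats: int) -> list[Scene]:
    plate = build_stage_plate(setting)  # built exactly once per fight
    return [composite_fighters(plate, a, b, i) for i in range(beats)]


if __name__ == "__main__":
    scenes = stage_first("circus", "Fighter One", "Fighter Two", beats=4)
    # Every scene references the same plate object, so the crowd and
    # backdrop are identical from one beat to the next.
    print(all(s.plate is scenes[0].plate for s in scenes))
```

In this model the crowd belongs to the plate rather than to any single beat, which is why no beat can lose it.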
Practical takeaway: think of your setting as the first decision, not an afterthought. Type the location clearly before you generate, because that location is what gets built into the stage scene the fighters will live on.
Step by Step
- Open the homepage and choose the Arena Fight category.
- Pick your two characters — these are the fighters. Use your own uploaded characters or generated ones; each keeps its own look and powers.
- Type your setting / location in the prompt box (for example: graveyard, circus, neon temple, rooftop at night). This becomes the stage backdrop. Leaving it blank falls back to a generic dramatic arena.
- Generate. The empty stage scene with the crowd is built first (or reused, if one was already saved for this setting), then both fighters are composited onto it for each scene.
- Review the scene images, then let it continue into the two video clips and combine them into the final fight.
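The example captions at the end of this page suggest how the typed setting is used: it is slotted into a fixed sentence about the stage ("with cemetery built as an elaborate scenic backdrop/set"), and one caption, presumably from a run with the box left blank, reads "with a dramatic arena built as" instead. The sketch below reproduces that pattern in Python. The template wording is copied from those captions; `resolve_setting`, `stage_sentence`, and the whitespace cleanup are illustrative assumptions, not the product's code.

```python
DEFAULT_SETTING = "a dramatic arena"  # fallback when the prompt box is blank

# Wording taken from the example captions; the placeholder is an assumption.
STAGE_TEMPLATE = (
    "Performed on a raised theater stage with {setting} built as an "
    "elaborate scenic backdrop/set, like a Las Vegas stage show with "
    "dramatic stage lighting and spotlights."
)


def resolve_setting(prompt_text: str) -> str:
    """Return the typed setting, or the generic arena if nothing was typed."""
    setting = " ".join(prompt_text.split())  # trim and collapse whitespace
    return setting if setting else DEFAULT_SETTING


def stage_sentence(prompt_text: str) -> str:
    """Build the stage/backdrop sentence for one Arena Fight run."""
    return STAGE_TEMPLATE.format(setting=resolve_setting(prompt_text))


if __name__ == "__main__":
    print(stage_sentence("graveyard"))
    print(stage_sentence("   "))  # blank input falls back to the arena
```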
Same setting? The stage is reused automatically
Once a stage scene has been built for a setting (say, 'circus'), that exact stage is saved and reused next time you run a 'circus' Arena Fight — same crowd, same backdrop. Great for rematches or a series of fights in the same venue, and it skips the stage-building step entirely.
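The reuse rule behaves like a cache keyed by setting. A minimal Python sketch, assuming the saved stage can be looked up by its setting text: `StageCache` is a hypothetical class, and the trim-and-lowercase matching rule is a guess, since the guide does not say how closely a new setting has to match a saved one.

```python
from __future__ import annotations

from typing import Callable


class StageCache:
    """Hypothetical: keeps one built stage per setting for later reuse."""

    def __init__(self, build: Callable[[str], str]) -> None:
        self._build = build               # the expensive stage-building step
        self._stages: dict[str, str] = {}

    @staticmethod
    def _key(setting: str) -> str:
        return setting.strip().lower()    # assumed matching rule

    def get(self, setting: str) -> str:
        key = self._key(setting)
        if key not in self._stages:
            # First run for this setting: build the stage and save it.
            self._stages[key] = self._build(setting)
        # Later runs skip building and return the saved stage unchanged.
        return self._stages[key]


if __name__ == "__main__":
    builds = []

    def build_stage(setting: str) -> str:
        builds.append(setting)
        return f"stage plate for {setting}"

    cache = StageCache(build_stage)
    first = cache.get("circus")
    rematch = cache.get("circus")
    print(first is rematch, len(builds))
```

With this structure, a rematch in the same venue costs one lookup instead of a new stage build, which is the saving the guide describes.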
Picking a Great Stage Setting
The setting is built as a theatrical backdrop, so settings with strong silhouettes and lighting read best from a distance. A few that work well:
- Graveyard — fog, headstones, and moonlight behind the stage.
- Circus — big-top stripes, rigging, and colored spotlights.
- Neon temple / dojo — lanterns, banners, and dramatic uplight.
- Volcano or lava arena — glowing backdrop with heavy contrast.
- Rooftop at night — city skyline as the set behind the fighters.
Keep the setting to a clear place or vibe. The fighters' own powers and weapons carry the action, so the backdrop just needs to set the scene, not compete with it.
Getting the Most Out of It
- Choose two visually distinct fighters. Different body types, colors, and silhouettes keep them readable in a wide shot.
- Lock your setting before generating. It defines the stage scene that everything else is built on.
- Lean into each character's powers. The fight uses each fighter's own moves, weapons, and abilities, so contrasting styles make a better showdown.
- Run rematches in the same venue. Reusing a setting reuses the saved stage, so a series of fights shares one consistent arena.
Keep exploring
Ready to make your own? Jump into the AI arena fight video generator and pick your two fighters.
- Make two characters fight: AI arena fight video generator
- Try another theme: AI puppet video generator
- Put a toon in a real world: cartoon character in a real scene
- New here? How to make an AI music video from a song
Example videos
Finished Arena Fight music videos:
Fighter One VERSUS Fighter Two — a non-stop one-on-one martial-arts FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. Fast, precise choreography: punches, blocks, spinning kicks, dodges, throws, and special moves connecting at impressive speed, dust kicking up from the stage floor. EVERY scene is pure mid-combat action — NO preparation, posing, or walking up. The fighters MAY exchange short taunts or battle cries in English while fighting, but they never pause or slow down to talk — the combat stays continuous through any dialog. Performed on a raised theater stage with cemetery built as an elaborate scenic backdrop/set, like a Las Vegas stage show with dramatic stage lighting and spotlights. WIDE ESTABLISHING SHOT, camera far back at the rear of the audience and ZOOMED OUT so the ENTIRE stage and a large portion of the crowd are visible in a wide widescreen frame — the fighters appear small in the distance on the full stage, NOT a close-up. POV handheld phone-camera filmed from the audience, raised above other people's heads, frame slightly unsteady, fixed position. A large crowd fills the whole FOREGROUND — many dark-haired heads silhouetted, holding up phones recording the show — with the full elevated stage and the fight clearly in the distance above and in front of them. One continuous handheld audience shot, no cuts to another location; the crowd gasps and cheers. PHOTO STYLE: looks like a real amateur smartphone photo taken from the crowd — candid and unposed, natural daylight, slight handheld motion blur and sensor noise, shallow focus on the stage, photorealistic documentary snapshot, NOT a glossy cinematic studio render.
Fighter One VERSUS Fighter Two — a non-stop one-on-one martial-arts FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. Fast, precise choreography: punches, blocks, spinning kicks, dodges, throws, and special moves connecting at impressive speed, dust kicking up from the stage floor. EVERY scene is pure mid-combat action — NO preparation, posing, walking up, talking, or spoken dialog. Performed on a raised theater stage with a dramatic arena built as an elaborate scenic backdrop/set, like a Las Vegas stage show with dramatic stage lighting and spotlights. WIDE ESTABLISHING SHOT, camera far back at the rear of the audience and ZOOMED OUT so the ENTIRE stage and a large portion of the crowd are visible in a wide widescreen frame — the fighters appear small in the distance on the full stage, NOT a close-up. POV handheld phone-camera filmed from the audience, raised above other people's heads, frame slightly unsteady, fixed position. A large crowd fills the whole FOREGROUND — many dark-haired heads silhouetted, holding up phones recording the show — with the full elevated stage and the fight clearly in the distance above and in front of them. One continuous handheld audience shot, no cuts to another location; the crowd gasps and cheers. PHOTO STYLE: looks like a real amateur smartphone photo taken from the crowd — candid and unposed, natural daylight, slight handheld motion blur and sensor noise, shallow focus on the stage, photorealistic documentary snapshot, NOT a glossy cinematic studio render.
Fighter One VERSUS Fighter Two — a non-stop one-on-one martial-arts FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. Fast, precise choreography: punches, blocks, spinning kicks, dodges, throws, and special moves connecting at impressive speed, dust kicking up from the stage floor. EVERY scene is pure mid-combat action — NO preparation, posing, walking up, talking, or spoken dialog. Performed on a raised theater stage with carnival built as an elaborate scenic backdrop/set, like a Las Vegas stage show with dramatic stage lighting and spotlights. WIDE ESTABLISHING SHOT, camera far back at the rear of the audience and ZOOMED OUT so the ENTIRE stage and a large portion of the crowd are visible in a wide widescreen frame — the fighters appear small in the distance on the full stage, NOT a close-up. POV handheld phone-camera filmed from the audience, raised above other people's heads, frame slightly unsteady, fixed position. A large crowd fills the whole FOREGROUND — many dark-haired heads silhouetted, holding up phones recording the show — with the full elevated stage and the fight clearly in the distance above and in front of them. One continuous handheld audience shot, no cuts to another location; the crowd gasps and cheers. PHOTO STYLE: looks like a real amateur smartphone photo taken from the crowd — candid and unposed, natural daylight, slight handheld motion blur and sensor noise, shallow focus on the stage, photorealistic documentary snapshot, NOT a glossy cinematic studio render.
Fighter One VERSUS Fighter Two — a non-stop one-on-one martial-arts FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. Fast, precise choreography: punches, blocks, spinning kicks, dodges, throws, and special moves connecting at impressive speed, dust kicking up from the stage floor. EVERY scene is pure mid-combat action — NO preparation, posing, walking up, talking, or spoken dialog. Performed on a raised theater stage with cemetery built as an elaborate scenic backdrop/set, like a Las Vegas stage show with dramatic stage lighting and spotlights. WIDE ESTABLISHING SHOT, camera far back at the rear of the audience and ZOOMED OUT so the ENTIRE stage and a large portion of the crowd are visible in a wide widescreen frame — the fighters appear small in the distance on the full stage, NOT a close-up. POV handheld phone-camera filmed from the audience, raised above other people's heads, frame slightly unsteady, fixed position. A large crowd fills the whole FOREGROUND — many dark-haired heads silhouetted, holding up phones recording the show — with the full elevated stage and the fight clearly in the distance above and in front of them. One continuous handheld audience shot, no cuts to another location; the crowd gasps and cheers. PHOTO STYLE: looks like a real amateur smartphone photo taken from the crowd — candid and unposed, natural daylight, slight handheld motion blur and sensor noise, shallow focus on the stage, photorealistic documentary snapshot, NOT a glossy cinematic studio render.
Fighter One VERSUS Fighter Two — a non-stop one-on-one martial-arts FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. Fast, precise choreography: punches, blocks, spinning kicks, dodges, throws, and special moves connecting at impressive speed, dust kicking up from the stage floor. EVERY scene is pure mid-combat action — NO preparation, posing, walking up, talking, or spoken dialog. Performed on a raised theater stage with graveyard built as an elaborate scenic backdrop/set, like a Las Vegas stage show with dramatic stage lighting and spotlights. WIDE ESTABLISHING SHOT, camera far back at the rear of the audience and ZOOMED OUT so the ENTIRE stage and a large portion of the crowd are visible in a wide widescreen frame — the fighters appear small in the distance on the full stage, NOT a close-up. POV handheld phone-camera filmed from the audience, raised above other people's heads, frame slightly unsteady, fixed position. A large crowd fills the whole FOREGROUND — many dark-haired heads silhouetted, holding up phones recording the show — with the full elevated stage and the fight clearly in the distance above and in front of them. One continuous handheld audience shot, no cuts to another location; the crowd gasps and cheers. PHOTO STYLE: looks like a real amateur smartphone photo taken from the crowd — candid and unposed, natural daylight, slight handheld motion blur and sensor noise, shallow focus on the stage, photorealistic documentary snapshot, NOT a glossy cinematic studio render.
Fighter One VERSUS Fighter Two — a non-stop one-on-one martial-arts FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. Fast, precise choreography: punches, blocks, spinning kicks, dodges, throws, and special moves connecting at impressive speed, dust kicking up from the stage floor. EVERY scene is pure mid-combat action — NO preparation, posing, walking up, talking, or spoken dialog. Performed on a raised theater stage with cemetery built as an elaborate scenic backdrop/set, like a Las Vegas stage show with dramatic stage lighting and spotlights. POV HANDHELD PHONE-CAMERA shot filmed by someone in the audience, raised above other people's heads in the crowd, frame slightly unsteady, shot from a fixed audience position. A large crowd fills the FOREGROUND — dark-haired heads silhouetted, many holding up phones recording the show — with shallow focus on the elevated stage where the fight happens, above and in front of the crowd. One continuous handheld audience shot, no cuts to another location; the crowd gasps and cheers. PHOTO STYLE: looks like a real amateur smartphone photo taken from the crowd — candid and unposed, natural daylight, slight handheld motion blur and sensor noise, shallow focus on the stage, photorealistic documentary snapshot, NOT a glossy cinematic studio render.
Fighter One VERSUS Fighter Two — a non-stop one-on-one FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. EVERY scene is active mid-combat action — punches, kicks, blocks, dodges, throws, and special moves connecting — shot dramatically like a 2D side-view fighting game with both fighters in profile. NO scene shows preparation, posing, walking up, talking, or any spoken dialog; it is pure fighting from start to finish. Performed on a RAISED THEATER STAGE lifted above the audience, with evil carnival built as the stage backdrop/set behind them like a Las Vegas stage show with dramatic stage lighting and spotlights. Filmed from a seated audience member's point of view looking up at the stage: the backs of heads and silhouettes of the SEATED CROWD fill the foreground across the bottom of frame, all facing the stage, with the elevated stage and the fight clearly above and in front of them. The fight stays up on the stage the entire time; one continuous escalating fight, no cuts to another location.
Fighter One VERSUS Fighter Two — a non-stop one-on-one FIGHT where the two characters battle each other the entire time using their own unique powers, moves, and fighting style. EVERY scene is active mid-combat action — punches, kicks, blocks, dodges, throws, and special moves connecting — shot dramatically like a 2D side-view fighting game with both fighters in profile. NO scene shows preparation, posing, walking up, talking, or any spoken dialog; it is pure fighting from start to finish. Performed on a RAISED THEATER STAGE lifted above the audience, with death arena built as the stage backdrop/set behind them like a Las Vegas stage show with dramatic stage lighting and spotlights. Filmed from a seated audience member's point of view looking up at the stage: the backs of heads and silhouettes of the SEATED CROWD fill the foreground across the bottom of frame, all facing the stage, with the elevated stage and the fight clearly above and in front of them. The fight stays up on the stage the entire time; one continuous escalating fight, no cuts to another location.
Fighter One VERSUS Fighter Two — a non-stop one-on-one FIGHT where the two characters battle each other the entire time using their own distinct abilities, powers, and fighting style. EVERY scene is active mid-combat action — punches, kicks, blocks, dodges, throws, and special moves connecting — shot dramatically like a 2D side-view fighting game with both fighters in profile. NO scene shows preparation, posing, walking up, talking, or any spoken dialog; it is pure fighting from start to finish. Performed on a RAISED THEATER STAGE lifted above the audience, with roman colosseum built as the stage backdrop/set behind them like a Las Vegas stage show with dramatic stage lighting and spotlights. Filmed from a seated audience member's point of view looking up at the stage: the backs of heads and silhouettes of the SEATED CROWD fill the foreground across the bottom of frame, all facing the stage, with the elevated stage and the fight clearly above and in front of them. The fight stays up on the stage the entire time; one continuous escalating fight, no cuts to another location.