Turn any voiceover into an illustrated story video

Make faceless YouTube videos from a voiceover: no camera, no face needed. Drop in an audio narration or record your own. StoryAnimator transcribes it into timestamped sentences, generates one consistent image per line, and lays the images onto a real editing timeline synced to the waveform, ready to export as MP4.

timestamped lines

0:00 narration length

image model

16:9 export ratio

Upload your audio

Drop a voiceover or narration file. We run speech-to-text and split it into sentences with start and end times.
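
As a rough illustration of the sentence-splitting step, the sketch below groups word-level timestamps into sentences with start and end times. It assumes the speech-to-text engine returns words with timings and that sentences end at ".", "!" or "?". The `Word`, `Sentence`, and `split_into_sentences` names are hypothetical and are not StoryAnimator's actual API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float


@dataclass
class Sentence:
    text: str
    start: float
    end: float


# Assumption: a word ending in one of these marks closes a sentence.
SENTENCE_END = (".", "!", "?")


def _close(words: list[Word]) -> Sentence:
    # A sentence spans from its first word's start to its last word's end.
    return Sentence(
        text=" ".join(w.text for w in words),
        start=words[0].start,
        end=words[-1].end,
    )


def split_into_sentences(words: list[Word]) -> list[Sentence]:
    sentences: list[Sentence] = []
    current: list[Word] = []
    for word in words:
        current.append(word)
        if word.text.endswith(SENTENCE_END):
            sentences.append(_close(current))
            current = []
    if current:  # trailing words without closing punctuation
        sentences.append(_close(current))
    return sentences
```

A real transcript would need more care with abbreviations, decimals, and languages that use other punctuation, so treat this as a starting point only.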

Drop audio here, or click to browse

WAV · MP3 · M4A · AAC · FLAC — up to 200 MB

Auto-detect language · timestamped sentences

source.wav

No file yet

Waiting for audio…

Waiting

Upload an audio file to begin transcription.

Review the transcript

Every sentence becomes a shot. Hover or click a row to highlight it. Each row's start and end times set how long its image stays on the timeline.

transcript.txt · read-only

shots.list · 0 shots

fewer · more

AI prompt assist

One click drafts your visual style, a character consistency bible, and a negative prompt. Everything stays editable.

prompt.cfg · injected into every image

Transcribe audio first, then auto-fill these fields with AI.

Visual style: mood, medium, palette, lighting.

Character bible: fixed designs reused in every shot.

Negative prompt: what to keep out of frame.
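
One way the three fields above could be injected into every image request is sketched below. The `PromptConfig` class, the `build_shot_prompt` function, and the "Scene / Style / Characters" layout are assumptions made for illustration. The actual prompt format sent to the image model is not documented here.

```python
from dataclasses import dataclass


@dataclass
class PromptConfig:
    visual_style: str      # mood, medium, palette, lighting
    character_bible: str   # fixed designs reused in every shot
    negative_prompt: str   # what to keep out of frame


def build_shot_prompt(sentence: str, cfg: PromptConfig) -> dict[str, str]:
    """Combine one transcript sentence with the shared prompt fields."""
    fields = [
        ("Scene", sentence),
        ("Style", cfg.visual_style),
        ("Characters", cfg.character_bible),
    ]
    # Skip fields the user left empty so the prompt has no blank sections.
    lines = [f"{label}: {value.strip()}" for label, value in fields if value.strip()]
    return {
        "prompt": "\n".join(lines),
        "negative_prompt": cfg.negative_prompt.strip(),
    }
```

Keeping the negative prompt separate from the main prompt lets it be passed as its own parameter if the image model supports one.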

Generate one image per sentence

We render a 16:9 frame for each timestamped line with Nano Banana. Reference image, style template, and character cast previews shape the final result.

generate.cfg

Image model

Reference image

Upload character ref

Upload your voiceover first to unlock this step.

Character cast preview

Generate a portrait of each character first. Refresh any you don't like, then generate the story. The approved cast locks in character consistency.

Style template

Applies to the cast and every generated frame.

estimate

Cost / frame: $0.0000

Quality score—

Nano Banana keeps characters consistent, accepts a reference image, and follows per-sentence prompts.

$0.0000

0 frames @ $0.0000 · estimated total
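
The estimated total is presumably the frame count multiplied by the cost per frame. The sketch below shows that arithmetic with `Decimal` to avoid floating-point drift at four decimal places. The function names are hypothetical, and any per-frame price passed in is the caller's own figure, not a published rate.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: amounts are displayed with four decimal places, as in the panel.
FOUR_PLACES = Decimal("0.0001")


def estimate_total(frames: int, cost_per_frame: Decimal) -> Decimal:
    """Return frames x cost_per_frame, rounded to four decimal places."""
    if frames < 0:
        raise ValueError("frames must be non-negative")
    return (cost_per_frame * frames).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


def format_estimate(frames: int, cost_per_frame: Decimal) -> str:
    """Render the summary line shown under the estimated total."""
    return f"{frames} frames @ ${cost_per_frame:.4f} · estimated total"
```

With no transcript loaded, `estimate_total(0, Decimal("0"))` is zero, which matches the empty state of the panel.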

Transcribe audio before generating.

Results gallery

Each generated 16:9 frame is shown with its index, filename, and the prompt that produced it. Export the whole job as a ZIP with a manifest.
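
A minimal sketch of what a ZIP export with a manifest might look like, using only the Python standard library. The `Frame` class, the `export_job_zip` function, and the `manifest.json` name and layout are assumptions. The real export format may differ.

```python
import io
import json
import zipfile
from dataclasses import dataclass


@dataclass
class Frame:
    index: int          # 1-based shot number
    filename: str       # name of the image inside the archive
    prompt: str         # prompt that produced the image
    image_bytes: bytes  # encoded image data


def export_job_zip(frames: list[Frame]) -> bytes:
    """Pack every frame plus a JSON manifest into an in-memory ZIP."""
    manifest = [
        {"index": f.index, "filename": f.filename, "prompt": f.prompt}
        for f in frames
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for frame in frames:
            archive.writestr(frame.filename, frame.image_bytes)
        # Hypothetical manifest name; one entry per frame, in frame order.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()
```

Storing the prompt next to each filename makes a job reproducible: any frame can be regenerated later from the manifest alone.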

render_out/ · no job yet

Generate frames to populate the gallery.

Video preview

Every frame is laid end to end, with each clip stretched to its sentence duration over the narration waveform. Scrub, preview, and export to MP4.
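
To illustrate how sentence timings could become end-to-end clips, the sketch below turns a list of (start, end) sentence times into timeline clips. It assumes that pauses between sentences are filled by holding the current image until the next sentence begins, and that the first clip starts at 0:00. That gap-filling rule and the `Clip` and `layout_clips` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    shot: int     # 1-based shot number
    start: float  # seconds on the timeline
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def layout_clips(
    sentences: list[tuple[float, float]], narration_length: float
) -> list[Clip]:
    """Lay one clip per sentence end to end, with no gaps on the timeline."""
    if not sentences:
        return []
    # Never end the timeline before the last sentence has finished.
    timeline_end = max(narration_length, sentences[-1][1])
    clips: list[Clip] = []
    for i, (start, _end) in enumerate(sentences):
        clip_start = 0.0 if i == 0 else start
        # Hold this image until the next sentence starts (assumed behavior).
        is_last = i + 1 == len(sentences)
        clip_end = timeline_end if is_last else sentences[i + 1][0]
        clips.append(Clip(shot=i + 1, start=clip_start, end=clip_end))
    return clips
```

Under this rule the clip durations always add up to the full narration length, so the picture track and the audio stay the same length at export.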

sequence.edit · no sequence

Sequence preview

Timeline

Export quality

~ size shown here

Burn transcript subtitles

Background music

No music selected

Upload music you own, licensed royalty-free music, or Creative Commons music with the required attribution.

Loop · Music volume

SHOT 01

PREVIEW

0:00.0 / 0:00.0

Ready · records in real time with audio

Credits and usage

Transcript and ZIP exports are free. Image generation and video export use credits.

total.bill

Transcription: $0.0000

Character portraits—

Image generation: $0.0000

Video render: $0.0000

Total spent this session

$0.0000

Regular users see credits only. Admins see internal USD cost for margin tracking.