How to Make an AI Video from Text (with Sound)

Type a scene, wait a few minutes, download an MP4 with sound — that is the whole workflow. This guide walks through making an AI video from text on GeniGPT, what a good video prompt includes, and what it costs. One thing up front: the image tools are free to try; video runs on Pro credits.

What's free and what isn't

GeniGPT's video generator is a Pro feature — there is no free video tier. The image tools are a different story: your first 3 generations are free, with no signup and no watermark. Video runs on Pro credits at 10 credits per second, so a 5-second clip costs 50 credits and a 15-second clip costs 150.

Credits are only deducted when a video generates successfully, so a failed generation does not charge you. Pro credits are purchased through Buy Me a Coffee; the current packages are listed in the pricing section.
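If you want to budget before you buy, the arithmetic above is easy to sketch. The short Python snippet below is only an illustration of the stated pricing (10 credits per second, no charge on failure); the function names are made up for this example and are not part of any GeniGPT tool.

```python
CREDITS_PER_SECOND = 10  # rate stated in this guide


def video_cost(seconds: int) -> int:
    """Credits needed for a clip of the given length."""
    if seconds <= 0:
        raise ValueError("duration must be a positive number of seconds")
    return seconds * CREDITS_PER_SECOND


def balance_after(balance: int, seconds: int, succeeded: bool) -> int:
    """Credit balance after one attempt; a failed generation costs nothing."""
    if not succeeded:
        return balance
    return balance - video_cost(seconds)


print(video_cost(5))    # 5-second clip
print(video_cost(15))   # 15-second clip
print(balance_after(200, 10, succeeded=False))  # failed attempt, balance unchanged
```

For example, a balance of 200 credits covers four 5-second clips or one 15-second clip with 50 credits to spare.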

Make a video from text, step by step

Everything happens on one page — the AI video generator. There is no timeline editor, no software to install, and nothing to configure beyond three choices: your prompt, a duration, and an aspect ratio. If you have Pro credits, the first clip takes about as long to set up as it does to read this section.

1. Open the AI video generator.
2. Type the scene you want: subject, motion, and any sound. More on what to include below.
3. Pick a duration: 5, 10, or 15 seconds (50, 100, or 150 credits).
4. Pick an aspect ratio: 16:9 for widescreen, 9:16 for TikTok, Reels, and Shorts, or 1:1.
5. Generate. Expect a few minutes, since video is heavier than images. If you close the tab, generation keeps running on GeniGPT's side; reopen the page and it picks up where it left off.
6. Download the MP4, with sound included and no watermark.
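The three choices in the steps above (prompt, duration, aspect ratio) can be written down as a small checklist in code. This is a hypothetical sketch for planning clips in a script or spreadsheet export; `VideoSettings` is an invented name, and nothing here calls GeniGPT, which is used through its web page.

```python
from dataclasses import dataclass

ALLOWED_DURATIONS = (5, 10, 15)                   # seconds, as listed in the steps
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")   # widescreen, vertical, square


@dataclass(frozen=True)
class VideoSettings:
    """The three choices the video page asks for (hypothetical container)."""
    prompt: str
    duration: int
    aspect_ratio: str

    def __post_init__(self) -> None:
        # Reject anything the page would not offer as an option.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"aspect ratio must be one of {ALLOWED_ASPECT_RATIOS}")


# A vertical 10-second clip for Reels or Shorts.
settings = VideoSettings("A tiny paper boat drifting down a gutter", 10, "9:16")
print(settings)
```

Writing the choices down this way makes it harder to plan a clip with a length or shape the generator does not offer.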

What to put in a video prompt

A video prompt is an image prompt plus time. The stills-style description still matters — subject, setting, light — but the model also needs to know what moves, how the camera behaves, and what the clip should sound like. Four ingredients cover it: subject, motion, camera, and sound. You will not need all four every time, but the strongest prompts touch each one.

Subject and setting — who or what, where, in what light. "A tiny paper boat on a rain-soaked, neon-lit street at night."
Motion — what changes over the clip. "Drifting down the gutter, bobbing over ripples." Without it, you tend to get a near-static scene.
Camera — static shot, slow push-in, handheld follow, aerial pull-back. One camera move per short clip usually reads better than several.
Sound — the video generates with native sound, so describe it: "rain patter, distant traffic, a low hum." Skip it and the model infers audio from the scene.

Put together: "A tiny paper boat drifting down a rain-soaked, neon-lit street at night, camera slowly tracking alongside, rain patter and distant traffic." The hero image at the top of this page is a still generated from the film-still version of that same idea — a video prompt is that description, set in motion.
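If you write many prompts, it can help to keep the four ingredients separate and join them at the end. The helper below is a minimal sketch of that idea in Python; `build_video_prompt` is a made-up name, and the comma-joined format is simply one reasonable way to phrase a prompt, not a format GeniGPT requires.

```python
def build_video_prompt(subject: str, motion: str = "", camera: str = "", sound: str = "") -> str:
    """Join subject, motion, camera, and sound into one prompt sentence.

    Only the subject is required; empty ingredients are skipped.
    """
    if not subject.strip():
        raise ValueError("a prompt needs at least a subject")
    # Trim whitespace and trailing periods so the parts join cleanly.
    parts = [p.strip().rstrip(".") for p in (subject, motion, camera, sound) if p.strip()]
    return ", ".join(parts) + "."


prompt = build_video_prompt(
    subject="A tiny paper boat on a rain-soaked, neon-lit street at night",
    motion="drifting down the gutter, bobbing over ripples",
    camera="camera slowly tracking alongside",
    sound="rain patter and distant traffic",
)
print(prompt)
```

Leaving `motion` empty still produces a valid prompt, but as noted above, the result tends to be a near-static scene.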

Animate a photo instead (Pro)

Text-to-video invents the scene from scratch. Image-to-video starts from a photo you upload — also a Pro feature — and animates it, so a product, portrait, or pet stays recognizably itself. Drop a JPG, PNG, or WebP onto the video page, describe the motion you want, and the clip begins from your exact image. The aspect ratio follows the photo automatically.
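Before uploading a batch of photos, a quick check of file types can save a failed drop. The sketch below tests file extensions against the three formats named above; treating `.jpeg` as equivalent to `.jpg` is an assumption of this example, and `is_accepted_photo` is a hypothetical helper, not a GeniGPT function.

```python
from pathlib import Path

# JPG, PNG, and WebP per this guide; ".jpeg" is included as an assumed alias of ".jpg".
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def is_accepted_photo(filename: str) -> bool:
    """True if the file extension matches one of the accepted image formats."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


for name in ("product.JPG", "portrait.webp", "pet.gif"):
    print(name, is_accepted_photo(name))
```

An extension check only looks at the file name; it does not confirm that the file contents are a valid image.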

If you are new to prompting GeniGPT and want a feel for how it reads instructions before buying credits, start with the AI image generator, where your first 3 generations are free, or read the guide on what an image GPT is to see how typed descriptions become pictures.

Questions, answered

Three questions come up more than any others: whether video is free, whether clips come with sound, and how long generation takes. The short answers are no, yes, and a few minutes — the longer answers below cover the details, so you know exactly what to expect before you spend credits.

Is making an AI video free?

No — video is a Pro feature, and there is no free video tier. Videos cost 10 Pro credits per second, and credits are only deducted when a video generates successfully. GeniGPT's image tools are a separate matter: your first 3 generations are free, with no signup and no watermark.

Does the AI video come with sound?

Yes. Videos generate with native sound built in — ambient noise, effects, and atmosphere that match the scene — rather than a silent clip you have to score afterwards. You can steer the audio by describing it in your prompt; if you don't, the model infers sound from the scene.

How long does an AI video take to generate?

Usually a few minutes. Video is heavier work than image generation, so expect a longer wait than the roughly one minute the image tools take. Generation keeps running even if you close the tab; reopen the video page and it resumes checking on your clip where it left off.
