How To Complete A 30-Second Video With AI Tools

If you’re as curious as I am about making videos using nothing but AI tools, this is the article for you.

The technology to do just that is here, albeit in early form, as the video above shows. But it isn’t easy and requires a lot of trial and error.

We’ll be going over one method (out of possibly hundreds) to make a 30-second video complete with a script, video clips, and narration. Links to all the tools are included.

The Process and Tools

The Script

This is super easy to write in ChatGPT. Once you have a basic idea of what you want your story to be about, enter a prompt like this:

“Write a fictional script for a 30-second video about a helicopter crew that rescues a stranded mountain climber off of the top of a snowy mountain. 8 scenes. Each scene no longer than 4 seconds of narration.”

It’s important to include the number of scenes and to cap each scene at four seconds of narration, because Runway, the text-to-video tool used here, only generates four-second clips in text-to-video mode (Gen-2). Eight scenes of up to four seconds each also add up to roughly 30 seconds. Keeping each scene’s narration to four seconds or less will help you match your narration time to your video clip time when editing.

With this prompt, ChatGPT spits out a title, descriptions of the scenes, and narration of the scenes.
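If you want a quick sanity check before moving on, here is a minimal Python sketch that estimates how long each line of narration takes to say. The 150-words-per-minute pace and the function names are my own assumptions, not anything that comes from ChatGPT or Runway, so treat the numbers as a rough guide only.

```python
from typing import List

# Assumed speaking pace. Real narration speed depends on the voice you pick.
WORDS_PER_MINUTE = 150.0


def estimate_seconds(narration: str, words_per_minute: float = WORDS_PER_MINUTE) -> float:
    """Rough spoken length of a line, based only on its word count."""
    return len(narration.split()) * 60.0 / words_per_minute


def scenes_over_limit(scenes: List[str], limit_seconds: float = 4.0) -> List[int]:
    """Return the 1-based numbers of scenes whose estimate exceeds the limit."""
    return [
        number
        for number, line in enumerate(scenes, start=1)
        if estimate_seconds(line) > limit_seconds
    ]
```

Any scene this flags is a good candidate for asking ChatGPT to shorten before you generate the narration.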


The Narration

Head on over to ElevenLabs. You can modify any voice you want, or just stick with one of the premade voices. I chose Wayne from the voice library because it sounds like it was made for movie trailers or narration. Once you’ve found a voice you like, click Add to VoiceLab.

Copy and paste your script from ChatGPT into the Speech Synthesis text box. I found it helpful to have one scene of narration on its own line, followed by an ellipsis like … to slow down the speech a bit.
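If you have a lot of scenes, the one-scene-per-line layout can be prepared with a few lines of Python before pasting. This is just a sketch of the formatting trick described above; the function name and the `pause_marker` parameter are made up for illustration, and it does not talk to ElevenLabs in any way.

```python
from typing import Iterable


def format_for_narration(scenes: Iterable[str], pause_marker: str = "…") -> str:
    """Put each scene on its own line, followed by an ellipsis to slow the pace."""
    lines = []
    for scene in scenes:
        text = scene.strip()
        if text:  # skip blank entries so no line holds only a pause marker
            lines.append(f"{text} {pause_marker}")
    return "\n".join(lines)
```

Paste the returned text into the Speech Synthesis text box as usual.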

Once your text is in place, simply hit generate, and download the file to your computer.

Video Clip Generation

OK, this is by far the most difficult part. There are a ton of ways to go about this. However, the video at the beginning of the article was done in Runway AI using a combination of Gen-2 text-to-video and image-to-video prompts.

This part will take A LOT of trial and error. I struggled to figure out which prompts would create the style of clips I wanted and to keep that style consistent across the 8 to 10 clips I generated. You can practice in the free version, but your credits will quickly run out and you will have to bust out the ole credit card. Still, the first-tier paid plan is only $12 per month.

A basic way to begin text-to-video prompting is to use this format:

Subject and the action the subject is performing / video style / shot type / lighting / any other modifiers

It would look something like:

A palm tree on a tropical beach, professional cinematography, bird’s-eye view, red and orange hues, feature film.

As opposed to:

A palm tree on a tropical beach, 2D animation, close-up, blue hues, hand-drawn animation.

You can see how the two prompts will deliver very different results. I’m not sure if there is a foolproof prompt template – please comment if you have ideas or your own templates! 
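To keep your prompts in the same order every time, the format above can be turned into a tiny Python helper. This is only a sketch of the template in this article, not an official Runway format, and the function and parameter names are my own.

```python
from typing import Sequence


def build_prompt(
    subject: str,
    style: str,
    shot: str,
    lighting: str,
    modifiers: Sequence[str] = (),
) -> str:
    """Join the prompt pieces in a fixed order, separated by commas."""
    parts = [subject, style, shot, lighting, *modifiers]
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return ", ".join(cleaned) + "."


# Rebuilds the first example prompt from this section.
palm_tree_prompt = build_prompt(
    subject="A palm tree on a tropical beach",
    style="professional cinematography",
    shot="bird's-eye view",
    lighting="red and orange hues",
    modifiers=["feature film"],
)
```

Swapping just one argument at a time, such as `style` or `shot`, makes it easier to see which part of the prompt changed the result.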

Important: After you enter a prompt, make sure to hit FREE PREVIEW, and it will generate four rendered images of what the clip will look like. If the rendering isn’t what you want, try honing your prompt further, and then hitting FREE PREVIEW again, as many times as it takes, until you get a depiction you are happy with. After you are happy with the rendering, then hit GENERATE and see what happens. 

If you can’t get the clips you want, you can try uploading an image (for example, a palm tree) to the Gen-2 prompt box and adding some simple modifiers such as “a coconut falling,” then hit FREE PREVIEW again and see where that gets you. Or, you can simply upload the image, omit the modifiers, hit GENERATE, and see what happens.

One of the struggles with early generative models is consistency between clips. This relies on what is known as the seed number, the value that sets the starting point for each generation, so reusing the same seed tends to give more similar-looking results. You can find out more about the seed number, and access the other Runway instructions, here. But even the instructions aren’t the best, and this area will likely evolve and improve quickly.


Video Editing

Once you’ve downloaded all your clips and are ready to compile them, head over to your preferred video editor and begin layering in your video, narration audio, and any background music. One trick I found is to slow down the audio just a bit, or to speed it up, to match your video clips more accurately. Or, you can try to cut the audio during pauses and move it around on the timeline to fit your clips.
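How much to slow down or speed up the audio is simple arithmetic: divide the narration length by the clip length. Here is a small Python sketch of that calculation. The function name is hypothetical, and how you enter the resulting value depends on your editor, so check how its speed control is labeled.

```python
def speed_factor(audio_seconds: float, clip_seconds: float) -> float:
    """Playback speed that makes the audio last exactly as long as the clip.

    A result above 1.0 means speed the audio up, below 1.0 means slow it down.
    """
    if audio_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / clip_seconds


# A 5-second narration squeezed into a 4-second clip needs 1.25x speed.
example_factor = speed_factor(5.0, 4.0)
```

Small adjustments tend to sound more natural, so if the factor comes out far from 1.0, cutting the audio during pauses may be the better option.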

You can try CapCut, which has a great free plan and lots of music, or Wondershare Filmora, which is best with a paid plan. Both apps are relatively beginner-friendly.

We’re Early

Text-to-video generation models will evolve quickly, and if you feel like it’s too much right now, you can always perfect your text-to-image generation prompts (which is an art in and of itself). At the very least, you learned how to create a simple script in ChatGPT and how to create script-based narration in ElevenLabs.

Either way, it’s good to practice because if you aren’t becoming familiar with AI tools, you can bet someone else is. There will be a large market for AI prompt engineers in the near future.


This is a Contributor Post. Opinions expressed here are opinions of the Contributor. Influencive does not endorse or review brands mentioned; does not and cannot investigate relationships with brands, products, and people mentioned and is up to the Contributor to disclose. Contributors, amongst other accounts and articles may be professional fee-based.
