The 7-Step Production Process
Before anything else, decide what one video will accomplish. Pick a single content angle: lesson, story, hot take, or product recommendation. Define the platform (Instagram Reels, YouTube Shorts, TikTok), the emotion you want to trigger, and the single action you want viewers to take. One video, one goal.
Write a tight 30-45 second script. No warm-up, no intro. The first 3 words must stop the scroll. Follow the Hook → Problem → Solution → Proof → CTA structure. Short sentences. Write it like you'd say it: conversational, direct, opinionated. Use Claude to draft, then rewrite in your own voice.
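A quick way to check a draft against the 30-45 second target is to estimate its spoken length from the word count. The sketch below assumes a pace of about 150 words per minute; that figure is an assumption, not a number from this guide, so measure your own pace and adjust `words_per_minute`. The function names are illustrative.

```python
def estimate_duration_seconds(script: str, words_per_minute: float = 150.0) -> float:
    """Rough spoken duration of a script at an assumed speaking pace."""
    word_count = len(script.split())
    return word_count * 60.0 / words_per_minute


def fits_target(script: str, min_s: float = 30.0, max_s: float = 45.0,
                words_per_minute: float = 150.0) -> bool:
    """True when the estimated duration falls inside the target window."""
    duration = estimate_duration_seconds(script, words_per_minute)
    return min_s <= duration <= max_s


if __name__ == "__main__":
    draft = "Stop editing manually. " * 30  # placeholder text, 90 words
    print(round(estimate_duration_seconds(draft), 1), fits_target(draft))
```

At the assumed pace, the window works out to roughly 75 to 112 words. Treat the result as a first filter only; the generated voice track gives the real duration.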
This is the heart of personal-brand AI UGC: you need your face, not a stock model. Use HeyGen's Photo Avatar to upload a well-lit, neutral-expression, straight-on photo. The AI maps your face into a speaking avatar. Provide multiple angles for maximum realism. Avoid shadows, hats, or glasses in source photos.
Record 3-10 minutes of yourself speaking naturally, with no echo and no background noise (a closet or padded room works well). Upload the recording to ElevenLabs to create a custom voice clone. The closer your recording matches your on-camera speaking style, the more natural every generated video will sound.
Feed your script + voice clone into HeyGen. Select your avatar and generate. Review closely: check lip sync on tricky consonants (B, P, M), watch for "glass eye" stiffness, check blink naturalness. Regenerate specific lines if needed. Aim for 2-3 generation attempts before settling on the best take.
This is where "AI-looking" becomes "is that actually Rahul?". Add b-roll cutaways. Layer ambient background audio. Add animated word-by-word captions. Apply subtle film grain (4-8% opacity). Apply a warm color grade with a LUT. Trim every pause beyond 0.3 seconds. See the Edit for Realism section below for every specific technique.
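The pause-trimming rule can be turned into a cut list before you open the editor. The sketch below assumes you already have start and end times for each spoken phrase (for example from a caption file) and that an over-long pause is shortened to 0.3 seconds rather than removed entirely; both points are assumptions, and the names are illustrative.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds) of one spoken phrase


def pause_cuts(segments: List[Segment], max_pause: float = 0.3) -> List[Segment]:
    """Return the time ranges to delete so that no pause exceeds max_pause."""
    cuts: List[Segment] = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        gap = next_start - prev_end
        if gap > max_pause:
            # Keep max_pause of silence after the phrase and cut the rest.
            cuts.append((prev_end + max_pause, next_start))
    return cuts


def seconds_saved(cuts: List[Segment]) -> float:
    """Total duration removed by the cut list."""
    return sum(end - start for start, end in cuts)


if __name__ == "__main__":
    phrases = [(0.0, 2.0), (2.2, 4.0), (5.0, 6.0)]  # made-up timings
    for start, end in pause_cuts(phrases):
        print(f"cut {start:.2f}s to {end:.2f}s")
```

Only gaps between phrases are considered here; silence before the first phrase or after the last one still has to be trimmed by hand.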
Export in 9:16, 1080×1920, MP4 at 30fps. Write a platform-native caption for each platform. Pin your hook line as a text overlay on the cover frame. Track watch-time and hook retention (first 3 seconds) in analytics; these are your key optimization signals for the next video.
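If your editor cannot export to these settings directly, a command-line re-encode with ffmpeg is one option. The sketch below only builds the argument list. It assumes ffmpeg is installed, that the edit is already framed at 9:16 (the scale filter would otherwise stretch the picture), and that H.264 video with AAC audio is acceptable; the guide itself specifies only MP4, 1080×1920, and 30fps. File names are placeholders.

```python
from typing import List


def build_export_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argument list for a 1080x1920, 30fps MP4 export."""
    return [
        "ffmpeg",
        "-i", src,                  # input file
        "-vf", "scale=1080:1920",   # resize to the vertical target resolution
        "-r", "30",                 # output frame rate
        "-c:v", "libx264",          # assumed video codec (H.264)
        "-pix_fmt", "yuv420p",      # widely compatible pixel format
        "-c:a", "aac",              # assumed audio codec
        dst,
    ]


if __name__ == "__main__":
    cmd = build_export_command("final_edit.mov", "final_vertical.mp4")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list keeps file names with spaces safe when it is passed to `subprocess.run`.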
Edit for Realism
The edit phase separates AI content that feels robotic from content that feels human. Every technique below targets a specific "tell" that viewers unconsciously detect. Fix the tells, and the brain stops looking for them.
Cut Away Early & Often
Your avatar only needs to be on screen 40-60% of the time. Use stock b-roll, screen recordings, or text slides at natural emphasis points. This hides any stiffness and keeps the pacing dynamic.
Cut away at least every 5-8 seconds.
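Both numbers in this section, the 40-60% on-screen share and the 5-8 second cutaway interval, can be checked against a shot list before you start editing. The sketch below is one way to do that, assuming each shot is labeled either "avatar" or "broll" with a duration in seconds. The labels, function names, and the use of 8 seconds as the hard limit are illustrative choices.

```python
from typing import List, Tuple

Shot = Tuple[str, float]  # (kind, duration_seconds); kind is "avatar" or "broll"


def avatar_share(shots: List[Shot]) -> float:
    """Fraction of total runtime where the avatar is on screen."""
    total = sum(duration for _, duration in shots)
    avatar = sum(duration for kind, duration in shots if kind == "avatar")
    return avatar / total if total else 0.0


def longest_avatar_run(shots: List[Shot]) -> float:
    """Longest uninterrupted stretch of avatar footage, in seconds."""
    longest = run = 0.0
    for kind, duration in shots:
        run = run + duration if kind == "avatar" else 0.0
        longest = max(longest, run)
    return longest


def check_plan(shots: List[Shot], min_share: float = 0.4,
               max_share: float = 0.6, max_run: float = 8.0) -> List[str]:
    """Return a list of problems; an empty list means the plan passes."""
    problems: List[str] = []
    share = avatar_share(shots)
    if not min_share <= share <= max_share:
        problems.append(f"avatar share is {share:.0%}, target is 40-60%")
    run = longest_avatar_run(shots)
    if run > max_run:
        problems.append(f"avatar stays on screen {run:.1f}s without a cutaway")
    return problems


if __name__ == "__main__":
    plan = [("avatar", 6.0), ("broll", 4.0), ("avatar", 5.0), ("broll", 5.0)]
    print(check_plan(plan) or "plan looks fine")
```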
Add Real Room Tone
AI audio is "too clean" โ