Multi-person pose estimation is a deep learning approach in computer vision for detecting the people in an image and locating their joints. In this article, I briefly outline how you can make a funny little bobble-head GIF like the one I produced above, using LeBron James’ face on top of Drake’s Hotline Bling music video. There are essentially five main steps:
1. Download the video you want to overlay a face onto. I chose Drake’s Hotline Bling video.
2. Download the isolated face image that you would like to overlay on top of your video. I chose LeBron James’ face. If your face image has a background, use an image editing tool to remove it until your image looks something like this:
3. On each frame of the video, detect the people and their joints. The project I used was https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation, which includes a Keras (Python) implementation of multi-person pose estimation. In particular, we want to use the code to locate the head of each person.
4. Once we have the screen coordinates of the face in a video frame, we overlay the isolated face cut-out on top of the frame at those coordinates.
5. Repeat steps 3 and 4 for each frame in the video. Afterwards, I used ffmpeg to combine all the frames into a silent video. You will have to do a little more work to add the audio from the original clip and sync it. But for the most part, you are done now!
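To make steps 3 and 4 more concrete, here is a minimal sketch of the overlay logic using only NumPy. It is not the code I ran, and it makes several assumptions: the function names `head_box`, `resize_nearest`, and `overlay_rgba` are hypothetical, the keypoints are assumed to arrive as a plain list of `(x, y)` pixel positions for one person's nose, eyes, and ears (the pose project's real output format may differ), the face cut-out is assumed to be an RGBA array whose alpha channel marks the removed background, and the `scale` factor is a guess you would tune by eye.

```python
import numpy as np


def head_box(keypoints, scale=2.0):
    """Estimate a square head box (x, y, size) from facial keypoints.

    `keypoints` is a hypothetical list of (x, y) pixel positions for one
    person's nose, eyes, and ears. `scale` is an assumed tuning factor.
    """
    pts = np.asarray(keypoints, dtype=float)
    cx, cy = pts.mean(axis=0)
    # Use the horizontal spread of the points as a rough face width.
    width = pts[:, 0].max() - pts[:, 0].min()
    size = max(int(round(width * scale)), 1)
    return int(round(cx - size / 2)), int(round(cy - size / 2)), size


def resize_nearest(img, size):
    """Resize an image to size x size with nearest-neighbour sampling."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols]


def overlay_rgba(frame, face_rgba, x, y):
    """Alpha-blend an RGBA cut-out onto an RGB frame, top-left at (x, y)."""
    out = frame.copy()
    fh, fw = frame.shape[:2]
    h, w = face_rgba.shape[:2]
    # Clip the paste region so heads partly outside the frame still work.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, fw), min(y + h, fh)
    if x0 >= x1 or y0 >= y1:
        return out  # the cut-out lies entirely outside the frame
    patch = face_rgba[y0 - y:y1 - y, x0 - x:x1 - x].astype(float)
    alpha = patch[..., 3:4] / 255.0  # 0 = transparent, 1 = opaque
    region = out[y0:y1, x0:x1].astype(float)
    blended = alpha * patch[..., :3] + (1.0 - alpha) * region
    out[y0:y1, x0:x1] = blended.round().astype(frame.dtype)
    return out


def paste_face(frame, face_rgba, keypoints, scale=2.0):
    """Steps 3 and 4 for one person in one frame."""
    x, y, size = head_box(keypoints, scale)
    return overlay_rgba(frame, resize_nearest(face_rgba, size), x, y)
```

For each frame you would call `paste_face` once per detected person and save the result as a numbered image. The nearest-neighbour resize keeps the sketch free of extra dependencies; an image library's resize would give a smoother face.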
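For the ffmpeg part of step 5, the sketch below builds the two commands as argument lists. The exact commands I used are not recorded here, so treat everything in it as an assumption: the helper names, the file names and frame pattern, the frame rate of 30 (it should match your source video), and the choice of the libx264 and AAC encoders, which your ffmpeg build needs to include.

```python
import subprocess


def frames_to_video_cmd(frame_pattern, output_path, fps=30):
    """Build an ffmpeg command that joins numbered frames into a silent video.

    `frame_pattern` is a printf-style pattern such as "frames/%05d.png".
    """
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate, assumed to match the source
        "-i", frame_pattern,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",    # widely playable pixel format
        output_path,
    ]


def add_audio_cmd(silent_video, original_video, output_path):
    """Build an ffmpeg command that copies the video stream from the silent
    clip and takes the audio track from the original clip."""
    return [
        "ffmpeg", "-y",
        "-i", silent_video,
        "-i", original_video,
        "-map", "0:v:0",          # video from the first input
        "-map", "1:a:0",          # audio from the second input
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",              # stop at the end of the shorter stream
        output_path,
    ]


def run(cmd):
    """Run a command and raise an error if ffmpeg exits with a failure."""
    subprocess.run(cmd, check=True)
```

Calling `run(frames_to_video_cmd("frames/%05d.png", "silent.mp4"))` followed by `run(add_audio_cmd("silent.mp4", "original.mp4", "final.mp4"))` would produce the final clip, provided the frame rate matches the original so the audio stays in sync.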
Source: Deep Learning on Medium