SkeletonDiffusion Demo

Demo for the CVPR 2025 paper "Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction", available here. The codebase is released on GitHub.

SkeletonDiffusion takes as input a sequence of 3D body joint coordinates, which not everyone has at their disposal. In this demo, we use a publicly available model, Neural Localizer Fields (NLF), to extract 3D poses from a given input video, and feed the extracted poses to SkeletonDiffusion to generate corresponding future motions. Note that the poses extracted from the video are noisy and imperfect, while SkeletonDiffusion has been trained only on precise sensor data obtained in laboratory settings. Despite never having seen noisy data or such varied real-world actions (ballet, basketball, etc.), SkeletonDiffusion handles most cases reasonably well!
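
As a minimal sketch of this two-stage pipeline, the snippet below wires a pose extractor and a motion predictor together. The `pose_model` and `motion_model` callables and the array shapes are assumptions for illustration; the real entry points live in the GitHub codebase and may differ.

```python
import numpy as np

def predict_futures(pose_model, motion_model, frames: np.ndarray,
                    n_context: int, n_samples: int) -> np.ndarray:
    """Sketch: RGB frames -> 3D joints (e.g. via NLF) -> sampled future motions."""
    poses = pose_model(frames)        # assumed shape: (T, J, 3) joint coordinates
    context = poses[:n_context]       # observed prefix used as conditioning
    # Draw several diverse futures from the diffusion model (assumed interface).
    return np.stack([motion_model(context) for _ in range(n_samples)])
```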

Instructions

  1. Upload a video or select from examples
  2. Choose whether to use precomputed results (if available)
  3. Select the number of motion predictions to display (2-6)
  4. Click "Run Skeleton Diffusion" to start (a scripted alternative is sketched below)
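
The same steps can in principle be driven programmatically through the `gradio_client` library. The Space id, argument order, and endpoint name below are assumptions; inspect the live app (e.g. via `client.view_api()`) to confirm them before use.

```python
from gradio_client import Client, handle_file

client = Client("SkeletonDiffusion/demo")  # hypothetical Space id
print(client.view_api())                   # confirm the real endpoint and inputs first
result = client.predict(
    handle_file("input.mp4"),  # step 1: the input video
    False,                     # step 2: use precomputed results?
    4,                         # step 3: number of predictions to display (2-6)
    api_name="/run",           # hypothetical endpoint name
)
print(result)
```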

Note:

  • SkeletonDiffusion requires less than half a second for a forward pass, but extracting the poses from RGB and rendering the output are time-consuming
  • Only the first 30 frames of the input video will be used
  • The first 0.5 seconds of motion will be used to predict the future motion (see the sketch after this list)
  • Processing time depends on video length and selected number of predictions
  • Using precomputed results, where available, is much faster
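
To make the trimming and context split concrete, here is a minimal sketch. The input frame rate and the rounding rule are assumptions, since the demo's internal preprocessing is not specified.

```python
import math

def split_clip(frames, fps: float, max_frames: int = 30, context_secs: float = 0.5):
    """Trim to the first `max_frames` frames and split off the 0.5 s observed
    context. How the demo handles frame rate is an assumption here."""
    clip = frames[:max_frames]
    n_context = min(len(clip), math.ceil(context_secs * fps))
    return clip[:n_context], clip[n_context:]  # observed context, remaining frames

# e.g. at 25 fps: the first 13 of the 30 kept frames form the observed context
```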