SkeletonDiffusion Demo

Demo for the CVPR 2025 paper "Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction", available here. The codebase is released on GitHub.

SkeletonDiffusion takes as input a sequence of 3D body joint coordinates, which not everyone has at their disposal. In this demo, we use a publicly available model, Neural Localizer Fields (NLF), to extract 3D poses from a given input video, and feed the extracted poses to SkeletonDiffusion to generate corresponding future motions. Note that the poses extracted from the video are noisy and imperfect, while SkeletonDiffusion has been trained only on precise sensor data obtained in laboratory settings. Despite never having seen noisy data or such varied real-world actions (ballet, basketball, etc.), SkeletonDiffusion handles most cases reasonably well!
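
As a minimal sketch of this two-stage pipeline, the snippet below wires a pose extractor and a motion predictor together. The `pose_model` and `motion_model` callables and the array shapes are assumptions for illustration; the real entry points live in the GitHub codebase and may differ.

```python
import numpy as np

def predict_futures(pose_model, motion_model, frames: np.ndarray,
                    n_context: int, n_samples: int) -> np.ndarray:
    """Sketch: RGB frames -> 3D joints (e.g. via NLF) -> sampled future motions."""
    poses = pose_model(frames)        # assumed shape: (T, J, 3) joint coordinates
    context = poses[:n_context]       # observed prefix used as conditioning
    # Draw several diverse futures from the diffusion model (assumed interface).
    return np.stack([motion_model(context) for _ in range(n_samples)])
```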

Instructions

  1. Upload a video or select from examples
  2. Choose whether to use precomputed results (if available)
  3. Select the number of motion predictions to display (2-6)
  4. Click "Run Skeleton Diffusion" to start (a scripted alternative is sketched below)
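
The same steps can in principle be driven programmatically through the `gradio_client` library. The Space id, argument order, and endpoint name below are assumptions; inspect the live app (e.g. via `client.view_api()`) to confirm them before use.

```python
from gradio_client import Client, handle_file

client = Client("SkeletonDiffusion/demo")  # hypothetical Space id
print(client.view_api())                   # confirm the real endpoint and inputs first
result = client.predict(
    handle_file("input.mp4"),  # step 1: the input video
    False,                     # step 2: use precomputed results?
    4,                         # step 3: number of predictions to display (2-6)
    api_name="/run",           # hypothetical endpoint name
)
print(result)
```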

Note:

  • SkeletonDiffusion requires less than half a second for a forward pass, but extracting the poses from RGB and rendering the output are time-consuming
  • Only the first 30 frames of the input video will be used
  • The first 0.5 seconds of motion will be used to predict the future motion (see the sketch after this list)
  • Processing time depends on video length and selected number of predictions
  • Using precomputed results, where available, is much faster
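
To make the trimming and context split concrete, here is a minimal sketch. The input frame rate and the rounding rule are assumptions, since the demo's internal preprocessing is not specified.

```python
import math

def split_clip(frames, fps: float, max_frames: int = 30, context_secs: float = 0.5):
    """Trim to the first `max_frames` frames and split off the 0.5 s observed
    context. How the demo handles frame rate is an assumption here."""
    clip = frames[:max_frames]
    n_context = min(len(clip), math.ceil(context_secs * fps))
    return clip[:n_context], clip[n_context:]  # observed context, remaining frames

# e.g. at 25 fps: the first 13 of the 30 kept frames form the observed context
```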