Since BlazePose is mentioned in the paper, the following description of GHUM from the MediaPipe Pose Landmarker documentation may be useful. It describes the
model bundle:
• Pose detection model: detects the presence of bodies with a few key pose landmarks.
• Pose landmarker model: adds a complete mapping of the pose. The model outputs an estimate of 33 3-dimensional pose landmarks. This bundle uses a convolutional neural network similar to MobileNetV2 and is optimized for on-device, real-time fitness applications. This variant of the BlazePose model uses GHUM, a 3D human shape modeling pipeline, to estimate the full 3D body pose of an individual in images or videos.
https://developers.google.com/mediapipe/solutions/vision/pose_landmarker
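In case it helps the authors check the 33-landmark output described above, here is a minimal sketch of how the bundle might be queried from Python. It assumes the `mediapipe` package is installed and that a Pose Landmarker bundle has been downloaded locally. The file names `pose_landmarker.task` and `person.jpg` and the helpers `landmarks_to_tuples` and `detect_world_landmarks` are hypothetical names chosen for illustration, not something taken from the paper or the documentation.

```python
from types import SimpleNamespace
from typing import List, Tuple

# The quoted documentation states that the model outputs 33 3D landmarks.
NUM_POSE_LANDMARKS = 33


def landmarks_to_tuples(landmarks) -> List[Tuple[float, float, float]]:
    """Convert objects exposing .x, .y, .z into (x, y, z) tuples.

    Hypothetical helper: raises ValueError if the pose does not have
    exactly NUM_POSE_LANDMARKS entries.
    """
    points = [(lm.x, lm.y, lm.z) for lm in landmarks]
    if len(points) != NUM_POSE_LANDMARKS:
        raise ValueError(
            f"expected {NUM_POSE_LANDMARKS} landmarks, got {len(points)}"
        )
    return points


def detect_world_landmarks(image_path: str, model_path: str):
    """Run the Pose Landmarker on one image (default image mode).

    Assumption: mediapipe is installed and model_path points to a
    downloaded bundle. Imports are local so that the helper above can
    be used without mediapipe.
    """
    import mediapipe as mp
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    options = vision.PoseLandmarkerOptions(
        base_options=mp_python.BaseOptions(model_asset_path=model_path)
    )
    with vision.PoseLandmarker.create_from_options(options) as landmarker:
        image = mp.Image.create_from_file(image_path)
        result = landmarker.detect(image)
    # One list of landmarks per detected person.
    return [landmarks_to_tuples(pose) for pose in result.pose_world_landmarks]


if __name__ == "__main__":
    # Stand-in landmarks so the helper can be tried without a model file.
    fake_pose = [SimpleNamespace(x=0.0, y=0.0, z=0.0) for _ in range(33)]
    print(len(landmarks_to_tuples(fake_pose)))
```

With a real bundle, a call such as `detect_world_landmarks("person.jpg", "pose_landmarker.task")` would return one list of 33 coordinate triples per detected person.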