Skip to main content

Multishot Human Pose

· 4 min read
SeulGi Hong

I'm trying to extract SMPL parameters on video in a multishot setting. To this end, tracklets are necessary.

Steps

PHALP setup -> Multishot Human Pose extraction -> select data for training NeRF.

1. PHALP setup

Summary of PHALP

  • "Tracking People by Predicting 3D Appearance, Location & Pose" (CVPR 2022)
  • Input: monocular video
  • Task: Tracking people
  • Method
    • lift people in a frame to 3D (3D pose of human, human location in 3D space, 3D appearance)
    • track a person: collect 3D observations over time in a tracklet representation -> build temporal model
      • model predict the future state of a tracklet including 3d location, pose, appearance
      • for the future frame, calculate similarity between pred state of tracklet and single frame observations
    • Association: simple Hungarian matching
  • Output: *.pkl file (input for the next step, multishot human pose extraction)
  • project page: PHALP project

Installation

Tips

  • about branch
    • master
      • pros: compatible to Mshot (CVPR22), relatively easy to install
      • cons: out-dated, hard-coded
    • v1.1
      • pros: fancy code and demos
      • cons: OpenGL and OSMesa would be a little bit tricky, not compatible to Mshot repo.
          1. output *.pkl file should be slightly changed
          1. this demo code doesn't generate instance segmentation mask per person, which is necessary to Mshot preprocessing.
  • My settings
    • docker container / without conda
    • off-screen
      • Pyrenderer and OpenGL may cause problems, you need to choose EGL or OSMesa
      • I build OSMesa and use PyOpenGL + OSMesa combination. (check 'Failure Logs...' under)
    • works on PHALP master and v1.1 branches, and also work for Mshot.
    • Here is pip list output of my environment. req.txt
Failure Logs... (skip this)

Logs

  • follow this PHALP
  • it takes quite a time to solve environment in conda...
  • "('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))" error
  • I'm using docker container
    • to install git+http, 'apt-get install git'
    • pycocotools error: link
  • torch.ao is added on 1.10

I tried setting an environment without conda, but failed. issues while installing PHALP:

  • pip install pyrootutils submitit gdown dill colordict scenedetect pytube
  • pip install scikit-learn==0.22.2
  • OpenGL: ssh env (without display)
    • install OSMesa OSMesa doc
    • apt-get install libexpat1-dev
  • encountered on PosixPath ~~~ error but I couldn't handle it

2. OpenPose

Settings

  • docker image from the hub: 'd0ckaaa/openpose'
  • execute make commands to install
  • download pretrained opencv models by get models script.

3. multishot human pose setup

Settings

  • follow this multishot
  • I use the same docker as the previous one.
  • Notes
    • use exactly the same version of smplx (if not, it may cause error)
    • no need to use torch 1.5.
    • change environ variables: PYOPENGL_PLATFORM' egl → osmesa
    • expand docker's shared memory size to 2GB
  • Data
    • download mshot_data from github official Mshot page (approximately 1GB)
    • download smpl neutral pkl and gmm08 on the official smplify webpage and move to the correct location

Pre-processing

  • .pkl → .npz
  • output contains information for a specific person (create separately)
  • these are the initial value of optimization step.

Optimization

  • optimize the SMPL parameters using multishot loss
Memo

Notes.

  • To extract SMPL by multishot human pose, tracking and identification are needed.
  • What about the other SMPL algorithms? SMPL extraction models like ROMP, VIBE...
    • Do they use 2D keypoints?
    • Regression or optimized based?
    • How are they inputs like? sequence or only an image? should be cropped? do they work on multiple people?
    • VIBE: https://arxiv.org/pdf/1912.05656.pdf