Associative Embedding

Joint detection and grouping

Posted by Stephen Zhou on October 18, 2017

Related work

Vector embeddings

Perceptual organization

  • the task is to group pixels into parts
  • prior methods detect basic visual units first and group them second; our approach performs detection and grouping in a single stage

Multiperson pose estimation

  • top-down
  • bottom-up

Instance Segmentation

  • most methods perform detection first, followed by segmentation
  • two recent works: DeepMask and Instance-Sensitive FCN

Figure 1: DeepMask network

Figure 2: Instance-Sensitive FCN network

Figure 3: Instance-Sensitive FCN

Stacked Hourglass Architecture

  • combine associative embedding with the stacked hourglass architecture
  • repeated bottom-up and top-down processing
  • consolidate global and local features
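As a rough illustration of the repeated bottom-up and top-down pattern, here is a toy one-dimensional sketch in plain Python. It is not the network itself: the learned convolutions and residual blocks are left out entirely, and the names `hourglass` and `stacked_hourglass`, as well as the use of max pooling, nearest-neighbour upsampling, and additive skip connections on a flat list, are assumptions made for illustration.

```python
def hourglass(signal, depth):
    """Toy 1-D hourglass: pool down, recurse, upsample, add the skip branch."""
    if depth == 0 or len(signal) < 2:
        return list(signal)
    # Bottom-up: halve the resolution with max pooling over adjacent pairs.
    pooled = [max(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    inner = hourglass(pooled, depth - 1)
    # Top-down: nearest-neighbour upsampling back to the input resolution.
    upsampled = [inner[min(i // 2, len(inner) - 1)] for i in range(len(signal))]
    # Skip connection: add local (input) and global (upsampled) features.
    return [s + u for s, u in zip(signal, upsampled)]


def stacked_hourglass(signal, num_stacks=2, depth=2):
    """Apply several hourglass modules in sequence and keep every output."""
    outputs = []
    for _ in range(num_stacks):
        signal = hourglass(signal, depth)
        outputs.append(signal)
    return outputs
```

For example, `hourglass([1, 2, 3, 4], depth=1)` pools to `[2, 4]`, upsamples to `[2, 2, 4, 4]`, and adds the skip branch to give `[3, 4, 7, 8]`.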

Figure 4: Stacked Hourglass Architecture

Figure 5: Stacked Hourglass Architecture

Figure 6: Stacked Hourglass Architecture

Multiperson Pose Estimation

  • for m body joints, the network outputs m detection heatmaps and m tag heatmaps
  • Detection loss: MSE between the predicted and ground-truth detection heatmaps
  • Grouping loss:
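A minimal sketch of the two losses, assuming one-dimensional tags. The grouping loss pulls each joint's tag toward the mean tag of its person (the reference embedding) and pushes the reference embeddings of different people apart with a Gaussian penalty. The function names, the `sigma` parameter, and the choice to sum the push term only over distinct pairs of people are assumptions of this sketch; including same-person pairs would only add a constant for a fixed number of people.

```python
import math


def detection_loss(predicted, target):
    """Mean squared error between two flattened heatmaps of equal length."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def grouping_loss(tags_per_person, sigma=1.0):
    """tags_per_person[n] lists the tag values read at person n's visible joints."""
    people = [tags for tags in tags_per_person if tags]
    n = len(people)
    if n == 0:
        return 0.0
    # Reference embedding: mean tag over each person's joints.
    refs = [sum(tags) / len(tags) for tags in people]
    # Pull term: joints of one person should share the same tag.
    pull = sum((ref - t) ** 2 for ref, tags in zip(refs, people) for t in tags) / n
    # Push term: reference tags of different people should be far apart.
    push = 0.0
    for i, a in enumerate(refs):
        for j, b in enumerate(refs):
            if i != j:
                push += math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
    push /= n ** 2
    return pull + push
```

Only the differences between tags matter here, so the network is free to choose any absolute tag values as long as joints of the same person agree and different people disagree.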

Figure 7: An overview of our approach for producing multi-person pose estimates

Experiments on Multiperson Pose Estimation

Dataset: MS-COCO and MPII Human Pose

Figure 8: Visualization of the associative embedding channels for different joints

Figure 9: Results (AP) on MPII Multi-Person

Figure 10: Results on MS-COCO test-std, excluding systems trained with external data

Figure 11: Results on MS-COCO test-dev, excluding systems trained with external data

Figure 12: Qualitative pose estimation results on MS-COCO validation images

Instance Segmentation

  • Detection loss: MSE between the predicted heatmap and the ground-truth heatmap (the union of all instance masks)
  • Grouping loss:
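A hedged sketch of both pieces, again assuming one-dimensional tags. The detection target is the elementwise union of the instance masks. The grouping loss is computed on tags sampled at a few pixels per instance: pairs from the same instance are pulled together, and pairs from different instances are pushed apart until their squared difference exceeds a margin. The names `union_mask` and `instance_grouping_loss`, the flat-list mask layout, and the default `margin=1.0` are assumptions of this sketch.

```python
def union_mask(masks):
    """Elementwise union of binary instance masks given as flat 0/1 lists."""
    return [max(values) for values in zip(*masks)]


def instance_grouping_loss(samples_per_instance, margin=1.0):
    """samples_per_instance[n] holds tag values at pixels sampled from instance n."""
    pull = 0.0
    push = 0.0
    for n, same in enumerate(samples_per_instance):
        # Pull term: tags inside one instance should agree.
        pull += sum((a - b) ** 2 for a in same for b in same)
        for m, other in enumerate(samples_per_instance):
            if m == n:
                continue
            # Push term: hinge penalty while tags of two instances are too close.
            push += sum(
                max(0.0, margin - (a - b) ** 2) for a in same for b in other
            )
    return pull + push
```

With tags `[[0.0, 0.0], [2.0, 2.0]]` the loss is zero, because each instance is internally consistent and the two instances are separated by more than the margin.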

Figure 13: How instance segmentation works

Experiments on Instance Segmentation

Dataset: validation split of PASCAL VOC 2012, with pretraining on MS COCO

Figure 14: Example instance predictions produced by our system on the PASCAL VOC 2012 validation set

Figure 15: Semantic instance segmentation results (mAP) on PASCAL VOC 2012 validation images

Conclusion

  • introduces associative embedding, a new method for single-stage, end-to-end joint detection and grouping
  • associative embedding can be easily integrated with other state-of-the-art architectures that produce pixelwise predictions
  • applied to multiperson pose estimation, associative embedding achieves state-of-the-art results on two standard benchmarks