Related work
Vector embeddings
Perceptual organization
- group pixels into parts
- prior work detects basic visual units first and groups them second; our approach performs detection and grouping in one stage
Multiperson pose estimation
- top-down
- bottom-up
Instance Segmentation
- prior methods perform detection followed by segmentation
- two recent works: DeepMask and Instance-Sensitive FCN
Figure 1: DeepMask network
Figure 2: Instance-Sensitive FCN network
Figure 3: Instance-Sensitive FCN
Stacked Hourglass Architecture
- combine associative embedding with the stacked hourglass architecture
- repeated bottom-up and top-down processing
- consolidate global and local features
Figure 4: Stacked Hourglass Architecture
Figure 5: Stacked Hourglass Architecture
Figure 6: Stacked Hourglass Architecture
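The bottom-up/top-down pattern listed above can be illustrated with a toy sketch. This is a structural illustration only, under strong assumptions: it works on a 1-D list whose length is divisible by 2**depth, it uses average pooling and nearest-neighbour upsampling, and it omits the learned convolutional layers a real hourglass module contains. All function names are hypothetical.

```python
def downsample(x):
    """Average adjacent pairs (a stand-in for pooling)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def hourglass(x, depth):
    """Bottom-up then top-down pass with a skip connection at every scale."""
    if depth == 0 or len(x) < 2:
        return list(x)
    skip = list(x)                             # local features at this resolution
    low = hourglass(downsample(x), depth - 1)  # coarser, more global context
    up = upsample(low)                         # back to this resolution
    return [s + u for s, u in zip(skip, up)]   # consolidate local and global


def stacked_hourglass(x, n_stacks=2, depth=2):
    """Apply the hourglass repeatedly, feeding each output to the next stack."""
    for _ in range(n_stacks):
        x = hourglass(x, depth)
    return x
```

The skip connection at each scale is what lets the output combine fine local detail with context gathered at lower resolution; stacking simply repeats the pass.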
Multiperson Pose Estimation
- m detection heatmaps and m tag heatmaps (one of each per body joint)
- Detection loss: MSE
- Grouping loss:
Figure 7: An overview of our approach for producing multi-person pose estimates
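The two losses named above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. It assumes scalar (1-D) tags and a grouping loss with two parts: a "pull" term that draws each joint's tag toward the mean tag of its person, and a "push" term that penalizes different people whose mean tags are close. The function names, the `sigma` parameter, and the normalization are assumptions made for illustration.

```python
import math


def detection_loss(pred, target):
    """Mean squared error between two equally sized, flattened heatmaps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def grouping_loss(tags_per_person, sigma=1.0):
    """tags_per_person[n] holds the scalar tags read at person n's joints."""
    n_people = len(tags_per_person)
    # Reference tag of each person: the mean of that person's joint tags.
    refs = [sum(tags) / len(tags) for tags in tags_per_person]
    # Pull: squared distance of every joint tag to its person's reference tag.
    pull = sum(
        (ref - t) ** 2
        for ref, tags in zip(refs, tags_per_person)
        for t in tags
    ) / n_people
    # Push: Gaussian penalty on close reference tags. The sum runs over all
    # pairs, including n == n', which only adds a constant.
    push = sum(
        math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
        for a in refs
        for b in refs
    ) / n_people ** 2
    return pull + push
```

For example, `grouping_loss([[1.0, 1.0], [5.0, 5.0]])` is lower than `grouping_loss([[1.0, 1.0], [1.2, 1.2]])`, because the second case gives the two people nearly identical tags.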
Experiments on Multiperson Pose Estimation
Dataset: MS-COCO and MPII Human Pose
Figure 8: Visualization of the associative embedding channels for different joints
Figure 9: Results (AP) on MPII Multi-Person
Figure 10: Results on MS-COCO test-std, excluding systems trained with external data
Figure 11: Results on MS-COCO test-dev, excluding systems trained with external data
Figure 12: Qualitative pose estimation results on MS-COCO validation images
Instance Segmentation
- Detection loss: MSE between the predicted heatmap and the ground truth heatmap (the union of all instance masks)
- Grouping loss:
Figure 13: How the instance segmentation approach works
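The detection target described above (the union of all instance masks) and a pixel-level grouping loss can be sketched as follows. This is a hedged illustration rather than the method as implemented. It assumes flattened binary masks, scalar tags per pixel, and a pairwise pull/push form in which tags inside one instance are drawn together and tags from different instances are pushed apart. It uses every pixel instead of a random sample and applies no normalization. All names are hypothetical.

```python
import math


def union_mask(instance_masks):
    """Pixelwise union of binary instance masks (flat lists of 0/1)."""
    return [float(any(px)) for px in zip(*instance_masks)]


def mse(pred, target):
    """Mean squared error between two equally sized, flattened maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pixel_grouping_loss(tag_map, instance_masks, sigma=1.0):
    """Pairwise pull/push loss over the pixel tags of each instance."""
    # Collect the tags that fall inside each instance mask.
    groups = [[t for t, m in zip(tag_map, mask) if m] for mask in instance_masks]
    # Pull: tags within the same instance should agree.
    pull = sum((a - b) ** 2 for g in groups for a in g for b in g)
    # Push: tags from different instances should differ.
    push = sum(
        math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
        for i, gi in enumerate(groups)
        for j, gj in enumerate(groups)
        if i != j
        for a in gi
        for b in gj
    )
    return pull + push
```

A quick check of the detection target: `union_mask([[1, 0, 0], [0, 0, 1]])` marks the first and last pixels as foreground, and `mse` against a perfect prediction of that target is zero.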
Experiments on Instance Segmentation
Dataset: val split of PASCAL VOC 2012
Pretrained on MS-COCO
Figure 14: Example instance predictions produced by our system on the PASCAL VOC 2012 validation set
Figure 15: Semantic instance segmentation results (mAP) on PASCAL VOC 2012 validation images
Conclusion
- introduce associative embedding, a new method for single-stage, end-to-end joint detection and grouping
- associative embedding can be easily integrated with other state-of-the-art architectures that produce pixelwise predictions
- apply associative embedding to multiperson pose estimation and achieve state-of-the-art results on two standard benchmarks