Single Shot MultiBox Detector

SSD

Posted by stephen zhou on November 4, 2017

related work

figure 1: classic object detection networks

model

figure 2: architecture comparison between YOLO and SSD

details

Multi-scale feature maps for detection

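Convolutional predictors for detection: small 3x3 convolutional kernels are applied to each feature map to predict class scores and box offsets relative to the default boxes.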

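Default boxes and aspect ratios: each feature map cell is associated with k default boxes (4 or 6) that differ in scale and aspect ratio. For an m x n feature map with c classes, this gives (c+4)kmn outputs.

figure 3: SSD working framework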
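The default box layout can be sketched in a few lines of Python. This is a minimal sketch, not the reference implementation: the linear scale rule with s_min = 0.2 and s_max = 0.9 and the extra square box of scale sqrt(s_k * s_{k+1}) follow the SSD paper, while the function names and the choice of aspect ratios (1, 2, 1/2) are assumptions made for illustration.

```python
import math

def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map used for prediction."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]

def default_boxes_for_map(m, n, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return normalized (cx, cy, w, h) default boxes for an m x n feature map."""
    shapes = [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]
    # One extra square box whose scale lies between this map and the next one.
    extra = math.sqrt(scale * next_scale)
    shapes.append((extra, extra))
    boxes = []
    for i in range(m):       # rows
        for j in range(n):   # columns
            cx = (j + 0.5) / n
            cy = (i + 0.5) / m
            for w, h in shapes:
                boxes.append((cx, cy, w, h))
    return boxes

def num_outputs(m, n, k, c):
    """(c + 4) values per default box, k boxes per cell, m * n cells."""
    return (c + 4) * k * m * n
```

With three aspect ratios plus the extra square box, k is 4, so a 3 x 3 map holds 36 default boxes; with c = 21 classes that map produces (21 + 4) * 4 * 3 * 3 = 900 outputs.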

loss

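Matching strategy: each ground truth box is first matched to the default box with the highest IoU; default boxes are then also matched to any ground truth box with IoU higher than a threshold (0.5).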
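The matching step can be illustrated with a small pure-Python sketch. The box format (xmin, ymin, xmax, ymax) and the function names `iou` and `match` are assumptions for this example. It applies the 0.5 threshold and also gives every ground truth box its best-overlapping default box, as described in the SSD paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match(default_boxes, gt_boxes, threshold=0.5):
    """Map each default box index to a ground truth index, or -1 for background."""
    assignment = [-1] * len(default_boxes)
    # Threshold rule: a default box takes the ground truth it overlaps most,
    # provided that overlap exceeds the threshold.
    for d, db in enumerate(default_boxes):
        best_iou, best_g = 0.0, -1
        for g, gb in enumerate(gt_boxes):
            overlap = iou(db, gb)
            if overlap > best_iou:
                best_iou, best_g = overlap, g
        if best_iou > threshold:
            assignment[d] = best_g
    # Best-match rule: every ground truth keeps its single best default box,
    # even when that overlap is below the threshold.
    for g, gb in enumerate(gt_boxes):
        best_d = max(range(len(default_boxes)), key=lambda d: iou(default_boxes[d], gb))
        assignment[best_d] = g
    return assignment
```

Because of the threshold rule, one ground truth box can be assigned to several default boxes, which gives the network more positive examples to learn from.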

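Loss function: the confidence loss L(conf) is a softmax loss over the class scores, and the localization loss L(loc) is a Smooth L1 loss on the box offsets.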
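As a rough illustration of the two terms, the sketch below computes Smooth L1 and softmax cross-entropy for single values and combines them as (L_conf + alpha * L_loc) / N, where N is the number of matched default boxes, as in the SSD paper. The function names and the per-box list inputs are assumptions made for the example; a real implementation would work on tensors.

```python
import math

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def softmax_cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def ssd_loss(conf_losses, loc_losses, num_matched, alpha=1.0):
    """Combine per-box losses; the loss is defined as 0 when nothing matched."""
    if num_matched == 0:
        return 0.0
    return (sum(conf_losses) + alpha * sum(loc_losses)) / num_matched
```

For example, `smooth_l1(0.5)` is 0.125 and `smooth_l1(-2.0)` is 1.5, so large offset errors grow linearly instead of quadratically.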

trick

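Hard negative mining: negative default boxes are sorted by confidence loss and only the highest ones are kept, so that the ratio between negatives and positives is at most 3:1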
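A minimal sketch of this selection, assuming the per-box confidence losses and the positive/negative flags are already available as plain lists (the function name and argument layout are hypothetical):

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return indices of the negatives kept for training.

    Negatives are ranked by confidence loss (hardest first) and truncated so
    that there are at most `neg_pos_ratio` negatives per positive.
    """
    num_pos = sum(1 for pos in is_positive if pos)
    max_neg = neg_pos_ratio * num_pos
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[:max_neg])
```

With one positive box and six negatives, only the three negatives with the largest confidence loss contribute to the loss; the rest are ignored for that training step.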

Data augmentation: each training image is randomly sampled using one of the following options:

  • Use the entire original input image.
  • Sample a patch so that the minimum IoU overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.
  • Randomly sample a patch.

Each sampled patch is then horizontally flipped with probability 0.5. With this augmentation, mAP improves from 65.4% to 74.3%.
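The sampling options above could be sketched as follows. Several details are assumptions for illustration only: coordinates are normalized to [0, 1], the patch scale is drawn from [0.1, 1] with aspect ratio between 1/2 and 2, the minimum IoU is checked against every object, and the sampler falls back to the full image after `max_tries` failed attempts. All function names are hypothetical.

```python
import random

MIN_IOU_CHOICES = (0.1, 0.3, 0.5, 0.7, 0.9)
FULL_IMAGE = (0.0, 0.0, 1.0, 1.0)

def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def random_patch(rng):
    """Draw a patch that fits inside the image (assumed size and ratio ranges)."""
    while True:
        scale = rng.uniform(0.1, 1.0)
        ratio = rng.uniform(0.5, 2.0)
        w = scale * ratio ** 0.5
        h = scale / ratio ** 0.5
        if w <= 1.0 and h <= 1.0:
            x = rng.uniform(0.0, 1.0 - w)
            y = rng.uniform(0.0, 1.0 - h)
            return (x, y, x + w, y + h)

def sample_patch(gt_boxes, rng, max_tries=50):
    """Pick one of the three options: full image, min-IoU patch, random patch."""
    mode = rng.choice(("full", "min_iou", "random"))
    if mode == "full" or not gt_boxes:
        return FULL_IMAGE
    if mode == "random":
        return random_patch(rng)
    min_iou = rng.choice(MIN_IOU_CHOICES)
    for _ in range(max_tries):
        patch = random_patch(rng)
        if min(iou(patch, gt) for gt in gt_boxes) >= min_iou:
            return patch
    return FULL_IMAGE  # assumed fallback when no patch satisfies the constraint

def hflip_box(box):
    """Mirror a normalized box around the vertical image axis."""
    xmin, ymin, xmax, ymax = box
    return (1.0 - xmax, ymin, 1.0 - xmin, ymax)

def maybe_flip(boxes, rng, p=0.5):
    """Flip all boxes with probability p; also report whether a flip happened."""
    if rng.random() < p:
        return [hflip_box(b) for b in boxes], True
    return list(boxes), False
```

Passing a seeded `random.Random` instance as `rng` keeps the sampling reproducible, which helps when debugging an augmentation pipeline.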

experiment result

figure 4: PASCAL VOC2007 test detection results

figure 5: PASCAL VOC2007 test detection results of different models

figure 6: Sensitivity and impact of different object characteristics on VOC2007 test set

figure 7: Detection examples on COCO test-dev with SSD512 model

model analysis

  • Data augmentation is crucial
  • More default box shapes are better
  • The atrous (dilated convolution) version is faster
  • Multiple output layers at different resolutions are better

figure 8: Effects of various design choices and components on SSD performance

figure 9: Effects of using multiple output layers

conclusion

  • The core of SSD: predicting category scores and box offsets for a fixed set of default bounding boxes on multi-scale feature maps
  • Faster and more accurate than earlier single-shot detectors such as YOLO