related work
figure 1: classic networks for object detection
model
figure 2: comparison between YOLO and SSD
details
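Multi-scale feature maps for detection: convolutional feature layers of progressively decreasing size are added to the end of the truncated base network, so detections are predicted at multiple scales.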
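Convolutional predictors for detection: on an m×n feature layer with p channels, a small 3×3×p kernel produces, at every location, either a category score or a shape offset relative to the default box coordinates.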
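The 3×3 predictor described above can be sketched in plain Python. This assumes a single combined head whose k(c+4) output channels are laid out box by box (c scores, then 4 offsets); the names `conv3x3_predictor` and `split_predictions` are hypothetical. The original SSD uses separate convolution layers for class scores and box offsets, and a real model would use a framework convolution rather than Python loops.

```python
import random


def conv3x3_predictor(feature, weights, bias):
    """3x3, stride-1, zero-padded convolution.

    feature: [p][m][n] nested lists (channels, height, width)
    weights: [out][p][3][3] nested lists, with out = k * (c + 4)
    bias:    list of length out
    returns: [out][m][n] nested lists
    """
    p, m, n = len(feature), len(feature[0]), len(feature[0][0])
    out = []
    for w_o, b_o in zip(weights, bias):
        plane = [[0.0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                acc = b_o
                for ch in range(p):
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            y, x = i + di, j + dj
                            if 0 <= y < m and 0 <= x < n:  # zero padding outside the map
                                acc += w_o[ch][di + 1][dj + 1] * feature[ch][y][x]
                plane[i][j] = acc
        out.append(plane)
    return out


def split_predictions(pred, k, c, i, j):
    """Split the k*(c+4) values at location (i, j) into k (scores, offsets) pairs."""
    result = []
    for b in range(k):
        vec = [pred[b * (c + 4) + t][i][j] for t in range(c + 4)]
        result.append((vec[:c], vec[c:]))
    return result


# Usage: a random 2-channel 3x3 feature map, k = 4 default boxes, c = 3 classes.
rng = random.Random(0)
p, m, n, k, c = 2, 3, 3, 4, 3
feature = [[[rng.random() for _ in range(n)] for _ in range(m)] for _ in range(p)]
out_ch = k * (c + 4)
weights = [[[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
            for _ in range(p)] for _ in range(out_ch)]
pred = conv3x3_predictor(feature, weights, [0.0] * out_ch)
total_outputs = len(pred) * m * n  # equals (c + 4) * k * m * n
boxes_at_center = split_predictions(pred, k, c, 1, 1)
```

Because the kernel is shared across all m×n locations, the head's parameter count depends only on p, k and c, not on the feature map size.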
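Default boxes and aspect ratios: each feature map cell is associated with a set of default boxes. For the k-th of m feature maps the default box scale is $s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1),\ k \in [1, m]$, with $s_{min} = 0.2$ and $s_{max} = 0.9$. The aspect ratios are $a_r \in \{1, 2, 3, \frac{1}{2}, \frac{1}{3}\}$, giving box width $w_k^a = s_k\sqrt{a_r}$ and height $h_k^a = s_k / \sqrt{a_r}$; for $a_r = 1$ an extra default box with scale $s'_k = \sqrt{s_k s_{k+1}}$ is added. Each location therefore has 4 or 6 default boxes. With k boxes per location and c classes, an m×n feature map produces $(c+4)kmn$ outputs.
figure 3: SSD working framework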
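The scale formula and output count above can be checked with a short sketch. It assumes the SSD300 layout of six square feature maps (38, 19, 10, 5, 3, 1) with 4 or 6 boxes per location, box centers at ((j+0.5)/f, (i+0.5)/f), 21 classes (20 VOC classes plus background), and, because $s_{k+1}$ is undefined for the last map, a hypothetical extra scale one step beyond $s_{max}$. The paper's VOC setup also uses a smaller scale (0.1) on conv4_3, which this sketch ignores.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    # s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), for k = 1..m
    return [s_min + (s_max - s_min) * (k - 1) / (num_maps - 1)
            for k in range(1, num_maps + 1)]


def default_boxes(f, s_k, s_next, aspect_ratios):
    """Default boxes (cx, cy, w, h) in [0, 1] coordinates for an f x f feature map."""
    boxes = []
    for i in range(f):
        for j in range(f):
            cx, cy = (j + 0.5) / f, (i + 0.5) / f
            for a in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
            s_extra = math.sqrt(s_k * s_next)  # extra box for a_r = 1
            boxes.append((cx, cy, s_extra, s_extra))
    return boxes


FMAP_SIZES = [38, 19, 10, 5, 3, 1]   # assumed SSD300 feature map sizes
FOUR = [1, 2, 1 / 2]                  # + extra box -> 4 per location
SIX = [1, 2, 3, 1 / 2, 1 / 3]         # + extra box -> 6 per location
RATIOS = [FOUR, SIX, SIX, SIX, FOUR, FOUR]
NUM_CLASSES = 21                      # assumed: 20 VOC classes + background

scales = default_box_scales(len(FMAP_SIZES))
scales.append(scales[-1] + (scales[1] - scales[0]))  # hypothetical s_{m+1}

all_boxes, total_outputs = [], 0
for idx, (f, ratios) in enumerate(zip(FMAP_SIZES, RATIOS)):
    all_boxes += default_boxes(f, scales[idx], scales[idx + 1], ratios)
    k = len(ratios) + 1
    total_outputs += (NUM_CLASSES + 4) * k * f * f  # (c + 4) k m n per map

print(len(all_boxes), total_outputs)  # 8732 218300
```

Most of the default boxes come from the large, early feature maps, which are responsible for small objects.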
loss
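Matching strategy: each ground truth box is first matched to the default box with the best IoU (Jaccard overlap); default boxes are then also matched to any ground truth whose IoU is higher than a threshold (0.5).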
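A minimal sketch of this two-step matching, assuming boxes given as (xmin, ymin, xmax, ymax) and at least one default box; the helper names `iou` and `match` are hypothetical. The best-match step guarantees every ground truth claims at least one default box even when its best overlap is below 0.5.

```python
def iou(a, b):
    # boxes as (xmin, ymin, xmax, ymax)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(defaults, truths, threshold=0.5):
    """Return, for each default box, the index of its matched ground truth or -1."""
    overlaps = [[iou(d, t) for t in truths] for d in defaults]
    matches = [-1] * len(defaults)
    # Threshold step: a default box matches its best ground truth if IoU > threshold.
    for d_idx, row in enumerate(overlaps):
        if row:
            best_t = max(range(len(truths)), key=lambda t: row[t])
            if row[best_t] > threshold:
                matches[d_idx] = best_t
    # Best-match step: each ground truth always keeps its highest-IoU default box.
    for t_idx in range(len(truths)):
        best_d = max(range(len(defaults)), key=lambda d: overlaps[d][t_idx])
        matches[best_d] = t_idx
    return matches


defaults = [(0, 0, 1, 1), (0, 0, 0.5, 0.5), (0.5, 0.5, 1, 1), (0.1, 0.1, 0.9, 0.9)]
truths = [(0, 0, 0.9, 0.9), (0.55, 0.55, 1, 1)]
print(match(defaults, truths))  # [0, -1, 1, 0]
```

Allowing several default boxes per ground truth (here boxes 0 and 3 both match truth 0) gives the network more positive examples than keeping only the single best match.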
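Loss function: $L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$, where N is the number of matched default boxes (the loss is set to 0 when N = 0) and the weight α is set to 1. L(conf) is a softmax loss over the class confidences c; L(loc) is a Smooth L1 loss between the predicted box (l) and ground truth box (g) parameters, regressed as offsets to the default box center (cx, cy), width (w) and height (h).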
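The loss above can be sketched as follows, assuming boxes in (cx, cy, w, h) form, class index 0 for background, and the offset encoding used in the SSD paper (center offsets divided by the default box size, log-ratios for width and height). `ssd_loss` and its arguments are hypothetical names; `negatives` stands for the background boxes kept by hard negative mining.

```python
import math


def smooth_l1(x):
    # 0.5 * x^2 if |x| < 1, otherwise |x| - 0.5
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def encode(g, d):
    # regression targets of ground truth g relative to default box d, both (cx, cy, w, h)
    return ((g[0] - d[0]) / d[2], (g[1] - d[1]) / d[3],
            math.log(g[2] / d[2]), math.log(g[3] / d[3]))


def softmax_ce(logits, target):
    # numerically stable -log softmax(logits)[target]
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return log_z - logits[target]


def ssd_loss(conf, loc, defaults, truths, labels, matches, negatives, alpha=1.0):
    """conf[i]: class logits of default box i; loc[i]: its predicted offsets.
    labels[t]: class of ground truth t; matches[i]: ground truth index or -1."""
    positives = [i for i, m in enumerate(matches) if m >= 0]
    n = len(positives)
    if n == 0:
        return 0.0  # loss is 0 when no default box is matched
    l_conf = sum(softmax_ce(conf[i], labels[matches[i]]) for i in positives)
    l_conf += sum(softmax_ce(conf[i], 0) for i in negatives)  # background class
    l_loc = 0.0
    for i in positives:
        target = encode(truths[matches[i]], defaults[i])
        l_loc += sum(smooth_l1(p - t) for p, t in zip(loc[i], target))
    return (l_conf + alpha * l_loc) / n


# One default box exactly on the object: only the uncertain class score costs anything.
loss = ssd_loss(conf=[[0.0, 0.0]], loc=[(0.0, 0.0, 0.0, 0.0)],
                defaults=[(0.5, 0.5, 0.2, 0.2)], truths=[(0.5, 0.5, 0.2, 0.2)],
                labels=[1], matches=[0], negatives=[])
```

Smooth L1 behaves like L2 near zero and like L1 for large errors, so outlier boxes do not dominate the gradient.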
tricks
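Hard negative mining: after matching, most default boxes are negatives. Instead of using all of them, the negatives are sorted by their confidence loss and only the highest ones are kept, so that the ratio between negatives and positives is at most 3:1.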
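Hard negative mining reduces to a sort over the negatives' background loss. This sketch assumes `background_loss[i]` holds the confidence loss of default box i for the background class (a hypothetical input) and that `matches` comes from the matching step, with -1 for unmatched boxes.

```python
def hard_negative_mining(background_loss, matches, neg_pos_ratio=3):
    """Return indices of the negatives to keep, hardest first.

    background_loss[i]: confidence loss of default box i for the background class
    matches[i]: matched ground-truth index, or -1 for a negative box
    """
    num_pos = sum(1 for m in matches if m >= 0)
    negatives = [i for i, m in enumerate(matches) if m < 0]
    # highest loss = lowest predicted background probability = hardest negative
    negatives.sort(key=lambda i: background_loss[i], reverse=True)
    return negatives[:neg_pos_ratio * num_pos]


matches = [0, -1, -1, -1, -1, -1]
background_loss = [0.1, 0.9, 0.2, 2.0, 0.05, 1.1]
print(hard_negative_mining(background_loss, matches))  # [3, 5, 1]
```

Capping negatives at three per positive keeps the easy background boxes from overwhelming the confidence loss.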
Data Augmentation: each training image is randomly sampled by one of the following options:
- Use the entire original input image.
- Sample a patch so that the minimum IOU overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.
- Randomly sample a patch.
Then each sampled patch is resized to a fixed size and horizontally flipped with probability 0.5. This augmentation improves mAP from 65.4% to 74.3%.
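The sampling options can be sketched as below. Several details are assumptions rather than statements from these notes: the option is chosen uniformly, the patch covers 0.1 to 1 of the image area with an aspect ratio between 1/2 and 2, the min-IoU constraint applies to every object, and a ground-truth box is kept only if its center falls inside the patch. Resizing and photometric distortions are omitted, and `sample_patch` and `augment` are hypothetical names.

```python
import math
import random


def iou(a, b):
    # boxes as (xmin, ymin, xmax, ymax)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_patch(img_w, img_h, boxes, min_iou, rng, max_trials=50):
    """Return (patch, boxes in patch coordinates) or None if no valid patch is found."""
    for _ in range(max_trials):
        area = rng.uniform(0.1, 1.0)   # fraction of the image area
        ratio = rng.uniform(0.5, 2.0)  # aspect ratio relative to the image
        w, h = img_w * math.sqrt(area * ratio), img_h * math.sqrt(area / ratio)
        if w > img_w or h > img_h:
            continue
        x0, y0 = rng.uniform(0, img_w - w), rng.uniform(0, img_h - h)
        patch = (x0, y0, x0 + w, y0 + h)
        if min_iou is not None and any(iou(patch, b) < min_iou for b in boxes):
            continue
        kept = []
        for b in boxes:
            cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
            if patch[0] <= cx <= patch[2] and patch[1] <= cy <= patch[3]:
                # clip to the patch and shift into patch coordinates
                kept.append((max(b[0], patch[0]) - x0, max(b[1], patch[1]) - y0,
                             min(b[2], patch[2]) - x0, min(b[3], patch[3]) - y0))
        if kept:
            return patch, kept
    return None


def augment(img_w, img_h, boxes, rng):
    whole = ((0.0, 0.0, float(img_w), float(img_h)), list(boxes))
    option = rng.choice(["whole", 0.1, 0.3, 0.5, 0.7, 0.9, "random"])
    if option == "whole":
        patch, kept = whole
    else:
        result = sample_patch(img_w, img_h, boxes,
                              None if option == "random" else option, rng)
        patch, kept = result if result is not None else whole  # fall back to the image
    pw = patch[2] - patch[0]
    if rng.random() < 0.5:  # horizontal flip with probability 0.5
        kept = [(pw - b[2], b[1], pw - b[0], b[3]) for b in kept]
    return patch, kept


patch, kept = augment(300, 200, [(50, 50, 150, 150)], random.Random(1))
```

Sampling small patches enlarges objects relative to the input, which is why this step helps SSD most on small objects.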
experimental results
figure 4: PASCAL VOC2007 test detection results
figure 5: PASCAL VOC2007 test detection results of different models
figure 6: Sensitivity and impact of different object characteristics on VOC2007 test set using
figure 7: Detection examples on COCO test-dev with SSD512 model
model analysis
- Data augmentation is crucial
- More default box shapes are better
- Atrous convolution is faster
- Multiple output layers at different resolutions are better
figure 8: Effects of various design choices and components on SSD performance
figure 9: Effects of using multiple output layers
conclusion
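- The core of SSD: predicting category scores and box offsets for a fixed set of default bounding boxes, using small convolutional filters applied to multi-scale feature maps
- Faster and better: accuracy competitive with methods that use an additional object proposal step (e.g. Faster R-CNN), while running much faster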