CV Top30 by Microsoft Research Asia

Deep High-Resolution Representation Learning for Human Pose Estimation

1.1 Two benefits of the architecture
1. Connects high-to-low resolution sub-networks in parallel rather than in series as done in most existing solutions.
2. Performs repeated multi-scale fusions, so that the high-resolution representations are boosted by low-resolution representations of the same depth and similar level, and vice versa.
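The two points above can be sketched in a few lines. This is only an illustration, not the paper's implementation: plain Python lists stand in for feature maps, fixed nearest-neighbour upsampling and average pooling stand in for the learned convolutions a real network would use, and the helper names (`upsample2x`, `downsample2x`, `fuse`) are hypothetical.

```python
# Illustrative sketch: two parallel branches (high and low resolution)
# that repeatedly exchange information. All helper names are hypothetical.

def upsample2x(fmap):
    """Nearest-neighbour upsampling: each value becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample2x(fmap):
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, len(fmap[0]), 2)
        ]
        for r in range(0, len(fmap), 2)
    ]

def add(a, b):
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fuse(high, low):
    """One fusion step: each branch receives the other's resampled map."""
    new_high = add(high, upsample2x(low))    # low -> high
    new_low = add(low, downsample2x(high))   # high -> low ("vice versa")
    return new_high, new_low

# A 4x4 high-resolution map and a 2x2 low-resolution map kept in parallel.
high = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
low = [[10.0, 20.0],
       [30.0, 40.0]]

# Repeated fusion: both resolutions survive every step.
for _ in range(2):
    high, low = fuse(high, low)
```

The high-resolution branch is never discarded and later recovered, as it would be in a serial high-to-low-to-high design. It is maintained throughout and refreshed at every fusion step.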
Relation Networks for Object Detection: Defines relations between objects by adopting an attention mechanism. In their work, it is largely used for removing duplicate detections.
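As a rough picture of what "relations between objects via attention" means, here is a generic scaled dot-product self-attention over a set of per-object feature vectors. It is a sketch under simplifying assumptions and not the paper's module: learned projections and any box-geometry terms are left out, and the function name `relation_features` is hypothetical.

```python
import math

def relation_features(features):
    """Generic self-attention over object features (illustrative only).

    features: list of equal-length float lists, one per detected object.
    Returns (outputs, weights), where weights[i][j] says how strongly
    object i attends to object j and each row of weights sums to 1.
    """
    dim = len(features[0])
    outputs, weights = [], []
    for query in features:
        # Similarity of this object to every object, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Each object's new feature is a weighted sum of all objects.
        outputs.append(
            [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        )
    return outputs, weights

# Three hypothetical objects; the first two have near-identical features,
# as two overlapping detections of the same instance might.
objects = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = relation_features(objects)
```

Because every object's output depends on all the others, a later scoring stage can tell that two detections describe the same instance, which is the kind of signal duplicate removal needs.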
Learning Region Features for Object Detection: Learns region features without RoI pooling.
Local Relation Networks for Image Recognition: Another relation network, this time on ImageNet.
Deformable ConvNets v2: More Deformable, Better Results
A Twofold Siamese Network for Real-Time Object Tracking: Tracking with the help of semantic features.
Learning Pyramid Context Encoder Network for High-Quality Image Inpainting: Attention applied across a feature pyramid.
Structured Knowledge Distillation for Dense Prediction