IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 29, 2020

Motivation

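Semantic segmentation methods fall into three categories by the supervision they use:

Fully supervised training with pixel-level annotations
Weakly supervised training with object-level annotations (bounding boxes, spots, scribbles)
Weakly supervised training with image-level annotations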

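Pixel-level annotation is very labor-intensive. This paper instead learns semantic segmentation from image-level labels, which greatly reduces the annotation workload.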

Method

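An initial network first produces a coarse segmentation. A graphical model then refines it, and the refined masks supervise the semantic segmentation model. These steps are repeated iteratively.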

The framework has three parts: coarse mask generation, coarse mask enhancement, and recursive mask refinement.
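
The three stages fit together as a loop. The sketch below shows only the control flow, under the assumption that each stage can be treated as a function on a binary mask. The callables `generate`, `enhance`, and `train_and_predict`, and the `rounds` parameter, are hypothetical placeholders, not names from the paper.

```python
from typing import Callable, List

Mask = List[List[int]]  # binary mask: 1 = object, 0 = background


def coarse_to_fine(
    image: object,
    generate: Callable[[object], Mask],
    enhance: Callable[[object, Mask], Mask],
    train_and_predict: Callable[[object, Mask], Mask],
    rounds: int = 3,  # hypothetical: number of refinement iterations
) -> Mask:
    """Control-flow sketch of the generate -> enhance -> refine loop."""
    # Stage 1: coarse mask generation (no class information used).
    mask = generate(image)
    for _ in range(rounds):
        # Stage 2: coarse mask enhancement (GrabCut in the paper).
        mask = enhance(image, mask)
        # Stage 3: train the segmentation model on the enhanced mask,
        # then use its prediction as the next mask to enhance.
        mask = train_and_predict(image, mask)
    return mask
```

In practice the third stage would update a model over a whole training set rather than a single image; the per-image form here is only meant to make the feedback path visible.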

Coarse Mask Generation

An 8-layer CNN generates the initial segmentation in an unsupervised manner. This step does not use class information.

Coarse Mask Enhancement

The coarse segmentation is refined with the GrabCut algorithm.

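Recursive Mask Refinement

The class label is assigned to the segmentation result, which then serves as the training label for the semantic segmentation model. The model's predictions are fed back into Coarse Mask Enhancement.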
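
Assigning the class label to a mask can be sketched as follows. This is a minimal illustration that assumes a single-class image (as in the training setup) and a background index of 0; the function name `to_semantic_labels` and the class index 12 are made up for the example.

```python
from typing import List


def to_semantic_labels(
    mask: List[List[int]],
    class_id: int,
    background_id: int = 0,  # assumption: background uses index 0
) -> List[List[int]]:
    """Turn a class-agnostic binary mask into a per-pixel label map."""
    return [
        [class_id if pixel else background_id for pixel in row]
        for row in mask
    ]


# Binary mask from the enhancement step (1 = object, 0 = background).
binary_mask = [
    [0, 1, 1],
    [0, 1, 0],
]

# The image-level label says the image contains class 12 (illustrative index).
pseudo_label = to_semantic_labels(binary_mask, class_id=12)
```

The resulting `pseudo_label` map has the same shape as the mask and can stand in for a pixel-level annotation when training the segmentation model.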

Model Parameterization

Extend the Proposed Framework to Foreground Segmentation

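The framework can be extended to foreground segmentation. For this the paper proposes a dilated feature pyramid network, which uses dilated convolutions to enlarge the receptive field.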

Inspired by dilated convolution and multi-scale feature learning, we propose the Dilated Feature Pyramid Network (DFPN) for foreground segmentation task as shown in Fig. 4. The proposed DFPN has the same architecture as FPN [43] except adding the dilated convolution layers for three branches to enlarge the receptive field of the network.
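
To see why dilation enlarges the receptive field, the arithmetic can be worked out directly. A kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions, and for stride-1 layers the receptive field grows by that span minus one per layer. The dilation rates below are illustrative only and are not taken from the paper.

```python
from typing import List, Tuple


def effective_kernel(kernel: int, dilation: int) -> int:
    """Input span covered by one dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 conv layers.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += effective_kernel(kernel, dilation) - 1
    return rf


# Three plain 3x3 convolutions.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])

# Three 3x3 convolutions with dilation 1, 2, 4 (illustrative rates).
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])

print(plain, dilated)  # 7 15
```

With the same number of weights per layer, the dilated stack covers roughly twice the context along each axis, which is the effect the added dilated layers are meant to exploit.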

Experiments

Only images containing a single class were used for training, yet at test time the model can recognize multiple classes in one image.

AI-Smile

【2020 TIP】Coarse-to-Fine Semantic Segmentation From Image-Level Labels