YOLOv3 Source Code Reading: layer_utils.py

2019-05-22 2019-09-03

Object Detection


1. Introduction to YOLO

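YOLO (You Only Look Once) is an efficient object detection algorithm belonging to the one-stage family. Two-stage detectors are generally slow, and YOLO addressed this by pioneering the one-stage approach, in which object classification and object localization are done in a single step: the network directly regresses the bounding box positions and the class of each bounding box at the output layer.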

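After two rounds of revisions, the latest YOLO version at the time of writing is YOLOv3. Building on the previous two versions, YOLOv3 makes a number of fairly fine-grained changes that improve detection accuracy.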
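To make "regressing boxes and classes directly at the output layer" concrete for YOLOv3, here is a small sketch of the output-size arithmetic. It assumes the commonly used setup of a 416×416 input, three detection scales with strides 32, 16 and 8, three anchors per scale and the 80 COCO classes; these settings are assumptions for illustration, not something layer_utils.py defines.

```python
# Output-size arithmetic for a YOLOv3-style detection head.
# Assumed setup: 416x416 input, strides 32/16/8, 3 anchors per scale, 80 classes.


def head_channels(num_anchors: int, num_classes: int) -> int:
    # Each anchor predicts 4 box offsets + 1 objectness score + per-class scores.
    return num_anchors * (4 + 1 + num_classes)


def grid_sizes(img_size: int, strides=(32, 16, 8)) -> list:
    # Each detection scale sees the image downsampled by its stride.
    return [img_size // s for s in strides]


def total_predictions(img_size: int, num_anchors: int = 3) -> int:
    # One box prediction per anchor per grid cell, summed over all scales.
    return sum(num_anchors * g * g for g in grid_sizes(img_size))


if __name__ == "__main__":
    print(grid_sizes(416))         # [13, 26, 52]
    print(head_channels(3, 80))    # 255
    print(total_predictions(416))  # 10647
```

Under these assumptions each cell of the coarsest 13×13 map carries 255 numbers, and one 416×416 image yields 10,647 candidate boxes before any score filtering or NMS.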

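I annotated this source code mainly to help my own study, and I am happy to share it so that others can learn along with me. I am still a student, so if you spot any mistakes, please do point them out.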

The source code referenced in this post is at: https://github.com/wizyoung/YOLOv3_TensorFlow

2. Code and Annotations

File path: YOUR_PATH\YOLOv3_TensorFlow-master\utils\layer_utils.py

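The functions in this file wrap convolution and related operations in a customized way, which makes the model code easier to write. They cover: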

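A wrapper around convolution
The definition of the Darknet-53 backbone
Resizing (upsampling) of feature maps, using nearest-neighbor interpolation by default
The extra YOLO convolution layers added on top of the backbone, which prepare the features for the later feature fusion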
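The least obvious item is the convolution wrapper: when strides > 1, conv2d() pads explicitly and then convolves with 'VALID' instead of using 'SAME'. My understanding is that this makes the padding independent of the input size (the same idea as the fixed_padding helper in TensorFlow's official ResNet code), whereas TensorFlow's 'SAME' decides where to pad based on whether the input length is odd or even. The plain-Python sketch below compares the two along one spatial dimension; the helper names are hypothetical, not functions from the repo.

```python
# Compare conv2d()'s explicit "fixed padding" (used when strides > 1)
# with TensorFlow-style 'SAME' padding along one spatial dimension.
# Hypothetical helpers for illustration; pure Python, no TensorFlow needed.


def fixed_padding(kernel_size: int) -> tuple:
    # Same arithmetic as _fixed_padding(): k - 1 pixels in total,
    # with any odd leftover pixel going to the end.
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg


def same_padding(in_size: int, kernel_size: int, stride: int) -> tuple:
    # TensorFlow 'SAME': the output length is ceil(in_size / stride),
    # and the required padding depends on in_size itself.
    out_size = -(-in_size // stride)  # ceiling division
    pad_total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg


def valid_out_size(in_size: int, kernel_size: int, stride: int, pads: tuple) -> int:
    # Output length of a 'VALID' convolution applied after explicit padding.
    return (in_size + sum(pads) - kernel_size) // stride + 1


if __name__ == "__main__":
    for h in (416, 415):
        fixed = fixed_padding(3)
        same = same_padding(h, 3, 2)
        print(h, fixed, valid_out_size(h, 3, 2, fixed),
              same, valid_out_size(h, 3, 2, same))
    # 416 (1, 1) 208 (0, 1) 208
    # 415 (1, 1) 208 (1, 1) 208
```

Both schemes give the same output size here; what differs is where the padding goes. With 'SAME', an even-length input is padded (0, 1) and an odd-length one (1, 1), while the wrapper always pads (1, 1) for a 3x3 kernel.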

# coding: utf-8

from __future__ import division, print_function

import numpy as np
import tensorflow as tf

slim = tf.contrib.slim  # TensorFlow 1.x only: tf.contrib was removed in TensorFlow 2.x


def conv2d(inputs, filters, kernel_size, strides=1):
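    # A customized wrapper around slim.conv2d that keeps the model code short and readable.
    # When strides > 1, the input is padded explicitly (kernel_size - 1 pixels in total)
    # and the convolution uses 'VALID'; otherwise it uses 'SAME'.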
    def _fixed_padding(inputs, kernel_size):
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg

        padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                                        [pad_beg, pad_end], [0, 0]], mode='CONSTANT')
        return padded_inputs

    if strides > 1:
        inputs = _fixed_padding(inputs, kernel_size)
    inputs = slim.conv2d(inputs, filters, kernel_size, stride=strides,
                         padding=('SAME' if strides == 1 else 'VALID'))
    return inputs


def darknet53_body(inputs):
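    """
    Main body of the Darknet-53 backbone.
    :param inputs: input image tensor in NHWC layout
    :return: three feature maps at different scales (strides 8, 16 and 32)
    """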
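    def res_block(inputs, filters):
        # Residual unit: 1x1 conv to `filters` channels, 3x3 conv to `filters * 2`,
        # then add the input, which must already have `filters * 2` channels.
        shortcut = inputs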
        net = conv2d(inputs, filters * 1, 1)
        net = conv2d(net, filters * 2, 3)

        net = net + shortcut

        return net

    # first two conv2d layers
    net = conv2d(inputs, 32, 3, strides=1)
    net = conv2d(net, 64, 3, strides=2)

    # res_block * 1
    net = res_block(net, 32)

    net = conv2d(net, 128, 3, strides=2)

    # res_block * 2
    for i in range(2):
        net = res_block(net, 64)

    net = conv2d(net, 256, 3, strides=2)

    # res_block * 8
    for i in range(8):
        net = res_block(net, 128)

    route_1 = net
    net = conv2d(net, 512, 3, strides=2)

    # res_block * 8
    for i in range(8):
        net = res_block(net, 256)

    route_2 = net
    net = conv2d(net, 1024, 3, strides=2)

    # res_block * 4
    for i in range(4):
        net = res_block(net, 512)
    route_3 = net

    return route_1, route_2, route_3


def yolo_block(inputs, filters):
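    """
    Extra convolution layers added on top of the features extracted by the Darknet backbone,
    preparing them for the later feature fusion.
    :param inputs: input feature map
    :param filters: channel count of the 1x1 convolutions; the 3x3 convolutions use filters * 2
    :return: route (output of the fifth convolution, filters channels) and
             net (route followed by one more 3x3 convolution, filters * 2 channels)
    """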
    net = conv2d(inputs, filters * 1, 1)
    net = conv2d(net, filters * 2, 3)
    net = conv2d(net, filters * 1, 1)
    net = conv2d(net, filters * 2, 3)
    net = conv2d(net, filters * 1, 1)
    route = net
    net = conv2d(net, filters * 2, 3)
    return route, net


def upsample_layer(inputs, out_shape):
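    """
    Resize the feature map; nearest-neighbor interpolation is used by default.
    :param inputs: feature map of shape [N, H, W, C]
    :param out_shape: target shape in NHWC order; only out_shape[1] (height) and out_shape[2] (width) are used
    :return: the resized feature map
    """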
    new_height, new_width = out_shape[1], out_shape[2]
    # NOTE: here height is the first
    # TODO: Do we need to set `align_corners` as True?
    inputs = tf.image.resize_nearest_neighbor(inputs, (new_height, new_width), name='upsampled')
    return inputs
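To double-check the shapes implied by the code above, here is a plain-Python sketch that tracks only (height, width, channels) through the same layer sequence as darknet53_body(), yolo_block() and upsample_layer(), without TensorFlow. The helper functions are hypothetical, written only for this check, and the 416×416 input is an assumed example size.

```python
# Plain-Python shape tracker mirroring the layers in layer_utils.py.
# Hypothetical helpers, not repo code. Shapes are (height, width, channels);
# the batch dimension is omitted.


def conv_shape(shape, filters, kernel_size, strides=1):
    h, w, _ = shape
    if strides == 1:
        # 'SAME' padding keeps the spatial size
        return (h, w, filters)
    # fixed padding of kernel_size - 1 pixels in total, then 'VALID'
    pad = kernel_size - 1
    out_h = (h + pad - kernel_size) // strides + 1
    out_w = (w + pad - kernel_size) // strides + 1
    return (out_h, out_w, filters)


def res_block_shape(shape, filters):
    # 1x1 conv to `filters`, then 3x3 conv to `filters * 2`
    out = conv_shape(conv_shape(shape, filters, 1), filters * 2, 3)
    # the shortcut addition requires identical shapes
    if out != shape:
        raise ValueError(f"cannot add shortcut {shape} to {out}")
    return out


def stage(shape, filters, repeats):
    for _ in range(repeats):
        shape = res_block_shape(shape, filters)
    return shape


def darknet53_shapes(height, width):
    # Same layer order as darknet53_body()
    net = conv_shape((height, width, 3), 32, 3)
    net = conv_shape(net, 64, 3, strides=2)
    net = stage(net, 32, 1)
    net = conv_shape(net, 128, 3, strides=2)
    net = stage(net, 64, 2)
    net = conv_shape(net, 256, 3, strides=2)
    route_1 = stage(net, 128, 8)
    net = conv_shape(route_1, 512, 3, strides=2)
    route_2 = stage(net, 256, 8)
    net = conv_shape(route_2, 1024, 3, strides=2)
    route_3 = stage(net, 512, 4)
    return route_1, route_2, route_3


def yolo_block_shapes(shape, filters):
    # five alternating 1x1/3x3 convs end with `filters` channels (route);
    # one more 3x3 conv gives `filters * 2` channels (net)
    h, w, _ = shape
    return (h, w, filters), (h, w, filters * 2)


def upsample_shape(shape, out_shape):
    # out_shape is NHWC, as in upsample_layer(): index 1 = height, 2 = width
    return (out_shape[1], out_shape[2], shape[2])


if __name__ == "__main__":
    r1, r2, r3 = darknet53_shapes(416, 416)
    print(r1, r2, r3)  # (52, 52, 256) (26, 26, 512) (13, 13, 1024)
    print(yolo_block_shapes(r3, 512))  # ((13, 13, 512), (13, 13, 1024))
    print(upsample_shape((13, 13, 256), (1, 26, 26, 512)))  # (26, 26, 256)
```

With a 416×416 input the three routes come out at 52×52, 26×26 and 13×13, i.e. strides 8, 16 and 32. The check inside res_block_shape() also shows why each residual block's 3x3 convolution uses filters * 2: the shortcut addition only works when the channel count matches the block's input. The last line is just an illustrative call showing that upsample_layer() keeps the channel count and takes only height and width from out_shape.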