YOLOv3 Source Code Reading: nms_utils.py

1. Introduction to YOLO

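  YOLO (You Only Look Once) is an efficient object detection algorithm in the one-stage family. Two-stage detectors generally suffer from slow inference, and YOLO addresses this with a one-stage design: object classification and object localization are done in a single step. YOLO regresses the bounding box positions and the class of each bounding box directly at the output layer, which is what makes it one-stage.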

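  After two iterations, the latest version of YOLO at the time of writing is YOLOv3. It builds on the previous two versions with a number of fairly detailed changes, and its results are somewhat better.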

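  This post annotates the source code, mainly to help my own study, and I am happy to share it so we can learn together. I am still a student, so if you spot a mistake, please point it out.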

  The source code referenced in this post is at: https://github.com/wizyoung/YOLOv3_TensorFlow

2. Code and Comments

  File path: YOUR_PATH\YOLOv3_TensorFlow-master_utils.py

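  This part of the code implements non-maximum suppression (NMS). The principle is the same in every implementation, and the procedure is roughly:

  - Sort the candidate boxes by confidence, from highest to lowest.
  - Take out the box with the highest confidence among those remaining.
  - Compute the IoU between each remaining box and the box just taken out.
  - Check each IoU in turn. A box whose IoU is above the threshold is suppressed by the box just taken out, so only boxes whose IoU does not exceed the threshold are kept for the next round.
  - Repeat the previous three steps until every box has been processed.
  - Return all the boxes that were taken out. These are the NMS result.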
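The IoU computation in the steps above can be sketched as follows. This is a minimal illustration that assumes boxes are given as `[x1, y1, x2, y2]` corner coordinates; the helper name `box_iou` is hypothetical and does not come from the repository.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes in [x1, y1, x2, y2] corner format (assumed layout)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes have an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    # Union = sum of both areas minus the part counted twice
    return inter / (area_a + area_b - inter)


# Two 10x10 boxes shifted by 5 along x share a 5x10 strip
half_overlap = box_iou([0, 0, 10, 10], [5, 0, 15, 10])
```

Here the intersection is 50 and the union is 150, so the two boxes would survive a threshold of 0.5 but not a threshold of 0.3.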

  Note that NMS operates on the boxes of a single class. If there are several classes, each class has to be processed separately.
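To see why the classes are handled separately, here is a small sketch on toy data. The helpers `greedy_nms` and `per_class_nms`, the boxes, and the labels are all made up for illustration and are not part of the repository; boxes are assumed to be in `[x1, y1, x2, y2]` format.

```python
import numpy as np


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over one set of boxes; returns indices of the kept boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Only boxes that are not suppressed go into the next round
        order = rest[iou <= iou_thresh]
    return keep


def per_class_nms(boxes, scores, labels, iou_thresh=0.5):
    """Run greedy NMS once per class label; returns sorted kept indices."""
    kept = []
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        kept.extend(int(idx[k]) for k in greedy_nms(boxes[idx], scores[idx], iou_thresh))
    return sorted(kept)


# Boxes 0 and 1 overlap heavily but belong to different classes;
# box 2 is a near-duplicate of box 0 in the same class.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [0, 0, 10, 9]], dtype=np.float64)
scores = np.array([0.9, 0.8, 0.7])
labels = np.array([0, 1, 0])

kept_per_class = per_class_nms(boxes, scores, labels)
kept_class_agnostic = greedy_nms(boxes, scores)
```

With per-class processing, box 1 survives because nothing in its own class suppresses it. Running NMS over all boxes at once would discard it as a duplicate of box 0, even though it is a detection of a different class.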

# coding: utf-8

from __future__ import division, print_function

import numpy as np
import tensorflow as tf


def gpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, nms_thresh=0.5):
    """
    Perform NMS on GPU using TensorFlow.

    params:
        boxes: tensor of shape [1, 10647, 4] # 10647=(13*13+26*26+52*52)*3, for input 416*416 image
        scores: tensor of shape [1, 10647, num_classes], score=conf*prob
        num_classes: total number of classes
        max_boxes: integer, maximum number of predicted boxes you'd like, default is 50
                   (passed to NMS once per class, so the limit applies to each class)
        score_thresh: if [ highest class probability score < score_threshold]
                      then get rid of the corresponding box
        nms_thresh: real value, "intersection over union" threshold used for NMS filtering
    """

    boxes_list, label_list, score_list = [], [], []
    max_boxes = tf.constant(max_boxes, dtype='int32')

    # since we do nms for single image, then reshape it
    boxes = tf.reshape(boxes, [-1, 4])  # '-1' means we don't know the exact number of boxes
    score = tf.reshape(scores, [-1, num_classes])

    # Step 1: Create a filtering mask based on "box_class_scores" by using "threshold".
    mask = tf.greater_equal(score, tf.constant(score_thresh))
    # Step 2: Do non_max_suppression for each class
    for i in range(num_classes):
        # Step 3: Apply the mask to scores, boxes and pick them out
        filter_boxes = tf.boolean_mask(boxes, mask[:, i])
        filter_score = tf.boolean_mask(score[:, i], mask[:, i])
        nms_indices = tf.image.non_max_suppression(boxes=filter_boxes,
                                                   scores=filter_score,
                                                   max_output_size=max_boxes,
                                                   iou_threshold=nms_thresh, name='nms_indices')
        # Every box kept in this iteration gets the class index i as its label
        label_list.append(tf.ones_like(tf.gather(filter_score, nms_indices), 'int32') * i)
        boxes_list.append(tf.gather(filter_boxes, nms_indices))
        score_list.append(tf.gather(filter_score, nms_indices))

    # Merge the per-class results into single tensors
    boxes = tf.concat(boxes_list, axis=0)
    score = tf.concat(score_list, axis=0)
    label = tf.concat(label_list, axis=0)

    return boxes, score, label


def py_nms(boxes, scores, max_boxes=50, iou_thresh=0.5):
    """
    Pure Python NMS baseline.

    Arguments:
        boxes: shape of [-1, 4], the value of '-1' means that we don't know the
               exact number of boxes
        scores: shape of [-1,]
        max_boxes: representing the maximum of boxes to be selected by non_max_suppression
        iou_thresh: representing iou_threshold for deciding to keep boxes
    """
    assert boxes.shape[1] == 4 and len(scores.shape) == 1

    # The next few lines compute the area of every box,
    # then the boxes are sorted by score.
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    areas = (x2 - x1) * (y2 - y1)
    # Sort by score in descending order. argsort returns the indices of the
    # sorted boxes, so `order` holds the indices of the boxes still to be processed.
    order = scores.argsort()[::-1]

    # `keep` stores the indices of the boxes that are retained
    keep = []

    # Loop while there are still unprocessed box indices
    while order.size > 0:
        # Because of the sort, the first entry of `order` has the highest score
        i = order[0]
        # Save this index
        keep.append(i)

        # Compute the IoU between the first box and all remaining boxes.
        # The first box is the reference box, so `order[1:]` selects all the others.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # NOTE: the intersection adds 1 to width and height (pixel-inclusive
        # convention), while `areas` above does not, so the two are not consistent.
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # IoU computation
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # Keep the indices of boxes whose IoU with the reference box does not exceed
        # the threshold; boxes above the threshold are suppressed by the reference box.
        inds = np.where(ovr <= iou_thresh)[0]
        # Update `order` and go to the next round. `inds` is relative to
        # `order[1:]`, hence the +1 offset.
        order = order[inds + 1]

    # Finally, return at most `max_boxes` box indices
    return keep[:max_boxes]


def cpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
    """
    Perform NMS on CPU.
    Arguments:
        boxes: shape [1, 10647, 4]
        scores: shape [1, 10647, num_classes]
    """

    boxes = boxes.reshape(-1, 4)
    scores = scores.reshape(-1, num_classes)
    # Picked bounding boxes
    picked_boxes, picked_score, picked_label = [], [], []

    for i in range(num_classes):
        # Keep only the boxes whose score for class i reaches the threshold
        indices = np.where(scores[:, i] >= score_thresh)
        filter_boxes = boxes[indices]
        filter_scores = scores[:, i][indices]
        if len(filter_boxes) == 0:
            continue
        # do non_max_suppression on the cpu
        indices = py_nms(filter_boxes, filter_scores,
                         max_boxes=max_boxes, iou_thresh=iou_thresh)
        picked_boxes.append(filter_boxes[indices])
        picked_score.append(filter_scores[indices])
        picked_label.append(np.ones(len(indices), dtype='int32') * i)
    # No class produced any box above the score threshold
    if len(picked_boxes) == 0:
        return None, None, None

    boxes = np.concatenate(picked_boxes, axis=0)
    score = np.concatenate(picked_score, axis=0)
    label = np.concatenate(picked_label, axis=0)

    return boxes, score, label
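
One detail in `py_nms` is worth a quick check: `areas` is computed as `(x2 - x1) * (y2 - y1)`, while the intersection width and height each add 1. The sketch below reproduces that arithmetic for a box compared against itself, next to a variant that drops the `+ 1`. The function names `iou_mixed` and `iou_consistent` are hypothetical and only serve to illustrate the effect.

```python
def iou_mixed(box_a, box_b):
    """Same arithmetic as py_nms: +1 on the intersection, none on the areas."""
    w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]) + 1)
    h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]) + 1)
    inter = w * h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def iou_consistent(box_a, box_b):
    """Variant without the +1, so intersection and areas use one convention."""
    w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = w * h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


box = [0, 0, 10, 10]
mixed = iou_mixed(box, box)            # intersection 11 * 11 against areas of 10 * 10
consistent = iou_consistent(box, box)  # intersection and areas both 10 * 10
```

For this 10x10 box the mixed version gives 121 / 79, which is above 1, while the consistent version gives exactly 1. For boxes that are hundreds of pixels wide the inflation is likely small, but it grows as boxes get smaller, so it is something to keep in mind if the coordinates are ever normalized.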
