python调用yolov3检测,yolo3 python
本文主要介绍python目标检测yolo3的详细讲解、预测和代码复制。有需要的朋友可以借鉴一下,希望能有所帮助。祝大家进步很大,早日升职加薪。
00-1010学习序实施思路1。yolo3预测思路(建网思路)2。使用先前的盒子来解码网络3的输出。用非最大抑制排序分数和筛选结果。
目录
分析完yolo2,当然要说说yolo3。yolo3和yolo2的区别主要在网络的特征提取部分,实际解码部分其实差别不大。
下载代码
本教程主要基于github中项目的直接下载,比yolo3-Keras的项目更容易理解,但是它的很多代码和yolo3-Keras的是一样的。
我保留了预测部分的代码,实际上,我可以通过执行detect.py来运行这个示例
链接: https://pan.baidu.com/s/1_xLeytnjBBSL2h2Kj4-YEw
取代码:i3hi
学习前言
实现思路
与之前的yolo1和yolo2相比,YOLOv3有了很大的提升。主要改进方向是:
1.使用剩余网络。残差卷积就是进行一个步长为2的3X3卷积,然后保存卷积层,再进行一个1X1卷积和一个3X3卷积,将这个结果加到层上作为最终结果。残差网络的特点是容易优化,增加一个相当的深度就可以提高精度。其内部残差块采用跳跃连接,缓解了深度神经网络中深度增加导致的梯度消失问题。
2.提取多特征图层进行目标检测。总共提取了三个特征图层。特征层的形状分别为(13,13,75),(26,26,75)和(52,52,75)。最后一个维度是75,因为该图基于voc数据集,其类别是20。yolo3仅用于每个要素图层。
如果使用coco训练集,有80个类,最终维数应该是255=385。三个特征图层的形状分别为(13,13,255),(26,26,255),(52,52,255)。
3.它采用反卷积UmSampling2d设计,反卷积相对于卷积在神经网络结构的前向和后向传播中进行相反的运算,可以提取更多更好的特征。
实际情况是,输入N张416x416的图片,经过多层运算后,会输出三个形状(N,13,13,255),(N,26,26,255),(N,52,52,255),每张图片分为13x13和255。
实现代码如下。实际调用时会调用yolo_inference函数,此时会得到三个要素图层的内容。
def _darknet53(self,inputs,conv指数,训练=真,norm_decay=0.99,norm_epsilon=1e-3):
介绍
-
构建yolo3使用的darknet53网络结构
因素
-
输入:模型输入变量
Conv_index:卷积层数,方便根据名称加载预训练权重。
Weights_dict:训练前体重
培训:算培训吗?
Norm_decay:预测时计算移动平均线的衰减率。
norm _:方差加上一个非常小的数,以防止被0除
返回
-
Conv:是52层卷积计算的结果。输入图片是416x416x3,所以输出结果形状是13x13x1024。
Route1:返回第26层52x52x256的卷积计算结果供后续使用。
Route2:返回第43层26x26x512的卷积计算结果,以备后续使用。
Conv_index:卷积层数,方便加载预训练模型。
使用tf.variabl
e_scope(darknet53):
# 416,416,3 -> 416,416,32
conv = self._conv2d_layer(inputs, filters_num = 32, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
conv_index += 1
# 416,416,32 -> 208,208,64
conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 64, blocks_num = 1, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
# 208,208,64 -> 104,104,128
conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 128, blocks_num = 2, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
# 104,104,128 -> 52,52,256
conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 256, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
# route1 = 52,52,256
route1 = conv
# 52,52,256 -> 26,26,512
conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 512, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
# route2 = 26,26,512
route2 = conv
# 26,26,512 -> 13,13,1024
conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 1024, blocks_num = 4, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
# route3 = 13,13,1024
return route1, route2, conv, conv_index
# 输出两个网络结果
# 第一个是进行5次卷积后,用于下一次逆卷积的,卷积过程是1X1,3X3,1X1,3X3,1X1
# 第二个是进行5+2次卷积,作为一个特征层的,卷积过程是1X1,3X3,1X1,3X3,1X1,3X3,1X1
def _yolo_block(self, inputs, filters_num, out_filters, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
"""
Introduction
------------
yolo3在Darknet53提取的特征层基础上,又加了针对3种不同比例的feature map的block,这样来提高对小物体的检测率
Parameters
----------
inputs: 输入特征
filters_num: 卷积核数量
out_filters: 最后输出层的卷积核数量
conv_index: 卷积层数序号,方便根据名字加载预训练权重
training: 是否为训练
norm_decay: 在预测时计算moving average时的衰减率
norm_epsilon: 方差加上极小的数,防止除以0的情况
Returns
-------
route: 返回最后一层卷积的前一层结果
conv: 返回最后一层卷积的结果
conv_index: conv层计数
"""
conv = self._conv2d_layer(inputs, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
conv_index += 1
conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
conv_index += 1
conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
conv_index += 1
conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
conv_index += 1
conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
conv_index += 1
route = conv
conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
conv_index += 1
conv = self._conv2d_layer(conv, filters_num = out_filters, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index), use_bias = True)
conv_index += 1
return route, conv, conv_index
# 返回三个特征层的内容
def yolo_inference(self, inputs, num_anchors, num_classes, training = True):
"""
Introduction
------------
构建yolo模型结构
Parameters
----------
inputs: 模型的输入变量
num_anchors: 每个grid cell负责检测的anchor数量
num_classes: 类别数量
training: 是否为训练模式
"""
conv_index = 1
# route1 = 52,52,256、route2 = 26,26,512、route3 = 13,13,1024
conv2d_26, conv2d_43, conv, conv_index = self._darknet53(inputs, conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
with tf.variable_scope(yolo):
#--------------------------------------#
# 获得第一个特征层
#--------------------------------------#
# conv2d_57 = 13,13,512,conv2d_59 = 13,13,255(3x(80+5))
conv2d_57, conv2d_59, conv_index = self._yolo_block(conv, 512, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
#--------------------------------------#
# 获得第二个特征层
#--------------------------------------#
conv2d_60 = self._conv2d_layer(conv2d_57, filters_num = 256, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
conv2d_60 = self._batch_normalization_layer(conv2d_60, name = "batch_normalization_" + str(conv_index),training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
conv_index += 1
# unSample_0 = 26,26,256
unSample_0 = tf.image.resize_nearest_neighbor(conv2d_60, [2 * tf.shape(conv2d_60)[1], 2 * tf.shape(conv2d_60)[1]], name=upSample_0)
# route0 = 26,26,768
route0 = tf.concat([unSample_0, conv2d_43], axis = -1, name = route_0)
# conv2d_65 = 52,52,256,conv2d_67 = 26,26,255
conv2d_65, conv2d_67, conv_index = self._yolo_block(route0, 256, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
#--------------------------------------#
# 获得第三个特征层
#--------------------------------------#
conv2d_68 = self._conv2d_layer(conv2d_65, filters_num = 128, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
conv2d_68 = self._batch_normalization_layer(conv2d_68, name = "batch_normalization_" + str(conv_index), training=training, norm_decay=self.norm_decay, norm_epsilon = self.norm_epsilon)
conv_index += 1
# unSample_1 = 52,52,128
unSample_1 = tf.image.resize_nearest_neighbor(conv2d_68, [2 * tf.shape(conv2d_68)[1], 2 * tf.shape(conv2d_68)[1]], name=upSample_1)
# route1= 52,52,384
route1 = tf.concat([unSample_1, conv2d_26], axis = -1, name = route_1)
# conv2d_75 = 52,52,255
_, conv2d_75, _ = self._yolo_block(route1, 128, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
return [conv2d_59, conv2d_67, conv2d_75]
2、利用先验框对网络的输出进行解码
yolo3的先验框生成与yolo2的类似,如果不明白先验框是如何生成的,可以看我的上一篇博文
python目标检测yolo2详解及其预测代码复现
其实yolo3的解码与yolo2的解码过程一样,只是对于yolo3而言,其需要对三个特征层进行解码,三个特征层的shape分别为(N,13,13,255),(N,26,26,255),(N,52,52,255)的数据,对应每个图分为13x13、26x26、52x52的网格上3个先验框的位置。
此处需要用到一个循环。
1、将第一个特征层reshape成[-1, 13, 13, 3, 80 + 5],代表169个中心点每个中心点的3个先验框的情况。
2、将80+5的5中的xywh分离出来,0、1是xy相对中心点的偏移量;2、3是宽和高的情况;4是置信度。
3、建立13x13的网格,代表图片进行13x13处理后网格的中心点。
4、利用计算公式计算实际的bbox的位置 。
5、置信度乘上80+5中的80(这里的80指的是类别概率)得到得分。
6、将第二个特征层reshape成[-1, 26, 26, 3, 80 + 5],重复2到5步。将第三个特征层reshape成[-1, 52, 52, 3, 80 + 5],重复2到5步。
单个特征层的解码部分代码如下:
# 获得单个特征层框的位置和得分def boxes_and_scores(self, feats, anchors, classes_num, input_shape, image_shape):
"""
Introduction
------------
将预测出的box坐标转换为对应原图的坐标,然后计算每个box的分数
Parameters
----------
feats: yolo输出的feature map
anchors: anchor的位置
class_num: 类别数目
input_shape: 输入大小
image_shape: 图片大小
Returns
-------
boxes: 物体框的位置
boxes_scores: 物体框的分数,为置信度和类别概率的乘积
"""
# 获得特征
box_xy, box_wh, box_confidence, box_class_probs = self._get_feats(feats, anchors, classes_num, input_shape)
# 寻找在原图上的位置
boxes = self.correct_boxes(box_xy, box_wh, input_shape, image_shape)
boxes = tf.reshape(boxes, [-1, 4])
# 获得置信度box_confidence * box_class_probs
box_scores = box_confidence * box_class_probs
box_scores = tf.reshape(box_scores, [-1, classes_num])
return boxes, box_scores
# 单个特征层的解码过程
def _get_feats(self, feats, anchors, num_classes, input_shape):
"""
Introduction
------------
根据yolo最后一层的输出确定bounding box
Parameters
----------
feats: yolo模型最后一层输出
anchors: anchors的位置
num_classes: 类别数量
input_shape: 输入大小
Returns
-------
box_xy, box_wh, box_confidence, box_class_probs
"""
num_anchors = len(anchors)
anchors_tensor = tf.reshape(tf.constant(anchors, dtype=tf.float32), [1, 1, 1, num_anchors, 2])
grid_size = tf.shape(feats)[1:3]
predictions = tf.reshape(feats, [-1, grid_size[0], grid_size[1], num_anchors, num_classes + 5])
# 这里构建13*13*1*2的矩阵,对应每个格子加上对应的坐标
grid_y = tf.tile(tf.reshape(tf.range(grid_size[0]), [-1, 1, 1, 1]), [1, grid_size[1], 1, 1])
grid_x = tf.tile(tf.reshape(tf.range(grid_size[1]), [1, -1, 1, 1]), [grid_size[0], 1, 1, 1])
grid = tf.concat([grid_x, grid_y], axis = -1)
grid = tf.cast(grid, tf.float32)
# 将x,y坐标归一化,相对网格的位置
box_xy = (tf.sigmoid(predictions[..., :2]) + grid) / tf.cast(grid_size[::-1], tf.float32)
# 将w,h也归一化
box_wh = tf.exp(predictions[..., 2:4]) * anchors_tensor / tf.cast(input_shape[::-1], tf.float32)
box_confidence = tf.sigmoid(predictions[..., 4:5])
box_class_probs = tf.sigmoid(predictions[..., 5:])
return box_xy, box_wh, box_confidence, box_class_probs
该函数被其它函数调用,用于完成三个特征层的解码:
def eval(self, yolo_outputs, image_shape, max_boxes = 20):"""
Introduction
------------
根据Yolo模型的输出进行非极大值抑制,获取最后的物体检测框和物体检测类别
Parameters
----------
yolo_outputs: yolo模型输出
image_shape: 图片的大小
max_boxes: 最大box数量
Returns
-------
boxes_: 物体框的位置
scores_: 物体类别的概率
classes_: 物体类别
"""
# 每一个特征层对应三个先验框
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
boxes = []
box_scores = []
# inputshape是416x416
# image_shape是实际图片的大小
input_shape = tf.shape(yolo_outputs[0])[1 : 3] * 32
# 对三个特征层的输出获取每个预测box坐标和box的分数,score = 置信度x类别概率
#---------------------------------------#
# 对三个特征层解码
# 获得分数和框的位置
#---------------------------------------#
for i in range(len(yolo_outputs)):
_boxes, _box_scores = self.boxes_and_scores(yolo_outputs[i], self.anchors[anchor_mask[i]], len(self.class_names), input_shape, image_shape)
boxes.append(_boxes)
box_scores.append(_box_scores)
# 放在一行里面便于操作
boxes = tf.concat(boxes, axis = 0)
box_scores = tf.concat(box_scores, axis = 0)
3、进行得分排序与非极大抑制筛选
这一部分基本上是所有目标检测通用的部分。不过该项目的处理方式与其它项目不同。其对于每一个类进行判别。
1、取出每一类得分大于self.obj_threshold的框和得分。
2、利用框的位置和得分进行非极大抑制。
实现代码如下:
#---------------------------------------## 1、取出每一类得分大于self.obj_threshold
# 的框和得分
# 2、对得分进行非极大抑制
#---------------------------------------#
# 对每一个类进行判断
for c in range(len(self.class_names)):
# 取出所有类为c的box
class_boxes = tf.boolean_mask(boxes, mask[:, c])
# 取出所有类为c的分数
class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
# 非极大抑制
nms_index = tf.image.non_max_suppression(class_boxes, class_box_scores, max_boxes_tensor, iou_threshold = self.nms_threshold)
# 获取非极大抑制的结果
class_boxes = tf.gather(class_boxes, nms_index)
class_box_scores = tf.gather(class_box_scores, nms_index)
classes = tf.ones_like(class_box_scores, int32) * c
boxes_.append(class_boxes)
scores_.append(class_box_scores)
classes_.append(classes)
boxes_ = tf.concat(boxes_, axis = 0)
scores_ = tf.concat(scores_, axis = 0)
classes_ = tf.concat(classes_, axis = 0)
实际上该部分与第二部分在一个函数里,完成输出的解码和筛选,完成预测过程。
def eval(self, yolo_outputs, image_shape, max_boxes = 20):"""
Introduction
------------
根据Yolo模型的输出进行非极大值抑制,获取最后的物体检测框和物体检测类别
Parameters
----------
yolo_outputs: yolo模型输出
image_shape: 图片的大小
max_boxes: 最大box数量
Returns
-------
boxes_: 物体框的位置
scores_: 物体类别的概率
classes_: 物体类别
"""
# 每一个特征层对应三个先验框
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
boxes = []
box_scores = []
# inputshape是416x416
# image_shape是实际图片的大小
input_shape = tf.shape(yolo_outputs[0])[1 : 3] * 32
# 对三个特征层的输出获取每个预测box坐标和box的分数,score = 置信度x类别概率
#---------------------------------------#
# 对三个特征层解码
# 获得分数和框的位置
#---------------------------------------#
for i in range(len(yolo_outputs)):
_boxes, _box_scores = self.boxes_and_scores(yolo_outputs[i], self.anchors[anchor_mask[i]], len(self.class_names), input_shape, image_shape)
boxes.append(_boxes)
box_scores.append(_box_scores)
# 放在一行里面便于操作
boxes = tf.concat(boxes, axis = 0)
box_scores = tf.concat(box_scores, axis = 0)
mask = box_scores >= self.obj_threshold
max_boxes_tensor = tf.constant(max_boxes, dtype = tf.int32)
boxes_ = []
scores_ = []
classes_ = []
#---------------------------------------#
# 1、取出每一类得分大于self.obj_threshold
# 的框和得分
# 2、对得分进行非极大抑制
#---------------------------------------#
# 对每一个类进行判断
for c in range(len(self.class_names)):
# 取出所有类为c的box
class_boxes = tf.boolean_mask(boxes, mask[:, c])
# 取出所有类为c的分数
class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
# 非极大抑制
nms_index = tf.image.non_max_suppression(class_boxes, class_box_scores, max_boxes_tensor, iou_threshold = self.nms_threshold)
# 获取非极大抑制的结果
class_boxes = tf.gather(class_boxes, nms_index)
class_box_scores = tf.gather(class_box_scores, nms_index)
classes = tf.ones_like(class_box_scores, int32) * c
boxes_.append(class_boxes)
scores_.append(class_box_scores)
classes_.append(classes)
boxes_ = tf.concat(boxes_, axis = 0)
scores_ = tf.concat(scores_, axis = 0)
classes_ = tf.concat(classes_, axis = 0)
return boxes_, scores_, classes_
得到框的位置和种类后就可以画图了。
实现结果
以上就是python目标检测yolo3详解预测及代码复现的详细内容,更多关于yolo3预测复现的资料请关注盛行IT软件开发工作室其它相关文章!
郑重声明:本文由网友发布,不代表盛行IT的观点,版权归原作者所有,仅为传播更多信息之目的,如有侵权请联系,我们将第一时间修改或删除,多谢。