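YOLOv4 in PyTorch for person detection, and implementing YOLOv3 in PyTorch

This article walks through the source code for building a YoloV4 object detection platform with PyTorch. Readers who need it can use it as a reference; I hope it helps, and I wish everyone steady progress.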

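Contents

What is YOLOV4
YOLOV4 structure analysis
1. Backbone feature extraction network
2. Feature pyramid
3. YoloHead: making predictions from the extracted features
4. Decoding the prediction results
5. Drawing on the original image
YOLOV4 training
1. Improved training tricks in YOLOV4
a). Mosaic data augmentation
b). Label smoothing
c). CIOU
d). Cosine annealing learning-rate decay
2. Loss composition
a). Parameters needed to compute the loss
b). What y_pre is
c). What y_true is
d). How the loss is computed
Training your own YoloV4 model
1. Preparing the dataset
2. Processing the dataset
3. Starting network training
4. Predicting with the training results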

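YOLOV4 is an improved version of YOLOV3 that folds many small tricks into the YOLOV3 base. Although it brings no revolutionary change to object detection, YOLOV4 still balances speed and accuracy well. As the comparison figure shows, YOLOV4 builds on YOLOV3 and reaches an mAP of 44 without a drop in FPS, which is a clear improvement.

Compared with YOLOV3, the overall detection approach of YOLOV4 is largely the same: it still uses three feature layers for classification and regression prediction.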

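Please note!

I strongly recommend learning YOLOV3 before YOLOV4, because YOLOV4 really is YOLOV3 combined with a series of improvements!

(Important things are worth saying three times!)

For YOLOV3, see this blog post: https://www.jb51.net/article/247364.htm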

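Code download

YOLOV4 improvements (incomplete list)

1. Backbone feature extraction network: DarkNet53 => CSPDarkNet53

2. Feature pyramid: SPP, PAN

3. Classification and regression layer: YOLOv3 (unchanged)

4. Training tricks: Mosaic data augmentation, Label Smoothing, CIOU, cosine annealing learning-rate decay

5. Activation function: Mish

The list above is not the full set of improvements; there are others. YOLOV4 uses so many that they are hard to implement completely and hard to list one by one. The ones here are simply those I found interesting and very effective.

One more important point:

The paper mentions SAM, but the author's own source code does not use it.

There are many other tricks as well. Not every trick brings an improvement, and I cannot implement them all.

The whole post analyzes YOLOV4 in terms of its differences from YOLOV3.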

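What is YOLOV4

For ease of understanding, this article writes every shape with the channel count in the last dimension.

YOLOV4 structure analysis

When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

1. Backbone feature extraction network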

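The backbone feature extraction network has two improvements:

a). Backbone feature extraction network: DarkNet53 => CSPDarkNet53

b). Activation function: Mish

If you are familiar with YOLOV3, you will know the structure of Darknet53: it is built from a series of residual network structures. Darknet53 contains the resblock_body module, which consists of one downsampling step followed by a stack of residual structures, and Darknet53 is assembled from these resblock_body modules.

YOLOV4 modifies this part in two ways.

1. First, the activation function of DarknetConv2D changes from LeakyReLU to Mish, so the convolution block changes from DarknetConv2D_BN_Leaky to DarknetConv2D_BN_Mish.

The formula and graph of the Mish function are as follows: Mish(x) = x * tanh(ln(1 + e^x)).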
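To get a feel for the shape of Mish without the missing graph, here is a minimal sketch in plain Python that evaluates x * tanh(softplus(x)) at a few points next to LeakyReLU. The helper names and the LeakyReLU slope of 0.1 are my own assumptions for illustration; the project itself uses the PyTorch `Mish` module shown in the full listing.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x): avoids overflow for large positive x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))


def leaky_relu(x: float, slope: float = 0.1) -> float:
    # LeakyReLU with an assumed negative slope of 0.1, for comparison only.
    return x if x >= 0 else slope * x


for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={value:+.1f}  mish={mish(value):+.4f}  leaky_relu={leaky_relu(value):+.4f}")
```

Unlike LeakyReLU, Mish is smooth everywhere and dips slightly below zero for negative inputs before flattening out, while behaving almost like the identity for large positive inputs.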

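2. Second, the structure of resblock_body is modified to use the CSPnet structure. With this change, the Darknet53 in YOLOV4 becomes CSPDarknet53.

The CSPnet structure is not complicated: it splits the original stack of residual blocks into two parts:

the main branch continues with the original stack of residual blocks;

the other branch acts like a residual edge and, after a small amount of processing, connects directly to the end.

So CSP can be seen as containing one large residual edge.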

#---------------------------------------------------#
#   CSPdarknet structural block
#   It contains one large residual edge,
#   which bypasses many residual structures
#---------------------------------------------------#
class Resblock_body(nn.Module):
    def __init__(self, in_channels, out_channels, num_blocks, first):
        super(Resblock_body, self).__init__()

        # Stride-2 convolution: halves the height and width
        self.downsample_conv = BasicConv(in_channels, out_channels, 3, stride=2)

        if first:
            # split_conv0 is the large residual edge, split_conv1 feeds the main branch
            self.split_conv0 = BasicConv(out_channels, out_channels, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels, 1)
            self.blocks_conv = nn.Sequential(
                Resblock(channels=out_channels, hidden_channels=out_channels//2),
                BasicConv(out_channels, out_channels, 1)
            )
            self.concat_conv = BasicConv(out_channels*2, out_channels, 1)
        else:
            self.split_conv0 = BasicConv(out_channels, out_channels//2, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels//2, 1)
            self.blocks_conv = nn.Sequential(
                *[Resblock(out_channels//2) for _ in range(num_blocks)],
                BasicConv(out_channels//2, out_channels//2, 1)
            )
            self.concat_conv = BasicConv(out_channels, out_channels, 1)

    def forward(self, x):
        x = self.downsample_conv(x)
        x0 = self.split_conv0(x)       # large residual edge
        x1 = self.split_conv1(x)       # main branch
        x1 = self.blocks_conv(x1)
        x = torch.cat([x1, x0], dim=1) # stack the two branches along channels
        x = self.concat_conv(x)
        return x
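The two branches of `Resblock_body` are easier to follow with the channel counts written out. The sketch below only does the bookkeeping implied by the constructor above; `csp_block_channels` is a hypothetical helper of mine, not part of the project.

```python
def csp_block_channels(out_channels: int, first: bool) -> dict:
    """Channel counts inside one Resblock_body, after the stride-2 downsample conv."""
    # The first block keeps full width in both branches; later blocks halve it.
    branch = out_channels if first else out_channels // 2
    return {
        "shortcut_branch": branch,     # split_conv0 output (the large residual edge)
        "residual_branch": branch,     # split_conv1 output, then the stacked Resblocks
        "concat_input": branch * 2,    # torch.cat([x1, x0], dim=1)
        "block_output": out_channels,  # concat_conv output
    }


print(csp_block_channels(64, first=True))
print(csp_block_channels(128, first=False))
```

In both cases the block ends with `out_channels` channels; only the width of the two branches differs between the first block and the later ones.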

The full implementation is:

import torch
import torch.nn.functional as F
import torch.nn as nn
import math
from collections import OrderedDict

#-------------------------------------------------#
#   Mish activation function
#-------------------------------------------------#
class Mish(nn.Module):
    def __init__(self):
        super(Mish, self).__init__()

    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

#-------------------------------------------------#
#   Convolution block
#   CONV + BATCHNORM + MISH
#-------------------------------------------------#
class BasicConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super(BasicConv, self).__init__()

        # padding = kernel_size//2 keeps the spatial size when stride is 1
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, kernel_size//2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.activation = Mish()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.activation(x)
        return x

#---------------------------------------------------#
#   Component of the CSPdarknet structural block
#   The residual block that is stacked inside it
#---------------------------------------------------#
class Resblock(nn.Module):
    def __init__(self, channels, hidden_channels=None, residual_activation=nn.Identity()):
        super(Resblock, self).__init__()

        if hidden_channels is None:
            hidden_channels = channels

        self.block = nn.Sequential(
            BasicConv(channels, hidden_channels, 1),
            BasicConv(hidden_channels, channels, 3)
        )

    def forward(self, x):
        return x + self.block(x)
#---------------------------------------------------#
#   CSPdarknet structural block
#   It contains one large residual edge,
#   which bypasses many residual structures
#---------------------------------------------------#
class Resblock_body(nn.Module):
    def __init__(self, in_channels, out_channels, num_blocks, first):
        super(Resblock_body, self).__init__()

        self.downsample_conv = BasicConv(in_channels, out_channels, 3, stride=2)

        if first:
            self.split_conv0 = BasicConv(out_channels, out_channels, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels, 1)
            self.blocks_conv = nn.Sequential(
                Resblock(channels=out_channels, hidden_channels=out_channels//2),
                BasicConv(out_channels, out_channels, 1)
            )
            self.concat_conv = BasicConv(out_channels*2, out_channels, 1)
        else:
            self.split_conv0 = BasicConv(out_channels, out_channels//2, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels//2, 1)
            self.blocks_conv = nn.Sequential(
                *[Resblock(out_channels//2) for _ in range(num_blocks)],
                BasicConv(out_channels//2, out_channels//2, 1)
            )
            self.concat_conv = BasicConv(out_channels, out_channels, 1)

    def forward(self, x):
        x = self.downsample_conv(x)
        x0 = self.split_conv0(x)
        x1 = self.split_conv1(x)
        x1 = self.blocks_conv(x1)
        x = torch.cat([x1, x0], dim=1)
        x = self.concat_conv(x)
        return x
class CSPDarkNet(nn.Module):
    def __init__(self, layers):
        super(CSPDarkNet, self).__init__()
        self.inplanes = 32
        self.conv1 = BasicConv(3, self.inplanes, kernel_size=3, stride=1)
        self.feature_channels = [64, 128, 256, 512, 1024]

        self.stages = nn.ModuleList([
            Resblock_body(self.inplanes, self.feature_channels[0], layers[0], first=True),
            Resblock_body(self.feature_channels[0], self.feature_channels[1], layers[1], first=False),
            Resblock_body(self.feature_channels[1], self.feature_channels[2], layers[2], first=False),
            Resblock_body(self.feature_channels[2], self.feature_channels[3], layers[3], first=False),
            Resblock_body(self.feature_channels[3], self.feature_channels[4], layers[4], first=False)
        ])

        self.num_features = 1
        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def forward(self, x):
        x = self.conv1(x)

        x = self.stages[0](x)
        x = self.stages[1](x)
        # The last three stages give the three effective feature layers
        out3 = self.stages[2](x)
        out4 = self.stages[3](out3)
        out5 = self.stages[4](out4)

        return out3, out4, out5

def darknet53(pretrained, **kwargs):
    model = CSPDarkNet([1, 2, 8, 8, 4])
    if pretrained:
        if isinstance(pretrained, str):
            model.load_state_dict(torch.load(pretrained))
        else:
            raise Exception("darknet request a pretrained path. got [{}]".format(pretrained))
    return model
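As a quick check on the shapes quoted in this post, the sketch below follows the spatial size through the five stride-2 stages of `CSPDarkNet`, using the usual convolution output-size formula with padding = kernel_size // 2. Both function names are hypothetical helpers of mine; shapes are written channels-last, as in the rest of the article.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2) -> int:
    # Output size of a convolution with padding = kernel // 2, as in BasicConv.
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


def backbone_output_shapes(input_size: int):
    feature_channels = [64, 128, 256, 512, 1024]  # from CSPDarkNet.__init__
    size = input_size                              # conv1 has stride 1, so the size is unchanged
    shapes = []
    for channels in feature_channels:
        size = conv_out_size(size)                 # each Resblock_body downsamples by 2
        shapes.append((size, size, channels))
    return shapes[2:]                              # out3, out4, out5


print(backbone_output_shapes(416))
print(backbone_output_shapes(608))
```

For a 608x608 input this gives the (76,76,256), (38,38,512) and (19,19,1024) layers used later in the post; for 416x416 the grids are 52, 26 and 13.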

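2. Feature pyramid

When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

In the feature pyramid part, YOLOV4 combines two improvements:

a). It uses the SPP structure.

b). It uses the PANet structure.

As the figure above shows, everything other than CSPDarknet53 and the Yolo Head belongs to the feature pyramid.

1. The SPP structure is mixed into the convolutions applied to the last feature layer of CSPdarknet53. After three DarknetConv2D_BN_Leaky convolutions on that last feature layer, max pooling at four different scales is applied, with pooling kernel sizes of 13x13, 9x9, 5x5 and 1x1 (1x1 means no processing).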

#---------------------------------------------------#
#   SPP structure: pool with kernels of different sizes,
#   then stack the pooled results
#---------------------------------------------------#
class SpatialPyramidPooling(nn.Module):
    def __init__(self, pool_sizes=[5, 9, 13]):
        super(SpatialPyramidPooling, self).__init__()

        # stride 1 and padding pool_size//2 keep the spatial size unchanged
        self.maxpools = nn.ModuleList([nn.MaxPool2d(pool_size, 1, pool_size//2) for pool_size in pool_sizes])

    def forward(self, x):
        # Pool with 13, 9, 5 and append the unpooled input (the "1x1" branch)
        features = [maxpool(x) for maxpool in self.maxpools[::-1]]
        features = torch.cat(features + [x], dim=1)

        return features
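Two properties of this SPP block are worth seeing in isolation: pooling with stride 1 and padding kernel // 2 leaves the height and width unchanged, and stacking three pooled maps with the input multiplies the channel count by four. The following is a plain-Python sketch on a 2D list, not the PyTorch module; clipping the window at the border plays the role of the padding, which max pooling ignores. The helper names are mine.

```python
def max_pool_same(grid, kernel):
    """Max pooling with stride 1 and padding kernel // 2 on a 2D list."""
    pad = kernel // 2
    height, width = len(grid), len(grid[0])
    pooled = []
    for row in range(height):
        out_row = []
        for col in range(width):
            # Window clipped to the grid: padded positions never win the max.
            window = [
                grid[r][c]
                for r in range(max(0, row - pad), min(height, row + pad + 1))
                for c in range(max(0, col - pad), min(width, col + pad + 1))
            ]
            out_row.append(max(window))
        pooled.append(out_row)
    return pooled


def spp_out_channels(in_channels, pool_sizes=(5, 9, 13)):
    # One pooled copy per kernel size, plus the untouched input.
    return in_channels * (len(pool_sizes) + 1)


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(max_pool_same(grid, 3))
print(spp_out_channels(512))
```

The 512 -> 2048 channel growth is why the convolution that follows SPP in `YoloBody` is declared with 2048 input channels.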

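SPP greatly enlarges the receptive field and separates out the most significant context features.

2. PANet is an instance segmentation algorithm from 2018, and its structure is built around repeatedly refining features.

The figure above shows the original PANet structure; its most important characteristic is the repeated extraction of features.

Part (a) is the traditional feature pyramid structure. After the feature pyramid finishes its top-down pass, in which deep features are upsampled and merged into shallower ones, PANet adds the bottom-up pass in (b), in which shallow features are downsampled and merged back into the deeper ones.

In YOLOV4, the PANet structure is applied mainly to the three effective feature layers.

The implementation is as follows: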

#---------------------------------------------------#
#   yolo_body
#   make_three_conv, make_five_conv, Upsample, conv2d and
#   yolo_head are helpers that are not listed in this excerpt
#---------------------------------------------------#
class YoloBody(nn.Module):
    def __init__(self, config):
        super(YoloBody, self).__init__()
        self.config = config
        # backbone
        self.backbone = darknet53(None)

        self.conv1 = make_three_conv([512,1024],1024)
        self.SPP = SpatialPyramidPooling()
        self.conv2 = make_three_conv([512,1024],2048)

        self.upsample1 = Upsample(512,256)
        self.conv_for_P4 = conv2d(512,256,1)
        self.make_five_conv1 = make_five_conv([256, 512],512)

        self.upsample2 = Upsample(256,128)
        self.conv_for_P3 = conv2d(256,128,1)
        self.make_five_conv2 = make_five_conv([128, 256],256)

        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter2 = len(config["yolo"]["anchors"][2]) * (5 + config["yolo"]["classes"])
        self.yolo_head3 = yolo_head([256, final_out_filter2],128)

        self.down_sample1 = conv2d(128,256,3,stride=2)
        self.make_five_conv3 = make_five_conv([256, 512],512)

        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter1 = len(config["yolo"]["anchors"][1]) * (5 + config["yolo"]["classes"])
        self.yolo_head2 = yolo_head([512, final_out_filter1],256)

        self.down_sample2 = conv2d(256,512,3,stride=2)
        self.make_five_conv4 = make_five_conv([512, 1024],1024)

        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter0 = len(config["yolo"]["anchors"][0]) * (5 + config["yolo"]["classes"])
        self.yolo_head1 = yolo_head([1024, final_out_filter0],512)

    def forward(self, x):
        # backbone
        x2, x1, x0 = self.backbone(x)

        # SPP on the deepest feature layer
        P5 = self.conv1(x0)
        P5 = self.SPP(P5)
        P5 = self.conv2(P5)

        # Top-down path: upsample and merge into the shallower layers
        P5_upsample = self.upsample1(P5)
        P4 = self.conv_for_P4(x1)
        P4 = torch.cat([P4,P5_upsample],dim=1)
        P4 = self.make_five_conv1(P4)

        P4_upsample = self.upsample2(P4)
        P3 = self.conv_for_P3(x2)
        P3 = torch.cat([P3,P4_upsample],dim=1)
        P3 = self.make_five_conv2(P3)

        # Bottom-up path: downsample and merge back into the deeper layers
        P3_downsample = self.down_sample1(P3)
        P4 = torch.cat([P3_downsample,P4],dim=1)
        P4 = self.make_five_conv3(P4)

        P4_downsample = self.down_sample2(P4)
        P5 = torch.cat([P4_downsample,P5],dim=1)
        P5 = self.make_five_conv4(P5)

        out2 = self.yolo_head3(P3)
        out1 = self.yolo_head2(P4)
        out0 = self.yolo_head1(P5)

        return out0, out1, out2
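The forward pass above is mostly channel arithmetic, so it can help to trace the channel counts step by step. The sketch below does only that, in plain Python. It assumes that `make_three_conv` and `make_five_conv` end on `filters_list[0]` channels; those helpers are not listed in this post, and the assumption is inferred from the `in_filters` values of the layers that follow them. `panet_channel_trace` is a hypothetical name.

```python
def panet_channel_trace():
    """Follow the channel counts through YoloBody.forward (channels only)."""
    steps = []

    def record(name, channels):
        steps.append((name, channels))
        return channels

    # SPP on the deepest backbone output (x0 has 1024 channels).
    p5 = record("conv1(x0)", 512)
    p5 = record("SPP", p5 * 4)  # three pooled maps + the input
    p5 = record("conv2", 512)

    # Top-down path: upsample, then concatenate with the shallower layer.
    p4 = record("cat(conv_for_P4(x1), upsample1(P5))", 256 + 256)
    p4 = record("make_five_conv1", 256)
    p3 = record("cat(conv_for_P3(x2), upsample2(P4))", 128 + 128)
    p3 = record("make_five_conv2", 128)

    # Bottom-up path: downsample, then concatenate with the deeper layer.
    p4 = record("cat(down_sample1(P3), P4)", 256 + p4)
    p4 = record("make_five_conv3", 256)
    p5 = record("cat(down_sample2(P4), P5)", 512 + p5)
    p5 = record("make_five_conv4", 512)

    return steps, (p3, p4, p5)


steps, head_inputs = panet_channel_trace()
for name, channels in steps:
    print(f"{name:40s} -> {channels}")
print("head inputs (P3, P4, P5):", head_inputs)
```

Every concatenation lands on the `in_filters` value declared in the constructor (512, 256, 512, 1024), and the three heads receive 128, 256 and 512 channels, matching `yolo_head3`, `yolo_head2` and `yolo_head1`.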

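3. YoloHead: making predictions from the extracted features

When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

1. In the feature utilization part, YoloV4 extracts multiple feature layers for object detection. Three feature layers are extracted, located in the middle, the lower-middle and the bottom of the network, with shapes (76,76,256), (38,38,512) and (19,19,1024).

2. The output layers have shapes (19,19,75), (38,38,75) and (76,76,75). The last dimension is 75 because the figure is based on the VOC dataset, which has 20 classes, and YoloV4 has only 3 prior boxes per feature layer, so the last dimension is 3x25;

With the COCO training set there are 80 classes, so the last dimension is 255 = 3x85, and the three output shapes are (19,19,255), (38,38,255) and (76,76,255).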
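The 75 and 255 above come from one small formula, sketched here in plain Python. The function names are hypothetical, and the strides of 32, 16 and 8 are the ones noted in the decoding code comments later in this post.

```python
def head_output_channels(num_anchors: int, num_classes: int) -> int:
    # 4 box parameters + 1 objectness score + one score per class, for each prior box.
    return num_anchors * (4 + 1 + num_classes)


def head_output_shapes(input_size, num_classes, strides=(32, 16, 8), num_anchors=3):
    channels = head_output_channels(num_anchors, num_classes)
    # Channels-last shapes, ordered from the coarsest grid to the finest.
    return [(input_size // s, input_size // s, channels) for s in strides]


print(head_output_channels(3, 20))   # VOC
print(head_output_channels(3, 80))   # COCO
print(head_output_shapes(608, 80))
```

Changing the dataset therefore only changes the last dimension of each head; the grid sizes depend on the input resolution alone.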

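The implementation is as follows:

#---------------------------------------------------#
#   Produce the final yolov4 output
#   conv2d is a helper that is not listed in this excerpt
#---------------------------------------------------#
def yolo_head(filters_list, in_filters):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 3),
        # Plain 1x1 convolution: maps to num_anchors * (5 + num_classes) channels
        nn.Conv2d(filters_list[0], filters_list[1], 1),
    )
    return m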

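4. Decoding the prediction results

From the previous step we obtain the prediction results of three feature layers, with shapes (N,19,19,255), (N,38,38,255) and (N,76,76,255). They correspond to the positions of 3 prediction boxes on every cell of the 19x19, 38x38 and 76x76 grids into which each image is divided.

However, these predictions do not directly give the positions of the final prediction boxes on the image; they have to be decoded first.

Here we need the prediction principle of YOLOv3: its three feature layers divide the whole image into 19x19, 38x38 and 76x76 grids, and each grid point is responsible for detecting one region.

We know that the prediction of each feature layer corresponds to the positions of three prediction boxes, so we first reshape it, which gives (N,19,19,3,85), (N,38,38,3,85) and (N,76,76,3,85).

The 85 in the last dimension is 4+1+80: x_offset, y_offset, w and h, the confidence, and the classification results.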
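To make the 255 -> 3 x 85 reshape concrete, here is a minimal plain-Python sketch that splits the 255 values predicted for one grid cell into three per-anchor records. It assumes the anchor-major channel layout implied by the `view(batch_size, 3, bbox_attrs, H, W)` call in the decoding code; `split_cell_prediction` and the dictionary keys are names I made up for illustration.

```python
def split_cell_prediction(values, num_anchors=3, num_classes=80):
    """Split the raw values of one grid cell into one record per prior box."""
    attrs = 5 + num_classes
    if len(values) != num_anchors * attrs:
        raise ValueError(f"expected {num_anchors * attrs} values, got {len(values)}")

    boxes = []
    for anchor in range(num_anchors):
        chunk = values[anchor * attrs:(anchor + 1) * attrs]
        boxes.append({
            "x_offset": chunk[0],
            "y_offset": chunk[1],
            "w": chunk[2],
            "h": chunk[3],
            "confidence": chunk[4],
            "class_scores": chunk[5:],
        })
    return boxes


cell = list(range(255))  # stand-in values so the slicing is easy to follow
for index, box in enumerate(split_cell_prediction(cell)):
    print(index, box["x_offset"], box["confidence"], len(box["class_scores"]))
```

These are still raw network outputs; the sigmoid and exponential that turn them into real box coordinates are applied in the decoding step.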

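The YOLOv3 decoding process adds the corresponding x_offset and y_offset to each grid point, which gives the center of the prediction box, and then combines the prior box with w and h to compute the width and height of the prediction box. Together these give the full position of the prediction box.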
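For a single prior box in a single grid cell, the decoding described above can be sketched in plain Python as follows. This version works directly in input-image pixels, whereas the project code works in feature-map units and normalizes afterwards; the two are equivalent up to that scaling. `decode_cell` and its parameter names are hypothetical, and the anchor (142, 110) is taken from the comments in the decoding code.

```python
import math


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def decode_cell(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Turn raw outputs for one prior box into (center_x, center_y, width, height) in pixels."""
    # The sigmoid keeps the predicted center inside its own grid cell.
    center_x = (sigmoid(tx) + cell_x) * stride
    center_y = (sigmoid(ty) + cell_y) * stride
    # The prior box is rescaled by the exponential of the raw width and height.
    width = anchor_w * math.exp(tw)
    height = anchor_h * math.exp(th)
    return center_x, center_y, width, height


# Cell (6, 6) of the 13x13 grid for a 416x416 input (stride 32), all raw outputs zero.
print(decode_cell(0.0, 0.0, 0.0, 0.0, 6, 6, 142, 110, 32))
```

With all raw outputs at zero the box sits in the middle of its cell and has exactly the size of the prior box, which is a handy sanity check when debugging a decoder.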

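After the final prediction results are obtained, score sorting and non-maximum suppression are still needed. This part is common to almost all object detectors, but this project handles it somewhat differently from others: it processes each class separately.

1. Take the boxes and scores of each class whose score is greater than self.obj_threshold (conf_thres in the code below).

2. Apply non-maximum suppression using the box positions and scores.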
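The two steps above can be sketched without any framework. The following is a plain-Python greedy NMS applied class by class; the project itself relies on `torchvision.ops.nms` for speed, so this is only an illustration of the idea. The tuple layout `(x1, y1, x2, y2, score, class_id)` and the function names are assumptions of mine.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_per_class(detections, score_threshold=0.5, iou_threshold=0.4):
    """detections: list of (x1, y1, x2, y2, score, class_id) tuples."""
    kept = []
    for class_id in sorted({d[5] for d in detections}):
        # Step 1: keep this class's boxes above the score threshold, best first.
        candidates = sorted(
            (d for d in detections if d[5] == class_id and d[4] >= score_threshold),
            key=lambda d: d[4],
            reverse=True,
        )
        # Step 2: greedily keep the best box and drop the ones that overlap it too much.
        while candidates:
            best = candidates.pop(0)
            kept.append(best)
            candidates = [d for d in candidates if iou(best[:4], d[:4]) <= iou_threshold]
    return kept


detections = [
    (0, 0, 10, 10, 0.9, 0),
    (1, 1, 11, 11, 0.8, 0),    # overlaps the first box heavily, same class
    (20, 20, 30, 30, 0.7, 0),
    (0, 0, 10, 10, 0.6, 1),    # same place as the first box, different class
    (0, 0, 10, 10, 0.3, 1),    # below the score threshold
]
for detection in nms_per_class(detections):
    print(detection)
```

Because suppression runs per class, a box of class 1 survives even though it coincides with a higher-scoring box of class 0.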

The implementation is as follows; calling decode_box decodes each feature layer in turn:

import torch
import torch.nn as nn
from torchvision.ops import nms
import numpy as np

class DecodeBox():
    def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]):
        super(DecodeBox, self).__init__()
        self.anchors = anchors
        self.num_classes = num_classes
        self.bbox_attrs = 5 + num_classes
        self.input_shape = input_shape
        #-----------------------------------------------------------#
        #   Anchors of the 13x13 feature layer: [142, 110],[192, 243],[459, 401]
        #   Anchors of the 26x26 feature layer: [36, 75],[76, 55],[72, 146]
        #   Anchors of the 52x52 feature layer: [12, 16],[19, 36],[40, 28]
        #-----------------------------------------------------------#
        self.anchors_mask = anchors_mask
    def decode_box(self, inputs):
        outputs = []
        for i, input in enumerate(inputs):
            #-----------------------------------------------#
            #   There are three inputs, with shapes
            #   batch_size, 255, 13, 13
            #   batch_size, 255, 26, 26
            #   batch_size, 255, 52, 52
            #-----------------------------------------------#
            batch_size = input.size(0)
            input_height = input.size(2)
            input_width = input.size(3)

            #-----------------------------------------------#
            #   For a 416x416 input
            #   stride_h = stride_w = 32, 16, 8
            #-----------------------------------------------#
            stride_h = self.input_shape[0] / input_height
            stride_w = self.input_shape[1] / input_width
            #-------------------------------------------------#
            #   scaled_anchors are sized relative to the feature layer
            #-------------------------------------------------#
            scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]

            #-----------------------------------------------#
            #   After the reshape, the three predictions have shapes
            #   batch_size, 3, 13, 13, 85
            #   batch_size, 3, 26, 26, 85
            #   batch_size, 3, 52, 52, 85
            #-----------------------------------------------#
            prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                    self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()

            #-----------------------------------------------#
            #   Adjustment parameters for the prior box center
            #-----------------------------------------------#
            x = torch.sigmoid(prediction[..., 0])
            y = torch.sigmoid(prediction[..., 1])
            #-----------------------------------------------#
            #   Adjustment parameters for the prior box width and height
            #-----------------------------------------------#
            w = prediction[..., 2]
            h = prediction[..., 3]
            #-----------------------------------------------#
            #   Confidence: whether an object is present
            #-----------------------------------------------#
            conf = torch.sigmoid(prediction[..., 4])
            #-----------------------------------------------#
            #   Class confidence
            #-----------------------------------------------#
            pred_cls = torch.sigmoid(prediction[..., 5:])

            FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
            LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor

            #----------------------------------------------------------#
            #   Build the grid: prior box centers at the top-left corner of each cell
            #   batch_size,3,13,13
            #----------------------------------------------------------#
            grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
            grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)

            #----------------------------------------------------------#
            #   Build the prior box widths and heights in the grid layout
            #   batch_size,3,13,13
            #----------------------------------------------------------#
            anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
            anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
            anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
            anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)

            #----------------------------------------------------------#
            #   Adjust the prior boxes with the predictions:
            #   first shift the center from the cell's top-left corner
            #   towards the bottom right, then adjust width and height.
            #----------------------------------------------------------#
            pred_boxes = FloatTensor(prediction[..., :4].shape)
            pred_boxes[..., 0] = x.data + grid_x
            pred_boxes[..., 1] = y.data + grid_y
            pred_boxes[..., 2] = torch.exp(w.data) * anchor_w
            pred_boxes[..., 3] = torch.exp(h.data) * anchor_h

            #----------------------------------------------------------#
            #   Normalize the output to fractions in the range 0..1
            #----------------------------------------------------------#
            _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor)
            output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale,
                                conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
            outputs.append(output.data)
        return outputs
    def yolo_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image):
        #-----------------------------------------------------------------#
        #   The y axis goes first so the boxes can be multiplied
        #   directly by the image height and width
        #-----------------------------------------------------------------#
        box_yx = box_xy[..., ::-1]
        box_hw = box_wh[..., ::-1]
        input_shape = np.array(input_shape)
        image_shape = np.array(image_shape)

        if letterbox_image:
            #-----------------------------------------------------------------#
            #   offset is the offset of the valid image area relative to
            #   the top-left corner of the image
            #   new_shape is the height and width after scaling
            #-----------------------------------------------------------------#
            new_shape = np.round(image_shape * np.min(input_shape/image_shape))
            offset = (input_shape - new_shape)/2./input_shape
            scale = input_shape/new_shape

            box_yx = (box_yx - offset) * scale
            box_hw *= scale

        box_mins = box_yx - (box_hw / 2.)
        box_maxes = box_yx + (box_hw / 2.)
        boxes = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1)
        boxes *= np.concatenate([image_shape, image_shape], axis=-1)
        return boxes
    def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):
        #----------------------------------------------------------#
        #   Convert the predictions to top-left / bottom-right corner format.
        #   prediction [batch_size, num_anchors, 85]
        #----------------------------------------------------------#
        box_corner = prediction.new(prediction.shape)
        box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
        box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
        box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
        box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
        prediction[:, :, :4] = box_corner[:, :, :4]

        output = [None for _ in range(len(prediction))]
        for i, image_pred in enumerate(prediction):
            #----------------------------------------------------------#
            #   Take the max over the class predictions.
            #   class_conf [num_anchors, 1] class confidence
            #   class_pred [num_anchors, 1] class index
            #----------------------------------------------------------#
            class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)

            #----------------------------------------------------------#
            #   First round of filtering by confidence
            #----------------------------------------------------------#
            conf_mask = (image_pred[:, 4] * class_conf[:, 0] >= conf_thres).squeeze()

            #----------------------------------------------------------#
            #   Keep only the predictions that pass the confidence filter
            #----------------------------------------------------------#
            image_pred = image_pred[conf_mask]
            class_conf = class_conf[conf_mask]
            class_pred = class_pred[conf_mask]
            if not image_pred.size(0):
                continue
            #-------------------------------------------------------------------------#
            #   detections [num_anchors, 7]
            #   The 7 values are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
            #-------------------------------------------------------------------------#
            detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)

            #------------------------------------------#
            #   All classes present in the predictions
            #------------------------------------------#
            unique_labels = detections[:, -1].cpu().unique()

            if prediction.is_cuda:
                unique_labels = unique_labels.cuda()
                detections = detections.cuda()

            for c in unique_labels:
                #------------------------------------------#
                #   All score-filtered predictions of one class
                #------------------------------------------#
                detections_class = detections[detections[:, -1] == c]

                #------------------------------------------#
                #   The built-in torchvision NMS is faster
                #------------------------------------------#
                keep = nms(
                    detections_class[:, :4],
                    detections_class[:, 4] * detections_class[:, 5],
                    nms_thres
                )
                max_detections = detections_class[keep]

                # # Sort by object confidence
                # _, conf_sort_index = torch.sort(detections_class[:, 4]*detections_class[:, 5], descending=True)
                # detections_class = detections_class[conf_sort_index]
                # # Run non-maximum suppression
                # max_detections = []
                # while detections_class.size(0):
                #     # Take the highest-confidence box of this class, then drop every box whose overlap with it exceeds nms_thres
                #     max_detections.append(detections_class[0].unsqueeze(0))
                #     if len(detections_class) == 1:
                #         break
                #     ious = bbox_iou(max_detections[-1], detections_class[1:])
                #     detections_class = detections_class[1:][ious < nms_thres]
                # # Stack
                # max_detections = torch.cat(max_detections).data

                # Add max detections to outputs
                output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections))

            if output[i] is not None:
                output[i] = output[i].cpu().numpy()
                # Convert corners back to center and size, then map to the original image
                box_xy, box_wh = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
                output[i][:, :4] = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
        return output
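The letterbox correction in `yolo_correct_boxes` is the least obvious step of the post-processing, so here is a minimal single-box sketch of the same arithmetic in plain Python. It assumes the box is given as a normalized center and size in the letterboxed network input, and that shapes are (height, width) as in the code above; `correct_box` is a hypothetical helper, not part of the project.

```python
def correct_box(center_xy, size_wh, input_shape, image_shape):
    """Map one normalized box from the letterboxed input back to original-image pixels.

    Returns (y_min, x_min, y_max, x_max), the same ordering as yolo_correct_boxes.
    """
    ratio = min(input_shape[0] / image_shape[0], input_shape[1] / image_shape[1])
    # Size of the resized image inside the letterboxed input.
    new_shape = (round(image_shape[0] * ratio), round(image_shape[1] * ratio))

    center_yx = (center_xy[1], center_xy[0])
    size_hw = (size_wh[1], size_wh[0])
    mins, maxes = [], []
    for axis in range(2):
        # Fraction of the input taken up by the gray border on one side.
        offset = (input_shape[axis] - new_shape[axis]) / 2.0 / input_shape[axis]
        scale = input_shape[axis] / new_shape[axis]
        center = (center_yx[axis] - offset) * scale
        size = size_hw[axis] * scale
        mins.append((center - size / 2.0) * image_shape[axis])
        maxes.append((center + size / 2.0) * image_shape[axis])
    return (mins[0], mins[1], maxes[0], maxes[1])


# A 200x400 (height x width) image letterboxed into a 416x416 input:
# the image fills the full width and half of the height.
print(correct_box((0.5, 0.5), (0.5, 0.25), (416, 416), (200, 400)))
```

In this example the gray bars take up a quarter of the input height at the top and at the bottom, so a box that spans a quarter of the input height covers half of the real image height once the bars are removed.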