I. Overview of Innovations
According to Jiang Dabai's article《深入浅出Yolo系列之Yolov5核心基础知识完整讲解》, YOLOv5 introduces the following innovations:
(1) Input: Mosaic data augmentation, adaptive anchor computation, adaptive image scaling
(2) Backbone: Focus structure, CSP structure
(3) Neck: FPN + PAN structure
(4) Prediction: GIOU_Loss
II. Mosaic Data Augmentation
1. Principle
Four images are stitched into a single image by randomly scaling, randomly cropping, and randomly arranging them. For details, see the CSDN post《深度学习中小知识点系列(三) 解读Mosaic数据增强》.
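As a quick illustration of where the "random crop" comes from, here is a minimal sketch of the mosaic-center sampling used by the v5.0 code walked through in section 3 (s and border below simply stand in for the dataset's img_size and mosaic_border):

import random

s = 640                      # model input size (stand-in for self.img_size)
border = (-s // 2, -s // 2)  # stand-in for self.mosaic_border = [-img_size // 2, -img_size // 2]
yc, xc = (int(random.uniform(-b, 2 * s + b)) for b in border)
# xc and yc both fall in (s / 2, 3 * s / 2), i.e. (320, 960) here, so the mosaic
# center stays away from the canvas edges and each of the four source images is
# randomly cropped when it is pasted into its quadrant around that center.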
2. Advantages
(1) A richer dataset: randomly picking 4 images, randomly scaling them, and randomly arranging them greatly enriches the detection dataset; in particular, the random scaling produces many small objects, which makes the network more robust.
(2) Lower GPU memory demand: each mosaic sample already carries the content of 4 images, so the mini-batch size does not need to be large to reach good results.
3. Code walkthrough
The code below is taken from yolov5 v5.0 (GitHub - ultralytics/yolov5 at v5.0); it lives in the load_mosaic function in utils/datasets.py.
def load_mosaic(self, index):
    # loads images in a 4-mosaic
    labels4, segments4 = [], []
    s = self.img_size  # s is the model input size
    yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border]  # mosaic center x, y; self.mosaic_border = [-img_size // 2, -img_size // 2], so xc and yc fall in (img_size / 2, img_size * 3 / 2), and this range is what makes the tiles get randomly cropped when they are stitched
    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices: the current image plus three randomly chosen ones, so the image content is randomly distributed
    for i, index in enumerate(indices):  # process the four images one by one
        # Load image
        img, _, (h, w) = load_image(self, index)  # load the image for this index

        # place img in img4: stitch the four images onto one large canvas; see the blog link in the Principle section for a diagram of the large/small image coordinates
        if i == 0:  # top left tile
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles: a gray blank canvas, twice the model input size per side
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image): region of the canvas that receives the top-left tile
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image): region cropped out of the source image to be pasted onto the canvas
        elif i == 1:  # top right tile
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left tile
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right tile
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]: paste the cropped source region onto the canvas region
        padw = x1a - x1b
        padh = y1a - y1b

        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format, in mosaic-canvas coordinates
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
        labels4.append(labels)
        segments4.extend(segments)

    # Concat/clip labels
    labels4 = np.concatenate(labels4, 0)  # stack the labels of the four images along axis 0
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
    # img4, labels4 = replicate(img4, labels4)  # replicate

    # Augment
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove; random affine/perspective augmentation that also crops the 2s x 2s canvas back down to s x s

    return img4, labels4
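The label handling above leans on xywhn2xyxy, which maps normalized [x_center, y_center, width, height] boxes to pixel [x1, y1, x2, y2] boxes on the mosaic canvas, offset by the tile position (padw, padh). The real helper lives in utils/general.py; the numpy-only sketch below (with a hypothetical name) is only meant to illustrate the conversion, not to reproduce the exact v5.0 code:

import numpy as np

def xywhn2xyxy_sketch(x, w=640, h=640, padw=0, padh=0):
    # x: N x 4 array of normalized [x_center, y_center, width, height] boxes
    y = x.copy()
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top-left x on the mosaic canvas
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top-left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom-right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom-right y
    return y

Because padw = x1a - x1b and padh = y1a - y1b, the converted boxes land exactly where the cropped tile was pasted; any box that then sticks out of the canvas is clipped to [0, 2s] in the "Concat/clip labels" step.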
4. Effect
Mosaic augmentation can be turned on or off where train.py creates the training dataloader:
# Trainloader
dataloader, dataset = create_dataloader(train_path, imgsz, batch_size, gs, opt,
                                        hyp=hyp, augment=False, cache=opt.cache_images, rect=opt.rect, rank=rank,
                                        world_size=opt.world_size, workers=opt.workers,
                                        image_weights=opt.image_weights, quad=opt.quad,
                                        prefix=colorstr('train: '))  # augment=False turns mosaic off (along with the other training-time augmentations)
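The augment flag is simply forwarded to the dataset: in v5.0, LoadImagesAndLabels (utils/datasets.py) only builds mosaics when augmentation is on and rectangular training is off (self.mosaic = self.augment and not self.rect). A tiny standalone sketch of that check, using a hypothetical helper name just for illustration:

def mosaic_enabled(augment: bool, rect: bool) -> bool:
    # mirrors the LoadImagesAndLabels condition: mosaic only when augmenting and not using rect training
    return augment and not rect

print(mosaic_enabled(augment=False, rect=False))  # False, so augment=False above turns mosaic off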
(1) Training samples with mosaic augmentation enabled:
(2) Training samples with mosaic augmentation disabled:
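If the original screenshots are unavailable, the effect is easy to reproduce: v5.0's train.py saves the first few training batches as train_batch0.jpg, train_batch1.jpg, ... in the run directory (via plot_images), so comparing those files between a run with augment=True and one with augment=False shows the mosaic output directly.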