
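Convolutional Neural Networks in PyTorch

This section introduces the basic concepts of convolutional neural networks and then, using the CIFAR-10 image classification task as an example, builds several common networks: VGGNet, ResNet, MobileNet, and a network of Inception modules.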

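Convolutional layers

1.torch.nn.Conv2d

  • Slides a filter matrix over the image and takes the inner product at each position (multiply element by element, then sum)

  • torch.nn.Conv2d(in_channels,out_channels,kernel_size,stride=1,padding=0,dilation=1,groups=1,bias=True,padding_mode='zeros')

    • in_channels (int): number of channels in the input
    • out_channels (int): number of channels in the output
    • kernel_size (int or tuple): size of the convolution kernel
    • stride (int or tuple): stride of the convolution, default 1
    • padding (int or tuple): amount of padding added to both sides of the input along each spatial dimension, default 0
    • dilation (int or tuple): spacing between kernel elements, default 1
    • groups (int): number of blocked connections from input channels to output channels
    • bias (bool): default True; if True, a learnable bias is added to the output
    • padding_mode (str): one of 'zeros', 'reflect', 'replicate', 'circular'; default 'zeros'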
  • Relationship between input and output shapes:
    $$
    H_{out}=\left\lfloor\frac{H_{in}+2\times padding[0]-dilation[0]\times(kernel\_size[0]-1)-1}{stride[0]}+1\right\rfloor
    $$

    $$
    W_{out}=\left\lfloor\frac{W_{in}+2\times padding[1]-dilation[1]\times(kernel\_size[1]-1)-1}{stride[1]}+1\right\rfloor
    $$

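  • dilation: dilated convolution; controls the spacing between kernel points

    image-20230809175517005
  • padding_mode: how the border is padded

  • Same Padding: with stride 1, to keep the output the same size as the input, set padding to half of kernel_size rounded down (kernel_size // 2, for an odd kernel_size)

  • Full Padding: with stride 1, padding = kernel_size - 1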
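
As a quick check of the shape formula and the two padding rules above, here is a minimal pure-Python sketch. The helper name `conv2d_output_size` is made up for illustration and is not part of PyTorch; it handles one spatial dimension at a time.

```python
def conv2d_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension, following the formula above."""
    numerator = size_in + 2 * padding - dilation * (kernel_size - 1) - 1
    return numerator // stride + 1  # floor division implements the floor in the formula


# Same padding: stride 1 and padding = kernel_size // 2 keep the size (odd kernels).
same = conv2d_output_size(32, kernel_size=3, stride=1, padding=1)

# Full padding: stride 1 and padding = kernel_size - 1 grow the size by kernel_size - 1.
full = conv2d_output_size(32, kernel_size=3, stride=1, padding=2)

# Dilation 2 makes a 3x3 kernel cover a 5x5 window, so an unpadded output shrinks by 4.
dilated = conv2d_output_size(32, kernel_size=3, dilation=2)

print(same, full, dilated)  # 32 34 28
```

The same helper can be applied to height and width separately when the kernel, stride, or padding differ per dimension.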

2.torch.nn.ConvTranspose2d

  • torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')

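  • In a transposed convolution (deconvolution), stride describes zero insertion: stride - 1 zeros are inserted between every two adjacent pixels of the input

  • Relationship between input and output shapes:
    $$
    H_{out}=(H_{in}-1)\times stride[0]-2\times padding[0]+dilation[0]\times(kernel\_size[0]-1)+output\_padding[0]+1
    $$

    $$
    W_{out}=(W_{in}-1)\times stride[1]-2\times padding[1]+dilation[1]\times(kernel\_size[1]-1)+output\_padding[1]+1
    $$

  • output_padding increases the output size on one side of each dimension (the bottom and the right). It resolves the ambiguity that appears when stride is greater than 1, where several input sizes of a convolution map to the same output size

  • ConvTranspose2d can be used for upsampling

  • Further reading: "A plain-language, detailed explanation of deconvolution and the key nn.ConvTranspose2d parameters" (51CTO blog)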
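
The role of output_padding is easier to see with numbers. The sketch below evaluates the two shape formulas in plain Python; both helper names are hypothetical and only mirror the formulas, they are not PyTorch functions.

```python
def conv_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Conv2d output length along one dimension."""
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_output_size(size_in, kernel_size, stride=1, padding=0,
                               output_padding=0, dilation=1):
    """ConvTranspose2d output length along one dimension."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# With stride 2, inputs of size 7 and 8 both shrink to 4 ...
down_7 = conv_output_size(7, kernel_size=3, stride=2, padding=1)
down_8 = conv_output_size(8, kernel_size=3, stride=2, padding=1)

# ... so going back up from 4 is ambiguous; output_padding selects 7 or 8.
up_7 = conv_transpose_output_size(4, kernel_size=3, stride=2, padding=1, output_padding=0)
up_8 = conv_transpose_output_size(4, kernel_size=3, stride=2, padding=1, output_padding=1)

print(down_7, down_8, up_7, up_8)  # 4 4 7 8
```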

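3.Receptive field

  • Two cascaded 3×3 kernels have the same receptive field as a single 5×5 kernel but use fewer parameters (2 × 3 × 3 = 18 weights versus 5 × 5 = 25 for each input-output channel pair)

    image-20221107124918062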
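
The comparison can be verified with a few lines of Python. This is only an illustrative sketch: the helper names and the channel count of 64 are assumptions, and the receptive-field rule shown applies to stride-1, dilation-1 convolutions.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1, dilation-1 convolutions applied in sequence."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each stride-1 layer widens the field by k - 1
    return field


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Number of weights in one convolution layer, ignoring the bias."""
    return kernel_size * kernel_size * in_channels * out_channels


channels = 64  # hypothetical channel count, kept constant across layers
two_3x3 = 2 * conv_weight_count(3, channels, channels)
one_5x5 = conv_weight_count(5, channels, channels)

print(stacked_receptive_field([3, 3]), stacked_receptive_field([5]))  # 5 5
print(two_3x3, one_5x5)  # 73728 102400
```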

4.Common ways of combining convolutional layers

  • Stacking, skip connections, and parallel branches

    image-20230809190541362

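5.Pooling layers

  • Compress the input feature map:

    • the feature map becomes smaller, which reduces the computational cost of the network
    • the features are compressed, which keeps the dominant ones
  • Max pooling: torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

    • kernel_size: size of the max pooling window; a single value or a tuple
    • stride: step of the window; a single value or a tuple. If it is not given, the stride defaults to the window size; otherwise the window slides by the given value
    • padding: padding; a single value or a tuple
    • dilation: controls the spacing between elements inside the window
    • return_indices: bool; if True, also returns the indices of the maximum values
    • ceil_mode: bool; if True, the output shape is computed by rounding up; the default is rounding down
  • Relationship between input and output shapes (with the default ceil_mode=False):
    $$
    H_{out}=\left\lfloor\frac{H_{in}+2\times padding[0]-dilation[0]\times(kernel\_size[0]-1)-1}{stride[0]}+1\right\rfloor
    $$

    $$
    W_{out}=\left\lfloor\frac{W_{in}+2\times padding[1]-dilation[1]\times(kernel\_size[1]-1)-1}{stride[1]}+1\right\rfloor
    $$

  • Further reading: "torch.nn.MaxPool2d explained" (Medlen's blog, CSDN)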
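
To make the pooling operation and the effect of ceil_mode concrete, here is a small pure-Python sketch. Both functions are illustrative stand-ins, not PyTorch APIs; `pool_output_size` fixes dilation at 1 and leaves out PyTorch's extra rule that discards a window starting entirely inside the right-hand padding.

```python
import math


def pool_output_size(size_in, kernel_size, stride=None, padding=0, ceil_mode=False):
    """Output length of max pooling along one dimension (dilation fixed at 1)."""
    stride = kernel_size if stride is None else stride  # default stride equals the window size
    span = (size_in + 2 * padding - kernel_size) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a list of rows; leftover rows and columns are dropped."""
    rows, cols = len(feature_map) // 2, len(feature_map[0]) // 2
    return [[max(feature_map[2 * r][2 * c], feature_map[2 * r][2 * c + 1],
                 feature_map[2 * r + 1][2 * c], feature_map[2 * r + 1][2 * c + 1])
             for c in range(cols)]
            for r in range(rows)]


pooled = max_pool_2x2([[1, 3, 2, 0],
                       [5, 4, 1, 1],
                       [0, 2, 9, 8],
                       [1, 1, 7, 6]])
print(pooled)  # [[5, 2], [2, 9]]
print(pool_output_size(7, 2), pool_output_size(7, 2, ceil_mode=True))  # 3 4
```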


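Activation layers

  • Activation functions add non-linearity to the network, which increases its expressive power
  • A ReLU layer is commonly placed after a convolutional layer for this purpose
  • torch.nn.ReLU()

BatchNorm layers

  • Normalizes the inputs of each neuron in each layer so that their distribution is pulled back to zero mean and unit variance
  • BatchNorm is one form of normalization; it reduces the absolute differences between images, emphasizes the relative differences, and speeds up training
  • Drawbacks of BN:
    • It works poorly with a small batch_size. BN uses the mean and variance of the samples in one batch as estimates of the mean and variance of the whole dataset, and these estimates are unreliable when batch_size is small
    • BN works comparatively poorly in RNNs
  • nn.BatchNorm2d(num_features)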
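
The normalization step can be sketched in plain Python for a single feature. This is a simplified illustration rather than the nn.BatchNorm2d implementation: it leaves out the learnable scale and shift and the running statistics used at evaluation time, and `batch_norm_1d` is a made-up name.

```python
import math


def batch_norm_1d(values, eps=1e-5):
    """Normalize one feature over a batch to roughly zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased batch variance
    return [(v - mean) / math.sqrt(var + eps) for v in values]


batch = [2.0, 4.0, 6.0, 8.0]  # one feature observed across a batch of 4 samples
normalized = batch_norm_1d(batch)

new_mean = sum(normalized) / len(normalized)
new_var = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
print(normalized)
print(new_mean, new_var)  # close to 0 and close to 1
```

With only four samples the batch mean and variance are rough estimates, which is the small-batch weakness noted above.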

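Dropout layers

  • Randomly drops a different subset of neurons at each training step
  • No random deactivation is applied at test time; all neurons are active
  • Used to prevent or reduce overfitting; it is usually applied to fully connected layers
  • nn.Dropout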
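
The train/test difference can be illustrated with a pure-Python version of inverted dropout, the scheme nn.Dropout follows: surviving values are scaled by 1 / (1 - p) during training so that no rescaling is needed at test time. The function below is a sketch with a made-up name, not the library code.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(values)  # evaluation: every unit stays active, nothing is rescaled
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)

print(train_out)  # a random subset is zeroed, the rest are doubled
print(eval_out)   # [1.0, 2.0, 3.0, 4.0]
```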

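Loss layer


Learning rate and optimizer


CIFAR-10 image classification task

1.Reading the CIFAR-10 dataset

  • The CIFAR-10 training set consists of 5 batch files with 10000 images each. Each batch file is stored as a dictionary that holds the batch label, the image labels, the image data (3072 values per image, i.e. $32\times 32\times 3$), and the image file names

    dict_keys([b'batch_label', b'labels', b'data', b'filenames'])
  • Print the label, file name (note that names are stored as bytes), and pixel data of each image

    image-20230813215413389

  • Visualize the $32\times 32\times 3$ image data

    image-20230813215723948

  • Each class is saved to its own folder:

    image-20230813220124842
  • Final layout of the saved images:

    image-20230813220102126
  • readcifar10.py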

    import glob  # glob lists the files that match a pattern
    import os
    import pickle

    import cv2
    import numpy as np

    # Parsing function adapted from the CIFAR-10 website
    def unpickle(file):
        with open(file, 'rb') as fo:
            batch = pickle.load(fo, encoding='bytes')
        return batch

    # The 10 class names, indexed by label
    label_name = ["airplane",
                  "automobile",
                  "bird",
                  "cat",
                  "deer",
                  "dog",
                  "frog",
                  "horse",
                  "ship",
                  "truck"]

    # Directory where the training images are written
    save_path = "D:\\App_Data_File\\VScode_Project\\Python\\Pytorch\\Cifar10\\cifar10_batches_py\\TRAIN" # change TRAIN to TEST to process the test set

    train_list = glob.glob("D:\\App_Data_File\\VScode_Project\\Python\\Pytorch\\Cifar10\\cifar10_batches_py\\data_batch_*") # change data_batch_* to test_batch* to process the test set
    #print(train_list) # a list holding the five batch file paths

    for batch_file in train_list:
        #print(batch_file)
        l_dict = unpickle(batch_file)
        #print(l_dict)
        #print(l_dict.keys()) # print all keys of the dictionary

        for im_idx, im_data in enumerate(l_dict[b'data']):
            #print(im_idx)
            #print(im_data)
            im_label = l_dict[b'labels'][im_idx]
            im_name = l_dict[b'filenames'][im_idx]
            #print(im_label, im_name, im_data) # print the label, file name and data of each image

            # Convert the flat record to an image and save it under TRAIN
            im_label_name = label_name[im_label]
            im_data = np.array(im_data) # convert the image data to a NumPy array
            im_data = im_data.reshape(3, 32, 32) # records are channel-first: 1024 R, 1024 G, then 1024 B values
            im_data = np.ascontiguousarray(im_data.transpose(1, 2, 0)) # reorder to 32*32*3 (height, width, channel)
            im_data = cv2.cvtColor(im_data, cv2.COLOR_RGB2BGR) # OpenCV expects BGR order when showing or writing

            #cv2.imshow('im_data', im_data) # visualize the image that was read
            #cv2.waitKey(0)

            # Create one folder per class if it does not exist yet
            class_dir = os.path.join(save_path, im_label_name)
            os.makedirs(class_dir, exist_ok=True)

            # .decode("utf-8") converts the bytes file name to a str
            cv2.imwrite(os.path.join(class_dir, im_name.decode("utf-8")), im_data)
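
The reshape(3, 32, 32) followed by transpose(1, 2, 0) in the script relies on how each 3072-value record is laid out: 1024 red values, then 1024 green, then 1024 blue, each plane stored row by row. The sketch below spells out that index arithmetic on a synthetic record; `pixel_from_row` is a hypothetical helper written only for this illustration.

```python
def pixel_from_row(row, y, x):
    """Return the (R, G, B) value at row y, column x of one flat 3072-value record."""
    plane = 32 * 32  # each colour plane holds 1024 values, stored row by row
    offset = y * 32 + x
    return (row[offset], row[plane + offset], row[2 * plane + offset])


# Synthetic record: the red plane is all 10, green all 20, blue all 30,
# except for the red value of one marked pixel at y=1, x=2.
row = [10] * 1024 + [20] * 1024 + [30] * 1024
row[1 * 32 + 2] = 255

print(pixel_from_row(row, 0, 0))  # (10, 20, 30)
print(pixel_from_row(row, 1, 2))  # (255, 20, 30)
```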

2.Custom data loading

  • load_cifar10.py
import glob
import os

from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

label_name = ["airplane", "automobile", "bird",
              "cat", "deer", "dog", "frog",
              "horse", "ship", "truck"]

# Map each class name to an integer label 0~9
label_dict = {}
for idx, name in enumerate(label_name):
    label_dict[name] = idx

def default_loader(path):
    # convert("RGB") forces 3 channels; a PNG saved as RGBA would otherwise load with a fourth (alpha, transparency) channel
    return Image.open(path).convert("RGB")

train_transform = transforms.Compose([
    transforms.RandomCrop(28),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])

test_transform = transforms.Compose([
    transforms.Resize((28,28)),
    transforms.ToTensor()
])

# train_transform = transforms.Compose([ # Compose chains several data augmentation methods
#     transforms.RandomResizedCrop((28,28)),
#     transforms.RandomHorizontalFlip(),
#     transforms.RandomVerticalFlip(),
#     transforms.RandomRotation(90),
#     transforms.RandomGrayscale(0.1),
#     transforms.ColorJitter(0.3, 0.3, 0.3, 0.3),
#     transforms.ToTensor()
# ])

class MyDataset(Dataset): # custom dataset class
    def __init__(self, im_list, transform = None, loader = default_loader):
        super(MyDataset, self).__init__()
        imgs = []
        for im_item in im_list:
            # The parent folder name is the class name (works with either path separator)
            im_label_name = os.path.basename(os.path.dirname(im_item))
            imgs.append([im_item, label_dict[im_label_name]]) # image path and its label

        self.imgs = imgs
        self.transform = transform
        self.loader = loader

    def __getitem__(self, index): # how a single sample is read
        im_path, im_label = self.imgs[index]
        im_data = self.loader(im_path)

        if self.transform is not None:
            im_data = self.transform(im_data)

        return im_data, im_label

    def __len__(self): # number of samples
        return len(self.imgs)

im_train_list = glob.glob("D:\\App_Data_File\\VScode_Project\\Python\\Pytorch\\Cifar10\\cifar10_batches_py\\TRAIN\\*\\*.png")
im_test_list = glob.glob("D:\\App_Data_File\\VScode_Project\\Python\\Pytorch\\Cifar10\\cifar10_batches_py\\TEST\\*\\*.png")

train_dataset = MyDataset(im_train_list, transform = train_transform)

test_dataset = MyDataset(im_test_list, transform = test_transform)

train_loader = DataLoader(dataset = train_dataset,
                          batch_size = 128,
                          shuffle = True,
                          num_workers = 4)

test_loader = DataLoader(dataset = test_dataset,
                         batch_size = 128,
                         shuffle = False,
                         num_workers = 4)

print("num_of_train", len(train_dataset))
print("num_of_test", len(test_dataset))

Output:

num_of_train 50000
num_of_test 10000
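
The dataset class derives each label from the name of the folder that contains the image. A minimal sketch of that lookup using pathlib is shown below; the function name and the example paths are hypothetical, and the `path_type` argument exists only so the same code can be exercised with Windows-style and POSIX-style paths on any machine.

```python
from pathlib import PurePosixPath, PureWindowsPath

label_name = ["airplane", "automobile", "bird",
              "cat", "deer", "dog", "frog",
              "horse", "ship", "truck"]
label_dict = {name: idx for idx, name in enumerate(label_name)}


def label_from_path(path, path_type=PureWindowsPath):
    """Look up the integer label from the class folder that contains the image."""
    class_folder = path_type(path).parent.name
    return label_dict[class_folder]


# Hypothetical file paths, used only to exercise the helper.
windows_label = label_from_path("D:\\data\\TRAIN\\cat\\some_cat_001.png")
posix_label = label_from_path("/data/TRAIN/truck/some_truck_001.png", PurePosixPath)
print(windows_label, posix_label)  # 3 9
```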

3.Building and training VGGNet

  • vggnet.py
import torch
import torch.nn as nn
import torch.nn.functional as F

class VGGbase(nn.Module):
    def __init__(self):
        super(VGGbase, self).__init__()

        # The input is a 3*28*28 image
        self.conv1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size = 3, stride = 1, padding = 1),
                                   nn.BatchNorm2d(64),
                                   nn.ReLU())
        self.max_pooling1 = nn.MaxPool2d(kernel_size = 2, stride = 2) # feature map is now 14*14

        self.conv2_1 = nn.Sequential(nn.Conv2d(64, 128, kernel_size = 3, stride = 1, padding = 1),
                                     nn.BatchNorm2d(128),
                                     nn.ReLU())
        self.conv2_2 = nn.Sequential(nn.Conv2d(128, 128, kernel_size = 3, stride = 1, padding = 1),
                                     nn.BatchNorm2d(128),
                                     nn.ReLU())
        self.max_pooling2 = nn.MaxPool2d(kernel_size = 2, stride = 2) # feature map is now 7*7

        self.conv3_1 = nn.Sequential(nn.Conv2d(128, 256, kernel_size = 3, stride = 1, padding = 1),
                                     nn.BatchNorm2d(256),
                                     nn.ReLU())
        self.conv3_2 = nn.Sequential(nn.Conv2d(256, 256, kernel_size = 3, stride = 1, padding = 1),
                                     nn.BatchNorm2d(256),
                                     nn.ReLU())
        self.max_pooling3 = nn.MaxPool2d(kernel_size = 2, stride = 2, padding = 1) # with padding, feature map is now 4*4

        self.conv4_1 = nn.Sequential(nn.Conv2d(256, 512, kernel_size = 3, stride = 1, padding = 1),
                                     nn.BatchNorm2d(512),
                                     nn.ReLU())
        self.conv4_2 = nn.Sequential(nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
                                     nn.BatchNorm2d(512),
                                     nn.ReLU())
        self.max_pooling4 = nn.MaxPool2d(kernel_size = 2, stride = 2) # feature map is now 2*2

        self.fc = nn.Linear(512 * 4, 10) # 512 channels * 2 * 2 positions

    def forward(self, x):
        batchsize = x.size(0)
        out = self.conv1(x)
        out = self.max_pooling1(out)

        out = self.conv2_1(out)
        out = self.conv2_2(out)
        out = self.max_pooling2(out)

        out = self.conv3_1(out)
        out = self.conv3_2(out)
        out = self.max_pooling3(out)

        out = self.conv4_1(out)
        out = self.conv4_2(out)
        out = self.max_pooling4(out)

        out = out.view(batchsize, -1) # flatten to (batchsize, 2048)
        out = self.fc(out)
        out = F.log_softmax(out, dim = 1)

        return out

def VGGNet():
    return VGGbase()
  • Results (after 6 epochs of training):

    image-20230814204637731 image-20230814204651060

    image-20230814204554305 image-20230814204604844

    image-20230814204423334 image-20230814204523272

  • Input images as visualized in TensorBoard:

    image-20230814204834806 image-20230814204904235
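
The feature-map sizes noted in the comments of vggnet.py, and the 512 * 4 input size of the final linear layer, can be re-derived with the shape formula. The sketch below is a plain-Python check under the assumption of 28×28 inputs (as produced by the transforms in load_cifar10.py); `out_size` is a made-up helper.

```python
def out_size(size_in, kernel_size, stride, padding):
    """Shared output-size rule for Conv2d and MaxPool2d (dilation 1, floor rounding)."""
    return (size_in + 2 * padding - kernel_size) // stride + 1


size = 28  # side length after RandomCrop(28) or Resize((28, 28))
sizes = []
for pool_padding in (0, 0, 1, 0):        # padding of max_pooling1..4 in VGGbase
    size = out_size(size, 3, 1, 1)       # 3x3, stride-1, padding-1 convolutions keep the size
    size = out_size(size, 2, 2, pool_padding)
    sizes.append(size)

flat_features = 512 * size * size        # channels * height * width before the linear layer
print(sizes, flat_features)  # [14, 7, 4, 2] 2048
```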

4.Building and training ResNet

  • resnet.py
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, in_channel, out_channel, stride = 1):
        super(ResBlock, self).__init__()
        self.layer = nn.Sequential( # main branch
            nn.Conv2d(in_channel, out_channel, kernel_size = 3, stride = stride, padding = 1),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(),
            nn.Conv2d(out_channel, out_channel, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(out_channel)
        )
        self.shortcut = nn.Sequential() # identity when the shapes already match
        if in_channel != out_channel or stride > 1:
            self.shortcut = nn.Sequential( # skip branch
                nn.Conv2d(in_channel, out_channel, kernel_size = 3, stride = stride, padding = 1),
                nn.BatchNorm2d(out_channel) # makes both branches the same shape before the addition
            )

    def forward(self, x):
        out1 = self.layer(x)
        out2 = self.shortcut(x)
        out = out1 + out2
        out = F.relu(out)
        return out

class ResNet(nn.Module):
    def make_layer(self, block, out_channel, stride, num_block): # build one layer out of several blocks
        layers_list = []
        for i in range(num_block):
            if i == 0:
                in_stride = stride # only the first block of a layer applies the stride, so each layer divides the size by stride once
            else:
                in_stride = 1
            layers_list.append(block(self.in_channel, out_channel, in_stride))
            self.in_channel = out_channel
        return nn.Sequential(*layers_list)

    def __init__(self, ResBlock):
        super(ResNet, self).__init__()
        self.in_channel = 32 # channel count entering the next block; updated by make_layer
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(32),
            nn.ReLU()
        )

        self.layer1 = self.make_layer(ResBlock, 64, 2, 2)
        self.layer2 = self.make_layer(ResBlock, 128, 2, 2)
        self.layer3 = self.make_layer(ResBlock, 256, 2, 2)
        self.layer4 = self.make_layer(ResBlock, 512, 2, 2)

        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)

        out = F.avg_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        out = F.log_softmax(out, dim = 1)

        return out

def resnet():
    return ResNet(ResBlock)
  • Results (after 6 epochs of training):

    image-20230815205510074 image-20230815205522354

    image-20230815205551675 image-20230815205607429

    image-20230815205619903 image-20230815205628871

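5.Building and training MobileNet

  • Depthwise separable convolution

image-20230815204836730

  • In the depthwise (dw) convolution, the number of output channels equals the number of input channels, each kernel has a single channel, and the input channels are split into in_channel groups (groups = in_channel)
  • The pointwise (pw) convolution takes the dw output as its input and uses a kernel of size 1
  • MobileNet reduces the computational cost of the model
  • mobilenetv1.py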
# Lightweight network: reduces the computational cost of the model
import torch
import torch.nn as nn
import torch.nn.functional as F

class mobilenet(nn.Module):

    def conv_dw_pw(self, in_channel, out_channel, stride):
        return nn.Sequential(
            # dw: groups = in_channel gives every input channel its own 3x3 filter
            nn.Conv2d(in_channel, in_channel, kernel_size = 3, stride = stride,
                      padding = 1, groups = in_channel, bias = False),
            nn.BatchNorm2d(in_channel),
            nn.ReLU(),

            # pw: 1x1 convolution that mixes the channels
            nn.Conv2d(in_channel, out_channel, kernel_size = 1, stride = 1,
                      padding = 0, bias = False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU()
        )

    def __init__(self):
        super(mobilenet, self).__init__()

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(32),
            nn.ReLU()
        )
        self.conv_dw_pw2 = self.conv_dw_pw(32, 32, 1)
        self.conv_dw_pw3 = self.conv_dw_pw(32, 64, 2)

        self.conv_dw_pw4 = self.conv_dw_pw(64, 64, 1)
        self.conv_dw_pw5 = self.conv_dw_pw(64, 128, 2)

        self.conv_dw_pw6 = self.conv_dw_pw(128, 128, 1)
        self.conv_dw_pw7 = self.conv_dw_pw(128, 256, 2)

        self.conv_dw_pw8 = self.conv_dw_pw(256, 256, 1)
        self.conv_dw_pw9 = self.conv_dw_pw(256, 512, 2)

        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv_dw_pw2(out)
        out = self.conv_dw_pw3(out)
        out = self.conv_dw_pw4(out)
        out = self.conv_dw_pw5(out)
        out = self.conv_dw_pw6(out)
        out = self.conv_dw_pw7(out)
        out = self.conv_dw_pw8(out)
        out = self.conv_dw_pw9(out)

        out = F.avg_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        out = F.log_softmax(out, dim = 1)

        return out

def mobilenetv1_small():
    return mobilenet()
  • Results (after 6 epochs of training):

    image-20230815221611203 image-20230815221623192

    image-20230815221649154 image-20230815221725125

    image-20230815221746554 image-20230815221755811
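
How much a depthwise separable convolution saves can be estimated by counting weights. The sketch below compares a standard 3×3 convolution with the dw + pw pair for the 256 → 512 channel step used in mobilenetv1.py; bias terms and BatchNorm parameters are ignored, and the helper names are assumptions made for this illustration.

```python
def standard_conv_weights(in_channels, out_channels, kernel_size=3):
    """Weights of an ordinary convolution: every output channel sees every input channel."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_weights(in_channels, out_channels, kernel_size=3):
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_weights(256, 512)
separable = depthwise_separable_weights(256, 512)
print(standard, separable, round(standard / separable, 1))  # 1179648 133376 8.8
```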

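6.Building and training a network of Inception modules

  • 1x1 convolutions reduce the channel dimension before the larger kernels (and indirectly add depth to the network), which cuts the number of parameters

    image-20230816143945162

  • inceptionModule.py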

import torch
import torch.nn as nn
import torch.nn.functional as F

def ConvBNRelu(in_channel, out_channel, kernel_size):
    # Keeps the spatial size and changes only the channel count (holds only for odd kernel_size)
    return nn.Sequential(
        nn.Conv2d(in_channel, out_channel, kernel_size = kernel_size,
                  stride = 1, padding = kernel_size // 2),
        nn.BatchNorm2d(out_channel),
        nn.ReLU()
    )

class BaseInception(nn.Module):
    def __init__(self, in_channel, out_channel_list, reduce_channel_list):
        super(BaseInception, self).__init__()
        # Branch 1: 1x1 convolution
        self.branch1_conv = ConvBNRelu(in_channel, out_channel_list[0], 1)

        # Branch 2: 1x1 reduction followed by a 3x3 convolution
        self.branch2_conv1 = ConvBNRelu(in_channel, reduce_channel_list[0], 1)
        self.branch2_conv2 = ConvBNRelu(reduce_channel_list[0], out_channel_list[1], 3)

        # Branch 3: 1x1 reduction followed by a 5x5 convolution
        self.branch3_conv1 = ConvBNRelu(in_channel, reduce_channel_list[1], 1)
        self.branch3_conv2 = ConvBNRelu(reduce_channel_list[1], out_channel_list[2], 5)

        # Branch 4: 3x3 max pooling followed by a 1x1 convolution
        self.branch4_pool = nn.MaxPool2d(kernel_size = 3, stride = 1, padding = 1)
        self.branch4_conv = ConvBNRelu(in_channel, out_channel_list[3], 1)

    def forward(self, x):
        out1 = self.branch1_conv(x)

        out2 = self.branch2_conv1(x)
        out2 = self.branch2_conv2(out2)

        out3 = self.branch3_conv1(x)
        out3 = self.branch3_conv2(out3)

        out4 = self.branch4_pool(x)
        out4 = self.branch4_conv(out4)

        out = torch.cat([out1, out2, out3, out4], dim = 1) # concatenate along the channel dimension

        return out

class InceptionNet(nn.Module):
    def __init__(self):
        super(InceptionNet, self).__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        ) # 12*12

        self.block2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size = 3, stride = 2, padding = 1),
            nn.BatchNorm2d(128),
            nn.ReLU()
        ) # 6*6

        self.block3 = nn.Sequential(
            BaseInception(in_channel = 128, out_channel_list = [64, 64, 64, 64],
                          reduce_channel_list = [16, 16]),
            nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
        ) # 3*3

        self.block4 = nn.Sequential(
            BaseInception(in_channel = 256, out_channel_list = [96, 96, 96, 96],
                          reduce_channel_list = [32, 32]),
            nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
        ) # 2*2

        self.fc = nn.Linear(96*4, 10) # the four branches of block4 give 96 * 4 channels

    def forward(self, x):
        out = self.block1(x)
        out = self.block2(out)
        out = self.block3(out)
        out = self.block4(out)

        out = F.avg_pool2d(out, 2) # 1*1
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        out = F.log_softmax(out, dim = 1)

        return out

def InceptionNetSmall():
    return InceptionNet()
  • Results (after 6 epochs of training):

    image-20230816150820224 image-20230816150828915

    image-20230816150854735 image-20230816150904546

    image-20230816150920921 image-20230816150934482
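
The saving from the 1x1 reduction can be checked by counting weights for the 5x5 branch of the first Inception block (128 input channels, reduced to 16, 64 output channels). The sketch ignores biases and BatchNorm parameters, and `conv_weights` is a hypothetical helper.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in one convolution layer, ignoring the bias."""
    return kernel_size * kernel_size * in_channels * out_channels


# Channel counts of the 5x5 branch in block3: 128 input, reduced to 16, 64 output.
direct = conv_weights(128, 64, 5)                                # 5x5 applied to all 128 channels
reduced = conv_weights(128, 16, 1) + conv_weights(16, 64, 5)     # 1x1 reduction, then 5x5
print(direct, reduced)  # 204800 27648
```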

7.Building and training the ResNet18 that ships with PyTorch

  • pytorch_resnet18.py
# The resnet18 network provided by PyTorch (torchvision)
import torch.nn as nn
from torchvision import models
import torch.nn.functional as F

class resnet18(nn.Module):
    def __init__(self):
        super(resnet18, self).__init__()
        # Start from the ImageNet-pretrained weights
        self.model = models.resnet18(weights = models.ResNet18_Weights.IMAGENET1K_V1)
        self.num_features = self.model.fc.in_features
        # Replace the final fully connected layer with a 10-class output for CIFAR-10
        self.model.fc = nn.Linear(self.num_features, 10)

    def forward(self, x):
        out = self.model(x)
        out = F.log_softmax(out, dim = 1)
        return out

def pytorch_resnet18():
    return resnet18()
  • Results (after 6 epochs of training):

    image-20230816171416787 image-20230816171440663

    image-20230816171456684 image-20230816171507867

    image-20230816171521234 image-20230816171529364

8.Model training code

  • train.py
import os

import tensorboardX
import torch
import torch.nn as nn
import torchvision

from vggnet import VGGNet
from resnet import resnet
from mobilenetv1 import mobilenetv1_small
from inceptionModule import InceptionNetSmall
from pytorch_resnet18 import pytorch_resnet18
from load_cifar10 import train_loader, test_loader, train_dataset, test_dataset

def main():
    model_path = "models/pytorch_resnet18"
    log_path = "logs/pytorch_resnet18"
    # os.mkdir creates a single directory level; os.makedirs also creates missing parent directories
    os.makedirs(model_path, exist_ok = True)
    os.makedirs(log_path, exist_ok = True)
    writer = tensorboardX.SummaryWriter(log_path)

    # Use the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Number of passes over the training set
    epoch_num = 200

    # Initial learning rate
    lr = 0.01

    # Move the network to the device
    #net = VGGNet().to(device)
    #net = resnet().to(device)
    #net = mobilenetv1_small().to(device)
    #net = InceptionNetSmall().to(device)
    net = pytorch_resnet18().to(device)

    # Loss function; the networks already end with log_softmax, so NLLLoss is used
    loss_func = nn.NLLLoss() # reduction='mean' by default

    # Optimizer and learning-rate schedule
    optimizer = torch.optim.Adam(net.parameters(), lr = lr)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size = 5, gamma = 0.9) # step_size = 5: multiply the learning rate by gamma every 5 epochs

    step_n_train = 0
    step_n_test = 0
    for epoch in range(epoch_num):
        # Train on the training set
        net.train()
        sum_loss = 0
        sum_correct = 0
        for i, data in enumerate(train_loader):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            outputs = net(inputs)
            loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Accuracy of this mini-batch
            batch_size = outputs.size(0)
            _, pred = torch.max(outputs, dim = 1)
            correct = (pred == labels).sum().item()
            # print("Train step ", i, " mini-batch loss is:", loss.item(), "mini-batch correct is:", 100.0 * correct / batch_size)

            writer.add_scalar("Train loss", loss.item(), global_step = step_n_train)
            writer.add_scalar("Train correct", 100.0 * correct / batch_size, global_step = step_n_train)

            train_im = torchvision.utils.make_grid(inputs)
            writer.add_image("train im", train_im, global_step = step_n_train)
            step_n_train = step_n_train + 1

            # Accumulate the loss and the number of correct predictions
            sum_loss += loss.item()
            sum_correct += correct

        # Mean loss per mini-batch and accuracy over the whole training set
        train_loss = sum_loss / len(train_loader)
        train_correct = sum_correct * 100.0 / len(train_dataset)

        print("Train epoch is ", epoch + 1, " epoch loss is:", train_loss,
              "epoch correct is:", train_correct)

        # Save the model parameters after every epoch
        torch.save(net.state_dict(), "{}/{}.path".format(model_path, epoch + 1))

        # Update the learning rate after every epoch
        scheduler.step()
        print("lr is ", optimizer.state_dict()["param_groups"][0]["lr"]) # print the current learning rate

        # Evaluate on the test set
        net.eval()
        sum_loss = 0
        sum_correct = 0
        for i, data in enumerate(test_loader):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            with torch.no_grad():
                outputs = net(inputs)
                loss = loss_func(outputs, labels)

            # Accuracy of this mini-batch
            batch_size = outputs.size(0)
            _, pred = torch.max(outputs, dim = 1)
            correct = (pred == labels).sum().item()

            writer.add_scalar("Test loss", loss.item(), global_step = step_n_test)
            writer.add_scalar("Test correct", 100.0 * correct / batch_size, global_step = step_n_test)

            test_im = torchvision.utils.make_grid(inputs)
            writer.add_image("test im", test_im, global_step = step_n_test)
            step_n_test = step_n_test + 1

            # Accumulate the loss and the number of correct predictions
            sum_loss += loss.item()
            sum_correct += correct

        test_loss = sum_loss / len(test_loader)
        test_correct = sum_correct * 100.0 / len(test_dataset)

        writer.add_scalar("Batch Test loss", test_loss, global_step = epoch + 1)
        writer.add_scalar("Batch Test correct", test_correct, global_step = epoch + 1)

        print("Test epoch is ", epoch + 1, " epoch loss is:", test_loss,
              "epoch correct is:", test_correct)

    writer.close()

# The __main__ guard is needed on Windows when the DataLoader uses num_workers > 0
if __name__ == "__main__":
    main()
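  • Using the TensorBoard page: once the program has started running, launch TensorBoard from a terminal with the command below, where the path is the location of the log folder
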
    tensorboard --logdir="D:\App_Data_File\VScode_Project\Python\Pytorch\Cifar10\logs\InctionNetSmall"
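
The StepLR schedule configured in train.py (step_size = 5, gamma = 0.9, initial learning rate 0.01) follows a simple closed form, sketched below in plain Python. `step_lr` is a made-up helper that reproduces the rule, not a call into torch.optim.

```python
def step_lr(initial_lr, epoch, step_size=5, gamma=0.9):
    """Learning rate in effect during a given 0-based epoch under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Epochs 0-4 use the initial rate, epochs 5-9 use 0.9 times it, and so on.
schedule = [round(step_lr(0.01, epoch), 6) for epoch in (0, 4, 5, 10)]
print(schedule)  # [0.01, 0.01, 0.009, 0.0081]
```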

9.Model evaluation

  • test.py
import cv2
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

import load_cifar10
from pytorch_resnet18 import pytorch_resnet18

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the test data
test_data = load_cifar10.test_dataset
test_loader = DataLoader(dataset = test_data, batch_size = 128, shuffle = True)

# Load the trained model and its parameters
net = pytorch_resnet18().to(device)
net.load_state_dict(torch.load("D:\\App_Data_File\\VScode_Project\\Python\\Pytorch\\Cifar10\\models\\pytorch_resnet18\\11.path",
                               map_location = device))
net.eval() # evaluation mode: BatchNorm uses its running statistics

# Evaluate the model
loss_func = nn.NLLLoss()
loss_test = 0
accuracy_test = 0
# Test set
for i, (images, labels) in enumerate(test_loader):
    images, labels = images.to(device), labels.to(device)
    with torch.no_grad():
        outputs = net(images)
    # Accumulate the mean loss of each batch
    loss_test += loss_func(outputs, labels).item()
    # Count the correct predictions of each batch
    _, pred = outputs.max(1) # max over dimension 1, i.e. the 10 class scores of each image; pred is the index of the maximum
    accuracy_test += (pred == labels).sum().item()

    # Show each image with its label and prediction
    for idx in range(images.shape[0]):
        im_data = images[idx].cpu().numpy()
        im_data = np.ascontiguousarray(im_data.transpose(1, 2, 0)) # channel-first to height*width*channel
        im_data = cv2.cvtColor(im_data, cv2.COLOR_RGB2BGR) # cv2.imshow expects BGR order
        im_label = labels[idx].item()
        im_pred = pred[idx].item()

        print("label:", im_label, ",label_name:", load_cifar10.label_name[im_label])
        print("pred", im_pred, ",pred_name:", load_cifar10.label_name[im_pred])
        print(30*'-')
        cv2.imshow("imdata", im_data)
        cv2.waitKey(0) # press any key to move on to the next image

accuracy_test = accuracy_test / len(test_data)
loss_test = loss_test / len(test_loader) # mean loss per batch
print("test loss:", loss_test, "test accuracy:", accuracy_test)
  • Results:

    image-20230816174656948
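
Every network above ends with log_softmax and is scored with NLLLoss, and the prediction is the index of the largest output. The pure-Python sketch below walks through that computation for a single sample; the scores are hypothetical and the two functions only illustrate the arithmetic, they are not the PyTorch implementations.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -log_probs[label]


logits = [2.0, 0.5, -1.0]  # hypothetical raw scores for 3 classes
log_probs = log_softmax(logits)

# Index of the largest log-probability, which is what outputs.max(1) returns per image
prediction = max(range(len(log_probs)), key=lambda i: log_probs[i])
loss = nll_loss(log_probs, label=0)
print(prediction, loss)
```

Because log_softmax followed by the negative log-likelihood is the cross-entropy, a confident correct prediction gives a loss close to 0.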