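1. Layers and Blocks

1.1 Custom Blocks

A block (module) can describe a single layer, a component made up of several layers, or the entire neural network model itself.
- Complex modules can in turn be built from simpler modules.
From a programming standpoint, a block is represented by a class, usually one that inherits from torch's nn.Module:
- define the __init__ constructor, which declares the layers the block needs;
- define the forward function, which specifies how the model turns its input into its output;
- thanks to torch's automatic differentiation, the backward pass is derived automatically and yields the gradients used to update the model parameters.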

import torch
from torch import nn
from torch.nn import functional as F

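class MLP(nn.Module):
    # Declare the layers that hold the model parameters: here, two fully connected layers
    def __init__(self):
        # Call the constructor of MLP's parent class, Module, to perform the required initialization.
        # This way, other arguments can also be passed at instantiation, such as the model parameters params (introduced later)
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # hidden layer
        self.out = nn.Linear(256, 10)  # output layer

    # Define the forward pass: how to compute the model output from the input X
    def forward(self, X):
        # Note the functional version of ReLU used here, defined in the nn.functional module.
        return self.out(F.relu(self.hidden(X)))

# Instantiate the network
net = MLP()
# Dummy input
X = torch.rand(2, 20)
# Compute the output
net(X)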

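The torch.nn.functional module contains many commonly used functions for building neural networks; it can be used to define loss functions, activation functions, pooling operations, and so on.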
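
As a rough illustration of the note above, the following sketch compares a few functions from torch.nn.functional with their module counterparts. The tensor shapes and the target labels are made up for the example.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
scores = torch.randn(2, 10)      # made-up scores: 2 samples, 10 classes
target = torch.tensor([3, 7])    # made-up class labels

# Activation: F.relu is the functional form of nn.ReLU
same_activation = torch.equal(F.relu(scores), nn.ReLU()(scores))

# Loss: F.cross_entropy is the functional form of nn.CrossEntropyLoss
same_loss = torch.allclose(F.cross_entropy(scores, target),
                           nn.CrossEntropyLoss()(scores, target))

# Pooling: max over windows of 2 along the last dimension, (2, 1, 10) -> (2, 1, 5)
pooled = F.max_pool1d(scores.unsqueeze(1), kernel_size=2)

print(same_activation, same_loss, pooled.shape)
```

The functional forms hold no state of their own, which is why they are convenient inside forward.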

1.2 Sequential Blocks

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)
net(X)

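To write our own Sequential class, we need two key functions:
- a constructor that adds the blocks to a list one by one;
- a forward function that computes the output by passing the input through the blocks in order.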

class MySequential(nn.Module):
    # args holds the blocks to be stacked
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # Here, module is an instance of a Module subclass. We store it in the
            # Module class's member variable _modules, an ordered dictionary
            self._modules[str(idx)] = module

    def forward(self, X):
        # The ordered dictionary guarantees that members are traversed in the order they were added
        for block in self._modules.values():
            X = block(X)
        return X
    
net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net(X)
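
Why store the blocks in _modules rather than in an ordinary Python list? A minimal sketch of the difference, using two hypothetical classes (PlainListNet and ModuleListNet are names invented here): layers kept in a plain list are not registered, so their parameters are invisible to parameters(), while nn.ModuleList registers them.

```python
import torch
from torch import nn

class PlainListNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, *layers):
        super().__init__()
        self.layers = list(layers)  # plain Python list: layers are NOT registered

    def forward(self, X):
        for layer in self.layers:
            X = layer(X)
        return X

class ModuleListNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, *layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # layers are registered as submodules

    def forward(self, X):
        for layer in self.layers:
            X = layer(X)
        return X

def make_layers():
    return nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)

plain = PlainListNet(*make_layers())
registered = ModuleListNet(*make_layers())

X = torch.rand(2, 20)
n_plain = len(list(plain.parameters()))            # nothing for an optimizer to update
n_registered = len(list(registered.parameters()))  # two weights and two biases
out_shape = registered(X).shape
print(n_plain, n_registered, out_shape)
```

Both variants compute the same forward pass; only the registered one exposes its parameters for training, saving, and device moves.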

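1.3 Executing Code in the Forward Function

Flexible module definition

The forward pass can run Python control flow and perform arbitrary mathematical operations instead of relying only on predefined layers;
the constructor can also define constant parameters, which are neither the result of a previous layer nor updated during training.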

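class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weights for which no gradient is computed, so they stay constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        # Use the constant weights together with the relu and mm functions
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the fully connected layer: equivalent to two fully connected layers sharing parameters
        X = self.linear(X)
        # Control flow: while the L1 norm is greater than 1, halve the output
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()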

net = FixedHiddenMLP()
net(X)
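
A constant tensor kept as a plain attribute, as in FixedHiddenMLP, is not included in state_dict() and is not moved by net.to(device). One common alternative, which the code above does not use, is register_buffer. The sketch below uses a hypothetical class name, FixedHiddenMLPBuffer, to show the idea.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLPBuffer(nn.Module):  # hypothetical variant, for illustration only
    def __init__(self):
        super().__init__()
        # A buffer is saved in state_dict and follows .to(device), but is not a trainable parameter
        self.register_buffer("rand_weight", torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X = X / 2
        return X.sum()

net = FixedHiddenMLPBuffer()
out = net(torch.rand(2, 20))

param_names = [name for name, _ in net.named_parameters()]
state_keys = list(net.state_dict().keys())
print(param_names)
print(state_keys)
```

Here rand_weight shows up among the state_dict keys but not among the named parameters, so an optimizer never touches it.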

Flexible module composition

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
chimera(X)

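2. Parameter Management

The goal of training is to find the parameter values that minimize the loss function;
once the best model is found, its parameters are extracted and saved so that they can be reused in other environments.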

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

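2.1 Parameter Access

For a model defined with the Sequential class, any layer can be accessed by index.
Below we inspect the parameters of the second fully connected layer, namely its weight and bias.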

print(net[2].state_dict())
# OrderedDict([('weight', tensor([[-0.3138, -0.0693, -0.0505,  0.0699,  0.2249, -0.1000, -0.0449, -0.0910]])), ('bias', tensor([0.1179]))])

Target parameters

print(type(net[2].bias))
# <class 'torch.nn.parameter.Parameter'>
print(net[2].bias)
# Parameter containing:
# tensor([0.1179], requires_grad=True)
print(net[2].bias.data)
# tensor([0.1179])

net[2].weight.grad is None
# True
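
The gradient is None only because no backward pass has run yet. A small sketch, reusing the same network shape, of how .grad is filled in once backward() is called (the scalar loss, a plain sum of the outputs, is an arbitrary choice for the example):

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))

grad_is_none_before = net[2].weight.grad is None

# Any scalar works as a stand-in loss; here we simply sum the outputs
net(X).sum().backward()

grad_shape_after = net[2].weight.grad.shape  # same shape as the weight itself
print(grad_is_none_before, grad_shape_after)
```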

Accessing all parameters at once

## Access the parameters of the first fully connected layer
print(*[(name, param.shape) for name, param in net[0].named_parameters()])
# ('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))

## Access the parameters of all layers
print(*[(name, param.shape) for name, param in net.named_parameters()])
# ('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8]))
# ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))

Collecting parameters from nested blocks

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # Nest the blocks here
        net.add_module(f'block {i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
X = torch.rand(size=(2, 4))
rgnet(X)

# Access the bias of the first layer in the second sub-block of the first main block
rgnet[0][1][0].bias.data
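
Nested indexing and the dotted names from named_parameters() refer to the same objects. The following sketch rebuilds the nested network and checks that correspondence; the key '0.block 1.0.bias' follows from the names passed to add_module.

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block {i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

params = dict(rgnet.named_parameters())
names = list(params)

# The dotted name spells out the path: main block 0 -> 'block 1' -> layer 0 -> bias
same_object = params['0.block 1.0.bias'] is rgnet[0][1][0].bias
print(len(names), names[:2], same_object)
```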

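2.2 Parameter Initialization

By default, PyTorch initializes weight and bias matrices uniformly within a range computed from the layer's dimensions.
In addition, PyTorch's nn.init module provides a variety of preset initialization methods.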
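
To make the default range concrete: assuming the behaviour of recent PyTorch releases, nn.Linear draws both weight and bias from a uniform distribution bounded by 1/sqrt(in_features). This is an implementation detail that may differ between versions, so treat the sketch below as a check rather than a guarantee.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(100, 50)                 # in_features = 100
bound = 1 / math.sqrt(layer.in_features)   # assumed default bound, 0.1 here

w_max = layer.weight.abs().max().item()
b_max = layer.bias.abs().max().item()
print(bound, w_max, b_max)
```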

Built-in initialization

nn.init.normal_ initializes a parameter from a normal distribution
nn.init.zeros_ initializes a parameter to 0

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)
net.apply(init_normal)

nn.init.constant_() initializes a parameter to a given constant
nn.init.xavier_uniform_() applies Xavier (uniform) initialization

def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)
def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)
        
net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)
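
One way to sanity-check the two initializers is to compare the result with what each should produce. Xavier uniform samples from a range bounded by sqrt(6 / (fan_in + fan_out)); the small tolerance in the comparison is only there to absorb float32 rounding.

```python
import math
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def init_xavier(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)

def init_42(m):
    if isinstance(m, nn.Linear):
        nn.init.constant_(m.weight, 42)

net[0].apply(init_xavier)
net[2].apply(init_42)

# For nn.Linear(4, 8): fan_in = 4, fan_out = 8
xavier_bound = math.sqrt(6 / (4 + 8))
within_bound = bool((net[0].weight.abs() <= xavier_bound + 1e-6).all())
all_42 = bool((net[2].weight == 42).all())
print(within_bound, all_42)
```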

Custom initialization

def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5

net.apply(my_init)
net[0].weight[:2]

# Modify the parameters directly
net[0].weight.data[:] += 1
net[0].weight.data[0, 0] = 42
net[0].weight.data[0]
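
The same custom rule can also be written without touching .data, by wrapping the in-place edits in torch.no_grad(). This is a sketch of that alternative style, not the form used above; it also checks that every weight ends up either zero or with magnitude at least 5.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def my_init(m):
    if isinstance(m, nn.Linear):
        nn.init.uniform_(m.weight, -10, 10)
        with torch.no_grad():
            # Zero out every entry whose magnitude is below 5
            m.weight.mul_(m.weight.abs() >= 5)

net.apply(my_init)
w = net[0].weight
only_zero_or_large = bool(((w == 0) | (w.abs() >= 5)).all())

# Direct edits, again outside of autograd tracking
with torch.no_grad():
    w += 1
    w[0, 0] = 42

print(only_zero_or_large, w[0, 0].item())
```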

2.3 Tied Parameters

Define a shared layer or block and use it in several places, so that those places share the same parameters.

# Give the shared layer a name so that its parameters can be referenced
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))
net(X)
# Check whether the parameters are equal
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure they are actually the same object rather than merely having the same value
print(net[2].weight.data[0] == net[4].weight.data[0])
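
Because the two positions hold one and the same module, the shared tensors are counted once and receive a single gradient that accumulates the contributions from both uses. A short sketch of these consequences (the summed output is an arbitrary stand-in for a loss):

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

same_module = net[2] is net[4]

# parameters() skips duplicates: 3 distinct Linear layers -> 6 tensors
n_param_tensors = len(list(net.parameters()))

X = torch.rand(2, 4)
net(X).sum().backward()

# There is only one gradient tensor for the shared weight
same_grad = net[2].weight.grad is net[4].weight.grad
grad_shape = net[2].weight.grad.shape
print(same_module, n_param_tensors, same_grad, grad_shape)
```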

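3. Custom Layers

Custom layers also inherit from nn.Module and can be adapted flexibly to architectures for all kinds of tasks.

3.1 Layers without Parameters

Below we define a layer that subtracts the mean from its input.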

import torch
import torch.nn.functional as F
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()
    
layer = CenteredLayer()
layer(torch.FloatTensor([1, 2, 3, 4, 5]))

Use this layer as a component inside a more complex model/module

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
Y = net(torch.rand(4, 8))
Y.mean()

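3.2 Layers with Parameters

Use nn.Parameter to create parameters.
Below we define a custom fully connected layer: in_units sets the number of inputs and units sets the number of outputs.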

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))
    def forward(self, X):
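        # Use the parameters themselves (not .data) so that autograd can track them
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

linear = MyLinear(5, 3)
linear.weight

# Compute the output
linear(torch.rand(2, 5))

# Build a more complex model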
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))
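
Reading a parameter through .data inside forward detaches it from autograd, so the choice between self.weight and self.weight.data decides whether the layer can be trained. A small sketch with two hypothetical variants (LinearDirect and LinearDetached are names invented here):

```python
import torch
from torch import nn
from torch.nn import functional as F

class LinearDirect(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units))

    def forward(self, X):
        # Parameters used directly: the output is connected to them in the graph
        return F.relu(torch.matmul(X, self.weight) + self.bias)

class LinearDetached(LinearDirect):  # hypothetical name, for illustration only
    def forward(self, X):
        # .data bypasses autograd: no gradient can reach weight or bias
        return F.relu(torch.matmul(X, self.weight.data) + self.bias.data)

X = torch.rand(2, 5)
out_direct = LinearDirect(5, 3)(X)
out_detached = LinearDetached(5, 3)(X)
print(out_direct.requires_grad, out_detached.requires_grad)
```

Only the first variant produces an output that requires gradients, which is what backward() needs in order to update the layer.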

5. Reading and Writing Files

import torch
from torch import nn
from torch.nn import functional as F

5.1 Loading and Saving Tensors

A single tensor

# save
x = torch.arange(4)
torch.save(x, 'x-file')
# load
x2 = torch.load('x-file')
x2

A list of tensors

y = torch.zeros(4)
torch.save([x, y],'x-files')
x2, y2 = torch.load('x-files')
(x2, y2)

A dictionary

mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
mydict2

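5.2 Loading and Saving Model Parameters

Rather than saving the model object itself, torch supports saving all of the model's parameters;
to use them in a new environment, first define the model architecture separately, then load the previously trained parameters into it.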

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

# Save the model parameters
torch.save(net.state_dict(), 'mlp.params')

# Create a new model and load the saved parameters
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone

Y_clone = clone(X)
Y_clone == Y
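
The full save-and-restore round trip can be sketched in one self-contained piece. The temporary directory and the map_location='cpu' argument are additions for this example: the first avoids leaving files behind, the second lets parameters saved on a GPU machine be loaded where no GPU is present.

```python
import os
import tempfile

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mlp.params')
    torch.save(net.state_dict(), path)

    # The architecture comes from code; only the parameter values come from the file
    clone = MLP()
    clone.load_state_dict(torch.load(path, map_location='cpu'))

outputs_match = torch.equal(clone(X), Y)
keys = list(net.state_dict().keys())
print(outputs_match, keys)
```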

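6. GPU

By default, all variables and the associated computation are assigned to the CPU;
when training a neural network on a server with GPUs, we want the model parameters to live on the GPU.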

# The leading ! runs a shell command from within an IPython environment
!nvidia-smi

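6.1 Computing Devices

By default, tensors are created in main memory and computed on the CPU.
The device can be specified with torch.device():
- torch.device('cpu') refers to all physical CPUs and memory
- torch.device('cuda') refers to the current GPU, which is GPU 0 by default, so it is normally equivalent to torch.device('cuda:0')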

import torch
from torch import nn

torch.device('cpu'), torch.device('cuda'), torch.device('cuda:1')

# Query the number of available GPUs
torch.cuda.device_count()

Define two convenience functions for requesting GPUs

def try_gpu(i=0):  #@save
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Return all available GPUs
def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if there is no GPU."""
    devices = [torch.device(f'cuda:{i}')
             for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

try_gpu(), try_gpu(10), try_all_gpus()

6.2 Tensors and GPUs

Query the device a tensor lives on

x = torch.tensor([1, 2, 3])
x.device

Storing on the GPU

Specify the GPU at creation time

X = torch.ones(2, 3, device=try_gpu())
X

Y = torch.rand(2, 3, device=try_gpu(1))
Y

Copying

Copy data to another device; the operands of an operation such as Y + Z must live on the same device

Z = X.cuda(1)
print(X)
print(Z)

Y + Z
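
The snippet above needs two GPUs. A device-agnostic sketch of the same idea, which also runs on a CPU-only machine, uses .to(device); choosing the device via torch.cuda.is_available() is an assumption of this example rather than something the code above does.

```python
import torch

# Fall back to the CPU when no GPU is present
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

X = torch.ones(2, 3)                 # created on the CPU
Y = torch.rand(2, 3, device=device)  # created directly on the target device

Z = X.to(device)   # copies X only if it is not already on the target device
result = Y + Z     # both operands now live on the same device

# .to() returns the tensor itself when nothing needs to change
same_object = Z.to(device) is Z
print(result.device, same_object)
```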

6.3 Neural Networks and GPUs

Put the model parameters on the GPU

net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu())

net(X)

# Confirm that the model parameters are stored on the same GPU
net[0].weight.data.device
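
A slightly more general check is to verify that every parameter and the output share one device. The sketch below is written to run with or without a GPU; the device selection via torch.cuda.is_available() is an assumption of this example.

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = nn.Sequential(nn.Linear(3, 1)).to(device)
X = torch.ones(2, 3, device=device)
out = net(X)

# Collect the devices of all parameters; a consistent model yields exactly one
param_devices = {p.device for p in net.parameters()}
print(param_devices, out.device)
```

If the input and the parameters were on different devices, the forward call would raise an error instead of silently copying data.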
