Layer Normalization

This post uses a simple numerical example to illustrate what LayerNorm does. In the formula below (PyTorch's definition), \(\mathrm{E}[x]\) and \(\mathrm{Var}[x]\) (the biased variance) are computed over the dimensions given by normalized_shape, and \(\gamma\) and \(\beta\) are learnable elementwise parameters.

\[ y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta \]
Python
import pytorch_lightning as pl
import torch
import torch.nn as nn

pl.seed_everything(0)
Text Only
Seed set to 0

0
Python
input_tensor = torch.randint(0, 10, (2, 3, 4)).float()
input_tensor
Text Only
tensor([[[4., 9., 3., 0.],
         [3., 9., 7., 3.],
         [7., 3., 1., 6.]],

        [[6., 9., 8., 6.],
         [6., 8., 4., 3.],
         [6., 9., 1., 4.]]])

Examining the effect of LayerNorm

normalized_shape set to the last dimension

Below we use normalized_shape=4 as an example to verify what LayerNorm computes.

With normalized_shape=4, LayerNorm standardizes the 4 elements along the last dimension.

For the input tensor above, this means each row such as [4., 9., 3., 0.] is normalized on its own.
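As a quick hand check (using the biased variance, which is what LayerNorm uses), the arithmetic for this row is:

\[ \mu = \frac{4 + 9 + 3 + 0}{4} = 4, \qquad \sigma^2 = \frac{0^2 + 5^2 + (-1)^2 + (-4)^2}{4} = 10.5 \]

\[ y = \frac{[4, 9, 3, 0] - 4}{\sqrt{10.5 + 10^{-5}}} \approx [0.0000,\ 1.5430,\ -0.3086,\ -1.2344] \]

which matches the first row of the output below.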

Python
layer_norm = nn.LayerNorm(4)
Python
output_tensor = layer_norm(input_tensor)
output_tensor
Text Only
tensor([[[ 0.0000,  1.5430, -0.3086, -1.2344],
         [-0.9622,  1.3471,  0.5773, -0.9622],
         [ 1.1531, -0.5241, -1.3628,  0.7338]],

        [[-0.9622,  1.3471,  0.5773, -0.9622],
         [ 0.3906,  1.4321, -0.6509, -1.1717],
         [ 0.3430,  1.3720, -1.3720, -0.3430]]],
       grad_fn=<NativeLayerNormBackward0>)

Verifying the result by hand

Python
input_tensor.mean(dim=2, keepdim=True)
Text Only
tensor([[[4.0000],
         [5.5000],
         [4.2500]],

        [[7.2500],
         [5.2500],
         [5.0000]]])
Python
input_tensor.std(dim=2, unbiased=False, keepdim=True)
Text Only
tensor([[[3.2404],
         [2.5981],
         [2.3848]],

        [[1.2990],
         [1.9203],
         [2.9155]]])
Python
# Note: LayerNorm adds eps inside the square root, i.e. sqrt(Var[x] + eps);
# adding it to the std instead is a close approximation at eps = 1e-5.
(input_tensor - input_tensor.mean(dim=2, keepdim=True)) / (
    input_tensor.std(dim=2, unbiased=False, keepdim=True) + 1e-5
)
Text Only
tensor([[[ 0.0000,  1.5430, -0.3086, -1.2344],
         [-0.9622,  1.3471,  0.5773, -0.9622],
         [ 1.1531, -0.5241, -1.3628,  0.7338]],

        [[-0.9622,  1.3471,  0.5773, -0.9622],
         [ 0.3906,  1.4321, -0.6509, -1.1717],
         [ 0.3430,  1.3720, -1.3720, -0.3430]]])
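To confirm that the two results agree numerically, we can compare them with torch.allclose. Strictly speaking, LayerNorm adds \(\epsilon\) inside the square root (\(\sqrt{\mathrm{Var}[x] + \epsilon}\)), while the manual version above adds it to the standard deviation; at \(\epsilon = 10^{-5}\) the discrepancy is far below the tolerance used here.

Python
manual = (input_tensor - input_tensor.mean(dim=2, keepdim=True)) / (
    input_tensor.std(dim=2, unbiased=False, keepdim=True) + 1e-5
)
# The tolerance absorbs the slightly different placement of eps.
torch.allclose(output_tensor, manual, atol=1e-4)  # expected: True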

normalized_shape set to the last two dimensions

Below we use normalized_shape=[3, 4] as an example to verify what LayerNorm computes.

With normalized_shape=[3, 4], LayerNorm standardizes the 3×4 block of elements spanning the last two dimensions together.

For the input tensor above, this means the block

Text Only
[[4., 9., 3., 0.],
 [3., 9., 7., 3.],
 [7., 3., 1., 6.]]

is normalized as a whole.
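A hand check for this first 3×4 block (again with the biased variance):

\[ \mu = \frac{4+9+3+0+3+9+7+3+7+3+1+6}{12} = \frac{55}{12} \approx 4.5833, \qquad \sigma = \sqrt{\tfrac{1}{12}\textstyle\sum_i (x_i - \mu)^2} \approx 2.8419 \]

which matches the per-sample mean and standard deviation computed below.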

Python
layer_norm = nn.LayerNorm([3, 4])
Python
output_tensor = layer_norm(input_tensor)
output_tensor
Text Only
tensor([[[-0.2053,  1.5541, -0.5571, -1.6128],
         [-0.5571,  1.5541,  0.8504, -0.5571],
         [ 0.8504, -0.5571, -1.2609,  0.4985]],

        [[ 0.0702,  1.3335,  0.9124,  0.0702],
         [ 0.0702,  0.9124, -0.7720, -1.1932],
         [ 0.0702,  1.3335, -2.0354, -0.7720]]],
       grad_fn=<NativeLayerNormBackward0>)

Verifying the result by hand

Python
input_tensor.mean(dim=(1, 2), keepdim=True)
Text Only
tensor([[[4.5833]],

        [[5.8333]]])
Python
input_tensor.std(dim=(1, 2), unbiased=False, keepdim=True)
Text Only
tensor([[[2.8419]],

        [[2.3746]]])
Python
(input_tensor - input_tensor.mean(dim=(1, 2), keepdim=True)) / (
    input_tensor.std(dim=(1, 2), unbiased=False, keepdim=True) + 1e-5
)
Text Only
tensor([[[-0.2053,  1.5541, -0.5571, -1.6128],
         [-0.5571,  1.5541,  0.8504, -0.5571],
         [ 0.8504, -0.5571, -1.2609,  0.4985]],

        [[ 0.0702,  1.3335,  0.9124,  0.0702],
         [ 0.0702,  0.9124, -0.7720, -1.1932],
         [ 0.0702,  1.3335, -2.0354, -0.7720]]])
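The same result can also be obtained with the functional API, torch.nn.functional.layer_norm, which skips the affine transform when weight and bias are omitted. A minimal sketch:

Python
import torch.nn.functional as F

# Functional form of LayerNorm over the last two dimensions;
# weight and bias default to None, so no gamma/beta is applied.
F.layer_norm(input_tensor, [3, 4])  # expected to match output_tensor above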

\(\gamma\) and \(\beta\)

In LayerNorm, \(\gamma\) and \(\beta\) are learnable parameters that scale and shift the normalized values.

\(\gamma\) is initialized to ones and \(\beta\) to zeros, and their shapes match normalized_shape.

Python
layer_norm = nn.LayerNorm(4)
print(layer_norm.weight)
print(layer_norm.bias)
Text Only
Parameter containing:
tensor([1., 1., 1., 1.], requires_grad=True)
Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)
Python
layer_norm = nn.LayerNorm([3, 4])
print(layer_norm.weight)
print(layer_norm.bias)
Text Only
Parameter containing:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], requires_grad=True)
Parameter containing:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]], requires_grad=True)
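If the affine transform is not wanted, nn.LayerNorm can be constructed with elementwise_affine=False, in which case weight and bias are simply None. A short sketch:

Python
layer_norm = nn.LayerNorm(4, elementwise_affine=False)
print(layer_norm.weight)  # None: no learnable scale
print(layer_norm.bias)    # None: no learnable shift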
