PyTorch Tutorial - 8.7. Densely Connected Networks (DenseNet) - 电子发烧友网

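ResNet significantly changed the view of how to parametrize the functions in deep networks. DenseNet (dense convolutional network) is to some extent the logical extension of this (Huang et al., 2017). DenseNet is characterized by two things: a connectivity pattern in which each layer connects to all the preceding layers, and the use of concatenation (rather than the addition operator in ResNet) to preserve and reuse features from earlier layers. To understand how to arrive at it, let's take a small detour to mathematics.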

import torch
from torch import nn
from d2l import torch as d2l

from mxnet import init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()

import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

import tensorflow as tf
from d2l import tensorflow as d2l

8.7.1. From ResNet to DenseNet

Recall the Taylor expansion for functions. At the point x = 0 it can be written as

$$f(x) = f(0) + x \cdot \left[f'(0) + x \cdot \left[\frac{f''(0)}{2!} + x \cdot \left[\frac{f'''(0)}{3!} + \cdots\right]\right]\right]. \tag{8.7.1}$$

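The key point is that it decomposes a function into terms of increasingly higher order. In a similar vein, ResNet decomposes functions into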

$$f(x) = x + g(x). \tag{8.7.2}$$

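That is, ResNet decomposes f into a simple linear term and a more complex nonlinear one. What if we wanted to capture (not necessarily add) information beyond two terms? One such solution is DenseNet (Huang et al., 2017).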

Fig. 8.7.1 The main difference between ResNet (left) and DenseNet (right) in cross-layer connections: use of addition versus use of concatenation.

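As shown in Fig. 8.7.1, the key difference between ResNet and DenseNet is that in the latter case outputs are concatenated (denoted by [,]) rather than added. As a result, we perform a mapping from x to its values after applying an increasingly complex sequence of functions: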

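$$x \to \left[x, f_1(x), f_2\left(\left[x, f_1(x)\right]\right), f_3\left(\left[x, f_1(x), f_2\left(\left[x, f_1(x)\right]\right)\right]\right), \ldots\right]. \tag{8.7.3}$$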

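In the end, all these functions are combined in an MLP to reduce the number of features again. In terms of implementation this is quite simple: rather than adding terms, we concatenate them. The name DenseNet arises from the fact that the dependency graph between variables becomes quite dense. The last layer of such a chain is densely connected to all previous layers. The dense connections are shown in Fig. 8.7.2.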
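The difference between adding and concatenating can be seen on a pair of toy tensors. The following is a minimal sketch, not part of the original tutorial; the tensor `g_x` is a hypothetical stand-in for the output of some block applied to `x`, and the layout is assumed to be (batch, channels, height, width).

```python
import torch

# Toy feature maps: batch of 4, 3 channels, 8x8 spatial size
x = torch.randn(4, 3, 8, 8)
g_x = torch.randn(4, 3, 8, 8)  # hypothetical stand-in for a block output g(x)

# ResNet-style: elementwise addition; shapes must match and the
# channel count stays the same
res_out = x + g_x

# DenseNet-style: concatenation along the channel dimension (dim=1)
dense_out = torch.cat((x, g_x), dim=1)

print(tuple(res_out.shape))    # (4, 3, 8, 8)
print(tuple(dense_out.shape))  # (4, 6, 8, 8)

# The input survives unchanged as the first 3 channels of the dense output
print(torch.equal(dense_out[:, :3], x))  # True
```

With addition the input is mixed into the output, whereas with concatenation later layers can still read the original features directly, at the cost of a growing channel count.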

Fig. 8.7.2 Dense connections in DenseNet. Note how the dimensionality increases with depth.

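The main components that comprise a DenseNet are dense blocks and transition layers. The former define how the inputs and outputs are concatenated, while the latter control the number of channels so that it does not grow too large, since the expansion $x \to [x, f_1(x), f_2([x, f_1(x)]), \ldots]$ can be quite high-dimensional.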

8.7.2. Dense Blocks

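DenseNet uses the modified "batch normalization, activation, and convolution" structure of ResNet (see the exercise in Section 8.6). First, we implement this convolution block structure.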

def conv_block(num_channels):
  return nn.Sequential(
    nn.LazyBatchNorm2d(), nn.ReLU(),
    nn.LazyConv2d(num_channels, kernel_size=3, padding=1))

def conv_block(num_channels):
  blk = nn.Sequential()
  blk.add(nn.BatchNorm(),
      nn.Activation('relu'),
      nn.Conv2D(num_channels, kernel_size=3, padding=1))
  return blk

class ConvBlock(nn.Module):
  num_channels: int
  training: bool = True

  @nn.compact
  def __call__(self, X):
    Y = nn.relu(nn.BatchNorm(not self.training)(X))
    Y = nn.Conv(self.num_channels, kernel_size=(3, 3), padding=(1, 1))(Y)
    Y = jnp.concatenate((X, Y), axis=-1)
    return Y

class ConvBlock(tf.keras.layers.Layer):
  def __init__(self, num_channels):
    super(ConvBlock, self).__init__()
    self.bn = tf.keras.layers.BatchNormalization()
    self.relu = tf.keras.layers.ReLU()
    self.conv = tf.keras.layers.Conv2D(
      filters=num_channels, kernel_size=(3, 3), padding='same')

    self.listLayers = [self.bn, self.relu, self.conv]

  def call(self, x):
    y = x
    for layer in self.listLayers:
      y = layer(y)
    y = tf.keras.layers.concatenate([x,y], axis=-1)
    return y

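A dense block consists of multiple convolution blocks, each using the same number of output channels. In the forward propagation, however, we concatenate the input and output of each convolution block along the channel dimension. Lazy evaluation allows us to adjust the dimensionality automatically. In the Flax and TensorFlow versions, the ConvBlock class above already performs this concatenation itself, so a dense block there only needs to chain the blocks.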

class DenseBlock(nn.Module):
  def __init__(self, num_convs, num_channels):
    super(DenseBlock, self).__init__()
    layer = []
    for i in range(num_convs):
      layer.append(conv_block(num_channels))
    self.net = nn.Sequential(*layer)

  def forward(self, X):
    for blk in self.net:
      Y = blk(X)
      # Concatenate input and output of each block along the channels
      X = torch.cat((X, Y), dim=1)
    return X

class DenseBlock(nn.Block):
  def __init__(self, num_convs, num_channels):
    super().__init__()
    self.net = nn.Sequential()
    for _ in range(num_convs):
      self.net.add(conv_block(num_channels))

  def forward(self, X):
    for blk in self.net:
      Y = blk(X)
      # Concatenate input and output of each block along the channels
      X = np.concatenate((X, Y), axis=1)
    return X

class DenseBlock(nn.Module):
  num_convs: int
  num_channels: int
  training: bool = True

  def setup(self):
    layer = []
    for i in range(self.num_convs):
      layer.append(ConvBlock(self.num_channels, self.training))
    self.net = nn.Sequential(layer)

  def __call__(self, X):
    return self.net(X)

class DenseBlock(tf.keras.layers.Layer):
  def __init__(self, num_convs, num_channels):
    super(DenseBlock, self).__init__()
    self.listLayers = []
    for _ in range(num_convs):
      self.listLayers.append(ConvBlock(num_channels))

  def call(self, x):
    for layer in self.listLayers:
      x = layer(x)
    return x

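In the following example, we define a DenseBlock instance with 2 convolution blocks of 10 output channels each. When using an input with 3 channels, we will get an output with 3 + 10 + 10 = 23 channels. The number of convolution block channels controls the growth in the number of output channels relative to the number of input channels. This is also referred to as the growth rate.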

blk = DenseBlock(2, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape

torch.Size([4, 23, 8, 8])

blk = DenseBlock(2, 10)
X = np.random.uniform(size=(4, 3, 8, 8))
blk.initialize()
Y = blk(X)
Y.shape

(4, 23, 8, 8)

blk = DenseBlock(2, 10)
X = jnp.zeros((4, 8, 8, 3))
Y = blk.init_with_output(d2l.get_key(), X)[0]
Y.shape

(4, 8, 8, 23)

blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
Y.shape

TensorShape([4, 8, 8, 23])
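The channel arithmetic behind this result can be reproduced without any framework. The helper below is a small sketch added for illustration; the function name `dense_block_channels` is hypothetical and not part of d2l.

```python
def dense_block_channels(in_channels, num_convs, growth_rate):
    """Channel count at the input and after each conv block of a dense block."""
    channels = [in_channels]
    for _ in range(num_convs):
        # Each conv block emits `growth_rate` channels, which are
        # concatenated onto everything accumulated so far
        channels.append(channels[-1] + growth_rate)
    return channels

print(dense_block_channels(3, 2, 10))   # [3, 13, 23]
print(dense_block_channels(64, 4, 32))  # [64, 96, 128, 160, 192]
```

The second call uses the settings of the full model defined later (64 input channels, 4 convolution blocks, growth rate 32), where one dense block adds 4 × 32 = 128 channels.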

8.7.3. Transition Layers

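Since each dense block increases the number of channels, adding too many of them will lead to an excessively complex model. A transition layer is used to control the complexity of the model. It reduces the number of channels by using a 1×1 convolution. Moreover, it halves the height and width via average pooling with a stride of 2.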

def transition_block(num_channels):
  return nn.Sequential(
    nn.LazyBatchNorm2d(), nn.ReLU(),
    nn.LazyConv2d(num_channels, kernel_size=1),
    nn.AvgPool2d(kernel_size=2, stride=2))

def transition_block(num_channels):
  blk = nn.Sequential()
  blk.add(nn.BatchNorm(), nn.Activation('relu'),
      nn.Conv2D(num_channels, kernel_size=1),
      nn.AvgPool2D(pool_size=2, strides=2))
  return blk

class TransitionBlock(nn.Module):
  num_channels: int
  training: bool = True

  @nn.compact
  def __call__(self, X):
    X = nn.BatchNorm(not self.training)(X)
    X = nn.relu(X)
    X = nn.Conv(self.num_channels, kernel_size=(1, 1))(X)
    X = nn.avg_pool(X, window_shape=(2, 2), strides=(2, 2))
    return X

class TransitionBlock(tf.keras.layers.Layer):
  def __init__(self, num_channels, **kwargs):
    super(TransitionBlock, self).__init__(**kwargs)
    self.batch_norm = tf.keras.layers.BatchNormalization()
    self.relu = tf.keras.layers.ReLU()
    self.conv = tf.keras.layers.Conv2D(num_channels, kernel_size=1)
    self.avg_pool = tf.keras.layers.AvgPool2D(pool_size=2, strides=2)

  def call(self, x):
    x = self.batch_norm(x)
    x = self.relu(x)
    x = self.conv(x)
    return self.avg_pool(x)

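Apply a transition layer with 10 channels to the output of the dense block in the previous example. This reduces the number of output channels to 10, and halves the height and width.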

blk = transition_block(10)
blk(Y).shape

torch.Size([4, 10, 4, 4])

blk = transition_block(10)
blk.initialize()
blk(Y).shape

(4, 10, 4, 4)

blk = TransitionBlock(10)
blk.init_with_output(d2l.get_key(), Y)[0].shape

(4, 4, 4, 10)

blk = TransitionBlock(10)
blk(Y).shape

TensorShape([4, 4, 4, 10])
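To see why the transition layer is cheap, it helps to count what the 1×1 convolution and the pooling step do. The sketch below is an illustration only: `transition_summary` is a hypothetical helper, it assumes the convolution has a bias term (the default in the layers used above), and it ignores the batch normalization parameters.

```python
def transition_summary(in_channels, out_channels, height, width):
    """Parameter count of the 1x1 convolution and the output shape (C, H, W)."""
    # A 1x1 convolution has one weight per (input, output) channel pair,
    # plus one bias per output channel (assumed to be enabled)
    conv_params = in_channels * out_channels + out_channels
    # Average pooling with a 2x2 window, stride 2, and no padding
    out_h = (height - 2) // 2 + 1
    out_w = (width - 2) // 2 + 1
    return conv_params, (out_channels, out_h, out_w)

# Dense block output from the example above: 23 channels, 8x8 feature maps
print(transition_summary(23, 10, 8, 8))  # (240, (10, 4, 4))
```

The per-example output shape (10, 4, 4) agrees with the shapes printed above once the batch dimension of 4 is added.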

8.7.4. DenseNet Model

Next, we will construct a DenseNet model. DenseNet first uses the same single convolutional layer and max-pooling layer as in ResNet.

class DenseNet(d2l.Classifier):
  def b1(self):
    return nn.Sequential(
      nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
      nn.LazyBatchNorm2d(), nn.ReLU(),
      nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

class DenseNet(d2l.Classifier):
  def b1(self):
    net = nn.Sequential()
    net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
      nn.BatchNorm(), nn.Activation('relu'),
      nn.MaxPool2D(pool_size=3, strides=2, padding=1))
    return net

class DenseNet(d2l.Classifier):
  num_channels: int = 64
  growth_rate: int = 32
  arch: tuple = (4, 4, 4, 4)
  lr: float = 0.1
  num_classes: int = 10
  training: bool = True

  def setup(self):
    self.net = self.create_net()

  def b1(self):
    return nn.Sequential([
      nn.Conv(64, kernel_size=(7, 7), strides=(2, 2), padding='same'),
      nn.BatchNorm(not self.training),
      nn.relu,
      lambda x: nn.max_pool(x, window_shape=(3, 3),
                 strides=(2, 2), padding='same')
    ])

class DenseNet(d2l.Classifier):
  def b1(self):
    return tf.keras.models.Sequential([
      tf.keras.layers.Conv2D(
        64, kernel_size=7, strides=2, padding='same'),
      tf.keras.layers.BatchNormalization(),
      tf.keras.layers.ReLU(),
      tf.keras.layers.MaxPool2D(
        pool_size=3, strides=2, padding='same')])

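Then, similar to the four modules made up of residual blocks that ResNet uses, DenseNet uses four dense blocks. As with ResNet, we can set the number of convolutional layers used in each dense block. Here, we set it to 4, consistent with the ResNet-18 model in Section 8.6. Furthermore, we set the number of channels (i.e., the growth rate) for the convolutional layers in the dense block to 32, so each dense block adds 128 channels.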

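In ResNet, the height and width are reduced between modules by a residual block with a stride of 2. Here, we use the transition layer to halve the height and width and to halve the number of channels. As in ResNet, a global pooling layer and a fully connected layer are connected at the end to produce the output.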

@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
       lr=0.1, num_classes=10):
  super(DenseNet, self).__init__()
  self.save_hyperparameters()
  self.net = nn.Sequential(self.b1())
  for i, num_convs in enumerate(arch):
    self.net.add_module(f'dense_blk{i+1}', DenseBlock(num_convs,
                             growth_rate))
    # The number of output channels in the previous dense block
    num_channels += num_convs * growth_rate
    # A transition layer that halves the number of channels is added
    # between the dense blocks
    if i != len(arch) - 1:
      num_channels //= 2
      self.net.add_module(f'tran_blk{i+1}', transition_block(
        num_channels))
  self.net.add_module('last', nn.Sequential(
    nn.LazyBatchNorm2d(), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
    nn.LazyLinear(num_classes)))
  self.net.apply(d2l.init_cnn)

@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
       lr=0.1, num_classes=10):
  super(DenseNet, self).__init__()
  self.save_hyperparameters()
  self.net = nn.Sequential()
  self.net.add(self.b1())
  for i, num_convs in enumerate(arch):
    self.net.add(DenseBlock(num_convs, growth_rate))
    # The number of output channels in the previous dense block
    num_channels += num_convs * growth_rate
    # A transition layer that halves the number of channels is added
    # between the dense blocks
    if i != len(arch) - 1:
      num_channels //= 2
      self.net.add(transition_block(num_channels))
  self.net.add(nn.BatchNorm(), nn.Activation('relu'),
         nn.GlobalAvgPool2D(), nn.Dense(num_classes))
  self.net.initialize(init.Xavier())

@d2l.add_to_class(DenseNet)
def create_net(self):
  net = self.b1()
  # Running channel count, accumulated across dense blocks
  num_channels = self.num_channels
  for i, num_convs in enumerate(self.arch):
    net.layers.extend([DenseBlock(num_convs, self.growth_rate,
                   training=self.training)])
    # The number of output channels in the previous dense block
    num_channels += num_convs * self.growth_rate
    # A transition layer that halves the number of channels is added
    # between the dense blocks
    if i != len(self.arch) - 1:
      num_channels //= 2
      net.layers.extend([TransitionBlock(num_channels,
                        training=self.training)])
  net.layers.extend([
    nn.BatchNorm(not self.training),
    nn.relu,
    lambda x: nn.avg_pool(x, window_shape=x.shape[1:3],
               strides=x.shape[1:3], padding='valid'),
    lambda x: x.reshape((x.shape[0], -1)),
    nn.Dense(self.num_classes)
  ])
  return net

@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
       lr=0.1, num_classes=10):
  super(DenseNet, self).__init__()
  self.save_hyperparameters()
  self.net = tf.keras.models.Sequential(self.b1())
  for i, num_convs in enumerate(arch):
    self.net.add(DenseBlock(num_convs, growth_rate))
    # The number of output channels in the previous dense block
    num_channels += num_convs * growth_rate
    # A transition layer that halves the number of channels is added
    # between the dense blocks
    if i != len(arch) - 1:
      num_channels //= 2
      self.net.add(TransitionBlock(num_channels))
  self.net.add(tf.keras.models.Sequential([
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.GlobalAvgPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes)]))
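The channel bookkeeping in the constructors above can be followed in isolation. The sketch below replays only the `num_channels` arithmetic with the default hyperparameters; `densenet_channel_schedule` is a hypothetical helper introduced here for illustration, not a d2l function.

```python
def densenet_channel_schedule(num_channels=64, growth_rate=32,
                              arch=(4, 4, 4, 4)):
    """Return (after_dense_block, after_transition) channel counts per stage."""
    schedule = []
    for i, num_convs in enumerate(arch):
        # Each dense block adds num_convs * growth_rate channels
        num_channels += num_convs * growth_rate
        after_dense = num_channels
        # A transition layer halves the channels between dense blocks,
        # but none follows the last dense block
        if i != len(arch) - 1:
            num_channels //= 2
            after_transition = num_channels
        else:
            after_transition = None
        schedule.append((after_dense, after_transition))
    return schedule

print(densenet_channel_schedule())
# [(192, 96), (224, 112), (240, 120), (248, None)]
```

With these defaults, the final batch normalization and the classifier head therefore see 248 channels.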

8.7.5. Training

Since we are using a deeper network here, in this section we reduce the input height and width from 224 to 96 to simplify the computation.

model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)

trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
with d2l.try_gpu():
  model = DenseNet(lr=0.01)
  trainer.fit(model, data)
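The effect of shrinking the input from 224 to 96 can be estimated by tracing the spatial size through the stem and the three transition layers. This is a rough sketch that assumes the kernel, stride, and padding values of the PyTorch version above; `conv_out` and `densenet_spatial_trace` are hypothetical helper names.

```python
def conv_out(size, kernel, stride, padding):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def densenet_spatial_trace(size, num_transitions=3):
    """Height (or width) after the stem layers and each transition layer."""
    sizes = [size]
    # Stem: 7x7 convolution with stride 2 and padding 3
    sizes.append(conv_out(sizes[-1], kernel=7, stride=2, padding=3))
    # Stem: 3x3 max-pooling with stride 2 and padding 1
    sizes.append(conv_out(sizes[-1], kernel=3, stride=2, padding=1))
    # Each transition layer: 2x2 average pooling with stride 2
    for _ in range(num_transitions):
        sizes.append(conv_out(sizes[-1], kernel=2, stride=2, padding=0))
    return sizes

print(densenet_spatial_trace(96))   # [96, 48, 24, 12, 6, 3]
print(densenet_spatial_trace(224))  # [224, 112, 56, 28, 14, 7]
```

Dense blocks keep the spatial size unchanged, so they do not appear in the trace. At every stage the 224-pixel input yields feature maps with roughly 5.4 times as many spatial positions as the 96-pixel input, which is where the savings come from.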

8.7.6. Summary and Discussion

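The main components that comprise DenseNet are dense blocks and transition layers. For the latter, we need to keep the dimensionality under control when composing the network by adding transition layers that shrink the number of channels again. In terms of cross-layer connections, in contrast to ResNet, where inputs and outputs are added together, DenseNet concatenates inputs and outputs along the channel dimension. Although these concatenation operations reuse features to achieve computational efficiency, unfortunately they lead to heavy GPU memory consumption. As a result, applying DenseNet may require more memory-efficient implementations, which may in turn increase training time (Pleiss et al., 2017).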
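A back-of-the-envelope count hints at where the memory goes. The sketch below is a simplified model, not a measurement: it assumes that every concatenation allocates a fresh tensor and that all intermediates are kept for backpropagation, and it ignores the extra activations produced by batch normalization and ReLU. The helper `naive_concat_channels` is hypothetical.

```python
def naive_concat_channels(in_channels, num_convs, growth_rate):
    """Total channels stored in one dense block under the assumption that
    each concatenation creates a new tensor and nothing is freed.

    Returns (total_stored_channels, output_channels)."""
    total = in_channels    # the block input itself
    current = in_channels  # channels of the running concatenated tensor
    for _ in range(num_convs):
        total += growth_rate    # output of the conv block
        current += growth_rate
        total += current        # fresh copy produced by the concatenation
    return total, current

total, out = naive_concat_channels(64, 4, 32)
print(total, out)  # 768 192
```

Under these assumptions the first dense block of the model above holds 768 channels' worth of feature maps to produce a 192-channel output, and the stored total grows quadratically with the number of convolution blocks. Sharing buffers between concatenations, as memory-efficient implementations do, targets exactly this overhead.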

8.7.7. Exercises

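1. Why do we use average pooling rather than max-pooling in the transition layer?

2. One of the advantages mentioned in the DenseNet paper is that its model parameters are smaller than those of ResNet. Why is this the case?

3. One problem for which DenseNet has been criticized is its high memory consumption.

   1. Is this really the case? Try changing the input shape to 224×224 to measure the actual GPU memory consumption empirically.

   2. Can you think of an alternative means of reducing the memory consumption? How would you need to change the framework?

4. Implement the various DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).

5. Design an MLP-based model by applying the DenseNet idea. Apply it to the housing price prediction task in Section 5.7.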
