Posted 2019-07-29 | Updated 2021-08-04 | ML Technique / Neural Network | 5 minutes read (About 791 words)

Activation Functions

Exponential Family

Sigmoid

Maps a real number to the interval (0, 1)

$sigmoid(z) = \frac 1 {1+e^{-z}}$

$z= wx+b$

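Uses
- Output of hidden-layer neurons
- Output for binary classification
Drawbacks
- Expensive to compute: the function involves an exponential, and the error gradient in backpropagation involves a division
- Prone to vanishing gradients during error backpropagation
- Slow convergence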
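
The vanishing-gradient drawback can be seen numerically. The sketch below is a plain-Python illustration (the helper names `sigmoid` and `sigmoid_grad` are chosen here, not taken from any library). It relies on the identity $sigmoid'(z) = sigmoid(z)(1 - sigmoid(z))$, whose maximum is 0.25 at $z = 0$.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, arranged so exp never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_grad(z):
    """Derivative of sigmoid, computed from the forward output: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.6f}  grad={sigmoid_grad(z):.6f}")
```

The gradient shrinks quickly as $|z|$ grows, which is the saturation behaviour behind the vanishing-gradient drawback.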

Hard_Sigmoid

Faster to compute than the sigmoid activation

$hard\_sigmoid(z) = \left\{ \begin{array}{ll} 0 & z < -2.5 \\ 1 & z > 2.5 \\ 0.2 z + 0.5 & -2.5 \leq z \leq 2.5 \end{array} \right.$

$z= wx+b$
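
A minimal sketch of the piecewise definition of the hard sigmoid, assuming scalar inputs; the function name is illustrative only.

```python
def hard_sigmoid(z):
    """Piecewise-linear approximation of sigmoid: 0.2 * z + 0.5 clipped to [0, 1]."""
    if z < -2.5:
        return 0.0
    if z > 2.5:
        return 1.0
    return 0.2 * z + 0.5

print([hard_sigmoid(z) for z in (-3.0, 0.0, 1.0, 3.0)])
```

No exponential is evaluated, which is where the speed advantage over sigmoid comes from.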

Softmax

Mainly used as the output of multi-class classification networks

$softmax(z_i) = \frac {e^{z_i}} {\sum_{k=1}^K e^{z_k}}$

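$z_i = w_i x + b_i$: the number of $(w_i, b_i)$ groups equals the number of classes and is independent of the dimension of the input $x$

$K$: number of classes

Engineering significance: the exponential base
- A differentiable $max$: widens the gaps between values
- Features act multiplicatively on the output: an increase of an input in $z_i$ scales the output by a factor determined by the corresponding weight
- Combined with the cross-entropy loss, it avoids overflow in the derivative and improves numerical stability
Theoretical significance: probability theory and optimization
- softmax is consistent with the maximum entropy principle
- Assuming the labels follow a categorical (multinoulli) distribution, softmax is the inverse of its link function #todo
- A smooth max-margin function

The softmax regression parameters $(w_i, b_i)$ are redundant, so one group can be eliminated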
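
The redundancy follows from shift invariance: adding the same constant to every $z_i$ leaves the softmax output unchanged, so one group $(w_K, b_K)$ can be fixed to zero. The sketch below is a plain-Python illustration, not a library implementation; subtracting the maximum score before exponentiating is the usual way to use the same invariance for numerical stability.

```python
import math

def softmax(zs):
    """Softmax over a list of scores, shifted by max(zs) to avoid overflow in exp."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
# Subtract the last score from every score: the last class now has score 0,
# which is what "eliminating one parameter group" amounts to.
shifted = softmax([1.0 - 3.0, 2.0 - 3.0, 0.0])
print(probs)
print(shifted)
```

Both calls return the same probabilities, and large scores such as `softmax([1000.0, 1000.0])` no longer overflow.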

Softplus

$softplus(z) = log(exp(z)+1)$

$z = wx + b$
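
Evaluating $log(exp(z)+1)$ directly overflows for large $z$. The sketch below uses the equivalent form $max(z, 0) + log(1 + e^{-|z|})$; the function name is illustrative and the rearrangement is an addition here, not part of the original definition.

```python
import math

def softplus(z):
    """log(exp(z) + 1), rearranged so exp never receives a large positive argument."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

for z in (-1000.0, -1.0, 0.0, 1.0, 1000.0):
    print(z, softplus(z))
```

Softplus behaves like a smooth ReLU: close to 0 for very negative $z$ and close to $z$ for very positive $z$.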

Tanh

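Hyperbolic tangent function

$\begin{aligned} tanh(z) & = \frac {\sinh z} {\cosh z} \\ & = \frac {e^z - e^{-z}} {e^z + e^{-z}} \end{aligned}$

$z = wx + b$

$\frac{\partial tanh(z)}{\partial z} = 1 - tanh^2(z)$: closely resembles the derivative of the ordinary tangent function ($1 + \tan^2 z$), and lets the gradient be computed directly from the forward output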
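
A quick numerical check of the identity $\frac{\partial tanh(z)}{\partial z} = 1 - tanh^2(z)$ against a central finite difference. The helper names are illustrative.

```python
import math

def tanh_grad(z):
    """Analytic derivative of tanh, reusing the forward value."""
    t = math.tanh(z)
    return 1.0 - t * t

def numeric_grad(f, z, h=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, tanh_grad(z), numeric_grad(math.tanh, z))
```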

Linear Family

Softsign

$softsign(z) = \frac {z} {|z| + 1}$

ReLU

Rectified Linear Units (ReLU)

$relu(z, max) = \left\{ \begin{array}{ll} 0 & z \leq 0 \\ z & 0 < z < max \\ max & z \geq max \end{array} \right.$

LeakyReLU

Leaky ReLU: rectified linear unit with a leak

$relu(z, \alpha, max) = \left\{ \begin{array}{ll} \alpha z & z \leq 0 \\ z & 0 < z < max \\ max & z \geq max \end{array} \right.$

$\alpha$: hyperparameter, recommended value 0.01

Solves the dead-zone problem for $z < 0$ while retaining the nonlinearity of ReLU
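
The ReLU and Leaky ReLU definitions differ only in the slope used for $z \leq 0$, so one function covers both. This is a scalar sketch with illustrative parameter names (`alpha`, `max_value`), not a library call.

```python
def relu(z, alpha=0.0, max_value=None):
    """ReLU with an optional leak (alpha) and an optional upper cap (max_value)."""
    if z <= 0:
        return alpha * z
    if max_value is not None and z >= max_value:
        return max_value
    return z

print(relu(-2.0))                  # plain ReLU: negative inputs are zeroed
print(relu(-2.0, alpha=0.01))      # leaky: a small negative output survives
print(relu(8.0, max_value=6.0))    # capped at max_value
```

With `alpha > 0` the gradient for negative inputs is `alpha` rather than 0, so a unit can still receive updates after its pre-activation becomes negative.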

Parametric ReLU

PReLU: parametric rectified linear unit

$prelu(z) = \left\{ \begin{array}{ll} \alpha z & z < 0 \\ z & z \geq 0 \end{array} \right.$

$\alpha$: a learned parameter (vector), commonly initialized to 0.25 and updated with the momentum method

ThresholdedReLU

Rectified linear unit with a threshold

$thresholded\_relu(z, \theta) = \left\{ \begin{array}{ll} z & z > \theta \\ 0 & \text{otherwise} \end{array} \right.$

Linear

Linear activation: returns the input unchanged

Exponential-Linear Family

Exponential Linear Unit

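ELU: exponential linear unit

$elu(z, \alpha) = \left\{ \begin{array}{ll} z & z > 0 \\ \alpha (e^z - 1) & z \leq 0 \end{array} \right.$

$\alpha$: hyperparameter

For $z \leq 0$, $elu(z, \alpha)$ saturates (toward $-\alpha$) as $z$ decreases
- ELU encodes the degree to which a feature is present in the input, but does not quantitatively encode the degree to which it is absent

In networks more than 5 layers deep, ELU is reported to learn faster and generalize better than ReLU and LReLU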

Gaussian Error Linear Unit

GELU: a smooth, differentiable variant of ReLU
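
The text above does not give a formula. The commonly used definition, added here as an assumption about what is meant, is $gelu(z) = z \, \Phi(z)$ with $\Phi$ the standard normal CDF, which can be written with the error function:

```python
import math

def gelu(z):
    """GELU as z * Phi(z), with Phi expressed through the error function."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, gelu(z))
```

For large positive $z$ the output approaches $z$, and for large negative $z$ it approaches 0, which is the ReLU-like behaviour with a smooth transition around 0.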

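SELU

Scaled exponential linear unit: preserves the mean and variance of the inputs between two consecutive layers, provided that

- the weights are initialized correctly: lecun_normal initialization
- the number of inputs is large enough (see AlphaDropout)
- suitable values of $\alpha$ and $scale$ are chosen

$selu(z) = scale * elu(z, \alpha)$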
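
A scalar sketch of $elu$ and $selu$. The two constants are the values commonly quoted for SELU; they are not stated in the text above, so treat them as an assumption of this example.

```python
import math

# Commonly quoted SELU constants (assumed here, not given in the text above).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def elu(z, alpha=1.0):
    """z for z > 0, alpha * (exp(z) - 1) otherwise; saturates toward -alpha."""
    return z if z > 0 else alpha * math.expm1(z)

def selu(z):
    """scale * elu(z, alpha) with the fixed SELU constants."""
    return SELU_SCALE * elu(z, SELU_ALPHA)

for z in (-50.0, -1.0, 0.0, 1.0):
    print(z, elu(z), selu(z))
```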

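Vanishing Gradients

The derivative of the activation function is too small ($<1$), so the error (gradient) signal is compressed as it propagates backward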
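
To make the compression concrete, the sketch below chains scalar sigmoid units $a_k = sigmoid(w \, a_{k-1})$ and multiplies the local derivatives, as backpropagation would. The setup (scalar units, a shared weight `w`) is a simplification chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chain_gradient(x, depth, w=1.0):
    """d a_depth / d x for a_k = sigmoid(w * a_{k-1}), with a_0 = x."""
    a, grad = x, 1.0
    for _ in range(depth):
        a = sigmoid(w * a)
        grad *= w * a * (1.0 - a)  # local derivative, at most 0.25 * w
    return grad

for depth in (1, 5, 10, 20):
    print(depth, chain_gradient(0.0, depth))
```

Each sigmoid layer contributes a factor of at most 0.25 (for `w = 1`), so the gradient reaching the input shrinks at least geometrically with depth.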

Posted 2019-02-20 | Updated 2019-02-17 | Python / Keras | 3 minutes read (About 472 words)

Advanced Activation Layers

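LeakyReLU

keras.layers.LeakyReLU(alpha=0.3)

Leaky rectified linear unit.

Return value: still allows a small gradient when the unit is not active
- x < 0: alpha * x
- x >= 0: x
Input shape
- Arbitrary. When this layer is the first layer of a model, specify the input_shape argument (a tuple of integers that does not include the sample axis)
Output shape: same as the input
Arguments
- alpha: float >= 0, the negative slope coefficient.
References
- Rectifier Nonlinearities Improve Neural Network Acoustic Models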

PReLU

keras.layers.PReLU(
	alpha_initializer='zeros',
	alpha_regularizer=None,
	alpha_constraint=None,
	shared_axes=None
)

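Parametric rectified linear unit.

Return value
- x < 0: alpha * x
- x >= 0: x
Arguments
- alpha_initializer: initializer for the weights.
- alpha_regularizer: regularizer for the weights.
- alpha_constraint: constraint on the weights.
- shared_axes: the axes along which the learnable parameters of the activation are shared. If the input feature maps come from a 2D convolution with output shape (batch, height, width, channels) and you want to share parameters across space so that each filter has only one set of parameters, set shared_axes=[1, 2]
References
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification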
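
The effect of shared_axes is easiest to see in the shape of the learned alpha. The helper below is a hypothetical pure-Python sketch of that shape rule as I understand it (one alpha per input element, with each shared axis collapsed to size 1); it is not part of the Keras API.

```python
def prelu_alpha_shape(input_shape, shared_axes=None):
    """Shape of the learnable alpha for an input of the given shape.

    input_shape includes the batch axis at index 0, e.g. (None, 32, 32, 64).
    Axes listed in shared_axes are collapsed to size 1 so alpha is broadcast there.
    """
    shape = list(input_shape[1:])      # drop the batch axis
    for axis in (shared_axes or []):
        shape[axis - 1] = 1            # axis indices count the batch axis as 0
    return tuple(shape)

print(prelu_alpha_shape((None, 32, 32, 64)))
print(prelu_alpha_shape((None, 32, 32, 64), shared_axes=[1, 2]))
```

With shared_axes=[1, 2] the height and width axes collapse, leaving one alpha per channel.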

ELU

keras.layers.ELU(alpha=1.0)

Exponential linear unit

Return value
- x < 0: alpha * (exp(x) - 1.)
- x >= 0: x
Arguments
- alpha: scale for the negative factor.
References
- Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

ThresholdedReLU

keras.layers.ThresholdedReLU(theta=1.0)

Rectified linear unit with a threshold.

Return value
- x > theta: x
- x <= theta: 0
Arguments
- theta: float >= 0, the threshold location of the activation.
References
- Zero-Bias Autoencoders and the Benefits of Co-Adapting Features

Softmax

keras.layers.Softmax(axis=-1)

Softmax activation function

Arguments
- axis: integer, the axis along which the softmax normalization is applied.

ReLU

keras.layers.ReLU(max_value=None)

ReLU activation function

Arguments
- max_value: float, the maximum output value.

Posted 2019-02-20 | Updated 2019-02-17 | Python / Keras | 11 minutes read (About 1690 words)

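Core Layers

The commonly used layers live in the core module, which defines a set of common network layers, including fully connected layers and activation layers

Dense Layer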

keras.layers.core.Dense(
	units,
	activation=None,
	use_bias=True,
	kernel_initializer='glorot_uniform',
	bias_initializer='zeros',
	kernel_regularizer=None,
	bias_regularizer=None,
	activity_regularizer=None,
	kernel_constraint=None,
	bias_constraint=None
)

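Dense is the regular fully connected layer

Purpose: implements the operation $output = activation(dot(input, kernel)+bias)$
- activation: the element-wise activation function
- kernel: the weight matrix of this layer
- bias: the bias vector, added only when use_bias=True
Arguments
- units: positive integer, the output dimension of the layer.
- activation: the activation function
  - the name of a predefined activation function (see Activation Functions)
  - an element-wise Theano function
  - if not specified, no activation is applied (that is, the linear activation a(x)=x)
- use_bias: boolean, whether to use a bias term
- kernel_initializer: initialization method for the weights
  - a string naming a predefined initializer
  - an initializer object used to initialize the weights (see initializers)
- bias_initializer: initialization method for the bias vector
  - a string naming a predefined initializer
  - an initializer object used to initialize the bias vector (see initializers)
- kernel_regularizer: regularizer applied to the weights, a Regularizer object
- bias_regularizer: regularizer applied to the bias vector, a Regularizer object
- activity_regularizer: regularizer applied to the output, a Regularizer object
- kernel_constraint: constraint applied to the weights, a Constraints object
- bias_constraint: constraint applied to the bias, a Constraints object
Input
- An nD tensor of shape (batch_size, ..., input_dim); the most common case is a 2D tensor of shape (batch_size, input_dim)
- If the input has more than 2 dimensions, it is first flattened to a size that matches kernel
Output
- An nD tensor of shape (batch_size, ..., units); the most common case is a 2D tensor of shape (batch_size, units)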
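
The operation $output = activation(dot(input, kernel)+bias)$ can be sketched for a single sample in plain Python. The function below is an illustration with nested lists, not the Keras implementation; its name and argument layout are assumptions of this example.

```python
def dense(x, kernel, bias=None, activation=None):
    """Forward pass of a fully connected layer for one sample.

    x:      list of length input_dim
    kernel: input_dim x units nested list (the weight matrix)
    bias:   list of length units, or None (corresponds to use_bias=False)
    """
    units = len(kernel[0])
    out = [sum(x[i] * kernel[i][j] for i in range(len(x))) for j in range(units)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    if activation is not None:
        out = [activation(o) for o in out]  # applied element-wise
    return out

y = dense([1.0, 2.0],
          kernel=[[1.0, 0.0, -1.0],
                  [0.5, 1.0, 1.0]],
          bias=[0.0, 1.0, -2.0],
          activation=lambda v: max(v, 0.0))  # ReLU
print(y)
```

Here input_dim is 2 and units is 3, so the kernel has shape (2, 3) and the output has 3 entries.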

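Activation Layer

keras.layers.core.Activation(
	activation,
	input_shape
)

Applies an activation function to the output of a layer

Arguments
- activation: the activation function to use
  - the name of a predefined activation function
  - a TensorFlow/Theano function (see Activation Functions)
Input: arbitrary; when the Activation layer is the first layer, input_shape must be specified
Output: same shape as the input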

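Dropout Layer

keras.layers.core.Dropout(
	rate,
	noise_shape=None,
	seed=None
)

Applies Dropout to the input

Notes
- During training, Dropout randomly disconnects input units with probability rate at each parameter update
- Can be used to prevent overfitting
- Reference: Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Arguments
- rate: float between 0 and 1, the fraction of units to drop
- noise_shape: integer tensor, the shape of the binary Dropout mask that is applied to the input
- seed: integer, the random seed to use
Input
- Example: for input of shape (batch_size, timesteps, features), to use the same Dropout mask at every timestep, pass noise_shape=(batch_size, 1, features)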
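
The noise_shape example can be sketched without Keras: draw one mask per (sample, feature) and reuse it at every timestep. The function below is a hypothetical pure-Python illustration; it assumes the common "inverted dropout" convention in which kept values are scaled by 1 / (1 - rate) during training.

```python
import random

def dropout_shared_over_time(x, rate, seed=None):
    """Dropout on x[batch][timesteps][features] with one mask per (sample, feature).

    Mimics noise_shape=(batch_size, 1, features): the mask is broadcast over time.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = random.Random(seed)
    keep = 1.0 - rate
    out = []
    for sample in x:
        n_features = len(sample[0])
        mask = [1.0 if rng.random() < keep else 0.0 for _ in range(n_features)]
        out.append([[v * m / keep for v, m in zip(step, mask)] for step in sample])
    return out

x = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]  # batch=2, timesteps=3, features=4
y = dropout_shared_over_time(x, rate=0.5, seed=0)
for sample in y:
    print(sample)
```

Within each sample every timestep row is identical, because the same features are dropped at all timesteps.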

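Flatten Layer

keras.layers.core.Flatten()

The Flatten layer "flattens" the input, turning a multi-dimensional input into a one-dimensional one

Commonly used in the transition from convolutional layers to fully connected layers
Flatten does not affect the batch size.

from keras.models import Sequential
from keras.layers import Conv2D, Flatten

model = Sequential()
# Keras 2 spelling of the original Convolution2D(64, 3, 3, border_mode='same');
# channels_first matches the (3, 32, 32) input layout
model.add(Conv2D(64, (3, 3),
                 padding='same',
                 data_format='channels_first',
                 input_shape=(3, 32, 32)))
# now: model.output_shape == (None, 64, 32, 32)

model.add(Flatten())
# now: model.output_shape == (None, 65536)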

Reshape Layer

keras.layers.core.Reshape(
	target_shape,
	input_shape
)

The Reshape layer converts the input to a specified shape

Arguments
- target_shape: the target shape, a tuple of integers that does not include the sample axis (batch size)
  - a -1 entry means that dimension is inferred
Input: the input shape must be fixed (its number of elements must equal the product of target_shape)
Output: (batch_size, *target_shape)

Example

from keras.models import Sequential
from keras.layers import Reshape

model = Sequential()
model.add(Reshape((3, 4), input_shape=(12,)))
# now: model.output_shape == (None, 3, 4)
# note: `None` is the batch dimension

model.add(Reshape((6, 2)))
# now: model.output_shape == (None, 6, 2)

# also supports shape inference using `-1` as dimension
model.add(Reshape((-1, 2, 2)))
# now: model.output_shape == (None, 3, 2, 2)
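
The `-1` inference is plain arithmetic: the missing dimension is the total element count divided by the product of the known dimensions. The helper below is a hypothetical illustration of that rule, not a Keras function.

```python
def infer_target_shape(input_shape, target_shape):
    """Resolve a single -1 in target_shape; both shapes exclude the batch axis."""
    if list(target_shape).count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = 1
    for d in input_shape:
        total *= d
    known = 1
    for d in target_shape:
        if d != -1:
            known *= d
    if known == 0 or total % known != 0:
        raise ValueError("total size of the new shape must be unchanged")
    if -1 not in target_shape and known != total:
        raise ValueError("total size of the new shape must be unchanged")
    return tuple(total // known if d == -1 else d for d in target_shape)

print(infer_target_shape((6, 2), (-1, 2, 2)))
```

For the last Reshape in the example above, 12 elements divided by 2 * 2 gives 3, hence (None, 3, 2, 2).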

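Permute Layer

keras.layers.core.Permute(
	dims
)

The Permute layer rearranges the dimensions of the input according to a given pattern

Notes
- This layer may be useful when connecting RNN and CNN networks.
Arguments
- dims: tuple of integers, the permutation pattern; it does not include the sample axis (that is, indexing starts at 1)
Output shape
- Same as the input, but with the dimensions reordered according to the given pattern

Example

from keras.models import Sequential
from keras.layers import Permute

model = Sequential()
model.add(Permute((2, 1), input_shape=(10, 64)))
# now: model.output_shape == (None, 64, 10)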
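
Because dims is 1-based and skips the batch axis, the output shape can be worked out by hand. The helper below is a hypothetical sketch of that indexing rule only; it does not move any data.

```python
def permuted_shape(input_shape, dims):
    """Output shape of a permutation; input_shape excludes the batch axis, dims is 1-based."""
    if sorted(dims) != list(range(1, len(input_shape) + 1)):
        raise ValueError("dims must be a permutation of 1..len(input_shape)")
    return tuple(input_shape[d - 1] for d in dims)

print((None,) + permuted_shape((10, 64), (2, 1)))
```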

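RepeatVector Layer

keras.layers.core.RepeatVector(
	n
)

The RepeatVector layer repeats the input n times

Arguments
- n: integer, the number of repetitions
Input: a tensor of shape (batch_size, features)
Output: a tensor of shape (batch_size, n, features)

Example

from keras.models import Sequential
from keras.layers import Dense, RepeatVector

model = Sequential()
model.add(Dense(32, input_dim=32))
# now: model.output_shape == (None, 32)

model.add(RepeatVector(3))
# now: model.output_shape == (None, 3, 32)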

Lambda Layer

keras.layers.core.Lambda(
	function,
	output_shape=None,
	mask=None,
	arguments=None
)

Applies an arbitrary Theano/TensorFlow expression to the output of the previous layer

Arguments
- function: the function to apply; it takes a single argument, the output of the previous layer
- output_shape: the shape of the value the function should return; either a tuple or a function that computes the output shape from the input shape
- mask: the mask
- arguments: optional dictionary of additional keyword arguments to pass to the function
Output: the shape given by the output_shape argument; with TensorFlow it can be inferred automatically

Example

from keras import backend as K
from keras.layers import Lambda

# assumes `model` is an existing Sequential model
model.add(Lambda(lambda x: x ** 2))  # add a x -> x^2 layer

# add a layer that returns the concatenation
# of the positive part of the input and
# the opposite of the negative part

def antirectifier(x):
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    pos = K.relu(x)
    neg = K.relu(-x)
    return K.concatenate([pos, neg], axis=1)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 2  # only valid for 2D tensors
    shape[-1] *= 2
    return tuple(shape)

model.add(Lambda(antirectifier,
                 output_shape=antirectifier_output_shape))

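ActivityRegularization Layer

keras.layers.core.ActivityRegularization(
	l1=0.0,
	l2=0.0
)

Data passing through this layer is unchanged, but the loss value is updated based on its activations

Arguments
- l1: L1 regularization factor (positive float)
- l2: L2 regularization factor (positive float)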
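
As a rough sketch of the penalty such a layer adds, assuming the usual L1/L2 form $l1 \sum |a| + l2 \sum a^2$ over the activations (any normalization by batch size is left out here and may differ between Keras versions):

```python
def activity_penalty(activations, l1=0.0, l2=0.0):
    """L1/L2 penalty on a flat list of activation values."""
    l1_term = l1 * sum(abs(a) for a in activations)
    l2_term = l2 * sum(a * a for a in activations)
    return l1_term + l2_term

print(activity_penalty([1.0, -2.0, 3.0], l1=0.1, l2=0.01))
```

The activations themselves pass through unchanged; only this scalar is added to the training loss.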

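Masking Layer

keras.layers.core.Masking(mask_value=0.0)

Masks an input sequence using the given value

Notes
- Used to locate timesteps that should be skipped
- For each timestep of the input tensor, if all values at that timestep equal mask_value, the timestep is skipped (masked) in all downstream layers, as long as they support masking.
- If a downstream layer does not support masking but receives masked data, an exception is raised
Input: a tensor of shape (samples, timesteps, features)

Example: the signals at timesteps 3 and 5 are missing and should be masked
- Approach: set x[:,3,:] = 0., x[:,5,:] = 0.
- Insert a Masking layer with mask_value=0. before the LSTM layer
  from keras.models import Sequential
  from keras.layers import Masking, LSTM
  timesteps, features = 10, 8  # example sizes (assumed)
  model = Sequential()
  model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
  model.add(LSTM(32))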
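
The masking rule (a timestep is skipped only when all of its features equal mask_value) can be sketched in plain Python. The function name is illustrative; it only computes the boolean mask and does not run any model.

```python
def compute_mask(x, mask_value=0.0):
    """x[samples][timesteps][features] -> mask[samples][timesteps]; True means keep."""
    return [[any(v != mask_value for v in step) for step in sample] for sample in x]

x = [[[1.0, 2.0],
      [0.0, 0.0],    # every feature equals mask_value: this timestep is masked
      [0.0, 3.0]]]   # only one feature equals mask_value: this timestep is kept
print(compute_mask(x))
```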
