Machine Learning: Neural Networks

Here a denotes the activations of the neurons in the network (a^(l)_i is the activation of unit i in layer l).

Classification

  1. Binary classification
  2. Multi-class classification

Cost function

  • L: total number of layers in the network
  • K: number of units in the output layer
  • S_l: number of units in layer l (not counting the bias unit)

To minimize the cost function, we use the backpropagation algorithm.
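With this notation, the regularized cost function being minimized can be written out (the standard multi-class logistic form, included here for reference):

$$
J(\Theta) = -\frac{1}{m}\sum_{t=1}^{m}\sum_{k=1}^{K}\Big[ y^{(t)}_k \log\big(h_\Theta(x^{(t)})\big)_k + \big(1-y^{(t)}_k\big)\log\Big(1-\big(h_\Theta(x^{(t)})\big)_k\Big) \Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{S_l}\sum_{j=1}^{S_{l+1}} \big(\Theta^{(l)}_{j,i}\big)^2
$$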

Backpropagation algorithm

(It computes the gradient of the cost function, i.e. the direction in which the cost decreases fastest.)
Given a training set {(x^(1), y^(1)), ..., (x^(m), y^(m))}:

Set Δ^(l)_{i,j} := 0 for all (l, i, j) (so you start with matrices full of zeros).

For training example t = 1 to m:

  1. Set a^(1) := x^(t)
  2. Perform forward propagation to compute a^(l) for l = 2, 3, …, L
  3. Using y^(t), compute δ^(L) = a^(L) − y^(t)
  4. Compute δ^(L−1), δ^(L−2), …, δ^(2) using δ^(l) = ((Θ^(l))^T δ^(l+1)) .* a^(l) .* (1 − a^(l))
  5. Δ^(l)_{i,j} := Δ^(l)_{i,j} + a^(l)_j δ^(l+1)_i, or with vectorization, Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T

Hence we update our new Δ matrix.

  • D^(l)_{i,j} := (1/m)(Δ^(l)_{i,j} + λΘ^(l)_{i,j}), if j ≠ 0
  • D^(l)_{i,j} := (1/m)Δ^(l)_{i,j}, if j = 0

Thus the partial derivatives of J are ∂J(Θ)/∂Θ^(l)_{i,j} = D^(l)_{i,j}.
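A minimal Octave sketch of one pass of this procedure for a 3-layer network (input → hidden → output). Theta1, Theta2, the design matrix X, the one-hot label matrix Y, lambda, and a sigmoid() helper are assumptions made for illustration, not symbols defined in these notes.

m = size(X, 1);
Delta1 = zeros(size(Theta1));            % gradient accumulators, one per Theta
Delta2 = zeros(size(Theta2));
for t = 1:m
  a1 = [1; X(t, :)'];                    % step 1: input activation with bias unit
  a2 = [1; sigmoid(Theta1 * a1)];        % step 2: hidden activation with bias unit
  a3 = sigmoid(Theta2 * a2);             % step 2: output h_Theta(x^(t))
  d3 = a3 - Y(t, :)';                    % step 3: output-layer delta
  d2 = (Theta2' * d3) .* a2 .* (1 - a2); % step 4: hidden-layer delta
  Delta1 = Delta1 + d2(2:end) * a1';     % step 5: accumulate, dropping the bias delta
  Delta2 = Delta2 + d3 * a2';
end
D1 = Delta1 / m;                         % unregularized derivatives
D2 = Delta2 / m;
D1(:, 2:end) += (lambda / m) * Theta1(:, 2:end);  % regularize all but the bias column
D2(:, 2:end) += (lambda / m) * Theta2(:, 2:end);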

Unrolling into vectors

In earlier exercises we always used fminunc to optimize the parameters, but it requires all inputs and outputs to be vectors. To use it with a neural network, we have to unroll the Theta matrices and the gradient matrices into vectors.

  • size(Theta1) = 10 * 11
  • size(Theta2) = 10 * 11
  • size(Theta3) = 1 * 11
thetaVector = [ Theta1(:); Theta2(:); Theta3(:) ]
deltaVector = [ D1(:); D2(:); D3(:) ]

Recovering the matrices (with the sizes above, thetaVector has 10·11 + 10·11 + 1·11 = 231 elements):

Theta1 = reshape(thetaVector(1:110),10,11)
Theta2 = reshape(thetaVector(111:220),10,11)
Theta3 = reshape(thetaVector(221:231),1,11)
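
A minimal sketch of how the unrolled vectors are typically handed to fminunc. costFunction here is a hypothetical wrapper (not defined in these notes) that reshapes thetaVector back into Theta1–Theta3, runs forward and backpropagation, and returns the cost J together with the unrolled gradient.

options = optimset('GradObj', 'on', 'MaxIter', 100);   % tell fminunc we supply the gradient
[optThetaVector, cost] = fminunc(@(t) costFunction(t, X, y, lambda), thetaVector, options);
% Reshape optThetaVector back into Theta1, Theta2, Theta3 (as above) before using the network.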

Gradient checking

A backpropagation implementation is complex and error-prone, so we use the definition of the derivative to check numerically that the gradients from backpropagation are correct (the two should differ only very slightly). Once they are confirmed correct, turn gradient checking off, because it is very slow.

epsilon = 1e-4;
for i = 1:n,
  thetaPlus = theta;
  thetaPlus(i) += epsilon;            % nudge the i-th parameter up
  thetaMinus = theta;
  thetaMinus(i) -= epsilon;           % and down
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*epsilon);  % two-sided difference
end;
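
Once gradApprox has been filled in, a quick comparison against the backpropagation gradient (assuming deltaVector from the unrolling step above holds that gradient, unrolled the same way) is to look at the relative difference; a correct implementation typically gives a value on the order of 1e-9.

relDiff = norm(gradApprox(:) - deltaVector(:)) / norm(gradApprox(:) + deltaVector(:))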

Random initialization

To prevent the network from getting stuck in redundant computation because all Theta values are identical (every unit would compute the same thing), we initialize each weight to a random value in the range [-INIT_EPSILON, INIT_EPSILON].

% If the dimensions of Theta1 are 10x11, Theta2 is 10x11 and Theta3 is 1x11:
Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
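
One common heuristic for choosing INIT_EPSILON (an assumption added here, not part of these notes) ties it to the number of units on either side of the weight matrix being initialized:

L_in  = 10;                                   % units feeding into the layer (illustrative value)
L_out = 10;                                   % units in the layer itself (illustrative value)
INIT_EPSILON = sqrt(6) / sqrt(L_in + L_out);  % keeps initial weights small but non-degenerate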

Choosing a network architecture

  • Number of input units = dimension of features x^(i)
  • Number of output units = number of classes
  • Number of hidden units per layer = usually, the more the better (balance this against the computational cost, which grows with the number of hidden units)
  • Defaults: 1 hidden layer. If you have more than 1 hidden layer, it is recommended that you have the same number of units in every hidden layer (see the example below).
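
As a concrete, purely illustrative example of these choices (assumed values, e.g. for 20x20-pixel handwritten-digit classification):

input_layer_size  = 400;   % one input unit per pixel feature
hidden_layer_size = 25;    % a single hidden layer by default
num_labels        = 10;    % one output unit per class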

Training a neural network

  • Randomly initialize the weights
  • Implement forward propagation to get h_Θ(x^(i)) for any x^(i)
  • Implement the cost function
  • Implement backpropagation to compute partial derivatives
  • Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
  • Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.

When we perform forward and back propagation, we loop over every training example:

for i = 1:m,
  % Perform forward propagation and backpropagation using example (x(i), y(i))
  % (get activations a(l) and delta terms d(l) for l = 2, ..., L)
end;

(There are m training examples.)