CS345, Machine Learning
Prof. Alvarez
The Error Back-Propagation Algorithm
This page summarizes the error back-propagation algorithm
that we will discuss in class.
The error back-propagation (EBP) algorithm is a supervised learning algorithm
for artificial neural networks (ANNs). It extends the weight update rule used
in the simple perceptron learning algorithm to multilayer feedforward ANNs.
The name "error back-propagation" derives from the manner in which information
is propagated across the network on each pass of the algorithm: the errors at
the output nodes are passed back to the hidden nodes using the network
connections, and the resulting information is used to update the connection
weights.
The EBP algorithm was described independently by several groups of researchers.
The most widely known paper on the subject is probably the one by Rumelhart,
Hinton, and Williams, "Learning Internal Representations by Error
Propagation", which was reprinted in the book Parallel Distributed
Processing edited by Rumelhart and McClelland (MIT Press, 1986).
Weight Space View of EBP
Many algorithms for updating the connection weights in an ANN may be
viewed as searching in the space of possible weight values for points at
which a certain error measure is minimized. A specific algorithm
is characterized by the choice of error measure and search strategy.
Error measure
EBP uses the mean square output error as the error measure:
MSE = average over all training instances of the sum over all output nodes k of (y_k - target_k)^2
where y_k and target_k are respectively the actual activation and the
desired (target) activation of output node k for the given training
instance.
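As a concrete illustration, the following sketch computes this error measure
with NumPy (the array names outputs and targets are assumed here, with one
row per training instance and one column per output node):

    import numpy as np

    # Hypothetical example: 3 training instances, 2 output nodes.
    outputs = np.array([[0.9, 0.1],
                        [0.2, 0.7],
                        [0.6, 0.4]])   # actual activations y_k
    targets = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 0.0]])   # desired activations target_k

    # Sum of squared errors over the output nodes, averaged over instances.
    mse = np.mean(np.sum((outputs - targets) ** 2, axis=1))
    print(mse)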
Search strategy
EBP employs a steepest descent search strategy which updates the weights
so as to move in the direction in weight space in which the MSE decreases
the fastest. This direction is simply the direction opposite to the
gradient vector of the MSE as a function of the weights. Thus, the weight
update rule for EBP may be written as:
w_ij = w_ij + deltaw_ij,
where the weight change deltaw_ij is the learning rate eta
multiplied by the negative of the w_ij component of the
gradient vector:
deltaw_ij = -eta * dMSE/dw_ij
Obtaining the error back-propagation form of the weight update rule
requires writing out the partial derivative that appears on the right-hand
side of the above equation, using a sigmoid activation function to get the
output of each node as a function of the summed input of the node.
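For a single sigmoid output node, that derivative works out to be
proportional to (y - target) * y * (1 - y) times the node's input, where
y*(1-y) is the derivative of the sigmoid. The sketch below computes this
analytic gradient and checks it against a finite-difference estimate; all
names are illustrative. Note that the factor of 2 produced by the square is
conventionally absorbed into the learning rate (equivalently, the error is
defined with a factor of 1/2), which is why it does not appear in the
pseudocode below.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    # One sigmoid node with inputs x and weights w; squared error vs. a target.
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, -0.3])
    target = 1.0
    eta = 0.1

    y = sigmoid(np.dot(w, x))

    # Analytic gradient: d/dw (y - target)^2 = 2 * (y - target) * y * (1-y) * x
    grad = 2 * (y - target) * y * (1 - y) * x

    # Finite-difference check on the first weight component.
    eps = 1e-6
    w_pert = w.copy()
    w_pert[0] += eps
    y_pert = sigmoid(np.dot(w_pert, x))
    approx = ((y_pert - target) ** 2 - (y - target) ** 2) / eps
    print(grad[0], approx)        # these two values should agree closely

    # Steepest descent step: move opposite the gradient.
    w = w - eta * grad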
EBP pseudocode (see Mitchell, chapter 4)
There are several variants of the EBP algorithm. The one given in pseudocode
below is a so-called "stochastic" version that performs weight updates after
examining each individual instance. A commonly used termination condition is
that the MSE over a validation set increases for a certain number of
consecutive training iterations; this suggests that overfitting may
be starting to occur.
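Before the pseudocode itself, here is a minimal sketch of that termination
test (val_mse is an assumed list holding the validation MSE recorded after
each training iteration, and patience is an assumed tolerance parameter):

    def should_stop(val_mse, patience=5):
        """Stop if validation MSE rose on `patience` consecutive iterations."""
        if len(val_mse) <= patience:
            return False
        recent = val_mse[-(patience + 1):]
        return all(recent[i] < recent[i + 1] for i in range(patience))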
- Initialize the connection weights w_ij pseudorandomly.
- Iterate the following weight update process until the termination
condition is satisfied.
- For each training instance:
- Present the instance to the network at the input layer
and propagate the signals forward through the network using
the sigmoid activation function of each node with the current
values of the weight and threshold parameters.
- Calculate the errors at the output layer and propagate
the errors back through the network as follows:
- For each output node k, let
deltak_k = y_k (1 - y_k) (target_k - y_k)
where y_k is the actual output of node k
and target_k is the desired output of node k
as indicated on the labeled training instance.
- For each hidden unit h, let
delta_h = y_h (1 - y_h) * sum over all output nodes k of w_kh delta_k
where w_kh denotes the weight from hidden node h to output node k
(in general, w_ij is the weight from node j to node i).
- Update the connection weights:
w_ij = w_ij + deltaw_ij,
where
deltaw_ij = eta * delta_i * y_j
Note that w_ij again denotes the weight from node j
to node i, and eta is the learning rate. Thus,
updating the weight from one node (the "source") to
another (the "target") involves the delta at the target
node and the output at the source node.
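As a concrete companion to the pseudocode, the sketch below is one possible
NumPy implementation of stochastic EBP for a network with a single hidden
layer, trained on XOR as a small illustration. The network size, learning
rate, and iteration count are assumptions made for this example; thresholds
are handled as bias weights on a constant input of 1, and a fixed iteration
budget stands in for the validation-based termination condition described
above.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    rng = np.random.default_rng(0)

    # XOR training set: 2 inputs, 1 output.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    n_in, n_hid, n_out = 2, 3, 1
    eta = 0.5

    # Initialize the weights pseudorandomly; the last column of each matrix
    # holds the bias weight.  Row i, column j holds w_ij (node j -> node i).
    W_hid = rng.uniform(-0.5, 0.5, size=(n_hid, n_in + 1))
    W_out = rng.uniform(-0.5, 0.5, size=(n_out, n_hid + 1))

    for epoch in range(10000):
        for x, t in zip(X, T):
            # Forward pass: propagate the signals through the network.
            x_b = np.append(x, 1.0)          # constant 1 input for the bias
            y_hid = sigmoid(W_hid @ x_b)
            y_hid_b = np.append(y_hid, 1.0)
            y_out = sigmoid(W_out @ y_hid_b)

            # Backward pass: compute the delta terms.
            delta_out = y_out * (1 - y_out) * (t - y_out)
            delta_hid = y_hid * (1 - y_hid) * (W_out[:, :n_hid].T @ delta_out)

            # Weight updates: deltaw_ij = eta * delta_i * y_j.
            W_out += eta * np.outer(delta_out, y_hid_b)
            W_hid += eta * np.outer(delta_hid, x_b)

    # After training, the outputs should approximate XOR.
    for x in X:
        y_hid = sigmoid(W_hid @ np.append(x, 1.0))
        y_out = sigmoid(W_out @ np.append(y_hid, 1.0))
        print(x, y_out)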