CS345, Machine Learning
Prof. Alvarez
The Error Back-Propagation Algorithm
This page summarizes the error back-propagation algorithm
that we will discuss in class.
The error back-propagation (EBP) algorithm is a supervised learning algorithm
for artificial neural networks (ANNs). It extends the weight update rule used
in the simple perceptron learning algorithm to multilayer feedforward ANNs.
The name "error back-propagation" derives from the manner in which information
is propagated across the network on each pass of the algorithm: the errors at
the output nodes are passed back to the hidden nodes using the network
connections, and the resulting information is used to update the connection
weights.
The EBP algorithm was described independently by several groups of researchers.
The most widely known paper on the subject is probably the one by Rumelhart,
Hinton, and Williams, "Learning Internal Representations by Error
Propagation", which was reprinted in the book Parallel Distributed
Processing edited by Rumelhart and McClelland (MIT Press, 1986).
Weight Space View of EBP
Many algorithms for updating the connection weights in an ANN may be
viewed as searching in the space of possible weight values for points at
which a certain error measure is minimized. A specific algorithm
is characterized by the choice of error measure and search strategy.
Error measure
EBP uses the mean square output error as the error measure:
MSE = average over all training instances of the sum over all output nodes k of (y_k - target_k)^2
where y_k and target_k are respectively the actual activation and the
desired (target) activation of output node k for the given training
instance.
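As a concrete illustration, the following sketch computes this error measure
with NumPy (the array names outputs and targets are assumed here, with one
row per training instance and one column per output node):

    import numpy as np

    # Hypothetical example: 3 training instances, 2 output nodes.
    outputs = np.array([[0.9, 0.1],
                        [0.2, 0.7],
                        [0.6, 0.4]])   # actual activations y_k
    targets = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 0.0]])   # desired activations target_k

    # Sum of squared errors over the output nodes, averaged over instances.
    mse = np.mean(np.sum((outputs - targets) ** 2, axis=1))
    print(mse)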
Search strategy
EBP employs a steepest descent search strategy which updates the weights
so as to move in the direction in weight space in which the MSE decreases
the fastest. This direction is simply the direction opposite to the
gradient vector of the MSE as a function of the weights. Thus, the weight
update rule for EBP may be written as:
w_ij = w_ij + deltaw_ij,
where the weight change deltaw_ij is the learning rate eta
multiplied by the negative of the w_ij component of the
gradient vector:
deltaw_ij = -eta * dMSE/dw_ij
Obtaining the error back-propagation form of the weight update rule
requires writing out the partial derivative that appears on the right-hand
side of the above equation, using a sigmoid activation function to get the
output of each node as a function of the summed input of the node.
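For a single sigmoid output node, that derivative works out to be
proportional to (y - target) * y * (1 - y) times the node's input, where
y*(1-y) is the derivative of the sigmoid. The sketch below computes this
analytic gradient and checks it against a finite-difference estimate; all
names are illustrative. Note that the factor of 2 produced by the square is
conventionally absorbed into the learning rate (equivalently, the error is
defined with a factor of 1/2), which is why it does not appear in the
pseudocode below.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    # One sigmoid node with inputs x and weights w; squared error vs. a target.
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, -0.3])
    target = 1.0
    eta = 0.1

    y = sigmoid(np.dot(w, x))

    # Analytic gradient: d/dw (y - target)^2 = 2 * (y - target) * y * (1-y) * x
    grad = 2 * (y - target) * y * (1 - y) * x

    # Finite-difference check on the first weight component.
    eps = 1e-6
    w_pert = w.copy()
    w_pert[0] += eps
    y_pert = sigmoid(np.dot(w_pert, x))
    approx = ((y_pert - target) ** 2 - (y - target) ** 2) / eps
    print(grad[0], approx)        # these two values should agree closely

    # Steepest descent step: move opposite the gradient.
    w = w - eta * grad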
EBP pseudocode (see Mitchell, chapter 4)
There are several variants of the EBP algorithm. The one given in pseudocode
below is a so-called "stochastic" version that performs weight updates after
examining each individual instance. A commonly used termination condition is
that the MSE over a validation set increases for a certain number of
consecutive training iterations; this suggests that overfitting may
be starting to occur.
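Before the pseudocode itself, here is a minimal sketch of that termination
test (val_mse is an assumed list holding the validation MSE recorded after
each training iteration, and patience is an assumed tolerance parameter):

    def should_stop(val_mse, patience=5):
        """Stop if validation MSE rose on `patience` consecutive iterations."""
        if len(val_mse) <= patience:
            return False
        recent = val_mse[-(patience + 1):]
        return all(recent[i] < recent[i + 1] for i in range(patience))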
- Initialize the connection weights w_ij pseudorandomly.
- Iterate the following weight update process until the termination
condition is satisfied.
- For each training instance:
- Present the instance to the network at the input layer
and propagate the signals forward through the network using
the sigmoid activation function of each node with the current
values of the weight and threshold parameters.
- Calculate the errors at the output layer and propagate
the errors back through the network as follows:
- For each output node k, let
deltak_k = y_k (1 - y_k) (target_k - y_k)
where y_k is the actual output of node k
and target_k is the desired output of node k
as indicated on the labeled training instance.
- For each hidden unit h, let
delta_h = y_h (1 - y_h) * sum over all output nodes k of w_kh delta_k
where w_kh denotes the weight from hidden node h to output node k
(in general, w_ij is the weight from node j to node i).
- Update the connection weights:
w_ij = w_ij + deltaw_ij,
where
deltaw_ij = eta * delta_i * y_j
Note that w_ij again denotes the weight from node j
to node i, and eta is the learning rate. Thus,
updating the weight from one node (the "source") to
another (the "target") involves the delta at the target
node and the output at the source node.
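As a concrete companion to the pseudocode, the sketch below is one possible
NumPy implementation of stochastic EBP for a network with a single hidden
layer, trained on XOR as a small illustration. The network size, learning
rate, and iteration count are assumptions made for this example; thresholds
are handled as bias weights on a constant input of 1, and a fixed iteration
budget stands in for the validation-based termination condition described
above.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    rng = np.random.default_rng(0)

    # XOR training set: 2 inputs, 1 output.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    n_in, n_hid, n_out = 2, 3, 1
    eta = 0.5

    # Initialize the weights pseudorandomly; the last column of each matrix
    # holds the bias weight.  Row i, column j holds w_ij (node j -> node i).
    W_hid = rng.uniform(-0.5, 0.5, size=(n_hid, n_in + 1))
    W_out = rng.uniform(-0.5, 0.5, size=(n_out, n_hid + 1))

    for epoch in range(10000):
        for x, t in zip(X, T):
            # Forward pass: propagate the signals through the network.
            x_b = np.append(x, 1.0)          # constant 1 input for the bias
            y_hid = sigmoid(W_hid @ x_b)
            y_hid_b = np.append(y_hid, 1.0)
            y_out = sigmoid(W_out @ y_hid_b)

            # Backward pass: compute the delta terms.
            delta_out = y_out * (1 - y_out) * (t - y_out)
            delta_hid = y_hid * (1 - y_hid) * (W_out[:, :n_hid].T @ delta_out)

            # Weight updates: deltaw_ij = eta * delta_i * y_j.
            W_out += eta * np.outer(delta_out, y_hid_b)
            W_hid += eta * np.outer(delta_hid, x_b)

    # After training, the outputs should approximate XOR.
    for x in X:
        y_hid = sigmoid(W_hid @ np.append(x, 1.0))
        y_out = sigmoid(W_out @ np.append(y_hid, 1.0))
        print(x, y_out)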