CSCI 3346, Data Mining
Prof. Alvarez

The Error Back-Propagation Algorithm

This page summarizes the error back-propagation (EBP) algorithm that was mentioned in class, and which is used to train multilayer artificial neural networks (ANN) for regression and classification tasks. EBP extends the weight update rule used in the case of single perceptrons. The name "error back-propagation" derives from the manner in which information is propagated across the network on each pass of the algorithm: the errors at the output nodes are passed back to the hidden nodes using the network connections, and the resulting information is used to update the connection weights. The EBP algorithm was described independently by several groups of researchers. The most widely known paper on the subject is probably the one by Rumelhart, Hinton, and Williams, "Learning Internal Representations by Error Propagation", which was reprinted in the book Parallel Distributed Processing edited by Rumelhart and McClelland (MIT Press, 1986).

Weight Space View of EBP

Many algorithms for updating the connection weights in an ANN may be viewed as searching in the space of possible weight values for points at which a certain error measure is minimized. A specific algorithm is characterized by the choice of error measure and search strategy.

Error measure

EBP uses the mean square output error as the error measure:
MSE = average over all training instances of the sum over all output nodes k of ( y_k - target_k )^2
where y_k and target_k are, respectively, the actual activation and the target activation of output node k for the given training instance.
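
For concreteness, here is a minimal sketch of this error measure in Python; the arrays and their values are made up for the example and are not part of the algorithm.

import numpy as np

# Hypothetical activations and targets for 4 training instances at
# 3 output nodes (rows = instances, columns = output nodes).
y = np.array([[0.9, 0.1, 0.2],
              [0.2, 0.8, 0.1],
              [0.1, 0.3, 0.7],
              [0.6, 0.4, 0.5]])
target = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])

# Sum of squared errors over the output nodes of each instance,
# then the average over the instances.
mse = np.mean(np.sum((y - target) ** 2, axis=1))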

Search strategy

EBP employs a steepest descent search strategy which updates the weights so as to move in the direction in weight space in which the MSE decreases the fastest. This direction is simply the direction opposite to the gradient vector of the MSE as a function of the weights. Thus, the weight update rule for EBP may be written as:
w_{i,j} = w_{i,j} + Δw_{i,j},
where the weight change Δw_{i,j} is the learning rate η multiplied by the negative of the w_{i,j} component of the gradient vector:
Δw_{i,j} = -η ∂MSE/∂w_{i,j}
Obtaining the error back-propagation form of the weight update rule requires writing out the partial derivative that appears on the right-hand side of the above equation, using a sigmoid activation function to get the output of each node as a function of the summed input of the node.
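
As a sketch of that computation (writing E for the error on a single training instance, with the conventional factor of 1/2 that cancels the 2 produced by differentiation): let net_i = Σ_j w_{i,j} y_j denote the summed input of node i, and y_i = σ(net_i) its output, where σ(x) = 1/(1 + e^{-x}) satisfies σ'(x) = σ(x)(1 - σ(x)). For an output node k, the chain rule gives
∂E/∂w_{k,j} = ∂E/∂y_k · ∂y_k/∂net_k · ∂net_k/∂w_{k,j} = -(target_k - y_k) y_k (1 - y_k) y_j = -δ_k y_j,
so that Δw_{k,j} = η δ_k y_j, with δ_k = (target_k - y_k) y_k (1 - y_k). For a hidden node h, the same argument, summed over the output nodes k fed by h, yields δ_h = y_h (1 - y_h) Σ_k w_{k,h} δ_k; this backward flow of the δ values is the "back-propagation" in the algorithm's name.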

EBP pseudocode (e.g., Mitchell, Machine Learning, chapter 4)

There are several variants of the EBP algorithm. The one given in pseudocode below is a so-called "stochastic" version that performs weight updates after examining each individual instance. A commonly used termination condition is that the MSE over a validation set increases for a certain number of consecutive training iterations; this suggests that overfitting may be starting to occur.
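
A minimal sketch of such a termination test in Python; the function name and the patience parameter are assumptions made for this illustration.

def should_stop(validation_mse, patience=5):
    """Return True when the validation-set MSE has increased for
    `patience` consecutive training iterations, suggesting that
    overfitting may be starting to occur."""
    if len(validation_mse) <= patience:
        return False
    recent = validation_mse[-(patience + 1):]
    return all(a < b for a, b in zip(recent, recent[1:]))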

Inputs

A training dataset of instances, each pairing an input vector with target activations for the output nodes; the network architecture (the nodes and the weighted connections among them); a learning rate η; and a termination condition, such as the validation-set test described above.

Outputs

For each connection weight w_{i,j} (on the connection from node j to node i), a learned value such that the network with these weights approximates the input-to-output behavior implicitly defined by the training dataset.

Procedure
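
Following Mitchell's presentation (Machine Learning, chapter 4), a sketch of the stochastic EBP procedure, using the δ notation from the derivation above:

1. Initialize all connection weights w_{i,j} to small random values (e.g., uniformly in [-0.05, 0.05]).
2. Until the termination condition is met, repeat for each training instance:
   a. Forward pass: feed the instance's input vector into the network and compute the activation y_u of every node u.
   b. Backward pass:
      for each output node k, compute δ_k = y_k (1 - y_k) (target_k - y_k);
      for each hidden node h, compute δ_h = y_h (1 - y_h) Σ_k w_{k,h} δ_k, where the sum ranges over the nodes k that receive input from h.
   c. Weight update: for each connection from node j to node i, set w_{i,j} = w_{i,j} + η δ_i y_j.

The following Python sketch implements this procedure for a network with a single hidden layer; the variable names, the two-layer restriction, the fixed iteration budget in place of a termination test, and the omission of bias weights are all simplifications assumed for this illustration.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_ebp(X, T, n_hidden, eta=0.1, n_iter=1000, seed=None):
    """Stochastic EBP for one hidden layer of sigmoid nodes.
    X: (n_instances, n_inputs) array of input vectors.
    T: (n_instances, n_outputs) array of target activations."""
    rng = np.random.default_rng(seed)
    # Step 1: small random initial weights.
    W_hid = rng.uniform(-0.05, 0.05, size=(n_hidden, X.shape[1]))
    W_out = rng.uniform(-0.05, 0.05, size=(T.shape[1], n_hidden))
    for _ in range(n_iter):            # fixed budget stands in for the
        for x, t in zip(X, T):         # termination condition
            # Step 2a, forward pass: hidden and output activations.
            y_hid = sigmoid(W_hid @ x)
            y_out = sigmoid(W_out @ y_hid)
            # Step 2b, backward pass: output deltas, then the
            # back-propagated hidden deltas.
            delta_out = y_out * (1 - y_out) * (t - y_out)
            delta_hid = y_hid * (1 - y_hid) * (W_out.T @ delta_out)
            # Step 2c, steepest-descent updates: Δw_{i,j} = η δ_i y_j.
            W_out += eta * np.outer(delta_out, y_hid)
            W_hid += eta * np.outer(delta_hid, x)
    return W_hid, W_out

The outer products accumulate exactly the Δw_{i,j} = η δ_i y_j updates of step 2c, one instance at a time, which is what makes this the stochastic variant.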