CSCI 3346, Data Mining
Prof. Alvarez
Example of Hidden Layer Representations in Artificial Neural Networks
In class we discussed how the hidden layer in a feedforward
multilayer neural network can extract features that make it easier
for the output layer to differentiate among classes.
An additional example appears below. For more details, see the paper
C. Shoemaker, M. Pungliya, M. Sao Pedro, C. Ruiz, S. Alvarez, M. Ward,
E. Ryder, J. Krushkal,
"Computational methods for single point and multipoint analysis
of genetic variants associated with a simulated complex disorder
in a general population",
Genetic Epidemiology, Nov. 2001.
Task Description
The target task is a classification task for a disease. The instances
are patients, and the class is binary: sick/healthy, corresponding to
the disease status of patients. Descriptive attributes include raw DNA
sequence data and genetic markers.
Data Mining Approach
A fully connected two-layer feedforward neural network architecture was used,
in which the input attributes feed into a hidden layer, whose activation levels
in turn serve as the inputs to the output layer.
The diagrams below are for the case in which there are exactly two units in
the hidden layer. Error backpropagation was used for training.
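The architecture and training procedure described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the study: the sigmoid activation, squared-error loss, learning rate, random initialization, and the four-instance toy dataset are all assumptions made for the sketch, and the names `init_params`, `forward`, and `train_epoch` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Assumed activation function; the handout does not specify one.
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_inputs, n_hidden=2, seed=0):
    # Small random weights, zero biases (an assumed initialization).
    rng = random.Random(seed)
    return {
        "W_h": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)],
        "b_h": [0.0] * n_hidden,
        "w_o": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_o": 0.0,
    }

def forward(x, p):
    # Input attributes -> hidden activations -> single output activation.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W_h"], p["b_h"])]
    y = sigmoid(sum(w * hj for w, hj in zip(p["w_o"], h)) + p["b_o"])
    return h, y

def train_epoch(data, p, lr=0.5):
    # One pass of per-instance error backpropagation on squared error.
    # Returns the mean loss observed during the pass.
    total = 0.0
    for x, t in data:
        h, y = forward(x, p)
        total += 0.5 * (y - t) ** 2
        # Error signals for the output unit and for each hidden unit,
        # computed before any weight is changed.
        delta_o = (y - t) * y * (1.0 - y)
        delta_h = [delta_o * p["w_o"][j] * h[j] * (1.0 - h[j])
                   for j in range(len(h))]
        for j in range(len(h)):
            p["w_o"][j] -= lr * delta_o * h[j]
            p["b_h"][j] -= lr * delta_h[j]
            for i in range(len(x)):
                p["W_h"][j][i] -= lr * delta_h[j] * x[i]
        p["b_o"] -= lr * delta_o
    return total / len(data)

# Made-up toy data: three binary attributes, class equals the first one.
toy_data = [([0.0, 0.0, 1.0], 0), ([0.0, 1.0, 0.0], 0),
            ([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 0.0], 1)]
params = init_params(n_inputs=3)
losses = [train_epoch(toy_data, params) for _ in range(500)]
first_loss, last_loss = losses[0], losses[-1]
```

Comparing `first_loss` with `last_loss` gives a quick check that training is reducing the error on the toy data.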
Before Training
We display the data in the hidden layer activation space. Each instance
is represented as a point whose coordinates are the activation levels
of the two hidden units in the neural network.
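As a rough sketch of how such a display can be produced, the function below maps one instance to its point in hidden layer activation space. The sigmoid activation, the weight values, and the two example instances are assumptions chosen for illustration; `hidden_coordinates` is a hypothetical name, not part of the original study.

```python
import math

def hidden_coordinates(x, W_h, b_h):
    """Return the activation level of each hidden unit for instance x.

    With two hidden units the result is an (h1, h2) pair that can be
    plotted directly as a point in the plane.
    """
    return tuple(
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(W_h, b_h)
    )

# Hypothetical weights: two hidden units over three input attributes.
W_h = [[0.0, 0.0, 0.0], [2.0, -1.0, 0.5]]
b_h = [0.0, 0.0]

# Two made-up instances, each encoded as three numeric attributes.
patients = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
points = [hidden_coordinates(x, W_h, b_h) for x in patients]
```

Because a sigmoid unit outputs values strictly between 0 and 1, every point falls inside the unit square, which is why it makes sense to speak of populations sitting near an edge or a corner of the diagram.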
Before training, the healthy and diseased populations are distributed
similarly, both concentrated near the upper part of the diagram.
After Training
After error backpropagation training, the two populations have migrated
toward opposite corners of the hidden layer activation space, as shown below.
This illustrates that the hidden layer has in effect "learned"
to compute features that are relevant to predicting disease status.
With these features as inputs, the output layer can much more easily
separate sick from healthy patients on the basis of each
patient's DNA data.
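One simple way to quantify this migration, offered here only as a sketch, is to measure the distance between the two class centroids in hidden layer activation space. The coordinates below are invented to mimic the qualitative description above and are not results from the study; `centroid` and `class_separation` are hypothetical helper names.

```python
import math

def centroid(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def class_separation(points, labels):
    """Euclidean distance between the healthy (0) and sick (1) centroids."""
    healthy = [p for p, c in zip(points, labels) if c == 0]
    sick = [p for p, c in zip(points, labels) if c == 1]
    return math.dist(centroid(healthy), centroid(sick))

labels = [0, 0, 1, 1]

# Invented coordinates: both classes near the top edge before training...
before = [(0.40, 0.90), (0.60, 0.90), (0.50, 0.92), (0.50, 0.88)]
# ...and near opposite corners after training.
after = [(0.10, 0.90), (0.10, 0.90), (0.90, 0.10), (0.90, 0.10)]

sep_before = class_separation(before, labels)
sep_after = class_separation(after, labels)
```

A larger centroid distance after training is consistent with the hidden units having learned class-relevant features, although it is a coarse summary and does not by itself guarantee that the classes are separable.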