CSCI 3346, Fall 2018
Prof. Alvarez

Exam 1 Topics
(also see PS 1-4, their solutions, and the relevant sections of the textbook)

The KDD process
    Preprocessing, data mining, postprocessing

Supervised and unsupervised data mining tasks
    Classification, regression, clustering

Evaluation of data mining techniques
    Accuracy, error rate
    Training performance vs. generalization performance
    Cross-validation

Data preprocessing
    Sampling, outlier removal, Mahalanobis distance
    Discretization, class entropy
    Attribute selection, feature extraction

Exploratory data analysis
    Summary statistics, correlation, covariance

Model complexity vs data complexity
    Underfitting and overfitting
    Bias-variance tradeoff
    MDL principle

Decision tree classification
    Tree induction algorithm
    Early stopping, pruning, statistical error bounds

Rule-based classification
    Coverage and accuracy
    Sequential covering approach

Similarity and instance-based prediction
    Similarity metrics
    k-NN classifier / regressor