Classification with imperfect training labels
Statistics Seminar
1st March 2019, 3:00 pm – 3:45 pm
Main Maths Building, SM3
Supervised classification is one of the fundamental problems in
statistical learning. However, it is frequently the case that the class
labels in the training data are inaccurate, either due to a coding error,
or because the true labels are difficult or expensive to determine. We
will first present general theory to characterise the effect of label noise
on an arbitrary classifier. We will then specialise to three popular
approaches to classification, namely the $k$-nearest neighbour classifier,
support vector machines and linear discriminant analysis, and show that,
under stronger conditions, more detailed asymptotic properties may be
derived. Our conclusions act as a counterpoint to much of the folklore in
the computer science/machine learning literature.
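One common way to model imperfect training labels is homogeneous (symmetric) label noise, in which each training label is flipped independently with a fixed probability. The following is a minimal simulation sketch of how one might measure the effect of this noise on a $k$-nearest neighbour classifier. It assumes that noise model, a toy one-dimensional two-class problem, and illustrative parameter values (rho = 0.2, k = 15). All function names here are hypothetical. It is not the speakers' method and makes no claim about their results.

```python
import random

def make_data(n, rng):
    """Toy 1-D problem (hypothetical): x ~ Uniform(0, 1), true label is 1 if x > 0.5, else 0."""
    xs = [rng.random() for _ in range(n)]
    ys = [1 if x > 0.5 else 0 for x in xs]
    return xs, ys

def flip_labels(ys, rho, rng):
    """Homogeneous label noise: flip each binary label independently with probability rho."""
    return [1 - y if rng.random() < rho else y for y in ys]

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (odd k avoids ties)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose (clean) label the k-NN rule predicts correctly."""
    correct = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return correct / len(test_x)

rng = random.Random(0)                      # fixed seed so the run is reproducible
train_x, train_y = make_data(500, rng)
test_x, test_y = make_data(200, rng)        # test labels are always the true labels
noisy_y = flip_labels(train_y, rho=0.2, rng=rng)

k = 15
acc_clean = accuracy(train_x, train_y, test_x, test_y, k)
acc_noisy = accuracy(train_x, noisy_y, test_x, test_y, k)
print(f"clean labels: {acc_clean:.3f}, noisy labels (rho=0.2): {acc_noisy:.3f}")
```

Varying `rho`, `k`, or the noise model (for example, making the flip probability depend on the class or on x) is one way to explore the questions the talk addresses empirically. A simulation of this kind illustrates the setting but does not by itself establish any of the asymptotic properties mentioned above.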