Richard Samworth

Statistical Laboratory, University of Cambridge

Classification with imperfect training labels

Statistics Seminar

1st March 2019, 3:00 pm – 3:45 pm
Main Maths Building, SM3

Supervised classification is one of the fundamental problems in
statistical learning. However, it is frequently the case that the class
labels in the training data are inaccurate, either due to a coding error,
or because the true labels are difficult or expensive to determine. We
will first present general theory to characterise the effect of label noise
on an arbitrary classifier. We will then specialise to three popular
approaches to classification, namely the $k$-nearest neighbour classifier,
support vector machines and linear discriminant analysis, and show that,
under stronger conditions, more detailed asymptotic properties may be
derived. Our conclusions act as a counterpoint to much of the folklore in
the computer science/machine learning literature.

Organiser: Henry Reeve
