Valid Inference After Hierarchical Clustering
Statistics Seminar
4th March 2022, 4:00 pm – 5:00 pm
Virtual Seminar, Zoom link: TBA
Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Classical tests control the type I error rate when the groups are defined a priori. However, if the groups are instead defined using a clustering algorithm, then applying a classical test yields an extremely inflated type I error rate. Surprisingly, this problem persists even if two separate and independent data sets are used for clustering and for hypothesis testing.
In this talk, I will propose a test for a difference in means between two estimated clusters that accounts for the fact that the null hypothesis is a function of the data, using a selective inference framework. Then, I will describe how to efficiently compute exact p-values for clusters obtained using hierarchical clustering. I will also show an application in the context of single-cell RNA-sequencing data, where it is common for researchers to cluster the cells, then test for a difference in mean gene expression between the clusters.
This talk is based on joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).
Comments are closed.