Guru Ganesan

University of Bristol


Dissimilar Batch Decompositions of Random Datasets


Probability Seminar


31st January 2025, 3:30 pm – 4:30 pm
Fry Building, 2.04


To improve learning, large datasets are often split into small batches that are fed sequentially to a predictive model. In this talk, we study such batch decompositions from a probabilistic perspective. We assume that data points are drawn independently from a given space, and we define a notion of similarity between two data points. We then consider decompositions that restrict the amount of similarity within each batch and obtain high-probability bounds on the minimum size of such decompositions. We demonstrate an inherent tradeoff between relaxing the similarity constraint and the overall size of the decomposition, and we also use martingale methods to obtain bounds on the maximum size of data subsets with a given similarity.
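To make the setup concrete, here is a minimal sketch built on assumptions that are not taken from the talk. Points are drawn i.i.d. uniform on [0, 1], two points count as "similar" when they are closer than a hypothetical threshold `r`, and each batch may contain at most `k` similar pairs. The hypothetical function `greedy_batches` uses a simple first-fit heuristic, which is not the speaker's method; the number of batches it returns is only an upper bound on the minimum size of such a decomposition.

```python
import random


def similar(x: float, y: float, r: float) -> bool:
    """Hypothetical similarity notion (an assumption): points within distance r."""
    return abs(x - y) < r


def greedy_batches(points, r, k):
    """First-fit heuristic (not the speaker's method).

    Each point goes into the first batch where adding it keeps that batch's
    count of similar pairs at most k; otherwise it opens a new batch.
    The number of batches returned upper-bounds the minimum decomposition size.
    """
    batches = []  # each entry: [points_in_batch, number_of_similar_pairs]
    for p in points:
        for batch in batches:
            # Similar pairs that p would create with the points already in this batch.
            new_pairs = sum(similar(p, q, r) for q in batch[0])
            if batch[1] + new_pairs <= k:
                batch[0].append(p)
                batch[1] += new_pairs
                break
        else:
            # No existing batch can absorb p without exceeding k similar pairs.
            batches.append([[p], 0])
    return [b[0] for b in batches]


if __name__ == "__main__":
    rng = random.Random(0)
    n, r = 200, 0.05
    pts = [rng.random() for _ in range(n)]  # i.i.d. uniform samples on [0, 1]
    for k in (0, 5, 20):
        print(f"k={k}: {len(greedy_batches(pts, r, k))} batches")
```

Raising `k` relaxes the similarity constraint, so fewer batches are typically needed, which loosely mirrors the tradeoff described above. Note that the greedy count is not guaranteed to decrease monotonically in `k`.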





