Learning from Imperfect Data: Noisy Labels, Truncation, and Coarsening

Author: Vasilis Kontonis (Ph.D.)
Publisher:
Total Pages: 0
Release: 2023
ISBN-10: OCLC:1410731106
ISBN-13:

Book Synopsis Learning from Imperfect Data: Noisy Labels, Truncation, and Coarsening by Vasilis Kontonis (Ph.D.)

Download or read book Learning from Imperfect Data: Noisy Labels, Truncation, and Coarsening written by Vasilis Kontonis (Ph.D.). This book was released in 2023. Available in PDF, EPUB and Kindle.

Book excerpt: The datasets used in machine learning and statistics are huge and often imperfect; e.g., they contain corrupted data, examples with wrong labels, or hidden biases. Most existing approaches (i) produce unreliable results when the datasets are corrupted, (ii) are computationally inefficient, or (iii) come without any theoretical/provable performance guarantees. In this thesis, we design learning algorithms that are computationally efficient and, at the same time, provably reliable, even when used on imperfect datasets.

We first focus on supervised learning settings with noisy labels. We present efficient and optimal learners under the semi-random noise models of Massart and Tsybakov, where the true label of each example is flipped with probability at most 50%, and an efficient approximate learner under adversarial label noise, where a small but arbitrary fraction of labels is flipped, under structured feature distributions. Apart from classification, we extend our results to noisy label-ranking.

In truncated statistics, the learner does not observe a representative set of samples from the whole population, but only truncated samples, i.e., samples from a potentially small subset of the support of the population distribution. We give the first efficient algorithms for learning Gaussian distributions with unknown truncation sets and initiate the study of non-parametric truncated statistics.

Closely related to truncation is data coarsening, where instead of observing the class of an example, the learner receives a set of potential classes, one of which is guaranteed to be the correct class. We initiate the theoretical study of the problem and present the first efficient learning algorithms for learning from coarse data.
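To make the three data-imperfection models in the abstract concrete, the following small Python sketch (ours, not code from the thesis) simulates each of them on synthetic data: Massart-style label noise that flips each label with probability bounded below 50%, truncation that discards every sample falling outside a survival set, and coarsening that replaces the true class with a candidate set guaranteed to contain it. All names and parameter choices here (eta_max, the positive-orthant survival set, the coarsen helper) are illustrative assumptions, not constructs from the thesis.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "clean" data: 2-D Gaussian features, labels from a halfspace.
n, d = 1000, 2
X = rng.standard_normal((n, d))
w_true = np.array([1.0, -0.5])
y = np.sign(X @ w_true)

# 1) Massart (bounded) label noise: each label is flipped independently
#    with probability eta(x) <= eta_max < 1/2 (a fixed rate here for simplicity).
eta_max = 0.2
flip = rng.random(n) < eta_max
y_noisy = np.where(flip, -y, y)

# 2) Truncation: only samples landing in a survival set S are observed;
#    here S is the positive orthant, so the observed sample is biased.
in_S = (X > 0).all(axis=1)
X_truncated = X[in_S]

# 3) Coarsening: instead of the true class, the learner sees a set of
#    candidate classes that is guaranteed to contain the correct one.
def coarsen(label, classes=(-1, 1), extra_prob=0.3):
    """Return a candidate set containing the true label, possibly enlarged."""
    candidates = {label}
    for c in classes:
        if c != label and rng.random() < extra_prob:
            candidates.add(c)
    return candidates

coarse_labels = [coarsen(int(lbl)) for lbl in y]

print("noisy-label disagreement rate:", np.mean(y_noisy != y))
print("fraction of samples surviving truncation:", in_S.mean())
print("fraction of coarse labels that are ambiguous:",
      np.mean([len(s) > 1 for s in coarse_labels]))

The sketch only shows how the corrupted observations are generated; the thesis's contribution is efficient, provably reliable learners that recover the underlying model from data of exactly this kind.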


Learning from Imperfect Data: Noisy Labels, Truncation, and Coarsening Related Books

Learning from Imperfect Data: Noisy Labels, Truncation, and Coarsening
Language: en
Pages: 0
Authors: Vasilis Kontonis (Ph.D.)
Categories:
Type: BOOK - Published: 2023 - Publisher:

The datasets used in machine learning and statistics are huge and often imperfect; e.g., they contain corrupted data, examples with wrong labels, or hidden biases.
Reinforcement Learning, second edition
Language: en
Pages: 549
Authors: Richard S. Sutton
Categories: Computers
Type: BOOK - Published: 2018-11-13 - Publisher: MIT Press

The significantly expanded and updated new edition of a widely used text on reinforcement learning, one of the most active research areas in artificial intelligence.
Federated Learning
Language: en
Pages: 291
Authors: Qiang Yang
Categories: Computers
Type: BOOK - Published: 2020-11-25 - Publisher: Springer Nature

This book provides a comprehensive and self-contained introduction to federated learning, ranging from the basic knowledge and theories to various key applications.
Machine Learning Algorithms
Language: en
Pages: 352
Authors: Giuseppe Bonaccorso
Categories: Computers
Type: BOOK - Published: 2017-07-24 - Publisher: Packt Publishing Ltd

Build a strong foundation for entering the world of Machine Learning and data science with the help of this comprehensive guide.
Bayesian Reinforcement Learning
Language: en
Pages: 146
Authors: Mohammad Ghavamzadeh
Categories: Computers
Type: BOOK - Published: 2015-11-18 - Publisher:

Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms.