Statistical Methods for Improving Data Quality in Modern RNA Sequencing Experiments

Author : Zijian Ni (Ph.D.)
Release : 2022
OCLC : 1365104694

Book Synopsis: Statistical Methods for Improving Data Quality in Modern RNA Sequencing Experiments, by Zijian Ni (Ph.D.)

Written by Zijian Ni (Ph.D.) and released in 2022. Book excerpt: RNA sequencing (RNA-seq) has revolutionized the measurement of transcriptome-wide gene expression over the last two decades. Modern RNA sequencing techniques such as single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) have been developed in recent years, allowing researchers to quantify gene expression at single-cell resolution or to profile gene activity patterns in two-dimensional space across a tissue. While useful, data collected with these techniques are noisy, and appropriate filtering and cleaning are required for reliable downstream analyses. In this dissertation, I investigate multiple quality-related issues in scRNA-seq and ST experiments, and I develop, implement, evaluate, and apply statistical methods to adjust for them. A unifying theme of this work is that all of these methods aim to improve data quality and to allow for better power and precision in downstream analyses.

For scRNA-seq data, the quality issue discussed in this dissertation is distinguishing barcodes associated with real cells from those capturing background noise. In droplet-based scRNA-seq experiments, raw data contain both cell barcodes, which should be retained for downstream analysis, and background barcodes, which are uninformative and should be filtered out. Because ambient RNAs are present in all barcodes, cell barcodes are not easily distinguished from background barcodes. Misclassifying either kind of barcode produces misleading results in downstream analyses. Existing filtering methods test barcodes individually and consequently do not leverage the strong cell-to-cell correlation present in most datasets.
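To make the idea of testing barcodes individually concrete, here is a minimal sketch in Python. It is an illustration only and does not reproduce CB2 or any published tool: the ambient profile is pooled from low-count barcodes, and each barcode is then compared against that profile with a Monte Carlo multinomial test. The function names, the `max_total` threshold, the pseudocount, and the toy counts are all assumptions made for this example.

```python
import math
import random


def ambient_profile(counts, max_total=5):
    """Pool barcodes with at most max_total UMIs into gene proportions.

    Assumption: very low-count barcodes contain only ambient RNA.
    A pseudocount of 1 per gene keeps every proportion positive.
    """
    n_genes = len(next(iter(counts.values())))
    pooled = [1.0] * n_genes
    for vec in counts.values():
        if sum(vec) <= max_total:
            for g, c in enumerate(vec):
                pooled[g] += c
    total = sum(pooled)
    return [p / total for p in pooled]


def multinomial_logpmf(vec, profile):
    """Log-probability of a count vector under a multinomial with the given profile."""
    out = math.lgamma(sum(vec) + 1)
    for c, p in zip(vec, profile):
        out += c * math.log(p) - math.lgamma(c + 1)
    return out


def ambient_pvalue(vec, profile, n_sim=1000, seed=0):
    """Monte Carlo p-value: how often is a simulated ambient barcode
    at least as unlikely as the observed one?"""
    rng = random.Random(seed)
    n = sum(vec)
    observed = multinomial_logpmf(vec, profile)
    genes = range(len(profile))
    as_extreme = 0
    for _ in range(n_sim):
        sim = [0] * len(profile)
        for g in rng.choices(genes, weights=profile, k=n):
            sim[g] += 1
        if multinomial_logpmf(sim, profile) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_sim + 1)


# Toy data: three low-count background barcodes and two larger barcodes.
counts = {
    "bg1": [3, 1, 0, 0],
    "bg2": [2, 0, 1, 0],
    "bg3": [4, 0, 0, 1],
    "ambient_like": [60, 14, 13, 13],
    "cell_like": [0, 0, 50, 50],
}
profile = ambient_profile(counts)
p_ambient_like = ambient_pvalue(counts["ambient_like"], profile)
p_cell_like = ambient_pvalue(counts["cell_like"], profile)
```

In this toy setup, a small p-value flags a barcode whose expression differs from the ambient profile. Because each barcode is tested on its own, a real cell with few counts may not reach significance. The abstract describes the same limitation when it notes that per-barcode tests ignore cell-to-cell correlation.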
To improve cell detection, we introduce CB2, a cluster-based approach for distinguishing real cells from background barcodes. As demonstrated in simulated and case study datasets, CB2 has increased power for identifying real cells, which allows for the identification of novel subpopulations and improves downstream differential expression analyses.

We then present a benchmark study to evaluate the performance of cell detection methods, including CB2, on public scRNA-seq datasets covering a variety of experimental protocols. In recent years, variants of scRNA-seq techniques have been developed for specialized biological tasks. While the data structure remains the same as in a standard scRNA-seq experiment, the underlying data properties can differ substantially. Here, we present the first benchmark study to provide a thorough comparison of existing cell detection methods for scRNA-seq data and to guide users in choosing appropriate methods for their experiments. Evaluation metrics include power, precision, computational efficiency, robustness, and accessibility. In addition, we provide investigation and guidance on choosing filtering parameters appropriately in order to improve data quality.

For ST data, we uncover, for the first time, a quality issue in which genes expressed in one tissue region bleed out and contaminate nearby tissue regions. ST is a powerful and widely used approach for profiling transcriptome-wide gene expression across a tissue, with emerging applications in molecular medicine and tumor diagnostics. Recent ST experiments utilize slides containing thousands of spots with spot-specific barcodes that bind RNAs. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case owing to bleed from nearby spots, an artifact we refer to as spot swapping. We design a human-mouse chimeric ST experiment to validate the existence of spot swapping.
Spot swapping hinders inference of region-specific gene activity and tissue annotation. To decontaminate ST data, we propose SpotClean, a probabilistic model that measures the spot swapping effect and estimates gene expression using an expectation-maximization (EM) algorithm. SpotClean is shown to provide more accurate estimates of the underlying gene expression, increase the specificity of marker gene signals, and, more importantly, allow for improved tumor diagnostics.
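The general idea of estimating true expression from contaminated spots with EM can be illustrated with a toy model in Python. This sketch is not SpotClean's model. It assumes spots on a one-dimensional line, a known Gaussian bleed kernel, and a known `bleed_rate`, and it applies the classic EM update for a Poisson mixing model (the Richardson-Lucy iteration). All names and parameter values are hypothetical.

```python
import math


def bleed_matrix(n_spots, bleed_rate=0.3, sigma=1.0):
    """Build K where K[i][j] is the share of spot j's UMIs observed at spot i.

    Assumptions: spots lie on a 1-D line, and a fraction bleed_rate of each
    spot's UMIs is redistributed with Gaussian weights. Columns sum to 1.
    """
    K = [[0.0] * n_spots for _ in range(n_spots)]
    for j in range(n_spots):
        weights = [math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) for i in range(n_spots)]
        total = sum(weights)
        for i in range(n_spots):
            K[i][j] = bleed_rate * weights[i] / total
        K[j][j] += 1.0 - bleed_rate  # UMIs that never leave their spot of origin
    return K


def contaminate(K, expr):
    """Expected observed counts: each spot receives bleed from every other spot."""
    n = len(expr)
    return [sum(K[i][j] * expr[j] for j in range(n)) for i in range(n)]


def em_decontaminate(K, observed, n_iter=200):
    """EM (Richardson-Lucy) updates for observed[i] ~ Poisson(sum_j K[i][j] * x[j]).

    Each update reassigns observed UMIs to their likely spot of origin and
    preserves the total count because the columns of K sum to 1.
    """
    n = len(observed)
    est = list(observed)  # start from the contaminated data
    for _ in range(n_iter):
        fitted = contaminate(K, est)
        est = [
            est[j] * sum(K[i][j] * observed[i] / fitted[i] for i in range(n) if fitted[i] > 0)
            for j in range(n)
        ]
    return est


# Toy tissue: a gene expressed only in the two middle spots.
true_expr = [0.0, 0.0, 100.0, 100.0, 0.0, 0.0]
K = bleed_matrix(len(true_expr))
observed = contaminate(K, true_expr)
cleaned = em_decontaminate(K, observed)
```

In this noiseless example the observed data show nonzero counts in spots where the gene is not expressed, and the EM estimate moves those counts back toward the spots of origin. A real analysis would also have to estimate the contamination parameters from the data rather than assume them.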


Statistical Methods for Improving Data Quality in Modern RNA Sequencing Experiments: Related Books

Statistical Methods for Improving Data Quality in Modern RNA Sequencing Experiments
Language: en
Pages: 0
Authors: Zijian Ni (Ph.D.)
Categories:
Type: BOOK - Published: 2022 - Publisher:

Statistical Methods for Bulk and Single-cell RNA Sequencing Data
Language: en
Pages: 207
Authors: Wei Li
Categories:
Type: BOOK - Published: 2019 - Publisher:

Since the invention of next-generation RNA sequencing (RNA-seq) technologies, they have become a powerful tool to study the presence and quantity of RNA molecul
Statistical Methods for the Analysis of Genomic Data
Language: en
Pages: 136
Authors: Hui Jiang
Categories: Science
Type: BOOK - Published: 2020-12-29 - Publisher: MDPI

In recent years, technological breakthroughs have greatly enhanced our ability to understand the complex world of molecular biology. Rapid developments in genom
Statistical Analysis of Next Generation Sequencing Data
Language: en
Pages: 438
Authors: Somnath Datta
Categories: Medical
Type: BOOK - Published: 2014-07-03 - Publisher: Springer

Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a
Statistical Methods for RNA-sequencing Data
Language: en
Pages: 0
Authors: Rhonda Bacher
Categories:
Type: BOOK - Published: 2017 - Publisher:

Major methodological and technological advances in sequencing have inspired ambitious biological questions that were previously elusive. Addressing such questio