Abstract
Structural variants (SVs) contribute to genomic diversity and play pathogenic roles in a wide range of genetic disorders. Accurate characterization of SVs is critical for genomic research and studies of disease mechanisms. The rapid development of third-generation sequencing (TGS) technologies has greatly increased sequencing read length compared to next-generation sequencing (NGS), bringing both great potential and new challenges for SV discovery through alignment-based and assembly-based approaches. To take full advantage of TGS data, I have developed a suite of bioinformatics tools focused on the comprehensive characterization of SVs.
For alignment-based SV discovery, I have developed DeBreak to identify SVs directly from long-read alignments. With its density-based clustering algorithm and breakpoint refinement method, DeBreak accurately identifies SVs with precise breakpoint locations in both simulated and real datasets. When compared to assembly-based SV callsets, DeBreak showed the highest consistency among the four tested alignment-based SV callers. For assembly-based SV discovery, I have developed Inspector to assess and improve the quality of whole-genome de novo assembly results. On simulated datasets, Inspector achieved the highest accuracy in reporting both small-scale and larger assembly errors among the three tested assembly evaluation tools. When applied to assemblies of a real human genome, Inspector revealed that both small-scale and structural assembly errors are enriched in repetitive regions for most assemblers. With its error correction module, Inspector reduced the number of assembly errors and improved assembly quality after polishing with long reads. In addition, I have developed FusionSeeker to detect gene fusions caused by SVs from long-read cancer transcriptome sequencing data. FusionSeeker reports gene fusions in both exonic and intronic regions with high accuracy and can reconstruct fused transcript sequences in simulated and cancer cell line datasets. These tools will facilitate SV analysis using long-read sequencing data in the community.





