Phased Genotyping-by-Sequencing Enhances Analysis of Genetic Diversity and Reveals Divergent Copy Number Variants in Maize | Vien Khoa Hoc Ky Thuat Nong Nghiep Mien Nam

Heather Manching, Subhajit Sengupta, Keith R. Hopper, Shawn W. Polson, Yuan Ji and Randall J. Wisser

G3: Genes, Genomes, Genetics July 1, 2017 vol. 7 no. 7 2161-2170; https://doi.org/10.1534/g3.117.042036

Abstract

High-throughput sequencing (HTS) of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken from heterogeneous populations of heterozygous individuals. This requires that a number of issues encountered with GBS be considered, including the sequencing of nonoverlapping sets of loci across multiple GBS libraries, a common missing data problem that results in low call rates for markers per individual, and a tendency for applicability only in inbred line samples with sufficient linkage disequilibrium for accurate imputation. We addressed these issues while developing and validating a new, comprehensive platform for GBS. This study supports the notion that GBS can be tailored to particular aims, and using Zea mays our results indicate that large samples of unknown pedigree can be genotyped to obtain complete and accurate GBS data. Optimizing size selection to sequence a high proportion of shared loci among individuals in different libraries and using simple in silico filters, a GBS procedure was established that produces high call rates per marker (>85%) with accuracy exceeding 99.4%. Furthermore, by capitalizing on the sequence-read structure of GBS data (stacks of reads), a new tool for resolving local haplotypes and scoring phased genotypes was developed, a feature that is not available in many GBS pipelines. Using local haplotypes reduces the marker dimensionality of the genotype matrix while increasing the informativeness of the data. Phased GBS in maize also revealed the existence of reproducibly inaccurate (apparent accuracy) genotypes that were due to divergent copy number variants (CNVs) unobservable in the underlying single nucleotide polymorphism (SNP) data.
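To make the "call rate per marker" figure concrete, here is a minimal sketch of how such a rate can be computed from a genotype matrix and used as a simple in silico filter. The data layout (a list of per-individual rows, with `None` for a missing call), the function names, and the use of 0.85 as a cutoff are all assumptions for illustration. The abstract reports call rates above 85% as an outcome; it does not describe the filters themselves.

```python
from typing import List, Optional

# Hypothetical encoding: a genotype is a string such as "A/G", or None when missing.
Genotype = Optional[str]


def call_rate_per_marker(matrix: List[List[Genotype]]) -> List[float]:
    """Return, for each marker (column), the fraction of individuals with a call.

    matrix[i][j] is the genotype of individual i at marker j.
    """
    if not matrix:
        return []
    n_individuals = len(matrix)
    n_markers = len(matrix[0])
    rates = []
    for j in range(n_markers):
        called = sum(1 for i in range(n_individuals) if matrix[i][j] is not None)
        rates.append(called / n_individuals)
    return rates


def markers_passing(matrix: List[List[Genotype]], min_call_rate: float = 0.85) -> List[int]:
    """Return the column indices of markers whose call rate exceeds min_call_rate.

    The default of 0.85 is an illustrative assumption, not the paper's filter.
    """
    rates = call_rate_per_marker(matrix)
    return [j for j, rate in enumerate(rates) if rate > min_call_rate]


# Toy data: 4 individuals (rows) by 3 markers (columns).
example = [
    ["A/A", "A/G", None],
    ["A/G", None, None],
    ["G/G", "G/G", "C/T"],
    ["A/A", "A/A", "C/C"],
]
```

With the toy matrix above, `call_rate_per_marker(example)` gives 1.0, 0.75 and 0.5 for the three markers, so only the first marker passes the 0.85 cutoff.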

See http://www.g3journal.org/content/7/7/2161?etoc=

Figure 1: Phased GBS. A RedRep pipeline is used for SNP typing. The figure shows the basic flow of RedRep, which begins with a fastq data file with barcoded sequences. For QC, the “meta” file contains metadata used for demultiplexing into sample-specific fastq files. A reference genome sequence file is used for mapping (refmap) and variant calling (SNPcall). LocHap-GBS is run by editing a generate.py file specifying the location of the bam files, the filtered vcf file, and a bed file of window coordinates to search for haplotypes. A LocHap-GBS run file and windows file are automatically generated. Windows are currently split into subwindows with a maximum of three heterozygous sites within any one individual in the vcf file. This situation is depicted for reads across a window that has been delineated into two subwindows where phasing is performed. The dashed connecting line between reads indicates that a contiguous sequence with five SNPs was split into two subwindows. Black-filled bars represent the reference sequence and yellow squares represent SNVs. Given stacks of reads across each subwindow, LocHap-GBS uses a probabilistic model to identify haplotypes in the presence of sequence errors (depicted as one-off instances in the stacks of reads). An hcf file is created for each sample, which is then merged into a combined hcf file for downstream analysis. GBS, genotyping-by-sequencing; MNV, multi-nucleotide variant; QC, quality control; RedRep, reduced representation; SNP, single nucleotide polymorphism; SNV, single nucleotide variant.
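The subwindow rule in the caption can be illustrated with a short sketch. The caption states only that windows are split so that no individual has more than three heterozygous sites within a subwindow; the greedy left-to-right strategy below, the function name `split_window`, and the input layout are assumptions and are not taken from LocHap-GBS itself.

```python
from typing import Dict, List, Set


def split_window(
    sites: List[int],
    het_sites: Dict[str, Set[int]],
    max_het: int = 3,
) -> List[List[int]]:
    """Greedily split ordered variant positions into subwindows.

    sites:     variant positions in one window, sorted by coordinate.
    het_sites: for each sample, the set of positions where it is heterozygous.
    max_het:   maximum heterozygous sites allowed per sample per subwindow
               (three, following the figure caption).

    This greedy rule is a hypothetical stand-in for the tool's actual logic.
    """
    subwindows: List[List[int]] = []
    current: List[int] = []
    het_counts = {sample: 0 for sample in het_sites}

    for pos in sites:
        # Samples that are heterozygous at this position.
        het_here = [s for s, positions in het_sites.items() if pos in positions]
        # Start a new subwindow if adding this site would push any sample over the cap.
        if current and any(het_counts[s] + 1 > max_het for s in het_here):
            subwindows.append(current)
            current = []
            het_counts = {sample: 0 for sample in het_sites}
        current.append(pos)
        for s in het_here:
            het_counts[s] += 1

    if current:
        subwindows.append(current)
    return subwindows


# Toy input: sample "a" is heterozygous at all five sites, sample "b" at one.
toy_sites = [10, 20, 30, 40, 50]
toy_hets = {"a": {10, 20, 30, 40, 50}, "b": {10}}
toy_result = split_window(toy_sites, toy_hets)
```

For this toy input, the five sites are split into two subwindows, `[[10, 20, 30], [40, 50]]`, which mirrors the situation the caption describes of a five-SNP contiguous sequence being divided in two. Where the real tool places the boundary may differ.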
