Phased genome maps are important to understand genetic and epigenetic regulation and disease mechanisms, particularly parental imprinting defects. Phasing is also critical to assess the functional consequences of genetic variants, and to allow precise definition of haplotype blocks which is useful to understand gene-flow and genotype-phenotype association at the population level. Transmission phasing by analysis of a family quartet allows the phasing of 95% of all variants as the uniformly heterozygous positions cannot be phased. Here, we report a phasing method based on a combination of transmission analysis, physical phasing by pair-end sequencing of libraries of staggered sizes and population-based analysis. Sequencing of a healthy Caucasians quartet at 120x coverage and combination of physical and transmission phasing yielded the phased genotypes of about 99.8% of the SNPs, indels and structural variants present in the quartet, a phasing rate significantly higher than what can be achieved using any single phasing method. A false positive SNP error rate below 10*E-7 per genome and per base was obtained using a combination of filters. We provide a complete list of SNPs, indels and structural variants, an analysis of haplotype block sizes, and an analysis of the false positive and negative variant calling error rates. Improved genome phasing and family sequencing will increase the power of genome-wide sequencing as a clinical diagnosis tool and has myriad basic science applications.
ASJC Scopus subject areas
- Biochemistry, Genetics and Molecular Biology(all)
- Agricultural and Biological Sciences(all)