The Genome in a Bottle Consortium is on a roll — and if you haven’t checked out the latest paper in Scientific Data, you’re missing out. “Extensive sequencing of seven human genomes to characterize benchmark reference materials” comes from lead author Justin Zook and senior author Marc Salit, both at the National Institute of Standards and Technology, along with a boatload of collaborators.
In this publication, the GIAB team reports a massive sequencing effort for seven human genomes, five of which are, or are expected to become, NIST Reference Materials that will allow sequencing labs around the world to measure the accuracy of their data. Among the genomes included in the publication are “two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry,” the authors write. They note that genomic data was generated with 12 different methods: “BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads.”
The NIST-led team reports that this unprecedented level of detail about each genome has led to diverse data sets that will help inform the reference materials they ultimately make public. “These reference materials are the first of their kind, and will play key roles in the translation of genome sequencing to widespread adoption and as validation tools in clinical practice,” the scientists write. “We previously characterized high-confidence SNP, indel, and homozygous reference genotypes, as well as large deletions and insertions. We plan to use similar methods as well as new methods to characterize these genomes using the data described in this work.”
It was an honor to see that our BluePippin automated size selection platform was used for a number of genomes and with different analysis technologies, including PacBio and SOLiD. We’re glad that our tools contributed to such important work!