Preprint Roundup: Long-Read Sequencing, ddRAD-seq, and a New Look at Viruses
We always keep our eyes peeled for interesting new research from scientists using Sage Science automated DNA size selection instruments, and several recent preprints caught our attention. Here’s a look:
Nanopore Sequencing Reveals High-Resolution Structural Variation in the Cancer Genome
Authors: Liang Gong, Chee-Hong Wong, Wei-Chung Cheng, et al.
Scientists from The Jackson Laboratory for Genomic Medicine and China Medical University in Taiwan teamed up to detect structural variants in breast cancer genomes using a custom-built pipeline called Picky. They chose nanopore sequencing to generate long reads, identifying SVs with excellent sensitivity and specificity and finding that repetitive DNA was the primary source of cancer-related variation. This approach could prove useful in efforts to assess genome stability in a tumor over time. The team used BluePippin to size-select 12 Kb libraries prior to nanopore sequencing.
Authors: Jonas Korlach, Gregory Gedman, Sarah Kingan, et al.
In this work, scientists seeking to improve upon short-read genome assemblies for two birds deployed long-read PacBio sequencing to generate new diploid assemblies. The effort yielded assemblies with megabase-sized contigs, with a 150-fold improvement in the contiguity for the zebra finch genome and 200-fold improvement for Anna’s hummingbird. Since the birds are both models for vocal learning, the higher level of completeness, correction of previous misassemblies, and more accurate gene sequences will be important for many future studies. The team used BluePippin to size libraries for zebra finch and hummingbird prior to sequencing.
CIDER-Seq: unbiased virus enrichment and single-read, full length genome sequencing
Authors: Devang Mehta, Matthias Hirsch-Hoffmann, Andrea Patrignani, et al.
Scientists developed a new method for deeply sequencing viruses that can accurately represent populations with high levels of homology across genomes. They incorporated long-read sequencing with random circular amplification enrichment and a novel de-concatenation protocol, validating their results in a large population of geminiviruses. BluePippin was used for size selection prior to the enrichment step and again during library preparation for sequencing.
Authors: Derrick Thrasher, Bronwyn Butcher, Leonardo Campagna, et al.
Continuing with the avian theme, researchers used ddRAD-seq to analyze as many as 600 SNPs from up to 240 members of a population — validated in this case with a study of Malurus lamberti and other bird species. By comparing results to microsatellite markers, they determined that the ddRAD-seq method “results in substantially improved power to discriminate among potential relatives and considerably more precise estimates of relatedness coefficients,” they report. The pipeline they present, which relies on BluePippin for DNA fragment sizing, can be used with any other bird species, and other organisms as well.
Podcast: Genome Assembly Guru Deanna Church on Linked Reads and Single-Cell Genomics
If you’ve ever relied on the human reference genome, don’t miss this podcast with assembly pioneer Deanna Church. Mendelspod’s Theral Timpson interviews the genome informatics expert who made her name as an integral part of the reference project at the National Center for Biotechnology Information. Today, she’s Senior Director of Applications at 10x Genomics, where she’s working on everything from haplotyping to single-cell genomics.
In the podcast, Church offers insight into various efforts to improve the quality of the human reference genome, as well as a look at robust new work to characterize structural variation. She talks about the importance of phasing for structural variant detection, which explains why her NCBI team was so adamant about moving toward a haplotype-aware genome assembly instead of using “averaged-out alleles,” she says. Short reads can be especially problematic for this use because they can’t always clearly distinguish between two alleles of a heterozygous variant. Using problematic alignments could lead to a confounded analysis, she adds, “because you’re mixing the reads from those two genotypes.”
Church also calls for better integration of variant findings. “As a community we’ve had a very individual variant-centric view of genome analysis,” she tells Timpson, contending that viewing variants and their interactions with each other more holistically would provide much-needed information for genome interpretation efforts. She notes that combining technologies, an approach showcased in a recent preprint she co-authored with the Human Genome Structural Variation Consortium, is essential for a holistic approach. To that end, linked-read technology like 10x’s is a great complement to other methods. Church says linked reads enable de novo assembly and haplotype reconstruction at scale; customers have already published impressive demonstrations of this type of work.
Sample prep came up in the discussion as well. “You definitely want to try to optimize for longer molecules,” Church says about 10x technology, noting that recommended protocols are in place and under development for a range of sample types. (We’re pleased to be included in 10x protocol recommendations.)
Church also spoke about single-cell genomics, an area she is eager to explore. “Single cell is obviously one of the most exciting ways to think about doing science these days because it just allows us to get this level of resolution that’s not accessible with bulk,” she says, suggesting that this approach will be especially useful for understanding developmental biology. In some ways, she adds, the state of single-cell genomics reminds her of the early days of the Human Genome Project: there’s widely recognized potential, but the path forward isn’t completely clear yet.
It’s a great discussion, and we hope you have time to listen!
Large Variants and Even Larger Cohorts: Recapping ASHG
We had a blast at ASHG last week and want to thank all the attendees who stopped by our booth. We were delighted to meet you all!
If you couldn’t make it to ASHG, here’s what you missed: two running themes dominated the sessions and conversations, namely mega-scale studies and the shift toward studying variants more complex than SNPs.
It wasn’t so long ago that a 100-person study would have led to an impressive talk at ASHG. But this year, speakers routinely cited studies with tens of thousands, or even hundreds of thousands, of participants. From the Million Veteran Program to the Estonian Biobank, these programs are adding so much to genetic databases that scientists are finally getting a handle on complex hereditary traits such as height. Amid these studies, though, was a continuing push to better represent more ethnic groups to achieve real diversity in publicly available databases. We wholeheartedly support those efforts. Without breaking down the barriers facing underrepresented groups, we will never achieve precision medicine for everyone.
Another shift came in variant discovery and analysis. More and more, scientists are pushing past SNPs to focus on larger structural variants. The community’s initial focus on SNPs was guided by technology (we could spot single-nucleotide changes, so that’s what we looked for), but with improvements to sequencing and other analysis tools using high molecular weight DNA, it is now possible to detect structural variants more comprehensively and reliably. These variants have already been demonstrated to cause diseases, and it was evident at ASHG that finding and cataloging them is a major priority for the genetics field to better understand genome function.
Thanks again for catching up with us in Orlando, and we’re already looking forward to ASHG 2018 in San Diego!
ASHG 2017: Big Studies, Big Names, and Big DNA
We can’t wait for the annual meeting of the American Society of Human Genetics next week! The Sage Science team will be heading to Orlando to catch up on cutting-edge genome science with several thousand of our nearest and dearest in the community.
As always, this year’s ASHG meeting features excellent speakers and sessions. We’re particularly eager for the headline event, a conversation between Francis Collins and Bill Gates that promises to offer interesting perspective on the intersection of global health and genomics. ASHG is also known for its top-tier award presentations. This year we’ll be hearing from recipients such as Kari Stefansson, Art Beaudet, and Dan MacArthur, among others.
Another hallmark of ASHG in recent years is the wealth of posters and talks reporting enormous studies — now regularly thousands or tens of thousands of samples in each — and this year’s agenda continues the trend. We’re eager to learn about new insights into diseases and other phenotypes that have been powered by these mega-scale studies.
If you’ll be at the meeting, don’t forget to stop by and say hello! We’ll be at booth #752, near the food court. You can check out the new SageHLS instrument for extracting or purifying high molecular weight DNA directly from samples, or learn more about our other automated DNA sizing platforms.