There’s a lot to love about the new preprint announcing the first-ever truly high-quality reference genome assembly for the Aedes aegypti mosquito. The large consortium of scientists — the paper features a total of 72 co-authors — came together on its own, with researchers signing on in droves after a tweeted plea from mosquito researcher Leslie Vosshall. The project took two years and yielded a remarkably complete assembly with a contig N50 of nearly 12 Mb. Here at Sage Science, we’re also delighted to see that two of our automated DNA size selection instruments, the SageELF and BluePippin, were used to prepare libraries for sequencing.
The preprint, currently titled “Improved Aedes aegypti mosquito reference genome assembly enables biological discovery and vector control,” comes from lead authors Benjamin Matthews, Olga Dudchenko, and Sarah Kingan. Scientists on the project hailed from Rockefeller University, Baylor College of Medicine, NHGRI, and dozens of other institutions.
Launched during the height of the Zika virus outbreak, the consortium aimed to generate a more complete and contiguous assembly for Aedes aegypti than was previously available. Since the mosquito genome is repetitive and spans more than a gigabase, prior assemblies were marked by thousands of gaps, short contigs, and sequence that couldn’t be mapped to chromosomes. “We used long-read Pacific Biosciences sequencing and Hi-C scaffolding to produce a new reference genome (AaegL5) that is highly contiguous, representing a decrease of 93% in the number of contigs, and anchored end-to-end to the three Ae. aegypti chromosomes,” the scientists report. They also produced a complete assembly of the mitochondrial genome with zero gaps.
Deeper analyses of the final assembly indicate that it is more accurate and complete than previous resources, with more robust gene predictions. “The high-quality genome assembly and annotation described here will enable major advances in mosquito biology and has already allowed us to carry out a number of experiments that were previously impossible,” the team writes. For instance, they identified new candidate loci that appear to be associated with competence as a vector of dengue virus. They also learned more about the sex-determining mechanism, which is important for efforts to shift mosquito populations toward males, which are harmless to humans.
All in all, the scientists conclude, this new assembly should allow the community to make major strides in understanding mosquito biology and controlling the insects’ impact on human health.
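For readers less familiar with the contig N50 figure quoted above, here is a minimal sketch of how that statistic is commonly computed: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of the assembly. The function name and the toy contig lengths are our own illustrative assumptions; this is not the consortium's pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    N50 is the length of the contig at which the running total,
    taken from the longest contig downward, first reaches half of
    the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

Because the statistic is weighted toward the longest pieces, merging many short contigs into a few long ones raises N50 sharply, which is why it is a popular shorthand for contiguity.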
As we start to make lists of New Year’s resolutions (with bets on how long they’ll last), it’s the perfect time to take a moment and absorb the themes and highlights of 2017. In our corner of the world, that means impressive advances around HMW DNA and long-read sequencing; novel biological insights about infectious disease, cancer detection, and more; plus new and improved sample prep methods for DNA sequencing.
For us, one of the breakthroughs of the year came in the form of CATCH, or Cas9-assisted targeting of chromosome segments. This method from Yuval Ebenstein and collaborators allows users to target large or complex regions of the genome for cost-effective sequencing. The innovation is based on CRISPR, taking advantage of the precise activity of the RNA-guided Cas9 enzyme to snip out the region of interest. The original method, which relied on gel electrophoresis, was improved by swapping in the SageHLS instrument for a more streamlined and automated process with excellent recovery.
If you polled the Sage team about the best part of our jobs, it would be unanimous: getting to know our customers! This year, we had the honor of profiling great work from Anna Selmecki at Creighton University, who is using BluePippin to boost library recovery for investigations into genome instability in fungi, and the Broad Institute’s Michelle Cipicchio, who helps optimize methods before they are put into production. She has been using the PippinHT platform to get the best results from the 10x Genomics Chromium system.
Of course, we also spend plenty of time keeping up with the literature, especially the growing number of preprints. One of our favorite studies this year came from scientists in Brazil who reconstructed the transmission path of a recent chikungunya outbreak in their country. The team’s budget was tight, but their results show just how much can be accomplished with creativity and a little help. We were also particularly impressed by a preprint from UK scientists who demonstrated that size selection can significantly improve results from circulating tumor DNA studies, with implications for liquid biopsies in general. And the field of long-read sequencing continued to heat up, with lots of advances including a great comparison of the PacBio and Oxford Nanopore platforms for transcriptome analysis.
As always, we enjoyed hearing from luminaries in the genomics field through Mendelspod interviews this year. If you missed the podcasts with Mark Akeson, Deanna Church, Yuval Ebenstein, or Evan Eichler, we recommend carving out some time to listen.
From all of us at Sage Science, we wish you and yours a healthy and happy holiday season.
We always keep our eyes peeled for interesting new research from scientists using Sage Science automated DNA size selection instruments, and several recent preprints caught our attention. Here’s a look:
Authors: Liang Gong, Chee-Hong Wong, Wei-Chung Cheng, et al.
Scientists from The Jackson Laboratory for Genomic Medicine and China Medical University in Taiwan teamed up to detect structural variants in breast cancer genomes using a custom-built pipeline called Picky. They chose nanopore sequencing to generate long reads, identifying SVs with excellent sensitivity and specificity and finding that repetitive DNA was the primary source of cancer-related variation. This approach could prove useful in efforts to assess genome stability in a tumor over time. The team used BluePippin to size-select 12 kb libraries prior to nanopore sequencing.
Authors: Jonas Korlach, Gregory Gedman, Sarah Kingan, et al.
In this work, scientists seeking to improve upon short-read genome assemblies for two birds deployed long-read PacBio sequencing to generate new diploid assemblies. The effort yielded assemblies with megabase-sized contigs, with a 150-fold improvement in contiguity for the zebra finch genome and a 200-fold improvement for Anna’s hummingbird. Since the birds are both models for vocal learning, the higher level of completeness, correction of previous misassemblies, and more accurate gene sequences will be important for many future studies. The team used BluePippin to size-select libraries for zebra finch and hummingbird prior to sequencing.
Authors: Devang Mehta, Matthias Hirsch-Hoffmann, Andrea Patrignani, et al.
Scientists developed a new method for deeply sequencing viruses that can accurately represent populations with high levels of homology across genomes. They incorporated long-read sequencing with random circular amplification enrichment and a novel de-concatenation protocol, validating their results in a large population of geminiviruses. BluePippin was used for size selection prior to the enrichment step and again during library preparation for sequencing.
Authors: Derrick Thrasher, Bronwyn Butcher, Leonardo Campagna, et al.
Continuing with the avian theme, researchers used ddRAD-seq to analyze as many as 600 SNPs from up to 240 members of a population, validating the approach in this case with a study of Malurus lamberti and other bird species. By comparing results to microsatellite markers, they determined that the ddRAD-seq method “results in substantially improved power to discriminate among potential relatives and considerably more precise estimates of relatedness coefficients.” The pipeline they present, which relies on BluePippin for DNA fragment sizing, can be used with any other bird species, and with other organisms as well.
If you’ve ever relied on the human reference genome, don’t miss this podcast with assembly pioneer Deanna Church. Mendelspod’s Theral Timpson interviews the genome informatics expert who made her name as an integral part of the reference project at the National Center for Biotechnology Information. Today, she’s Senior Director of Applications at 10x Genomics, where she’s working on everything from haplotyping to single-cell genomics.
In the podcast, Church offers insight into various efforts to improve the quality of the human reference genome, as well as a look at robust new work to characterize structural variation. She talks about the importance of phasing for structural variant detection, which explains why her NCBI team was so adamant about moving toward a haplotype-aware genome assembly instead of using “averaged-out alleles.” Short reads can be especially problematic for this use because they can’t always clearly distinguish between the two alleles of a heterozygous variant. Using problematic alignments could lead to a confounded analysis, she adds, “because you’re mixing the reads from those two genotypes.”
Church also calls for better integration of variant findings. “As a community we’ve had a very individual variant-centric view of genome analysis,” she tells Timpson, contending that viewing variants and their interactions with each other more holistically would provide much-needed information for genome interpretation efforts. She notes that combining technologies, an approach showcased in a recent preprint she co-authored with the Human Genome Structural Variation Consortium, is essential for a holistic approach. To that end, linked-read technology like 10x’s is a great complement to other methods. Church says linked reads enable de novo assembly and haplotype reconstruction at scale; customers have already published impressive demonstrations of this type of work.
Sample prep came up in the discussion as well. “You definitely want to try to optimize for longer molecules,” Church says about 10x technology, noting that recommended protocols are in place and under development for a range of sample types. (We’re pleased to be included in 10x protocol recommendations.)
Church also spoke about single-cell genomics, an area she is eager to explore. “Single cell is obviously one of the most exciting ways to think about doing science these days because it just allows us to get this level of resolution that’s not accessible with bulk,” she says, suggesting that this approach will be especially useful for understanding developmental biology. In some ways, she adds, the state of single-cell genomics reminds her of the early days of the Human Genome Project: there’s widely recognized potential, but the path forward isn’t completely clear yet.
It’s a great discussion, and we hope you have time to listen!