Podcast: Nanopore Expert Mark Akeson on Challenges and Opportunities in Sequencing
Mark Akeson knows a thing or two about perseverance. After spending years as a soil biologist in Guatemala, where he endured a series of parasite infections, he went on to become a pioneer in nanopore sequencing, a technology he has been developing for more than 20 years even as highly regarded scientists insisted it would never work.
His story is the subject of an interview with Mendelspod host Theral Timpson, and the conversation — which took place earlier this year — provides fascinating insight into the world of nanopore sequencing.
Now co-director of the biophysics laboratory at the University of California, Santa Cruz, and a consultant to Oxford Nanopore, Akeson told Timpson that current nanopore technology still faces a number of challenges but also has plenty of room for improvement. Right now, the accuracy of single-pass reads is still lower than that of short-read sequencing platforms, and sample prep issues limit the length of fragments that can be fed through the pore. But having already overcome major obstacles — finding the right size sensor, improving sensitivity, and moving DNA reliably one nucleotide at a time — Akeson is confident the technology will get even better.
As he sees it, the major advantage of nanopore sequencing is the ability to interpret DNA or RNA directly from the cell. “You’re reading what nature put there, not only the bases but modifications,” he said. In addition, nanopores allow for very long reads (Akeson said his lab is routinely generating 200 kb sequences) and are ideally suited for sequencing in the field.
This work might not have been possible without the commitment of program officers at NHGRI, whom Akeson praised as “visionaries on this research.” Now that the approach has finally been proven to work, “the whole nanopore sequencing endeavor is going exponentially faster,” he added.
Looking ahead, Akeson predicted that 20 years from now genome sequencing will be so cheap it’ll just be given away to consumers, with profits coming from follow-on interpretation services. One day, he said, we’ll look back on the $1,000 genome or even the $100 genome as very expensive. Sequencing technology will continue to improve over time because it’s “too important a technology to say ‘We’re done,’” he added, noting that accuracy, read length, and throughput will be areas of focus in the coming years.
Nabsys Returns: CEO Barrett Bready on the Importance of Structural Variation
A new podcast from Mendelspod features an interesting interview with Barrett Bready, CEO of electronic mapping firm Nabsys, who emphasizes the growing need to incorporate structural variation data into genome studies.
In the discussion, Bready describes his company’s platform, which relies on voltage-powered, solid-state nanodetectors to generate map-level information. Each nanodetector can cover 1 million bases per second, Bready said, and can be multiplexed for a highly scalable system. It’s a “really high-speed, highly scalable way of getting structural information,” he added.
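For a rough sense of what that throughput means in practice, here is a back-of-the-envelope sketch in Python. The 1 Mb per second per-detector rate comes from the interview; the detector count and coverage target are our own illustrative assumptions, not Nabsys specifications.

```python
# Back-of-the-envelope throughput estimate for a multiplexed nanodetector
# array. The 1 Mb/s per-detector rate comes from the interview; the detector
# count and coverage target below are illustrative assumptions.

GENOME_SIZE_BP = 3.1e9          # approximate human genome size
RATE_PER_DETECTOR_BP_S = 1e6    # 1 million bases per second per detector

def hours_to_coverage(coverage: float, n_detectors: int) -> float:
    """Hours for n_detectors to map `coverage`-fold over a human genome."""
    total_bases = GENOME_SIZE_BP * coverage
    return total_bases / (RATE_PER_DETECTOR_BP_S * n_detectors) / 3600

# Hypothetical run: 100x map coverage on a 96-detector array.
print(f"{hours_to_coverage(100, 96):.1f} hours")  # ~0.9 hours
```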
Bready noted that the genomics community has realized the need for long-range information, estimating that known structural variants now make up about 60 Mb of the human genome, a number that has increased rapidly in the last few years even as the amount of sequence attributed to single-nucleotide variants has stayed the same. Nabsys aims to democratize access to structural information by producing a cost-effective mapping tool for routine analysis of these large variants.
This information will complement short-read data, Bready said; short-read approaches necessarily sacrifice assembly contiguity because DNA must be cut into small fragments prior to sequencing. The Nabsys platform works with high molecular weight DNA to capture extremely long-range information. He also said that electronic mapping data offers more value than data from optical mapping technologies.
Beta testing for the new platform is expected to begin early next year.
ddRAD-seq Study Explores Behavioral Roles in Speciation
A new preprint from the Hoekstra lab at Harvard makes great use of the double digest RAD-seq protocol to better understand reproductive barriers and speciation in closely related species of mice. Since it was the Hoekstra lab that gave us the ddRAD-seq method, we took notice when this preprint became available.
The paper comes from Hopi Hoekstra and Emily Delaney, a Harvard grad student who is now a postdoctoral fellow at the University of California, Davis. In “Sexual imprinting and speciation in two Peromyscus species,” the scientists describe how sexual imprinting, typically a learned trait, contributes to sexual isolation of Peromyscus leucopus, the white-footed mouse, and P. gossypinus, the cotton mouse.
One area of interest at the start of this project was determining whether the mechanisms underlying sexual isolation were genetic or learned. The scientists “used genomic data to first assess hybridization in the wild and conclusively found that the two species remain genetically distinct in sympatry despite rare hybridization events,” they report. “We find that these mating preferences are learned in one species but may be genetic in the other: P. gossypinus sexually imprints on its parents, but innate biases or social learning affects mating preferences in P. leucopus.”
The study used ddRAD-seq to analyze 376 mice. In that workflow, the team used Pippin Prep to select fragments ranging from 265 bp to 335 bp, and libraries were then sequenced on the Illumina platform.
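To make that size-selection step concrete, here is a minimal in silico sketch of a double digest followed by a 265-335 bp window filter. The enzyme motifs (EcoRI and MspI) are illustrative choices on our part, not necessarily the enzymes used in the preprint, and cut positions are simplified to the start of each recognition site.

```python
import re

# Illustrative in silico double digest. The enzyme motifs below are example
# choices on our part (the preprint's actual enzymes may differ), and cuts
# are simplified to the start of each recognition site, ignoring overhangs.
ENZYMES = {"EcoRI": "GAATTC", "MspI": "CCGG"}

def ddrad_fragments(seq: str, lo: int = 265, hi: int = 335):
    """Return (start, end, length) for fragments flanked by one cut site of
    each enzyme and falling inside the study's 265-335 bp Pippin window."""
    cuts = sorted(
        (m.start(), name)
        for name, motif in ENZYMES.items()
        for m in re.finditer(motif, seq)
    )
    kept = []
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and lo <= p2 - p1 <= hi:  # ddRAD keeps mixed-end fragments
            kept.append((p1, p2, p2 - p1))
    return kept
```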
“Our study supports an emerging view that sexual imprinting could be vital to the generation and maintenance of sexual reproductive barriers,” the authors conclude. “Examining the role of sexual imprinting in similar cases of speciation driven by sexual reproductive barriers will continue to expand our understanding of the role of behavior in speciation.”
For 10x Genomics Workflow, Broad Institute Uses PippinHT Size Selection
At the Broad Institute, scientist Michelle Cipicchio is part of the technology development team responsible for optimizing new methods or sample types before they’re implemented on the organization’s industrial-scale exome and whole-genome sequencing pipeline. Recently, she’s been working with the Chromium platform from 10x Genomics, and part of getting it ready for production involved implementing the PippinHT for automated DNA size selection.
The technology development team is focusing on whole genome analysis with the Chromium platform. To put the workflow through its paces, they’re running a pilot project on 450 whole blood samples for scientists conducting a large schizophrenia study.
Cipicchio began working with automated DNA size selection from Sage Science at the recommendation of 10x Genomics. “The first step in the 10x process requires the longest DNA molecules that you can acquire,” she says. Since the Broad often uses legacy samples that have gone through multiple freeze/thaw cycles, her team doesn’t have the luxury of expecting high-quality, intact DNA. “For 10x, these long molecules are really necessary and most of our samples don’t have a ton of that kind of material,” Cipicchio adds. She began using BluePippin to remove smaller fragments prior to library construction. The team evaluated four samples with and without Pippin size selection and found that they were consistently able to get longer phasing data with automated size selection. To ramp up capacity so all 450 samples can be run with size selection prior to Chromium processing, the team upgraded to the higher-throughput PippinHT platform.
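As a rough illustration of why removing short fragments matters, the sketch below estimates how much of a degraded sample's mass sits below a size cutoff. The simulated length distribution and the 40 kb cutoff are our own assumptions, not parameters from the Broad's workflow.

```python
import random

# Simulated length distribution for a degraded sample; the exponential model
# and the 40 kb cutoff are illustrative assumptions, not Broad/10x parameters.
random.seed(0)
lengths = [random.expovariate(1 / 15_000) for _ in range(100_000)]

def mass_fraction_below(lengths, cutoff_bp):
    """Fraction of total DNA mass (in bases) held in fragments below cutoff."""
    return sum(l for l in lengths if l < cutoff_bp) / sum(lengths)

# Mass that size selection would remove at a 40 kb cutoff.
print(f"{mass_fraction_below(lengths, 40_000):.1%} of mass below 40 kb")
```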
Optimization work for the workflow is still underway. Cipicchio and the team have run about 100 of the 450 samples so far, so they have lots more opportunities to polish and perfect the protocol before it’s ready for production mode.
PacBio Users: Size Selection Is Essential for Generating Excellent Results
The Sage Science team was delighted to attend and co-sponsor PacBio’s annual East Coast user group meeting in Baltimore last week, particularly since there was a half-day session devoted to our favorite subject: sample prep.
There were plenty of customer presentations during the sample prep workshop, and it was great to see so many PacBio users deploying BluePippin, PippinHT, or SageELF in their sequencing workflows. Melissa Laird Smith from the Icahn School of Medicine at Mount Sinai may have put it best when she told attendees that the two most important components for PacBio sample prep are upfront quality control and size selection. The QC step, of course, evaluates sample quality and quantity to ensure that long-read sequencing is viable. Size selection lets users get the most out of their PacBio platforms by eliminating shorter fragments so the sequencer can focus on the longest fragments available. Those are often used as seed reads to anchor assemblies, making them critical for achieving optimal contiguity. Smith said her team uses BluePippin or PippinHT to select either 10-50 kb or 20-50 kb ranges, depending on the sample.
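For readers who want to see how a size window shifts read-length statistics, here is a hedged sketch that selects a 20-50 kb window (one of the ranges Smith mentioned) and compares read N50 before and after. The read lengths are simulated, not real PacBio data.

```python
import random

# Simulated read lengths; a real run would use lengths from the instrument.
random.seed(1)
reads = [int(random.expovariate(1 / 12_000)) + 500 for _ in range(50_000)]

def n50(lengths):
    """Smallest read length L such that reads >= L hold half the total bases."""
    total, running = sum(lengths), 0
    for l in sorted(lengths, reverse=True):
        running += l
        if running >= total / 2:
            return l

def select_window(lengths, lo=20_000, hi=50_000):
    """Keep reads inside a size-selection window (here, the 20-50 kb range)."""
    return [l for l in lengths if lo <= l <= hi]

print(f"N50 before: {n50(reads)}, after selection: {n50(select_window(reads))}")
```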
Sonny Mark, a field application scientist manager at PacBio, also took the opportunity to introduce attendees to the SageHLS extraction and purification instrument we launched earlier this year. Designed expressly for the kind of high molecular weight DNA that single-molecule systems require, the SageHLS platform should be a nice fit for long-read sequencing pipelines. Users simply load their samples (up to four at a time) and the instrument extracts or purifies DNA fragments as long as 2 Mb. The fragments are automatically sorted by size into six collection bins. We anticipate that this product will work well for scientists studying structural rearrangements, copy number variation, haplotype phasing, and other applications for which HMW DNA is advantageous.
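In the same spirit, here is a minimal sketch of sorting fragments into six size bins, loosely mirroring the SageHLS collection bins. The bin edges are illustrative assumptions; actual ranges depend on instrument settings.

```python
# Sorting fragments into six collection bins by size. The bin edges are
# illustrative assumptions; actual ranges depend on instrument settings.
BIN_EDGES_BP = [50_000, 100_000, 250_000, 500_000, 1_000_000, 2_000_000]

def bin_index(length_bp: int) -> int:
    """Return the 0-based bin (0 = shortest) for a fragment length."""
    for i, edge in enumerate(BIN_EDGES_BP):
        if length_bp <= edge:
            return i
    return len(BIN_EDGES_BP) - 1  # clamp anything above the top edge

bins = [[] for _ in BIN_EDGES_BP]
for frag in (30_000, 180_000, 900_000, 1_600_000):  # example fragment sizes
    bins[bin_index(frag)].append(frag)
```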
During the rest of the user group meeting, we thoroughly enjoyed learning about so many impressive results users have generated with their PacBio systems, from reference-grade genome assemblies to in-depth annotations. Congratulations to everyone who contributed!