We’re learning as we go: that’s the message from Winston Timp, assistant professor at Johns Hopkins University, about how labs are handling the new demands placed on sample prep techniques by ever-changing sequencing technologies. Timp’s impressive results, particularly with handling DNA from difficult organisms like trees, make his advice relevant to anyone interested in working with high molecular weight DNA. We chatted with him about his approach.
Q: How has nanopore technology changed what’s possible in genomics?
A: Nanopore sequencing offers us a unique opportunity because the read length is limited only by the length of DNA that you can prepare and then the length of DNA you can deliver to the pore. People have generated megabase-scale DNA reads. That’s incredible because that means we’re going to be able to sequence through large sections of chromosomes that were heretofore impossible to reach. It’s going to make things like genome assembly trivial because you can assemble an E. coli genome from, say, five or six reads.
Q: What new demands are being placed on sample prep by long-read technologies?
A: Part of the problem is getting the reads to the sequencing instrument, whether that’s a 10x Genomics instrument, or PacBio, or a nanopore sequencing instrument. The other part of the problem is extracting these long molecules without too much trouble and then characterizing and size selecting them, which is what Sage excels at. These issues are coming to the forefront because of the further development of sequencing technologies and the fact that the yields of some of these sequencing technologies have increased recently. Nanopore and PacBio sequencing yields have increased substantially in the past year or two, while Illumina prices continue to drop — which allows 10x to leverage its methodology to generate long sequencing reads. In all these cases, you need to start with high molecular weight DNA.
Q: That challenge is even worse for plant genomes. Why?
A: When you’re dealing with plant specimens, they often have all these polyphenolic and polysaccharide compounds so it’s hard to get a nice clean prep of DNA. Using native DNA for nanopore sequencing — DNA that hasn’t been PCR amplified — requires that your DNA be really clean or else it could easily poison the sequencer such that you’ll get lower yields.
Q: How have you found methods that address these challenges?
A: We’re learning to do it as we go. For doing high molecular weight DNA extractions, some of the tools and technologies, like pulsed-field gels, are old and some are new. It’s a mix to get at questions we couldn’t access before. It’s a great time to be doing science.
Q: What approach is your lab using for these tree projects?
A: We paired with this group here in Baltimore called Circulomics. They spun out of a lab at Johns Hopkins and developed a material called Nanobind which is able to relatively easily purify high molecular weight DNA. We are trying to generate genomes for the giant sequoia and for the coastal redwood, but their leaves are difficult to extract DNA from. We’re cracking open the plant cells and extracting out the nuclei, and then taking those nuclei and cleaning up what’s left using Nanobind to really enrich for nice high molecular weight DNA. We consistently get DNA that looks like it’s at least 100 kilobases long. We can run this on the nanopore sequencer and get yields on the order of 8 gigabases.
Q: What’s your advice for other scientists who want to work with HMW DNA?
A: It’s always useful to collaborate. We wouldn’t be able to do this without our collaborations, both with the bioinformaticists who do the assembly work, the plant biologists with deep biological knowledge, and the materials scientists at Circulomics. Also, you should always think about what you actually need. Sure, you might be able to try for megabase-scale sequencing reads using even older-school technologies like spooling up DNA on a glass rod. But for sequencing the sequoia we’re satisfied with reads on the order of tens of kilobases long because that’s still in excess of what was previously possible. You have to define the parameters of what it is you’re going after and not get too greedy. You’re always going to be sacrificing something. Either you need to use more material to get the high molecular weight, or you might have more contaminants or you might have less yield but you’re going to get longer reads. There’s always a trade-off.
If you missed Ami Bhatt’s talk at AGBT last month, a bioRxiv preprint is a great way to catch up on her team’s impressive work characterizing microbial communities — from the human gut to the sea floor. Bhatt and her colleagues developed Athena, a de novo assembler that can produce high-quality individual draft genomes from even very complex microbiomes without conflating species.
In “Culture-free generation of microbial genomes from human and marine microbiomes,” senior author Bhatt, lead author Alex Bishara, and colleagues from Stanford University and the University of California, San Diego, present experimental validation of Athena and the rest of the microbiome elucidation pipeline they created. The process can be conducted “at a price point that gives it relevance to the broader microbiome community,” the team notes. We’re proud that they chose the BluePippin platform for their size selection needs prior to analysis with the 10x Genomics Chromium system.
“Metagenomic shotgun sequencing has facilitated partial reconstruction of strain-level community structure and functional repertoire,” the authors write. “Unfortunately, it remains difficult to cost-effectively produce high quality genome drafts for individual microbes without isolation and culture.”
To address this challenge, they used 10x technology to produce read clouds, defined by the scientists as “short-read sequences containing long-range information.” Combined with the Athena assembler, this approach produces “the most complete individual genome drafts,” they report. They tested the method on a mock microbial community, and then validated it with real samples to analyze both the human intestinal tract and sediment from the sea floor. “We find that our approach combines the advantages of both short read and [synthetic long read] approaches, and is capable of producing many highly contiguous drafts (>200kb N50, <10 contigs) with as little as 20x raw short-read coverage,” the team writes. For the marine sample, their approach was the only one of the many tested that could produce useful, contiguous individual assemblies.
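For readers less familiar with the N50 metric in that quote, here is a minimal sketch of how it is commonly computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly. The function names, the example contig lengths, and the way the thresholds are applied are our own illustrative assumptions, not code or data from the preprint or from Athena.

```python
def n50(contig_lengths):
    """Return the N50 (in bases) of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length


def is_highly_contiguous(contig_lengths, min_n50=200_000, max_contigs=10):
    """Apply the thresholds quoted above (>200 kb N50, <10 contigs).

    Hypothetical helper: the defaults mirror the quoted figures, but the
    authors' exact filtering logic is not described here.
    """
    return n50(contig_lengths) > min_n50 and len(contig_lengths) < max_contigs


# Made-up contig lengths for a single draft genome, for illustration only.
example_draft = [1_200_000, 900_000, 450_000, 150_000, 60_000]
print(n50(example_draft))
print(is_highly_contiguous(example_draft))
```

A draft broken into many short pieces fails both thresholds even when its total length is respectable, which is why N50 and contig count are usually reported together.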
“We anticipate that our approach will be a significant step forward in enabling comparative genomics for bacteria, enabling fine-grained inspection of microbial evolution within complex communities,” the scientists conclude.
A new podcast from Mendelspod takes a look at the state of sequencing in 2018. It’s a lively and interesting discussion about the current and future landscape of genomics between host Theral Timpson and Keith Robison, founder of the Omics! Omics! blog and principal scientist at Warp Drive Bio.
Robison kicked off the conversation remembering his time as a grad student 30 years ago, when he thought sequencing was painfully slow. Fast-forward to today, and he sees modern sequencing tools as “just mind-blowing.” The problem, he told Timpson, is that “the more you get, the more you want.”
Much of the discussion focuses on Illumina, PacBio, and Oxford Nanopore, Robison’s “big three” sequencing platforms today. As the market leader, Illumina remains interesting to users by continuing to expand its lineup and broaden its use in the clinical realm, where revenue might actually outpace the company’s research revenue for the first time this year. Robison thinks the new iSeq system, based on Illumina’s Firefly project, will be very attractive to scientists who can’t afford the capital outlay for a heavier-duty system. “There’s a lot of market at the lower end,” he said. Still, he cautioned that even 800-pound gorillas can be displaced. “I think [Illumina] should be worried about Oxford in the long run,” he added.
PacBio got praise for pioneering long-read sequencing. “People had to be convinced that long, error-rich reads could yield really high-quality data with consensus,” Robison said. “PacBio’s done some very nice demonstrations showing these very important, medically relevant sequence variations that you just can’t get to with short reads.” For genome assembly, transcriptomics, and other uses, the company’s platform makes a huge difference; Robison cited a sequencing project of his own that he struggled with for two years until long reads saved the day. While PacBio won’t be able to compete with Illumina on throughput, its sequencers are allowing scientists to see things they never could have with short reads.
But when it comes to PacBio, Robison said, “Oxford is nipping at their heels.” He noted that Oxford data is improving rapidly, and recent examples such as the human genome paper include really impressive science. The MHC locus, for instance, is represented in a single contig. Robison said there’s still tremendous variation in the yield users are getting, but that if the PromethION performs as claimed, it would be a huge advance.
The interview wrapped up with Robison envisioning a world where sequencing is so affordable that grade schools could afford to use it in classrooms, and so simple that scientists could use it as easily as they use a pH meter today. Now that’s a world we’d like to see!
Even Orlando’s sky-high humidity can’t get us down during a fast-paced AGBT meeting where size selection and HMW DNA have been front and center since the opening session.
Last night we enjoyed a plenary talk from Ami Bhatt, a scientist and clinician at Stanford who presented exciting data from studies of microbial communities, both in patient populations and in samples from the ocean floor. Her primary challenge, that metagenomics alone cannot link taxonomy to function, was largely addressed with a combination of 10x Genomics technology, size-selected HMW DNA, and a custom assembler from her lab called Athena designed specifically for de novo metagenomics. Read cloud sequencing, as the pipeline is known, outperformed other workflows and produced useful data that made it possible for Bhatt’s team to understand which microbes were in a sample and what they were doing. Preprints describing much of this work are available here and here.
Today, GiWon Shin from Stanford continued the emphasis on size selection and HMW DNA. A scientist in Hanlee Ji’s lab, Shin presented a method developed for targeting nearly megabase-size regions for analysis with the 10x Genomics platform. Targeting is handled with customized Cas9 guides, and DNA extraction from intact cells is performed using the SageHLS instrument. Barcoding reads allows the scientists to phase and assemble data. Shin presented results from evaluations of this process targeting BRCA1, the MHC locus, and 38 candidate structural variants. The Sage team collaborated with the Ji lab on this project for the HLS-CATCH method, and you can dive into the data on this page.
AGBT is still in full swing, and we can’t wait to hear more great science in the days to come! If you’re at the meeting, be sure to stop by suite #1765 to meet the Sage Science team and learn more about HLS-CATCH.
We’ve had targeted capture on the brain lately as we work with collaborators to put the finishing touches on our HLS-CATCH method for selecting large genomic elements using Cas9 guides and the SageHLS system. So today, we wanted to revisit a great capture paper from scientists at the Earlham Institute and other organizations.
“Targeted capture and sequencing of gene-sized DNA molecules” came out in BioTechniques a year ago from lead author Michael Giolai, senior author Matthew Clark, and collaborators. It’s a terrific effort by scientists who know the value of nailing down every piece of a protocol in order to get the most robust, reliable results. For this method, the team focused on sample prep for the PacBio sequencing pipeline, and they made extensive use of the SageELF platform to optimize results.
Scientists began with RenSeq, short for R-gene enrichment sequencing, which was originally developed to help short-read sequencing platforms conquer challenging, highly repetitive sequence regions. In theory, incorporating long-read sequencing would be even more successful. “Here, we demonstrate that the use of RenSeq on DNA fragments up to 7-kb long in combination with PacBio sequencing results in full contig assemblies covering entire R-gene clusters with inter- and intragenic regions,” the team reported in BioTechniques. “Our approach can be used to capture, sequence, and distinguish between similar members of the NB-LRR gene family—key genes in plant immune systems.”
Giolai et al. focused on key steps in the RenSeq capture process: shearing, size selection, and PCR amplification. For more information about how the SageELF instrument makes a difference in this method, check out table 1 and figures 1 and 2 in the paper. Ultimately, the approach allows the PacBio platform to sequence only the longest fragments, generating the longest reads possible for a library and helping distinguish even high-identity sequences.
“This makes the optimized RenSeq protocol not only interesting for very accurate long-read R-gene enrichment,” the scientists noted, “but also as a robust and reproducible technique for the hybridization-based specific capture of fragments up to 7-kb in any genomic context—and it could be used for gap filling, other types of genome finishing, or structural variation verification.”