Long-read sequencing is steadily gaining traction as scientists realize its value for resolving regions of the genome that are intractable with short-read methods. From structural variants to repetitive or GC-rich regions, many clinically important stretches of the human genome just aren’t a good fit for traditional Illumina or Ion Torrent platforms.
That’s why we’re really pleased that The Jackson Laboratory for Genomic Medicine is hosting a new Long-Read Sequencing Workshop next week to help scientists understand protocols and strategies for using these tools. Held at the Farmington, Conn., location, the event features a who’s who of long-read pioneers presenting tutorial information about genome assembly, structural variant identification, variant phasing, and more. Attendees will also hear from scientists at PacBio, 10x Genomics, and other leading players in long-read and long-range technologies.
Long-read platforms pose unique sample preparation challenges, and our automated DNA size selection instruments have become a crowd favorite for maximizing the read lengths these platforms generate. Sage is a proud sponsor of the workshop, and our CSO Chris Boles will give a talk about Cas9-assisted sample prep strategies for long-read sequencing. We’re particularly looking forward to touring the new sequencing center and research facility.
We hope to see you there!
There’s a great new podcast interview with Stanford clinical geneticist and oncologist Hanlee Ji about targeting extremely large stretches of DNA by combining CRISPR methods, automated DNA purification with the SageHLS instrument, and 10x Genomics technology. The approach works with minute amounts of DNA, far less than is needed for long-read sequencing platforms, but can still resolve large and complex structural variations. (For readers who want a deeper dive, his lab also posted slides from an AGBT presentation of this work.)
As Ji described it, the method targets sub-megabase regions, up to about 500 kb, and his team has validated it with the BRCA1 gene, the MHC locus, and other examples. By getting large, intact DNA molecules, scientists can delineate structural variants, rearrangements, and more. The long-range information generated “would have been impossible” using short-read sequencing technology that requires fragmenting these large molecules, Ji noted. The protocol also makes it feasible to enrich for tumor DNA or other types of DNA that might get drowned out by normal DNA. “Our approach represents a solution to be able to pull that out, tease that information out, even when you have these complex mixtures and your event of interest is underrepresented compared to the normal genome,” he said.
Such information is clinically useful for everything from cancer to schizophrenia. As Ji pointed out, this method is a strong alternative to the FISH technology typically used for diagnosing pediatric congenital disorders, finding somatic rearrangements in cancer, and more. His team’s approach offers higher resolution and potentially lower cost for generating clinically relevant results.
The interview wraps up with a segment about precision medicine and its most promising applications in the oncology realm. Ji said he is particularly excited about personalized cancer vaccines and immunotherapy, and he stressed the importance of enrolling more patients in clinical trials. He also urged cancer centers to bank more samples than they do now, noting that samples considered unsuitable for use now could one day become important sources for population studies as technology continues to improve.
Good protocols are the currency of any lab. If your research involves working with small RNAs, be sure to check out this carefully documented protocol published in Extracellular RNA, part of Springer Nature’s Methods in Molecular Biology book series.
The protocol can be found in the chapter “Preparation of Small RNA NGS Libraries from Biofluids.” Authors Alton Etheridge, Kai Wang, David Baxter, and David Galas present an NGS protocol designed for use with low-input biofluid samples containing extracellular RNAs. The protocol “has modifications designed to reduce the sequence-specific bias typically encountered with commercial small RNA library construction kits,” the authors report. The result is increased library diversity and more complete RNA profiles.
As the writers note, extracellular RNA has great promise for use as a biomarker or diagnostic, offering a non-invasive approach to everything from monitoring cancer to prenatal testing. The challenge for scientists, though, is that typical small RNA library prep kits require higher concentrations than can be gleaned from biofluid samples such as plasma or serum. This new protocol optimizes steps for samples that do not meet those concentration thresholds.
We’re pleased to see that PippinHT is featured heavily in the protocol for DNA sizing steps following PCR amplification. Indeed, the authors report that “use of an automated size selection instrument like the [Pippin Prep], BluePippin, or PippinHT can reduce variability introduced during gel excision.” We couldn’t agree more!
We’re learning as we go: that’s the message from Winston Timp, assistant professor at Johns Hopkins University, about how labs are handling the new demands placed on sample prep techniques by ever-changing sequencing technologies. Timp’s impressive results, particularly with handling DNA from difficult organisms like trees, make his advice relevant to anyone interested in working with high molecular weight DNA. We chatted with him about his approach.
Q: How has nanopore technology changed what’s possible in genomics?
A: Nanopore sequencing offers us a unique opportunity because the read length is limited only by the length of DNA that you can prepare and then the length of DNA you can deliver to the pore. People have generated megabase-scale DNA reads. That’s incredible because that means we’re going to be able to sequence through large sections of chromosomes that were heretofore impossible to reach. It’s going to make things like genome assembly trivial because you can assemble an E. coli genome from, say, five or six reads.
Q: What new demands are being placed on sample prep by long-read technologies?
A: Part of the problem is getting the reads to the sequencing instrument, whether that’s a 10x Genomics instrument, or PacBio, or a nanopore sequencing instrument. The other part of the problem is extracting these long molecules without too much trouble and then characterizing and size selecting them, which is what Sage excels at. These issues are coming to the forefront because of the further development of sequencing technologies and the fact that the yields of some of these sequencing technologies have increased recently. Nanopore and PacBio sequencing yields have increased substantially in the past year or two, while Illumina prices continue to drop — which allows 10x to leverage its methodology to generate long sequencing reads. In all these cases, you need to start with high molecular weight DNA.
Q: That challenge is even worse for plant genomes. Why?
A: When you’re dealing with plant specimens, they often have all these polyphenolic and polysaccharide compounds so it’s hard to get a nice clean prep of DNA. Using native DNA for nanopore sequencing — DNA that hasn’t been PCR amplified — requires that your DNA be really clean or else it could easily poison the sequencer such that you’ll get lower yields.
Q: How have you found methods that address these challenges?
A: We’re learning to do it as we go. For doing high molecular weight DNA extractions, some of the tools and technologies, like pulsed-field gels, are old and some are new. It’s a mix to get at questions we couldn’t access before. It’s a great time to be doing science.
Q: What approach is your lab using for these tree projects?
A: We paired with this group here in Baltimore called Circulomics. They spun out of a lab at Johns Hopkins and developed a material called Nanobind which is able to relatively easily purify high molecular weight DNA. We are trying to generate genomes for the giant sequoia and for the coastal redwood, but their leaves are difficult to extract DNA from. We’re cracking open the plant cells and extracting out the nuclei, and then taking those nuclei and cleaning up what’s left using Nanobind to really enrich for nice high molecular weight DNA. We consistently get DNA that looks like it’s at least 100 kilobases long. We can run this on the nanopore sequencer and get yields on the order of 8 gigabases.
Q: What’s your advice for other scientists who want to work with HMW DNA?
A: It’s always useful to collaborate. We wouldn’t be able to do this without our collaborations with the bioinformaticists who do the assembly work, the plant biologists with deep biological knowledge, and the materials scientists at Circulomics. Also, you should always think about what you actually need. Sure, you might be able to try for megabase-scale sequencing reads using even older-school technologies like spooling up DNA on a glass rod. But for sequencing the sequoia we’re satisfied with reads on the order of tens of kilobases long because that’s still in excess of what was previously possible. You have to define the parameters of what it is you’re going after and not get too greedy. You’re always going to be sacrificing something. Either you need to use more material to get the high molecular weight, or you might have more contaminants, or you might have less yield but you’re going to get longer reads. There’s always a trade-off.
If you missed Ami Bhatt’s talk at AGBT last month, a bioRxiv preprint is a great way to catch up on her team’s impressive work characterizing microbial communities — from the human gut to the sea floor. Bhatt and her colleagues developed Athena, a de novo assembler that can produce high-quality individual draft genomes from even very complex microbiomes without conflating species.
In “Culture-free generation of microbial genomes from human and marine microbiomes,” senior author Bhatt, lead author Alex Bishara, and colleagues from Stanford University and the University of California, San Diego, present experimental validation of Athena and the rest of the microbiome elucidation pipeline they created. The process can be conducted “at a price point that gives it relevance to the broader microbiome community,” the team notes. We’re proud that they chose the BluePippin platform for their size selection needs prior to analysis with the 10x Genomics Chromium system.
“Metagenomic shotgun sequencing has facilitated partial reconstruction of strain-level community structure and functional repertoire,” the authors write. “Unfortunately, it remains difficult to cost-effectively produce high quality genome drafts for individual microbes without isolation and culture.”
To address this challenge, they used 10x technology to produce read clouds, defined by the scientists as “short-read sequences containing long-range information.” Combined with the Athena assembler, this approach produces “the most complete individual genome drafts,” they report. They tested the method on a mock microbial community, and then validated it with real samples to analyze both the human intestinal tract and sediment from the sea floor. “We find that our approach combines the advantages of both short read and [synthetic long read] approaches, and is capable of producing many highly contiguous drafts (>200kb N50, <10 contigs) with as little as 20x raw short-read coverage,” the team writes. For the marine sample, their approach was the only one of many tested that could produce useful, contiguous individual assemblies.
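For readers less familiar with the contiguity figures in that quote, here is a minimal Python sketch of how N50 is conventionally computed and how a draft might be checked against the ">200kb N50, <10 contigs" thresholds. It is an illustration only, not code from the Athena pipeline: the function names `n50` and `is_highly_contiguous` and the contig lengths are hypothetical.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty list")


def is_highly_contiguous(contig_lengths: Sequence[int],
                         min_n50: int = 200_000,
                         max_contigs: int = 10) -> bool:
    """Check a draft against the thresholds quoted in the preprint
    (N50 above 200 kb, fewer than 10 contigs). Names are hypothetical."""
    return n50(contig_lengths) > min_n50 and len(contig_lengths) < max_contigs


# Hypothetical contig lengths (in bases) for a single draft genome.
draft = [1_200_000, 800_000, 450_000, 300_000, 150_000, 100_000]
print(n50(draft))                   # 800000
print(is_highly_contiguous(draft))  # True
```

In this made-up draft, the two longest contigs already hold 2.0 Mb of the 3.0 Mb total, so the N50 is the length of the second contig, 800 kb. A fragmented assembly of forty 50 kb contigs would fail both thresholds.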
“We anticipate that our approach will be a significant step forward in enabling comparative genomics for bacteria, enabling fine-grained inspection of microbial evolution within complex communities,” the scientists conclude.