Podcast: Blogger Keith Robison on the State of Sequencing
A new podcast from Mendelspod takes a look at the state of sequencing in 2018. It’s a lively and interesting discussion about the current and future landscape of genomics between host Theral Timpson and Keith Robison, founder of the Omics! Omics! blog and principal scientist at Warp Drive Bio.
Robison kicked off the conversation by recalling his time as a grad student 30 years ago, when he thought sequencing was painfully slow. Fast-forward to today, and he sees modern sequencing tools as “just mind-blowing.” The problem, he told Timpson, is that “the more you get, the more you want.”
Much of the discussion focuses on Illumina, PacBio, and Oxford Nanopore — Robison’s “big three” sequencing platforms today. As the market leader, Illumina keeps users interested by continuing to expand its lineup and by broadening its clinical business, which may outpace the company’s research revenue for the first time this year. Robison thinks the new iSeq system, based on Illumina’s Firefly project, will be very attractive to scientists who can’t afford the capital outlay for a heavier-duty system. “There’s a lot of market at the lower end,” he said. Still, he cautioned that even 800-pound gorillas can be displaced. “I think [Illumina] should be worried about Oxford in the long run,” he added.
PacBio got praise for pioneering long-read sequencing. “People had to be convinced that long, error-rich reads could yield really high-quality data with consensus,” Robison said. “PacBio’s done some very nice demonstrations showing these very important, medically relevant sequence variations that you just can’t get to with short reads.” For genome assembly, transcriptomics, and other applications, the company’s platform makes a huge difference; Robison recalled one sequencing project he struggled with for two years until long reads saved the day. While PacBio won’t be able to compete with Illumina on throughput, its sequencers are letting scientists see things they never could have with short reads.
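That consensus point is easy to demonstrate. Because the platform’s errors fall at essentially random positions, a per-base majority vote across reads covering the same locus converges on the true sequence. Here’s a minimal toy simulation in Python (our illustration, not anything from the podcast; the 15% error rate and uniform substitution model are simplifying assumptions):

```python
import random

random.seed(0)
BASES = "ACGT"

def noisy_read(truth, error_rate=0.15):
    """Copy the true sequence, substituting a random wrong base with
    probability error_rate at each position. Independent, randomly placed
    errors mimic the mostly stochastic error profile of long reads."""
    return [
        random.choice([b for b in BASES if b != t]) if random.random() < error_rate else t
        for t in truth
    ]

def consensus(reads):
    """Per-position majority vote across reads covering the same locus."""
    return [max(BASES, key=col.count) for col in zip(*reads)]

truth = [random.choice(BASES) for _ in range(10_000)]
for depth in (1, 5, 15, 30):
    reads = [noisy_read(truth) for _ in range(depth)]
    mismatches = sum(c != t for c, t in zip(consensus(reads), truth))
    print(f"depth {depth:2d}: consensus error rate {mismatches / len(truth):.3%}")
```

Even with 15% raw error, the consensus error rate collapses toward zero as depth grows, which is exactly why error-rich long reads can still yield high-quality data.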
But when it comes to PacBio, Robison said, “Oxford is nipping at their heels.” He noted that Oxford data is improving rapidly and that recent examples, such as the nanopore human genome paper, include really impressive science; the MHC locus, for instance, is represented in a single contig. Robison said there’s still tremendous variation in the yield users are getting, but that if the PromethION performs as claimed, it would be a huge advance.
The interview wrapped up with Robison envisioning a world where sequencing is so affordable that grade schools could use it in classrooms, and so simple that scientists could use it as easily as they use a pH meter today. Now that’s a world we’d like to see!
At AGBT, Scientists Offer Data on Size Selection for HMW DNA
Even Orlando’s sky-high humidity can’t get us down during a fast-paced AGBT meeting where size selection and HMW DNA have been front and center since the opening session.
Last night we enjoyed a plenary talk from Ami Bhatt, a scientist and clinician at Stanford who presented exciting data from studies of microbial communities, both in patient populations and in samples from the ocean floor. Her central challenge, that metagenomics alone cannot link taxonomy to function, was largely addressed with a combination of 10x Genomics technology, size-selected HMW DNA, and Athena, a custom assembler from her lab designed specifically for de novo metagenomics. Read cloud sequencing, as the pipeline is known, outperformed other workflows and produced data that let Bhatt’s team understand which microbes were in a sample and what they were doing. Preprints describing much of this work are available here and here.
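For readers new to the approach, the core trick is simple enough to sketch. Each short read in a linked-read dataset carries a barcode identifying the HMW molecule it came from, so grouping reads by barcode recovers “clouds” that preserve long-range information an assembler like Athena can exploit. A minimal sketch in Python (our simplified illustration; the tuple format stands in for real barcoded FASTQ records):

```python
from collections import defaultdict

def group_read_clouds(barcoded_reads):
    """Group barcoded short reads into read clouds: reads sharing a
    barcode came from the same long HMW DNA molecule, so each cloud
    carries long-range linkage information for assembly."""
    clouds = defaultdict(list)
    for barcode, sequence in barcoded_reads:
        clouds[barcode].append(sequence)
    return clouds

# Toy input: (barcode, sequence) pairs standing in for barcoded FASTQ records.
reads = [
    ("AACCGGTT", "ACGTACGTACGT"),
    ("AACCGGTT", "GGTTACGTTAGC"),
    ("TTGGCCAA", "CCATGGATCCAT"),
]
for barcode, cloud in group_read_clouds(reads).items():
    print(barcode, "->", len(cloud), "read(s)")
```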
Today, GiWon Shin from Stanford continued the emphasis on size selection and HMW DNA. A scientist in Hanlee Ji’s lab, Shin presented a method for targeting nearly megabase-size regions for analysis with the 10x Genomics platform. Targeting is handled with customized Cas9 guides, and DNA extraction from intact cells is performed on the SageHLS instrument. Barcoded reads allow the scientists to phase and assemble the data. Shin presented results from evaluations of this process targeting BRCA1, the MHC locus, and 38 candidate structural variants. The Sage team collaborated with the Ji lab on the HLS-CATCH method, and you can dive into the data on this page.
AGBT is still in full swing, and we can’t wait to hear more great science in the days to come! If you’re at the meeting, be sure to stop by suite #1765 to meet the Sage Science team and learn more about HLS-CATCH.
SageELF Enables Targeted RenSeq Capture for PacBio Sequencing
We’ve had targeted capture on the brain lately as we work with collaborators to put the finishing touches on our HLS-CATCH method for selecting large genomic elements using Cas9 guides and the SageHLS system. So today, we wanted to revisit a great capture paper from scientists at the Earlham Institute and other organizations.
“Targeted capture and sequencing of gene-sized DNA molecules” came out in BioTechniques a year ago from lead author Michael Giolai, senior author Matthew Clark, and collaborators. It’s a terrific effort by scientists who know the value of nailing down every piece of a protocol in order to get the most robust, reliable results. For this method, the team focused on sample prep for the PacBio sequencing pipeline, and they made extensive use of the SageELF platform to optimize results.
Scientists began with RenSeq, short for R-gene enrichment sequencing, which was originally developed to help short-read sequencing platforms conquer challenging, highly repetitive sequence regions. In theory, incorporating long-read sequencing would be even more successful. “Here, we demonstrate that the use of RenSeq on DNA fragments up to 7-kb long in combination with PacBio sequencing results in full contig assemblies covering entire R-gene clusters with inter- and intragenic regions,” the team reported in BioTechniques. “Our approach can be used to capture, sequence, and distinguish between similar members of the NB-LRR gene family—key genes in plant immune systems.”
Giolai et al. focused on key steps in the RenSeq capture process: shearing, size selection, and PCR amplification. For more information about how the SageELF instrument makes a difference in this method, check out table 1 and figures 1 and 2 in the paper. Ultimately, the approach allows the PacBio platform to sequence only the longest fragments, generating the longest reads possible for a library and helping distinguish even high-identity sequences.
“This makes the optimized RenSeq protocol not only interesting for very accurate long-read R-gene enrichment,” the scientists noted, “but also as a robust and reproducible technique for the hybridization-based specific capture of fragments up to 7-kb in any genomic context—and it could be used for gap filling, other types of genome finishing, or structural variation verification.”
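To make the size-selection logic concrete, here is a minimal in silico sketch (ours, not the authors’ code; the 6-7 kb window is an illustrative stand-in echoing the paper’s “up to 7-kb” fragments) of what the SageELF accomplishes physically when a library is fractionated and only the longest fraction is carried forward:

```python
def size_select(fragment_lengths_bp, min_bp=6_000, max_bp=7_000):
    """Keep only fragments inside the target window. Electrophoretic size
    selection does this physically, so sequencing capacity is spent on the
    longest molecules and short fragments cannot crowd them out."""
    return [n for n in fragment_lengths_bp if min_bp <= n <= max_bp]

library = [1_200, 3_400, 6_100, 6_800, 7_000, 2_500, 6_500]
print(size_select(library))  # [6100, 6800, 7000, 6500]
```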
Fireworks, Genomics, and HLS-CATCH at AGBT in Orlando
The regular Super Bowl may have gone down last weekend, but the genomics Super Bowl is still to come with next week’s Advances in Genome Biology & Technology meeting in Orlando. The Sage Science team can’t wait. AGBT is our favorite one-stop shop for cool new technology, amazing science, and legendary parties.
This year, we’re particularly excited about AGBT because one of our scientific collaborators, GiWon Shin from Stanford University, will be giving a talk about work involving a method we’ve been developing to target large genomic fragments and prepare them for the 10x Genomics workflow. We call it HLS-CATCH, and it involves our newest platform, the SageHLS instrument. If you’ll be at the conference, don’t miss his presentation: “Assembly-based structural variation and haplotypes from targeted sub-megabase DNA molecules” at 3:40 pm in the Tuesday afternoon plenary session about evolutionary genomes. (It’s right after Ed Green’s talk about pygmy genomes, sure to be a crowd pleaser.)
To learn more about HLS-CATCH — or just to enjoy Disney’s daily fireworks show at 8:00 pm — please visit us in our suite, room #1765 at the Hilton. The Sage team will be on hand to answer questions and help you determine whether our automated DNA sizing and preparation instruments could help with your research.
Finally, we’ll be presenting a poster (#305) about an integrated workflow for nanopore sequencing. Be sure to check it out during a dessert break or wine reception.
Happy AGBT, everyone!
Expert Q&A: Smithsonian Scientists on Evaluating DNA Quality
With the rapid adoption of long-read sequencing and other technologies that require high molecular weight DNA input, there is rising demand for high-quality samples from biorepositories around the world. To accommodate this, scientists at the Smithsonian Institution’s National Museum of Natural History in Washington, DC, developed and published a rapid, inexpensive method for quantifying DNA quality. We spoke with co-authors Jonathan Coddington and Daniel Mulcahy to learn more about this new technique.
Q: Why is there interest in working with high molecular weight DNA?
JC: We work for the Smithsonian Institution, so we’re in the forever business. We want to build a library of the highest-quality genetic resources for biodiversity. There are about 1.9 million species described and many, many millions more out there that haven’t been described — and we’re facing the sixth extinction crisis. So it’s important to collect and preserve samples that would have the maximum utility to science for many decades into the future.
DM: We’re interested in high molecular weight DNA because it’s now very practical to sequence the entire genome of non-model organisms. Even though you shear the DNA to make the libraries, it’s better to start with big chunks of intact DNA so you’re shearing randomly across the genome, rather than relying on DNA that’s already been sheared through wear and tear.
JC: We’re also anticipating better technology in the future. Long-read technology is upon us, so everything from what you do in the field until when you extract the DNA needs to be adapted for very high molecular weight DNA.
Q: Before your method was published, how were people assessing sample quality?
DM: A lot of people were putting it on a spectrophotometer and looking at the concentrations. Another common way was to just PCR amplify short fragments, but that doesn’t really tell you if it’s high molecular weight DNA. There are more expensive methods as well, but we were trying to come up with an inexpensive way to quickly assess a lot of tissues based on DNA size.
JC: The concern with DNA quality has been around ever since the beginning of sequencing, but because PCR was so dominant people only measured the quality of their sample based on the marker that they wanted to amplify, not whole genomic DNA. An initial baseline assessment of the size frequency distribution for whole genomic DNA is almost never done in biodiversity biorepositories. We need to move the community to thinking about whole genomes.
Q: Have you seen signs that people are going in that direction?
JC: The sequencing research community has really moved beyond PCR marker-based sequencing. They want to do whole genomes and they’re looking to us to supply that high-quality DNA.
Q: How will having DNA quality information change how genomic scientists work?
DM: Our method can tell you how much of your sample was greater than the band you chose. Ultimately we’ll get to a point where we can sequence an entire chromosome; ideally you would want to be able to run the DNA out to the level where you can see various-sized chromosomes if it’s completely intact DNA. That’s where the future of this is going.
JC: Our paper was designed to enable a really cheap, really fast way for our community to inform people who want to do experiments what sort of quality DNA we might have. It’s a gel electrophoresis-based method because everybody can run gels.
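To sketch the arithmetic behind that kind of gel-based assessment (our illustration, not the authors’ published protocol; the sizes and intensities below are invented), once a lane has been converted to intensities at ladder-calibrated sizes, the share of DNA mass above a chosen cutoff is a simple ratio:

```python
def fraction_above_cutoff(lane_profile, cutoff_bp):
    """Estimate the fraction of a sample's DNA mass running above a chosen
    size cutoff. lane_profile maps approximate fragment size in bp
    (interpolated from a ladder) to lane intensity, treating intensity as
    a proxy for DNA mass at that size."""
    total = sum(lane_profile.values())
    above = sum(v for size_bp, v in lane_profile.items() if size_bp >= cutoff_bp)
    return above / total if total else 0.0

# Hypothetical lane: intensity at ladder-calibrated fragment sizes (bp).
lane = {48_500: 120.0, 20_000: 310.0, 10_000: 240.0, 5_000: 90.0, 1_000: 40.0}
print(f"{fraction_above_cutoff(lane, 10_000):.0%} of DNA mass is >= 10 kb")
```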
Q: How will these quality assessments affect biorepositories?
JC: Dan and I represent a rapidly growing community of museums and other kinds of biorepositories. The Smithsonian, for example, has about 90,000 publicly discoverable samples. We’re also supporting an international network of biorepositories using a common data standard to report the existence and quality of the samples we all hold. The global total is about 615,000 publicly discoverable genetic resources. When we’re loaning out really rare samples, we want to know what you want to do with them. If you need the high-end stuff for a really good reason, we’ll give it to you.
DM: But if they don’t need it, we’ll give them lower-quality DNA to save the better stuff for someone who wants to do whole-genome assemblies.
Q: In the paper, you also looked at preservation methods. What were your key findings?
DM: That was a very preliminary study. If you stick a piece of muscle in the freezer and freeze it, it’s probably as good as it can be. But when you thaw it out, those proteins start to break down and the enzymes start to chew up the DNA.
JC: The DNases can still be active on the thaw cycle. I think the community assumption was that fresh frozen in liquid nitrogen was as good as it gets. What we found surprising was that saturating the tissue in some sort of preservation buffer prior to freezing produces better results than freezing alone. That’s the intriguing implication.
DM: We’re conducting a much longer-term study now. We’d like to see other people test this with other organisms, other buffers, and different storage methods.