At the Broad Institute, scientist Michelle Cipicchio is part of the technology development team responsible for optimizing new methods or sample types before they’re implemented on the organization’s industrial-scale exome and whole-genome sequencing pipeline. Recently, she’s been working with the Chromium platform from 10x Genomics, and part of getting it ready for production involved implementing the PippinHT for automated DNA size selection.
The technology development team is focusing on whole genome analysis with the Chromium platform. To put the workflow through its paces, they’re running a pilot project on 450 whole blood samples for scientists conducting a large schizophrenia study.
Cipicchio began working with automated DNA size selection from Sage Science at the recommendation of 10x Genomics. “The first step in the 10x process requires the longest DNA molecules that you can acquire,” she says. Since the Broad often uses legacy samples that have gone through multiple freeze/thaw cycles, her team doesn’t have the luxury of expecting high-quality, intact DNA. “For 10x, these long molecules are really necessary and most of our samples don’t have a ton of that kind of material,” Cipicchio adds. She began using BluePippin to remove smaller fragments prior to library construction. The team evaluated four samples with and without Pippin size selection and found that they were consistently able to get longer phasing data with automated size selection. To ramp up capacity so all 450 samples can be run with size selection prior to Chromium processing, the team upgraded to the higher-throughput PippinHT platform.
Optimization work for the workflow is still underway. Cipicchio and the team have run about 100 of the 450 samples so far, so they have lots more opportunities to polish and perfect the protocol before it’s ready for production mode.
The Sage Science team was delighted to attend and co-sponsor PacBio’s annual East Coast user group meeting in Baltimore last week, particularly since there was a half-day session devoted to our favorite subject: sample prep.
There were plenty of customer presentations during the sample prep workshop, and it was great to see so many PacBio users deploying BluePippin, PippinHT, or SageELF in their sequencing workflows. Melissa Laird Smith from the Icahn School of Medicine at Mount Sinai may have put it best when she told attendees that the two most important components for PacBio sample prep are upfront quality control and size selection. The QC step, of course, evaluates sample quality and quantity to ensure that long-read sequencing is viable. Size selection allows users to really make use of their PacBio platforms by eliminating shorter fragments and letting the sequencer focus on the longest fragments available. Those are often used as seed reads to anchor assemblies, making them critical for achieving optimal contiguity. Smith said her team uses BluePippin or PippinHT to select either 10 kb – 50 kb or 20 kb – 50 kb ranges, depending on the sample.
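To make the effect of a size window concrete, here is a minimal in-silico sketch of what selecting a 10 kb – 50 kb range means for a set of fragment lengths. It is only an illustration: the function names and the example lengths are our own assumptions, and a real gel-based selection has softer cutoffs than the hard thresholds used here.

```python
from typing import Iterable, List


def size_select(fragment_lengths_bp: Iterable[int], min_bp: int, max_bp: int) -> List[int]:
    """Keep only fragments whose length falls inside [min_bp, max_bp].

    Hypothetical helper: models an idealized window with sharp edges,
    which a physical size-selection run only approximates.
    """
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    return [n for n in fragment_lengths_bp if min_bp <= n <= max_bp]


def fraction_of_bases_retained(original: List[int], selected: List[int]) -> float:
    """Share of total bases that survive the selection (0.0 if the input is empty)."""
    total = sum(original)
    return sum(selected) / total if total else 0.0


if __name__ == "__main__":
    # Made-up fragment lengths in base pairs, for illustration only.
    lengths = [500, 8_000, 12_000, 25_000, 60_000]
    kept = size_select(lengths, 10_000, 50_000)
    print(kept)
    print(round(fraction_of_bases_retained(lengths, kept), 3))
```

Switching the lower bound from 10,000 to 20,000 models the 20 kb – 50 kb option mentioned above.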
Sonny Mark, a field application scientist manager at PacBio, also took the opportunity to introduce attendees to the SageHLS extraction and purification instrument we launched earlier this year. Designed expressly for the kind of high molecular weight DNA that single-molecule systems require, the SageHLS platform should be a nice fit for long-read sequencing pipelines. Users simply load their samples (up to four at a time) and the instrument extracts or purifies DNA fragments as long as 2 Mb. The fragments are automatically sorted by size into six collection bins. We anticipate that this product will work well for scientists studying structural rearrangements, copy number variation, haplotype phasing, and other applications for which HMW DNA is advantageous.
During the rest of the user group meeting, we thoroughly enjoyed learning about so many impressive results users have generated with their PacBio systems, from reference-grade genome assemblies to in-depth annotations. Congratulations to everyone who contributed!
A recently shared preprint demonstrates the effectiveness of size selection for nanopore sequencing, relying on the PippinHT automated DNA sizing platform for high-throughput pipelines.
“Mapping And Phasing Of Structural Variation In Patient Genomes Using Nanopore Sequencing” comes from lead author Mircea Cretu Stancu and collaborators at University Medical Center Utrecht, the University of Torino, and other institutions. In it, the scientists report results from using an Oxford Nanopore MinION to sequence the genomes of two patients with congenital abnormalities, with a focus on structural variant (SV) detection. “Long-read sequencing is breaking ground for the discovery of SVs at an unprecedented scale and depth,” they write. The team used the PippinHT system to size-select DNA libraries for the second patient prior to sequencing.
The effort, which produced the first known whole human diploid genome assemblies using the MinION, was a success. “We were able to extract all known de novo breakpoint junctions for Patient1, even at relatively low coverage,” the scientists report. For the second patient, the sequence data revealed more complexity for many breakpoint junctions. “We observed that 33.3% of the high confidence set of SVs observed in the Nanopore data could not be found in matching Illumina sequencing data, despite the use of six different variant calling methods,” they add.
The authors note that “these results highlight the feasibility to sequence clinical human samples in real-time on a low-cost device.”
At Creighton University in Omaha, Neb., Dr. Anna Selmecki’s lab explores various fungal species to understand genome instability, pathogenesis, and the acquisition of drug resistance. For these investigations, her team relies heavily on whole genome sequencing, using both the Illumina MiSeq platform and Oxford Nanopore sequencers.
However, Selmecki and her team encountered two major obstacles with their library preparation pipeline. A bead-based size-selection step was decreasing their yield, and even with size selection, the MiSeq was still generating very short reads. Using AMPure magnetic beads for sizing, “we always found that we lost a huge percentage of the library,” Selmecki recalls. Even when a Bioanalyzer reported that the library fragment size was in the desired range, sequencing results were consistently shorter than expected.
While both problems stemmed from the sizing step, switching to commonly used manual gel excision was not an option. “From previous experience, I knew that cutting bands out of a gel is horrible and you still lose a lot of your library that way,” Selmecki says. She remembered from her days at the Dana-Farber Cancer Institute that colleagues had raved about an automated size selection instrument from Sage Science.
So Selmecki brought in the BluePippin sizing platform and solved both problems. Recovery is significantly better, and more precise size selection removes the small fragments that had been leading to shorter-than-anticipated MiSeq reads. “The Pippin cleaned that up a lot, ensuring that we’re only amplifying pieces that are much larger,” she says. Using BluePippin for size selection followed by bead-based purification, Selmecki and her team can easily select for insert sizes of 600 bp to 1.2 kb for their paired-end sequencing pipeline. “We found we got better coverage across the genome,” she adds.
Selmecki’s team is already planning to expand the use of its BluePippin instrument to other molecular biology techniques, such as molecular cloning and library preparations for Oxford Nanopore sequencing. “We’re just doing everything on the Pippin,” she says.
“If people are noticing really uneven coverage across their genomes or they’re having trouble with yield during their library prep, I would recommend considering the Pippin,” Selmecki says.
If you haven’t heard about CATCH by now, it’s time to catch up. Short for Cas9-assisted targeting of chromosome segments, CATCH comes from the lab of Yuval Ebenstein at Tel Aviv University and was first reported in a Nature Communications paper.
Like so many scientists, Ebenstein found himself routinely having to generate whole-genome sequence data in order to study a region that was too large to amplify easily with PCR. “You end up paying for all this data and eventually using a very small fraction of it,” he recalls. While there are several target-capture and enrichment methods, they all require knowledge of the sequence of interest. For Ebenstein, who was interested in highly repetitive DNA, those methods didn’t work.
He cast about for a new approach and found inspiration in the burgeoning CRISPR field. “We came up with this idea that you can cut the flanking region with Cas9 and then use gel electrophoresis to extract only the fragment that you’re looking for,” he says. The method uses RNA-guided Cas9 to make two cuts that excise the specific region of interest, followed by a size-separation step to remove off-target fragments. It’s geared toward genomic regions that are 50 kb or larger. Together with Ting Zhu and Chunbo Lou from Tsinghua University, the team began generating custom BACs by combining CATCH with Gibson assembly to cut the desired piece of DNA and clone it into a vector in a streamlined process.
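The logic of the two-cut-then-size-select idea can be sketched in a few lines. This is a simplified illustration, not the published protocol: the coordinates, the 10% tolerance, and the function names are all assumptions made up for the example, and real size separation is far less sharp than a numeric threshold.

```python
from typing import List


def expected_fragment_length(upstream_cut_bp: int, downstream_cut_bp: int) -> int:
    """Length of the fragment released by two flanking cuts.

    Hypothetical helper: positions are coordinates on the same chromosome.
    """
    if downstream_cut_bp <= upstream_cut_bp:
        raise ValueError("downstream cut must lie after the upstream cut")
    return downstream_cut_bp - upstream_cut_bp


def within_size_window(length_bp: int, target_bp: int, tolerance: float = 0.10) -> bool:
    """True if a fragment is within +/- tolerance (as a fraction) of the target size."""
    return abs(length_bp - target_bp) <= tolerance * target_bp


def select_target_sized(fragments_bp: List[int], target_bp: int, tolerance: float = 0.10) -> List[int]:
    """Keep fragments near the expected target size; everything else is treated as off-target."""
    return [f for f in fragments_bp if within_size_window(f, target_bp, tolerance)]


if __name__ == "__main__":
    # Assumed cut coordinates flanking a 200 kb region of interest.
    target = expected_fragment_length(1_000_000, 1_200_000)
    # Made-up mixture of fragment sizes present after digestion.
    mixture = [15_000, 48_000, 195_000, 200_000, 650_000]
    print(target, select_target_sized(mixture, target))
```

Because only the flanking cut positions are needed, nothing in this sketch depends on knowing the sequence between them, which is the property that makes the approach appealing for repetitive regions.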
Since then, Ebenstein and many other labs using CATCH have been broadening the base of applications. It’s particularly attractive for third-gen sequencing platforms; because they typically have lower throughput, “it’s especially beneficial to only probe what you’re interested in and not waste your sequencing depth on regions that are not of interest,” he says. “This is the power of CATCH: no matter how complex the region or what structural variations are in it, if you know the flanking region, you can fish it out and analyze it.”
An early drawback with the CATCH protocol was its use of gel electrophoresis, which Ebenstein refers to as “a prehistoric technology.” Size selection is essential for the method, but it meant users had to perform the very cumbersome pulsed-field gel electrophoresis technique. That’s where the SageHLS instrument came in. “Sage basically eliminates all of that,” Ebenstein says. The automated platform handles everything inside the gel, and collects size fractions without needing a visible band. “The recovery is phenomenal,” he adds. “You can use a very low amount of starting material and you still get a meaningful amount of DNA for further analysis.”
The protocol for using the SageHLS instrument with CATCH (something we refer to as HLS-CATCH) is still undergoing optimization, with Ebenstein’s team putting the new platform through its paces.
In the meantime, the community continues to push ahead with CATCH. It is already in development in several labs for studies of plants, which have highly repetitive DNA. Ebenstein and others are working to make the protocol robust for use in human genetics as well, targeting important genes such as BRCA1 and BRCA2. He says that the SageHLS instrument will likely be an important factor in those efforts.
How can you tell if CATCH is right for you? Ebenstein has a simple rule: “If you can PCR it, PCR it,” he says. “If you can’t, then you probably need CATCH if you don’t want to go bankrupt.”