PacBio users have been regularly serving up new microbial genome assemblies, and we’re glad to see that they’re using our BluePippin automated DNA size selection instrument to get the best results.
These are just some of the genome announcements published in the last few months:
Scientists present a draft sequence of a previously unsequenced pathogen that affects economically important crops such as melons and gourds. The draft contains seven contigs and many phage or prophage elements.
Clostridium sporogenes DSM 795T
Researchers published the first whole-genome sequence of this bacterium, a nontoxigenic relative of Clostridium botulinum. The genome was finished into a single contig of about 4 Mb and contains dozens of identical sequence copies longer than 1,000 bases.
A member of a group of sulfur-oxidizing bacteria, Sedimenticola thiotaurini strain SIP-G1 was sequenced and presented as a closed genome assembly. Scientists identified pathways not found in other members of this genus.
Scientists sequenced and annotated Microcystis aeruginosa NIES-2549, a freshwater cyanobacterium. The genome is almost 4.3 Mb and was sequenced to help understand the species’ ability to produce hepatotoxic cyanotoxins, which cause major environmental damage.
Escherichia coli O96:H19
This E. coli strain was responsible for a foodborne outbreak in Milan last year in which the organism’s pathogenicity was far more severe than usual. The published genome sequence is fully closed and allows scientists to study its acquired virulence.
In a BioTechniques paper this month, scientists from The Genome Analysis Centre describe a new method for mate-pair sequencing that saves time and money while decreasing the amount of input DNA required. The method is based on SageELF, which automatically generates 12 contiguous fractions of DNA from a single sample.
Led by Darren Heavens, the authors report that length and quantity of input DNA have been problematic factors in the preparation of long mate-pair (LMP) libraries for next-gen sequencing. To address that issue, they adjusted the sample prep protocol to use SageELF instead of conventional gel-based sizing, and then chose the fraction that best met their target fragment length.
“Using the SageELF streamlines the library construction process, allowing LMP libraries >10 kb to be constructed in under 2 days with <10 µg input material,” the scientists write. “For many genome projects, multiple insert size LMP libraries are required, and the ability to construct up to 12 discretely sized libraries for a combined reagent cost of $1270 compared with the reagent cost of $715 for a single insert size LMP library highlights the potential cost savings.”
The protocol was developed to optimize the Nextera-based long mate-pair kit for library construction. In addition to the initial round of size selection with SageELF, the scientists conduct another sizing step on the BluePippin prior to Illumina sequencing to ensure selection of DNA fragments best suited for the platform.
The protocol pays off by saving time and money in library prep, as well as by reducing the need for larger volumes of input DNA. It also leads to better sequencing results. “Accurately determining the size and span of the inserts for mate pair libraries simplifies the scaffolding problem, enabling the assembly of longer, more precise sequences with fewer non-determined bases (runs of N bases), empowering all subsequent downstream analysis,” the scientists report.
Check out the full paper: “A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost.”
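To put the quoted reagent costs in perspective, here is a rough back-of-the-envelope sketch. The two dollar figures come from the quote above; the assumptions are ours, not the authors': that all 12 fractions are used, and that preparing 12 insert sizes the conventional way would cost 12 times the single-library reagent price.

```python
# Reagent costs quoted in the paper (USD).
COST_TWELVE_LIBRARIES = 1270  # up to 12 discretely sized LMP libraries, combined
COST_SINGLE_LIBRARY = 715     # one single-insert-size LMP library

# Assumption (ours): all 12 fractions are turned into libraries.
N_LIBRARIES = 12

# Cost per library when all 12 are built together.
per_library = COST_TWELVE_LIBRARIES / N_LIBRARIES

# Assumption (ours): 12 separate conventional preps cost 12 x the single price.
separate_total = COST_SINGLE_LIBRARY * N_LIBRARIES

# Difference between the two approaches under these assumptions.
savings = separate_total - COST_TWELVE_LIBRARIES

print(f"Per-library cost, combined prep: ${per_library:.2f}")
print(f"Total for 12 separate preps:     ${separate_total}")
print(f"Estimated reagent savings:       ${savings}")
```

Projects that need fewer insert sizes would see smaller savings, so treat these numbers as an upper bound under the stated assumptions.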
And for more on the TGAC team, check out this brief profile.
A team of scientists from the Icahn School of Medicine at Mount Sinai, Weill Cornell Medical College, Cold Spring Harbor Laboratory, European Molecular Biology Laboratory, and other institutions published the first analysis of a diploid human genome produced by combining single-molecule technologies.
Lead authors Matthew Pendleton, Robert Sebra, Andy Pang, and Ajay Ummat, along with their colleagues, report that integrating results from different technology platforms led to significant improvements in contiguity, with scaffold N50 values nearly 30 Mb. The high-quality assembly also allowed the team to find complex structural variants that can’t be detected in assemblies produced from short-read data.
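For readers less familiar with the N50 statistic mentioned above, the following is a minimal sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name and the toy lengths are illustrative only and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

A higher N50 means that more of the assembly sits in long scaffolds, which is why it is used as a shorthand for contiguity.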
The scientists used SMRT® Sequencing from Pacific Biosciences as well as genome maps from BioNano Genomics. “Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality,” they report.
In the study, which sequenced the well-characterized NA12878 genome, scientists used BluePippin to perform size selection prior to SMRT Sequencing. By removing DNA fragments smaller than 7 kb, the team generated extraordinarily long reads with the PacBio platform. “Without selection, smaller 2000 – 7000 bp molecules dominate the zero-mode waveguide loading distribution, decreasing the sub-readlength” that can be achieved with the sequencer, the authors write in the supplementary materials.
For more, check out the full paper here: “Assembly and diploid architecture of an individual human genome via single-molecule technologies.”
Last week we attended the first-ever Festival of Genomics, a new series of meetings taking place in Boston, San Mateo, and London. This conference was held in Boston’s biggest convention center and took a music-festival approach, with four stages of concurrent sessions in addition to plenty of other activities.
The Sage Science team was out in force, and we participated in many of those activities. Our CSO Chris Boles gave a talk in the Tech Forum, sharing details of a new product in development we’re calling the SageHLS. Built to help scientists generate ultra long DNA fragments for the new breed of technologies that need them — from optical mapping to single-molecule sequencing — the SageHLS will also help streamline the library prep process. More details will be available later this year.
Another element of the circus-like atmosphere was Race the Helix, a fundraising event for the Greenwood Genetic Center in which teams have 20 minutes on a treadmill to run as far as they can. Our own Alex Vira suited up and ran with the PacBio team, which took an impressive second place among the competing teams. We’re proud to have helped raise money for a good cause!
Some 1,200 people registered for the conference, and the plenary talks were frequently standing room only. Great presentations came from Ting Wu, Craig Venter, Heidi Rehm, Diana Bianchi, and a host of others. We really enjoyed the concurrent session focused on long-read sequencing that included Mike Snyder, Chad Nusbaum, Dick McCombie, and a few other terrific speakers. One of the truly unique things about the event was an evening play about clinical genomics featuring a number of brave scientists, including Eric Green, Andy Faucett, and others. Who stole the show? Naturally, it was George Church in the role of God.
The festival heads west to San Mateo this fall, with a winter performance in London. We look forward to seeing how the organizers from Front Line Genomics continue to innovate at this fun meeting!
If you haven’t listened yet to the Mendelspod interview with Bobby Sebra from the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai, we can’t recommend it highly enough. And that’s not because we happen to be sponsoring this podcast series on DNA sequencing — it’s because Sebra offers up some really interesting perspectives on a range of topics.
For example, he talks with Mendelspod’s Theral Timpson about the institute’s Resilience Project, which is just now kicking into high gear. Sebra outlines efforts to scale up the sequencing facility to meet the needs of this massive project, which aims to scan the DNA of healthy people to find naturally occurring biological mechanisms that might help them escape the effects of disease-causing variants.
Sebra is the institute’s director of technology development, so of course the interview includes great information about his view of the different sequencing platforms and how he chooses which platform to use for which project (for example, short reads for resequencing, and long reads for reference-quality genomes). His take is that scientists get the best results by using multiple platforms to generate complementary data.
Our favorite part was the discussion of sample prep, which Sebra notes is becoming a bigger challenge for genomic scientists with the growing need for larger DNA fragments for long-read and single-molecule platforms. “The quality of your input material needs to be better,” Sebra says, calling for novel methods in DNA extraction and processing. While his team can currently make a 20 kb to 50 kb library with enough input material, he says the dream is being able to make these extremely large-fragment libraries from vanishingly small input.
Sebra covers several other compelling topics in the 27-minute podcast, such as his response to the accusation that the genomics revolution has fallen flat, what’s exciting in clinical genomics, the need for single-cell sequencing, and his experience with data from BioNano Genomics, 10X Genomics, and Oxford Nanopore. Be sure to check it out.
And if you missed the first installment in the series, here’s the podcast with Rod Wing at the Arizona Genomics Institute.