August 2025
Authors:
Maxat Kulmanov, Saeideh Ashouri, Yang Liu, Marwa Abdelhakim, Ebtehal Alsolme, Masao Nagasaki, Yasuyuki Ohkawa, Yutaka Suzuki, Rund Tawfiq, Katsushi Tokunaga, Toshiaki Katayama, Malak S. Abedalthagafi, Robert Hoehndorf & Yosuke Kawai
Abstract:
The selection of a reference sequence in genome analysis is critical, as it serves as the foundation for all downstream analyses. Recently, the pangenome graph has been proposed as a data model that incorporates haplotypes from multiple individuals. Here we present JaSaPaGe, a pangenome graph reference for Saudi Arabian and Japanese populations, both of which have been significantly underrepresented in previous genomic studies. We constructed JaSaPaGe from high-quality phased diploid assemblies which were made utilizing PacBio high-fidelity long reads, Nanopore long reads, and Hi-C short reads of 9 Saudi and 10 Japanese individuals. Quality evaluation of the pangenome graph by variant calling showed that our pangenome outperformed earlier linear reference genomes (GRCh38 and T2T-CHM13) and showed comparable performance to the pangenome graph provided by the Human Pangenome Reference Consortium (HPRC), with more variants found in Japanese and Saudi samples using their population-specific pangenomes. This pangenome reference will serve as a valuable resource for both the research and clinical communities in Japan and Saudi Arabia.
Sage Science Products:
The PippinHT system was used for final size selection during PacBio HiFi library preparation.
Methods Excerpt:
“Library preparation was performed using the SMRTbell prep kit 3.0, following the manufacturer’s protocol. For final library size selection, a PippinHT System with 0.75% agarose gel cassettes and marker S1 was used. The cut-off size range was set to 10–50 Kbp as recommended by the manufacturer. Subsequently, library QC was performed using FEMTO Pulse and Qubit 1x dsDNA HS kit. The prepared sequencing templates were loaded onto the Sequel IIe system using Binding kit 3.2 and cleanup beads (>3 Kbp). A 30-hour movie run was performed, generating high-fidelity (Hi-Fi) reads.”
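As a rough illustration of the size-selection settings quoted above, the sketch below records them in a small Python structure and applies the 10–50 Kbp window to a list of fragment lengths. The class name, field names, and the example lengths are hypothetical and do not come from the paper or from any instrument software. Treating the window as a hard, inclusive cut-off is also a simplifying assumption, since gel-based size selection does not produce perfectly sharp boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeSelection:
    """Size-selection settings as quoted in the methods excerpt (names are illustrative)."""
    instrument: str = "PippinHT"
    cassette: str = "0.75% agarose"
    marker: str = "S1"
    min_bp: int = 10_000  # lower cut-off, 10 Kbp
    max_bp: int = 50_000  # upper cut-off, 50 Kbp

    def in_window(self, fragment_bp: int) -> bool:
        """Return True if a fragment length lies inside the cut-off range (inclusive, by assumption)."""
        return self.min_bp <= fragment_bp <= self.max_bp


settings = SizeSelection()

# Made-up fragment lengths in base pairs, for illustration only (not data from the study).
fragments_bp = [3_000, 12_500, 48_000, 75_000]

kept = [n for n in fragments_bp if settings.in_window(n)]
print(kept)  # [12500, 48000]
```

In this idealized model, fragments shorter than 10 Kbp or longer than 50 Kbp are excluded before sequencing; actual recovery around the boundaries would depend on the instrument and the sample.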
Author Affiliations:
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Division, King Abdullah University of Science and Technology
KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology
KAUST Center of Excellence for Generative AI, King Abdullah University of Science and Technology
SDAIA–KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology
Genome Medical Science Project, National Institute of Global Health and Medicine, Japan Institute for Health Security
Biological and Environmental Sciences & Engineering (BESE) Division, King Abdullah University of Science and Technology
Genomics and Precision Medicine Department, King Fahad Medical City, Saudi Arabia
Division of Biomedical Information Analysis, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University
Center for Genomic Medicine, Graduate School of Medicine, Kyoto University
Division of Transcriptomics, Medical Institute of Bioregulation, Kyushu University
Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo
Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems
Department of Pathology and Laboratory Medicine, Tufts Medical Center and Tufts University School of Medicine
Department of Neurosurgery, Tufts Medical Center and Tufts University School of Medicine
King Salman Center for Disability Research, Riyadh, Saudi Arabia
Bioinformation and DDBJ Center, National Institute of Genetics, Research Organization of Information and Systems, Japan
Journal:
Scientific Data
DOI: 10.1038/s41597-025-05652-y