Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic

