About NPDC


The Shen Lab at the Wertheim UF Scripps Institute in Jupiter, Florida, has launched a large-scale sequencing campaign to explore the 122,550 strains in the NPDC collection, the world's largest actinobacterial strain collection, dramatically increasing the number of actinobacterial genomes available.

(i) Develop a shared state-of-the-art actinobacterial strain collection and genome database to revitalize natural products discovery
(ii) Serve the broad scientific community by providing strains with curated draft genomes to promote research and development on natural products and associated applications

The Actinobacterial Strain Collection at the Natural Products Discovery Center (NPDC) at UF Scripps contains a total of 122,550 strains. These strains, isolated over the last eight decades from 77 different countries, represent microbial and natural product diversity that is not available anywhere else and is impossible to reproduce in laboratory settings today. The potential for natural product discovery from the NPDC at UF Scripps is immense. Assuming about 30 biosynthetic gene clusters (BGCs) per strain, the collection's 122,550 strains could encode nearly 3.7 million BGCs, potentially producing a comparable number of natural products. Compared with the ~20,000 natural products of actinobacterial origin known to date, this leaves millions of compounds to be discovered. Although many strains may produce the same or very similar products, these redundancies are unlikely to fundamentally reduce the total number of novel natural products encoded in the NPDC. The millions of new BGCs will also serve as an unprecedented treasure trove for the discovery of new enzymes and biocatalysts, while enabling a suite of innovative synthetic biology applications.
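The estimate in the preceding paragraph is a simple product of strain count and an assumed number of BGCs per strain. The following minimal Python sketch reproduces that arithmetic; the function name is illustrative, and the figure of 30 BGCs per strain is the assumption stated in the text. With the exact count of 122,550 strains the product is 3,676,500, and rounding the collection to 125,000 strains gives 3.75 million.

```python
# Back-of-the-envelope BGC estimate: strains x assumed BGCs per strain.
# The per-strain figure (about 30) is the assumption stated in the text.

def estimate_bgcs(strain_count, bgcs_per_strain=30):
    """Return the estimated total number of BGCs in a collection."""
    return strain_count * bgcs_per_strain


KNOWN_ACTINOBACTERIAL_PRODUCTS = 20_000  # approximate figure cited in the text

exact = estimate_bgcs(122_550)    # exact collection size
rounded = estimate_bgcs(125_000)  # collection size rounded up

print(f"Exact strain count:   {exact:,} BGCs")
print(f"Rounded strain count: {rounded:,} BGCs")
print(f"Ratio to known products: ~{exact // KNOWN_ACTINOBACTERIAL_PRODUCTS}x")
```

Either way, the estimate exceeds the number of known actinobacterial natural products by more than two orders of magnitude.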

In early 2026, the NPDC made antiSMASH 8 annotations for 25,375 RefSeq genomes available to the community. These constitute all Actinobacteria genomes that were on RefSeq as of September 22, 2025, excluding the mostly pathogenic genera Mycobacterium, Bifidobacterium, and Corynebacterium. We also re-ran taxonomic classification of the NPDC and RefSeq genomes with the newest GTDB release (r226) to give the genomes the most up-to-date classifications possible. Although we excluded Actinobacteria assigned to these three genera from our download, some RefSeq genomes were re-assigned to those taxa and remain on our site. RefSeq genomes were analyzed identically to NPDC genomes post-quality control: Prokka was run, followed by antiSMASH 8, Mash clustering, and BiG-SLiCE. RefSeq genomes were given NPDC numbers starting at 200000 so that they would be compatible with our pipeline, and are denoted REFSEQ on the portal. For both NPDC and RefSeq-sourced genomes, corresponding RefSeq ID numbers are available on their strain pages. These genomes are also supported in BLAST searches. As long as the RefSeq database is enabled in your toggle bar, your search will automatically be performed against both databases. The BLAST metadata output will include RefSeq or GenBank IDs if available. Please make sure the RefSeq database is active when viewing the results of a BLAST search against RefSeq/NPDC.
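Because RefSeq-sourced genomes were numbered from 200000 upward, the source of a genome can be inferred from its NPDC number when post-processing downloaded metadata. The sketch below illustrates this under one assumption that the text does not state explicitly: that every number below 200000 belongs to an NPDC-sequenced strain. The function and constant names are hypothetical and are not part of the portal.

```python
# Hypothetical helper for labelling genomes by source from their NPDC number.
# Assumption: numbers >= 200000 are RefSeq-sourced, everything below is NPDC.

REFSEQ_START = 200000  # first NPDC number given to a RefSeq genome (from the text)


def genome_source(npdc_number):
    """Return 'REFSEQ' or 'NPDC' for a given NPDC number."""
    if npdc_number < 0:
        raise ValueError("NPDC numbers are non-negative")
    return "REFSEQ" if npdc_number >= REFSEQ_START else "NPDC"


for number in (1234, 199999, 200000, 225374):
    print(number, genome_source(number))
```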

NIH
U19CA113297 (05/01/2005-04/30/2010)
P41GM086184 (05/01/2010-04/30/2013)
R01GM114353 (12/01/2015-11/30/2019)
R01GM115575 (03/01/2016-12/31/2019)
R35GM134954 (01/01/2020-01/31/2029)

Joint Genome Institute (JGI), DOE
CSP 2021 Proposal 506764 (01/01/2021-12/31/2026)

Institutional
The Scripps Research Institute, Scripps Research (01/01/2011-03/31/2022)
The Wertheim UF Scripps Institute (04/01/2022-)
University of Florida, University Research Investment, Blue Future Medicine Initiative (07/01/2025-06/30/2027)

Natural Products Discovery Center General Fund
A combination of philanthropic contributions, corporate partnerships, and license revenue (04/01/2022-)

Reads Processing
BBDuk [1] was used to remove contaminants, trim reads that contained adapter sequence and homopolymers of five or more G's at the ends of the reads, and remove reads that contained one or more 'N' bases or had a length <= 51 bp or <= 33% of the full read length. Reads that mapped with BBMap [1] to masked human references at 93% identity were separated into a chaff file. Reads that aligned to masked common microbial contaminants were likewise separated into a chaff file.
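The filtering rules above can be illustrated with a small pure-Python sketch. This is not BBDuk and does not reproduce its behavior exactly; it only mirrors the stated thresholds (poly-G runs of 5 or more at read ends, no 'N' bases, length greater than 51 bp and greater than 33% of the original read length). The function names are illustrative, and reading the length rule as "discard if either limit is met" is an assumption.

```python
import re

MIN_LENGTH = 51             # bp; reads of this length or shorter are dropped
MIN_LENGTH_FRACTION = 0.33  # fraction of the original read length
POLY_G_MIN = 5              # minimum run of G's to trim from a read end

_LEADING_G = re.compile(r"^G{%d,}" % POLY_G_MIN)
_TRAILING_G = re.compile(r"G{%d,}$" % POLY_G_MIN)


def trim_poly_g(seq):
    """Remove runs of POLY_G_MIN or more G's from both ends of a read."""
    seq = _TRAILING_G.sub("", seq)
    return _LEADING_G.sub("", seq)


def keep_read(trimmed, original_length):
    """Apply the N-base and length filters to an already trimmed read."""
    if "N" in trimmed:
        return False
    if len(trimmed) <= MIN_LENGTH:
        return False
    if len(trimmed) <= MIN_LENGTH_FRACTION * original_length:
        return False
    return True


read = "ACGT" * 30 + "GGGGGG"  # 126 bp read with a poly-G tail
trimmed = trim_poly_g(read)
print(len(read), len(trimmed), keep_read(trimmed, len(read)))
```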
Assembly
The following steps were then performed for assembly: (1) artifact-filtered and normalized Illumina reads were assembled with SPAdes (version 3.14.1; --phred-offset 33 --cov-cutoff auto -t 16 -m 64 --careful -k 25,55,95) [2]; (2) contigs were discarded if their length was <1 kb (BBTools reformat.sh: minlength=1000 ow=t).
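As a rough illustration of the two assembly steps, the sketch below builds the SPAdes command line from the options listed above (without running it) and applies the 1 kb contig length filter in plain Python. The file paths and function names are hypothetical, the paired-end input layout is an assumption, and the length filter stands in for the BBTools reformat.sh step rather than reproducing it.

```python
def spades_command(reads_1, reads_2, out_dir):
    """Build a SPAdes command using the options given in the text.

    The input and output paths are placeholders; paired-end input is assumed.
    """
    return [
        "spades.py",
        "-1", reads_1, "-2", reads_2, "-o", out_dir,
        "--phred-offset", "33",
        "--cov-cutoff", "auto",
        "-t", "16", "-m", "64",
        "--careful",
        "-k", "25,55,95",
    ]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=1000):
    """Keep contigs whose length is at least min_length bp."""
    return [(h, s) for h, s in records if len(s) >= min_length]


fasta = [">contig_1", "A" * 600, "C" * 400, ">contig_2", "G" * 999]
kept = filter_contigs(read_fasta(fasta))
print([header for header, _ in kept])
print(" ".join(spades_command("reads_R1.fastq", "reads_R2.fastq", "assembly_out")))
```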
Genomes QC
CheckM [3] was used to calculate the contamination and completeness levels of the genomes. Genomes with >=95% completeness and <=10% contamination were kept; all others were discarded.
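The QC rule can be expressed as a simple threshold check. The sketch below assumes that completeness and contamination have already been parsed from CheckM output into dictionaries; the record layout, genome names, and function names are hypothetical.

```python
MIN_COMPLETENESS = 95.0   # percent, from the text
MAX_CONTAMINATION = 10.0  # percent, from the text


def passes_qc(completeness, contamination):
    """Return True if a genome meets both QC thresholds."""
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION


def filter_genomes(rows):
    """Return the names of genomes that pass QC.

    Each row is a dict with 'genome', 'completeness', and 'contamination' keys.
    """
    return [
        row["genome"]
        for row in rows
        if passes_qc(row["completeness"], row["contamination"])
    ]


example_rows = [
    {"genome": "strain_A", "completeness": 99.2, "contamination": 0.8},
    {"genome": "strain_B", "completeness": 94.9, "contamination": 0.5},
    {"genome": "strain_C", "completeness": 98.0, "contamination": 12.3},
    {"genome": "strain_D", "completeness": 95.0, "contamination": 10.0},
]
print(filter_genomes(example_rows))
```

Both thresholds are inclusive, so a genome at exactly 95% completeness and 10% contamination is kept.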
Annotations
GTDB-Toolkit [4] was used to annotate the taxonomy of genomes. Prokka [5] was used to predict and annotate coding sequences in the genomes, while antiSMASH version 8.0.4 [6] was used to predict the biosynthetic gene clusters. Finally, BiG-SLiCE version 2.0 [7] was used to calculate BGC Families / GCFs (using l2-normalized cutoff of 0.5).
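To give a sense of what a distance cutoff on L2-normalized vectors means, the toy sketch below normalizes feature vectors to unit length and assigns a BGC to the nearest family centroid only if the Euclidean distance is within the cutoff. This is not BiG-SLiCE's algorithm or API: the feature vectors, centroid dictionary, function names, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_family(vector, centroids, cutoff=0.5):
    """Return the id of the nearest centroid within the cutoff, else None."""
    query = l2_normalize(vector)
    best_id, best_distance = None, float("inf")
    for family_id, centroid in centroids.items():
        distance = euclidean(query, l2_normalize(centroid))
        if distance < best_distance:
            best_id, best_distance = family_id, distance
    return best_id if best_distance <= cutoff else None


# Hypothetical three-feature centroids for two families.
centroids = {"GCF_1": [1.0, 0.0, 0.0], "GCF_2": [0.0, 1.0, 0.0]}
print(assign_to_family([10.0, 1.0, 0.0], centroids))  # close to GCF_1
print(assign_to_family([1.0, 1.0, 0.0], centroids))   # too far from both
```

Because the vectors are normalized first, only their direction matters; the overall magnitude of a feature vector does not affect the assignment.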
References:
1. B. Bushnell: BBTools software package (version 38.90), URL https://bbtools.jgi.doe.gov.
2. Bankevich A, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012; 19:455–77.
3. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015 Jul;25(7):1043-55.
4. Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927.
5. Torsten Seemann, Prokka: rapid prokaryotic genome annotation, Bioinformatics, Volume 30, Issue 14, 15 July 2014, Pages 2068–2069.
6. Blin, K., Shaw, S., Vader, L., Szenei, J., Reitz, Z. L., Augustijn, H. E., Cediel-Becerra, J. D. D., de Crécy-Lagard, V., Koetsier, R. A., Williams, S. E., Cruz-Morales, P., Wongwas, S., Segurado Luchsinger, A. E., Biermann, F., Korenskaia, A., Zdouc, M. M., Meijer, D., Terlouw, B. R., van der Hooft, J. J. J., Ziemert, N., … Weber, T. (2025). antiSMASH 8.0: extended gene cluster detection capabilities and analyses of chemistry, enzymology, and regulation. Nucleic acids research, 53(W1), W32–W38.
7. Arjan Draisma, Catarina Loureiro, Nico L.L. Louwen, Satria A. Kautsar, Jorge C. Navarro-Muñoz, Drew T. Doering, Nigel J. Mouncey, Marnix H. Medema. BiG-SCAPE 2.0 and BiG-SLiCE 2.0: scalable, accurate and interactive sequence clustering of metabolic gene clusters. bioRxiv 2025.08.20.671210

Reviews
1. Shen, B. (2015) A new golden age of natural products drug discovery. Cell 163:1297-1300. Strategy
2. Rudolf, J.D.; Yan, X.; Shen, B. (2016) Genome neighborhood network reveals insights into enediyne biosynthesis and facilitates prediction and prioritization for discovery. J. Ind. Microbiol. Biotechnol. 43:261-276. Mechanism
3. Smanski, M.J.; Zhou, H.; Claesen, J.; Shen, B.; Fischbach, M.A.; Voigt, C.A. (2016) Synthetic biology to access and expand nature’s chemical diversity. Nat. Rev. Microbiol. 14:135-149. Technology
4. Steele, A.D.; Teijaro, C.N.; Yang, D.; Shen, B. (2019) Leveraging a large microbial strain collection for natural product discovery. J. Biol. Chem. 294:16567-16576. Strategy
5. Teijaro, C.N.; Adhikari, A.; Shen, B. (2019) Challenges and opportunities in microbial engineering for natural products. J. Ind. Microbiol. Biotechnol. 46:433-444. Technology
6. Kalkreuter, E.; Pan, G.; Cepeda, A.J.; Shen, B. (2020) Targeting bacterial genomes for natural product discovery: opportunities, challenges, and strategies. Trends Pharmacol. Sci. 41:13-26. Strategy
7. Adhikari, A.; Shen, B.; Rader, C. (2021) Challenges and opportunities to develop enediyne natural products as payloads for antibody-drug conjugates. Antib. Ther. 4:1-15. Therapeutics
8. Bader, C. D.; Nichols, A. L.; Yang, D.; Shen, B. (2023) Interplay of emerging and established technologies drives innovation in natural product antibiotic discovery. Curr. Opinion Microbiol. 75:102359. Technology
9. Steele, A.D.; Kiefer, A.F.; Shen, B. (2023) The many facets of sulfur incorporation in natural products. Curr. Opinion Chem. Biol. 76:102366. Mechanism
10. Alkhalaf, L.M.; et al.; Winter, G. (2025) Thoughts for the future. Nat. Chem. Biol. 21:6-15. Strategy
11. Kiefer, A.F.; Steele, A.D.; Rader, C.; Shen, B. (2026) DVD-IgG1 Antibody–Drug Conjugates: Expanding the landscape of targeted cancer therapy. Curr. Opinion Chem. Biol. 92:102653. Therapeutics

Research Articles
1. Xie, P.; Ma, M.; Rateb, M.E.; Shaaban, K.A.; Yu, Z.; Huang, S.-X.; Zhao, L.-X.; Zhu, X.; Yan, Y.; Peterson, R.M.; Lohman, J.R.; Yang, D.; Yin, M.; Rudolf, J.D.; Jiang, Y.; Duan, Y.; Shen, B. (2014) Biosynthetic potential-based strain prioritization for natural product discovery - a showcase for diterpenoid producing actinomycetes. J. Nat. Prod. 77:377-387. Strategy, Product
2. Hindra; Huang, T.; Yang, D.; Rudolf, J.D.; Xie, P.; Xie, G.; Teng, Q.; Lohman, J.R.; Zhu, X.; Huang, Y.; Zhao, L.-X.; Jiang, Y.; Duan, Y.; Shen, B. (2014) Strain prioritization for natural product discovery by a high-throughput real-time PCR method. J. Nat. Prod. 77:2296-2303. Technology
3. Rudolf, J.D.; Yan, X.; Shen, B. (2016) Genome neighborhood network reveals insights into enediyne biosynthesis and facilitates prediction and prioritization for discovery. J. Ind. Microbiol. Biotechnol. 43:261-276. Mechanism
4. Yan, X.; Ge, H.; Huang, T.; Hindra; Yang, D.; Teng, Q.; Crnovčić, I.; Li, X.; Rudolf, J.D.; Lohman, J.R.; Gansemans, Y.; Zhu, X.; Huang, Y.; Zhao, L.-X.; Jiang, Y.; Van Nieuwerburgh, F.; Rader, C.; Duan, Y.; Shen, B. (2016) Strain prioritization and genome mining for enediyne natural products. mBio 7:e02104-16. Product
5. Yan, X.; Chen, J.-J.; Adhikari, A.; Yang, D.; Crnovcic, I.; Wang, N.; Chang, C.-Y.; Rader, C.; Shen, B. (2017) Genome mining of Micromonospora yangpuensis DSM 45577 as a producer of an anthraquinone-fused enediyne. Org. Lett. 19:6192-6195. Product
6. Pan, G.; Xu, Z.; Guo, Z.; Hindra; Ma, M.; Yang, D.; Zhou, H.; Gansemans, Y.; Zhu, X.; Huang, Y.; Zhao, L.-X.; Jiang, Y.; Cheng, J.; Van Nieuwerburgh, F.; Suh, J.-W.; Duan, Y.; Shen, B. (2017) Discovery of the leinamycin family of natural products by mining actinobacterial genomes. Proc. Natl. Acad. Sci. USA 114:E11131-E11140. Product
7. Yan, X.; Hindra; Ge, H.; Yang, D.; Huang, T.; Crnovcic, I.; Chang, C.-Y.; Fang, S.; Annaval, T.; Zhu, X.; Huang, Y.; Zhao, L.-X.; Jiang, Y.; Duan, Y.; Shen, B. (2018) Discovery of alternative producers of the enediyne antitumor antibiotic C-1027 with high titers. J. Nat. Prod. 81:594-599. Technology
8. Dong, L.-B.; Rudolf, J.D.; Kang, D.; Wang, N.; He, C.Q.; Deng, Y.; Huang, Y.; Houk, K.N.; Duan, Y.; Shen, B. (2018) Biosynthesis of thiocarboxylic acid-containing natural products. Nat. Commun. 9:2362. Mechanism
9. Chen, J.-J.; Rateb, M.E.; Love, M.S.; Xu, Z.; Yang, D.; Zhu, X.; Huang, Y.; Zhao, L.-X.; Jiang, Y.; Duan, Y.; McNamara, C.W.; Shen, B. (2018) Discovery of herbicidins from Streptomyces sp. CB01388 showing anti-cryptosporidium activity. J. Nat. Prod. 81:791-797. Product
10. Dong, L.-B.; Rudolf, J.D.; Deng, M.-R.; Yan, X.; Shen, B. (2018) Discovery of the tiancilactone antibiotics by genome mining of atypical bacterial type II diterpene synthases. ChemBioChem 19:1727-1733. Product
11. Kearney, S.E.; Zahoranszky-Kohalmi, G.; et al.; Guha, R.; Rohde, J. M. (2018) Canvass: a crowd-sourced, natural product screening library for exploring biological space. ACS Cent. Sci. 4:1727-1741. Technology
12. Xu, Z.; Fang, S.-M.; Bakowski, M. A.; Rateb, M.E.; Yang, D.; Zhu, X.; Huang, Y.; Zhao, L.-X.; Jiang, Y.; Duan, Y.; Hull, M.; McNamara, C.W.; Shen, B. (2019) Discovery of kirromycins with anti-Wolbachia activity from Streptomyces sp. CB00686. ACS Chem. Biol. 14:1174-1182. Product
13. Luo, J.; Yang, D.; Hindra; Adhikari, A.; Dong, L.-B.; Ye, F.; Yan, X.; Rader, C.; Shen, B. (2021) Discovery of ammosesters by mining the Streptomyces uncialis DCA2648 genome revealing new insight into ammosamide biosynthesis. J. Ind. Microbiol. Biotechnol. 48:kuab027. Mechanism, Product
14. Ye, F.; Haniff, H.S.; Suresh, B.M.; Yang, D.; Zhang, P.; Crynen, G.; Teijaro, C.N.; Yan, W.; Abegg, D.; Adibekian, A.; Shen, B.; Disney, M.D. (2022) Rational approach to identify RNA targets of natural products enables identification of nocathiacin as an inhibitor of an oncogenic RNA. ACS Chem. Biol. 17:474-482. Strategy, Product
15. Vega, V. F.; Yang, D.; Jordán, L. J.; Ye, F.; Conway, L.; Chen, L. Y.; Shumate, J.; Baillargeon, P.; Scampavia, L.; Souza, G.; Parker, C.; Shen, B.; Spicer, T. P. (2023) Protocol for 3D screening of lung cancer spheroids using natural products. SLAS Discovery 28:20-28. Technology
16. Xu, Z. F.; Zhou, Y. L.; Bo, S. T.; Zhang, S. Q.; Shi, J.; Xiang, L.; Xi, M. Y.; Zhang, B.; Xu, Z. R.; Yang, D.; Shen, B.; Tan, R. X.; Ge, H. M. (2024) Discovery and biosynthetic pathway analysis of cyclopentane-β-lactone globilactone A. Nat. Synth. 3:99-110. Mechanism, Product
17. Gui, C.; Kalkreuter, E.; Lauterbach, L.; Yang, D.; Shen, B. (2024) Enediyne biosynthesis unified by a diiodotetrayne intermediate. Nat. Chem. Biol. 20:1210-1219. Mechanism
18. Kang, S.; Huynh, T.-H.; Kim, J.M.; Heo, B.E.; Jang, S.C.; Ock, C.W.; Lee, J.; Song, Y.; An, J.S.; Shen, B.; Kim, S.B.; Jang, J.; Lee, S.K.; Yoon, Y.J.; Oh, D.-C. (2025) Logical exploration of cinnamoyl-containing nonribosomal peptides via metabologenomic targeting and regulator overexpression. J. Am. Chem. Soc. 147:37719-37731. Technology, Strategy
19. Bader, C.D.; Masuda, I.; Nichols, A.; Kalkreuter, E.; Yang, D.; Christian, T.; Nakano, Y.; Hou, Y.-M.; Shen, B. (2025) Discovery of 5-chlorotryptophan-containing antibiotics through metabologenomics-assisted high-throughput screening. JACS Au 5:6265-6274. Product
20. Kalkreuter, E.; Kautsar, S. A.; Yang, D.; Bader, C. D.; Teijaro, C. N.; Fluegel, L. L.; Davis, C. M.; Simpson, J. R.; Lauterbach, L.; Steele, A. D.; Gui, C.; Meng, S.; Li, G.; Viehrig, K.; Ye, F.; Su, P.; Kiefer, A. F.; Nichols, A.; Cepeda, A. J.; Yan, W.; Fan, B.; Jiang, Y.; Adhikari, A.; Zheng, C.-J.; Schuster, L.; Cowan, T. M.; Smanski, M. J.; Chevrette, M. G.; Carvalho, L. P.; Shen, B. The Natural Products Discovery Center: Release of the first 8490 sequenced strains for exploring Actinobacteria biosynthetic diversity. bioRxiv, doi: https://doi.org/10.1101/2023.12.14.571759 (posted on May 2, 2024). Strategy, Product

Other public databases to access genomes of sequenced and curated Actinobacteria strains that can be requested from NPDC, UF Scripps
1. Shen, B. (2024) A massive sequencing initiative focuses on exploring the biosynthetic diversity of Actinobacteria, specifically aimed at identifying natural products and bioactive compounds. NCBI BioProject PRJNA1106891 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1106891). As of June 1, 2026, 14,801 genomes are publicly available.
2. JGI Data Portal (https://data.jgi.doe.gov/refine-download/img?q=JGI_ID%3A%60506764%60&t=advanced). Top-quality genomic data, open to all researchers. Search for Project ID "506764" or "Ben Shen". As of June 9, 2026, 3,399 genomes are publicly available.