In-Silico Tools for Natural Product Analysis
HypoRiPPAtlas
HypoRiPPAtlas is a database of ribosomally synthesized and post-translationally modified peptides (RiPPs) generated by our tool seq2ripp. Seq2ripp mines RiPP biosynthetic gene clusters (BGCs), open reading frames (ORFs), cores, and mature post-translationally modified RiPPs from genomic sequence data alone. Because the generated RiPPs prioritize sensitivity over specificity, paired mass spectrometry data is used to validate the hypothetical compounds with the mass spectral search tool Dereplicator+.
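The ORF-mining step mentioned above can be illustrated with a naive short-ORF scanner. This is a minimal sketch, not seq2ripp's implementation: it assumes the standard genetic code, ATG-only starts, and hypothetical length bounds (`min_aa`, `max_aa`) reflecting that RiPP precursor peptides are short; the function name `find_short_orfs` is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def find_short_orfs(dna: str, min_aa: int = 20, max_aa: int = 110) -> list[tuple[str, int, str]]:
    """Return (strand, start, peptide) for ATG-initiated ORFs with min_aa <= length <= max_aa.

    The length bounds are hypothetical defaults. Start positions on the "-" strand
    are indices into the reverse-complement sequence. Only the outermost ATG of
    each ORF is reported; unknown codons translate to "X".
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None  # index of the currently open ATG in this frame, if any
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    # Translate from the start codon up to (not including) the stop codon.
                    peptide = "".join(CODON_TABLE.get(seq[j:j + 3], "X") for j in range(start, i, 3))
                    if min_aa <= len(peptide) <= max_aa:
                        hits.append((strand, start, peptide))
                    start = None
    return hits
```

For example, `find_short_orfs("ATGAAATAG", min_aa=1)` reports the two-residue peptide `MK` on the forward strand; with the default bounds the same sequence yields no hits.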
Users can browse HypoRiPPAtlas and run seq2ripp here.
If you use HypoRiPPAtlas and seq2ripp in your research, please cite the following.
VInSMoC
VInSMoC searches mass spectra against databases of arbitrary molecules. It operates in two search modes: exact and variable. In exact mode, query spectra are scored against database molecules whose masses match the query spectrum's precursor m/z. The reported score is the number of peaks that belong to fully explained paths in the theoretical fragmentation graph. In variable mode, query spectra are scored against all database molecules whose masses fall within a user-defined threshold of the query spectrum's precursor m/z. Scores are computed as in exact mode, except that the spectrum is scored against a hypothetical variant of the database compound rather than the compound itself.
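To illustrate how the two modes differ in which database molecules are considered, the sketch below selects candidates for a single query spectrum. It assumes protonated precursors ([M+zH]z+), a hypothetical exact-mode tolerance of 0.02 Da, and a hypothetical default for the variable-mode threshold; `select_candidates` and its parameters are illustrative names, not VInSMoC's interface, and the fragmentation-graph scoring itself is not shown.

```python
PROTON_MASS = 1.007276  # Da


def precursor_neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an assumed [M+zH]z+ precursor ion."""
    return (mz - PROTON_MASS) * charge


def select_candidates(database, mz, charge, mode="exact", exact_tol=0.02, max_shift=100.0):
    """Return (name, mass_shift) pairs for database molecules eligible for scoring.

    database: iterable of (name, neutral_mass) pairs.
    exact mode: |query mass - molecule mass| <= exact_tol (hypothetical default, Da).
    variable mode: |query mass - molecule mass| <= max_shift (user-defined threshold, Da);
    the signed shift is the mass difference a hypothetical variant would have to explain.
    """
    if mode not in ("exact", "variable"):
        raise ValueError(f"unknown mode: {mode!r}")
    query_mass = precursor_neutral_mass(mz, charge)
    tol = exact_tol if mode == "exact" else max_shift
    hits = []
    for name, mass in database:
        shift = query_mass - mass
        if abs(shift) <= tol:
            hits.append((name, shift))
    # Report the closest matches first.
    hits.sort(key=lambda hit: abs(hit[1]))
    return hits
```

With a database of molecules at 500.0, 514.0157, and 700.0 Da and a singly charged precursor at m/z 515.022976, exact mode keeps only the 514.0157 Da molecule, while variable mode with `max_shift=50.0` also keeps the 500.0 Da molecule with a shift of about +14.0157 Da.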
Users can run VInSMoC in exact and variable modes here. Users can find a short tutorial on how to use the VInSMoC variable mass spectral database search tool here. Users can find guidelines on how to use VInSMoC here or here.
If you use VInSMoC in your research, please cite the following.
NPDiscover
NPDiscover is a tool to predict the structures of non-ribosomal peptides (NRPs) encoded in bacterial and fungal genomes. NPDiscover mines NRP biosynthetic gene clusters (BGCs) from input genomes and annotates the relevant NRP domains, such as Condensation (C-) domains, Adenylation (A-) domains, and Thiolation (T-) domains. It then annotates each A-domain with the amino acids it may recruit and uses this information, along with the other NRP domains, to construct an NRP core. Post-assembly modifications are then applied to the NRP cores. Because the generated NRPs prioritize sensitivity over specificity, paired mass spectrometry data is used to validate the hypothetical compounds with the mass spectral search tool Dereplicator+.
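The core-construction step can be sketched as a Cartesian product over the candidate amino acids predicted for each A-domain, which also shows why the candidate set grows quickly. This minimal sketch assumes colinear assembly (A-domains taken in order), standard amino acids only, and no post-assembly modifications; `enumerate_cores` is a hypothetical name, not part of NPDiscover.

```python
from itertools import product

# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}
WATER_MASS = 18.010565  # Da


def enumerate_cores(a_domain_predictions, cyclic=False):
    """Yield (core, monoisotopic_mass) for every combination of predicted residues.

    a_domain_predictions: one list of candidate residues per A-domain, in assembly order.
    A linear core keeps one water; a head-to-tail cyclic core loses it.
    """
    for residues in product(*a_domain_predictions):
        mass = sum(RESIDUE_MASS[r] for r in residues)
        if not cyclic:
            mass += WATER_MASS
        yield "-".join(residues), mass
```

With three A-domains that each have four candidate residues, this already yields 4^3 = 64 cores before any modifications, which is why mass spectra are needed to prune the hypotheses.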
Users can run NPDiscover with paired mass spectrometry data and genome data here.
If you use NPDiscover in your research, please cite the following.
Seq2PKS
Seq2PKS is a tool for predicting the structures of polyketides (PKs) encoded in bacterial and fungal genomes. Seq2PKS first annotates polyketide domains and enzymes by genome mining. It then predicts the substrate specificity of the domains and the order of genes in the backbone assembly pathway to construct the initial backbone. Finally, it incorporates post-assembly modifications to construct mature structures and searches these structures against mass spectra using Dereplicator+.
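The backbone-construction step can be sketched with the textbook colinearity model of modular type I PKSs: each extension module adds one extender unit chosen by its AT domain, and the β-carbon is left as a ketone, reduced to a hydroxyl (KR), dehydrated to an alkene (KR+DH), or fully reduced (KR+DH+ER). Because gene order is uncertain, the sketch tries every ordering of the genes after a fixed first gene. This is an illustrative simplification under those assumptions (loading modules and tailoring are ignored), not Seq2PKS's algorithm; `Module`, `candidate_backbones`, and the substrate labels `mal`/`mmal` (malonyl-/methylmalonyl-CoA) are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Module:
    substrate: str                    # predicted AT-domain substrate, e.g. "mal" or "mmal"
    domains: frozenset = frozenset()  # reductive domains present: subset of {"KR", "DH", "ER"}


def beta_carbon_state(domains) -> str:
    """Reductive-loop logic: DH and ER only act if the preceding domains are present."""
    if "KR" not in domains:
        return "keto"
    if "DH" not in domains:
        return "hydroxyl"
    if "ER" not in domains:
        return "alkene"
    return "reduced"


def backbone(modules) -> list:
    """One 'substrate/beta-state' label per extension module, in assembly order."""
    return [f"{m.substrate}/{beta_carbon_state(m.domains)}" for m in modules]


def candidate_backbones(genes: dict, first: str) -> list:
    """Return (gene_order, backbone) for every order of the genes after `first`."""
    rest = [g for g in genes if g != first]
    results = []
    for order in permutations(rest):
        gene_order = (first, *order)
        modules = [m for g in gene_order for m in genes[g]]
        results.append((gene_order, backbone(modules)))
    return results
```

Trying all orders produces (n-1)! candidate backbones for n genes, so this brute-force approach is only practical for small clusters; a real predictor would rank or constrain the orders.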
Users can run Seq2PKS with paired mass spectrometry data and genome data here.
If you use Seq2PKS in your research, please cite the following.
Seq2Hybrid
Seq2Hybrid is a tool to recover NRP-PK hybrid structures encoded in bacterial and fungal genomes. Seq2Hybrid first annotates both NRP and PK domains using HMM alignments to find biosynthetic gene clusters. It then annotates the active domains (A-domains and AT-domains) with the most likely monomers they recruit. Next, it predicts potential assembly lines to construct an initial backbone. Post-assembly modifications are then applied to the core molecule to generate mature structures. Finally, the resulting molecules are filtered against mass spectra using Dereplicator+.
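The monomer-assignment step for mixed A- and AT-domains can be sketched by keeping the top-k predicted monomers per active domain and ranking the resulting backbones by their summed log-probabilities. This is an illustrative sketch, not Seq2Hybrid's scoring: it assumes each domain comes with a probability for each candidate monomer, that the assembly line order is already fixed, and that domain predictions are independent; `rank_hybrid_backbones` and the parameter `k` are hypothetical.

```python
import heapq
import math
from itertools import product


def top_k(scores: dict, k: int) -> list:
    """The k (monomer, probability) pairs with the highest probability."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


def rank_hybrid_backbones(assembly_line, k=2):
    """Rank backbones from an ordered list of (domain_type, {monomer: probability}).

    domain_type is "A" (NRPS, amino acid) or "AT" (PKS, extender unit).
    Probabilities must be > 0 because scores are summed in log space.
    """
    choices = []
    for domain_type, scores in assembly_line:
        if domain_type not in ("A", "AT"):
            raise ValueError(f"unsupported domain type: {domain_type!r}")
        choices.append(top_k(scores, k))
    ranked = []
    for combo in product(*choices):
        name = "-".join(monomer for monomer, _ in combo)
        log_score = sum(math.log(p) for _, p in combo)
        ranked.append((name, log_score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Summing log-probabilities rather than multiplying probabilities avoids numerical underflow on long assembly lines, and `k` caps the number of backbones at k^n for n active domains.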
Users can run Seq2Hybrid with paired mass spectrometry data and genome data here.
If you use Seq2Hybrid in your research, please cite the following.