Standalone tools offer additional SIRIUS-related functionality that falls outside the standard identification workflow. These tools include configuration tasks, file conversion utilities, and features that may assist with downstream analysis. They must be executed separately and cannot be integrated into a toolchain. Each standalone tool has its own help message, which can be displayed with sirius <TOOLNAME> -h.

Custom database tool

The custom-db tool enables the import of custom structure databases and spectral libraries.

  • Structure databases can be imported from a CSV or TSV file in which structures are given in SMILES format. Each entry can optionally be given an ID and a name. Multiple files containing SMILES can be imported into a single database.
CN1CCCC1C2=C[N+](=CC=C2)[O-]	id-01	Nicotine
CN1C=NC2=C1C(=O)N(C(=O)N2C)C	id-03	Caffeine
CN1CCC2=CC3=C(C=C2C1C4C5=C(C6=C(C=C5)OCO6)C(=O)O4)OCO3	id-05	Bicuculline
sirius custom-db --input <structure.tsv> --name <myStructureDB> --location </some/dir/mydb.siriusdb>
  • Structure databases can also be imported from directories or .zip files containing SDF files.
  • Spectral libraries can be imported from directories or .zip files. The supported import formats for spectral data are .ms, .mgf, .msp, .mat, .txt (MassBank), .mb and .json (GNPS, MoNA). Spectra must be centroided and annotated with a structure. Because they must contain structure annotations, spectral libraries can also be used as custom structure databases.
sirius custom-db --input </specDir/> --name <mySpectralDB> --location </some/dir/mydb.siriusdb>
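As a hedged illustration of the SMILES input layout above, the following Python sketch writes a structure file with one structure per line: SMILES first, then an optional ID and name, separated by tabs and without a header row. The column order is taken from the example lines; the helper name write_structure_tsv is hypothetical and not part of SIRIUS.

```python
import csv
from typing import Iterable, Optional, Tuple

# (SMILES, optional ID, optional name); column order follows the example above
Entry = Tuple[str, Optional[str], Optional[str]]

def write_structure_tsv(path: str, entries: Iterable[Entry]) -> None:
    """Write tab-separated structure entries for use with sirius custom-db --input."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for smiles, entry_id, name in entries:
            row = [smiles, entry_id or "", name or ""]
            # Drop trailing empty columns so SMILES-only entries stay a single column
            while len(row) > 1 and row[-1] == "":
                row.pop()
            writer.writerow(row)
```

For example, write_structure_tsv("structure.tsv", [("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "id-03", "Caffeine")]) produces a file that can be passed to --input.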

The --name parameter is optional. If it is omitted, the database name is derived from the file name given with the mandatory --location parameter.

If a structure is already present in SIRIUS’ internal structure database, the fingerprint will be downloaded automatically. Otherwise, the fingerprint is computed locally on your computer, which may take some time, especially for a large number of structures.

Our machine learning methods represent structures as PubChem-standardized SMILES. However, PubChem standardization is not part of the import process. For optimal results, we therefore recommend (but do not require) standardizing your SMILES with the PubChem standardization before importing them.

Summary tool

The write-summaries tool exports summary files from the project space. These files provide convenient access to the results for downstream analysis, data sharing and data visualization.

  • You can export in TSV, CSV, ZIP or XLSX format using --format=<format>. The ZIP file is a zipped TSV file.
  • Using --quote-strings will write quotes to all string values in TSV and CSV files.
  • --feature-quality-summary writes a summary of the feature quality values of the aligned features.
  • --chemvista exports a ChemVista summary file that can be imported directly into the Agilent ChemVista software.
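For downstream analysis, a TSV or CSV summary can be loaded with Python's standard csv module, which also removes the quotes written by --quote-strings. This is a sketch: the column names used in the usage note (featureId, molecularFormula) are hypothetical placeholders, so check the header row of your own summary file for the actual names.

```python
import csv
from typing import Dict, List

def read_summary(path: str, delimiter: str = "\t") -> List[Dict[str, str]]:
    """Load a summary table into a list of {column: value} dicts.

    Pass delimiter="," for CSV summaries. Quoted fields (e.g. from
    --quote-strings) are unquoted automatically by the csv module.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))

def column(rows: List[Dict[str, str]], name: str) -> List[str]:
    """Extract one column by header name (raises KeyError if it is absent)."""
    return [row[name] for row in rows]
```

A call such as column(read_summary("summary.tsv"), "molecularFormula") would then return one value per row, whether or not the strings were quoted.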

Similarity tool

The similarity tool computes different similarity measures between compounds. Its input, passed with sirius -i <INPUT>, is a SIRIUS project-space or any input format that SIRIUS can convert into a project, such as .ms, .mgf or .cef. The tool computes all-against-all similarity matrices for the compounds in the project-space and saves them in the output directory specified with the -d option.

sirius -i <project-space> similarity --cosine --ftalign --ftblast <FT_LIBRARY> --tanimoto -d <OUTPUT>
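To make the all-against-all idea concrete, here is a minimal Python sketch that fills a symmetric n × n matrix from any pairwise similarity function, computing each unordered pair only once. The function name similarity_matrix is illustrative; how SIRIUS itself schedules these computations and lays out its output files is not described here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def similarity_matrix(items: Sequence[T], sim: Callable[[T, T], float]) -> List[List[float]]:
    """All-against-all similarity matrix, assuming sim(a, b) == sim(b, a)."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = sim(items[i], items[i])
        for j in range(i + 1, n):
            # Symmetric measure: compute each pair once and mirror it
            matrix[i][j] = matrix[j][i] = sim(items[i], items[j])
    return matrix
```

Mirroring the upper triangle means n compounds need n(n+1)/2 evaluations of sim instead of n².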

Cosine Similarity (--cosine)

This option computes the cosine similarity of the merged MS/MS spectra between all compounds in the dataset. It requires only the spectra, so no additional preprocessing is needed.
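The basic idea of a spectral cosine can be sketched in Python as follows: peaks of two centroided spectra are matched within an absolute m/z tolerance, and the dot product of the matched intensities is divided by the norms of both spectra. This is a simplified textbook variant with greedy one-to-one peak matching and an assumed tolerance of 0.01 Da; it is not SIRIUS' exact implementation, which may use different tolerances, intensity weighting or peak matching.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

def cosine_similarity(a: List[Peak], b: List[Peak], tolerance: float = 0.01) -> float:
    """Cosine score between two centroided spectra using greedy peak matching."""
    norm_a = math.sqrt(sum(i * i for _, i in a))
    norm_b = math.sqrt(sum(i * i for _, i in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # All peak pairs within the tolerance, largest intensity products first
    pairs = sorted(
        ((ia * ib, x, y)
         for x, (mz_a, ia) in enumerate(a)
         for y, (mz_b, ib) in enumerate(b)
         if abs(mz_a - mz_b) <= tolerance),
        reverse=True,
    )
    used_a, used_b = set(), set()
    dot = 0.0
    for product, x, y in pairs:
        # Each peak may contribute to at most one matched pair
        if x not in used_a and y not in used_b:
            used_a.add(x)
            used_b.add(y)
            dot += product
    return dot / (norm_a * norm_b)
```

Identical spectra score 1.0, and spectra without any peaks in common score 0.0.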

Fragmentation Tree Alignment Similarity (--ftalign)

This option computes the tree alignment score between the top-ranked fragmentation trees of all compounds. The input project-space must already contain the fragmentation trees generated by the formula subtool of the SIRIUS CLI, so the formula subtool must be run beforehand.

FT-Blast (--ftblast)

This option aligns the fragmentation trees of the compounds in the dataset against a library of fragmentation trees. The input project-space must already contain fragmentation trees computed with the formula subtool of the SIRIUS CLI, so the formula subtool must be run beforehand. The fragmentation tree library (--ftblast=<LIB_PATH>) can be either another SIRIUS project-space containing fragmentation trees or a directory containing fragmentation trees in JSON format. The alignment method is described by Rasche et al.

Tanimoto Similarity (--tanimoto)

This option computes the Tanimoto similarity between the top-ranked predicted fingerprints of all compounds in the dataset. The input project-space must already contain the predicted fingerprints generated by the structure subtool of the SIRIUS CLI, so both the formula and structure subtools must be run beforehand. Note that the fingerprints being compared are probabilistic: each entry $F_i$ is the predicted probability that bit $i$ is set. The Tanimoto similarity between two probabilistic fingerprints $F$ and $F'$ of length $n$ is computed as follows:

\[\frac{ \sum_{i=1}^{n} F_i \cdot F'_i }{ \sum_{i=1}^{n} \left( 1 - (1 - F_i) \cdot (1 - F'_i) \right) }\]
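The formula translates directly into code. The following Python sketch assumes each fingerprint is a sequence of per-bit probabilities in [0, 1] of equal length. For 0/1 fingerprints it reduces to the ordinary Tanimoto coefficient (shared bits divided by bits set in either fingerprint). Returning 0 for two all-zero fingerprints is an assumed convention, not taken from SIRIUS.

```python
from typing import Sequence

def probabilistic_tanimoto(f: Sequence[float], g: Sequence[float]) -> float:
    """Tanimoto similarity of two probabilistic fingerprints of equal length."""
    if len(f) != len(g):
        raise ValueError("fingerprints must have the same length")
    # Numerator: expected number of bits set in both (bits treated as independent)
    intersection = sum(fi * gi for fi, gi in zip(f, g))
    # Denominator: expected number of bits set in at least one fingerprint
    union = sum(1.0 - (1.0 - fi) * (1.0 - gi) for fi, gi in zip(f, g))
    # Assumed convention: two all-zero fingerprints have similarity 0
    return intersection / union if union > 0 else 0.0
```

For instance, two fingerprints that both predict probability 0.5 for each of two bits score 0.5 / 1.5 = 1/3, lower than the 1.0 of two identical binary fingerprints, because uncertain bits count only partially towards the overlap.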

Examples

Example 1: To compute --cosine, --ftalign and --tanimoto similarities, we first need a project-space that contains spectra, fragmentation trees and fingerprints. We generate it with the following command:

sirius -i <input-data.mgf> -o <my-project> formula structure

This command processes the input data to create the necessary project-space (my-project). Once this project-space is created, it can be used as input for the similarity computation:

sirius -i <my-project> similarity --cosine --ftalign --tanimoto -d <output-dir>

Example 2: To compute --ftblast similarities, we first need a project-space (my-project) that contains fragmentation trees computed from our input spectral data and another project-space (library-project) that contains a fragmentation tree library, e.g. computed from a spectral library. Assuming both the input and library spectra are in MGF format, we run the following commands.

Compute fragmentation trees for the input data:

sirius -i <input-data.mgf> -o <my-project> formula

Compute fragmentation trees for the library spectra:

sirius -i <library-data.mgf> -o <library-project> formula

Now that both project-spaces are prepared, we can proceed with the --ftblast similarity computation:

sirius -i <my-project> similarity --ftblast <library-project> -d <output-dir>
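When the same preparation has to be repeated for many datasets, the three commands of Example 2 can be scripted. The Python sketch below assembles and runs exactly those commands and nothing else; it assumes the sirius executable is on your PATH, and the function names ftblast_commands and run_all are hypothetical.

```python
import subprocess
from typing import List

def ftblast_commands(input_mgf: str, library_mgf: str, my_project: str,
                     library_project: str, output_dir: str) -> List[List[str]]:
    """Return the Example 2 commands in the order they must be run."""
    return [
        # 1. Fragmentation trees for the input data
        ["sirius", "-i", input_mgf, "-o", my_project, "formula"],
        # 2. Fragmentation trees for the library spectra
        ["sirius", "-i", library_mgf, "-o", library_project, "formula"],
        # 3. Align the input trees against the library trees
        ["sirius", "-i", my_project, "similarity", "--ftblast", library_project,
         "-d", output_dir],
    ]

def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn; raises CalledProcessError if one fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

Steps 1 and 2 do not depend on each other, so they could also run in parallel; only step 3 needs both project-spaces.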

Mass Decomposition tool

The decomp tool exposes the efficient mass decomposition algorithm by Böcker and Lipták, which SIRIUS uses internally, as a standalone tool. It decomposes masses for a given mass deviation, ionization, chemical alphabet and chemical filter.
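To illustrate what decomposing a mass means, the Python sketch below enumerates all CHNO formulas whose monoisotopic mass lies within an absolute tolerance of a given neutral mass. It is a deliberately naive search, not the Böcker and Lipták algorithm used by SIRIUS (which is far more efficient), and it applies no ionization handling and no chemical filter. The function names and the default alphabet are assumptions for illustration.

```python
from typing import Dict, List

# Monoisotopic masses (Da) of the most abundant isotopes of C, H, N and O
MONOISOTOPIC: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

def decompose(mass: float, tolerance: float,
              alphabet: Dict[str, float] = MONOISOTOPIC) -> List[Dict[str, int]]:
    """Naive enumeration of all formulas over `alphabet` within +/- tolerance of `mass`.

    Assumes tolerance < half the mass of the lightest element, so that the
    count of the lightest element is fixed by the remaining mass.
    """
    elements = sorted(alphabet, key=alphabet.get, reverse=True)  # heaviest first
    results: List[Dict[str, int]] = []

    def search(index: int, remaining: float, counts: Dict[str, int]) -> None:
        element = elements[index]
        element_mass = alphabet[element]
        if index == len(elements) - 1:
            # Lightest element: take the count that best fills the remaining mass
            n = round(remaining / element_mass)
            if n >= 0 and abs(remaining - n * element_mass) <= tolerance:
                results.append({**counts, element: n})
            return
        # Try every count that does not overshoot the mass by more than the tolerance
        for n in range(int((remaining + tolerance) // element_mass) + 1):
            search(index + 1, remaining - n * element_mass, {**counts, element: n})

    search(0, mass, {})
    return results

def to_formula(counts: Dict[str, int]) -> str:
    """Formula string with C and H first, then other elements alphabetically."""
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(
        e if counts.get(e, 0) == 1 else f"{e}{counts[e]}"
        for e in order if counts.get(e, 0) > 0
    )
```

For example, C6H12O6 appears among the decompositions of 180.063388 Da at a tolerance of 0.001 Da. To decompose the mass of an [M+H]+ ion instead, one would first subtract the proton mass (about 1.007276 Da) to obtain the neutral mass.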

MGF export tool

The mgf-export tool exports the spectra of a given project-space as an MGF file for use with other tools, such as GNPS. The --quant-table option additionally exports a feature quantification table in CSV format, e.g. for GNPS Feature-Based Molecular Networking:

sirius --input <project-space> MGF --merge-ms2 --quant-table <table.csv> --output <spectra.mgf>

Please note that quantification information is only available if the project-space was created from .mzML or .mzXML files.
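For use in your own scripts, an exported MGF file can be read with a few lines of Python. The sketch below relies only on the generic MGF layout: spectra between BEGIN IONS and END IONS, KEY=value header lines, and whitespace-separated m/z and intensity peak lines. Which header keys SIRIUS writes is not specified here, so treat key names such as PEPMASS as examples; the function name parse_mgf is hypothetical.

```python
from typing import Dict, List, Tuple

Spectrum = Tuple[Dict[str, str], List[Tuple[float, float]]]

def parse_mgf(text: str) -> List[Spectrum]:
    """Split MGF text into (header, peaks) tuples, one per BEGIN/END IONS block."""
    spectra: List[Spectrum] = []
    header: Dict[str, str] = {}
    peaks: List[Tuple[float, float]] = []
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        if line == "BEGIN IONS":
            inside, header, peaks = True, {}, []
        elif line == "END IONS":
            if inside:
                spectra.append((header, peaks))
            inside = False
        elif inside and "=" in line and not line[0].isdigit():
            # Header line such as PEPMASS=195.0877
            key, value = line.split("=", 1)
            header[key.strip().upper()] = value.strip()
        elif inside:
            # Peak line: m/z followed by intensity
            fields = line.split()
            peaks.append((float(fields[0]), float(fields[1])))
    return spectra
```

Reading the exported file is then a matter of calling parse_mgf on its contents, e.g. parse_mgf(open("spectra.mgf", encoding="utf-8").read()).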