Standalone tools offer additional SIRIUS-related functionalities that fall outside the standard identification workflow.
These tools include configuration tasks, file conversion utilities, and features that may assist with downstream analysis.
They must be executed separately and cannot be integrated into a toolchain.
Each standalone tool comes with its own help message, which can be accessed with sirius <TOOLNAME> -h.
Custom database tool
The custom-db tool enables the import of custom structure databases and spectral libraries.
- Structure databases can be imported from a single CSV or TSV file in which the structures are provided in SMILES format. You can also provide a database ID and a name for each entry. Multiple files containing compounds in SMILES format can be imported into a single database.
CN1CCCC1C2=C[N+](=CC=C2)[O-] id-01 Nicotine
CN1C=NC2=C1C(=O)N(C(=O)N2C)C id-03 Caffeine
CN1CCC2=CC3=C(C=C2C1C4C5=C(C6=C(C=C5)OCO6)C(=O)O4)OCO3 id-05 Bicuculline
sirius custom-db --input <structure.tsv> --name <myStructureDB> --location </some/dir/mydb.siriusdb>
- Structure databases can also be imported from directories or .zip files containing SDF files.
- Spectral libraries can be imported from directories or .zip files. The supported import formats for spectral data are .ms, .mgf, .msp, .mat, .txt (MassBank), .mb and .json (GNPS, MoNA). Spectra must be annotated with a structure and must be centroided. Because they must contain structure annotations, spectral libraries can also be used as custom structure databases.
sirius custom-db --input </specDir/> --name <mySpectralDB> --location </some/dir/mydb.siriusdb>
The --name parameter is optional. If it is omitted, the database name is derived from the file name given with the mandatory --location parameter.
If a structure is already present in SIRIUS’ internal structure database, the fingerprint will be downloaded automatically. Otherwise, the fingerprint is computed locally on your computer, which may take some time, especially for a large number of structures.
In our machine learning methods, we use PubChem-standardized SMILES to represent structures. However, PubChem standardization is not part of the import process. For optimal results, we recommend (but do not require) standardizing your SMILES with the PubChem standardization before importing them.
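If your structures come from a script or another table, a small helper can produce the three-column SMILES/ID/name layout used in the structure database example. The sketch below is a minimal illustration using Python's standard csv module; the function name write_structure_tsv is hypothetical, and the sketch assumes that no header row is required, which this section does not state.

```python
import csv
import io

# Hypothetical records: (SMILES, ID, name), mirroring the three-column example.
records = [
    ("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "id-03", "Caffeine"),
    ("CN1CCC2=CC3=C(C=C2C1C4C5=C(C6=C(C=C5)OCO6)C(=O)O4)OCO3", "id-05", "Bicuculline"),
]


def write_structure_tsv(rows, handle):
    """Write one tab-separated line per structure: SMILES, ID, name (no header, assumed)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for smiles, entry_id, name in rows:
        if not smiles:
            raise ValueError(f"entry {entry_id!r} has no SMILES")
        writer.writerow([smiles, entry_id, name])


buffer = io.StringIO()
write_structure_tsv(records, buffer)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

To create a file for --input, open it with open("structure.tsv", "w", newline="") and pass that handle instead of the in-memory buffer.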
Summary tool
The write-summaries tool exports summary files from the project space that provide convenient access to the results for downstream analysis, data sharing and data visualization.
- You can export in TSV, CSV, ZIP or XLSX format using --format=<format>. The ZIP file is a zipped TSV file.
- Using --quote-strings encloses all string values in TSV and CSV files in quotes.
- --feature-quality-summary generates a feature quality summary containing quality values for the aligned features.
- --chemvista exports a ChemVista summary file that can be imported directly into the Agilent ChemVista software.
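Because the ZIP export is a zipped TSV file, it can be read without unpacking it to disk. The following minimal sketch uses Python's standard zipfile and csv modules; the function name read_zipped_tsvs, the member name example_summary.tsv and its columns are hypothetical placeholders, not the actual file names or schema produced by write-summaries.

```python
import csv
import io
import zipfile


def read_zipped_tsvs(zip_source):
    """Return {member name: list of row dicts} for every .tsv file in a ZIP archive."""
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if not member.endswith(".tsv"):
                continue
            with archive.open(member) as raw:
                # Decode the binary member stream as UTF-8 text for the csv reader.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[member] = list(csv.DictReader(text, delimiter="\t"))
    return tables


# Self-contained demo: an in-memory ZIP holding one placeholder summary table.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("example_summary.tsv", "featureId\tmolecularFormula\n1\tC8H10N4O2\n")
demo.seek(0)

tables = read_zipped_tsvs(demo)
print(tables["example_summary.tsv"][0]["molecularFormula"])
```

The csv reader's default quote character (") also strips the quotes that --quote-strings adds around string values.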
Similarity tool
The similarity tool allows you to compute different similarity measures between compounds.
It accepts as input a SIRIUS project-space, or any input format that SIRIUS can convert into a project, such as .ms, .mgf or .cef, provided with sirius -i <INPUT>.
The tool computes all-against-all similarity matrices for the compounds in the project-space and saves the results in the output directory specified with the -d option.
sirius -i <project-space> similarity --cosine --ftalign --ftblast <SPECTRA_LIB> --tanimoto -d <OUTPUT>
Cosine Similarity (--cosine)
This option computes the cosine similarity of the merged MS/MS spectra between all compounds in the dataset. It requires only the spectra, so no additional preprocessing is needed.
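This section does not describe the exact peak matching or intensity scaling SIRIUS applies. As an illustration only, a common variant of spectral cosine pairs peaks whose m/z values lie within a tolerance, uses each peak at most once (greedily, highest intensity product first), and divides the summed products by the norms of both spectra. The sketch assumes centroided spectra given as (m/z, intensity) pairs and an absolute tolerance of 0.01 Da, an arbitrary choice.

```python
import math


def cosine_similarity(spec_a, spec_b, tolerance=0.01):
    """Greedy peak-matching cosine score between two centroided spectra.

    spec_a, spec_b: lists of (mz, intensity) tuples.
    tolerance: absolute m/z tolerance in Da (assumed value).
    """
    # Collect all peak pairs within tolerance, best intensity products first.
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)

    used_a, used_b, score = set(), set(), 0.0
    for product, i, j in candidates:
        # Each peak may contribute to at most one matched pair.
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += product

    norm_a = math.sqrt(sum(inten ** 2 for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten ** 2 for _, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return score / (norm_a * norm_b)


a = [(91.054, 100.0), (119.049, 40.0), (147.044, 10.0)]
b = [(91.055, 80.0), (119.050, 50.0), (165.055, 20.0)]
print(round(cosine_similarity(a, a), 6))  # 1.0 for identical spectra
print(round(cosine_similarity(a, b), 3))
```

Many implementations also rescale intensities (for example by a square root) before scoring; this sketch uses raw intensities to keep the idea visible.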
Fragmentation Tree Alignment Similarity (--ftalign)
This option computes the tree alignment score between the top-ranked fragmentation trees of all compounds.
For this computation, the input project-space must already contain the fragmentation trees generated by the formula subtool of the SIRIUS CLI, so the formula subtool must be executed beforehand.
FT-Blast (--ftblast)
This option aligns the fragmentation trees of the compounds in the dataset against a library of fragmentation trees.
The input project-space must already contain fragmentation trees computed with the formula subtool of the SIRIUS CLI, so the formula subtool must be executed beforehand. The fragmentation tree library (--ftblast=<LIB_PATH>) can be either another SIRIUS project-space containing fragmentation trees or a directory containing fragmentation trees in JSON format.
The alignment method is described by Rasche et al.
Tanimoto Similarity (--tanimoto)
This option computes the Tanimoto similarity between the top-ranked predicted fingerprints of all compounds in the dataset.
The input project-space must already contain the predicted fingerprints generated by the structure subtool of the SIRIUS CLI, so both the formula and structure subtools must be executed beforehand.
Note that the fingerprints being compared are probabilistic. The Tanimoto similarity between two probabilistic fingerprints, $F$ and $F'$, of length $n$ is computed as follows:
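One common way to extend the Tanimoto coefficient to probabilistic fingerprints, which may differ from the formula SIRIUS uses, treats each of the $n$ bits as an independent Bernoulli variable and divides the expected intersection by the expected union, $\sum_i F_i F'_i \,/\, \sum_i (F_i + F'_i - F_i F'_i)$. The sketch below assumes that formulation and also shows how an all-against-all matrix can be filled for a symmetric measure; probabilistic_tanimoto and similarity_matrix are hypothetical names.

```python
def probabilistic_tanimoto(f, g):
    """Expected intersection over expected union for two probabilistic fingerprints.

    f, g: equal-length sequences of bit probabilities in [0, 1].
    Assumes independent bits; may not match SIRIUS's exact formula.
    """
    if len(f) != len(g):
        raise ValueError("fingerprints must have the same length n")
    intersection = sum(p * q for p, q in zip(f, g))
    union = sum(p + q - p * q for p, q in zip(f, g))
    return intersection / union if union > 0 else 0.0


def similarity_matrix(fingerprints, measure=probabilistic_tanimoto):
    """All-against-all matrix; the measure is symmetric, so each pair is computed once."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = measure(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


fps = [
    [1.0, 0.0, 1.0, 1.0],  # 0/1 vectors reduce to the classic binary Tanimoto
    [1.0, 1.0, 0.0, 1.0],
    [0.9, 0.1, 0.8, 0.7],  # predicted bit probabilities
]
for row in similarity_matrix(fps):
    print([round(v, 3) for v in row])
```

Under this formulation, a fingerprint compared with itself scores below 1 unless every probability is exactly 0 or 1, which is worth keeping in mind when reading the diagonal of such a matrix.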
Examples
Example 1:
To compute --cosine, --ftalign and --tanimoto similarities, we first need a project-space that contains spectra, fragmentation trees and fingerprints. We generate it with the following command:
sirius -i <input-data.mgf> -o <my-project> formula structure
This command processes the input data to create the necessary project-space (my-project).
Once this project-space is created, it can be used as input for the similarity computation:
sirius -i <my-project> similarity --cosine --ftalign --tanimoto -d <output-dir>
Example 2:
To compute --ftblast similarities, we first need a project-space (my-project) that contains fragmentation trees computed from our input spectral data, and another project-space (library-project) that contains a fragmentation tree library, e.g. computed from a spectral library.
Assuming both the input and library spectra are in MGF format, we execute the following commands.
Compute fragmentation trees for the input data:
sirius -i <input-data.mgf> -o <my-project> formula
Compute fragmentation trees for the library spectra:
sirius -i <library-data.mgf> -o <library-project> formula
Now that both project-spaces are prepared, we can proceed with the --ftblast similarity computation:
sirius -i <my-project> similarity --ftblast <library-project> -d <output-dir>
Mass Decomposition tool
The decomp tool provides SIRIUS's internal, efficient mass decomposition algorithm by Böcker and Lipták as a standalone tool to decompose masses with a given deviation, ionization, chemical alphabet and chemical filter.
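To illustrate what a mass decomposition returns, the sketch below enumerates all element counts over a small alphabet whose monoisotopic mass lies within a given deviation of a target. It is a naive branch-and-bound search, not the Böcker-Lipták algorithm that decomp implements, and it is only practical for small masses and alphabets. The names decompose, rdbe and hill, the CHNOPS alphabet, and the RDBE ≥ 0 rule (standing in for a chemical filter) are all assumptions for illustration. The sketch works on a neutral mass; for an [M+H]+ ion you would first subtract the proton mass (about 1.007276 Da).

```python
# Monoisotopic masses (Da) for an assumed CHNOPS alphabet.
MASSES = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
# Valences for the assumed RDBE filter (P treated as trivalent).
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3, "S": 2}


def rdbe(formula):
    """Rings plus double bonds: 1 + sum(n_i * (v_i - 2)) / 2."""
    return 1 + sum(n * (VALENCE[e] - 2) for e, n in formula.items()) / 2


def decompose(target, tolerance, alphabet=MASSES):
    """Return every element-count dict whose mass lies within target ± tolerance.

    Naive branch-and-bound search, not the Böcker-Lipták algorithm. Assumes
    tolerance is below half the mass of the lightest element.
    """
    elements = sorted(alphabet, key=alphabet.get, reverse=True)  # heaviest first
    results = []

    def extend(index, remaining, counts):
        element = elements[index]
        mass = alphabet[element]
        if index == len(elements) - 1:
            # Lightest element: solve for its count directly instead of looping.
            count = round(remaining / mass)
            if count >= 0 and abs(remaining - count * mass) <= tolerance:
                formula = {e: n for e, n in counts.items() if n > 0}
                if count > 0:
                    formula[element] = count
                results.append(formula)
            return
        # Never overshoot the target by more than the tolerance.
        for count in range(int((remaining + tolerance) // mass) + 1):
            counts[element] = count
            extend(index + 1, remaining - count * mass, counts)
        counts.pop(element, None)

    extend(0, target, {})
    return results


def hill(formula):
    """Format with C and H first, then the rest alphabetically (carbon-free Hill rules not handled)."""
    order = [e for e in ("C", "H") if e in formula]
    order += sorted(e for e in formula if e not in ("C", "H"))
    return "".join(e + (str(formula[e]) if formula[e] > 1 else "") for e in order)


# Neutral monoisotopic mass of caffeine (C8H10N4O2), CHNO alphabet, 1 mDa window.
chno = {e: MASSES[e] for e in "CHNO"}
hits = decompose(194.080376, 0.001, chno)
candidates = [hill(f) for f in hits if rdbe(f) >= 0]
print(candidates)
```

Even this small example typically returns several candidates besides the true formula, which is why SIRIUS scores decompositions further instead of relying on mass alone.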
MGF export tool
The mgf-export tool exports the spectra from a given project-space as an MGF file for use with other tools, such as GNPS.
The --quant-table option allows you to export an additional feature quantification table in CSV format, e.g. for GNPS Feature-Based Molecular Networking:
sirius --input <project-space> MGF --merge-ms2 --quant-table <table.csv> --output <spectra.mgf>
Please note that quantification information is only available if the project-space was created from .mzML or .mzXML files.
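To inspect an exported file, for example to check which spectra and metadata it contains, a small parser for the block-based MGF layout is enough: each spectrum sits between BEGIN IONS and END IONS, header lines are KEY=value pairs, and peak lines hold an m/z and an intensity. This is a minimal sketch; parse_mgf is a hypothetical name, and the keys in the demo (FEATURE_ID, PEPMASS, CHARGE, MSLEVEL) are typical of GNPS-style MGF files but are not guaranteed to match what mgf-export writes.

```python
def parse_mgf(lines):
    """Parse MGF text into spectra: {"params": {KEY: value}, "peaks": [(mz, intensity), ...]}."""
    spectra, current = [], None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # skip blank lines and comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as PEPMASS=195.0877; keys are normalized to upper case.
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z, optional intensity (0.0 if absent), extra columns ignored.
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra


mgf_text = """BEGIN IONS
FEATURE_ID=1
PEPMASS=195.0877
CHARGE=1+
MSLEVEL=2
138.0662 100.0
110.0713 35.5
END IONS
"""
spectra = parse_mgf(mgf_text.splitlines())
print(len(spectra), spectra[0]["params"]["PEPMASS"], spectra[0]["peaks"])
```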