SIRIUS Science

SIRIUS is Java-based software for analyzing small molecules from tandem mass spectrometry data. The primary focus of SIRIUS is structure elucidation of novel molecules (drug leads, contaminants in synthesis or food items), but it is also well-equipped to handle more standard tasks like dereplication of known structures.

SIRIUS integrates easily into existing workflows and offers interfaces for manual as well as fully automated analysis.

It combines the analysis of isotope patterns in MS spectra with the analysis of fragmentation patterns in MS/MS spectra, and uses CSI:FingerID as a web service to search in molecular structure databases. Furthermore, it integrates CANOPUS for de novo compound class prediction and MSNovelist for de novo structure generation.

To get started quickly, see the quick-start guide.

SIRIUS introduction

SIRIUS requires high mass accuracy data. The mass deviation of your MS and MS/MS spectra should be within 20 ppm. Mass spectrometry instruments such as TOF, Orbitrap and FT-ICR usually provide high mass accuracy data, as do coupled instruments like Q-TOF, IT-TOF or IT-Orbitrap. Spectra measured with a quadrupole or linear trap do not provide the high mass accuracy that our method requires. See Mass deviations for details on what “mass accuracy” means for SIRIUS.
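As a rough illustration of what "within 20 ppm" means numerically, the following minimal sketch uses the common textbook definition of relative mass deviation. The function names are hypothetical and not part of SIRIUS, and the way SIRIUS models mass deviations internally may differ (see Mass deviations).

```python
def ppm_deviation(measured_mz: float, theoretical_mz: float) -> float:
    """Signed relative mass deviation in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     max_ppm: float = 20.0) -> bool:
    """True if the absolute deviation does not exceed max_ppm."""
    return abs(ppm_deviation(measured_mz, theoretical_mz)) <= max_ppm


def mz_window(theoretical_mz: float, max_ppm: float = 20.0) -> tuple[float, float]:
    """Absolute m/z interval that corresponds to +/- max_ppm."""
    delta = theoretical_mz * max_ppm * 1e-6
    return theoretical_mz - delta, theoretical_mz + delta
```

Because ppm is a relative measure, the absolute window grows with m/z: at m/z 500, a tolerance of 20 ppm corresponds to about 0.01 in either direction.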

SIRIUS expects MS and MS/MS spectra as input. It is possible to omit the MS data, but it will make the analysis more time-consuming and might give you worse results. In this case, you should consider limiting the candidate molecular formulas to those found in PubChem.

SIRIUS expects processed peak lists (centroided spectra). It does not contain routines for peak picking from profile-mode spectra. This is a deliberate design decision: We want you to use the best peak picking software out there or, alternatively, your favorite software. There are several tools specialized for this task, such as OpenMS, MZmine or XCMS. See our video tutorials on how to preprocess your data for SIRIUS with OpenMS or MZmine.

However, SIRIUS also contains a zero-parameter preprocessing tool that directly imports LC-MS runs in .mzML (or .mzXML) format to help you get started quickly. Most modern MS vendor software can export measured data from its native format to .mzML. Alternatively, see this video tutorial on how to use MSConvert/ProteoWizard to convert your vendor formats to .mzML for SIRIUS.

SIRIUS identifies the molecular formula of the measured precursor ion and annotates the spectrum by providing a molecular formula for each fragment peak. Peaks that receive no annotation are assumed to be noise peaks. Furthermore, a fragmentation tree is predicted; this tree contains the predicted fragmentation reactions leading to the fragment peaks.
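To give a feeling for the first step, finding candidate molecular formulas for a measured mass, here is a naive brute-force sketch. It is not the algorithm used by SIRIUS. It assumes a neutral monoisotopic mass (adducts are ignored), restricts the alphabet to C, H, N and O, and applies no chemical plausibility filter. The function name `decompose` and its parameters are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207,
                "N": 14.0030740048, "O": 15.99491461956}


def decompose(neutral_mass: float, ppm: float = 10.0) -> list[dict[str, int]]:
    """Enumerate CHNO formulas whose mass lies within `ppm` of neutral_mass."""
    tol = neutral_mass * ppm * 1e-6
    upper = neutral_mass + tol
    m = MONOISOTOPIC
    hits = []
    for c in range(int(upper // m["C"]) + 1):
        rest_c = upper - c * m["C"]
        for n in range(int(rest_c // m["N"]) + 1):
            rest_n = rest_c - n * m["N"]
            for o in range(int(rest_n // m["O"]) + 1):
                heavy = c * m["C"] + n * m["N"] + o * m["O"]
                # The tolerance is far smaller than one hydrogen mass, so the
                # nearest integer is the only hydrogen count that can match.
                h = round((neutral_mass - heavy) / m["H"])
                if h < 0:
                    continue
                mass = heavy + h * m["H"]
                if abs(mass - neutral_mass) <= tol:
                    hits.append({"C": c, "H": h, "N": n, "O": o})
    return hits
```

For example, `decompose(180.06339)` includes C6H12O6 among its results, typically alongside other formulas that fit the mass equally well. This ambiguity is why the isotope pattern and the fragmentation pattern are needed to rank the candidates.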

ZODIAC improves the ranking of the formula candidates provided by SIRIUS. It re-ranks the candidates by considering joint fragments and losses between fragmentation trees of different compounds in a data set.

CSI:FingerID identifies the structure of a compound by searching in a molecular structure database. Here and in the following, “structure” refers to the identity and connectivity (with bond multiplicities) of the atoms, but not to stereochemistry. Elucidation of stereochemistry is currently beyond the power of automated search engines.

The COSMIC confidence score assigns a confidence to CSI:FingerID structure identifications. The idea is similar to False Discovery Rates: It allows you to run CSI:FingerID in high-throughput mode on thousands of compounds and select the most confident identifications. The workflow of generating a structure database, searching with CSI:FingerID and ranking hits by confidence score is termed the COSMIC workflow. Make your data interpretation workflow easier by first identifying the most confident compounds in your sample, then using them to generate knowledge or hypotheses.

CANOPUS predicts compound classes from the molecular fingerprint predicted by CSI:FingerID without any database search involved. Hence, it provides structural information for compounds for which neither spectral nor structural reference data are available.

MSNovelist generates de novo structure candidates to help overcome the limits of structure database search. Structures are generated based on molecular formula and fingerprint.

SIRIUS ships with a Graphical User Interface (GUI), a Command Line Interface (CLI) and an API that comes with a client in Python.

All these interfaces share the same persistence layer. This allows for high-throughput computation using, for example, the CLI on a compute cluster, followed by manual inspection of selected results in the GUI.

Literature

The scientific development behind SIRIUS, ZODIAC, CSI:FingerID and CANOPUS required numerous man-years of PhD students, postdocs and principal investigators; an educated guess would be roughly 35 man-years. This estimate does not include building the shiny Graphical User Interface that was introduced in version 3.1. But it is not the user interface or software development that does the work here; it is our scientific research that made SIRIUS, ZODIAC, CSI:FingerID and CANOPUS possible. It is understood that the work of 15 years cannot be described in a single paper.

Please cite all papers that you feel are relevant for your work. Please do not cite this manual or the SIRIUS or CSI:FingerID website, but rather our scientific papers.

SIRIUS 4

CANOPUS – Compound Class Prediction

ZODIAC – molecular formula annotation

CSI:FingerID – Searching in molecular structure databases

COSMIC confidence score

Fragmentation Tree Computation

Isotope pattern analysis

Passatutto – Fragmentation tree based decoy spectra

Auto-detection of elements

Mass decomposition