GitHub - sirius-ms/comet: Modified SIRIUS version with additional features for barcode-free hit discovery from massive libraries

Our methods are offered to the scientific community as freely available resources. (Re-)distribution of the methods, in whole or in part, for commercial purposes is prohibited. CSI:FingerID and CANOPUS web services hosted by the Böcker group are for academic research and education use only. Please review the terms of service of the academic version for details. For non-academic users, the Bright Giant GmbH provides licenses and all related services. We ask that users of our tools cite the corresponding papers in any resulting publications.

COMET (Combinatorial Mass Encoding decoding Tool) is a Java-based software framework for the analysis of LC-MS/MS data obtained from an affinity selection-mass spectrometry (AS-MS) experiment in which self-encoded libraries were screened. The main use case is combinatorial libraries where all library compounds represent distinct combinations of predefined building blocks. Nevertheless, COMET can also be used with natural product libraries as screening libraries.

COMET is currently built on the SIRIUS platform, into which AS-MS-related features are integrated, e.g. excluding features (or MS/MS spectra) unrelated to compounds of the screened combinatorial library and ranking candidate structures using EPIMETHEUS.

The main developers of COMET are the Böcker group and the Bright Giant GmbH.

Documentation of SIRIUS

Online Documentation
Video tutorials
Book chapter on using SIRIUS 4 (Preprint) -- does not cover the new LC-MS/MS processing option
Demo data
Logos for publications and presentations

Installation and Dependencies

COMET is available for Windows (64bit), MacOS (64bit), and Linux (64bit) and can be installed via the following links:

for Windows (64bit): msi / zip
for Mac (64bit): pkg / zip
for Linux (64bit): zip

All (including previous) releases can be found here.

A typical installation should not take more than 10 min; this mostly depends on the speed of the internet connection used to download the installation files. The installation and functionality of COMET were successfully tested on Windows 10 (x64) and on Ubuntu 24.04.2 (x64).

Installation Instructions

For Windows and MacOS, the installer version of COMET (msi/pkg) should be preferred but might require administrator permissions. Since we do not pay Microsoft/Apple for certification, you might have to confirm that you want to trust "software from an unknown source" on Windows/MacOS when using the .msi/.pkg installers.

If you choose to download the .zip file corresponding to your operating system, you have to extract that .zip file into a directory where you have write permissions, e.g. C:\COMET. To start COMET, you have to execute sirius.exe in that folder. As COMET is currently using the SIRIUS platform, you can also check out the documentation for more details about the installation procedure.

Dependencies

All installation versions of COMET include the Java Runtime Environment (JRE), so there is no need to install Java separately. If you have screened your own combinatorial molecule library, you have to create a .csv file containing the building blocks of this library in order to use the COMET filter. A description of what such a .csv file looks like is given below. As we recommend using our own scripts to create such a .csv file, you need to have Python, Jupyter Notebook, and the Python packages rdkit and pandas installed.

Creating a user account

User accounts can be created directly via the COMET/SIRIUS GUI. Please use your institutional email address. SIRIUS web services are free for academic/non-commercial use. Usually, academic institutions are identified by their email domain and access will be granted automatically. In some cases, further validation of your academic/non-commercial status may be required. See also SIRIUS Documentation – Account and License.

Sources on GitHub

Changelog

How to use COMET

Demo Data

This Zenodo repository contains the LC-MS/MS data, the scripts, and other supplementary files used in (van der Nol et al., 2025). There, you can also find a file called demo_data.zip. This archive contains three files belonging to a 500-membered combinatorial molecule library where each molecule consists of a benzimidazole scaffold decorated with one amino acid building block, one amine building block, and one aldehyde building block:

ENL161_50uM_100fmol_SCE15-25_27112023.mzML is the LC-MS/MS data obtained by measuring the whole synthesized library via nanoLC-MS/MS
ENL161_CustomDB.tsv contains all structures of that library in the form of SMILES strings. Each structure has a unique id and name which represents its composition of building blocks.
ENL161_CustomDB_BBs.csv contains the building blocks for each position.

Since each class of building block (e.g. amino acids) only occurs at a predefined position in the scaffold and this position never changes in the entire library, unique indices can be assigned to these positions. For this molecule library, this could be the index 0 for the amino acids, 1 for the amines, and 2 for the aldehydes. In ENL161_CustomDB_BBs.csv, you will find that each building block is assigned such a position, which is called bb_pos. Additionally, this file contains for each building block its SMILES string (smiles), its corresponding molecular formula (formula), the formula of the loss when incorporated into the final molecule (reaction_loss), and an id specifying the exact building block in its class.

This is what ENL161_CustomDB_BBs.csv looks like:

bb_pos,smiles,formula,reaction_loss,id
0,OC([C@H](NC(OCC1c2c(c3c1cccc3)cccc2)=O)CC4CCCCC4)=O,C24H27NO4NH,C15O3H11,1
0,O=C([C@@H](NC(OCC1C2=CC=CC=C2C3=CC=CC=C13)=O)C)O,C18H17NO4NH,C15O3H11,2
0,OC([C@@H]1CSCN1C(OCC2c3c(c4c2cccc4)cccc3)=O)=O,C19H17NO4SNH,C15O3H11,3
...

For example, the building block 2 at position 0 has the SMILES string O=C([C@@H](NC(OCC1C2=CC=CC=C2C3=CC=CC=C13)=O)C)O and corresponding formula C18H17NO4NH before synthesis (not incorporated into the final molecule). When incorporated into the final molecule, its molecular formula changes to C3H7N2O. Therefore, C15O3H11 is the molecular formula which describes this loss.
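The arithmetic in this example can be checked with a few lines of Python. The following is a minimal sketch using only the standard library; it is not the notebook from the Zenodo repository, and the helper names (parse_formula, format_formula, incorporated_formula) are made up for illustration. It reads the three rows shown above and subtracts reaction_loss from formula for each building block.

```python
import csv
import io
import re
from collections import Counter

# The first rows of ENL161_CustomDB_BBs.csv, copied from the excerpt above.
BB_CSV = """bb_pos,smiles,formula,reaction_loss,id
0,OC([C@H](NC(OCC1c2c(c3c1cccc3)cccc2)=O)CC4CCCCC4)=O,C24H27NO4NH,C15O3H11,1
0,O=C([C@@H](NC(OCC1C2=CC=CC=C2C3=CC=CC=C13)=O)C)O,C18H17NO4NH,C15O3H11,2
0,OC([C@@H]1CSCN1C(OCC2c3c(c4c2cccc4)cccc3)=O)=O,C19H17NO4SNH,C15O3H11,3
"""


def parse_formula(formula):
    """Count atoms in a flat formula; repeated elements (e.g. 'NO4NH') are summed."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def format_formula(counts):
    """Write C first, then H, then all other elements alphabetically."""
    leading = [e for e in ("C", "H") if e in counts]
    others = sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in leading + others)


def incorporated_formula(formula, reaction_loss):
    """Formula of a building block after it has lost `reaction_loss`."""
    remaining = parse_formula(formula)
    remaining.subtract(parse_formula(reaction_loss))
    if any(n < 0 for n in remaining.values()):
        raise ValueError(f"{reaction_loss} is not contained in {formula}")
    return format_formula({e: n for e, n in remaining.items() if n > 0})


# Map (bb_pos, id) to the formula of the incorporated building block.
residues = {
    (int(row["bb_pos"]), int(row["id"])): incorporated_formula(row["formula"], row["reaction_loss"])
    for row in csv.DictReader(io.StringIO(BB_CSV))
}
print(residues[(0, 2)])  # C3H7N2O
```

For building block 2 at position 0, this reproduces the formula C3H7N2O stated above.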

To create such a CSV file for your own libraries, you can use the Jupyter Notebook COMET_Building blocks_input.ipynb which is also part of that Zenodo repository.

LC-MS/MS Data Import

Once you have started the COMET GUI (by executing sirius.exe), you can import your measured MS/MS data via the "Import" button or by drag and drop onto the leftmost panel. COMET/SIRIUS supports multiple MS data formats:

.ms, .mgf, .mat/.msp, and Agilent’s .cef: These formats contain pre-processed peak lists for each feature.
.mzml, and .mzxml: For these formats, feature detection and alignment will be performed.

Note that all data must be centroided and that raw file formats are not supported. For more information about the import of MS data, see here.

Example: Let's import ENL161_50uM_100fmol_SCE15-25_27112023.mzML via the "Import" button or drag-and-drop. A small window with a progress bar opens, showing the current step of the preprocessing workflow. This takes about 5 min on a laptop with an 11th Gen Intel(R) Core(TM) i7-11859H processor and 48 GB of RAM. After the import, the detected features are shown in the leftmost panel. At the bottom of this panel, you will find three numbers: "0 of 4865 (24619) selected". The number in parentheses is the total number of detected features, which is 24,619. The number 4,865 is the number of features that pass the default filter settings (each feature has at least one MS/MS scan and at least a decent feature quality). The first number (here 0) is the number of currently selected features. You can select any subset of features that you want to analyze.

Feature Filtering with COMET

To use COMET's feature filtering, click on the button with the three dots "..." next to the search text field ("Type and hit enter to search"). A panel called "Filter configuration" opens. Go to the "COMET" tab. Here, you first have to specify the architecture of your combinatorial molecule library. You do this by selecting the CSV file containing all the building blocks (described in section "Demo Data") and entering the molecular formula of the scaffold when incorporated in the final molecule. For the 500-membered benzimidazole library, this would be the file path to ENL161_CustomDB_BBs.csv and C8H3N2O as the molecular formula of the scaffold. Additionally, you have the option to set the MS1 mass accuracy and fallback adducts. Let's keep the default parameters of 10 ppm and [M + H]+. When pressing "Apply", the first filter is applied; i.e. all features whose precursor mass doesn't match any compound of the library are filtered out. This takes about 30 sec and leaves 806 features. Note: when pressing "Apply", it's normal that the window freezes for a moment because there is currently no progress bar showing that something is being computed.
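To illustrate the idea behind this first filter, here is a minimal Python sketch; it is not COMET's actual implementation. It assumes that the formula of a library compound is the scaffold formula plus one incorporated building block per position, and that a feature is kept if its precursor m/z lies within the ppm tolerance of the [M + H]+ m/z of any library compound. The residues at positions 1 and 2 are hypothetical placeholders, and the atomic masses are rounded literature values.

```python
import re
from collections import Counter
from itertools import product

# Monoisotopic masses in Da (rounded) and the proton mass for [M + H]+.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "S": 31.97207100}
PROTON_MASS = 1.00727647


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C8H3N2O'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def protonated_mz(counts):
    """m/z of the singly charged [M + H]+ ion."""
    return sum(MONOISOTOPIC[e] * n for e, n in counts.items()) + PROTON_MASS


def library_mzs(scaffold, residues_by_position):
    """Enumerate every combination of one residue per position."""
    mzs = []
    for combination in product(*residues_by_position):
        total = parse_formula(scaffold)
        for residue in combination:
            total.update(parse_formula(residue))  # Counter.update adds counts
        mzs.append(protonated_mz(total))
    return mzs


def matches_library(precursor_mz, mzs, ppm=10.0):
    """True if the precursor is within `ppm` of any library m/z."""
    return any(abs(precursor_mz - mz) <= mz * ppm * 1e-6 for mz in mzs)


# Scaffold formula from the text; C3H7N2O is the incorporated amino acid from
# the Demo Data section; the other two residues are hypothetical examples.
mzs = library_mzs("C8H3N2O", [["C3H7N2O"], ["C2H6N"], ["C7H5"]])
print(matches_library(mzs[0] * (1 + 5e-6), mzs))   # 5 ppm off: kept
print(matches_library(mzs[0] * (1 + 20e-6), mzs))  # 20 ppm off: filtered out
```

Fallback adducts other than [M + H]+ are left out of this sketch for brevity.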

To filter the features according to their fragmentation pattern and retain only those with a fragmentation pattern characteristic of the library molecules (i.e. fragmentation by cleaving off building blocks), you have to open the COMET filtering panel again. Here, you have to check "Enable peak matching filter". Now, you can specify which fragments you expect; i.e. the fragments which are more likely to form by cleaving off single building blocks. If you leave this text field empty, all such fragments will be considered.
Let's specify 0,1,S[0;2],S[1;2] for the benzimidazole library from the example. Here, 0 specifies the single amino acid building block, 1 specifies the single amine, S[0;2] specifies the scaffold plus amino acid and aldehyde, and S[1;2] is the scaffold plus amine and aldehyde. See (van der Nol et al., 2025) for a more detailed description of how to specify fragments in the COMET filtering panel.
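To make the structure of this specification string explicit, the following Python sketch tokenizes it. It only covers the two token shapes that appear in the example (a single position index, and S[...] with positions separated by semicolons); the complete syntax is described in (van der Nol et al., 2025), and the function name and output format here are made up for illustration.

```python
import re


def parse_fragment_spec(spec):
    """Turn e.g. '0,1,S[0;2],S[1;2]' into a list of (kind, positions) tuples.

    kind is 'building_block' for a single building block and 'scaffold' for
    the scaffold plus the building blocks at the listed positions.
    An empty string yields an empty list (no restriction on fragments).
    """
    if not spec.strip():
        return []
    fragments = []
    for token in spec.split(","):
        token = token.strip()
        scaffold = re.fullmatch(r"S\[(\d+(?:;\d+)*)\]", token)
        if scaffold:
            positions = tuple(int(p) for p in scaffold.group(1).split(";"))
            fragments.append(("scaffold", positions))
        elif token.isdigit():
            fragments.append(("building_block", (int(token),)))
        else:
            raise ValueError(f"unrecognized fragment token: {token!r}")
    return fragments


for kind, positions in parse_fragment_spec("0,1,S[0;2],S[1;2]"):
    print(kind, positions)
```

For the benzimidazole library, ('scaffold', (0, 2)) then stands for the scaffold plus amino acid and aldehyde.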
In addition to the fragment specification, you can choose the minimum number of peaks which should be explainable by any of the specified fragments. You also have to provide how many of the highest intensity peaks should be considered per spectrum. Let's say that the 5 highest intensity peaks should be considered and at least 2 of those peaks should be explainable by at least one candidate's building block fragments.
As hydrogen rearrangements can occur during collision-induced dissociation (CID), you can also specify the maximum number of hydrogen atom masses by which a theoretical fragment may deviate from its fragment peak. Let's allow 2 hydrogen shifts and change the MS2 mass accuracy to 5 ppm, as the MS data was measured on an Orbitrap.
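The interplay of these parameters can be sketched in Python as follows. This is an illustration under simplifying assumptions, not COMET's implementation: the theoretical fragment m/z values of one candidate are assumed to be given already, a peak counts as explained if it lies within the ppm tolerance of a theoretical fragment shifted by up to max_h_shifts hydrogen masses in either direction, and all names are hypothetical.

```python
HYDROGEN_MASS = 1.00782503  # Da, rounded


def passes_peak_filter(peaks, fragment_mzs, top_k=5, min_explained=2,
                       max_h_shifts=2, ppm=5.0):
    """Check one spectrum against the theoretical fragments of one candidate.

    peaks: list of (mz, intensity) tuples.
    fragment_mzs: theoretical fragment m/z values of the candidate.
    """
    # Only the top_k most intense peaks are considered.
    top_peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_k]
    explained = 0
    for mz, _intensity in top_peaks:
        tolerance = mz * ppm * 1e-6
        if any(
            abs(mz - (fragment + shift * HYDROGEN_MASS)) <= tolerance
            for fragment in fragment_mzs
            for shift in range(-max_h_shifts, max_h_shifts + 1)
        ):
            explained += 1
    return explained >= min_explained


# Toy data: one peak matches a fragment directly, one only after a shift of
# two hydrogen masses; the remaining peaks are unexplained.
fragments = [100.0, 200.0]
spectrum = [(100.0, 50), (200.0 + 2 * HYDROGEN_MASS, 40), (300.0, 90),
            (150.0, 10), (160.0, 9), (170.0, 8)]
print(passes_peak_filter(spectrum, fragments))                  # True
print(passes_peak_filter(spectrum, fragments, max_h_shifts=0))  # False
```

Under these assumptions, a feature would be retained if at least one candidate structure passes this check.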
If you specify an output location, a CSV file with information about the filtering will be stored. This CSV file contains all features that passed the filter and tells you which fragment peaks can be explained by which building block fragments of which candidate structure. If you do not specify an output location, this information will be printed to the console.
Applying the filter with these settings leaves 285 features. The filtering takes about 5 min on a laptop with an 11th Gen Intel(R) Core(TM) i7-11859H processor and 48 GB of RAM. Note again that the window will appear to freeze; in reality, the filter is being computed, which takes some minutes.
For more information, see (van der Nol et al., 2025).

Structure Annotation

Creating a custom database with all library molecules

As we already know that the measured structures are a subset of all library compounds, we only want to annotate MS/MS spectra or features with structures of that library; i.e. we want to search in that library for potential candidates. Here, you will find a description on how to create a custom database. Regarding the example library from above, we would open the Databases dialog, click on the Create custom Database button, enter the name ENL161 as the database name in COMET, and specify the location of that custom database file (called enl161.siriusdb). Then, we would add the TSV file ENL161_CustomDB.tsv and press the Import structures and spectra button. After creation, this new custom database enl161 is shown together with the information of having 500 compounds with 455 different molecular formulas.

Compute Panel

To perform the structure annotation, you have to open the Compute dialog. Here, you have to activate the molecular formula identification with SIRIUS by clicking on the SIRIUS button. It's important to use the option Database search in the drop-down list for Molecular formula generation and to define your custom database as the only database to search in, e.g. ENL161 from the example above.

Furthermore, you have to click on the Predict button for the fingerprint prediction and the Search DBs button for the structure annotation. For the latter, we recommend using the option Rank with EPIMETHEUS in the case of combinatorial molecule libraries. Again, you should only use your custom database as the search database, e.g. ENL161 from the example above.
Now press the Compute button, and the MS/MS spectra or features will be annotated with their corresponding candidate structures. These candidate structures are ranked according to their assigned score.

The computation for the remaining 285 features takes about 10 min on a laptop with an 11th Gen Intel(R) Core(TM) i7-11859H processor and 48 GB of RAM.

Integration of CSI:FingerID, CANOPUS and MSNovelist

Fragmentation trees and spectra can be directly uploaded from SIRIUS to the CSI:FingerID, CANOPUS and MSNovelist web services. Results are retrieved from the web service and can be displayed in the SIRIUS graphical user interface. This functionality is also available for the SIRIUS command-line tool. Training structures for CSI:FingerID's predictors are available through the CSI:FingerID web API:

https://www.csi-fingerid.uni-jena.de/v3.0/api/fingerid/trainingstructures?predictor=1 (training structures for positive ion mode)
https://www.csi-fingerid.uni-jena.de/v3.0/api/fingerid/trainingstructures?predictor=2 (training structures for negative ion mode)
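If you want to fetch these lists from a script, the two URLs above can be built and downloaded with the Python standard library. This is a minimal sketch: the function names are made up, the format of the response is not described here, so the body is simply written to a file unchanged, and whether the endpoint is reachable depends on the web service.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.csi-fingerid.uni-jena.de/v3.0/api/fingerid/trainingstructures"
# Predictor ids as listed above: 1 = positive ion mode, 2 = negative ion mode.
PREDICTORS = {"positive": 1, "negative": 2}


def training_structures_url(ion_mode):
    """Build the URL for the given ion mode ('positive' or 'negative')."""
    return f"{BASE_URL}?{urlencode({'predictor': PREDICTORS[ion_mode]})}"


def download_training_structures(ion_mode, path, timeout=60):
    """Save the raw response body to `path` without interpreting it."""
    with urlopen(training_structures_url(ion_mode), timeout=timeout) as response:
        with open(path, "wb") as out:
            out.write(response.read())


print(training_structures_url("positive"))
# To download, call e.g.:
# download_training_structures("positive", "training_structures_positive.txt")
```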

Fragmentation Tree Computation

The manual interpretation of tandem mass spectra is time-consuming and non-trivial. SIRIUS analyses the fragmentation pattern resulting in a hypothetical fragmentation tree, in which nodes are annotated with molecular formulas of the fragments and arcs (edges) represent fragmentation events (losses). SIRIUS allows for the automated and high-throughput analysis of small-compound MS data beyond elemental composition without requiring compound structures or a mass spectral database.

Isotope Pattern Analysis

SIRIUS deduces molecular formulas of small compounds by ranking isotope patterns from high-resolution mass spectra. After preprocessing, the output of a mass spectrometer is a list of peaks which correspond to the masses of the sample molecules and their abundance. In principle, elemental compositions of small molecules can be identified using only accurate masses. However, even with very high mass accuracy, many formulas are obtained in higher mass regions. High-resolution mass spectrometry allows us to determine the isotope pattern of a sample molecule with outstanding accuracy and to apply this information to identify the elemental composition of the sample molecule. SIRIUS can be downloaded either as a graphical user interface (see Sirius GUI) or as a command-line tool.

Main citations

Main citations for COMET related features

Edith van der Nol, Nils Alexander Haupt, Qing Qing Gao, Benthe A.M. Smit, Martin Andre Hoffmann, Martin Engler-Lukajewski, Marcus Ludwig, Sean McKenna, J. Miguel Mata, Olivier Bequignon, Gerard van Westen, Tiemen J. Wendel, Sylvie M. Noordermeer, Sebastian Böcker, and Sebastian Pomplun. Barcode-free hit discovery from massive libraries enabled by automated small molecule structure annotation. ChemRxiv, 2025.

Main citations for SIRIUS related features

Kai Dührkop, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker, SIRIUS 4: Turning tandem mass spectra into metabolite structure information. Nature Methods 16, 299–302, 2019.

Michael A. Stravs, Kai Dührkop, Sebastian Böcker, and Nicola Zamboni. MSNovelist: de novo structure generation from mass spectra. Nature Methods 19, 865–870, 2022. (Cite if you are using MSNovelist)

Martin A. Hoffmann, Louis-Félix Nothias, Marcus Ludwig, Markus Fleischauer, Emily C. Gentry, Michael Witting, Pieter C. Dorrestein, Kai Dührkop, and Sebastian Böcker. High-confidence structural annotation of metabolites absent from spectral libraries. Nature Biotechnology 40, 411–421, 2022. (Cite if you are using CSI:FingerID, COSMIC)

Kai Dührkop, Louis-Félix Nothias, Markus Fleischauer, Raphael Reher, Marcus Ludwig, Martin A. Hoffmann, Daniel Petras, William H. Gerwick, Juho Rousu, Pieter C. Dorrestein and Sebastian Böcker. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nature Biotechnology, 2020. (Cite if you are using CANOPUS)

Yannick Djoumbou Feunang, Roman Eisner, Craig Knox, Leonid Chepelev, Janna Hastings, Gareth Owen, Eoin Fahy, Christoph Steinbeck, Shankar Subramanian, Evan Bolton, Russell Greiner, David S. Wishart. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. Journal of Cheminformatics 8, 61, 2016. (ClassyFire publication; cite this if you are using CANOPUS)

Marcus Ludwig, Louis-Félix Nothias, Kai Dührkop, Irina Koester, Markus Fleischauer, Martin A. Hoffmann, Daniel Petras, Fernando Vargas, Mustafa Morsy, Lihini Aluwihare, Pieter C. Dorrestein, Sebastian Böcker. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nature Machine Intelligence 2, 629–641, 2020. (Cite if you are using ZODIAC)

Kai Dührkop and Sebastian Böcker. Fragmentation trees reloaded. Journal of Cheminformatics 8, 5, 2016. (Cite this for fragmentation pattern analysis and fragmentation tree computation)

Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu, and Sebastian Böcker. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proceedings of the National Academy of Sciences U S A 112(41), 12580-12585, 2015. (cite this when using CSI:FingerID)

Sebastian Böcker, Matthias C. Letzel, Zsuzsanna Lipták and Anton Pervukhin. SIRIUS: decomposing isotope patterns for metabolite identification. Bioinformatics 25(2), 218-224, 2009. (Cite this for isotope pattern analysis)

Additional citations

Marcus Ludwig, Kai Dührkop and Sebastian Böcker. Bayesian networks for mass spectrometric metabolite identification via molecular fingerprints. Bioinformatics, 34(13): i333-i340. 2018. Proc. of Intelligent Systems for Molecular Biology (ISMB 2018). (Cite for CSI:FingerID Scoring)

W. Timothy J. White, Stephan Beyer, Kai Dührkop, Markus Chimani and Sebastian Böcker. Speedy Colorful Subtrees. In Proc. of Computing and Combinatorics Conference (COCOON 2015), volume 9198 of Lect Notes Comput Sci, pages 310-322. Springer, Berlin, 2015. (cite this on why computations are swift, even on a laptop computer)

Huibin Shen, Kai Dührkop, Sebastian Böcker and Juho Rousu. Metabolite Identification through Multiple Kernel Learning on Fragmentation Trees. Bioinformatics, 30(12):i157-i164, 2014. Proc. of Intelligent Systems for Molecular Biology (ISMB 2014). (Introduces the machinery behind CSI:FingerID)

Imran Rauf, Florian Rasche, François Nicolas and Sebastian Böcker. Finding Maximum Colorful Subtrees in practice. J Comput Biol, 20(4):1-11, 2013. (More, earlier work on why computations are swift today)

Markus Heinonen, Huibin Shen, Nicola Zamboni and Juho Rousu. Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics, 28(18):2333-2341, 2012. (Introduces the idea of predicting molecular fingerprints from tandem MS data)

Florian Rasche, Aleš Svatoš, Ravi Kumar Maddula, Christoph Böttcher, and Sebastian Böcker. Computing Fragmentation Trees from Tandem Mass Spectrometry Data. Analytical Chemistry (2011) 83 (4): 1243–1251. (Cite this for introduction of fragmentation trees as used by SIRIUS)

Sebastian Böcker and Florian Rasche. Towards de novo identification of metabolites by analyzing tandem mass spectra. Bioinformatics (2008) 24 (16): i49-i55. (The very first paper to mention fragmentation trees as used by SIRIUS)

License

Starting with version 4.4.27, SIRIUS is licensed under the GNU Affero General Public License (AGPL). If you integrate SIRIUS into other software, we strongly encourage you to make the usage of SIRIUS as well as the literature to cite transparent to the user.

Acknowledgements

Thanks for supporting the development of SIRIUS!
