A frequent problem with scientific research software is the lack of support, maintenance and further development. In particular, development by a single researcher can easily result in orphaned software packages, especially when combined with poor documentation or a lack of adherence to open software development standards. The RforMassSpectrometry initiative aims to develop an efficient and stable infrastructure for mass spectrometry (MS) data analysis. As part of this initiative, a growing ecosystem of R software packages is being developed, covering different aspects of metabolomics and proteomics data analysis. To avoid the aforementioned problems, community contributions are fostered, and open development, documentation and long-term support are emphasized. At the heart of the package ecosystem is the Spectra package, which provides the core infrastructure to handle and analyze MS data. Its design allows easy expansion to support additional file or data formats, including data representations with a minimal memory footprint or remote data access. The xcms package for LC-MS data preprocessing was updated to reuse this infrastructure, which now also enables the analysis of very large or remote data sets. This integration also simplifies complete analysis workflows, which can include the MsFeatures package for compounding and the MetaboAnnotation package for annotation of untargeted metabolomics experiments. Public annotation resources can be easily accessed through packages such as MsBackendMassbank, MsBackendMgf, MsBackendMsp or CompoundDb; the last of these also supports creating and managing lab-specific compound databases. Finally, the MsCoreUtils and MetaboCoreUtils packages provide efficient implementations of commonly used algorithms, designed to be reused in other R packages. Ultimately, and in contrast to a monolithic software design, the package ecosystem makes it possible to build customized, modular, and reproducible analysis workflows.
Future development will focus on improved data structures and analysis methods for chromatographic data, and on better interoperability with other open-source software, including direct integration with Python MS libraries.

Rainer, J.; Louail, P.; Vicini, A.; Gine, R.; Badia, J.; Stravs, M.; Garcia Aloy, M.; Huber, C.; Salzer, L.; Stanstrup, J.; Shahaf, N.; Panse, C.; Naake, T.; Kumler, W.; Vangeenderhuysen, P.; Brunius, C.; Hecht, H.; Neumann, S.; Witting, M.; Gibb, S.; Gatto, L. (2024). An open software development-based ecosystem of R packages for metabolomics data analysis. In: Metabolomics 2024: 20th Annual Conference of the Metabolomics Society, Osaka, Japan, 20-24 June 2024. url: https://zenodo.org/records/11370345 handle: https://hdl.handle.net/10449/85696

An open software development-based ecosystem of R packages for metabolomics data analysis

Garcia Aloy, M.
2024-01-01

Metabolomics
Mass Spectrometry
Open Software Development
Annotation
2024
Files in this item:
2024 Garcia Aloy - Osaka.pdf (open access)
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 1.34 MB
Format: Adobe PDF
