The development and validation of innovative approaches for biomarker selection are of paramount importance in many -omics technologies. Unfortunately, testing new methods on real data is difficult because, in real data sets, one can never be sure which are the “true” biomarkers. In this paper, we present a publicly available metabolomic ultra performance liquid chromatography–mass spectrometry (UPLC-MS) spike-in data set for apples. The data set consists of 10 control samples and three spiked sets of the same size, to which naturally occurring compounds are added at different concentrations. The data set can therefore serve as a test bed for assessing the performance of new algorithms and comparing them with previously published results. We illustrate some of the possibilities provided by this spike-in data set by comparing the performance of two popular biomarker-selection methods: the univariate t-test and the multivariate variable importance in projection (VIP). To promote widespread use of the data, raw data files as well as preprocessed peak lists are made available.
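The idea of a spike-in benchmark, where the "true" biomarkers are known in advance, can be illustrated with a minimal sketch. The code below does not use the apple data set; it simulates 10 control and 10 spiked samples with Gaussian noise, and the feature count, spiked indices, shift size, and all function names are assumptions made purely for illustration. It ranks features by the absolute Welch t statistic and counts how many known spiked features appear among the top-ranked ones. The multivariate VIP comparison is not shown.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (b minus a)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


def simulate(spiked, n_features=50, n_per_group=10, shift=10.0, seed=0):
    """Simulate intensities (hypothetical): only spiked features are shifted."""
    rng = random.Random(seed)
    control = [
        [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        for _ in range(n_features)
    ]
    treated = [
        [rng.gauss(shift if j in spiked else 0.0, 1.0) for _ in range(n_per_group)]
        for j in range(n_features)
    ]
    return control, treated


def top_k_by_t(control, treated, k):
    """Return the indices of the k features with the largest |t|."""
    scores = [abs(welch_t(c, t)) for c, t in zip(control, treated)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


# Ground truth is known by construction, as in a spike-in experiment.
spiked = {3, 11, 27, 40, 46}
control, treated = simulate(spiked)
selected = top_k_by_t(control, treated, k=len(spiked))
true_positives = len(selected & spiked)
print(f"recovered {true_positives} of {len(spiked)} spiked features")
```

Because the ground truth is fixed by construction, the same scoring can be repeated for any other selection method and the true-positive counts compared directly. The shift here is deliberately large; smaller shifts would make the comparison more informative.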

Franceschi, P.; Masuero, D.; Vrhovsek, U.; Mattivi, F.; Wehrens, H.R.M.J. (2012). A benchmark spike-in data set for biomarker identification in metabolomics. JOURNAL OF CHEMOMETRICS, 26 (1): 16-24. doi: 10.1002/cem.1420 handle: http://hdl.handle.net/10449/20730

A benchmark spike-in data set for biomarker identification in metabolomics

Franceschi, Pietro; Masuero, Domenico; Vrhovsek, Urska; Mattivi, Fulvio; Wehrens, Herman Ronald Maria Johan
2012-01-01

Metabolomics
Biomarker Selection
Biostatistics
Spike-in data
Benchmark
Sector CHIM/01 - Analytical Chemistry
Use this identifier to cite or link to this document: https://hdl.handle.net/10449/20730