The development and the validation of innovative approaches for biomarker selection are of paramount importance in many -omics technologies. Unfortunately, the actual testing of new methods on real data is difficult, because in real data sets, one can never be sure about the “true” biomarkers. In this paper, we present a publicly available metabolomic ultra performance liquid chromatography–mass spectrometry spike-in data set for apples. The data set consists of 10 control samples and three spiked sets of the same size, where naturally occurring compounds are added in different concentrations. In this sense, the data set can serve as a test bed to assess the performance of new algorithms and compare them with previously published results. We illustrate some of the possibilities provided by this spike-in data set by comparing the performance of two popular biomarker-selection methods, the univariate t-test and the multivariate variable importance in projection. To promote a widespread use of the data, raw data files as well as preprocessed peak lists are made available.
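The idea of scoring a selection method against known spiked compounds can be pictured with a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data are synthetic Gaussian noise rather than the apple data set, the feature count, spiked indices, and effect size are arbitrary, the function names (`welch_t`, `simulate`, `rank_features`) are hypothetical, and Welch's unequal-variance statistic is used, which may differ from the t-test variant applied in the paper.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def simulate(n_samples=10, n_features=200, spiked=(3, 17, 42, 99),
             shift=6.0, seed=0):
    """Return (control, treated) matrices; only `spiked` columns are shifted."""
    rng = random.Random(seed)
    control = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
               for _ in range(n_samples)]
    treated = [[rng.gauss(shift if j in spiked else 0.0, 1.0)
                for j in range(n_features)]
               for _ in range(n_samples)]
    return control, treated


def rank_features(control, treated):
    """Rank feature indices by decreasing absolute t statistic."""
    scores = []
    for j in range(len(control[0])):
        t = welch_t([row[j] for row in treated], [row[j] for row in control])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Hypothetical "ground truth": the indices of the simulated spiked features.
spiked = (3, 17, 42, 99)
control, treated = simulate(spiked=spiked)
ranking = rank_features(control, treated)

# Because the truth is known, the selection can be scored directly.
top = set(ranking[:len(spiked)])
recovered = len(top & set(spiked))
print(f"{recovered} of {len(spiked)} spiked features rank in the top {len(spiked)}")
```

Because the spiked features are known in advance, any ranking method can be scored the same way, which is the role the abstract assigns to the real spike-in data.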
Franceschi, P.; Masuero, D.; Vrhovsek, U.; Mattivi, F.; Wehrens, H.R.M.J. (2012). A benchmark spike-in data set for biomarker identification in metabolomics. Journal of Chemometrics, 26 (1): 16-24. doi: 10.1002/cem.1420. Handle: http://hdl.handle.net/10449/20730
A benchmark spike-in data set for biomarker identification in metabolomics
Franceschi, Pietro; Masuero, Domenico; Vrhovsek, Urska; Mattivi, Fulvio; Wehrens, Herman Ronald Maria Johan
2012-01-01
File | Size | Format
---|---|---
2012 JoC Franceschi et al.pdf (not available; license: all rights reserved) | 1.68 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.