Meta-statistics for biomarker selection in the omics sciences

Wehrens, H.R.M.J.; Franceschi, P.

Background. Biomarker selection, i.e., identifying which variables are important in statistical regression or discrimination models, is an increasingly important topic in the omics sciences. Data from these fields are typically characterized by a small number of samples but a large number of variables, and a meaningful biological interpretation is often possible only when considering the most important variables.

Methods. In this context, statistical tests like the t-test will lead to many false positives, while multiple testing corrections tend to lose much power and select only very few variables. In addition, the cutoff value (usually set to a value like 5%) is often chosen in a haphazard way. We present two meta-statistics to tackle the problem of variable selection: higher criticism thresholding [1,2] and stability selection [3,4]. Higher criticism thresholding, applicable in a two-class discrimination setting, sets suitable cutoff levels for significance based on the data at hand. The underlying mechanism has been described as the “z-score of the p-value” [1]. The current work extends higher criticism to multivariate methods like PLSDA and the VIP statistics [4]. Stability selection is a novel variable selection method that assesses the stability of biomarker selections under perturbations of the data. The concept is extremely general and robust and can be applied in both regression and discrimination cases: primary selection methods assessed in this work include PLS and lasso models.

Results. Simulated as well as experimental data show very good results for both stability selection and higher criticism. The experimental data in this study consist of LC-MS metabolomics data of spiked-in apple extracts [5]; such spike-in data are extremely important in assessing the value of biomarker selection methods but are rarely available. Good results are also obtained in other areas of science [1-3].
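To make the "z-score of the p-value" idea concrete, the following is a minimal sketch of higher criticism thresholding in plain Python. It is not the BioMark implementation. The function name `hc_threshold`, the search fraction `alpha0 = 0.10`, and the particular form of the objective (published variants differ in the denominator) are assumptions made for illustration only.

```python
import math


def hc_threshold(pvalues, alpha0=0.10):
    """Pick a data-driven p-value cutoff by maximizing a higher criticism score.

    Illustrative sketch only: alpha0 and the form of the objective are
    assumptions, not values taken from the abstract.
    Returns (cutoff p-value, sorted indices of the selected variables).
    """
    n = len(pvalues)
    if n < 2 or not 0.0 < alpha0 < 1.0:
        raise ValueError("need at least two p-values and 0 < alpha0 < 1")

    # Variable indices ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda j: pvalues[j])

    # Only the smallest alpha0 fraction of the p-values is searched.
    i_max = max(1, int(alpha0 * n))

    best_i, best_hc = 1, -math.inf
    for i in range(1, i_max + 1):
        frac = i / n               # observed fraction of p-values <= p_(i)
        p = pvalues[order[i - 1]]  # fraction expected under the null
        # Standardized excess of small p-values at this position.
        hc = math.sqrt(n) * (frac - p) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_i, best_hc = i, hc

    cutoff = pvalues[order[best_i - 1]]
    return cutoff, sorted(order[:best_i])
```

Called on a list of per-variable p-values (for example from two-class t-tests), the function returns the cutoff and the indices of all variables whose p-value does not exceed it, so the significance level is derived from the data rather than fixed in advance.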
The advantages of stability selection include broad applicability (regression, discrimination) and modest computational demands; on the other hand, the number of samples required is relatively high. For discrimination problems with fewer than, say, eight samples per class, it is probably better to rely on the higher criticism approach. Both higher criticism and stability selection have been implemented in an R package, BioMark, available from the CRAN repository, which also contains the experimental spike-in data.
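The principle behind stability selection, rerunning a primary selector on perturbed data and keeping the variables that are chosen consistently, can be sketched in a few lines of plain Python. This is not the BioMark implementation. The primary selector here is a simple top-k ranking by absolute Welch t statistic, used as a stand-in for the PLS and lasso models mentioned in the abstract. All names and defaults (`k`, `n_rounds`, `fraction`, `min_freq`) are hypothetical.

```python
import math
import random
import statistics


def t_stat(a, b):
    """Welch t statistic for two lists of values."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def top_k(class_a, class_b, k):
    """Stand-in primary selector: the k variables with the largest |t|."""
    n_vars = len(class_a[0])
    scores = [
        abs(t_stat([row[j] for row in class_a], [row[j] for row in class_b]))
        for j in range(n_vars)
    ]
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]


def stability_selection(class_a, class_b, k=5, n_rounds=100,
                        fraction=0.5, min_freq=0.8, seed=0):
    """Return (stable variable indices, selection frequency per variable).

    class_a and class_b are lists of samples (rows), each a list of
    variable values. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = [0] * len(class_a[0])
    # Subsample sizes per class; at least two samples so a variance exists.
    size_a = min(len(class_a), max(2, round(fraction * len(class_a))))
    size_b = min(len(class_b), max(2, round(fraction * len(class_b))))

    for _ in range(n_rounds):
        # Perturb the data by drawing a random subsample from each class.
        sub_a = rng.sample(class_a, size_a)
        sub_b = rng.sample(class_b, size_b)
        for j in top_k(sub_a, sub_b, k):
            counts[j] += 1

    freqs = [c / n_rounds for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= min_freq]
    return stable, freqs
```

Because every round discards part of each class, the selection frequencies become unreliable when only a handful of samples per class are available, which is consistent with the sample-size caveat stated above.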

Wehrens, H.R.M.J.; Franceschi, P. (2012). Meta-statistics for biomarker selection in the omics sciences. In: 4th StatSeq Workshop - Verona, Italy, 18-19 April 2012: 26. url: http://ddlab.sci.univr.it/statseq/booklet.pdf handle: http://hdl.handle.net/10449/22015

Wehrens, Herman Ronald Maria Johan; Franceschi, Pietro

2012-01-01

Date of issue: 2012
Appears in categories: 4.02 Abstract in conference proceedings

Files in this item:

File	Access	License	Size	Format
2012 COST VR 1.pdf	Open access	All rights reserved	112.38 kB	Adobe PDF
6708.pdf	Open access	All rights reserved	6.76 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10449/22015
