CINECA IRIS Institutional Research Information System

Meta-statistics for variable selection: the R package BioMark

Wehrens, Herman Ronald Maria Johan; Franceschi, Pietro

2012-01-01

Abstract

Biomarker identification is an ever more important topic in the life sciences. With the advent of measurement methodologies based on microarrays and mass spectrometry, thousands of variables are routinely being measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for this often leads to a lack of power. In addition, choosing alpha cutoff levels remains somewhat arbitrary. Also in a regression context, a model depending on few but relevant variables will be more accurate and precise, and easier to interpret biologically. We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, presents a data-dependent selection threshold for significance, instead of a cookbook value of alpha = 0.05. It is applicable in all cases where two groups are compared. The second, stability selection, is more general, and can also be applied in a regression context. This approach uses repeated subsampling of the data in order to assess the variability of the model coefficients and selects those that remain consistently important. It is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
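
The subsampling idea behind stability selection, as described in the abstract, can be illustrated with a small sketch. This is written in Python for illustration only and is not the BioMark implementation or its API: the function names (`t_statistic`, `stability_selection`, `simulate`), the use of a Welch-type t statistic as the variable score, and the parameters `n_top`, `n_subsamples`, `fraction` and `min_present` are all assumptions made for this example. The simulated data are synthetic and stand in for a spike-in experiment.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Welch-type t statistic for two lists of measurements."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def stability_selection(group_a, group_b, n_top=5, n_subsamples=100,
                        fraction=0.7, min_present=0.8, seed=0):
    """Return indices of variables that rank among the `n_top` largest |t|
    values in at least a fraction `min_present` of the random subsamples.

    group_a, group_b: lists of samples; each sample is a list of variables.
    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    n_vars = len(group_a[0])
    size_a = max(2, int(fraction * len(group_a)))
    size_b = max(2, int(fraction * len(group_b)))
    counts = [0] * n_vars

    for _ in range(n_subsamples):
        # Draw a subsample (without replacement) from each group.
        sub_a = rng.sample(group_a, size_a)
        sub_b = rng.sample(group_b, size_b)
        # Score every variable on this subsample.
        scores = [
            abs(t_statistic([s[j] for s in sub_a], [s[j] for s in sub_b]))
            for j in range(n_vars)
        ]
        # Record which variables are in the top n_top for this subsample.
        top = sorted(range(n_vars), key=lambda j: scores[j],
                     reverse=True)[:n_top]
        for j in top:
            counts[j] += 1

    # Keep only variables that were selected consistently.
    return [j for j in range(n_vars)
            if counts[j] / n_subsamples >= min_present]


def simulate(n_samples=20, n_vars=50, spiked=(0, 1, 2), shift=3.0, seed=1):
    """Synthetic two-group data: Gaussian noise, with `spiked` variables
    shifted by `shift` standard deviations in the second group."""
    rng = random.Random(seed)
    group_a = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)]
               for _ in range(n_samples)]
    group_b = [[rng.gauss(shift if j in spiked else 0.0, 1.0)
                for j in range(n_vars)]
               for _ in range(n_samples)]
    return group_a, group_b


group_a, group_b = simulate()
selected = stability_selection(group_a, group_b, n_top=5)
print("consistently selected variables:", selected)
```

In this sketch the threshold `min_present` plays the role of "remaining consistently important": a variable that scores highly only in a few subsamples is discarded, whereas the strongly shifted variables are expected to be retained.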

Keywords: Biomarkers; Higher criticism; Stability selection; Spike-in data; Metabolomics
MIUR subjects (valid until 24/06/2024): Sector CHIM/01 - Analytical Chemistry
Date of issue: 2012
Citation: Wehrens, H.R.M.J.; Franceschi, P. (2012). Meta-statistics for variable selection: the R package BioMark. Journal of Statistical Software, 51 (10): 1-18. doi: 10.18637/jss.v051.i10; handle: http://hdl.handle.net/10449/21657
Appears in types: 1.01 Journal article

Files in this item:

File	Access	License	Size	Format
2012 JSS Wehrens et al.pdf	Open access	Creative Commons	641.29 kB	Adobe PDF

This article is published under a Creative Commons license.

Use this identifier to cite or link to this document: https://hdl.handle.net/10449/21657
