


Stability selection for omics data

Wehrens, Herman Ronald Maria Johan; Franceschi, Pietro

2012-01-01

Abstract

The recent fields of transcriptomics, metabolomics and proteomics, often summarized under the heading “omics”, aim to provide holistic views of biological systems: by measuring as many variables as possible, it is hoped that relevant biological information can be uncovered. In many cases, this comes down to investigating which variables are important when comparing classes, when following samples over time, or when relating omics measurements to phenotypic observations. Given the extremely low sample-to-variable ratios in typical omics data sets, standard biomarker identification methods such as t tests tend not to work well. Multiple testing corrections are indispensable but often lead to an unacceptable loss of power. Stability selection (Meinshausen and Buehlmann, 2010) provides a way to avoid many of the false positives in biomarker identification by repeatedly subsampling the data and considering as putative biomarkers only those variables that consistently show up as important. Meinshausen and Buehlmann used the lasso as the primary variable selection method. In our own work, we have shown that selecting the largest coefficients in non-sparse regression models such as PLS also works well when combined with the stability selection framework (Wehrens et al., 2011). We support these claims with the analysis of several experimental and simulated data sets. In particular, the BioMark package for R, which implements stability selection as well as Higher Criticism thresholding, contains an experimental spike-in data set from the area of metabolomics that can aid in further algorithm testing and development. These analyses indicate that stability selection is a very general and robust framework for variable selection.

References:
Meinshausen N, Buehlmann P (2010). “Stability selection.” J. R. Statist. Soc. B, 72, 417–473. With discussion.
Wehrens R, Franceschi P, Vrhovsek U, Mattivi F (2011). “Stability-based biomarker selection.” Anal. Chim. Acta, 705, 15–23.
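The subsampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the BioMark implementation and does not use the lasso or PLS: as a simple stand-in for the primary selection method, it ranks variables by the absolute Welch t statistic and keeps the top k in each subsample. The function names, the parameter defaults (k, number of subsamples, subsample fraction, frequency threshold) and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def top_k_by_t_statistic(X, y, k):
    """Indices of the k variables with the largest absolute Welch t statistic.

    Stand-in base selector (an assumption); the abstract uses lasso or PLS coefficients.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.maximum(se, 1e-12)
    return np.argsort(-np.abs(t))[:k]


def stability_selection(X, y, k=10, n_subsamples=100, fraction=0.5,
                        threshold=0.8, seed=0):
    """Return (selected variable indices, per-variable selection frequency)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    for _ in range(n_subsamples):
        # Subsample each class separately, without replacement,
        # so that both classes are present in every subsample.
        s0 = rng.choice(idx0, size=max(2, int(fraction * len(idx0))), replace=False)
        s1 = rng.choice(idx1, size=max(2, int(fraction * len(idx1))), replace=False)
        sub = np.concatenate([s0, s1])
        counts[top_k_by_t_statistic(X[sub], y[sub], k)] += 1
    freq = counts / n_subsamples
    # Keep only variables selected in at least `threshold` of the subsamples.
    return np.flatnonzero(freq >= threshold), freq


# Simulated two-class data (hypothetical): 40 samples, 500 variables,
# of which the first 5 are shifted in class 1.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 500))
X[y == 1, :5] += 3.0

selected, freq = stability_selection(X, y)
print("stable variables:", selected)
```

Variables that are picked only occasionally, as is typical for noise variables that happen to rank highly in one subsample, receive a low selection frequency and fall below the threshold.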


Keywords: Data analysis

Date of issue: 2012

Citation: Wehrens, H.R.M.J.; Franceschi, P. (2012). Stability selection for omics data. In: XXVIth International Biometric Conference, August 26-31, 2012, Kobe, Japan. url: http://secretariat.ne.jp/ibc2012/programme/c1-c50/C-46/104-C-46-3.pdf handle: http://hdl.handle.net/10449/21652

Appears in types: 4.02 Abstract in conference proceedings

Files in this record:

File	Size	Format
2012 IBC Wehrens et al.pdf (open access; license: all rights reserved)	18.68 kB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10449/21652
