The R package BioMark provides several tools to identify which variables are associated with class differences in data from fields such as metabolomics and proteomics. The first group of tools uses Higher Criticism to define an optimal threshold between interesting and non-interesting variables. This can be applied to any statistic, be it a t value, a regression coefficient or something else, and is related to the expected distribution of p values under the null hypothesis. The second group of tools is based on stability selection, i.e. an assessment of how often specific variables are highlighted as interesting under perturbation of the data. This approach is especially attractive when the number of samples is larger than, say, ten per group. In this case too, the strategy can be applied to any type of statistic. The application and usefulness of these techniques will be shown using real and simulated data.
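
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the BioMark implementation, and none of the names below belong to the package: `hc_threshold`, `welch_t` and `stability_selection` are hypothetical helpers written for illustration. The sketch assumes the commonly used Higher Criticism objective evaluated over the smallest fraction `alpha0` of the sorted p values, and, for stability selection, assumes random subsampling of samples within each class with a Welch t statistic as the ranking criterion. The parameter defaults (`alpha0`, `top_k`, `leave_out`, `min_freq`) are illustrative choices, not values taken from the source.

```python
import math
import random
from statistics import mean, variance


def hc_threshold(pvalues, alpha0=0.1):
    """Return (p-value cutoff, sorted indices of selected variables).

    Assumed objective: HC(i) = sqrt(n) * (i/n - p_(i)) / sqrt(i/n * (1 - i/n)),
    maximised over the smallest alpha0 * n sorted p values.
    """
    n = len(pvalues)
    if n < 2:
        raise ValueError("need at least two p values")
    order = sorted(range(n), key=lambda j: pvalues[j])
    # Only the smallest p values are candidate cutoffs; never reach i == n,
    # where the denominator would be zero.
    n_candidates = min(n - 1, max(1, int(alpha0 * n)))
    best_i, best_hc = 1, -math.inf
    for i in range(1, n_candidates + 1):
        frac = i / n
        p_i = pvalues[order[i - 1]]
        hc = math.sqrt(n) * (frac - p_i) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_i, best_hc = i, hc
    return pvalues[order[best_i - 1]], sorted(order[:best_i])


def welch_t(a, b):
    """Welch t statistic for two lists of numbers."""
    return (mean(a) - mean(b)) / math.sqrt(
        variance(a) / len(a) + variance(b) / len(b)
    )


def stability_selection(group_a, group_b, top_k=10, n_rounds=100,
                        leave_out=0.3, min_freq=0.8, seed=0):
    """Return (indices of stable variables, selection frequency per variable).

    group_a and group_b are lists of samples; each sample is a list with one
    value per variable. Each round drops a fraction of samples per class
    (an assumed perturbation scheme) and records the top_k variables by |t|.
    """
    rng = random.Random(seed)
    n_vars = len(group_a[0])
    keep_a = min(len(group_a), max(2, round(len(group_a) * (1 - leave_out))))
    keep_b = min(len(group_b), max(2, round(len(group_b) * (1 - leave_out))))
    counts = [0] * n_vars
    for _ in range(n_rounds):
        sub_a = rng.sample(group_a, keep_a)
        sub_b = rng.sample(group_b, keep_b)
        scores = [
            abs(welch_t([s[j] for s in sub_a], [s[j] for s in sub_b]))
            for j in range(n_vars)
        ]
        ranked = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    freqs = [c / n_rounds for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= min_freq]
    return stable, freqs
```

Both helpers accept any ranking input in principle: `hc_threshold` only needs p values, however they were obtained, and the call to `welch_t` inside `stability_selection` could be swapped for another statistic such as a regression coefficient, in line with the abstract's remark that both strategies apply to any type of statistic.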

Wehrens, H.R.M.J.; Franceschi, P. (2013). Biomarker selection for omics data. In: 7th CSDA International Conference on Computational and Financial Econometrics (CFE 2013) and 6th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computational and Methodological Statistics (ERCIM 2013), London, December 14-16, 2013: 164 (E886). handle: http://hdl.handle.net/10449/22873

Biomarker selection for omics data

Wehrens, Herman Ronald Maria Johan; Franceschi, Pietro
2013-01-01

Data analysis
2013