The R package BioMark provides several tools to identify which variables are associated with class differences in data from fields like metabolomics and proteomics. The first group of tools uses Higher Criticism to define an optimal threshold between interesting and non-interesting variables. This can be applied to any statistic, be it a t value, a regression coefficient or something else, and is related to the expected distribution of p values under the null hypothesis. The second group of tools is based on stability selection, i.e. an assessment of how often specific variables are highlighted as interesting under perturbation of the data. This approach is especially attractive when the number of samples is larger than, say, ten per group. Here too, the strategy can be applied to any type of statistic. Using real and simulated data, the application and usefulness of these techniques will be shown.
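The two ideas named in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R, it does not use or reproduce the BioMark API, and every function name and parameter in it (`hc_threshold`, `stability_selection`, `alpha0`, `keep`, `n_top`, `min_freq`) is hypothetical. The Higher Criticism part assumes the common form of the objective, sqrt(N) (i/N - p_(i)) / sqrt(p_(i) (1 - p_(i))), maximized over the smallest fraction `alpha0` of the sorted p values. The stability part assumes random subsampling as the perturbation and an absolute Welch t statistic as the score; any other statistic could be substituted.

```python
import math
import random
import statistics


def hc_threshold(pvalues, alpha0=0.1):
    """Return (cutoff p value, number of variables selected).

    Assumed objective: sqrt(n) * (i/n - p_i) / sqrt(p_i * (1 - p_i)),
    evaluated on the sorted p values and maximized over i <= alpha0 * n.
    """
    n = len(pvalues)
    ordered = sorted(pvalues)
    best_score, best_i = -math.inf, 0
    for i in range(1, max(1, int(alpha0 * n)) + 1):
        p = ordered[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # the objective is undefined at the boundaries
        score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        if score > best_score:
            best_score, best_i = score, i
    if best_i == 0:
        return None, 0
    return ordered[best_i - 1], best_i


def welch_t(a, b):
    """Absolute Welch t statistic for two lists of numbers."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0
    return abs(statistics.mean(a) - statistics.mean(b)) / se


def stability_selection(group_a, group_b, n_rounds=100, keep=0.7,
                        n_top=1, min_freq=0.5, seed=1):
    """Return indices of variables that are repeatedly ranked in the top.

    group_a and group_b are lists of samples; each sample is a list with
    one value per variable. In every round a random subset of each group
    is drawn, all variables are scored, and the n_top best are counted.
    """
    rng = random.Random(seed)
    n_vars = len(group_a[0])
    counts = [0] * n_vars
    for _ in range(n_rounds):
        sub_a = rng.sample(group_a, max(2, int(keep * len(group_a))))
        sub_b = rng.sample(group_b, max(2, int(keep * len(group_b))))
        scores = [welch_t([s[j] for s in sub_a], [s[j] for s in sub_b])
                  for j in range(n_vars)]
        ranked = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)
        for j in ranked[:n_top]:
            counts[j] += 1
    # Keep variables selected in at least min_freq of the rounds.
    return [j for j in range(n_vars) if counts[j] / n_rounds >= min_freq]


if __name__ == "__main__":
    # Three very small p values on top of an evenly spread background.
    pvals = [0.0001, 0.0002, 0.0003] + [k / 98 for k in range(1, 98)]
    print(hc_threshold(pvals))
```

In this sketch both functions only need a per-variable score or p value, which mirrors the abstract's point that the strategies are not tied to one particular statistic.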
Wehrens, H.R.M.J.; Franceschi, P. (2013). Biomarker selection for omics data. In: 7th CSDA International Conference on Computational and Financial Econometrics (CFE 2013) and 6th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computational and Methodological Statistics (ERCIM 2013), London, December 14-16, 2013: 164 (E886). handle: http://hdl.handle.net/10449/22873
Biomarker selection for omics data
Wehrens, Herman Ronald Maria Johan;Franceschi, Pietro
2013-01-01