Biomarker selection is an important topic in the omics sciences, where holistic measurement methods routinely generate results for many variables simultaneously. Very often, only a small fraction of these variables is really associated with the phenomena of interest. Selection and identification of these biomarkers is essential for obtaining an understanding of the complex biological processes under study. Finding biomarkers, however, is a difficult task. Even if a relative order can be established, e.g., on the basis of p values, it is usually hard to determine where to stop including candidates in the final set. Higher Criticism (HC) is an approach for finding data-dependent cutoff values when comparing two distinct groups of samples. Here, we extend its use to multivariate data, providing a principled compromise between not selecting too many variables and catching as many true positives as possible. The results show a marked improvement in biomarker selection, compared to the standard settings available for some methods. Interestingly, HC thresholds can differ considerably from what has been suggested in the literature before, again showing that it is not possible to use the same cutoff value for all data sets. The data-specific cutoff values provided by HC also open the way to fairer comparisons between biomarker selection methods, not biased by unlucky or suboptimal threshold choices.
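To make the idea of a data-dependent cutoff concrete, the following is a minimal sketch of HC thresholding on a plain list of p values. It is not the authors' implementation and does not cover the multivariate extension described above. It assumes one commonly used form of the HC objective, sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))), maximized over the smallest fraction `alpha0` of the sorted p values. The function name `hc_threshold`, the parameter `alpha0`, and its default of 0.1 are illustrative choices, not taken from the paper.

```python
import numpy as np

def hc_threshold(pvalues, alpha0=0.1):
    """Return (cutoff p value, number of selected variables).

    Illustrative sketch only: name, signature, and alpha0 default are assumptions.
    """
    p_sorted = np.sort(np.asarray(pvalues, dtype=float))
    n = p_sorted.size
    # Clip a copy to avoid division by zero for p values of exactly 0 or 1.
    p = np.clip(p_sorted, 1e-12, 1 - 1e-12)
    ranks = np.arange(1, n + 1)
    # HC objective: standardized excess of small p values over the uniform expectation.
    hc = np.sqrt(n) * (ranks / n - p) / np.sqrt(p * (1 - p))
    # Search only among the smallest alpha0 fraction of p values (at least one).
    k_max = max(1, int(np.floor(alpha0 * n)))
    k = int(np.argmax(hc[:k_max])) + 1
    return p_sorted[k - 1], k
```

Variables whose p value is at or below the returned cutoff would form the candidate set; with a different data set the same call can return a very different cutoff, which is the point made in the abstract.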
Wehrens, H.R.M.J.; Franceschi, P. (2012). Thresholding for biomarker selection in multivariate data using Higher Criticism. Molecular BioSystems, 8 (9): 2339-2346. doi: 10.1039/C2MB25121C. Handle: http://hdl.handle.net/10449/21161
Thresholding for biomarker selection in multivariate data using Higher Criticism
Wehrens, Herman Ronald Maria Johan;Franceschi, Pietro
2012-01-01
File: 2012 MB Wehrens et al.pdf (Adobe PDF, 2.51 MB). Access: not available. License: All rights reserved.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.