Granitto, P. M.; Betta, E.; Khomenko, I.; Pedrotti, M.; Romano, A.; Biasioli, F. (2026-12-02). On the use of TabPFN on mass spectrometry analysis of volatile organic compounds. Scientific Reports, 16: 164. doi: 10.1038/s41598-025-29128-6. Handle: https://hdl.handle.net/10449/93675
On the use of TabPFN on mass spectrometry analysis of volatile organic compounds
Granitto, P. M.; Betta, E.; Khomenko, I.; Pedrotti, M.; Romano, A.; Biasioli, F.
2026-12-02
Abstract
Volatile organic compounds (VOCs) are key markers in applications ranging from food quality assessment to medical diagnostics. They can be profiled, for example, by gas chromatography–mass spectrometry (GC-MS) or by direct injection mass spectrometry (e.g., proton transfer reaction mass spectrometry). The common practice in both cases is to construct a tabular dataset from the raw measurements by performing peak extraction across samples and then to analyze it with statistical or machine learning methods. However, modeling VOC profiles is particularly challenging due to high dimensionality, noise, and small sample sizes. In this study, we evaluate the Tabular Prior-data Fitted Network (TabPFN), a foundation model recently introduced for tabular data, across diverse VOC datasets. Without requiring task-specific training, TabPFN achieves state-of-the-art performance in both classification and regression tasks, outperforming classical machine learning methods on most datasets. We further explore strategies to enhance TabPFN’s performance, including ensembling and fine-tuning, and find that a plain ensemble seems to be the best option in this setting. Our results demonstrate that TabPFN is a highly effective modeling tool for VOC profiles obtained with different analytical approaches. It offers robust predictions even in the data-scarce, high-variability scenarios typical of real-world workflows.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 2026 NP Biasioli.pdf | Open access | Publisher's version (Publisher’s layout) | Creative Commons | 1.57 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



