On the use of TabPFN on mass spectrometry analysis of volatile organic compounds

Granitto, P. M.; Betta, E.; Khomenko, I.; Pedrotti, M.; Romano, A.; Biasioli, F.
2026-12-02

Abstract

Volatile organic compounds (VOCs) are key markers in applications ranging from food quality assessment to medical diagnostics. They can be profiled, for example, by gas chromatography–mass spectrometry (GC-MS) or by direct injection mass spectrometry (e.g., proton transfer reaction mass spectrometry). The common practice in both cases is to construct a tabular dataset from the raw measurements by performing peak extraction across samples, and then to analyze it with statistical or machine learning methods. However, modeling VOC profiles is particularly challenging due to high dimensionality, noise, and small sample sizes. In this study, we evaluate the Tabular Prior-data Fitted Network (TabPFN), a foundation model recently introduced for tabular data, across diverse VOC datasets. Without requiring task-specific training, TabPFN achieves state-of-the-art performance in both classification and regression tasks, outperforming classical machine learning methods on most datasets. We further explore new strategies to enhance TabPFN’s performance, including ensembling and fine-tuning, and find that a plain ensemble seems to be the best option in this setting. Our results demonstrate that TabPFN is a highly effective modeling tool for VOC profiles obtained with different analytical approaches. It offers robust predictions even in the data-scarce, high-variability scenarios typical of real-world workflows.
PTR-ToF-MS
TabPFN
Volatile Organic Compounds
Sector CHEM-01/A - Analytical Chemistry
2 Dec 2026
Granitto, P.M.; Betta, E.; Khomenko, I.; Pedrotti, M.; Romano, A.; Biasioli, F. (2026-12-02). On the use of TabPFN on mass spectrometry analysis of volatile organic compounds. Scientific Reports, 16: 164. doi: 10.1038/s41598-025-29128-6; handle: https://hdl.handle.net/10449/93675
Files in this item:
2026 NP Biasioli.pdf (open access)
Type: Published version (publisher’s layout)
License: Creative Commons
Size: 1.57 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10449/93675