CINECA IRIS Institutional Research Information System

Granitto, P.M.; Betta, E.; Khomenko, I.; Pedrotti, M.; Romano, A.; Biasioli, F. (2026-12-02). On the use of TabPFN on mass spectrometry analysis of volatile organic compounds. SCIENTIFIC REPORTS, 16: 164. doi: 10.1038/s41598-025-29128-6 handle: https://hdl.handle.net/10449/93675

On the use of TabPFN on mass spectrometry analysis of volatile organic compounds

Granitto, P. M. (first author); Betta, E.; Khomenko, I.; Pedrotti, M.; Romano, A.; Biasioli, F. (last author)

2026-12-02

Abstract

Volatile organic compounds (VOCs) are key markers in applications ranging from food quality assessment to medical diagnostics. They can be profiled, for example, by gas chromatography–mass spectrometry (GC-MS) or by direct injection mass spectrometry (e.g. proton transfer reaction mass spectrometry). The common practice in both cases is to construct a tabular dataset from the raw measurements by performing peak extraction across samples and then to analyze it with statistical or machine learning methods. However, modeling VOC profiles is particularly challenging due to high dimensionality, noise, and small sample sizes. In this study, we evaluate the Tabular Prior-data Fitted Network (TabPFN), a foundation model recently introduced for tabular data, across diverse VOC datasets. Without requiring task-specific training, TabPFN achieves state-of-the-art performance in both classification and regression tasks, outperforming classical machine learning methods for most datasets. We further explore strategies to enhance TabPFN’s performance, including ensembling and fine-tuning, and find that a plain ensemble appears to be the best option in this setting. Our results demonstrate that TabPFN is a highly effective modeling tool for VOC profiles obtained with different analytical approaches. It offers robust predictions even in the data-scarce, high-variability scenarios typical of real-world workflows.
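
The abstract does not say how the "plain ensemble" is built, so the following is only a minimal sketch of one common reading: averaging the class probabilities predicted by several independently fitted models (soft voting). The function names and the example probabilities are hypothetical and are not taken from the paper; in practice each member's probabilities would come from a fitted classifier rather than being typed in by hand.

```python
from typing import List, Sequence


def average_probabilities(
    member_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities over ensemble members.

    member_probs[m][i][c] is member m's probability that sample i
    belongs to class c. All members must cover the same samples/classes.
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    averaged = [[0.0] * n_classes for _ in range(n_samples)]
    for probs in member_probs:
        for i in range(n_samples):
            for c in range(n_classes):
                averaged[i][c] += probs[i][c] / n_members
    return averaged


def predict_labels(avg_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg_probs]


# Hypothetical outputs of three ensemble members
# for two samples and two classes (illustrative values only).
member_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.8, 0.2], [0.2, 0.8]],
]
avg = average_probabilities(member_probs)
labels = predict_labels(avg)
```

Averaging probabilities rather than taking a majority vote over hard labels keeps each member's confidence in play, which can matter when only a handful of samples are available. Whether this matches the ensembling procedure used in the study cannot be determined from this record.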

Keywords: PTR-ToF-MS; TabPFN; Volatile Organic Compounds

MIUR subjects (valid from 09/05/2024): Sector CHEM-01/A - Analytical chemistry

Date of issue: 2 Dec 2026
			
Appears in types: 1.01 Journal article

Files in this item:

File	Access	Type	License	Size	Format
2026 NP Biasioli.pdf	Open access	Publisher's version (publisher's layout)	Creative Commons	1.57 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10449/93675
