Pilkington, L.I.; Kerner, W.; Bertoldi, D.; Larcher, R.; Lee, S.A.; Goddard, M.R.; Albanese, D.; Franceschi, P.; Fedrizzi, B. (2024-07-01). Integration and holistic analysis of multiple multidimensional soil data sets. TALANTA, 274: 125954. doi: 10.1016/j.talanta.2024.125954 handle: https://hdl.handle.net/10449/85355

Integration and holistic analysis of multiple multidimensional soil data sets

Bertoldi, Daniela; Larcher, Roberto; Albanese, Davide; Franceschi, Pietro
2024-07-01

Abstract

Complex matrices such as soil have a range of measurable characteristics, and thus the data that describe them can be considered multidimensional. These characteristics can be strongly influenced by factors that introduce confounding effects and hinder analysis. Traditional statistical approaches lack the flexibility and granularity required to adequately evaluate such matrices, particularly those with large datasets of varying data types (i.e. quantitative non-compositional, quantitative compositional). We present a statistical workflow designed to effectively analyse complex, multidimensional systems, even in the presence of confounding variables. The developed methodology involves exploratory analysis to identify the presence of confounding variables, followed by data decomposition (including strategies for both compositional and non-compositional quantitative data) to minimise the influence of confounding factors such as sampling site or location. These data processing methods then allow common patterns in the data to be highlighted, including the identification of biomarkers and the determination of non-trivial associations between variables. We demonstrate the utility of this statistical workflow by jointly analysing the chemical composition and fungal biodiversity of New Zealand vineyard soils that have been managed with either organic low-input or conventional input approaches. By applying this pipeline, we were able to identify biomarkers that distinguish viticultural soils managed under each approach and also unearth links and associations between the chemical and metagenomic profiles. While soil is an example of a system that can require this type of statistical methodology, there is a range of biological and ecological systems that are challenging to analyse due to the complex interplay of global and local effects. Utilising our developed pipeline will greatly enhance the way that these systems can be studied and the quality and impact of insight gained from their analysis.
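
The abstract does not state which decomposition the authors use, so the following is only a minimal sketch of the general idea, under our own assumptions: compositional data (for example fungal read counts) are given a centred log-ratio transform, and the site effect is then reduced in both data types by subtracting each site's mean profile. The function names, the pseudocount, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of compositional rows.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def remove_site_means(values, sites):
    """Subtract each site's mean profile so only within-site variation remains."""
    values = np.asarray(values, dtype=float)
    sites = np.asarray(sites)
    out = np.empty_like(values)
    for site in np.unique(sites):
        mask = sites == site
        out[mask] = values[mask] - values[mask].mean(axis=0)
    return out

# Hypothetical toy data: 4 samples from 2 sites,
# 3 taxa (compositional) and 2 chemical variables (non-compositional).
sites = np.array(["A", "A", "B", "B"])
taxa_counts = np.array([[10, 30, 60], [20, 30, 50], [70, 20, 10], [60, 25, 15]])
chemistry = np.array([[5.1, 120.0], [5.3, 110.0], [6.8, 300.0], [6.6, 320.0]])

taxa_within = remove_site_means(clr(taxa_counts), sites)
chem_within = remove_site_means(chemistry, sites)
```

After this step both blocks are centred within each site, so differences between sites no longer dominate any later comparison of management approaches. Other ways of handling a confounder (for example including site as a model term) would be equally consistent with the abstract.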
Compositional data
Confounding variables
Soil analysis
Statistical workflow
Variable association
Variable transformation
Sector CHIM/01 - Analytical Chemistry
Files in this item:
2024 T Franceschi.pdf (open access; licence: Creative Commons; 5.94 MB; Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10449/85355