Mapping tree species plays a significant role in forestry and ecology applications, particularly in heterogeneous forest environments with complex terrain and diverse ecological conditions. The availability of Sentinel-2 (S2) satellite data, with its high spatial (10 m), spectral (13 bands), and temporal (5-day revisit time) resolution, has revolutionised land-cover mapping. These dense image time series offer an effective framework for automatically classifying tree species across large forested areas because they capture seasonal phenological changes, which are critical for distinguishing species with similar spectral signatures. However, this process relies on robust and accurate training data, which are often unavailable for large forest areas, where field surveys are impractical due to the time and manual labour required. Previous studies have successfully mapped tree species using methods that focused on classifying a few species, often relying only on labelled pixels that were either randomly selected from very homogeneous areas or identified on the basis of the average spectral reflectance value. While effective, these approaches are less suitable for complex landscapes with high species diversity and mixed forests, where training data are limited, especially for rare species. To address this limitation, we adapt a method previously used in land-cover classification and test its effectiveness for tree species mapping. This method automatically filters and extracts reliable pixel-level training samples from weak thematic data of relatively homogeneous forest areas. Our approach leverages forest inventory data, which represent tree species distribution as percentages within forestry units. A forestry unit is an area, often delineated by natural or administrative boundaries, within which the tree species composition and forest characteristics are recorded.
These forest inventory data are then integrated with S2 satellite imagery to generate a high-quality training dataset for mapping 18 tree species classes in the province of Trento, Italy. To create the classes, minor species that frequently co-occur with overlapping canopies were merged into broader classes, while major species were kept as separate classes. Purity thresholds were then defined for each class to identify almost “pure” forestry units in the forest inventory data, where spectral signatures predominantly represent a specific tree species. From these selected forestry units, S2 data from the summer months of 2019 were sampled. Buffers were created along spatial boundaries to exclude edge regions and minimise their effects during sampling. To improve representativeness while preserving within-class variability, the sampled data underwent unsupervised filtering: the sampled pixels within each forestry unit were clustered with k-means on the S2 spectral reflectances, and only the points from the dominant cluster were kept. Subsequently, a consistency analysis removed forestry units whose spectral characteristics lay far from the distribution of their tree species class. Finally, the resulting dataset was downsampled using elevation as a stratification layer to obtain a balanced distribution of classes. To evaluate the effectiveness of the filtering methods, Linear Discriminant Analysis (LDA) was conducted. The results revealed greater centroid distances among classes after filtering, indicating improved class separability. The refined dataset was then used to train a Support Vector Machine (SVM) model, which has previously proved successful in similar studies, to map the distribution of tree species classes in the province of Trento at 10 m resolution.
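The dominant-cluster filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the number of clusters (`k=2`), the two-band toy reflectances, the deterministic farthest-point initialisation, and the helper names `kmeans_labels` and `keep_dominant_cluster` are all assumptions made for illustration, since the abstract does not state these details.

```python
import math
from collections import Counter


def farthest_point_init(points, k):
    """Deterministic initialisation (an assumption): start from the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_labels(points, k, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def keep_dominant_cluster(pixels, k=2):
    """Keep only the pixels of one forestry unit that fall in its largest cluster."""
    labels = kmeans_labels(pixels, k)
    dominant, _ = Counter(labels).most_common(1)[0]
    return [p for p, lab in zip(pixels, labels) if lab == dominant]


# Toy forestry unit: (red, near-infrared) reflectance per sampled pixel.
# The values are invented for illustration, not taken from the study.
unit_pixels = [
    (0.04, 0.42), (0.05, 0.40), (0.05, 0.41), (0.06, 0.39),
    (0.04, 0.40), (0.05, 0.43), (0.06, 0.41), (0.05, 0.38),  # canopy-like
    (0.20, 0.15), (0.22, 0.14),                              # e.g. a clearing
]
filtered = keep_dominant_cluster(unit_pixels, k=2)
print(len(unit_pixels), "->", len(filtered), "pixels kept")
```

In this toy unit the two low-NIR pixels form the minority cluster and are discarded, while the spread among the remaining canopy-like pixels is left untouched, which is the sense in which the filter preserves within-class variability.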
The proposed filtering methods improved classification performance, increasing the average cross-validation accuracy from 77.38% to 84.11% and the Kappa statistic from 0.76 to 0.85. Test accuracies for the ten most abundant classes ranged from 75% to 93%. Preliminary validation using an independent set of individually sampled trees yielded accuracies of up to 80% for the most abundant species, though rare species exhibited lower accuracy due to limited training data. These preliminary results highlight the potential of the proposed methodology to address one of the most pressing challenges in large-scale forest mapping: the scarcity of high-quality training data. By leveraging freely available S2 imagery and widely accessible forest inventory data, this approach provides a replicable framework for producing high-resolution, ecologically meaningful tree species maps. The methodology is scalable and adaptable to diverse forest environments, and thus represents a valuable tool for supporting automated forest management and ecological conservation.
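For readers less familiar with the two metrics reported above, the following sketch shows how overall accuracy and Cohen's Kappa are commonly derived from a confusion matrix. The 2x2 matrix and the function name `accuracy_and_kappa` are hypothetical; the matrix is not taken from the study's results.

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix, for illustration only.
toy_cm = [
    [45, 5],
    [10, 40],
]
acc, kappa = accuracy_and_kappa(toy_cm)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement that would be expected by chance given the class totals, which is why it is often reported alongside accuracy when class frequencies are unequal.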

Wicklein, J.A.; Andreatta, D.; Bruzzone, L.; Dalponte, M.; Marinelli, D. (2025). Tree species classification using time series of sentinel-2 images and weak labelled data. In: Living Planet Symposium 2025: From Observation to Climate Action and Sustainability for Earth, Vienna, Austria, 23-27 June 2025. handle: https://hdl.handle.net/10449/91176

Tree species classification using time series of sentinel-2 images and weak labelled data

Wicklein, J. A. (first author); Andreatta, D.; Dalponte, M.
2025-01-01

