Noisy data, especially in combination with misalignment and model misspecification can have an adverse effect on phylogeny reconstruction; however, effective methods to identify such data are few. One particularly important class of noisy data is saturated positions. To avoid potential errors related to saturation in phylogenomic analyses, we present an automated procedure involving the step-wise removal of the most variable positions in a given data set coupled with a stopping criterion derived from correlation analyses of pairwise ML distances calculated from the deleted (saturated) and the remaining (conserved) subsets of the alignment. Through a comparison with existing methods, we demonstrate both the effectiveness of our proposed procedure for identifying noisy data and the effect of the removal of such data using a well-publicized case study involving placental mammals. At the least, our procedure will identify data sets requiring greater data exploration, and we recommend its use to investigate the effect on phylogenetic analyses of removing subsets of variable positions exhibiting weak or no correlation to the rest of the alignment. However, we would argue that this procedure, by identifying and removing noisy data, facilitates the construction of more accurate phylogenies by, for example, ameliorating potential long-branch attraction artefacts.

Goremykin, V.; Nikiforova, S.; Bininda Emonds, O.R.P. (2010). Automated removal of noisy data in phylogenomic analyses. JOURNAL OF MOLECULAR EVOLUTION, 71 (5-6): 319-331. doi: 10.1007/s00239-010-9398-z handle: http://hdl.handle.net/10449/19758

Automated removal of noisy data in phylogenomic analyses

Goremykin, Vadim;Nikiforova, Svetlana;
2010-01-01

Abstract

Noisy data, especially in combination with misalignment and model misspecification can have an adverse effect on phylogeny reconstruction; however, effective methods to identify such data are few. One particularly important class of noisy data is saturated positions. To avoid potential errors related to saturation in phylogenomic analyses, we present an automated procedure involving the step-wise removal of the most variable positions in a given data set coupled with a stopping criterion derived from correlation analyses of pairwise ML distances calculated from the deleted (saturated) and the remaining (conserved) subsets of the alignment. Through a comparison with existing methods, we demonstrate both the effectiveness of our proposed procedure for identifying noisy data and the effect of the removal of such data using a well-publicized case study involving placental mammals. At the least, our procedure will identify data sets requiring greater data exploration, and we recommend its use to investigate the effect on phylogenetic analyses of removing subsets of variable positions exhibiting weak or no correlation to the rest of the alignment. However, we would argue that this procedure, by identifying and removing noisy data, facilitates the construction of more accurate phylogenies by, for example, ameliorating potential long-branch attraction artefacts.
Noise reduction
Saturation
Long-branch attraction
Model testing
Noisy data
Placental mammals
Rodentia
2010
Goremykin, V.; Nikiforova, S.; Bininda Emonds, O.R.P. (2010). Automated removal of noisy data in phylogenomic analyses. JOURNAL OF MOLECULAR EVOLUTION, 71 (5-6): 319-331. doi: 10.1007/s00239-010-9398-z handle: http://hdl.handle.net/10449/19758
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10449/19758
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 64
  • ???jsp.display-item.citation.isi??? 62
social impact