CINECA IRIS Institutional Research Information System

Background Second generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome. Results Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage) compared to seven contigs (90.59% coverage) recovered from an Illumina data, and revealed no bias in coverage of GC rich regions. Post-assembly the data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously. Conclusions This is the first report to our knowledge of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data. The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone

Ferrarini, M.; Moretto, M.; Ward, J.A.; Surbanovski, N.; Stevanović, V.; Giongo, L.; Viola, R.; Cavalieri, D.; Velasco, R.; Cestaro, A.; Sargent, D.J. (2013). An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC GENOMICS, 14 (670). doi: 10.1186/1471-2164-14-670 handle: http://hdl.handle.net/10449/23197

An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome

Ferrarini, Marco;Moretto, Marco;Ward, J. A.;Surbanovski, Nada;Stevanović, V.;Giongo, Lara;Viola, Roberto;Cavalieri, Duccio;Velasco, Riccardo;Cestaro, Alessandro;Sargent, Daniel James

2013-01-01

Abstract

Background Second generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome. Results Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage) compared to seven contigs (90.59% coverage) recovered from an Illumina data, and revealed no bias in coverage of GC rich regions. Post-assembly the data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously. Conclusions This is the first report to our knowledge of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data. The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone

Scheda breve

Scheda completa

Scheda completa (DC)

	Keywords
	
			Third-generation sequencing
NGen
Genomics
Assembly
Annotation
Oxford nanopore
Pacific BioSciences
Roche 454
		
	MIUR subjects
	
			Settore BIO/11 - BIOLOGIA MOLECOLARE
		
	Date of issue
	
			2013
		
	Citazione
	
			Ferrarini, M.; Moretto, M.; Ward, J.A.; Surbanovski, N.; Stevanović, V.; Giongo, L.; Viola, R.; Cavalieri, D.; Velasco, R.; Cestaro, A.; Sargent, D.J. (2013). An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC GENOMICS, 14 (670). doi: 10.1186/1471-2164-14-670 handle: http://hdl.handle.net/10449/23197
		
	Appare nelle tipologie:
	
			1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
2013 BMC Gen Ferrarini et al.pdf accesso aperto Licenza: Creative commons Dimensione 1.78 MB Formato Adobe PDF Visualizza/Apri	1.78 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10449/23197

Citazioni

73

124

123

social impact