Coffea is a genus of flowering plants in the family Rubiaceae. Among Coffea species, C. canephora has the widest natural distribution, extending west to east from Guinea to Uganda and north to south from Cameroon to Angola. C. canephora (2n=2x=22) is an allogamous diploid tree consisting of polymorphic populations.
The Coffea canephora genome represents an important stepping-stone for plant gene and genome evolution studies.
|Analysis Name||Whole Genome Assembly and Annotation of Coffea canephora (Genoscope)|
|Materials & Methods||The Coffea canephora reference genome sequence results from a collaboration between Genoscope, CIRAD and IRD (UMRs AGAP, DIADE and RPB), funded by ANR. The sequenced genotype (2n=22, 1C=710 Mb) is a doubled-haploid plant (accession DH200-94) produced by IRD from the clone IF200, based on haploid plants that occur spontaneously in association with polyembryony.|
This version (v.1) of the assembly is 580 Mb spread over 13,345 scaffolds. 25,574 protein-coding loci have been predicted, each with a primary transcript.
Publication:
The sequence was completed and analyzed in collaboration with several teams, in particular those of the International Coffee Genome Sequencing Consortium, and was published in: “The Coffee Genome Provides Insight into Convergent Evolution of Caffeine Biosynthesis”. Denoeud et al., 2014, Science.
The 349 anchored scaffolds were joined to generate 11 pseudomolecules that were named according to the linkage group nomenclature. Each scaffold join was denoted with 100 N base pairs. 139 mapped scaffolds have known orientation along the pseudomolecules while the remaining 210 mapped scaffolds were assigned with a random orientation.
12,996 scaffolds (totalling 204 Mb) remain unmapped in the current genome release and were grouped arbitrarily into a pseudomolecule named chr Un (for “unknown”), each scaffold being joined by 100 Ns.
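The joining rule described above (ordered scaffolds separated by 100 Ns, with oriented scaffolds placed on their mapped strand) can be sketched as follows. This is only a minimal illustration, not the pipeline used for the release: the function names, the `(sequence, strand)` input layout and the toy sequences are assumptions made for the example.

```python
# Minimal sketch of pseudomolecule construction; all names are hypothetical.
GAP = "N" * 100  # each scaffold join is denoted by 100 N bases

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(scaffolds):
    """Join scaffolds in the given order, separated by 100 Ns.

    `scaffolds` is an ordered list of (sequence, strand) pairs, where
    strand is '+' or '-' (an assumed input format).
    """
    oriented = [
        seq if strand == "+" else reverse_complement(seq)
        for seq, strand in scaffolds
    ]
    return GAP.join(oriented)


# Toy example: three short "scaffolds", the second on the minus strand.
toy = [("ACGT", "+"), ("AAAC", "-"), ("GG", "+")]
pseudo = build_pseudomolecule(toy)
print(len(pseudo))  # 4 + 100 + 4 + 100 + 2 = 210
```

The same function would apply to chr Un, where the scaffold order is arbitrary and every scaffold can simply be given the '+' strand.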
This assembly is currently viewable in GBrowse.
The genome was sequenced using a whole-genome shotgun strategy. All data were generated using next-generation sequencers (Roche/454 GSFLX and Illumina GAIIx), except for the BAC-end sequences, which were produced by paired-end sequencing of cloned inserts using Sanger technology on ABI3730xl sequencers.
|Number of reads||Number of bases||Coverage||Fragment size (bp)|
single end reads
end long reads
Table 1. Raw sequencing data overview.
454 reads and Sanger BAC ends were assembled using Newbler version MapAsmResearch-04/09/2010-patch-18/17/2010. Of the initial 54,415,922 reads, about 86.31% were assembled. We obtained 91,439 contigs that were linked into 13,345 scaffolds. The contig N50 was 14.8 kb, and the scaffold N50 was 1.3 Mb.
|Statistic||Raw assembly: contigs||Raw assembly: scaffolds||Final assembly: contigs||Final assembly: scaffolds|
|Cumulative size (Mb)||475.6||569.4||471.3||568.6|
|Average size (kb)||5.2||42.6||18.7||42.6|
|N50 size (kb)||14.8||1,261||51.1||1,261|
|N80 size (kb)||4.3||65.2||15.5||65.3|
|Largest size (kb)||193.8||9,035||817.6||9,028|
Table 2. Assembly statistics.
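For readers unfamiliar with the N50 and N80 values in Table 2, the sketch below shows one common way to compute them: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 80%) of all assembled bases. The function name and the toy lengths are illustrative assumptions, and assemblers can differ slightly in how they handle ties or thresholds.

```python
def n_stat(lengths, fraction=0.5):
    """Return the Nx length (N50 by default) for a list of sequence lengths.

    Sequences are visited from longest to shortest; the result is the
    length of the sequence at which the cumulative sum first reaches
    `fraction` of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (lengths in arbitrary units, not real scaffold sizes).
toy_lengths = [100, 80, 60, 40, 25]
print(n_stat(toy_lengths))        # N50
print(n_stat(toy_lengths, 0.8))   # N80
```

Because N80 requires covering a larger share of the assembly, it is always less than or equal to N50, which matches the pattern in Table 2.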
All available sequence-based markers from the consensus genetic linkage map were aligned against the scaffolds with BLAST. The markers were then filtered, and only those presenting a single hit were retained. More precisely, a hit was taken into account if its HSPs showed a minimum identity of 90%, were separated by at most 3000 bp, and had a cumulative size greater than or equal to 60% of the marker sequence length. In total, 1295 markers were unambiguously located on the assembly and used in combination with 1644 RADseq markers to anchor and orient the scaffolds along the C. canephora pseudomolecules.
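The hit-filtering criteria above can be sketched in a few lines. This is a hedged illustration rather than the original script: the `HSP` record, the function names and the scaffold and marker identifiers are hypothetical, and it assumes that the identity threshold applies to each HSP, that the distance is measured between neighbouring HSPs on the scaffold, and that the cumulative size is the sum of HSP lengths on the marker.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    q_start: int     # start on the marker (1-based, inclusive)
    q_end: int       # end on the marker
    s_start: int     # start on the scaffold
    s_end: int       # end on the scaffold
    identity: float  # percent identity


def hit_passes(hsps, marker_len, min_identity=90.0, max_gap=3000, min_cov=0.60):
    """Apply the three criteria to the HSPs of one marker/scaffold hit."""
    if not hsps or any(h.identity < min_identity for h in hsps):
        return False
    # Neighbouring HSPs must lie within max_gap bp of each other on the scaffold.
    ordered = sorted(hsps, key=lambda h: min(h.s_start, h.s_end))
    for left, right in zip(ordered, ordered[1:]):
        gap = min(right.s_start, right.s_end) - max(left.s_start, left.s_end)
        if gap > max_gap:
            return False
    # Cumulative aligned length on the marker (overlaps are not merged here).
    aligned = sum(abs(h.q_end - h.q_start) + 1 for h in hsps)
    return aligned >= min_cov * marker_len


def uniquely_placed(hits_by_marker, marker_lengths):
    """Keep markers with exactly one passing hit: {marker: scaffold}."""
    placed = {}
    for marker, hits in hits_by_marker.items():
        passing = [s for s, hsps in hits.items()
                   if hit_passes(hsps, marker_lengths[marker])]
        if len(passing) == 1:
            placed[marker] = passing[0]
    return placed


# Toy data: m1 has one good hit, m2 has two good hits, m3 covers too little.
marker_lengths = {"m1": 200, "m2": 200, "m3": 100}
hits = {
    "m1": {"scaffold_1": [HSP(1, 80, 1000, 1079, 98.0),
                          HSP(90, 160, 2500, 2570, 95.0)]},
    "m2": {"scaffold_1": [HSP(1, 200, 5000, 5199, 99.0)],
           "scaffold_2": [HSP(1, 200, 100, 299, 97.0)]},
    "m3": {"scaffold_3": [HSP(1, 50, 10, 59, 99.0)]},
}
placed = uniquely_placed(hits, marker_lengths)
print(placed)
```

Only `m1` survives in this toy case: `m2` is ambiguous because it hits two scaffolds, and `m3` aligns over less than 60% of its length.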
A total of 349 scaffolds covering approximately 364 Mb (64% of the assembled genome sequence) were anchored to the 11 C. canephora chromosomes, of which 139 scaffolds, representing 290 Mb (51% of the assembled genome), were both anchored and oriented. 98% of the 100 largest scaffolds and 96.4% of scaffolds larger than 1 Mb were anchored on chromosomes. An overview of the assembly anchoring on the genetic map is reported in Table 3.
|Size (Mb)||No. of gene models||Gene density|
Table 3. Overview of the anchoring of the assembly on the C. canephora linkage groups.
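The anchoring figures quoted on this page can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text and in Table 2. Taking the final scaffold cumulative size (568.6 Mb) as the denominator for the percentages is an assumption, and the variable names are illustrative.

```python
# Figures taken from the text and Table 2.
total_scaffolds = 13_345
anchored_scaffolds = 349
oriented_scaffolds = 139
assembly_mb = 568.6   # final assembly, cumulative scaffold size (assumed denominator)
anchored_mb = 364.0
oriented_mb = 290.0

# Derived quantities.
unmapped_scaffolds = total_scaffolds - anchored_scaffolds
randomly_oriented = anchored_scaffolds - oriented_scaffolds
unmapped_mb = assembly_mb - anchored_mb
pct_anchored = round(100 * anchored_mb / assembly_mb)
pct_oriented = round(100 * oriented_mb / assembly_mb)

print(unmapped_scaffolds, randomly_oriented)  # scaffolds in chr Un, anchored without orientation
print(round(unmapped_mb, 1))                  # Mb left unmapped
print(pct_anchored, pct_oriented)             # percent anchored, percent anchored and oriented
```

Under that assumption the derived values agree with the counts given for chr Un (12,996 scaffolds, about 204 Mb) and with the 64% and 51% fractions quoted above.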
Protein coding genes in the C. canephora genome were automatically annotated using various sources of evidence (cDNAs, RNA-Seq, protein alignments, and ab initio predictions) that were combined into gene models. We obtained 25,574 protein-coding gene models.
|Track Name||Species||Tissue/physiological condition||Platform used||Nb contigs||Nb reads assembled||Single/Paired||Assembly software|
|Canephora Solexa contigs||C.canephora||stem, leaf, flower||Solexa||52,683||172,963,686||Single||Oases|
|Coffea Gmorse models||C.canephora||stem, leaf, flower||Solexa||70,124||172,963,686||Single||G-Mo.R-Se|
|56216 home-made unigenes||C.canephora||mix||mix||56,216||#||Single||2 runs of Cap3 applied on contigs|
|Catura 454 contigs||C.arabica||leaf||454||22,776||493,984||Single||Newbler|
|Arabica 454 contigs||C.arabica||embryo, endosperm||454||24,799||421,307||Single||Newbler|
ESTs / RNASeq reads
|Track Name||Species||Accession||Tissue/physiological condition||Platform used||Nb reads||Read length||Single/Paired|
|Old leaves 1||C.canephora||DH 200-|
|Old leaves 2||C.canephora||DH 200|
|Stem and flower 1||C.canephora||DH 200|
|Stem and flower 2||C.canephora||DH 200|
|Young leaves 1||C.canephora||DH 200|
|Young leaves 2||C.canephora||DH 200|
|Track Name||Species||Nb matches|
|Selected protein matches||Gentianales, Arabidopsis, Potato, Tomato, Vitis||155,283|
|Other Uniprot matches||mix||7,498,085|