Genomic distribution of tRNAs in eukaryotes

The title says it all. I'm doing a literature search trying to see what is widely known and/or well established. I've found a couple of mentions that tRNAs are dispersed throughout the entire nuclear genome. Does this mean they are isolated from each other? Or do they occur in clusters? Is this true across eukaryotes? Or is any of this even well studied?

In Drosophila, at least, some are clustered (e.g. these), but there are many tRNA genes in eukaryotes, so overall they are dispersed. You might find the Genomic tRNA Database useful, and the paper describing it.

Genomic organization of eukaryotic tRNAs

Surprisingly little is known about the organization and distribution of tRNA genes and tRNA-related sequences on a genome-wide scale. While tRNA gene complements are usually reported in passing as part of genome annotation efforts, and peculiar features such as the tandem arrangements of tRNA genes in Entamoeba histolytica have been described in some detail, systematic comparative studies are rare and mostly restricted to bacteria. We therefore set out to survey the genomic arrangement of tRNA genes and pseudogenes in a wide range of eukaryotes to identify common patterns and taxon-specific peculiarities.


In line with previous reports, we find that tRNA complements evolve rapidly and that tRNA gene and pseudogene locations are subject to rapid turnover. At phylum level, the distributions of the numbers of tRNA genes and pseudogenes are very broad, with standard deviations on the order of the mean. Even among closely related species we observe dramatic changes in local organization. For instance, 65% and 87% of the tRNA genes and pseudogenes are located in genomic clusters in zebrafish and stickleback, respectively, while such arrangements are relatively rare in the other three sequenced teleost fish genomes. Among basal metazoa, Trichoplax adhaerens has hardly any duplicated tRNA genes, while the sea anemone Nematostella vectensis boasts more than 17000 tRNA genes and pseudogenes. Dramatic variations are observed even within the eutherian mammals. Higher primates, for instance, have 616 ± 120 tRNA genes and pseudogenes, of which 17% to 36% are arranged in clusters, while the genome of the bushbaby Otolemur garnetti has 45225 tRNA genes and pseudogenes, of which only 5.6% appear in clusters. In contrast, the distribution is surprisingly uniform across plant genomes. Consistent with this variability, syntenic conservation of tRNA genes and pseudogenes is also poor in general, with turnover rates comparable to those of unconstrained sequence elements. Despite this large variation in abundance in Eukarya, we observe a significant correlation between the number of tRNA genes, the number of tRNA pseudogenes, and genome size.
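As a rough illustration of how a "fraction of tRNA genes in clusters" figure might be computed, the sketch below groups genes on one chromosome whenever neighbouring genes lie within a maximum gap. The 1,000 nt gap, the minimum cluster size of two, the function names and the toy coordinates are all assumptions made for illustration; the survey's own cluster definition is not given in this excerpt.

```python
def cluster_by_gap(starts, max_gap=1000):
    """Group gene start positions into runs whose neighbours are <= max_gap apart."""
    clusters = []
    for pos in sorted(starts):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough: extend the current run
        else:
            clusters.append([pos])     # too far: start a new run
    return clusters


def fraction_clustered(starts, max_gap=1000, min_size=2):
    """Fraction of genes that sit in a run of at least min_size genes."""
    if not starts:
        return 0.0
    clusters = cluster_by_gap(starts, max_gap)
    in_clusters = sum(len(c) for c in clusters if len(c) >= min_size)
    return in_clusters / len(starts)


# Hypothetical start coordinates on a single chromosome.
toy_starts = [100, 600, 900, 50_000, 120_000, 120_400]
print(cluster_by_gap(toy_starts))
print(fraction_clustered(toy_starts))
```

The resulting fraction depends strongly on the chosen gap, so any comparison between genomes would need one fixed threshold.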


The genomic organization of tRNA genes and pseudogenes shows complex lineage-specific patterns characterized by an extensive variability that is in striking contrast to the extreme levels of sequence-conservation of the tRNAs themselves. The comprehensive analysis of the genomic organization of tRNA genes and pseudogenes in Eukarya provides a basis for further studies into the interplay of tRNA gene arrangements and genome organization in general.

The diversity of small non-coding RNAs in the diatom Phaeodactylum tricornutum

Background: Marine diatoms constitute a major component of eukaryotic phytoplankton and stand at the crossroads of several evolutionary lineages. These microalgae possess peculiar genomic features and novel combinations of genes acquired from bacterial, animal and plant ancestors. Furthermore, they display both DNA methylation and gene silencing activities. Yet, the biogenesis and regulatory function of small RNAs (sRNAs) remain ill defined in diatoms.

Results: Here we report the first comprehensive characterization of the sRNA landscape and its correlation with genomic and epigenomic information in Phaeodactylum tricornutum. The majority of sRNAs is 25 to 30 nt-long and maps to repetitive and silenced Transposable Elements marked by DNA methylation. A subset of this population also targets DNA methylated protein-coding genes, suggesting that gene body methylation might be sRNA-driven in diatoms. Remarkably, 25-30 nt sRNAs display a well-defined and unprecedented 180 nt-long periodic distribution at several highly methylated regions that awaits characterization. While canonical miRNAs are not detectable, other 21-25 nt sRNAs of unknown origin are highly expressed. Besides, non-coding RNAs with well-described function, namely tRNAs and U2 snRNA, constitute a major source of 21-25 nt sRNAs and likely play important roles under stressful environmental conditions.

Conclusions: P. tricornutum has evolved diversified sRNA pathways, likely implicated in the regulation of largely still uncharacterized genetic and epigenetic processes. These results uncover an unexpected complexity of diatom sRNA population and previously unappreciated features, providing new insights into the diversification of sRNA-based processes in eukaryotes.


tRNA genes.

A survey of the T. brucei sequence project has resulted in the identification of 50 trypanosomal tRNA genes representing 40 different isoacceptor species (one tRNA with undetermined specificity was also detected [2] but will not be discussed further here) (Table 2). Several of these genes have been characterized before by different research groups (6, 7, 15, 25, 29-31, 45). It is clear that the trypanosomal genome is not yet completely represented in the sequence database and therefore some tRNA genes may have escaped detection. However, the fraction of undetected tRNA genes is expected to be small for the following reasons. If an A at the first position of the anticodon is modified to inosine, which is known to decode U, C, and A (23), the 40 isoacceptors are sufficient to decode 94% of all 61 sense codons (Table 2). The four codons which cannot be read are UGG (Trp), AUU and AUC (Ile), and UAA (Leu). Based on this observation, the 50 tRNA genes are expected to represent about 94% of the total genomic tRNA gene complement. An independent estimate of the size of the trypanosomal tRNA gene complement was obtained based on the information that each tRNA gene was detected on average 2.46 times. Fifteen genes were found once, and the remaining 35 were found two to six times each. From a statistical analysis of the obtained histogram, we estimate that an additional 3 to 13 genes (8 to 33%, P = 0.05) might be present in the still-missing part of the genome. This analysis assumes that the entire genome was sequenced at random and that therefore the frequencies with which each gene was found should follow a Poisson distribution. Sequencing, however, has also included a chromosome-by-chromosome strategy. Thus, the sequence coverage is higher for some chromosomes than expected for an ideally random approach. 
After taking these considerations into account, we conclude that more than 80% of all trypanosomal tRNA genes were detected in the present work. The estimated total number of trypanosomal tRNA genes is therefore ca. 62 (representing at least 43 different isoacceptors). Whereas the number of isoacceptors is in the range expected for a genome with a GC content of 45 to 50% (20), the number of tRNA genes is far smaller than for any other eukaryotic genome characterized so far with the exception of the microsporidian parasite Encephalitozoon cuniculi, which has 44 tRNA genes (22). Preliminary analysis of the Leishmania genome (data not shown) indicates that in this organism the number of tRNA genes is also very low, suggesting that this might be a general feature of trypanosomatids or maybe even parasites in general.
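The Poisson argument above can be made concrete with a small sketch. It assumes that the 2.46 detections per gene is the mean over the 50 genes that were seen at least once, and fits a zero-truncated Poisson distribution to recover the share of genes seen zero times. This is one plausible way to set up such an estimate, not necessarily the statistical analysis the authors performed; the function names are hypothetical.

```python
import math


def truncated_poisson_rate(mean_detected, tol=1e-10):
    """Solve lam / (1 - exp(-lam)) = mean_detected for lam by bisection.

    The left-hand side is the mean of a Poisson variable conditioned on
    being at least 1, so mean_detected must be greater than 1.
    """
    lo, hi = 1e-9, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean_detected:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def estimate_missing(n_detected, mean_detected):
    """Expected number of genes with zero detections."""
    lam = truncated_poisson_rate(mean_detected)
    p_zero = math.exp(-lam)                      # Poisson probability of 0 hits
    return n_detected * p_zero / (1 - p_zero)    # undetected genes per detected gene


missing = estimate_missing(50, 2.46)
print(f"estimated undetected genes: {missing:.1f}")
```

Under these assumptions the point estimate falls inside the 3 to 13 range quoted above. As the text notes, non-random (chromosome-by-chromosome) sequencing violates the Poisson assumption, so the figure is indicative only.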

Eighty percent of the detected tRNA genes are found in clusters of two to five genes separated by very short intergenic regions of 79 nt on average (Fig. 1). Within these clusters, the tRNA genes appear to be randomly arranged: head-to-head and tail-to-tail arrangements occur as frequently as tandem repeats. The remaining 20% are found dispersed throughout the whole genome. Looking at the predicted tRNA-coding regions, it appears that trypanosomal tRNAs fold into secondary structures that are essentially similar to bona fide eukaryotic tRNAs. The tRNA Tyr is the only one that contains an intron (43). Two distinct tRNAs Met were found, one of which has an AU base pair at the end of the acceptor stem and therefore corresponds to the eukaryotic tRNA Met-i (32); the other shows all features of a tRNA Met-e. Their predicted secondary structures are shown in Fig. 2.
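To make the head-to-head, tail-to-tail and tandem terminology concrete, here is a minimal sketch that classifies neighbouring genes by strand. It assumes the usual convention that "head" means the 5′ end of the gene, and the strand list is a made-up cluster rather than one of the clusters shown in Fig. 1.

```python
from collections import Counter


def classify_pair(strand_left, strand_right):
    """Classify two adjacent genes, ordered by coordinate, by relative orientation."""
    if strand_left == strand_right:
        return "tandem"            # both transcribed in the same direction
    if strand_left == "+" and strand_right == "-":
        return "tail-to-tail"      # convergent: the 3' ends face each other
    return "head-to-head"          # divergent: the 5' ends face each other


def arrangement_counts(strands):
    """Count the arrangement of every adjacent pair in a cluster."""
    return Counter(classify_pair(a, b) for a, b in zip(strands, strands[1:]))


# Hypothetical five-gene cluster, strands listed left to right.
toy_cluster = ["+", "+", "-", "+", "+"]
print(arrangement_counts(toy_cluster))
```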

Genomic organization of T. brucei tRNA genes. The 12 clusters which were identified in the genome of T. brucei and which contain in total 40 tRNA genes are shown and drawn to scale (some of these clusters have been analyzed before [6, 7, 15, 25, 29-31, 45]). The directions of transcription are indicated by arrows, and the predicted anticodons are shown in parentheses. The question mark indicates a tRNA of unknown identity. The numbers indicate the lengths of the known 5′- and 3′-flanking and intergenic sequences. Double slashes denote the ends of each contig. Broken lines represent large intergenic regions. Genes for structural RNAs (U2, U5, U6, 7SL) or mRNAs which are found adjacent to tRNA genes are also shown. The tRNA genes which are found dispersed elsewhere in the genome are not shown. tRNA genes whose gene products have been analyzed in this study are shown in bold.

Expression of tRNAs.

In order to obtain an overview of the steady-state expression level of tRNAs, we have selected 15 different tRNA species specific for 12 amino acids (predicted secondary structures are shown in Fig. 2) and determined their absolute abundance by quantitative Northern analysis. Different amounts of in vitro transcripts were used as standards and analyzed together with known quantities of total cellular RNA by specific oligonucleotide hybridization. The hybridization signals were quantified on a phosphorimager and allowed us to calculate the number of molecules per cell (see Materials and Methods). A potential drawback of the quantitative Northern analysis is the possible presence of nucleotide modifications in regions of the tRNA which are complementary to the oligonucleotide probes. Modifications will be absent from the in vitro-synthesized tRNAs. It is therefore possible that in such cases the hybridization to the cellular tRNA is less efficient than to the in vitro-produced marker tRNAs, which would result in an underestimation of the tRNA abundance.
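The calculation behind such a standard-curve quantification might look like the sketch below: a straight line is fitted to the signals of the in vitro transcript standards, the cellular signal is read off that line, and the amount is converted to molecules per cell. The numbers, the linear-response assumption and the `cells_loaded` parameter (the number of cell equivalents in the lane) are hypothetical; the authors' actual procedure is described in their Materials and Methods.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def molecules_per_cell(signal, std_fmol, std_signal, cells_loaded):
    """Convert a phosphorimager signal to tRNA molecules per cell."""
    slope, intercept = fit_line(std_fmol, std_signal)
    fmol = (signal - intercept) / slope          # amount of tRNA in the lane
    return fmol * 1e-15 * AVOGADRO / cells_loaded


# Made-up standards (fmol of in vitro transcript) and signals (arbitrary units).
standards_fmol = [1.0, 2.0, 4.0]
standards_signal = [100.0, 200.0, 400.0]
estimate = molecules_per_cell(300.0, standards_fmol, standards_signal, cells_loaded=3e4)
print(f"{estimate:,.0f} molecules per cell")
```

As the paragraph above points out, modified nucleotides under the probe would lower the cellular signal relative to the standards, so a value obtained this way is best read as a lower bound.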

Three representative Northern blots with the corresponding quantifications are shown on the left panels of Fig. 3. A summary of the results on tRNA abundance per cell is presented in Table 3 (first column). It shows that on average 64,000 molecules of each of the selected tRNA species, ranging from 1,850 molecules for the tRNA Met-i to 220,000 for the tRNA Leu (CAA), are found in a cell.

Quantitative Northern analysis. Specific oligonucleotide hybridization was used to detect tRNA Met-e (CAU) (A), tRNA Lys (CUU, UUU) (B), and tRNA Met-i (CAU) (C). (Left panels) The abundance of tRNAs in total cellular RNA (TOT) was determined by comparison with known quantities of the corresponding in vitro-transcribed tRNA (in vitro trans.). The graphs show the quantification of the blots shown on the right using a phosphorimager. Signal intensities are indicated in arbitrary units (a.u.). (Right panels) Mitochondrial localization was determined by hybridization of the corresponding specific oligonucleotides to known quantities of total and mitochondrial (MIT) RNAs. Lower panels show hybridizations using a probe specific for mitochondrion-encoded rRNA (12S rRNA).

Mitochondrial import.

The same 15 tRNAs whose intracellular quantities have been determined were analyzed for mitochondrial import. Northern blots containing known amounts of total cellular and isolated mitochondrial RNAs were hybridized with mitochondrion-specific probes (directed against a guide RNA and 12S rRNA). The data (not shown) indicated that ca. 2% of total RNA is mitochondrial. The cytosolic contamination of the mitochondrial RNA was shown to be ca. 0.33 to 0.40% as determined by hybridization to 7SL RNA (not shown). This RNA is a component of the signal recognition particle and expected to be exclusively cytosolic, although previous studies have established that this marker overestimates the extent of cytosolic contamination (18, 37). Determining the intracellular distribution of tRNAs is expected to be less error prone than measuring total tRNA abundance, since modified nucleotides in the region recognized by the probe will interfere with the interpretation of the results only if they are compartment specific. Only very few mitochondrion-specific nucleotide modifications are known for T. brucei, and they appear to occur at the same relative positions in most tRNA molecules (42).

The right panels of Fig. 3 show Northern blots representing the intracellular distribution of three tRNAs, exhibiting high (tRNA Met-e) and intermediate (tRNA Lys) levels of mitochondrial localization as well as the only tRNA with an exclusive cytosolic localization (tRNA Met-i). The number of imported molecules per cell for each of the selected tRNAs is summarized in Table 3 (second column) and ranges from 370 to 7,700. Abundance and intracellular distribution of all 15 tRNAs are summarized on the graph in Fig. 4, in which the percentage of each tRNA recovered in mitochondria is plotted against the total number of the molecules per cell. The following conclusions can be drawn from this graph.

tRNA abundance in total cell and mitochondria. This is a graphical representation of the results shown in Table 3. The numbers of tRNA molecules per cell were plotted against the percentage of those found in mitochondria. Identities of the tRNAs and their anticodons are indicated. No correlation between expression level and extent of mitochondrial localization was observed (R = −0.29, P = 0.310).

(i) Only 0.2% of the tRNA Met-i is found in mitochondria, a proportion which is less than for cytosolic markers and therefore is most likely the result of cytosolic contamination. The tRNA Met-i therefore represents the first cytosol-specific tRNA to be characterized for T. brucei. The existence of cytosolic tRNAs Met in both T. brucei and Leishmania tarentolae has been shown before, but their identities have not been determined (14, 44). The cytosolic localization of the tRNA Met-i is in agreement with its eukaryotic initiator function, since this tRNA is not expected to be functional in the bacterial-type translation system of the mitochondria (32). In contrast, the tRNA Met-e, which is homologous to the tRNA Met-i (Fig. 2), is efficiently imported into mitochondria.

(ii) If we assume that the selected 15 tRNAs are representative of the whole population, the complements of mitochondrial and cytosolic tRNAs might be identical. With the exception of the tRNA Met-i , all tRNAs are to some extent imported into mitochondria. However, no tRNA showing an exclusive mitochondrial localization has been detected. This is reminiscent of all other organisms in which tRNA import has been studied and suggests that nucleus-encoded tRNAs which are specifically localized to mitochondria do not exist (41).

(iii) The extent of mitochondrial localization is distinct for different isoacceptors and ranges from 1 to 7.5%. There is no apparent correlation between the overall concentration of a given tRNA and the extent of its mitochondrial localization (R = −0.29, P = 0.310).

Northern analysis measures only steady-state levels of tRNAs in the different compartments under investigation. As there is no tRNA synthesis in the mitochondria of T. brucei, tRNA abundance is expected to be determined solely by the import efficiency and the rate of tRNA degradation inside mitochondria. In order to exclude that differential degradation of distinct tRNA species significantly interferes with our attempts to determine the import levels from steady-state quantities of tRNAs, we measured the degradation of selected tRNAs by incubation of isolated mitochondria at 27°C. Time course experiments show an approximately 50% loss of full-length tRNAs in a 4-h incubation period (Fig. 5). The observed loss can be attributed to two processes: (i) RNase protection assays (18) using mitochondrially encoded 12S rRNA as a marker (not shown) indicate that 33% (± 4.6% [standard deviation]) of mitochondria lyse during the incubation, resulting in a complete degradation of all released RNA; (ii) the remaining ca. 17% reflect intramitochondrial tRNA degradation. Most important in the context of this work, however, is the fact that no significant differences in the kinetics of the degradation of the different tRNA species were observed. It is therefore concluded that the steady-state levels of imported tRNAs indeed correlate with their import efficiencies.

Intramitochondrial degradation of tRNAs. Isolated mitochondria were incubated at 27°C, and the degradation of the indicated tRNAs was determined by Northern analysis using the same oligonucleotides that were used to determine tRNA abundance. Mean values of two independent experiments are shown. For time points where three or more independent values were available, the standard deviation is indicated.

In vivo import substrate.

All results discussed so far were obtained from wild-type cells. However, quantitative Northern analysis can also be used in transgenic cells, in which tRNA genes and their genomic contexts can be manipulated. We have used this possibility to address the question of whether mature or 5′-extended tRNAs are the in vivo import substrates in T. brucei. The best-studied import substrate is the tRNA Leu (CAA) encoded on the tRNA Ser (CGA)/tRNA Leu (CAA) cluster (Fig. 1). It has previously been shown that this region is transcribed as a dicistronic precursor (25). Furthermore, in vitro experiments have established that only the tRNA Leu (CAA) precursor but not its mature derivative was imported into mitochondria (48). In order to confirm these results, we performed the corresponding in vivo experiments using the same tRNA substrate. A tag corresponding to three nucleotide substitutions in the variable loop was introduced into the tRNA Leu (CAA) gene (Fig. 2). The tagged gene containing either 216, 59, 10, or 0 nucleotides of its original 5′-flanking sequence was cloned into an expression plasmid and stably integrated into a ribosomal DNA locus of T. brucei (3) (Fig. 6A). Finally, by using the same methods as for wild-type cells (Fig. 4), expression of the tagged tRNA Leu (CAA) was quantified in all four cell lines. In order to analyze import, mitochondria of the transgenic cell lines were isolated by digitonin extractions followed by RNase digestions (see Materials and Methods). This procedure has the advantage that it requires only small quantities of cells. It yields crude mitochondrial preparations which are essentially free of cytosolic RNAs. 
Mitochondrial fractions obtained by digitonin extractions can directly be compared to the ones isolated by the hypotonic isolation procedure, as evidenced by the fact that very similar import efficiencies for tRNA Met-i, tRNA Met-e, and tRNA Leu (CAA) were obtained irrespective of which of the two procedures was used (Table 3). Figure 6B shows that in all four transgenic cell lines the tagged tRNA Leu (CAA) is imported into mitochondria with an efficiency of 2.9 to 3.0%, which is essentially identical to the 3.2 to 3.5% observed for the wild-type tRNA Leu (CAA). The cytosol-specific tRNA Met-i, as expected, is found only in the cytosolic fraction. Thus, these results show that, unlike what can be predicted from in vitro experiments, import of the tagged tRNA Leu (CAA) is independent of its endogenous 5′-flanking region. Even complete removal of all of the natural 5′-flanking sequence results in an import efficiency identical to that of the wild-type molecule. In all four constructs the tagged tRNA is expressed at a lower level than the wild-type tRNA Leu (CAA) (Fig. 6C and Table 3). One reason for this is probably the presence of the tag in the variable loop. Furthermore, expression is strongly influenced by the 5′ context of the tagged tRNA gene in that longer 5′-flanking regions result in a higher expression of the corresponding gene. Even though expression of the tagged tRNA Leu (CAA) in the different cell lines varies more than 10-fold, the same extent of import is observed. These results therefore allow us to extend the conclusion that there is no apparent correlation between the overall concentration of a given tRNA and the extent of its mitochondrial localization (Fig. 4) to a single tRNA species.



Hi-C data

The available contact maps for the five organisms are the output of various closely related high-throughput experimental protocols. All protocols were derived and adapted from 3C ref. 10, and are regarded in this work—for the sake of simplicity—as Hi-C methods. Supplementary Table 1 summarizes the data set chosen for each organism.

It can be seen that some parameters vary between data sets. Most importantly, the given resolution for each data set is different with a variability of up to 2 orders of magnitude (compare SP and HS). While all the experiments were done using HindIII restriction enzymes to produce DNA segments that make up the basic unit of raw contact maps, four out of five data sets employed constant size bins to collect the measurements to improve the signal to noise ratio 11,13,16,17 . The size of bins determines the resolution of the data set in these cases. In addition, three out of the five data sets were corrected to minimize experimental biases 13,16,17 . The data set for SC was further filtered to include a selected portion of the contact map that passed 1%-FDR (ref. 12).

All data sets went through additional post processing, as noted in Supplementary Table 1. We completed the processing of the provided data by choosing a post process that minimizes biases in the data and maximizes its significance. We employed an iterative correction process based on ref. 52, similar to the one used for the mouse data set (for details, see the Supplementary Methods). Cis maps were then normalized using the expected Hi-C read by genomic distance. Furthermore, we kept only the top percentage of significant Hi-C measurements after the correction (see Supplementary Table 1). Cis maps (intrachromosomal) and trans maps (interchromosomal) were filtered separately to ensure that both types of interactions are represented properly. The filter threshold was chosen according to genome size and Hi-C map density. The above treatment aims at reducing the differences between data sets, before proceeding to a general, non-specific protocol of analysis.
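The two normalization steps can be illustrated with a generic sketch. The first function is a plain matrix-balancing loop in the spirit of iterative correction (each pass divides every entry by the relative coverage of its row and column); the second divides each cis entry by the mean of its diagonal, which is one simple way to express "expected read count by genomic distance". Both are assumptions about the general technique, not the authors' implementation, whose details are in their Supplementary Methods.

```python
def iterative_correction(matrix, n_iter=200):
    """Balance a symmetric contact matrix so that all row sums become equal."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        mean = sum(sums) / n
        # Relative coverage of each bin; empty bins are left untouched.
        bias = [s / mean if s > 0 else 1.0 for s in sums]
        m = [[m[i][j] / (bias[i] * bias[j]) for j in range(n)] for i in range(n)]
    return m


def normalize_by_distance(matrix):
    """Divide each entry of a cis map by the mean contact count at its bin separation."""
    n = len(matrix)
    out = [row[:] for row in matrix]
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected = sum(diagonal) / len(diagonal)
        if expected > 0:
            for i in range(n - d):
                out[i][i + d] = matrix[i][i + d] / expected
                out[i + d][i] = matrix[i + d][i] / expected
    return out


# Toy symmetric contact map for three bins of one chromosome.
toy_map = [[10.0, 4.0, 2.0],
           [4.0, 8.0, 1.0],
           [2.0, 1.0, 5.0]]
balanced = iterative_correction(toy_map)
print([round(sum(row), 6) for row in balanced])
```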

While all results in this work are in general agreement between organisms, some of the diversity between organisms (for example, different levels of correlation) may be attributed to the differences in the protocols, their execution and inherent biases, as well as the preparation of the data. For instance, the HS map was measured on cycling cells, while the data set for MM was measured on cells in the same phase of the cell cycle (G1-arrested cells) 16 .

Genome sequence

Fungal and plant genome sequences were obtained from NCBI (SC S288c strain and SP 972h strain, AT TAIR10), which include 5,123 protein-coding SP genes (mRNA-protein_coding), 5,888 SC genes (Protein Sequences) and 27,191 AT genes (see also Supplementary Table 2). We located all the HindIII restriction sites in SC and updated the coordinates of the SC Hi-C map. We used the NCBI protein tables for the ORF sequences. Since the Hi-C contact maps for mammals were based on the mm9/hg18 versions of the genomes, we used the UCSC table browser tool 53 to generate gene tables for HS hg18/hg19 genomes and MM mm9/mm10 genomes. In our analyses, for example, in HS, we used the set of genes that is shared by the two tables, hg18 and hg19. This enabled us to use updated gene sequences for most of the known protein-coding genes (see Supplementary Table 2). Genome sequences for hg19/mm10 were obtained from NCBI.

3D genomic distance

We utilized the Hi-C contact maps to construct graph/network representations of the spatial organization of the genome. The high resolution of the chosen contact maps allowed us to investigate the 3D structure in single protein-coding gene resolution by representing each gene as a node. In the case of mammals, each node represents all the possible products by alternative splicing of this gene. Binned chromosome interactions from the contact maps were transformed into gene-gene interactions. We mapped every gene to its closest Hi-C bin according to the distance between their centre coordinates. Each bin’s contacts with all others were assigned to its mapped genes.
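A minimal sketch of the centre-to-centre mapping described above, for a single chromosome: each gene is assigned to the bin whose centre is nearest to the gene's own centre. The bin size, coordinates and function name are illustrative assumptions.

```python
import bisect


def nearest_bin(gene_start, gene_end, bin_centres):
    """Index of the bin whose centre is closest to the gene's centre.

    bin_centres must be sorted in ascending order; ties go to the left bin.
    """
    centre = (gene_start + gene_end) / 2
    k = bisect.bisect_left(bin_centres, centre)
    candidates = [i for i in (k - 1, k) if 0 <= i < len(bin_centres)]
    return min(candidates, key=lambda i: abs(bin_centres[i] - centre))


# Hypothetical 10 kb bins: centres at 5 kb, 15 kb and 25 kb.
toy_centres = [5_000, 15_000, 25_000]
print(nearest_bin(12_000, 14_500, toy_centres))
print(nearest_bin(19_000, 22_000, toy_centres))
```

Several genes can map to the same bin; in that case they would all inherit that bin's contacts, as the text describes.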

We tried several criteria for mapping the data from Hi-C bins to genes in order to choose the least biased one, including: all overlapping bins per gene; maximum overlap of bin per gene (as in ref. 20); maximum overlap of gene per bin; and weighted mapping that is proportional to the overlap between bin and gene. We were able to reproduce our main results with all the aforementioned methods.

Since Hi-C maps were already filtered to include only the most significant interactions (see previous sections), we used binary graph edges (1/0) to depict interactions between genes. Chromosome backbone edges between adjacent genes on the same chromosome were added to this graph, so that all neighbouring genes are at distance 1 from each other. Graph distances between all pairs of genes were computed according to the shortest path between them and were measured in hops. This setting allowed us to work in single-gene resolution, compute the distance between any given pair of genes and incorporate both interchromosomal and intrachromosomal measurements (some of the previous studies used only one of the two kinds).
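The graph construction and hop distance can be sketched as follows. Genes are nodes, backbone edges join neighbours on the same chromosome, binary contact edges come from the filtered Hi-C map, and a breadth-first search returns the shortest path length in hops. Gene names, contacts and function names are invented for the example.

```python
from collections import deque


def build_graph(chromosomes, contacts):
    """chromosomes: {name: [genes in genomic order]}; contacts: iterable of gene pairs."""
    graph = {gene: set() for genes in chromosomes.values() for gene in genes}
    for genes in chromosomes.values():
        for a, b in zip(genes, genes[1:]):      # backbone edges between neighbours
            graph[a].add(b)
            graph[b].add(a)
    for a, b in contacts:                       # binary Hi-C contact edges
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hop_distance(graph, source, target):
    """Shortest path length in hops, or None if the genes are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


toy_chromosomes = {"chr1": ["g1", "g2", "g3", "g4"], "chr2": ["h1", "h2"]}
toy_contacts = [("g1", "g4"), ("g4", "h1")]
toy_graph = build_graph(toy_chromosomes, toy_contacts)
print(hop_distance(toy_graph, "g2", "h2"))
```

For all-pairs distances on a genome-sized graph one would run the search once per source node rather than once per pair.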

Codon usage frequency similarity

Codon usage frequency vectors were computed by counting all appearances $n_i$ of a codon $i$ in the ORF, and dividing by the total codon count:

$$\mathrm{CUF}_i = \frac{n_i}{\sum_{j} n_j}$$

It can be seen that this vector combines both the CUB and amino acid usage bias, because the frequency of each codon is normalized with respect to all other codons, not only synonymous codons for the same amino acid. We used the average frequency vector for genes with a number of alternatively spliced transcripts.
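A short sketch of the codon usage frequency vector for a single ORF, following the description above. Using all 64 codons (stop codons included) and silently skipping triplets with ambiguous bases are assumptions; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so that vectors from different genes are comparable.
CODONS = ["".join(triplet) for triplet in product("ACGT", repeat=3)]
VALID = set(CODONS)


def codon_usage_frequencies(orf):
    """Return the 64-dimensional vector of n_i / sum_j n_j for one ORF."""
    seq = orf.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3            # ignore a trailing partial codon
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(t for t in triplets if t in VALID)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codons in sequence")
    return [counts[c] / total for c in CODONS]


toy_orf = "ATGGCCGCCTAA"   # ATG GCC GCC TAA
vector = codon_usage_frequencies(toy_orf)
print(vector[CODONS.index("GCC")])
```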

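synCUFS frequency vectors were computed as follows:

$$\mathrm{synCUF}_i = \frac{n_i}{\sum_{j \in \mathrm{syn}(i)} n_j}$$

where the number of observed codons $n_i$ is normalized by the sum over $\mathrm{syn}(i)$, the set of all synonymous codons coding for the same amino acid or stop codon as codon $i$, rather than by all other codons.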

AAF vectors were computed as follows:

$$\mathrm{AAF}_i = \frac{n_i}{\sum_{j} n_j}$$

where $n_i$ is the number of counted occurrences of amino acid $i$ in the ORF.

The CUFS between genes was computed using the Endres–Schindelin metric 23 for probability distributions. Given the frequency vectors of a pair of genes $p$ and $q$, the CUF distance/similarity between them is given by:

$$d(p, q) = \sqrt{d_{KL}(p \,\|\, m) + d_{KL}(q \,\|\, m)}, \qquad m = \tfrac{1}{2}(p + q)$$

where $d_{KL}$ is the Kullback–Leibler divergence, a popular information gain measure that is non-symmetric and does not satisfy metric properties 54 . Its use in this context, however, satisfies all required properties for a metric. It also bears a similarity to the Jensen–Shannon divergence. AAFS and synCUFS were computed using the same metric.
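The Endres–Schindelin distance is the square root of the summed Kullback–Leibler divergences of the two vectors from their midpoint. The sketch below implements that definition for two frequency vectors; the base-2 logarithm is an assumption (a different base only rescales the distance), and the three-element vectors stand in for full codon frequency vectors.

```python
import math


def kl_divergence(p, m):
    """Kullback-Leibler divergence d_KL(p || m) in bits; zero entries of p contribute 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def endres_schindelin(p, q):
    """Square root of d_KL(p || m) + d_KL(q || m), where m is the midpoint of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    total = kl_divergence(p, m) + kl_divergence(q, m)
    return math.sqrt(max(0.0, total))   # guard against tiny negative rounding errors


p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(endres_schindelin(p, q))
```

Because the midpoint $m$ is non-zero wherever $p$ or $q$ is non-zero, the distance stays finite even for genes that use disjoint sets of codons, which is not true of the raw Kullback–Leibler divergence.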

CUB indices

We computed the CAI 42 , tAI 29 , bcENC 43 —which is an improved variant of the effective number of codons 55 , CDC 44 and the RCB 45 index according to the cited papers. The reference set for CAI was selected according to the available protein abundance data 56 (see also the Supplementary Methods), by taking the top 100 expressed genes. The background nucleotide composition for CDC was estimated from the entire coding sequence of the genome, while for bcENC and RCB it was estimated from the ORF of each gene separately.

PPI graph

We used a number of PPI databases 57,58,59,60,61,62 to construct an undirected PPI network for the five organisms, and used the shortest path on the graph to define the PPI graph distance between each pair of genes. Disconnected pairs were assigned with a finite scalar (255) to include them in the average graph distance calculation, so that the PPI distance value for a set of gene pairs ranges from 1 (adjacent neighbours set) to 255 (completely disconnected set). (See also the Supplementary Methods).

GO term distance

We used the full GO 63 annotations provided for the five organisms 64,65,66,67,68 , and mapped them onto the generic slim ontology definitions provided by GOC, except in the case of AT where the plant slim ontology definitions were used. The distance between a pair of GO terms was defined to be the sum of the distances of the two terms on the GO graph from their least common ancestor. The distance for a pair of genes was computed by averaging the GO term distance between all their terms in the biological process ontology.
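One way to compute such a distance on a small ontology is sketched below. Because GO is a directed acyclic graph in which a term can have several parents, the sketch takes the common ancestor that minimizes the summed number of steps from the two terms; that choice, the toy ontology and the function names are assumptions, not the authors' stated procedure.

```python
from collections import deque


def ancestor_steps(term, parents):
    """Fewest steps from term up to each of its ancestors (the term itself is at 0)."""
    steps = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in steps:
                steps[parent] = steps[current] + 1
                queue.append(parent)
    return steps


def go_term_distance(a, b, parents):
    """Sum of the steps from a and from b to their closest common ancestor."""
    steps_a, steps_b = ancestor_steps(a, parents), ancestor_steps(b, parents)
    common = steps_a.keys() & steps_b.keys()
    if not common:
        return None
    return min(steps_a[t] + steps_b[t] for t in common)


def gene_distance(terms_a, terms_b, parents):
    """Average term distance over all pairs of terms from two genes."""
    values = [go_term_distance(a, b, parents) for a in terms_a for b in terms_b]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Toy ontology: child -> list of parents (term names are placeholders).
toy_parents = {
    "translation": ["metabolism"],
    "metabolism": ["root"],
    "transport": ["root"],
}
print(gene_distance(["translation"], ["metabolism", "transport"], toy_parents))
```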

Other similarities

Distance for other measures, such as GC content and gene length, which are given as scalars for each gene, were computed as normalized distance:

Scalars given for different splice alternatives (such as GC and length) were averaged per gene before computing the distance.


Correlation was computed using a defined number of bins n according to the test of interest. Binning was conducted as follows. The measure in question, for example, CUFS, was computed for all gene pairs; then n bins of equal size of CUFS values were set, dividing all pairs. The mean CUFS and mean 3D distance were computed for each bin. Finally, Spearman’s rho was computed between all CUFS/3D distance bins. Supplementary Figure 5 presents the resultant correlation with CUFS/3D distance of different features for various bin sizes. The chosen number of bins for AT (n = 64 × 10^3) and mammals (n = 32 × 10^3) was larger than that for fungi (n = 2 × 10^3) to account for their larger genomes (measured in number of protein-coding genes, or nodes on the genomic graph).
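
The binning procedure can be sketched as follows. This sketch assumes that "bins of equal size" means bins holding equal numbers of gene pairs (equal-width bins are the other possible reading), and it computes Spearman's rho as the Pearson correlation of average ranks. All function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def binned_spearman(cufs, dist3d, n_bins):
    """Sort pairs by CUFS, split into equal-count bins, correlate the bin means."""
    pairs = sorted(zip(cufs, dist3d))
    size = len(pairs) // n_bins  # leftover pairs beyond n_bins * size are dropped here
    mean_cufs, mean_dist = [], []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size]
        mean_cufs.append(sum(c for c, _ in chunk) / size)
        mean_dist.append(sum(d for _, d in chunk) / size)
    return spearman(mean_cufs, mean_dist)
```

Averaging within bins before ranking is what removes the many ties that integer-valued 3D distances would otherwise produce.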

We preferred binning the pair of variables being tested for correlation according to the variable with the widest range of values (closest to being continuous) to improve statistical accuracy. When binning integer values, specifically the 3DGD, we found that the distribution of 3D distances led to numerous bins holding the same distance value. For this reason, the variable tested against 3D distance was the one defining the bins in all cases; when testing a variable against CUFS, bins were defined by CUFS, which is a continuous distance measure. In two cases, however, we binned the variables according to 3D distance (see the Supplementary Methods).

P value computation

Statistical significance of the results was verified against an empirical null model, the cyclic chromosome shift (Fig. 3a). We draw from this model by randomly shifting the location of all genes on their respective chromosomes. The underlying null hypothesis is that the co-localization of specific gene sets of interest is not driven by the chromosome spatial conformation. In practice, drawing from the model is done by shifting the labels of all nodes while leaving the edges unmodified. P values were calculated by drawing 1,000 samples (random genome configurations) from the model and estimating the distribution of correlation coefficients (Fig. 3b,c), according to:

Where 1<> is the indicator function, r_i is the random correlation coefficient obtained and r_exp the observed correlation coefficient in the experiment. The cyclic chromosome shift model we used, besides its inherent logic, is the most conservative of the ones we tested, including: a two-tailed t-test for Spearman’s correlation; degree-preserving rewiring of the graphs; random sampling of gene sets/gene pairs; and cyclic genome shift, which is a whole-genome cyclic shift, allowing genes to rotate and move between chromosomes freely.
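
The null model and the empirical P value can be sketched as below. The exact P value formula is not reproduced in this text, so the one-sided form used here (the fraction of random coefficients at least as large as the observed one) is an assumption, as are all function names and the toy genome.

```python
import random

def cyclic_shift(genes_on_chromosome, offset):
    """Rotate gene labels along one chromosome; positions (graph nodes and edges) stay fixed."""
    if not genes_on_chromosome:
        return []
    k = offset % len(genes_on_chromosome)
    if k == 0:
        return list(genes_on_chromosome)
    return genes_on_chromosome[-k:] + genes_on_chromosome[:-k]

def sample_null_genome(chromosomes, rng):
    """One random configuration: an independent random shift on every chromosome."""
    return {
        name: cyclic_shift(genes, rng.randrange(len(genes)) if genes else 0)
        for name, genes in chromosomes.items()
    }

def empirical_p_value(null_correlations, observed):
    """Fraction of null draws with a coefficient >= the observed one (assumed one-sided form)."""
    hits = sum(1 for r in null_correlations if r >= observed)
    return hits / len(null_correlations)

# Toy genome with two chromosomes (hypothetical gene labels).
genome = {"chr1": ["g1", "g2", "g3", "g4"], "chr2": ["g5", "g6"]}
shuffled = sample_null_genome(genome, random.Random(0))
```

Because genes only rotate within their own chromosome, linear gene order and chromosome membership are preserved, which is what makes this null model conservative.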

Evolution and conservation

For the fungal evolution results, we used the manually curated orthologues database at PomBase 69 , containing 3,367 orthologue families. For mammalian evolution, we used the MGI report of Human and Mouse Homology Classes sorted by HomoloGene ID 67 (file: HOM_MouseHuman Sequence.rpt) containing 15,832 orthologue families. We utilized the orthologue families to transform the CUFS/3D distance matrices, so that the transformed Co-CUFS for a pair of genes is the average CUFS between their corresponding orthologues in the co-organism. Given a distance matrix D^B in organism B, the orthologous-transformed matrix in organism A is given by:

where O_j is the set of orthologous genes in organism B corresponding to gene j in organism A.
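
The transformation described in words above (averaging the distance in organism B over the orthologue sets of a gene pair from organism A) can be sketched as follows. The data layout, a dictionary keyed by unordered gene pairs, and all names are assumptions for illustration.

```python
def co_distance(dist_b, orthologues, gene_i, gene_j):
    """Average organism-B distance over all orthologue pairs of (gene_i, gene_j)."""
    set_i = orthologues.get(gene_i, ())
    set_j = orthologues.get(gene_j, ())
    values = [
        dist_b[frozenset((u, v))]
        for u in set_i
        for v in set_j
        if frozenset((u, v)) in dist_b
    ]
    return sum(values) / len(values) if values else None

# Toy example: gene a1 has two orthologues in organism B, gene a2 has one.
orthologues = {"a1": ["b1", "b2"], "a2": ["b3"]}
dist_b = {
    frozenset(("b1", "b3")): 0.25,
    frozenset(("b2", "b3")): 0.75,
}
transformed = co_distance(dist_b, orthologues, "a1", "a2")
```

Genes without an identified orthologue yield no value here, which matches the restriction to genes with orthologues in both species.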

We then followed the correlation procedure, but considered only genes with identified orthologues in both species. The regular test consisted of computing the correlation of, for example, CUFS for orthologue sets of genes in organism X with the 3DGD in X. Because only a subset of the genes was used, the obtained correlation differed from that computed for all possible genes. The hybrid test consisted of computing the correlation of, for example, the transformed Co-CUFS matrix for organism Y with the 3DGD in organism X. The conservation of hybrid sets of CUFS versus CUFS and 3DGD versus 3DGD was computed in the same manner.

HindIII segment properties

For control purposes, we located all the possible HindIII segments (cut site AAGCTT) in the genomes and computed their length as well as segment GC content (in a window of 200 nt upstream of the cut site, as in ref. 6). We discarded HindIII segments larger than 100,000 nt. The average segment GC content/length was computed for each Hi-C bin. Nodes (genes) on the graph were then assigned with segment length/GC content according to the Hi-C bin they were assigned when constructing the 3D genomic graph. When testing for identical node pairs, we included the 5% of pairs with the closest property value (for example, segment GC content), and binned them according to CUFS using 5% of the number of bins to account for the reduction in the amount of data.
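
The segment properties can be sketched as below. The sketch assumes uppercase sequence strings, measures a segment from one site start to the next, and takes the GC window immediately upstream of the site start. These boundary conventions and all function names are assumptions, not the authors' exact definitions.

```python
SITE = "AAGCTT"  # HindIII recognition site, as given in the text

def hindiii_sites(sequence):
    """0-based start positions of every AAGCTT occurrence."""
    sites = []
    start = sequence.find(SITE)
    while start != -1:
        sites.append(start)
        start = sequence.find(SITE, start + 1)
    return sites

def segment_lengths(sites, max_length=100_000):
    """Distances between consecutive sites, discarding segments longer than max_length."""
    lengths = [b - a for a, b in zip(sites, sites[1:])]
    return [n for n in lengths if n <= max_length]

def upstream_gc(sequence, site, window=200):
    """GC fraction of up to `window` nt immediately upstream of a site."""
    region = sequence[max(0, site - window):site]
    if not region:
        return None
    return sum(base in "GC" for base in region) / len(region)

# Toy sequence with two sites.
toy_seq = "GGCC" + SITE + "ACGTACGT" + SITE
toy_sites = hindiii_sites(toy_seq)
```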

Partial correlations

We demonstrated that CUFS is strongly correlated with many other variables (Supplementary Fig. 1). In the partial correlations test, we computed the partial correlation for nine features of the graph nodes, each correlation given the other eight. To this end, all variables were binned according to the 3D distances so that they can be compared (using min-variance binning, see Supplementary Methods). We used Spearman’s correlation.

Results and Discussion

Expression, maturation, and subcellular localization of nev-tRNAs

We have previously identified nematode-specific novel tRNA genes, designated nev-tRNAs, e.g., nev-tRNA Gly (CCC) and nev-tRNA Ile (UAU), which contain V-arm structures of at least 15 nt and are solely charged with Leu instead of Gly or Ile in vitro [8, 16]. To obtain further evidence of the functionality of nev-tRNAs in cells, the following two characteristics were analyzed: (1) their maturation, with the addition of 3′ CCA and (2) their subcellular localization. In these experiments, tRNA Gly (UCC) and tRNA Ile (UAU), which are the cognate tRNAs of nev-tRNA Gly (CCC) and nev-tRNA Ile (UAU), were used as the positive controls to test for GGG and AUA codon ambiguity in nematode cells.

The addition of CCA to the 3′ end of the tRNA molecule is one of its most important posttranscriptional modifications, and is essential for various tRNA functionalities, including other processing, aminoacylation, and tRNA–ribosome interactions [20]. To determine the 3′ end sequences of the nev-tRNAs with reverse transcription PCR (RT–PCR), a set of template tRNAs was isolated from mixed stages of C. elegans (eggs, larval stages 1–4, and adults), and ligated with a 23-nt adaptor sequence at their 3′ ends. RT–PCR amplification was conducted with forward primers that annealed to a specific region on each tRNA (regions starting at positions 22 and 23 for common tRNA Gly and tRNA Ile , respectively; regions starting at positions 40 and 41 for nev-tRNA Gly and nev-tRNA Ile , respectively) and reverse primers that annealed to the 3′ adaptor region (Fig. 1A). Targeted regions of the predicted lengths were successfully amplified, except for nev-tRNA Gly (Fig. 1B). The amplification efficiency for nev-tRNA Gly was considerably lower than that for the other templates, but the amplified product was clearly detected with a second PCR analysis. The amplification efficiency for nev-tRNA Ile was also slightly lower than those for the normal tRNAs. Taken together with our previous studies [16], these data suggest that the abundance of the mature nev-tRNAs in the cells was low. The amplified products of the expected sizes were then subcloned and the nucleotide sequences at their 3′ ends were determined. Fig. 1C shows that not only the common tRNA Gly (UCC) and tRNA Ile (UAU) but also nev-tRNA Gly (CCC) and nev-tRNA Ile (UAU) matured normally, with the addition of CCA at their 3′ ends. These findings show that nev-tRNAs are processed to the functional form for translation, just like their cognate tRNAs, although the structural and biochemical properties of the nev-tRNAs differ from those of normal tRNAs.

(A) PCR scheme for the detection of the 3′ ends of mature tRNAs: nev-tRNA Gly (CCC) and nev-tRNA Ile (UAU) and their cognates, tRNA Gly (UCC) and tRNA Ile (UAU), respectively. Numbers indicate the nucleotide positions relative to the 5′ end of each tRNA. (B) RT–PCR amplification of the 3′ end of each tRNA. PCR products of the expected sizes are shown as red dots. (C) Nucleotide sequence chromatograms of the 3′ end region of each tRNA.

We next analyzed the subcellular localization of the nev-tRNAs to determine whether they are exported from the nucleus after posttranscriptional modification. The whole C. elegans worm was subjected to subcellular fractionation with differential centrifugation (see Materials and Methods). Fig. 2 (upper panel) shows the subcellular localization of the control RNAs: U6 small nuclear RNA (snU6) and U3 small nucleolar RNA (snoU3) were enriched in the nucleus (2.9-fold) relative to their levels in the cytoplasm, whereas tRNA iMet was enriched in the cytoplasm (2.8-fold) relative to its level in the nucleus, as previously reported [21, 22]. Under the same conditions, nev-tRNA Gly (CCC) and nev-tRNA Ile (UAU) were detected at higher levels (2.0-fold) in the cytoplasm than in the nucleus (Fig. 2, lower panel), suggesting that the nev-tRNAs are exported from the nucleus and might therefore be used in translation. This experiment also confirmed that normal tRNA Gly (UCC) and tRNA Ile (UAU) are exported from the nucleus. Moreover, we determined the anticodon sequences of approximately 30 clones of each nev-tRNA, both in the nucleus and cytoplasm, and found that no anticodon was changed to a leucine codon by an RNA editing event. These results support the possibility that nev-tRNAs compete with their cognate tRNAs during translation. It must be noted that it is still unclear whether nev-tRNA anticodons are changed by specific chemical modifications so that they can read leucine codons.

RNA was isolated from each fraction of C. elegans: whole cell (W), nuclear (N), or cytoplasmic (C). RT–PCR analysis was used to detect snU6 and snoU3 RNAs (nuclear markers), tRNA iMet (cytoplasmic marker), and four tRNAs (nev-tRNA Gly and nev-tRNA Ile , and their cognate tRNAs). 5S rRNA expression is shown as the loading control. Band densities were evaluated semiquantitatively with densitometry.

Analysis of amino acid misincorporation in the whole-cell proteome of C. elegans

Our previous studies have shown that nev-tRNA Gly (CCC) can be incorporated into ribosomes and used for protein synthesis in an insect cell-free protein expression system [16]. This finding is evidence that nev-tRNAs cause genetic code ambiguity, at least in vitro. Because nev-tRNAs are exported from the nucleus and might compete with their cognate tRNAs in C. elegans, we assumed that nev-tRNAs are involved in protein synthesis in vivo, creating genetic code ambiguity. To address this hypothesis, we performed a shotgun proteomic analysis of C. elegans using liquid chromatography–tandem MS (LC–MS/MS), and examined the kinds of protein molecules within the whole-cell proteome that contained misincorporated amino acids. High-resolution MS can directly monitor very low levels of minor protein isoforms on a large scale [23, 24]. In this experiment, we mainly focused on Gly-to-Leu (in which Gly at the GGG codon is replaced with Leu) and Gly-to-Ser (in which Gly at the GGG codon is replaced with Ser) misincorporations. Gly-to-Ser misincorporations were used as the negative control because nev-tRNA Gly (CCC) cannot be completely charged with Ser in vitro [16], suggesting that it does not cause Gly-to-Ser misincorporation. We did not look for Ile-to-Leu misincorporation (in which Ile at the AUA codon is replaced with Leu) because Leu and Ile are structural isomers with identical molecular weights, so the two residues are indistinguishable on MS.
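
The reason Ile-to-Leu cannot be monitored while Gly-to-Leu and Gly-to-Ser can is visible from the residue masses. The sketch below computes monoisotopic residue masses from elemental compositions (standard values); the function names are hypothetical and the calculation is illustrative, not part of the original analysis pipeline.

```python
# Monoisotopic masses of the elements (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of amino acid residues (free amino acid minus H2O).
RESIDUES = {
    "Gly": {"C": 2, "H": 3, "N": 1, "O": 1},
    "Ser": {"C": 3, "H": 5, "N": 1, "O": 2},
    "Leu": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Ile": {"C": 6, "H": 11, "N": 1, "O": 1},  # same formula as Leu: structural isomers
}

def residue_mass(name):
    """Monoisotopic mass of one residue."""
    return sum(MONOISOTOPIC[element] * count for element, count in RESIDUES[name].items())

def substitution_shift(original, replacement):
    """Peptide mass change caused by replacing one residue with another."""
    return residue_mass(replacement) - residue_mass(original)

gly_to_leu = substitution_shift("Gly", "Leu")  # about +56.06 Da
gly_to_ser = substitution_shift("Gly", "Ser")  # about +30.01 Da
ile_to_leu = substitution_shift("Ile", "Leu")  # exactly 0: invisible by mass alone
```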

For the whole-cell proteomic analysis, a protein mixture was extracted from mixed-stage C. elegans and fragmented into small peptides by digestion with site-specific enzymes. After the LC–MS/MS analysis of the resulting peptides, the data were examined with Mascot v2.4 (Matrix Science, London) to identify the amino acid misincorporations, using two different approaches: (a) an error-tolerant search and (b) an in-house database search ( Fig. 3A ). The error-tolerant search is one of the optional modes of the Mascot protein database search [25], in which the raw data are initially searched against a reference protein database, after which the MS/MS data that do not match the expected amino acid sequences of known proteins are checked against a database containing all possible amino acid misincorporations and posttranslational modifications. With the error-tolerant search, 295,216 nonredundant (unique) peptides were identified. The in-house database search was developed and optimized in this study to compare the raw data against modified protein databases containing only possible Gly-to-Leu or Gly-to-Ser misincorporations, with no initial search against a reference protein database. This search identified 12,719 and 12,502 unique peptides, respectively ( Fig. 3A , Step 1).

(A) Summary of the whole-cell proteome analysis of mixed-stage C. elegans. Values are the unique peptide counts at each step. Values in parentheses are the count of candidate peptides containing misincorporated Ser at the Gly (GGG) codon (negative control). (B) Boxplot of the confidence scores for the candidate peptides in Step 2. Significant differences were determined with Student’s two-sided t test. (C) Example of the validation of targeted proteomics. Extracted ion chromatograms of the candidate peptide and the synthetic peptide SPASLDDDIK (an internal standard) are shown. The candidate peptide ion was separated > 1.0 min earlier than the internal standard, indicating that the amino acid sequence of the candidate peptide was inconsistent with the sequence SPASLDDDIK.

After discarding the low-quality peptides, 75 (= 14 + 30 + 31) candidate Gly-to-Leu mutant peptides and 53 (= 6 + 33 + 14) candidate Gly-to-Ser mutant peptides were extracted ( Fig. 3A , Step 2). The mean Mascot confidence score for the Gly-to-Leu candidates was 20.3 ± 6.3, which did not differ significantly from that of the Gly-to-Ser candidates (p > 0.01) ( Fig. 3B ). The candidate misincorporations were then further screened by the manual curation of their MS/MS spectra and isotope ratios, and 17 (= 1 + 10 + 6) and seven (= 0 + 3 + 4) mutant peptides were finally obtained, respectively ( Fig. 3A , Step 3, and summarized in S1 Table). To confirm that these peptides had identical amino acid sequences to those predicted with Mascot, a targeted proteome analysis was performed using an internal standard (IS) ( Fig. 3A , Step 4). The IS was a synthesized peptide consisting of the same amino acid sequence as that identified with Mascot, in which one amino acid at the N- or C-terminus was labeled with a stable isotope (summarized in S2 Table). If the ions of both targeted peptides and the IS were detected at quite similar elution times with LC, indicating their almost equivalent chemical properties, peptide identification was deemed to be reliable. However, if their elution times differed by > 1.0 min, peptide identification was deemed to be unreliable. Validation with these criteria revealed that all the candidate misincorporations were false-positive Mascot identifications. One example is shown in Fig. 3C . This result means that no Gly-to-Leu mutant peptide was detectable, which was also true for the Gly-to-Ser negative control, suggesting that nev-tRNA Gly (CCC) does not cause GGG codon ambiguity in the whole-cell proteome of C. elegans. This was also supported by the finding that no Gly-to-Leu candidate had a significantly higher Mascot score than the Gly-to-Ser candidates ( Fig. 3B ).

To gain more insight into the frequencies and variations of the amino acid misincorporations for each codon, we estimated the entire 64 × 19 possible codon-to-amino acid errors using data obtained with the error-tolerant search. Note that only a proportion of the identifications, with high Mascot confidence scores (> 30), was selected for this analysis because false-positive Gly-to-Leu misincorporations had low Mascot confidence scores (< 30), as described above. When the relationship between the amino acids used in the whole proteome and the number of predicted misincorporations for each codon was investigated with Pearson’s correlation coefficient, a strong significant correlation (r = 0.917) was observed (Fig. 4A). For example, the number of predicted misincorporations at frequent codons, such as the Glu (GAA) and Asp (GAU) codons, was up to 478, whereas fewer misincorporations were predicted at the Gly (GGG) and Ile (AUA) codons (approximately 4%). Furthermore, the Gly residues at the GGG codon showed little tendency to be substituted, not only with Leu (described as ‘Xle’ in the figure) but also with other amino acids (Fig. 4B). Similarly, there was no specific variation in the predicted misincorporations at the AUA codon. These observations show that nev-tRNAs do not seem to be involved in mistranslation at the corresponding codons in whole cells of C. elegans. However, in a single regression analysis, a dot corresponding to the Glu (GAG) codon was located outside the 95% confidence interval (Fig. 4A). As shown in Fig. 4B, Glu residues at the GAG codon tend to be substituted with Met residues at high levels (7.3 × 10^-4). In bacterial, yeast, and mammalian cells, it has been reported that Met is misacylated to specific nonmethionyl tRNA families, such as tRNA Glu and tRNA Lys , and that these Met-misacylated tRNAs are used for protein synthesis during some cellular responses [26 ff.]. Although nev-tRNAs cannot decode the GAG codon because at least one base pair is mismatched, the common tRNA Glu (CUC) encoded in the C. elegans genome can decode it. Therefore, the high Glu-to-Met error rate in C. elegans suggests the involvement of tRNA Glu (CUC) misacylation in this phenomenon, as in bacterial, yeast, and mammalian cells.

(A) Scatterplot of the frequencies of amino acids contained in all nonredundant peptides identified with a normal database search (x-axis) versus the total number of predicted amino acid misincorporations (y-axis) for each codon. The black line in the center denotes the linear regression line. The outer, light blue lines denote the 95% confidence interval for an individual predicted value. The red and green dots correspond to the GGG and AUA codons, respectively. The dots located outside the 95% confidence interval are shown in gray. (B) Heat map indicating the degree of predicted amino acid misincorporation (error rate) for each codon. The error rate was predicted by calculating the abundance of misincorporated amino acids relative to the total number of amino acids contained in the whole proteome. The matrix plots in the Gly (GGG) and Ile (AUA) row and in the ‘Xle’ (i.e., Ile or Leu) column are boxed. The total numbers of predicted misincorporations for each codon are indicated as a bar chart.

Possible explanations of the lack of genetic code ambiguity in C. elegans

We considered two possible reasons why no Gly-to-Leu mutant peptides were detected in this study, even though the nev-tRNAs matured normally and were exported from the nucleus. First, it is possible that nev-tRNAs are excluded from the protein synthesis process by a translation quality control mechanism. In bacteria, one of the elongation factors, EF-Tu, selectively binds to the correct aminoacyl-tRNAs and delivers them into the A-site of the ribosome [3, 30, 31]. In human neural cells, if the translation process is stopped because a tRNA is mutated, one of the ribosome release factors, GTPBP2, interacts with the ribosome recycling protein Pelota, and releases the stalled ribosome [32]. Although it is unclear whether homologues of EF-Tu and GTPBP2 act in C. elegans, as has been reported in other species, these findings allow the possibility that the translational errors induced by mischarged nev-tRNAs might be prevented by such quality control systems.

Second, it is also possible that nev-tRNAs are used for protein synthesis in the cell, but that the frequency of amino acid misincorporations is below the level of MS detection. The MS-based method can directly measure a large number of amino acid misincorporations, down to a level of 0.01% (10^-4) [23, 24]. However, because the abundance of mature nev-tRNAs in the cell is very low and they compete with highly expressed cognate tRNAs (Fig. 2), the incorporation of nev-tRNAs into ribosomes might be a rare and limited event compared with the incorporation of their cognate tRNAs. In addition to the low abundance of nev-tRNAs, we noted the low usage of the codons with which nev-tRNAs are associated. For instance, the GGG codon to which nev-tRNA Gly (CCC) corresponds is the second rarest codon (0.44%) in C. elegans [16]. Therefore, we assume that even if nev-tRNAs participate in translation, the identification of amino acid misincorporations at the GGG codon is statistically more difficult than at other more frequent codons. This hypothesis is supported by the observation of more abundant misincorporations at the more frequent codons (Fig. 4A). Collectively, our data demonstrate that there is no mutant protein containing misincorporated Leu at “high” frequency in the whole-cell proteome, whereas it is still unknown whether such Leu residues are misincorporated into low-abundance proteins and/or some specific sites in proteins at low frequency.

To determine whether nev-tRNA-induced mistranslations can occur at low frequencies, an overexpressed single recombinant protein was analyzed with targeted proteomics. In this experiment, we overexpressed a green fluorescent protein (GFP)–LacZ protein and purified it to improve the detectable level of Gly-to-Leu misincorporation, because (i) the total 1284 codons of the GFP–LacZ mRNA contain 12 GGG codons (approximately 1% of the codons) and (ii) the purified samples for MS include a small number of proteins, mainly GFP–LacZ, resulting in low background noise. For this analysis, we constructed a transgenic strain expressing myo-3p::GFP-LacZ and extracted the protein mixture. After immunoprecipitation with an anti-GFP antibody, the purified GFP–LacZ protein was fragmented into small peptides by digestion with site-specific enzymes. The LC–MS/MS analysis was performed using two types of ISs for calibration, a synthetic peptide consisting of the same amino acid sequence as that in the database, and a synthetic peptide containing the Leu residue substituted for the Gly residue at the GGG codon (summarized in S3 Table). As shown in S4 Table, wild-type peptides containing the Gly residue at the GGG codon were detected at elution times almost identical to those of the ISs. In contrast, no aberrant peptide containing a misincorporated Leu residue at the GGG codon was detected. The fragmentation pattern in the mass spectrum of the identified peptide was consistent with that of the wild-type peptide rather than the aberrant peptide. One example is shown in S1 Fig. This result means that the Gly-to-Leu mutant peptides were not represented, even in the high-resolution targeted MS screen, suggesting that nev-tRNA Gly (CCC) is not incorporated into ribosomes at a detectable level.
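
The codon tally behind a figure such as "12 GGG codons out of 1284" can be sketched as an in-frame count. The toy ORF below is invented for illustration and is not the GFP–LacZ sequence; the function names are hypothetical.

```python
def codon_counts(orf):
    """Count in-frame codons of an ORF (frame 0; a trailing partial codon is ignored)."""
    counts = {}
    for i in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[i:i + 3]
        counts[codon] = counts.get(codon, 0) + 1
    return counts

def codon_fraction(orf, codon):
    """Share of in-frame codons equal to `codon`."""
    counts = codon_counts(orf)
    total = sum(counts.values())
    return counts.get(codon, 0) / total if total else 0.0

# Toy ORF: ATG GGG AAA GGG TGG TAA (6 codons, 2 of them GGG).
toy_orf = "ATGGGGAAAGGGTGGTAA"
ggg_in_frame = codon_counts(toy_orf).get("GGG", 0)

# The proportion quoted in the text: 12 GGG codons among 1284 codons.
reported_percent = round(12 / 1284 * 100, 2)
```

Counting in frame matters: a plain substring search would also pick up out-of-frame GGG triplets (for example, the one spanning ATG and GGG in the toy ORF), which are never read as glycine codons.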

Evolutionary implications of nev-tRNAs for the nematode genetic code

In this work, we have demonstrated that nev-tRNAs are weakly expressed, mature normally with the addition of the 3′ CCA, and are exported from the nucleus in C. elegans. However, no nev-tRNA-induced amino acid misincorporation was detected in the whole-cell proteome. The possible reasons include: (1) nev-tRNAs are not involved in translation or (2) nev-tRNAs participate in translation but at a very low frequency. Consequently, the nematode genetic code does not seem to be ambiguous, although its genome contains these deviant tRNAs, which decode an alternative code. Because sense codon reassignment is strictly limited during evolution [6–8], nematode cells might actively regulate errors in protein synthesis with specific translational quality control mechanisms. Our observations provide an example of the robustness of the genetic code during translation, ensuring cellular homeostasis.

In contrast, pseudo-tRNA genes typically have several mismatched base pairings because of the high evolutionary rate [14, 33], but nev-tRNA genes do not contain such mutations and form a perfect cloverleaf secondary structure. The copy numbers of nev-tRNA genes and their anticodon variants have increased during the evolution of the nematode taxon [16]. From this feature of their evolutionary conservation, we also assume that they play important, if unexpected, roles, especially in certain biological processes. One such possible role is in the protective stress response. In bacterial, yeast, and mammalian cells, the level of Met-misacylation increases during the immune response, as described above. Because Met residues protect proteins from reactive oxygen species (ROS)-mediated damage [34], increased numbers of Met residues in proteins constitute a response mechanism, protecting cells against oxidative stress [29]. In addition to this pathway, recent studies have reported other putative benefits of mistranslation under stress conditions. In Saccharomyces cerevisiae cells, tRNA-misacylation-dependent translation errors increase the ubiquitylation and aggregation of proteins, and enhance the expression of heat shock proteins and other stress proteins. Consequently, the cells can survive even lethal environmental conditions [6, 7, 35]. Although nev-tRNAs are weakly expressed under normal growth conditions, their expression may be enhanced under some stress conditions, causing the synthesis of mistranslated proteins and the upregulation of the stress response to better cope with stress.

Another possible role of nev-tRNAs is in the gain of novel protein functions through the production of mutant proteins. Although most mistranslated proteins will probably be deleterious or neutral in function, a minority of these proteins will acquire novel or altered functions arising from their chemical and/or structural changes, including new subcellular localization [36], antibiotic resistance [37], or phenotypic diversification [38]. Although our data suggest that whole nematode cells do not synthesize mutant proteins using nev-tRNAs, it is still possible that some cells or tissues do synthesize such novel functional mistranslated proteins. For instance, there are cell-specific physiological differences in the translational error rate in mice [39]. Further studies are required to clarify the extensive expression patterns of nev-tRNAs under various environmental conditions and in different cells and tissues, and to identify the cellular response during the induction of genetic code ambiguity by nev-tRNAs.

Overview of SINE evolution

The organism's interaction with SINEs (as well as with other mobile genetic elements) largely resembles host–parasite coevolution. The integration of new SINE copies often disturbs gene expression; on the other hand, they can serve as a source of genomic innovations and a factor of genome plasticity (Makalowski, 2000). Nevertheless, the organism tries to suppress SINE amplification using, for example, the APOBEC3-mediated system (Chiu et al., 2006; Hulme et al., 2007) or SINE DNA methylation (Rubin et al., 1994). As LINE RT is required for SINE amplification, LINE repression also protects the genome from SINE expansion. LINEs can be repressed through RNA interference or the APOBEC3 system, and the repression can be fixed by DNA methylation. The evolutionary dynamics of interactions between the organism and SINEs (as well as LINEs) resemble an arms race. At the extremes, overly aggressive SINEs (or LINEs) can destroy their host organism and are eliminated by selection; on the other hand, there are many examples of SINE family death (cessation of amplification). More commonly, ups and downs in the activity of particular SINEs or LINEs are observed. This can be exemplified by the evolutionary waves of genome expansion by B1 or Alu subfamilies (Quentin, 1989; Ohshima et al., 2003) or by the 100-fold decline in the Alu retroposition frequency in current humans relative to primates 40–50 MYA (Batzer and Deininger, 2002). Amazingly, some dead SINEs can be ‘reincarnated.’ For instance, after inactivation of a LINE partner, the replacement of the 3′-terminal region with that of another (active) LINE gives rise to a new active SINE family. A demonstrative example of this kind can be found in the wallaby genome, where a tRNA-CORE cassette consecutively replaced the 3′-terminal region and LINE partners (L2, L3, Bov-B, and L1; Figure 2). To a large extent, this and many other events in the evolution of SINEs are made possible by the huge number of their genomic copies, a fraction of which is transcribed even if their reverse transcription is impossible.

In contrast to other mobile genetic elements, SINEs emerged in evolution many times. For instance, at least 23 primary SINE families independently appeared in the evolution of placental mammals (currently, 51 mammalian SINE families have been described; Figure 4). This amazing property results, on the one hand, from their simple modular structure and the availability of the source modules (for example, tRNA or the 3′ end of a LINE) in the cell. Moreover, high variation in SINE structures suggests that there are no stringent requirements for their nucleotide sequences apart from several short conserved regions. On the other hand, the emergence and replication of SINEs depend on LINE RT, which is not well protected against processing foreign sequences. Interestingly, some modules and RTs are particularly favorable for SINE emergence. For instance, alanine tRNA CGC independently gave rise to three simple SINEs (ID in rodents, vic-1 in camels and DAS-I in armadillos; Borodulina and Kramerov, 2005). Likewise, SINE families mobilized by mammalian L1 are particularly abundant. At present, we have no clue what properties of alanine tRNA and L1 RT proved beneficial for SINE emergence and amplification.

The de novo emergence of SINEs in placental mammals. The mammalian tree corresponds to the TimeTree Knowledge Base (Hedges et al., 2006).

Further SINE evolution involves the complication of their structure by internal duplications, acquisition of new modules (such as CORE) and dimerization. Although simple SINEs can be highly prolific, the majority of successful SINEs are longer than 150 bp and have a more complex structure (Figure 5). It is worth mentioning one more property of SINE evolution, module exchange. Although such recombination occurs in other genetic elements, it is unusually frequent in SINEs, which provides extra flexibility to their evolution. In a sense, SINE dimerization can also be considered as a special case of module exchange.

Length distribution of SINE families (without tail; plotted for 125 elements).

Owing to de novo emergence of SINEs and module exchange/dimerization, large-scale evolution of SINEs cannot be presented as a common phylogenetic tree (although short periods of SINE evolution can), which distinguishes it from the evolution of genes and other mobile genetic elements presentable as a common bifurcating tree.

Mammals (placentals, marsupials and monotremes), reptiles, fishes and cephalopods have a large number of different active SINE families. Amazingly, they are absent from Drosophila species and chicken (although the chicken genome contains copies of inactive Ther-1, which amplified in the genomes of vertebrate ancestors); at the same time, their genomes have active LINEs. One can speculate that these LINEs lack some properties essential for SINE mobilization; it is also possible that de novo emergence of a SINE is a very rare event, and the odds are that it never occurred in certain genomes. Finally, SINEs could have emerged but failed to survive because of some properties of host genomes (for instance, the Drosophila genome is relatively small, which can point to mechanisms counteracting mobile element expansion). The rapid progress in comparative genomics of eukaryotes shows promise that this and other mysteries of SINE origin and evolution will be solved.


Transfer RNAs (tRNAs) are important molecules that are involved in the protein translation machinery and act as a bridge between the ribosome and the codons of the mRNA. The study of tRNA is advancing considerably in bacteria, plants, and animals. However, a detailed genomic study of cyanobacterial tRNA is lacking. Therefore, we conducted a study of cyanobacterial tRNAs from 61 species. The analysis revealed that cyanobacteria contain thirty-six to seventy-eight tRNA genes per genome, which encode 20 tRNA isotypes. The number of iso-acceptors (anti-codons) ranged from thirty-two to forty-three per genome. tRNA Ile with anti-codon AAU, GAU, or UAU was reported to be absent from the genomes of Gloeocapsa PCC 73106 and Xenococcus sp. PCC 7305. Instead, these genomes contained the anti-codon CAU, which is common to tRNA Met and tRNA Ile. The iso-acceptors ACA (tRNA Cys), ACC (tRNA Gly), AGA, ACU (tRNA Ser), AAA (tRNA Phe), AGG (tRNA Pro), AAC (tRNA Val), GCG (tRNA Arg), AUG (tRNA His), and AUC (tRNA Asp) were absent from the genomes of the cyanobacterial lineages studied so far. A few of the cyanobacterial species encode suppressor tRNAs, whereas none of the species were found to encode a selenocysteine iso-acceptor. Cyanobacterial species also encode a few putative novel tRNAs whose functions are yet to be elucidated.


A model for tRNA evolution

A model for evolution of the tRNA cloverleaf has been proposed and strongly supported using statistical tests [ 3 ]. Essentially, all predictions of the model have been verified for archaeal and bacterial tRNAs. The model is based on ligation of three 31 nt minihelices followed by two internal, symmetrical 9 nt deletions to yield a 75 nt cloverleaf core (1–75), with the attached discriminator base (76) and 3’-CCA (77–79). By contrast, historical tRNA numbering utilizes a 72 nt core, which is based on eukaryotic tRNAs with 3 nt deleted in the D loop relative to tRNA Pri . In cloverleaf evolution, one of the three ligated minihelices became the D loop, one the anticodon loop and one the T loop. The 9 nt deletions lie within ligated acceptor stem sequences, leaving two 5 nt relics of what were initially complementary acceptor stems surrounding the anticodon stem. The anticodon stem and loop and the T stem and loop are homologous, and obviously so, particularly for archaeal tRNAs, and homology is starkly evident from inspection of typical tRNA diagrams (i.e. of Pyrococcus tRNAs; Figure S9) [ 3 ].
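The length bookkeeping of this model can be checked with simple arithmetic. The following is a minimal sketch, not part of the cited work: the constant and function names are invented for illustration, the numbers are those quoted in the text, and the 7 nt acceptor stem length is the value given for minihelices elsewhere in the text.

```python
# Length bookkeeping for the three-minihelix cloverleaf model.
# Names are illustrative; only the numbers come from the text.
MINIHELIX_NT = 31      # length of each ligated minihelix
N_MINIHELICES = 3      # D loop, anticodon loop and T loop minihelices
DELETION_NT = 9        # length of each internal deletion
N_DELETIONS = 2        # two symmetrical deletions
ACCEPTOR_STEM_NT = 7   # acceptor stem length of a minihelix
DISCRIMINATOR_NT = 1   # discriminator base
CCA_NT = 3             # 3'-CCA


def cloverleaf_core_length():
    """Core length after ligating the minihelices and removing the deletions."""
    ligated = N_MINIHELICES * MINIHELIX_NT          # 93 nt
    return ligated - N_DELETIONS * DELETION_NT      # minus 18 nt


def acceptor_stem_relic_length():
    """Relic left when 9 nt are deleted from two ligated 7 nt acceptor stems."""
    return 2 * ACCEPTOR_STEM_NT - DELETION_NT


core = cloverleaf_core_length()
relic = acceptor_stem_relic_length()
full_length = core + DISCRIMINATOR_NT + CCA_NT
print(core, relic, full_length)  # 75 5 79
```

The sketch only confirms that the stated lengths are mutually consistent; it says nothing about the sequences involved.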

Two minihelix tRNA evolution models

In a competing two minihelix model for tRNA evolution, proposed by others [ 31–33 ], the cloverleaf sequence is essentially divided through the anticodon loop, and the halves are expected to be homologous, even though, in the cloverleaf, the halves are expected to be complementary. Because the anticodon stem and loop are bisected for this comparison, the two minihelix model implies that the anticodon loop and the T loop cannot be homologs, although they clearly are, both from inspection of archaeal tRNAs (Figure S9) and from statistical tests [ 3 ]. In the two minihelix model, the D loop and the T loop ought to be homologs, although they clearly are not (in any alignment register). By contrast, the tRNA evolution model utilized here is predictive and apparently accurate, and competing models are falsified. Identification of tRNA Pri based on the tRNA evolution model is highly predictive for the evolution of the genetic code (Figures 1–3; Figures S1–S8).

tRNA and rugged evolution

A tightly folded RNA such as the tRNA cloverleaf is subject to rugged evolution in which many or most substitutions are catastrophic for folding [ 34 , 35 ]. For instance, most substitutions in a tRNA stem are expected to require rescue by a complementary mutation (except for many C→U substitutions in stems, which allow G∼U pairing). In our model for tRNA evolution from tRNA Pri , very few substitutions (if any) are required to obtain a folded cloverleaf. By contrast, in a two minihelix model for tRNA evolution, many substitutions are necessary to obtain a cloverleaf. Because of rugged RNA evolution and the required number of compensating substitutions, a two minihelix model is untenable. Furthermore, a two minihelix model requires unimaginable convergent evolution of the T stem and loop and the anticodon stem and loop to apparent structural and sequence homology. Because cloverleaf tRNA is subject to rugged evolution [ 3 , 34 , 35 ] many disqualifying criticisms are generated for a two minihelix model. Other tRNA evolution models also appear to be inconsistent with rugged evolution of RNA [ 36 , 37 ].
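The point about stem substitutions can be illustrated with a toy enumeration. This is a minimal sketch under a simplifying assumption: a stem position counts as paired only if it forms a Watson-Crick pair or a G∼U wobble pair. The function names are hypothetical, and real folding depends on much more than single base pairs.

```python
# Allowed stem pairs under the simplifying assumption used here:
# Watson-Crick pairs plus the G~U wobble pair (in either orientation).
ALLOWED_PAIRS = {
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
    ("G", "U"), ("U", "G"),
}
BASES = "ACGU"


def is_paired(five_prime, three_prime):
    """True if the two bases can pair under the rules above."""
    return (five_prime, three_prime) in ALLOWED_PAIRS


def tolerated_substitutions(pair):
    """Single-base substitutions in a stem pair that leave it paired."""
    five, three = pair
    kept = []
    for new in BASES:
        # Substitute the 5' base of the pair.
        if new != five and is_paired(new, three):
            kept.append((f"{five}->{new}", (new, three)))
        # Substitute the 3' base of the pair.
        if new != three and is_paired(five, new):
            kept.append((f"{three}->{new}", (five, new)))
    return kept


# Of the six possible single substitutions in a G-C pair, only C->U
# keeps the position paired (as G~U); the rest need a compensating change.
print(tolerated_substitutions(("G", "C")))  # [('C->U', ('G', 'U'))]
```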

A root for the tRNA evolutionary tree

The model for tRNA evolution indicates a sequence for tRNA Pri [ 3 ], which is most similar to archaeal tRNA Gly , indicating that Gly may be the founding amino acid of the code (Figure 2) [ 6 , 7 ]. The polyglycine hypothesis is posited, that tRNA initially evolved to synthesize short chain polyglycine to stabilize protocells. Very rapidly, every permitted anticodon was initially assigned as tRNA Gly before reassignment to specify other amino acids (Figure 5). Cloverleaf tRNA and the genetic code appear to be prerequisites for cellular and DNA genome-based life, which originate at LUCA. In the RNA-protein world, genes were more independent than they subsequently became, in compact, streamlined and rapidly replicating DNA genomes encapsulated in cells. We propose, therefore, that colonies of independently replicating tRNA genes in an RNA-polymer world quickly diversified to include all permitted anticodon sequences, which, initially, encoded glycine (i.e. based on acceptor stem sequence, discriminator A (as in tRNA Pri and archaeal tRNA Gly (Figure 2)) and typical tRNA sequences (Figure S9)). Of course, specification of glycine attachment by tRNA Pri need not have been highly accurate. It appears that errors in tRNA charging drove code evolution [ 2 , 14 , 25 ].

Degeneracy and sectoring

We favor a simple stepwise model for evolution and sectoring of the genetic code (Figure 5). The model describes why the code specifies 20 amino acids and is degenerate. As we argue here, the initial genetic code probably consisted of 48 and not 64 permitted anticodons, because adenine in the wobble position of the anticodon loop is destabilizing and would be expected to interact awkwardly with mRNA [ 12 ]. Furthermore, adenine in the anticodon wobble position probably supports a genetic code that is overly inflexible during initial code evolution, because adenine too strongly specifies uridine in the mRNA wobble codon position. Because of early positive selection for ambiguity in reading the anticodon wobble position, the genetic code should be considered initially to be primarily a 2 nt code encoding at most 16 amino acids (or 15 amino acids + Ter (stop)) in a register of 3 nt. Discrimination using the wobble anticodon position is only achieved with difficulty and, because of the ambiguity of tRNA anticodon-mRNA codon interactions in the ribosome decoding center [ 38 ], recognition at the wobble anticodon base is not strongly constrained by Watson-Crick base pairing. Despite early selection for ambiguity in reading the mRNA wobble position, the tRNA anticodon wobble position was later innovated to add an additional ∼5–6 letters to the code (16 + 5 = 21 letters total, including stops).
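The counts quoted above follow from simple enumeration. This is a minimal sketch with illustrative variable names, assuming anticodons are written 5'→3' so that the first base is the wobble base.

```python
from itertools import product

BASES = "ACGU"

# All anticodons, written 5'->3'; index 0 is the wobble position.
all_anticodons = ["".join(p) for p in product(BASES, repeat=3)]

# Anticodons remaining once adenine is excluded from the wobble position.
permitted = [a for a in all_anticodons if a[0] != "A"]

# Ignoring the wobble base leaves the two discriminating positions,
# i.e. the "2 nt code" read in a 3 nt register.
two_nt_words = {a[1:] for a in all_anticodons}

print(len(all_anticodons), len(permitted), len(two_nt_words))  # 64 48 16

# Later use of the wobble position adds about five letters.
print(len(two_nt_words) + 5)  # 21
```

The enumeration shows only the combinatorics: 4 × 4 × 3 = 48 anticodons without wobble A, and 4 × 4 = 16 two-base words.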

Wobble pairing: the importance of being ambiguous

Negative selection against adenine in the anticodon wobble position indicates that tRNA-mRNA wobble A∼C pairing is negatively selected when A is the tRNA anticodon wobble base [ 17 ]. We note, however, that G∼U and U∼G wobble pairings are allowed. This raises the question of whether C∼A pairing might have been allowed, if C was the tRNA anticodon wobble base. Modifications of tRNA wobble C improve C∼A base pairing, including agmatidine (archaea), 2-lysidine (bacteria) and 5-formylcytidine (mitochondria, eukarya) [ 39 ]. Many tRNAs have a weak C∼A hydrogen bonding interaction between the 7 nt anticodon loop base position 1 (i.e. 2’-O-methyl-C (C = O or N)) and loop base position 7 (i.e. A (NH2)). From PDB 4TRA, it appears that the weak 1→7 C∼A interaction is modulated by Mg2+, and elevated Mg2+ is reported to induce translation errors [ 40 , 41 ]. During the early stages of code evolution, therefore, ambiguous wobble base pair interactions appear to have been positively selected. We posit that, for translation, a wobble tRNA base C (or modified C) may pair mRNA base A more efficiently than a wobble tRNA base A will pair mRNA base C, partly explaining the strong negative selection of A in the tRNA anticodon wobble position. It appears that tRNA anticodon wobble C is not as strongly negatively selected as wobble A. We note the possibility that tRNA anticodon wobble C modification to pair mRNA codon A may have occurred very early in evolution to compensate for an otherwise overly restrictive code. Also, there may be a selected preference for G and C over A and U during early evolution of the code. The genetic code initially evolved to be a ∼16 letter code before innovating the wobble position to expand to a 21 letter code.
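To make the pairing rules concrete, the sketch below lists which codons an anticodon would read if only Watson-Crick pairs and the G∼U and U∼G wobble pairs were permitted at the wobble position. This is a deliberately simplified assumption for illustration: it ignores base modifications and the C∼A pairing discussed above, and the table and function names are hypothetical.

```python
# mRNA wobble bases (third codon position) read by each unmodified tRNA
# wobble base, assuming Watson-Crick pairing plus G~U and U~G only.
READS = {
    "A": {"U"},
    "C": {"G"},
    "G": {"C", "U"},
    "U": {"A", "G"},
}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon):
    """Codons (5'->3') read by an anticodon (5'->3') under the rules above."""
    wobble, middle, last = anticodon
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    prefix = COMPLEMENT[last] + COMPLEMENT[middle]
    return sorted(prefix + third for third in READS[wobble])


print(codons_read("GCC"))  # ['GGC', 'GGU']
print(codons_read("UCC"))  # ['GGA', 'GGG']
print(codons_read("ACC"))  # ['GGU']
```

Under these simplified rules a wobble G or U reads two codons, whereas a wobble A reads only one, which is in line with the description of wobble A as overly restrictive.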

Covalent modifications of tRNAs are common. In Figure S10, archaeal tRNA modifications determined for Haloferax volcanii tRNAs from the Modomics database [ 39 ] are displayed on a Pyrococcus typical tRNA. In concept, tRNA modifications could be used as determinants for aaRS enzymes to discriminate different tRNAs (i.e. tRNA Phe in bacteria, which requires tRNA Phe modifications for accurate charging by PheRS) [ 42 ], although, to our knowledge, such a mechanism has not yet been clearly demonstrated for any archaeal tRNA. In archaea, many covalent modifications are found in the anticodon loop, particularly at loop positions 1 and 3 (wobble). Modifications in the anticodon loop may: 1) help stabilize the tight U turn structure; 2) affect anticodon readout; and/or 3) modify weak anticodon loop positions 1→7 interactions. Contacts between loop positions 1 and 7 affect loop dynamics and modify wobble position readout [ 22 , 23 ]. Modifications of the D loop, T loop and V loop may stabilize loop and stem conformations, D loop-T loop interactions and/or stability of the overall cloverleaf fold. Of course, for bacteria and eukaryotes, tRNA modifications allow expansions of the anticodon repertoire, as seen for the enzymatic conversion of wobble position adenine→inosine [ 12 , 13 ].

Cloverleaf tRNA as an evolutionary archetype

In ancient evolution from about 3.8 to 4 billion years ago, cloverleaf tRNA was the defining innovation that made possible the RNA-protein world and then cellular life [ 3 ]. Essentially, without cloverleaf tRNA, the genetic code was impossible, and the RNA-protein world and cellular life were, therefore, impossible. 17 nt microhelices and 31 nt minihelices (17 nt microhelices with 2 × 7 nt acceptor stems) may have supported polyglycine synthesis, but there is little evidence that much more complex products were possible based on minihelix adapters [ 3 ]. For one thing, from the cloverleaf tRNA Pri sequence, the 31 nt minihelix posited to have given rise to the D loop appears to have had glycine-specifying acceptor stems, indicating that, because at least two distinct minihelices (D loop and anticodon loop/T loop) appeared to have specified glycine, few products, if any, other than polyglycine were made.

In a minihelix world, the D loop minihelix could not have supported a 3 nt genetic code register, because the D loop minihelix cannot form a 7 nt U turn. By contrast, the minihelices that gave rise to the anticodon loop and the T loop form the tight 7 nt U turn loop. The anticodon loop and the T loop are homologous to each other and distinct in sequence from the D loop minihelix, except in the acceptor stems, which appear initially to be identical (GCG and CGC repeats) [ 3 ]. We posit, therefore, that polypeptide synthesis based on primitive minihelix adapters was chaotic, limited and inefficient.
