Information

Are the genes for transcriptional factors close to their targets in the genome?


Transcriptional factors (activators and repressors) are proteins which regulate transcription. Being proteins, they themselves are also made from expression of certain DNA sequences/genes.

For example, the lac operon repressor is coded for by the lacI gene. Now, in the case of the lac operon, the gene for the transcriptional factor is next to the promoter sequence. It is, in fact, a part of the lac operon. Similarly, the trpR gene is a part of the trp operon.

Is this always the case? Is the gene coding for the transcriptional factor always present next to (or close to) the transcription unit?

The two mentioned operons are only in prokaryotes. Could it be that this fact is valid for prokaryotes only, and not eukaryotes' DNA?


Is the gene coding for the transcriptional factor always present next to (or close to) the transcription unit?

Not necessarily. There are several counter-examples, even in prokaryotes. The set of operons regulated by a common factor is called a regulon. Although the target genes need not be proximal to the regulator, it has been observed for E. coli and B. subtilis that operons within a regulon are clustered in the genome (Zhang et al., 2012). In the same study, the authors note that small regulons have the regulator proximal to them (see the figure below).
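One way to make "proximal" concrete is to measure the distance between a regulator gene and each of its target operons. The sketch below assumes a circular bacterial chromosome and uses hypothetical coordinates and operon names; the genome length is only roughly the size of an E. coli genome and is chosen for illustration. It is not the method used by Zhang et al.

```python
def circular_distance(pos_a: int, pos_b: int, genome_length: int) -> int:
    """Shortest distance in bp between two positions on a circular chromosome."""
    direct = abs(pos_a - pos_b) % genome_length
    # On a circle, the path the other way round may be shorter.
    return min(direct, genome_length - direct)


def regulator_target_distances(regulator_pos, target_positions, genome_length):
    """Map each target name to its circular distance from the regulator gene."""
    return {
        name: circular_distance(regulator_pos, pos, genome_length)
        for name, pos in target_positions.items()
    }


# Hypothetical coordinates, for illustration only.
genome_length = 4_600_000
regulator_pos = 366_000
targets = {"operon_A": 365_000, "operon_B": 4_599_000, "operon_C": 2_300_000}

distances = regulator_target_distances(regulator_pos, targets, genome_length)
for name, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {dist} bp from the regulator")
```

A regulon with a proximal regulator would show small distances for most of its operons. Note that operon_B is close to the regulator only because the chromosome is circular.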


Summary

Brassinosteroids (BRs) are important regulators for plant growth and development. BRs signal to control the activities of the BES1 and BZR1 family transcription factors. The transcriptional network through which BES1 and BZR1 regulate a large number of target genes is mostly unknown. By combining chromatin immunoprecipitation coupled with Arabidopsis tiling arrays (ChIP-chip) and gene expression studies, we have identified 1609 putative BES1 target genes, 404 of which are regulated by BRs and/or in the gain-of-function bes1-D mutant. BES1 targets contribute to BR responses and interactions with other hormonal or light signaling pathways. Computational modeling of gene expression data using the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) reveals that BES1-targeted transcriptional factors form a gene regulatory network (GRN). Mutants of many genes in the network displayed defects in BR responses. Moreover, we found that BES1 functions to inhibit chloroplast development by repressing the expression of GLK1 and GLK2 transcription factors, confirming a hypothesis generated from the GRN. Our results thus provide a global view of BR regulated gene expression and a GRN that guides future studies in understanding BR-regulated plant growth.


Contents

A DNA transcription unit encoding for a protein may contain both a coding sequence, which will be translated into the protein, and regulatory sequences, which direct and regulate the synthesis of that protein. The regulatory sequence before ("upstream" from) the coding sequence is called the five prime untranslated region (5'UTR); the sequence after ("downstream" from) the coding sequence is called the three prime untranslated region (3'UTR). [3]

As opposed to DNA replication, transcription results in an RNA complement that includes the nucleotide uracil (U) in all instances where thymine (T) would have occurred in a DNA complement.

Only one of the two DNA strands serves as a template for transcription. The antisense strand of DNA is read by RNA polymerase from the 3' end to the 5' end during transcription (3' → 5'). The complementary RNA is created in the opposite direction, in the 5' → 3' direction, matching the sequence of the sense strand with the exception of switching uracil for thymine. This directionality is because RNA polymerase can only add nucleotides to the 3' end of the growing mRNA chain. This use of only the 3' → 5' DNA strand eliminates the need for the Okazaki fragments that are seen in DNA replication. [3] This also removes the need for an RNA primer to initiate RNA synthesis, as is the case in DNA replication.

The non-template (sense) strand of DNA is called the coding strand, because its sequence is the same as the newly created RNA transcript (except for the substitution of uracil for thymine). This is the strand that is used by convention when presenting a DNA sequence. [5]
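The relationship between the template strand, the coding strand and the RNA transcript can be checked with a few lines of code. This is a minimal sketch with a made-up nine-base sequence; the function names are illustrative and it ignores promoters, start sites and termination.

```python
# Base pairing used when RNA is synthesized against a DNA template:
# adenine in the template pairs with uracil in the RNA.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5.upper())


def transcribe_coding(coding_5_to_3: str) -> str:
    """Return the RNA implied by the coding strand: same sequence with U for T."""
    return coding_5_to_3.upper().replace("T", "U")


# Made-up example: the two strands are complementary and antiparallel.
coding_strand = "ATGGCCTAA"    # written 5'->3'
template_strand = "TACCGGATT"  # written 3'->5'

rna_from_template = transcribe_template(template_strand)
rna_from_coding = transcribe_coding(coding_strand)
print(rna_from_template)  # AUGGCCUAA
print(rna_from_template == rna_from_coding)  # True
```

Both routes give the same transcript, which is why the coding strand is the one conventionally shown when a DNA sequence is presented.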

Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA. As a result, transcription has a lower copying fidelity than DNA replication. [6]

Transcription is divided into initiation, promoter escape, elongation, and termination. [7]

Setting up for transcription

Enhancers, transcription factors, Mediator complex and DNA loops in mammalian transcription

Setting up for transcription in mammals is regulated by many cis-regulatory elements, including core promoter and promoter-proximal elements that are located near the transcription start sites of genes. Core promoters combined with general transcription factors are sufficient to direct transcription initiation, but generally have low basal activity. [8] Other important cis-regulatory modules are localized in DNA regions that are distant from the transcription start sites. These include enhancers, silencers, insulators and tethering elements. [9] Among this constellation of elements, enhancers and their associated transcription factors have a leading role in the initiation of gene transcription. [10] An enhancer localized in a DNA region distant from the promoter of a gene can have a very large effect on gene transcription, with some genes undergoing up to 100-fold increased transcription due to an activated enhancer. [11]

Enhancers are regions of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene transcription programs, most often by looping through long distances to come in physical proximity with the promoters of their target genes. [12] While there are hundreds of thousands of enhancer DNA regions, [13] for a particular type of tissue only specific enhancers are brought into proximity with the promoters that they regulate. In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to their target promoters. [11] Multiple enhancers, each often at tens or hundreds of thousands of nucleotides distant from their target genes, loop to their target gene promoters and can coordinate with each other to control transcription of their common target gene. [12]

The schematic illustration in this section shows an enhancer looping around to come into close physical proximity with the promoter of a target gene. The loop is stabilized by a dimer of a connector protein (e.g. dimer of CTCF or YY1), with one member of the dimer anchored to its binding motif on the enhancer and the other member anchored to its binding motif on the promoter (represented by the red zigzags in the illustration). [14] Several cell function specific transcription factors (there are about 1,600 transcription factors in a human cell [15] ) generally bind to specific motifs on an enhancer [16] and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene. Mediator (a complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (pol II) enzyme bound to the promoter. [17]

Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two enhancer RNAs (eRNAs) as illustrated in the Figure. [18] An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it and that activated transcription factor may then activate the enhancer to which it is bound (see small red star representing phosphorylation of transcription factor bound to enhancer in the illustration). [19] An activated enhancer begins transcription of its RNA before activating transcription of messenger RNA from its target gene. [20]

CpG island methylation and demethylation

Transcription regulation at about 60% of promoters is also controlled by methylation of cytosines within CpG dinucleotides (where 5’ cytosine is followed by 3’ guanine or CpG sites). 5-methylcytosine (5-mC) is a methylated form of the DNA base cytosine (see Figure). 5-mC is an epigenetic marker found predominantly within CpG sites. About 28 million CpG dinucleotides occur in the human genome. [21] In most tissues of mammals, on average, 70% to 80% of CpG cytosines are methylated (forming 5-methylCpG or 5-mCpG). [22] Methylated cytosines within 5’cytosine-guanine 3’ sequences often occur in groups, called CpG islands. About 60% of promoter sequences have a CpG island while only about 6% of enhancer sequences have a CpG island. [23] CpG islands constitute regulatory sequences, since if CpG islands are methylated in the promoter of a gene this can reduce or silence gene transcription. [24]
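To illustrate what a CpG-rich stretch looks like at the sequence level, the sketch below counts CpG dinucleotides and computes an observed/expected CpG ratio. The ratio is a commonly used heuristic for flagging candidate CpG islands and is not taken from the text above; the sequence and the function name are made up. Sequence alone says nothing about methylation status, which has to be measured experimentally.

```python
def cpg_stats(seq: str) -> dict:
    """Count CpG sites and compute simple composition statistics for one strand."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # A CpG site is a C immediately followed (5'->3') by a G.
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Observed/expected ratio: CpGs seen versus CpGs expected if C and G
    # were placed independently (heuristic, assumed here for illustration).
    if c_count and g_count:
        obs_exp = (cpg_count * length) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return {
        "length": length,
        "cpg_count": cpg_count,
        "gc_fraction": gc_fraction,
        "obs_exp_cpg": obs_exp,
    }


# Made-up 12-base sequence, for illustration only.
stats = cpg_stats("CGCGCGATATCG")
print(stats)
```

For this toy sequence the function reports four CpG sites and an observed/expected ratio of 3.0. Real CpG island calls use windows of hundreds of bases and thresholds that vary between tools.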

DNA methylation regulates gene transcription through interaction with methyl binding domain (MBD) proteins, such as MeCP2, MBD1 and MBD2. These MBD proteins bind most strongly to highly methylated CpG islands. [25] These MBD proteins have both a methyl-CpG-binding domain as well as a transcription repression domain. [25] They bind to methylated DNA and guide or direct protein complexes with chromatin remodeling and/or histone modifying activity to methylated CpG islands. MBD proteins generally repress local chromatin such as by catalyzing the introduction of repressive histone marks, or creating an overall repressive chromatin environment through nucleosome remodeling and chromatin reorganization. [25]

As noted in the previous section, transcription factors are proteins that bind to specific DNA sequences in order to regulate the expression of a gene. The binding sequence for a transcription factor in DNA is usually about 10 or 11 nucleotides long. As summarized in 2009, Vaquerizas et al. indicated there are approximately 1,400 different transcription factors encoded in the human genome by genes that constitute about 6% of all human protein encoding genes. [26] About 94% of transcription factor binding sites (TFBSs) that are associated with signal-responsive genes occur in enhancers while only about 6% of such TFBSs occur in promoters. [16]

EGR1 protein is a particular transcription factor that is important for regulation of methylation of CpG islands. An EGR1 transcription factor binding site is frequently located in enhancer or promoter sequences. [27] There are about 12,000 binding sites for EGR1 in the mammalian genome and about half of EGR1 binding sites are located in promoters and half in enhancers. [27] The binding of EGR1 to its target DNA binding site is insensitive to cytosine methylation in the DNA. [27]

While only small amounts of EGR1 transcription factor protein are detectable in cells that are un-stimulated, translation of the EGR1 gene into protein at one hour after stimulation is drastically elevated. [28] Expression of EGR1 transcription factor proteins, in various types of cells, can be stimulated by growth factors, neurotransmitters, hormones, stress and injury. [28] In the brain, when neurons are activated, EGR1 proteins are up-regulated and they bind to (recruit) the pre-existing TET1 enzymes which are highly expressed in neurons. TET enzymes can catalyse demethylation of 5-methylcytosine. When EGR1 transcription factors bring TET1 enzymes to EGR1 binding sites in promoters, the TET enzymes can demethylate the methylated CpG islands at those promoters. Upon demethylation, these promoters can then initiate transcription of their target genes. Hundreds of genes in neurons are differentially expressed after neuron activation through EGR1 recruitment of TET1 to methylated regulatory sequences in their promoters. [27]

The methylation of promoters is also altered in response to signals. The three mammalian DNA methyltransferases (DNMT1, DNMT3A, and DNMT3B) catalyze the addition of methyl groups to cytosines in DNA. While DNMT1 is a “maintenance” methyltransferase, DNMT3A and DNMT3B can carry out new methylations. There are also two splice protein isoforms produced from the DNMT3A gene: DNA methyltransferase proteins DNMT3A1 and DNMT3A2. [29]

The splice isoform DNMT3A2 behaves like the product of a classical immediate-early gene and, for instance, it is robustly and transiently produced after neuronal activation. [30] Where the DNA methyltransferase isoform DNMT3A2 binds and adds methyl groups to cytosines appears to be determined by histone post translational modifications. [31] [32] [33]

On the other hand, neural activation causes degradation of DNMT3A1 accompanied by reduced methylation of at least one evaluated targeted promoter. [34]

Initiation

Transcription begins with the binding of RNA polymerase, together with one or more general transcription factors, to a specific DNA sequence referred to as a "promoter" to form an RNA polymerase-promoter "closed complex". In the "closed complex" the promoter DNA is still fully double-stranded. [7]

RNA polymerase, assisted by one or more general transcription factors, then unwinds approximately 14 base pairs of DNA to form an RNA polymerase-promoter "open complex". In the "open complex" the promoter DNA is partly unwound and single-stranded. The exposed, single-stranded DNA is referred to as the "transcription bubble." [7]

RNA polymerase, assisted by one or more general transcription factors, then selects a transcription start site in the transcription bubble, binds to an initiating NTP and an extending NTP (or a short RNA primer and an extending NTP) complementary to the transcription start site sequence, and catalyzes bond formation to yield an initial RNA product. [7]

In bacteria, the RNA polymerase core enzyme consists of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. In bacteria, there is one general RNA transcription factor known as a sigma factor. The RNA polymerase core enzyme binds to the bacterial general transcription (sigma) factor to form the RNA polymerase holoenzyme, which then binds to a promoter. [7] (RNA polymerase is called a holoenzyme when the sigma subunit is attached to the core enzyme.) Unlike in eukaryotes, the initiating nucleotide of nascent bacterial mRNA is not capped with a modified guanine nucleotide. The initiating nucleotide of bacterial transcripts bears a 5′ triphosphate (5′-PPP), which can be used for genome-wide mapping of transcription initiation sites. [35]

In archaea and eukaryotes, RNA polymerase contains subunits homologous to each of the five RNA polymerase subunits in bacteria and also contains additional subunits. In archaea and eukaryotes, the functions of the bacterial general transcription factor sigma are performed by multiple general transcription factors that work together. [7] In archaea, there are three general transcription factors: TBP, TFB, and TFE. In eukaryotes, in RNA polymerase II-dependent transcription, there are six general transcription factors: TFIIA, TFIIB (an ortholog of archaeal TFB), TFIID (a multisubunit factor in which the key subunit, TBP, is an ortholog of archaeal TBP), TFIIE (an ortholog of archaeal TFE), TFIIF, and TFIIH. The TFIID is the first component to bind to DNA due to binding of TBP, while TFIIH is the last component to be recruited. In archaea and eukaryotes, the RNA polymerase-promoter closed complex is usually referred to as the "preinitiation complex." [36]

Transcription initiation is regulated by additional proteins, known as activators and repressors, and, in some cases, associated coactivators or corepressors, which modulate formation and function of the transcription initiation complex. [7]

Promoter escape

After the first bond is synthesized, the RNA polymerase must escape the promoter. During this time there is a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive initiation, and is common for both eukaryotes and prokaryotes. [37] Abortive initiation continues to occur until an RNA product of a threshold length of approximately 10 nucleotides is synthesized, at which point promoter escape occurs and a transcription elongation complex is formed.

Mechanistically, promoter escape occurs through DNA scrunching, providing the energy needed to break interactions between RNA polymerase holoenzyme and the promoter. [38]

In bacteria, it was historically thought that the sigma factor is definitely released after promoter clearance occurs. This theory had been known as the obligate release model. However, later data showed that upon and following promoter clearance, the sigma factor is released according to a stochastic model known as the stochastic release model. [39]

In eukaryotes, at an RNA polymerase II-dependent promoter, upon promoter clearance, TFIIH phosphorylates serine 5 on the carboxy terminal domain of RNA polymerase II, leading to the recruitment of capping enzyme (CE). [40] [41] The exact mechanism of how CE induces promoter clearance in eukaryotes is not yet known.

Elongation

One strand of the DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base pairing complementarity with the DNA template to create an RNA copy (which elongates during the traversal). Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand and newly formed RNA can also be used as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA molecule from 5' → 3', an exact copy of the coding strand (except that thymines are replaced with uracils, and the nucleotides are composed of a ribose (5-carbon) sugar where DNA has deoxyribose (one fewer oxygen atom) in its sugar-phosphate backbone). [ citation needed ]

mRNA transcription can involve multiple RNA polymerases on a single DNA template and multiple rounds of transcription (amplification of particular mRNA), so many mRNA molecules can be rapidly produced from a single copy of a gene. [ citation needed ] The characteristic elongation rates in prokaryotes and eukaryotes are about 10-100 nts/sec. [42] In eukaryotes, however, nucleosomes act as major barriers to transcribing polymerases during transcription elongation. [43] [44] In these organisms, the pausing induced by nucleosomes can be regulated by transcription elongation factors such as TFIIS. [44]
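The quoted elongation rates translate directly into a time per transcript. The short calculation below is a back-of-the-envelope sketch: the gene length is hypothetical, and it assumes a constant rate with no pausing, which the text notes is not realistic for nucleosome-covered templates.

```python
def transcription_time_seconds(gene_length_nt: int, rate_nt_per_s: float) -> float:
    """Time for one polymerase to transcribe a gene at a constant rate."""
    if rate_nt_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    return gene_length_nt / rate_nt_per_s


gene_length = 3_000  # hypothetical transcription unit length in nucleotides

slow = transcription_time_seconds(gene_length, 10)   # lower bound of 10 nt/s
fast = transcription_time_seconds(gene_length, 100)  # upper bound of 100 nt/s
print(f"{gene_length} nt takes between {fast:.0f} s and {slow:.0f} s")
```

Because several polymerases can travel along the same template at once, the time between finished transcripts can be much shorter than the time a single polymerase needs to cross the gene.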

Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure. [ citation needed ]

Termination

Bacteria use two different strategies for transcription termination – Rho-independent termination and Rho-dependent termination. In Rho-independent transcription termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C-rich hairpin loop followed by a run of Us. When the hairpin forms, the mechanical stress breaks the weak rU-dA bonds, now filling the DNA–RNA hybrid. This pulls the poly-U transcript out of the active site of the RNA polymerase, terminating transcription. In the "Rho-dependent" type of termination, a protein factor called "Rho" destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex. [45]

Transcription termination in eukaryotes is less well understood than in bacteria, but involves cleavage of the new transcript followed by template-independent addition of adenines at its new 3' end, in a process called polyadenylation. [46]


RESULTS

Genome-wide characterization of Foxa2 binding sites in mDA progenitors

To identify direct target genes of Foxa2, we chose a genome-wide approach using ChIP-seq (Barski et al., 2007; Johnson et al., 2007; Robertson et al., 2007). As sufficient numbers of mDA progenitors could not be obtained from mouse embryos at E10.5, we obtained mDA progenitors by in vitro differentiation of NestinLmx1a-transfected ES cells (Andersson et al., 2006b). We efficiently generated 60-80% pure mDA progenitors after 5 days of differentiation of NestinLmx1a-transfected ES cells in the presence of Fgf8 and Shh (supplementary material Fig. S1A,B) (Andersson et al., 2006b). These in vitro generated mDA progenitors robustly expressed Foxa2 at day 5 and differentiated further into Nurr1+ Th+ mature mDA neurons by day 8, as previously described (supplementary material Fig. S1A,B) (Andersson et al., 2006b). In addition, in vitro differentiated mDA progenitors have been shown to develop into bona fide mDA neurons that integrate and innervate the striatum of 6-hydroxydopamine lesioned neonatal rats when transplanted into the striatum (Friling et al., 2009), indicating that they have a similar functional potential as embryonic mDA progenitors.

Ten million NestinLmx1a-transfected ES cells were used to prepare chromatin and a Foxa2-specific antibody was used for ChIP (Mavromatakis et al., 2011). Following sequencing, peak calling was performed using MACS with a threshold of FDR<5%, resulting in a dataset of 9160 high confidence peaks. All Foxa2-bound regions (FBRs) were annotated to the genes that have their transcription start site (TSS) closest to the peak. Several known and validated FBRs in the CNS were included in this study, such as the Shh brain enhancer (Epstein et al., 1999), Foxa2 enhancer (Sasaki and Hogan, 1996), Foxa1 promoter (Peterson et al., 1997) and Ferd3l (Nato3) promoter (Mansour et al., 2011), but not the Nkx2.2 and Th promoters (Lin et al., 2009). As the latter two genes are not expressed in our in vitro ES cell differentiation system, these gene loci might be in a closed chromatin configuration and inaccessible for Foxa2 binding.
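Annotating a peak to the gene with the closest TSS can be sketched as a nearest-neighbour lookup in a sorted list. The example below is an illustration only and not the authors' pipeline: the gene names and coordinates are hypothetical, all positions are assumed to lie on one chromosome, and strand is ignored.

```python
from bisect import bisect_left


def nearest_tss(peak_summit: int, tss_sorted: list) -> tuple:
    """Return (gene, offset) for the TSS closest to a peak summit.

    tss_sorted is a list of (position, gene) tuples sorted by position.
    offset is summit minus TSS position, so it is negative when the peak
    lies at a lower coordinate than the TSS.
    """
    if not tss_sorted:
        raise ValueError("no TSS annotations supplied")
    positions = [pos for pos, _ in tss_sorted]
    i = bisect_left(positions, peak_summit)
    # Only the TSS just before and just after the summit can be the nearest.
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    pos, gene = min(candidates, key=lambda entry: abs(entry[0] - peak_summit))
    return gene, peak_summit - pos


# Hypothetical annotations on a single chromosome.
tss_list = [(1_000, "geneA"), (5_000, "geneB"), (12_000, "geneC")]

for summit in (500, 4_200, 20_000):
    print(summit, nearest_tss(summit, tss_list))
```

Nearest-TSS assignment is simple but can mislabel distal enhancers, which may regulate a gene other than the closest one.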

Validation of this dataset was performed by ChIP-qPCR on a selected subset of FBRs located in genes known to be expressed in the midbrain FP, including Lmx1a, Lmx1b (Andersson et al., 2006b), Msx1 (Andersson et al., 2006b), Ferd3l (Ono et al., 2010; Segev et al., 2001), Corin (Ono et al., 2007), Slit2 (Wang et al., 1999), Cobl (Gasca et al., 1995), Rfx3 (Baas et al., 2006), Bmp6 and Bmp7 (Kim et al., 2001), using chromatin from mouse E12.5 ventral midbrain that contains mostly FP tissue. Foxa2-dependent enrichment was confirmed in 17 out of 19 of these FBRs (Fig. 1A,B). We also confirmed Foxa2 binding in 16 out of 20 additional FBRs located in selected transcriptionally active genes (supplementary material Fig. S2). Together, these results indicate a high-quality dataset.

Chromatin immunoprecipitation of in vitro-differentiated murine mDA progenitors. (A) Peaks of Foxa2 enrichment identified by ChIP-seq are shown for selected mDA progenitor-expressed genes. The plots show the peak heights as numbers of sequenced reads. Arrows indicate peak locations called by MACS. (B) Independent ChIP-qPCR experiments were performed with chromatin prepared from E12.5 ventral midbrain tissue for validation of selected candidate targets, where enrichment is shown as percentage of input (*P<0.05). Multiple peaks within a gene region, corresponding to the list in supplementary material Table S1, are numbered E1 (for Element 1), and so on, from left to right. -ve1, Foxa2 open reading frame (Mavromatakis et al., 2011). Error bars indicate s.e.m.

Binding site analysis

The peaks in FBRs had an average width of 290 bp and an average height of 16 tags (Fig. 1A). De novo motif analysis of their sequences using MEME (Zhang et al., 2008) revealed, as expected, a substantial enrichment in Foxa2 motif sites (Fig. 2A). Of the peaks, 7298 (80%) contained at least one Foxa2 motif within 60 bp of the peak summit (Fig. 2A). To further characterize the distribution of the peaks, we compared their locations with UCSC RefSeq genes using CEAS (Shin et al., 2009) and found that 46.7% of the high confidence peaks were located within a gene region extending from the TSS to the 3′UTR downstream (Fig. 2B). In total, 44.2% of all peaks were located within an intron, 0.7% within an exon, 0.8% within the 5′UTR and 1% within the 3′UTR. The remaining Foxa2 peaks were in intergenic regions located further than 10 kb upstream or downstream of the annotated genes. Interestingly, Foxa2 binding was enriched in promoter regions, with 3.7% of peaks located within 1 kb and 11% within 10 kb upstream of the TSS when compared with the genome background (Fig. 2B). Foxa2 binding was not enriched in regions directly downstream of genes (Fig. 2B).
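The "motif within 60 bp of the peak summit" criterion can be expressed as a small interval test. The sketch below assumes that a motif counts if any part of it overlaps the window from 60 bp upstream to 60 bp downstream of the summit; that reading, the 8 bp motif length and the coordinates are assumptions for illustration, not details given in the text. The last lines check the reported proportion.

```python
def has_motif_near_summit(summit, motif_starts, motif_length, window=60):
    """True if any motif occurrence overlaps [summit - window, summit + window]."""
    lo, hi = summit - window, summit + window
    return any(
        start <= hi and start + motif_length - 1 >= lo
        for start in motif_starts
    )


def fraction_with_motif(peaks, motif_length, window=60):
    """Fraction of peaks, given as (summit, motif_starts), with a nearby motif."""
    if not peaks:
        return 0.0
    hits = sum(
        has_motif_near_summit(summit, starts, motif_length, window)
        for summit, starts in peaks
    )
    return hits / len(peaks)


# Hypothetical peaks: (summit position, start positions of motif matches).
example_peaks = [(1_000, [980]), (2_000, [2_100]), (3_000, [])]
example_fraction = fraction_with_motif(example_peaks, motif_length=8)
print(f"{example_fraction:.2f} of the example peaks have a motif near the summit")

# Consistency check on the figures reported above: 7298 of 9160 peaks.
reported_percent = 100 * 7298 / 9160
print(f"{reported_percent:.1f}%")  # 79.7%, reported as 80%
```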

Analyses of motifs and peak distribution in Foxa2-bound regions. (A) MEME identified de novo transcription factor binding motifs within 60 bp of the center of Foxa2 peaks. Genomatix results include: (1) the number of peaks that contained the transcription factor binding motif; (2) the Z-score of over-representation against the selected background (randomized sequences); and (3) the over-representation against the mouse genome, which is the fold factor of the number of Foxa2 motifs within the ChIP-seq dataset compared with an equally sized sample of the genome. (B) The distribution and enrichment relative to the genome background of Foxa2 peaks with respect to different gene features. The P-value is given in parentheses.

Foxa2 and Shh signaling have similar roles in ventral midbrain patterning (Blaess et al., 2006; Lin et al., 2009); however, we recently found that Foxa2 also attenuates Shh signaling by repressing the expression of its intracellular transducers Gli1, Gli2 and Gli3 (Mavromatakis et al., 2011). We therefore examined whether targets of Gli1 that were previously identified by ChIP-chip in neural progenitors (Vokes et al., 2007) are also bound by Foxa2. Foxa2 binding overlapped with ten out of 25 Gli1 binding regions, including those located in Ptch1, Nkx2.9, Gli1, Hhip, Shh and Prdx2, many of which are known to function as Gli-dependent enhancers (Sasaki et al., 1997; Dai et al., 1999; Santagati et al., 2003; Agren et al., 2004; Hallikas et al., 2006) (Fig. 3 and supplementary material Table S3). These results therefore suggest that Foxa2 and Gli1 co-regulate a number of common targets through nearby binding sites in the genome. In addition, Foxa2 also bound sites in Gli2 and Gli3 (Mavromatakis et al., 2011) that are not known to be bound by Gli1 (Fig. 3), suggesting the possibility that Foxa2 directly regulates all three Gli genes.

Foxa2 peaks in genes of the Shh pathway. (A) Foxa2 binding sites were observed in genomic regions previously identified as bound by Gli1 (Vokes et al., 2007), including in the genes Ptch1, Nkx2-9, Gli1, Hhip and Foxa2. Peaks found in the Shh and Gli3 genes are also included. (B) ChIP-qPCR experiments performed with chromatin from mouse E12.5 ventral midbrain tissue were used to validate the data (*P<0.05). Error bars indicate s.e.m. -ve1, Foxa2 open reading frame; -ve2, Gli-ve (an upstream region of Gli2) (Mavromatakis et al., 2011).

Foxa2 directly and positively regulates the expression of key determinants of the mDA lineage

In order to identify a core set of Foxa2 target genes that are functionally relevant for mDA neuronal progenitor and FP specification, we mapped FBRs to the nearest TSS of Ensembl genes. We then compared this set of Foxa2 candidate target genes with a list of 444 genes that are specifically expressed in the midbrain FP as generated by expression profiling of E10.5 mouse embryos (Gennet et al., 2011). Interestingly, 40% (178) of these FP genes were bound by Foxa2 (hypergeometric analysis, P=2.27×10^−40; Fig. 4C), suggesting that Foxa2 directly regulates many of the genes expressed in this tissue. We then surveyed the functional annotations of these putative Foxa2 targets using Gene Ontology (GO). A wide spectrum of biological processes was significantly enriched, including cell growth, embryonic development, cell differentiation and cell communication (Fig. 4B). Foxa2 targets also display a diverse range of molecular functions, including transcriptional regulator activity, receptor binding and calcium ion binding (Fig. 4D).
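The hypergeometric analysis asks how likely it is to see at least 178 Foxa2-bound genes among 444 FP genes if bound genes were drawn at random from all annotated genes. The sketch below implements the upper-tail probability with exact integer arithmetic. The total number of genes and the total number of Foxa2-bound genes are not stated in the text, so the values used here are hypothetical placeholders and the result is not expected to reproduce the reported P-value.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum exact integer counts of favourable draws, then divide once.
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


fp_genes = 444        # from the text
bound_fp_genes = 178  # from the text
total_genes = 20_000  # hypothetical placeholder, not from the text
bound_genes = 4_000   # hypothetical placeholder, not from the text

p_value = hypergeom_upper_tail(bound_fp_genes, total_genes, bound_genes, fp_genes)
print(f"fraction bound: {bound_fp_genes / fp_genes:.0%}")
print(f"P(X >= {bound_fp_genes}) = {p_value:.3g}")
```

With these placeholder values about 89 bound genes would be expected by chance, so observing 178 gives a very small tail probability. The exact figure depends entirely on the background counts chosen.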

Foxa2 directly regulates genes with multiple developmental functions. (A) The overlap between genes associated with Foxa2 binding events and genes expressed predominantly in the floor plate (FP) or the ventrolateral region (VL) of the midbrain of E10.5 mouse embryos. FP and VL data were obtained from Gennet et al. (Gennet et al., 2011). (B) Enrichment of Gene Ontology (GO) terms from GO Slim for biological processes associated with FP-specific Foxa2-bound genes. The number of target genes in each category is shown. (C) The number of Foxa2 target genes in each region with statistical analysis. (D) Enrichment of GO terms for molecular functions (GO Slim) associated with FP-specific Foxa2-bound genes. (E) qRT-PCR analysis of selected FP-specific Foxa2-bound genes using ventral midbrain tissue from E10.5 wild-type and Foxa1/2 cko embryos (*P<0.05). Error bars indicate s.e.m.

To further address the functional significance of Foxa2 binding, we analyzed by qRT-PCR the expression of some of these targets in the midbrain FP of wild-type and Foxa1/2 conditional mutant mouse embryos at E10.5. We selected targets involved in transcriptional regulation and receptor binding, two classes of genes with potentially important roles in regulating cell identity. The expression of nine out of 17 Foxa2 targets with transcriptional activity and nine out of 18 targets with receptor binding activity was severely reduced in En1CreFoxa1 flox/flox Foxa2 flox/flox (referred to as Foxa1/2 cko) mutants compared with control embryos (Fig. 4E). Analysis of this sample therefore suggests that approximately 50% of the candidate FP targets identified in Fig. 4A are regulated in Foxa1/2 cko embryos. Interestingly, five of the 17 bound and regulated targets identified in Fig. 4E, namely Lmx1a (Andersson et al., 2006b), Msx1 (Andersson et al., 2006b), Ferd3l (Ono et al., 2010), Foxa1 and Foxa2 (Andersson et al., 2006b), are known to have essential roles in regulating mDA progenitor specification, suggesting that Foxa2 directly controls the expression of essential determinants of the lineage. Lmx1b also has a role in regulating the specification and differentiation of mDA neurons (Andersson et al., 2006b) and is bound (supplementary material Table S1) and regulated (Lin et al., 2009) by Foxa2. Its omission from the FP-specific gene list is presumably because its expression is not restricted to the FP but is also present in lateral midbrain progenitors. Taken together, these results suggest that Foxa2 has multiple direct inputs into the molecular pathways regulating the specification and differentiation of the mDA progenitor lineage.

Foxa2 directly and positively regulates the expression of FP axon guidance molecules

In zebrafish embryos, Foxa2 is involved in maintenance of the differentiated character of the FP (Norton et al., 2005). In mouse embryos, Foxa2 is required for the induction of the FP through regulation of the secreted morphogen Shh (Ang and Rossant, 1994; Weinstein et al., 1994), but whether it also has later roles in FP function is largely unknown. The FP expresses secreted and transmembrane proteins that regulate the growth of commissural axons across the midline (reviewed by Brose and Tessier-Lavigne, 2000). FP-specific Foxa2 targets include genes that encode factors with axon guidance activities (reviewed by Kolodkin and Tessier-Lavigne, 2011), such as the Slit ligands Slit2 and Slit3, the TGFβ superfamily members Tgfβ2 and Bmp7, the cell adhesion molecules of the immunoglobulin superfamily Alcam, Chl1 and Nrcam, and Shh (supplementary material Table S2). qRT-PCR experiments indicate that the expression of seven out of nine (78%) of these genes is reduced in Foxa1/2 cko embryos (Fig. 4E), demonstrating a novel role for Foxa2 in promoting the expression of factors that control the guidance of axons across the ventral midline. Since the FP is transformed to more dorsal GABAergic progenitors, we cannot formally rule out the possibility that some of the changes in gene expression (Fig. 4E) in Foxa1/2 cko mutants at E10.5 are a consequence of cell fate transformation (Lin et al., 2009). Therefore, we also examined Slit1, Slit2 and Shh expression in NestinCre/+Foxa1 −/− Foxa2 flox/flox embryos, in which partial specification of the FP still occurs when Foxa1/2 are deleted at E10.5. Shh, Slit1 and Slit2 are not expressed in the FP of these mutant embryos. By contrast, these genes are still expressed in the FP of wild-type embryos at E12.5 (supplementary material Fig. S3).

Analysis of enhancer activity by transient lacZ transgenesis in mouse and chick embryos

To explore the contributions of FBRs to the cis-regulation of gene expression, we selected those associated with Lmx1a, Lmx1b and Corin, three genes expressed in mDA progenitors, for functional analysis by transgenesis. Foxa2 bound to three regions surrounding Lmx1a that are conserved in 15-18 species, referred to as E1, E2 and E3, and two conserved regions around Lmx1b termed E1 and E2 (supplementary material Table S4). These conserved elements were cloned into a minimal β-globin promoter-lacZ reporter plasmid and injected into fertilized mouse zygotes to assess their enhancer activity by X-gal staining at E10.5.

The three Lmx1a genomic regions, 522 bp E1, 881 bp E2 and 446 bp E3, promoted lacZ expression in transgenic embryos (Fig. 5A-C″). Lmx1a E1 transgenic embryos displayed β-galactosidase (β-gal) activity throughout the ventral neural tube, similar to the Foxa2 expression pattern (Fig. 5A-A″), whereas the pattern of β-gal activity in Lmx1a E2 and E3 transgenic embryos was restricted to the rostral part of the endogenous expression domain of Lmx1a (Andersson et al., 2006b), i.e. the caudal forebrain and anterior midbrain (Fig. 5B-C″). The identification of three enhancers for Lmx1a suggests that multiple cis-regulatory elements cooperate to specify the precise spatial expression pattern of this gene.

Successful prediction of FP enhancers from Foxa2 binding events in mDA progenitors. (A-E) E10.0-10.5 transgenic mouse embryos expressing lacZ from a minimal promoter and candidate enhancers bound by Foxa2 in the Lmx1a, Lmx1b and Corin genes. (A′-E″) Histological sections through the midbrain (A′-E′) and the spinal cord (A″-E″). (F) Whole-mount lacZ staining of transgenic embryos containing the Lmx1b E1 enhancer with a mutation in the Foxa2 binding site. (G) Multiple species alignment of the Lmx1b E1 enhancer using ClustalW, with the Foxa2 motif indicated by the red box and the nucleotide substitutions in the mutated version of the motif in red. Similar staining in midbrain tissue was observed in at least three transgenic embryos for each of the enhancers illustrated in this figure. Scale bars: 100 μm.

One Lmx1b element, the 206 bp E1 region, governed reporter gene expression in the FP of the midbrain and spinal cord and in future auditory tissue (Fig. 5D-D″), similar to the endogenous domains of Lmx1b expression (Guo et al., 2007). By contrast, none of the five transgenic embryos containing the Lmx1b E2 construct that were analyzed expressed β-gal at E10.5. We generated a mutation in the single Foxa2 binding site in the Lmx1b E1 enhancer to determine the requirement for Foxa2 binding for the activity of this element (Fig. 5G). The Lmx1b E1 Foxa2 mutant construct failed to drive β-gal expression in the FP throughout the anteroposterior axis of the embryo, in the heart and in the dorsal part of the otic vesicle, whereas ectopic β-gal expression in the somites was retained (Fig. 5F), indicating that expression in the FP is dependent on the Foxa2 site. However, Foxa2 is not expressed in the heart and otic vesicle, so the dependence of β-gal expression on the Foxa2 site in these structures suggests that other forkhead proteins with similar binding properties occupy these sites.

Corin is a cell surface protease that is specifically expressed in mDA progenitors in the FP (Ono et al., 2007). Two FBRs were found in the Corin gene, only one of which is conserved. This element, called E1, was able to promote β-gal expression in the FP in the midbrain and caudal regions except in rhombomere 1 (Fig. 5E-E″).

Taken together, these results demonstrate that at least some of the FBRs in the three genes analyzed have enhancer regulatory functions in the FP and that Foxa2 binding is required for enhancer activity in the case of the Lmx1b E1 element.

We also examined whether the FBRs in Lmx1a, Lmx1b and Corin function as enhancers in chick by electroporating the same constructs as above into the midbrain of stage 10 chick embryos and analyzing β-gal activity 48 hours later. In this assay, Lmx1b E1 and Corin E1 promoted β-gal expression in the FP, whereas none of the three FBRs of Lmx1a showed activity (Fig. 6A,B; data not shown). Two additional FBRs, in Bmp7 and Slit2, also promoted β-gal expression in the FP (Fig. 6A). These results indicate that some FBRs have enhancer functions in both mouse and chick.

Screening of candidate enhancer elements bound by Foxa2 in chick embryos. (A) The midbrain neuroepithelium of stage 10 chick embryos was electroporated with selected candidate enhancers in the Lmx1a, Lmx1b, Bmp7, Slit2 and Corin genes cloned in a lacZ reporter construct. A GFP construct was co-electroporated with the test constructs to assess electroporation efficiency and the empty lacZ reporter construct was used as control (not shown). Embryos harvested 48 hours post-electroporation are shown in lateral view (left) and dorsal view (right). Dashed lines indicate midbrain-hindbrain boundary. (B) Coronal section of a chick embryo electroporated with the Lmx1b E1 lacZ construct showing GFP + electroporated cells in most of the midbrain (green), whereas β-gal expression (red) is restricted to the ventral midbrain. The right-hand image is a merge.

Foxa2 inhibits alternative midbrain neuronal fates by direct binding to cell fate determinants

Our earlier functional studies showed that Foxa2 promotes the fate specification of mDA neurons in part by inhibiting the expression of a dorsal and ventrolateral midbrain determinant, Helt, which is essential for the development of most midbrain GABAergic neurons (Miyoshi et al., 2004; Guimera et al., 2006; Nakatani et al., 2007). Since Foxa2 has been shown to act as a transcriptional activator in transfection experiments in vitro, it was unclear whether Foxa2 directly represses Helt to suppress the GABAergic cell fate. Examination of the Foxa2 ChIP-seq data showed that Helt is indeed bound by Foxa2 and is therefore likely to be a direct target of Foxa2 repression (supplementary material Table S1). In addition, Foxa2 also represses a ventrolateral midbrain determinant, Nkx2.2, probably through direct binding (Lin et al., 2009).

Since Foxa2 directly represses ventrolateral determinants such as Helt and Nkx2.2 (Lin et al., 2009), we reasoned that intersecting our list of FBR-associated genes (FBGs) with a list of genes specifically expressed in ventrolateral midbrain might lead to the identification of additional genes that are normally repressed by Foxa2 in the FP. A list of 312 genes specifically expressed in the ventrolateral midbrain was recently published (Gennet et al., 2011). Of the FBGs identified in this study, 108 were present in this list (hypergeometric analysis, P=2.47×10^−19; Fig. 4E), including genes that have been shown to be repressed in Foxa1/2 cko mutants, such as Gli1, Gli2 and Gli3 (Mavromatakis et al., 2011), thus providing support for the initial rationale. Examination of the expression of five other transcription factors, namely Tle4, Otx1, Sox1, Tal2 and Pax3, showed that all but Pax3 exhibited expanded ventral midbrain expression in E10.5 Foxa1/2 cko embryos (Fig. 7). These results strongly suggest that Foxa2-bound genes that are expressed in the ventrolateral midbrain represent an enriched list of genes that are repressed by Foxa2 in the FP. Similar to genes that are positively regulated by Foxa2 in the FP, some of the repressed genes might be indirectly regulated by Foxa2, although we have found that Otx1 and Ptch1 are ectopically expressed in the FP of NestinCreFoxa1 −/− Foxa2 flox/flox mutants in the absence of Foxa1 and Foxa2 at E12.5 (supplementary material Fig. S3). These results are consistent with the idea that these genes are directly repressed by Foxa2.

Inhibition of ventrolateral determinants in the FP of the midbrain. (A-D′) Tle4, Otx1, Sox1 and Tal2 are ectopically expressed in the FP in addition to their normal expression in the dorsal and/or ventrolateral midbrain regions of mouse E10.5 Foxa1/2 cko mutants, as compared with control littermates. (E,E′) No change was observed for Pax3 expression in the dorsal midbrain between Foxa1/2 cko mutants and control littermates.


16.4 Eukaryotic Transcription Gene Regulation

As in prokaryotic cells, the transcription of genes in eukaryotes requires an RNA polymerase to bind to a sequence upstream of a gene to initiate transcription. However, unlike the prokaryotic enzyme, the eukaryotic RNA polymerase requires other proteins, or transcription factors, to facilitate transcription initiation. Transcription factors are proteins that bind to the promoter sequence and other regulatory sequences to control the transcription of the target gene. RNA polymerase by itself cannot initiate transcription in eukaryotic cells. Transcription factors must bind to the promoter region first and recruit RNA polymerase to the site for transcription to be established.

The Promoter and the Transcription Machinery

Genes are organized to make the control of gene expression easier. The promoter region is immediately upstream of the coding sequence. This region can be short (only a few nucleotides in length) or quite long (hundreds of nucleotides long). The longer the promoter, the more available space for proteins to bind. This also adds more control to the transcription process. The length of the promoter is gene-specific and can differ dramatically between genes. Consequently, the level of control of gene expression can also differ quite dramatically between genes. The purpose of the promoter is to bind transcription factors that control the initiation of transcription.

Within the promoter region, just upstream of the transcriptional start site, resides the TATA box. This box is simply a repeat of thymine and adenine dinucleotides (literally, TATA repeats). RNA polymerase binds to the transcription initiation complex, allowing transcription to occur. To initiate transcription, a transcription factor (TFIID) is the first to bind to the TATA box. Binding of TFIID recruits other transcription factors, including TFIIB, TFIIE, TFIIF, and TFIIH, to the TATA box. Once this complex is assembled, RNA polymerase can bind to its upstream sequence. When bound along with the transcription factors, RNA polymerase is phosphorylated. This releases part of the protein from the DNA to activate the transcription initiation complex and places RNA polymerase in the correct orientation to begin transcription. A DNA-bending protein brings the enhancer, which can be quite a distance from the gene, in contact with transcription factors and mediator proteins (Figure 16.9).

In addition to the general transcription factors, other transcription factors can bind to the promoter to regulate gene transcription. These transcription factors bind to the promoters of a specific set of genes. They are not general transcription factors that bind to every promoter complex, but are recruited to a specific sequence on the promoter of a specific gene. There are hundreds of transcription factors in a cell that each bind specifically to a particular DNA sequence motif. When transcription factors bind to the promoter just upstream of the encoded gene, the bound sequence is referred to as a cis-acting element, because it is on the same chromosome just next to the gene. The region that a particular transcription factor binds to is called the transcription factor binding site. Transcription factors respond to environmental stimuli that cause the proteins to find their binding sites and initiate transcription of the gene that is needed.
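To make the idea of a sequence-specific binding site concrete, here is a minimal sketch that scans a DNA sequence for exact matches to a motif on both strands. It is a simplification: real transcription factors tolerate variation in their sites, which exact matching ignores. The promoter sequence is invented for illustration, and `TATAAA` is used only as a familiar TATA-box-like example motif.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def find_sites(sequence, motif):
    """Return (position, strand) for each exact motif match on either strand.
    Positions are 0-based and refer to the given (forward) sequence."""
    sequence = sequence.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif and rc_motif != motif:
            hits.append((i, "-"))
    return hits

# Hypothetical promoter fragment, not taken from any real gene.
promoter = "GGCCTATAAAGGCGCGTTTATAGC"
print(find_sites(promoter, "TATAAA"))
```

A match on the "-" strand means the motif reads 5′ to 3′ on the opposite strand at that position.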

Enhancers and Transcription

In some eukaryotic genes, there are regions that help increase or enhance transcription. These regions, called enhancers, are not necessarily close to the genes they enhance. They can be located upstream of a gene, within the coding region of the gene, downstream of a gene, or may be thousands of nucleotides away.

Enhancer regions are binding sequences, or sites, for transcription factors. When a DNA-bending protein binds, the shape of the DNA changes (Figure 16.9). This shape change allows for the interaction of the activators bound to the enhancers with the transcription factors bound to the promoter region and the RNA polymerase. Whereas DNA is generally depicted as a straight line in two dimensions, it is actually a three-dimensional object. Therefore, a nucleotide sequence thousands of nucleotides away can fold over and interact with a specific promoter.

Turning Genes Off: Transcriptional Repressors

Like prokaryotic cells, eukaryotic cells also have mechanisms to prevent transcription. Transcriptional repressors can bind to promoter or enhancer regions and block transcription. Like the transcriptional activators, repressors respond to external stimuli to prevent the binding of activating transcription factors.


Transcription factors make the right contacts

Cell identity is shaped by a complex interplay between transcription factors, enhancers and genome organisation. A study now reveals a dynamic role for the transcription factor KLF4 in directing gene regulatory interactions during pluripotent cell reprogramming, demonstrating that transcription factors can function as chromatin organisers.

The discovery that adult cells can be reprogrammed into pluripotent stem cells (PSCs) with full developmental potential changed the way in which we think about how cell identity is controlled [1]. Reprogramming is achieved by overexpressing four transcription factors (OCT4, KLF4, SOX2 and cMYC) in cells that are nurtured by supportive culture conditions. Determining the molecular events that occur after transcription factor binding has far-reaching consequences for our understanding of how cell identity is altered in development and disease. Modulating these processes could lead to improvements in the production of therapeutic cell types. In this issue of Nature Cell Biology, Di Giammartino et al. add to this understanding by revealing that the reprogramming factor KLF4 directs regulatory elements to control their target gene expression through long-range three-dimensional chromatin looping [2].


Short-term TF binding, long-term expression

While computationally modeling how gene regulatory dynamics unfold over hours can reveal the topology of plant gene regulatory networks, very fine-scaled time-series experimental analyses of TF-target interactions on the order of minutes have also proven fruitful. Observing gene regulatory responses that occur within shorter time frames has captured TF-DNA dynamics that would otherwise have been missed over longer periods. Indeed, such approaches have uncovered new


16.4 Eukaryotic Transcriptional Gene Regulation

In this section, you will explore the following question:

Connection for AP® Courses

To start transcription, general transcription factors must first bind to a specific area on the DNA called the TATA box and then recruit RNA polymerase to that location. In addition, other areas on the DNA called enhancer regions help augment transcription. Transcription factors can bind to enhancer regions to increase or prevent transcription.

Information presented and the examples highlighted in the section support concepts outlined in Big Idea 3 of the AP® Biology Curriculum Framework. The Learning Objectives listed in the Curriculum Framework provide a transparent foundation for the AP® Biology course, an inquiry-based laboratory experience, instructional activities, and AP® exam questions. A Learning Objective merges required content with one or more of the seven Science Practices.

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes.
Enduring Understanding 3.B: Expression of genetic information involves cellular and molecular mechanisms.

Essential Knowledge 3.B.1: Gene regulation results in differential gene expression, leading to cell specialization.
Science Practice 7.1: The student can connect phenomena and models across spatial and temporal scales.
Learning Objective 3.18: The student is able to describe the connection between the regulation of gene expression and observed differences between different kinds of organisms.

Essential Knowledge 3.B.1: Gene regulation results in differential gene expression, leading to cell specialization.
Science Practice 7.1: The student can connect phenomena and models across spatial and temporal scales.
Learning Objective 3.19: The student is able to describe the connection between the regulation of gene expression and observed differences between individuals in a population.

Essential Knowledge 3.B.1: Gene regulation results in differential gene expression, leading to cell specialization.
Science Practice 6.2: The student can construct explanations of phenomena based on evidence produced through scientific practices.
Learning Objective 3.20: The student is able to explain how the regulation of gene expression is essential for the processes and structures that support efficient cell function.

Essential Knowledge 3.B.1: Gene regulation results in differential gene expression, leading to cell specialization.
Science Practice 1.4: The student can use representations and models to analyze situations or solve problems qualitatively and quantitatively.
Learning Objective 3.21: The student can use representations to describe how gene regulation influences cell products and function.

Teacher Support

Have students create a visual representation using colored paper that shows DNA transcription and the role of enhancers and repressors in transcription.

The Science Practice Challenge Questions contain additional test questions for this section that will help you prepare for the AP exam. These questions address the following standards:
[APLO 3.18]

The activity of transcription factors can regulate differential gene expression in cells, resulting in the development of different cell products and functions. For example, scientists have found that primary sexual characteristics are regulated by several genes (Figure 16.9). In the fruit fly Drosophila, the Sxl gene determines sex. This gene is expressed when the organism has two copies of the X chromosome. The gene product of Sxl binds to the mRNA of the tra gene and regulates its splicing. In the presence of Sxl, tra is spliced into its female form and influences the expression of dsx and fru to result in female sexual characteristics. In the absence of Sxl, tra is spliced into its male form and male sexual characteristics result.


Transcription

The production of an RNA copy from a DNA strand is called transcription, and is performed by RNA polymerases, which add one ribonucleotide at a time to a growing RNA strand according to the base-pairing rules of nucleotide complementarity. This RNA is complementary to the template 3′ → 5′ DNA strand, [7] with the exception that thymines (T) are replaced with uracils (U) in the RNA.
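The complementarity rule can be sketched in a few lines. This toy function takes the template strand written 3′ → 5′ and returns the RNA written 5′ → 3′, pairing A with U, T with A, G with C and C with G. The input sequence is made up for illustration.

```python
# Base pairing between a DNA template base and the RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (5'->3') complementary to a DNA template given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

# Hypothetical template strand, written 3'->5'.
rna = transcribe("TACGGATTC")
print(rna)
```

Because the template is read 3′ → 5′ while the RNA grows 5′ → 3′, no reversal is needed when the template is written in that orientation.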

In prokaryotes, transcription is carried out by a single type of RNA polymerase, which needs to bind a DNA sequence called a Pribnow box with the help of the sigma factor protein (σ factor) to start transcription. In eukaryotes, transcription is performed in the nucleus by three types of RNA polymerases, each of which needs a special DNA sequence called the promoter and a set of DNA-binding proteins—transcription factors—to initiate the process (see regulation of transcription below). RNA polymerase I is responsible for transcription of ribosomal RNA (rRNA) genes. RNA polymerase II (Pol II) transcribes all protein-coding genes but also some non-coding RNAs (e.g., snRNAs, snoRNAs or long non-coding RNAs). RNA polymerase III transcribes 5S rRNA, transfer RNA (tRNA) genes, and some small non-coding RNAs (e.g., 7SK). Transcription ends when the polymerase encounters a sequence called the terminator.

mRNA processing

While transcription of prokaryotic protein-coding genes creates messenger RNA (mRNA) that is ready for translation into protein, transcription of eukaryotic genes leaves a primary transcript of RNA (pre-RNA), which first has to undergo a series of modifications to become a mature RNA. The types and steps involved in the maturation processes vary between coding and non-coding pre-RNAs; that is, even though pre-RNA molecules for both mRNA and tRNA undergo splicing, the steps and machinery involved are different. [8] The processing of non-coding RNA is described below (non-coding RNA maturation).

The processing of pre-mRNA includes 5′ capping, a set of enzymatic reactions that add 7-methylguanosine (m7G) to the 5′ end of the pre-mRNA and thus protect the RNA from degradation by exonucleases. The m7G cap is then bound by the cap binding complex heterodimer (CBC20/CBC80), which aids in mRNA export to the cytoplasm and also protects the RNA from decapping.

Another modification is 3′ cleavage and polyadenylation. These occur if a polyadenylation signal sequence (5′-AAUAAA-3′) is present in the pre-mRNA, usually between the protein-coding sequence and the terminator. The pre-mRNA is first cleaved and then a series of approximately 200 adenines (A) is added to form the poly(A) tail, which protects the RNA from degradation. The poly(A) tail is bound by multiple poly(A)-binding proteins (PABPs) necessary for mRNA export and translation re-initiation. In the inverse process of deadenylation, poly(A) tails are shortened by the CCR4-Not 3′-5′ exonuclease, which often leads to full transcript decay.

A very important modification of eukaryotic pre-mRNA is RNA splicing. The majority of eukaryotic pre-mRNAs consist of alternating segments called exons and introns. During splicing, an RNA-protein catalytic complex known as the spliceosome catalyzes two transesterification reactions, which remove an intron, release it in the form of a lariat structure, and then splice the neighbouring exons together. In certain cases, some introns or exons can be either removed or retained in the mature mRNA. This so-called alternative splicing creates a series of different transcripts originating from a single gene. Because these transcripts can potentially be translated into different proteins, splicing extends the complexity of eukaryotic gene expression and the size of a species' proteome.
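The exon-joining step can be illustrated with a minimal Python sketch. It is a toy model only: the sequence, the exon coordinates, and the function name `splice` are invented for illustration, and the chemistry of the spliceosome is not modelled. Exons are written in upper case and introns in lower case so the result is easy to read.

```python
def splice(pre_mrna, exons, skip=()):
    """Join exon segments of a pre-mRNA, dropping everything in between.

    exons: list of (start, end) half-open index pairs in 5'->3' order.
    skip:  indices of exons to leave out (a toy model of exon skipping).
    """
    return "".join(
        pre_mrna[start:end]
        for i, (start, end) in enumerate(exons)
        if i not in skip
    )


# Hypothetical pre-mRNA: three exons (upper case) separated by two introns.
pre_mrna = "AUGGCC" + "guaaguuuuag" + "GAUUAC" + "guaagccccag" + "UUUUAA"
exons = [(0, 6), (17, 23), (34, 40)]

full = splice(pre_mrna, exons)             # all three exons joined
skipped = splice(pre_mrna, exons, skip={1})  # middle exon left out

print(full)
print(skipped)
```

Calling `splice` with different `skip` sets on the same pre-mRNA yields different mature transcripts, which is the essence of alternative splicing described above.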

Extensive RNA processing may be an evolutionary advantage made possible by the nucleus of eukaryotes. In prokaryotes, transcription and translation happen together, whilst in eukaryotes, the nuclear membrane separates the two processes, giving time for RNA processing to occur.

Non-coding RNA maturation

In most organisms non-coding genes (ncRNA) are transcribed as precursors that undergo further processing. In the case of ribosomal RNAs (rRNA), they are often transcribed as a pre-rRNA that contains one or more rRNAs. The pre-rRNA is cleaved and modified (2′-O-methylation and pseudouridine formation) at specific sites by approximately 150 different small nucleolus-restricted RNA species, called snoRNAs. SnoRNAs associate with proteins, forming snoRNPs. While the snoRNA part base-pairs with the target RNA and thus positions the modification at a precise site, the protein part performs the catalytic reaction. In eukaryotes, in particular, a snoRNP called RNase MRP cleaves the 45S pre-rRNA into the 28S, 5.8S, and 18S rRNAs. The rRNA and RNA processing factors form large aggregates called the nucleolus. [9]

In the case of transfer RNA (tRNA), for example, the 5′ sequence is removed by RNase P, [10] whereas the 3′ end is removed by the tRNase Z enzyme [11] and the non-templated 3′ CCA tail is added by a nucleotidyl transferase. [12] In the case of micro RNA (miRNA), miRNAs are first transcribed as primary transcripts or pri-miRNA with a cap and poly-A tail and processed to short, 70-nucleotide stem-loop structures known as pre-miRNA in the cell nucleus by the enzymes Drosha and Pasha. After being exported, it is then processed to mature miRNAs in the cytoplasm by interaction with the endonuclease Dicer, which also initiates the formation of the RNA-induced silencing complex (RISC), composed of the Argonaute protein.

Even snRNAs and snoRNAs themselves undergo a series of modifications before they become part of a functional RNP complex. This is done either in the nucleoplasm or in specialized compartments called Cajal bodies. Their bases are methylated or pseudouridylated by a group of small Cajal body-specific RNAs (scaRNAs), which are structurally similar to snoRNAs.

RNA export

In eukaryotes most mature RNA must be exported to the cytoplasm from the nucleus. While some RNAs function in the nucleus, many RNAs are transported through the nuclear pores and into the cytosol. [13] Export of RNAs requires association with specific proteins known as exportins. Specific exportin molecules are responsible for the export of a given RNA type. mRNA transport also requires the correct association with the Exon Junction Complex (EJC), which ensures that correct processing of the mRNA is completed before export. In some cases RNAs are additionally transported to a specific part of the cytoplasm, such as a synapse; they are then towed by motor proteins that bind through linker proteins to specific sequences (called "zipcodes") on the RNA. [14]

Translation

For some RNA (non-coding RNA) the mature RNA is the final gene product. [15] In the case of messenger RNA (mRNA) the RNA is an information carrier coding for the synthesis of one or more proteins. mRNA carrying a single protein sequence (common in eukaryotes) is monocistronic whilst mRNA carrying multiple protein sequences (common in prokaryotes) is known as polycistronic.

Every mRNA consists of three parts: a 5′ untranslated region (5′UTR), a protein-coding region or open reading frame (ORF), and a 3′ untranslated region (3′UTR). The coding region carries the information for protein synthesis, encoded by the genetic code as triplets. Each triplet of nucleotides of the coding region is called a codon and corresponds to a binding site complementary to an anticodon triplet in transfer RNA. Transfer RNAs with the same anticodon sequence always carry an identical type of amino acid. Amino acids are then chained together by the ribosome according to the order of triplets in the coding region. The ribosome helps transfer RNA to bind to messenger RNA, takes the amino acid from each transfer RNA, and makes a structure-less protein out of it. [16] [17] Each mRNA molecule is translated into many protein molecules.
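The codon-by-codon reading of the coding region can be sketched in Python as follows. This is a simplified illustration, not a model of the ribosome: it assumes the standard genetic code, treats the first AUG in the sequence as the start codon, and stops at the first in-frame stop codon. The example mRNA and the function name `translate` are made up for the sketch.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon ('*')."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no ORF in this simplified model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical mRNA: 5'UTR + ORF + 3'UTR (one-letter amino acid codes out).
mrna = "GGCACC" + "AUGGCCAUUGUAAUGGGCCGCUGA" + "AAUAAAGC"
peptide = translate(mrna)
print(peptide)
```

The three concatenated pieces of `mrna` mirror the three parts named above; only the middle piece contributes to the peptide.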

In prokaryotes translation generally occurs at the point of transcription (co-transcriptionally), often using a messenger RNA that is still in the process of being created. In eukaryotes translation can occur in a variety of regions of the cell depending on where the protein being written is supposed to be. Major locations are the cytoplasm for soluble cytoplasmic proteins and the membrane of the endoplasmic reticulum for proteins that are for export from the cell or insertion into a cell membrane. Proteins that are supposed to be expressed at the endoplasmic reticulum are recognised part-way through the translation process. This is governed by the signal recognition particle—a protein that binds to the ribosome and directs it to the endoplasmic reticulum when it finds a signal peptide on the growing (nascent) amino acid chain. [20]

Folding

Each protein exists as an unfolded polypeptide or random coil when translated from a sequence of mRNA into a linear chain of amino acids. This polypeptide lacks any developed three-dimensional structure (the left hand side of the neighboring figure). The polypeptide then folds into its characteristic and functional three-dimensional structure from a random coil. [21] Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein (the right hand side of the figure) known as the native state. The resulting three-dimensional structure is determined by the amino acid sequence (Anfinsen's dogma). [22]

The correct three-dimensional structure is essential to function, although some parts of functional proteins may remain unfolded. [23] Failure to fold into the intended shape usually produces inactive proteins with different properties including toxic prions. Several neurodegenerative and other diseases are believed to result from the accumulation of misfolded proteins. [24] Many allergies are caused by the folding of the proteins, for the immune system does not produce antibodies for certain protein structures. [25]

Enzymes called chaperones assist the newly formed protein to attain (fold into) the 3-dimensional structure it needs to function. [26] Similarly, RNA chaperones help RNAs attain their functional shapes. [27] Assisting protein folding is one of the main roles of the endoplasmic reticulum in eukaryotes.

Translocation

Secretory proteins of eukaryotes or prokaryotes must be translocated to enter the secretory pathway. Newly synthesized proteins are directed to the eukaryotic Sec61 or prokaryotic SecYEG translocation channel by signal peptides. The efficiency of protein secretion in eukaryotes is very dependent on the signal peptide which has been used. [28]

Protein transport

Many proteins are destined for parts of the cell other than the cytosol, and a wide range of signalling sequences (or signal peptides) are used to direct proteins to where they are supposed to be. In prokaryotes this is normally a simple process due to the limited compartmentalisation of the cell. However, in eukaryotes there is a great variety of different targeting processes to ensure the protein arrives at the correct organelle.

Not all proteins remain within the cell and many are exported, for example, digestive enzymes, hormones and extracellular matrix proteins. In eukaryotes the export pathway is well developed and the main mechanism for the export of these proteins is translocation to the endoplasmic reticulum, followed by transport via the Golgi apparatus. [29] [30]

Regulation of gene expression refers to the control of the amount and timing of appearance of the functional product of a gene. Control of expression is vital to allow a cell to produce the gene products it needs when it needs them; in turn, this gives cells the flexibility to adapt to a variable environment, external signals, damage to the cell, and other stimuli. More generally, gene regulation gives the cell control over all structure and function, and is the basis for cellular differentiation, morphogenesis and the versatility and adaptability of any organism.

Numerous terms are used to describe types of genes depending on how they are regulated; these include:

  • A constitutive gene is a gene that is transcribed continually as opposed to a facultative gene, which is only transcribed when needed.
  • A housekeeping gene is a gene that is required to maintain basic cellular function and so is typically expressed in all cell types of an organism. Examples include actin, GAPDH and ubiquitin. Some housekeeping genes are transcribed at a relatively constant rate and these genes can be used as a reference point in experiments to measure the expression rates of other genes.
  • A facultative gene is a gene only transcribed when needed as opposed to a constitutive gene.
  • An inducible gene is a gene whose expression is either responsive to environmental change or dependent on the position in the cell cycle.

Any step of gene expression may be modulated, from the DNA-RNA transcription step to post-translational modification of a protein. The stability of the final gene product, whether it is RNA or protein, also contributes to the expression level of the gene—an unstable product results in a low expression level. In general gene expression is regulated through changes [31] in the number and type of interactions between molecules [32] that collectively influence transcription of DNA [33] and translation of RNA. [34]

Some simple examples of where gene expression is important are:

  • Control of insulin expression so it gives a signal for blood glucose regulation.
  • X chromosome inactivation in female mammals to prevent an "overdose" of the genes it contains.
  • Cyclin expression levels control progression through the eukaryotic cell cycle.

Transcriptional regulation

Regulation of transcription can be broken down into three main routes of influence: genetic (direct interaction of a control factor with the gene), modulation (interaction of a control factor with the transcription machinery) and epigenetic (non-sequence changes in DNA structure that influence transcription).

Direct interaction with DNA is the simplest and the most direct method by which a protein changes transcription levels. Genes often have several protein binding sites around the coding region with the specific function of regulating transcription. There are many classes of regulatory DNA binding sites known as enhancers, insulators and silencers. The mechanisms for regulating transcription are very varied, from blocking key binding sites on the DNA for RNA polymerase to acting as an activator and promoting transcription by assisting RNA polymerase binding.

The activity of transcription factors is further modulated by intracellular signals that cause post-translational modification of the protein, including phosphorylation, acetylation, or glycosylation. These changes influence a transcription factor's ability to bind, directly or indirectly, to promoter DNA, to recruit RNA polymerase, or to favor elongation of a newly synthesized RNA molecule.

The nuclear membrane in eukaryotes allows further regulation of transcription factors by the duration of their presence in the nucleus, which is regulated by reversible changes in their structure and by binding of other proteins. [35] Environmental stimuli or endocrine signals [36] may cause modification of regulatory proteins [37] eliciting cascades of intracellular signals, [38] which result in regulation of gene expression.

More recently it has become apparent that there is a significant influence of non-DNA-sequence specific effects on transcription. These effects are referred to as epigenetic and involve the higher order structure of DNA, non-sequence specific DNA binding proteins and chemical modification of DNA. In general epigenetic effects alter the accessibility of DNA to proteins and so modulate transcription.

In eukaryotes the structure of chromatin, controlled by the histone code, regulates access to DNA with significant impacts on the expression of genes in euchromatin and heterochromatin areas.

Enhancers, transcription factors, Mediator complex and DNA loops in mammalian transcription

Gene expression in mammals is regulated by many cis-regulatory elements, including core promoters and promoter-proximal elements that are located near the transcription start sites of genes, upstream on the DNA (towards the 5' region of the sense strand). Other important cis-regulatory modules are localized in DNA regions that are distant from the transcription start sites. These include enhancers, silencers, insulators and tethering elements. [39] Among this constellation of elements, enhancers and their associated transcription factors have a leading role in the regulation of gene expression. [40]

Enhancers are regions of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene expression programs, most often by looping through long distances to come into physical proximity with the promoters of their target genes. [41] Multiple enhancers, each often tens or hundreds of thousands of nucleotides distant from their target genes, loop to their target gene promoters and coordinate with each other to control expression of their common target gene. [41]

The schematic illustration at the left shows an enhancer looping around to come into close physical proximity with the promoter of a target gene. The loop is stabilized by a dimer of a connector protein (e.g. dimer of CTCF or YY1), with one member of the dimer anchored to its binding motif on the enhancer and the other member anchored to its binding motif on the promoter (represented by the red zigzags in the illustration). [42] Several cell function specific transcription factors (there are about 1,600 transcription factors in a human cell [43] ) generally bind to specific motifs on an enhancer [44] and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, govern level of transcription of the target gene. Mediator (a complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (pol II) enzyme bound to the promoter. [45]

Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two eRNAs as illustrated in the Figure. [46] An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it and that activated transcription factor may then activate the enhancer to which it is bound (see small red star representing phosphorylation of transcription factor bound to enhancer in the illustration). [47] An activated enhancer begins transcription of its RNA before activating transcription of messenger RNA from its target gene. [48]

DNA methylation and demethylation in transcriptional regulation

DNA methylation is a widespread mechanism for epigenetic influence on gene expression and is seen in bacteria and eukaryotes and has roles in heritable transcription silencing and transcription regulation. Methylation most often occurs on a cytosine (see Figure). Methylation of cytosine primarily occurs in dinucleotide sequences where a cytosine is followed by a guanine, a CpG site. The number of CpG sites in the human genome is about 28 million. [49] Depending on the type of cell, about 70% of the CpG sites have a methylated cytosine. [50]
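As a small illustration of what "CpG site" means in sequence terms, the following Python sketch locates CpG dinucleotides on one strand and computes the fraction that is methylated. The sequence, the set of methylated positions, and both function names are hypothetical; real methylation data come from experiments such as bisulfite sequencing, which this sketch does not model.

```python
def cpg_positions(dna):
    """Return 0-based indices of every C that is immediately followed by G."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "CG"]


def methylated_fraction(dna, methylated):
    """Fraction of CpG cytosines whose index appears in `methylated`."""
    sites = cpg_positions(dna)
    if not sites:
        return 0.0
    return sum(1 for i in sites if i in methylated) / len(sites)


# Hypothetical sequence and hypothetical set of methylated cytosine indices.
dna = "ATCGGCGTACGCGA"
sites = cpg_positions(dna)
fraction = methylated_fraction(dna, methylated={2, 9, 11})

print(sites)
print(fraction)
```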

Methylation of cytosine in DNA has a major role in regulating gene expression. Methylation of CpGs in a promoter region of a gene usually represses gene transcription [51] while methylation of CpGs in the body of a gene increases expression. [52] TET enzymes play a central role in demethylation of methylated cytosines. Demethylation of CpGs in a gene promoter by TET enzyme activity increases transcription of the gene. [53]

Transcriptional regulation in learning and memory

In a rat, contextual fear conditioning (CFC) is a painful learning experience. Just one episode of CFC can result in a life-long fearful memory. [54] After an episode of CFC, cytosine methylation is altered in the promoter regions of about 9.17% of all genes in the hippocampus neuron DNA of a rat. [55] The hippocampus is where new memories are initially stored. After CFC about 500 genes have increased transcription (often due to demethylation of CpG sites in a promoter region) and about 1,000 genes have decreased transcription (often due to newly formed 5-methylcytosine at CpG sites in a promoter region). The pattern of induced and repressed genes within neurons appears to provide a molecular basis for forming the first transient memory of this training event in the hippocampus of the rat brain. [55]

In particular, the brain-derived neurotrophic factor gene (BDNF) is known as a "learning gene." [56] After CFC there was upregulation of BDNF gene expression, related to decreased CpG methylation of certain internal promoters of the gene, and this was correlated with learning. [56]

Transcriptional regulation in cancer

The majority of gene promoters contain a CpG island with numerous CpG sites. [57] When many of a gene's promoter CpG sites are methylated the gene becomes silenced. [58] Colorectal cancers typically have 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations. [59] However, transcriptional silencing may be of more importance than mutation in causing progression to cancer. For example, in colorectal cancers about 600 to 800 genes are transcriptionally silenced by CpG island methylation (see regulation of transcription in cancer). Transcriptional repression in cancer can also occur by other epigenetic mechanisms, such as altered expression of microRNAs. [60] In breast cancer, transcriptional repression of BRCA1 may occur more frequently by over-expressed microRNA-182 than by hypermethylation of the BRCA1 promoter (see Low expression of BRCA1 in breast and ovarian cancers).

Post-transcriptional regulation

In eukaryotes, where export of RNA is required before translation is possible, nuclear export is thought to provide additional control over gene expression. All transport in and out of the nucleus is via the nuclear pore and transport is controlled by a wide range of importin and exportin proteins.

Expression of a gene coding for a protein is only possible if the messenger RNA carrying the code survives long enough to be translated. In a typical cell, an RNA molecule is only stable if specifically protected from degradation. RNA degradation has particular importance in regulation of expression in eukaryotic cells where mRNA has to travel significant distances before being translated. In eukaryotes, RNA is stabilised by certain post-transcriptional modifications, particularly the 5′ cap and poly-adenylated tail.

Intentional degradation of mRNA is used not just as a defence mechanism from foreign RNA (normally from viruses) but also as a route of mRNA destabilisation. If an mRNA molecule has a complementary sequence to a small interfering RNA then it is targeted for destruction via the RNA interference pathway.

Three prime untranslated regions and microRNAs

Three prime untranslated regions (3′UTRs) of messenger RNAs (mRNAs) often contain regulatory sequences that post-transcriptionally influence gene expression. Such 3′-UTRs often contain both binding sites for microRNAs (miRNAs) as well as for regulatory proteins. By binding to specific sites within the 3′-UTR, miRNAs can decrease gene expression of various mRNAs by either inhibiting translation or directly causing degradation of the transcript. The 3′-UTR also may have silencer regions that bind repressor proteins that inhibit the expression of a mRNA.

The 3′-UTR often contains microRNA response elements (MREs). MREs are sequences to which miRNAs bind. These are prevalent motifs within 3′-UTRs. Among all regulatory motifs within the 3′-UTRs (e.g. including silencer regions), MREs make up about half of the motifs.
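A rough way to picture an MRE is as a stretch of the 3′-UTR that is complementary to part of a miRNA. The Python sketch below searches a 3′-UTR for perfect matches to the reverse complement of miRNA nucleotides 2 to 8 (commonly called the seed; that term and this matching rule are assumptions added here, not taken from the text above). Both sequences are invented, and real target prediction uses additional criteria.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """Return 0-based start indices in `utr` that pair with miRNA nt 2-8."""
    seed = mirna[1:8]                  # nucleotides 2-8 from the 5' end
    target = reverse_complement(seed)  # what the mRNA must contain to pair
    hits = []
    pos = utr.find(target)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(target, pos + 1)
    return hits


# Hypothetical miRNA and 3'UTR, written 5'->3'.
mirna = "ACGUGACCUAGGAUCCAUGCAA"
utr = "AAAA" + "GGUCACG" + "UUUU" + "GGUCACG" + "CC"
hits = seed_sites(mirna, utr)
print(hits)
```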

As of 2014, the miRBase web site, [61] an archive of miRNA sequences and annotations, listed 28,645 entries in 233 biologic species. Of these, 1,881 miRNAs were in annotated human miRNA loci. miRNAs were predicted to have an average of about four hundred target mRNAs (affecting expression of several hundred genes). [62] Friedman et al. [62] estimate that >45,000 miRNA target sites within human mRNA 3′UTRs are conserved above background levels, and >60% of human protein-coding genes have been under selective pressure to maintain pairing to miRNAs.

Direct experiments show that a single miRNA can reduce the stability of hundreds of unique mRNAs. [63] Other experiments show that a single miRNA may repress the production of hundreds of proteins, but that this repression often is relatively mild (less than 2-fold). [64] [65]

The effects of miRNA dysregulation of gene expression seem to be important in cancer. [66] For instance, in gastrointestinal cancers, nine miRNAs have been identified as epigenetically altered and effective in down regulating DNA repair enzymes. [67]

The effects of miRNA dysregulation of gene expression also seem to be important in neuropsychiatric disorders, such as schizophrenia, bipolar disorder, major depression, Parkinson's disease, Alzheimer's disease and autism spectrum disorders. [68] [69]

Translational regulation

Direct regulation of translation is less prevalent than control of transcription or mRNA stability but is occasionally used. Inhibition of protein translation is a major target for toxins and antibiotics, so they can kill a cell by overriding its normal gene expression control. Protein synthesis inhibitors include the antibiotic neomycin and the toxin ricin.

Post-translational modifications

Post-translational modifications (PTMs) are covalent modifications to proteins. Like RNA splicing, they help to significantly diversify the proteome. These modifications are usually catalyzed by enzymes. Additionally, processes like covalent additions to amino acid side chain residues can often be reversed by other enzymes. However, some, like the proteolytic cleavage of the protein backbone, are irreversible. [70]

PTMs play many important roles in the cell. [71] For example, phosphorylation is primarily involved in activating and deactivating proteins and in signaling pathways. [72] PTMs are involved in transcriptional regulation: an important function of acetylation and methylation is histone tail modification, which alters how accessible DNA is for transcription. [70] They can also be seen in the immune system, where glycosylation plays a key role. [73] One type of PTM can initiate another type of PTM, as can be seen in how ubiquitination tags proteins for degradation through proteolysis. [70] Proteolysis, other than being involved in breaking down proteins, is also important in activating and deactivating them, and in regulating biological processes such as DNA transcription and cell death. [74]

Measuring gene expression is an important part of many life sciences, as the ability to quantify the level at which a particular gene is expressed within a cell, tissue or organism can provide a lot of valuable information. For example, measuring gene expression can:

  • Identify viral infection of a cell (viral protein expression).
  • Determine an individual's susceptibility to cancer (oncogene expression).
  • Find if a bacterium is resistant to penicillin (beta-lactamase expression).

Similarly, the analysis of the location of protein expression is a powerful tool, and this can be done on an organismal or cellular scale. Investigation of localization is particularly important for the study of development in multicellular organisms and as an indicator of protein function in single cells. Ideally, measurement of expression is done by detecting the final gene product (for many genes, this is the protein); however, it is often easier to detect one of the precursors, typically mRNA, and to infer gene-expression levels from these measurements.

mRNA quantification

Levels of mRNA can be quantitatively measured by northern blotting, which provides size and sequence information about the mRNA molecules. A sample of RNA is separated on an agarose gel and hybridized to a radioactively labeled RNA probe that is complementary to the target sequence. The radiolabeled RNA is then detected by an autoradiograph. Because the use of radioactive reagents makes the procedure time consuming and potentially dangerous, alternative labeling and detection methods, such as digoxigenin and biotin chemistries, have been developed. Perceived disadvantages of Northern blotting are that large quantities of RNA are required and that quantification may not be completely accurate, as it involves measuring band strength in an image of a gel. On the other hand, the additional mRNA size information from the Northern blot allows the discrimination of alternately spliced transcripts.

Another approach for measuring mRNA abundance is RT-qPCR. In this technique, reverse transcription is followed by quantitative PCR. Reverse transcription first generates a DNA template from the mRNA; this single-stranded template is called cDNA. The cDNA template is then amplified in the quantitative step, during which the fluorescence emitted by labeled hybridization probes or intercalating dyes changes as the DNA amplification process progresses. With a carefully constructed standard curve, qPCR can produce an absolute measurement of the number of copies of original mRNA, typically in units of copies per nanolitre of homogenized tissue or copies per cell. qPCR is very sensitive (detection of a single mRNA molecule is theoretically possible), but can be expensive depending on the type of reporter used; fluorescently labeled oligonucleotide probes are more expensive than non-specific intercalating fluorescent dyes.
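How a standard curve turns a qPCR reading into an absolute copy number can be sketched in Python. The sketch assumes the usual linear relation between the quantification cycle (Cq, a term not used in the text above) and the base-10 logarithm of the starting copy number; the dilution series, the Cq values, and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical dilution series: known copy numbers and their measured Cq.
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_cq = [30.0, 26.68, 23.36, 20.04, 16.72]

slope, intercept = fit_line(
    [math.log10(c) for c in standard_copies], standard_cq
)


def copies_from_cq(cq):
    """Invert the standard curve: Cq -> estimated starting copy number."""
    return 10 ** ((cq - intercept) / slope)


# Amplification efficiency implied by the slope (1.0 = doubling per cycle).
efficiency = 10 ** (-1 / slope) - 1

unknown = copies_from_cq(25.02)
print(round(slope, 2), round(efficiency, 3), round(unknown))
```

A sample is only quantified reliably if its Cq falls within the range covered by the standards, which is one reason the curve has to be constructed carefully.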

For expression profiling, or high-throughput analysis of many genes within a sample, quantitative PCR may be performed for hundreds of genes simultaneously in the case of low-density arrays. A second approach is the hybridization microarray. A single array or "chip" may contain probes to determine transcript levels for every known gene in the genome of one or more organisms. Alternatively, "tag based" technologies like Serial analysis of gene expression (SAGE) and RNA-Seq, which can provide a relative measure of the cellular concentration of different mRNAs, can be used. An advantage of tag-based methods is the "open architecture", allowing for the exact measurement of any transcript, with a known or unknown sequence. Next-generation sequencing (NGS) such as RNA-Seq is another approach, producing vast quantities of sequence data that can be matched to a reference genome. Although NGS is comparatively time-consuming, expensive, and resource-intensive, it can identify single-nucleotide polymorphisms, splice-variants, and novel genes, and can also be used to profile expression in organisms for which little or no sequence information is available.

RNA profiles in Wikipedia

Profiles like these are found for almost all proteins listed in Wikipedia. They are generated by organizations such as the Genomics Institute of the Novartis Research Foundation and the European Bioinformatics Institute. Additional information can be found by searching their databases (for an example of the GLUT4 transporter pictured here, see citation). [75] These profiles indicate the level of DNA expression (and hence RNA produced) of a certain protein in a certain tissue, and are color-coded accordingly in the images located in the Protein Box on the right side of each Wikipedia page.

Protein quantification

For genes encoding proteins, the expression level can be directly assessed by a number of methods with some clear analogies to the techniques for mRNA quantification.

One of the most commonly used methods is to perform a Western blot against the protein of interest. [76] This gives information on the size of the protein in addition to its identity. A sample (often cellular lysate) is separated on a polyacrylamide gel, transferred to a membrane and then probed with an antibody to the protein of interest. The antibody can either be conjugated to a fluorophore or to horseradish peroxidase for imaging and/or quantification. The gel-based nature of this assay makes quantification less accurate, but it has the advantage of being able to identify later modifications to the protein, for example proteolysis or ubiquitination, from changes in size.

mRNA-protein correlation

Quantification of protein and mRNA permits a correlation of the two levels. The question of how well protein levels correlate with their corresponding transcript levels is highly debated and depends on multiple factors. Regulation on each step of gene expression can impact the correlation, as shown for regulation of translation [19] or protein stability. [77] Post-translational factors, such as protein transport in highly polar cells, [78] can influence the measured mRNA-protein correlation as well.

Localisation

Analysis of expression is not limited to quantification; localisation can also be determined. mRNA can be detected with a suitably labelled complementary mRNA strand and protein can be detected via labelled antibodies. The probed sample is then observed by microscopy to identify where the mRNA or protein is.

By replacing the gene with a new version fused to a green fluorescent protein (or similar) marker, expression may be directly quantified in live cells. This is done by imaging using a fluorescence microscope. It is very difficult to clone a GFP-fused protein into its native location in the genome without affecting expression levels so this method often cannot be used to measure endogenous gene expression. It is, however, widely used to measure the expression of a gene artificially introduced into the cell, for example via an expression vector. It is important to note that by fusing a target protein to a fluorescent reporter the protein's behavior, including its cellular localization and expression level, can be significantly changed.

The enzyme-linked immunosorbent assay works by using antibodies immobilised on a microtiter plate to capture proteins of interest from samples added to the well. Using a detection antibody conjugated to an enzyme or fluorophore the quantity of bound protein can be accurately measured by fluorometric or colourimetric detection. The detection process is very similar to that of a Western blot, but by avoiding the gel steps more accurate quantification can be achieved.

An expression system is a system specifically designed for the production of a gene product of choice. This is normally a protein, although it may also be RNA, such as tRNA or a ribozyme. An expression system consists of a gene, normally encoded by DNA, and the molecular machinery required to transcribe the DNA into mRNA and translate the mRNA into protein using the reagents provided. In the broadest sense this includes every living cell, but the term is more commonly used to refer to expression as a laboratory tool. An expression system is therefore often artificial in some manner. Expression systems are, however, fundamentally based on a natural process. Viruses are an excellent example: they replicate by using the host cell as an expression system for the viral proteins and genome.

Inducible expression

In nature

In addition to these biological tools, certain naturally observed configurations of DNA (genes, promoters, enhancers, repressors) and the associated machinery itself are referred to as an expression system. This term is normally used where a gene or set of genes is switched on under well-defined conditions, for example, the simple repressor switch expression system in lambda phage and the lac operator system in bacteria. Several natural expression systems are used directly, or modified and used, as artificial expression systems, such as the Tet-on and Tet-off expression systems.

Genes have sometimes been regarded as nodes in a network, with inputs being proteins such as transcription factors, and outputs being the level of gene expression. The node itself performs a function, and the operation of these functions has been interpreted as a kind of information processing within cells that determines cellular behavior.

Gene networks can also be constructed without formulating an explicit causal model. This is often the case when assembling networks from large expression data sets. [79] Covariation and correlation of expression are computed across a large sample of cases and measurements (often transcriptome or proteome data). The source of variation can be either experimental or natural (observational). There are several ways to construct gene expression networks, but one common approach is to compute a matrix of all pair-wise correlations of expression across conditions, time points, or individuals, and then convert the matrix (after thresholding at some cut-off value) into a graphical representation in which nodes represent genes, transcripts, or proteins, and edges connecting these nodes represent the strength of association (see [1]). [80]
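The pair-wise correlation approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the gene names, the expression values, the helper names `pearson` and `coexpression_network`, and the cut-off of 0.8 are all hypothetical, and Pearson correlation is just one possible association measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def coexpression_network(expression, cutoff=0.8):
    """Return edges (gene_a, gene_b, r) for gene pairs with |r| >= cutoff."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= cutoff:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical expression values across five conditions (illustrative only).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship
}

edges = coexpression_network(expression, cutoff=0.8)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b}  r = {r:+.2f}")
```

Thresholding on the absolute value keeps strongly anti-correlated pairs as edges too, with the sign of `r` recording the direction of the association. Such an edge indicates co-variation only; as the text notes, no causal model is implied.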

The following experimental techniques are used to measure gene expression and are listed in roughly chronological order, starting with the older, more established technologies. They are divided into two groups based on their degree of multiplexity.

