
Biotechnology in the mirror
Applied genomics

What is the purpose of the non-coding genome?
 

José Luis Gómez Skarmeta
Andalusian Centre for Developmental Biology, CSIC-University Pablo de Olavide, Seville

Summary
Only 5% of vertebrate DNA corresponds to protein-coding sequences. What is the purpose of the remaining 95%? Part of this DNA contains cis-regulatory sequences, which control when, where and how much a gene is transcribed. Despite increasing evidence that cis-regulatory DNA plays a fundamental role in development, evolution and human disease, there is still a big unknown: although we know the code of the coding sequences, we know little about the language of regulatory DNA. This prevents us from identifying regulatory regions directly in the genomes that have already been sequenced. Therefore, now that genome sequences are becoming available, one of the most important research objectives for the coming years is to identify, characterize and decode the regulatory DNA scattered across a sea of non-coding DNA.

Some Basic Concepts
The human genome contains some 30,000 genes. We call “genes” those genome regions that code for proteins, i.e., the regions that are transcribed to generate messenger RNA, which is later translated into protein at the ribosomes. Genes are composed of an average of 3,000 nucleotides, or bases. If we consider that the genome encompasses a grand total of 3,164.7 million bases, then the sum of all genes constitutes only about 2% of the genome. Therefore, about 98% of the human genome is composed of non-coding DNA. In other words, it does not contain information that is used directly for protein synthesis.
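
The arithmetic behind this estimate can be checked in a few lines. The sketch below is only a back-of-envelope calculation that uses the rounded figures from the paragraph above, taking the genome size as 3,164.7 million bases; the variable and function names are illustrative. With these round numbers the genes add up to a little under 3% of the genome, the same order of magnitude as the roughly 2% usually quoted.

```python
# Rounded figures from the text; names are illustrative assumptions.
GENE_COUNT = 30_000
AVERAGE_GENE_LENGTH = 3_000      # bases per gene
GENOME_SIZE = 3_164_700_000      # bases (3,164.7 million)


def coding_fraction(gene_count, average_gene_length, genome_size):
    """Fraction of the genome occupied by genes under these assumptions."""
    return gene_count * average_gene_length / genome_size


fraction = coding_fraction(GENE_COUNT, AVERAGE_GENE_LENGTH, GENOME_SIZE)
print(f"genes:          {fraction:.1%}")
print(f"everything else: {1 - fraction:.1%}")
```

The result is sensitive to the assumed gene count and average gene length, so it should be read as an order-of-magnitude check and not as a measurement.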

We are still largely ignorant of the function of non-coding DNA. However, we do know that regulatory DNA is located there. Regulatory DNA consists of DNA regions (also called regulatory regions) that control the transcription of a gene: how much, when and where. This control of transcription or, to put it in other words, of gene expression, carried out by the regulatory regions, is key to cellular differentiation. Although all of an individual's cells contain the same genome, what differentiates one cell type from another is the specific ensemble of genes that are expressed within a given cell. This is why, in every type of cell, only a fraction of these 30,000 genes is transcribed. For instance, cell types as different from each other as a pancreatic cell and a heart cell each activate their own highly specialized subset of genes.

Although it plays a key role in the control of gene expression, regulatory DNA is extremely difficult to identify within the genome. The main reason for this is that its code, or language, is unknown. In contrast, we know the complete genetic code of coding regions (for instance, ATG, which codes for the amino acid methionine), thanks partly to the research of Nobel Prize winner Severo Ochoa, carried out more than four decades ago. In fact, we have set our current estimate of the number of human genes at 30,000 in part because of our ability to use that genetic code to predict the existence of genes. This is not possible for regulatory regions, and it is therefore difficult to predict where those regions lie in the genome and what information they contain. A regulatory region is formed by non-coding DNA fragments of variable size (from a few bases to hundreds of them), which bind a smaller or larger number of transcription factors depending on their size. When the binding of transcription factors to the regulatory region favours gene transcription, the region is called an enhancer. If, on the contrary, it prevents gene transcription, the regulatory region is called a silencer.
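
Because the code of coding regions is known, a computer can scan a sequence for stretches that could be translated into protein. The toy sketch below looks for open reading frames, that is, runs of codons that start with ATG and end at one of the three stop codons (TAA, TAG, TGA). It is a deliberate simplification: it reads one strand only and ignores introns and the statistical signals that real gene predictors rely on. The function name, the min_codons parameter and the example sequence are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_reading_frames(dna, min_codons=3):
    """Return (start, end) pairs of ATG...stop stretches on the given strand.

    Simplified: forward strand only, no introns. `end` is exclusive and
    includes the stop codon; `min_codons` counts ATG but not the stop.
    """
    dna = dna.upper()
    frames = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon until a stop is found.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    frames.append((start, pos + 3))
                break
    return frames


# Hypothetical sequence: ATG GCC AAA TAA flanked by a few extra bases.
example = "CCATGGCCAAATAAGG"
print(find_open_reading_frames(example))
```

No comparable scan exists for regulatory regions, because there is no known equivalent of start and stop codons to search for.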

Genes contain DNA regions called promoters, where RNA polymerase binds in order to begin transcription. Regulatory regions modulate gene expression by interacting with gene promoters. In this way, enhancers or silencers promote or prevent the recruitment of RNA polymerase to the gene promoters, thus facilitating or blocking gene transcription. Regulatory regions are known as cis-elements (or cis-regulators) because they are located on the same DNA molecule as the genes they control. The transcription factors that bind to cis-regulatory elements are called trans-regulators. The cis-configuration of a gene therefore includes all of the regulatory regions that act on that gene.

Figure 1:
This figure represents a gene A (dark green) and a gene complex B with three neighbouring genes (dark blue arrows). Red dots represent insulators (I), whereas rectangles are regulatory regions. The green rectangle is an enhancer (E) of the A gene, while the light blue rectangles are regulatory regions of the B complex genes (E: enhancers; S: silencers). The different colours marking the regulatory elements represent their activity in different cell types. The insulators generate two different regulatory landscapes: a small one around the A gene promoter, where the regulatory element associated with this gene is located, and another that extends from the first intron of the A gene to the final intron of the B complex and contains the remaining regulatory elements, marked in light blue. The regulatory element marked in pink, located in the second intron of the A gene, activates the three genes of the B complex from long range. This is why it is called a Locus Control Region (LCR). The insulator (I) in the first intron of the A gene prevents this LCR from also activating the A gene. The regulatory element between the A gene and gene complex B is an enhancer (E) specific to the first gene of the complex, whereas the other two regulatory elements to the right are silencers (S), each specific to one of the two remaining genes.

A characteristic trait of regulatory regions is their modularity. Generally, each regulatory region controls the expression of a gene in a limited number of tissues and organs. Therefore, genes expressed in many tissues and organs, or at different stages of embryonic development, have multiple cis-regulatory elements. Each one is an independent module that controls expression in a small number of territories. In this way, genes with simple expression patterns (both those expressed only in very specific tissues and those expressed in all body cells) may have very few regulatory elements, located close to the promoter. In contrast, genes with complex expression patterns have many cis-regulatory elements dispersed throughout the non-coding DNA, both in the vicinity of the gene and at distances of up to hundreds of kilobases. The cis-regulatory elements located in these faraway genome regions are known as distal elements. In many cases, genes of this type (which are often involved in early embryonic development) are flanked by large non-coding DNA regions devoid of any other gene. These regions, known as gene deserts, are usually full of cis-regulatory regions that are essential for the control of gene expression in multiple tissues and at different times.

If regulatory regions are located in intergenic regions, how is the genome arranged so that some regions act on a precise gene, and not on its neighbour?
This is the reason why there are regulatory regions called insulators. These regions have two functions: to prevent an enhancer from acting on a promoter when the insulator is located between these two elements, and to prevent heterochromatin from spreading into certain loci, which would silence the expression of the genes located there.

Insulators are, therefore, essential genome elements that generate compartments with different regulatory environments for neighbouring genes. In this way, two adjoining genes, if separated by insulators, may be influenced by completely different ensembles of cis-regulatory elements; or, in genomics parlance, these genes “view” different regulatory landscapes.
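
The enhancer-blocking role of insulators can be pictured with a very small model. The sketch below treats the chromosome as a straight line and assumes that an enhancer can act on a promoter only if no insulator lies between them. This is a simplification made for illustration: it ignores the three-dimensional folding of chromatin and the second function of insulators as barriers to heterochromatin. All names and coordinates are hypothetical.

```python
def can_act_on(enhancer_pos, promoter_pos, insulator_positions):
    """True if no insulator lies strictly between enhancer and promoter."""
    left, right = sorted((enhancer_pos, promoter_pos))
    return not any(left < ins < right for ins in insulator_positions)


def regulatory_landscape(promoter_pos, enhancers, insulator_positions):
    """Names of the enhancers that a given promoter can 'view'."""
    return [name for name, pos in enhancers.items()
            if can_act_on(pos, promoter_pos, insulator_positions)]


# Hypothetical coordinates (in kilobases) around two neighbouring genes.
enhancers = {"E1": 10, "E2": 40, "E3": 90}
insulators = [50]

print(regulatory_landscape(20, enhancers, insulators))  # promoter of gene A
print(regulatory_landscape(70, enhancers, insulators))  # promoter of gene B
```

In this toy example the single insulator at position 50 splits the region into two landscapes, so the two promoters respond to disjoint sets of enhancers even though the genes are neighbours.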

How can we identify the cis-regulatory elements in the genome?
To identify cis-regulatory regions we currently use a combination of bioinformatic analysis, functional studies using transgenesis in different animal models, the detection of certain epigenetic marks and the genome-wide identification of transcription factor binding sites. For example, by comparing the already sequenced genomes of different vertebrates with bioinformatic tools, we have been able to observe that vertebrate genomes contain large numbers of highly conserved non-coding regions. The great majority of these regions flank genes that are involved in the generation of morphological patterns during embryonic development, and they work as enhancers in transgenic animal models.
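
The idea behind the comparative approach can be sketched as a sliding window over two sequences that have already been aligned. The example below is a minimal illustration and not the method used by any particular tool: it assumes the alignment has been done elsewhere (gaps written as "-"), and the function name, window size and identity threshold are arbitrary choices made for this sketch.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.9):
    """Start positions of windows where two aligned sequences largely agree.

    Both sequences must come from the same alignment, so they have equal
    length; gap characters ("-") never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(a == b and a != "-" for a, b in pairs)
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Hypothetical aligned fragments from two species.
species_1 = "AAAAACCCCC"
species_2 = "AAAAAGGGGG"
print(conserved_windows(species_1, species_2, window=5, min_identity=1.0))
```

Windows that remain nearly identical between distant species and do not overlap any coding sequence are candidates for conserved non-coding regions, which then need functional testing.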

An important project is currently in progress to identify all functional elements in the sequence of the human genome. This is the ENCODE (Encyclopedia of DNA Elements) project, initiated by the National Human Genome Research Institute (NHGRI). The project uses chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) to identify the complete set of genomic binding sites of several transcription factors, different histone modifications (epigenetic marks), DNase I hypersensitivity sites, etc. This study, together with several others, has demonstrated that certain epigenetic marks can help to identify enhancer elements in the genome. Thus, the trimethylation of histone H3 at lysine 4 (H3K4me3) is associated with active promoters, while the monomethylation of this lysine (H3K4me1) is associated with both active promoters and enhancers. Moreover, among all these enhancers, the active ones are detected through the additional acetylation of the same histone at lysine 27 (H3K27ac). This combination of marks (H3K4me1 and H3K27ac), alone or together with the binding of the acetyltransferase p300, which is responsible for the acetylation at K27, has allowed the successful prediction of active enhancers at early stages of development in mice and humans. Carrying out this type of assay in specific tissues or at specific developmental stages, and combining it with genome-wide ChIP-seq of the transcription factors essential for particular biological processes, with the comparison of genome sequences and with the functional analysis of potential cis-regulatory elements and their target genes, allows us to correctly identify many of the cis-regulatory elements of the genome.
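
The rules just described can be written down as a small decision procedure. The sketch below is a simplified reading of the paragraph above, not the classification scheme used by ENCODE or by any specific study: it takes the set of marks detected on a region and returns a label. The function name, the labels and the example regions are assumptions made for illustration, and real analyses work with quantitative signal levels, not simple presence or absence.

```python
def classify_region(marks):
    """Label a genomic region from the histone marks detected on it.

    Simplified rules taken from the text:
      H3K4me3            -> active promoter
      H3K4me1 + H3K27ac  -> active enhancer
      H3K4me1 alone      -> enhancer, not shown to be active
    """
    marks = set(marks)
    if "H3K4me3" in marks:
        return "active promoter"
    if "H3K4me1" in marks:
        if "H3K27ac" in marks:
            return "active enhancer"
        return "enhancer (not shown to be active)"
    return "unclassified"


# Hypothetical regions and the marks found on each of them.
regions = {
    "region_1": ["H3K4me3", "H3K4me1"],
    "region_2": ["H3K4me1", "H3K27ac"],
    "region_3": ["H3K4me1"],
    "region_4": [],
}
for name, marks in regions.items():
    print(name, "->", classify_region(marks))
```

H3K4me3 is checked first because, according to the text, H3K4me1 is found on active promoters as well as on enhancers, so it cannot distinguish the two on its own.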

Figure 2:
View of the 1.15 megabase region of the human genome where the SHH gene is located, generated with the UCSC Genome Browser. The image shows the positions of the epigenetic marks that characterize enhancers (H3K4me1 and H3K27ac) and active promoters (H3K4me3), as well as the binding sites of three transcription factors (BATF, PAX5 and PU.1), obtained in several different cell lines by the ENCODE consortium. Below these tracks, we can see the regions whose sequences are evolutionarily conserved in several species. We can also see (in green) the non-coding region conserved in all vertebrates that contains the enhancer activating the expression of SHH in the limb primordia, which is mutated in patients with preaxial polydactyly. Note that this enhancer is located 1 megabase away from its target gene.

Regulatory regions and disease
Gene regulation is not only crucial for development; it is also essential to control the physiology of cells in adult organisms. Therefore, it is not surprising that, in a large percentage of the many genome-wide association studies between diseases and genome regions published over the past several years, the mutations associated with these diseases are located in non-coding DNA. In many cases, these mutations affect regulatory regions essential for the proper control of target genes located in the vicinity. When this happens, the tissue in which the target gene is expressed, the moment at which it is expressed, or the amount of mRNA, and therefore of protein, that it produces is not normal. This causes a predisposition to suffer a disease, or even causes the disease directly, depending on the gene and the lesion associated with it.

A well-documented example of a human genetic disease associated with an enhancer mutation is preaxial polydactyly, which involves the Sonic Hedgehog (SHH) gene and an evolutionarily conserved enhancer. The mutations affect a regulatory element located inside an intron of the LMBR1 gene, about one megabase away from the SHH gene, which is necessary to activate the expression of SHH in the developing limb. Eliminating this enhancer by homologous recombination in the mouse suppresses the expression of SHH in the limb primordia and prevents the correct development of the limbs. In humans and mice, point mutations in this regulatory element modify its activity, thus causing polydactyly.

The number of human diseases known to be caused by mutations in cis-regulatory regions has grown rapidly in recent years, and we now consider that a significant percentage of human diseases can be attributed to alterations in non-coding DNA.

Regulatory regions and evolution
The great majority of the proteins encoded by vertebrate genes are highly conserved across species. What, then, caused the morphological diversity observed in evolution? Currently, the genetic theory of morphological evolution proposes that morphological variation between species is due, to a large extent, to alterations in the expression of functionally conserved genes. These changes are produced by mutations in cis-regulatory sequences, both of the pleiotropic genes that regulate development and of their target genes in the genetic networks they belong to. Thus, the body plan common to all vertebrates, which manifests itself during the early stages of development, would be the result of a similar expression of the genes regulating development during this embryonic stage. Once that body plan is specified, differences in the expression of the same group of developmental genes during later stages would cause the various morphological differences observed between species within the vertebrate lineage. Therefore, to a large extent, evolution is actually the history of the evolution of the regulation of gene expression during development.

Conclusion
In conclusion, we still know very little about the human genome. During the next decade, most efforts will focus on figuring out the function of the 98% of the genome that is non-coding DNA. This will allow us to identify and understand the functions of the different regulatory regions associated with each individual gene, which will be essential to unravel the effect that the myriad mutations found in non-coding DNA have on embryonic development and evolution.


References
Carroll, S.B. (2008). Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25-36.

Haeussler, M., and Joly, J.S. (2011). When needles look like hay: how to find tissue-specific enhancers in model organism genomes. Dev Biol 350, 239-254.

Nica, A.C., and Dermitzakis, E.T. (2008). Using gene expression to investigate the genetic basis of complex disorders. Hum Mol Genet 17, R129-134.

Phillips, J.E., and Corces, V.G. (2009). CTCF: master weaver of the genome. Cell 137, 1194-1211.

Sakabe, N.J., and Nobrega, M.A. (2010). Genome-wide maps of transcription regulatory elements. Wiley Interdiscip Rev Syst Biol Med 2, 422-437.

Visel, A., Bristow, J., and Pennacchio, L.A. (2007). Enhancer identification through comparative genomics. Semin Cell Dev Biol 18, 140-152.

Visel, A., Rubin, E.M., and Pennacchio, L.A. (2009). Genomic views of distant-acting enhancers. Nature 461, 199-205.

Some reference websites
UCSC Genome Browser


