In 2001, scientists thought that only about 20,000 genes, comprising about one per cent of the human genome, code for proteins. The rest of the genome was considered, at best, junk DNA. These were the findings of the researchers who completed the Human Genome Project, the blueprint of human biology.
Many researchers doubted this, and the ENCODE project (ENCyclopedia Of DNA Elements), which started in 2003 and picked up where the Human Genome Project left off, has been showing that it is far from true.
The final results announced today (September 6) in 30 papers and published in three journals — Nature, Genome Research and Genome Biology — show why the ENCODE scientists think so.
The junk DNA is indeed not a wasteland. “The vast desert regions have now been populated by hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology,” writes Brendan Maher, a Features Editor for Nature.
The researchers today presented the results of 1,648 experiments done on 147 cell types.
To start with, the ENCODE project has found that not one per cent but 80.4 per cent of the genome has an active role or function. For instance, a functional stretch could be a “promoter” region where “proteins bind to control gene expression” or an “enhancer” region that “regulate[s] the expression of distant genes.”
The most important point is that genes comprise only 2 per cent of the genome. The regulatory regions are scattered across the remaining 98 per cent. This underlines the fact that a major portion of the human genome is indeed not junk DNA.
“Specialised proteins (called regulatory factors), recognise specific DNA sequences in these regulatory regions, thereby creating switches that turn genes on and off,” states a University of Washington press release. The on/off switches are otherwise called the regulatory DNA, and the genes are as good as useless in the absence of the switches.
Incidentally, in very few cases, the switches are located far away from the genes they control, making it difficult to determine the relationship between the two.
A paper in Nature explains the important features of organisation and functioning of the human genome. It states that in 95 per cent of the cases, the genes are in close proximity to the regulatory switches.
To their amazement, the researchers found that most genes are not controlled by just one switch. Instead, the genes are regulated by more than a dozen switches. In other words, there is no one-to-one relationship between a gene and a switch.
“The scientists determined that genes are connected in a complex web. In this web, regulatory DNA regions typically control one or at most a few genes, but genes receive inputs from large numbers of regulatory regions,” states the release.
What is a gene?
According to a paper in Nature by Thomas Gingeras from Cold Spring Harbor Laboratory and his team, the very meaning of a gene as the “minimum unit of heredity” is called into question. For instance, an overwhelming 75 per cent of the genome is transcribed at “some points in some cells.”
According to conventional thinking, genes are copied (transcribed) into RNA molecules, which then serve as templates for making the necessary proteins. But this belief may no longer be valid.
“It has been evident [since 2007] that there is much more to a gene than just a sequence that codes for protein, changing our concept of what defines a gene,” states a Genome Research release. “We now know that the genome is not a set of discrete genes, but rather a complex system of genes and regulatory regions, much of which is transcribed into RNA, including many RNAs that do not code for proteins but have critical cellular functions.”
Prof. Gingeras and his team also discovered a new class of functional RNAs. They also found that some parts of one gene or functional RNA can be found within another. These two observations completely change our understanding of genome architecture.
Nature has a system in place to get rid of useless body parts or even DNA. That being so, would nature continue to harbour a large part of the genome that does not code for proteins?
ENCODE provides the answer. These ‘useless’ or non-coding stretches of the genome actually produce non-coding RNAs, which play a role in both activation and silencing of protein-coding genes.
Finding the genetic basis for diseases has been the goal of researchers during the last few years. The data from the functional non-coding regions provide hope for these researchers. Some genetic variants associated with diseases are found in the non-coding regions of the genome, and are relatively “common” in the population.
The ENCODE team found that 76 per cent of disease-associated variants in the non-gene regions are linked to the regulatory DNA. This sheds new light on the contributing factors for many diseases. Diseases may arise less from changes in the genes themselves than from changes in when, where and how genes are turned on.
“ENCODE is a foundation data set for understanding the human genome,” writes Ewan Birney, the coordinator of the ENCODE project, in a paper in Nature.
ENCODE is a truly international collaborative effort: 442 scientists from 32 laboratories in the U.K., U.S., Spain, Singapore and Japan were involved. They generated and analysed over 15 terabytes (15 trillion bytes) of raw data. In keeping with this collaborative spirit, all of the data from the project is freely available to the public.