Published in The Hindu on July 14, 2011

The sample is skewed as ninety-six per cent of subjects included in the GWAS conducted so far are people of European descent. — Photo: NIH
Since the human genome was sequenced, scientists have found over 1,000 regions of the genome that are associated with traits such as disease susceptibility and response to medication.
This information alone is of little use without knowing why certain populations or ethnic groups are more at risk of a particular condition than others.
Hence it is important to find the differences in DNA sequence (genetic variations) between populations that make some groups vulnerable and others resistant to certain diseases or conditions.
Principal intent
The principal intent of the 1,000 Genome-Wide Association Studies (GWAS), which was started in 2008, was to understand these genetic variations.
Three pilot projects have provided some invaluable information. The first pilot project sequenced the genomes of two parents and their child. The second one sequenced the genomes of 179 people, and the third pilot project involved a larger number — 700 people.
The project now plans to sequence 2,500 genomes of individuals from 27 populations. These people have consented to the release of their DNA samples and full sequence data.
But will the final outcome of the 1,000 Genome-wide Association Studies be fruitful? “The findings … are likely to have less relevance than was previously thought for the world’s population as a whole,” note the authors of a Comment piece published today (July 14) in Nature.
The reason? A skewed sample that does not in any way represent the world’s population. “Ninety-six per cent of subjects included in the GWAS conducted so far are people of European descent,” they state.
African ancestry
Though genetic variation is greatest in populations of recent African ancestry, these populations have not been taken into account in the 1,000 Genome-Wide Association Studies.
In other words, the mega project with a noble intent will end up representing the genetic variant-disease associations of Europeans alone and not of the entire world’s population. To make matters worse, the skewed sampling will be reflected when people's entire genomes are sequenced.
Biased picture
The ramifications of such skewed data are hard to ignore. Any result arising from such a sample will tend to give a biased picture of the genetic variants responsible for certain diseases, and any drug developed to treat or cure such diseases will benefit only a few.
But it would be incorrect to discount the project entirely and conclude that it will be of no consequence to the world population. Certain genetic variants are found in people from different countries and ethnic populations, and the GWAS will look for these common variants to find any association between them and diseases.
But even where clear associations between the common variants and the diseases have been found, they can at best account for 5 per cent to 50 per cent of the heritable component of those diseases, they note.
“Many of the genetic factors thought to be responsible are still missing,” they warn. The missing variants are the ‘rare variants’ that tend to play a vital role.
According to them, preliminary results indicate that findings in one population cannot always be easily translated to the rest of the world. They provide one example to drive home their point. A particular variant found in people of Native South American ancestry is responsible for lower HDL cholesterol, obesity and type-2 diabetes. But this variant is absent from European, Asian and African populations.
Hence there is a compelling need to carry out GWAS, a population-based study, on a global scale rather than restricting it to the European population.