‘There is a need for statisticians to work closely with scientists’

Published in The Hindu on January 3, 2013

“A lot of scientific questions in medicine are mathematical [in nature],” said Dr. Nilanjan Chatterjee, Head of the Biostatistics Branch of the Division of Cancer Epidemiology and Genetics at the National Cancer Institute, U.S. Answering these questions therefore requires a sound knowledge of both medicine and mathematics.

Besides medicine and mathematics, Dr. Chatterjee has a firm grounding in statistics, and this helped him become the first Indian to win the prestigious COPSS Presidents' Award, given by the Committee of Presidents of Statistical Societies, in August 2011. The award is given to young statisticians below the age of 40. He has also won the COPSS Snedecor Award.

Dr. Chatterjee is in Chennai to attend the International Indian Statistical Association conference on “Statistics, Science and Society: New Challenges and Opportunities.”

The COPSS Presidents' Award recognised his overall contribution to genetics and biostatistics, particularly his work on the risk of developing chronic diseases, including cancers, that arises from gene-environment interaction. “The award was for statistical application in genetics study,” he explained.

Genes and environment together contribute to certain chronic diseases such as cancer, but most of the time a person's DNA does not determine the kind of environment he or she is exposed to; in statistical terms, genetic makeup and environmental exposure are often independent of each other. This may sound elementary, but, believe it or not, statisticians had not taken advantage of this independence when analysing such data. “I work on genetics, so I was able to take advantage of the structure of genetic data to develop better methods,” he said.

Smoking causes lung cancer, but smoking alone does not determine who will develop the disease. A person may carry gene variants that reduce the risk of lung cancer. So it may be important to know a person's genetic background to judge whether he or she is prone to lung cancer.

The goal of gene-environment interaction studies is therefore first to identify the genetic background and then to understand how its interaction with the environment modifies the risk of developing the disease.

It is important to know what kind of study design such research requires and how to analyse the data efficiently once they have been collected. “There are some epidemiological methods but they are not applicable for specialised studies,” he said.

While analysing gene-environment interactions, he discovered that it was possible to “increase the power of the study by new methods that take advantage of the special structure of genetic data that was not considered before.”

According to him, deciding how to analyse the data so that as much of the information in them as possible is extracted in an interpretable form is very important. The methods developed for understanding cancer risk can be applied to other diseases as well, but only if statisticians have a good understanding of medicine.

“There is more and more need for statisticians to work closely with scientists to understand the nature of the underlying scientific hypothesis, measurements and study design,” he underlined.

He has been with the National Cancer Institute since 1999 and has worked closely with scientists on a whole gamut of cancers. His work on gene-environment interaction has focused on lung and bladder cancers, and he intends to study other cancers as well.

“Many understand biology but don’t have grounding in statistics and mathematics, and the other way around,” he said. “But I have both. So I can do better modelling and better interpretation.”

Large sample size

Another area where he has worked extensively is genome-wide association studies (GWAS), through which he has contributed to an understanding of the genetic basis of a variety of cancers.

Meaningful and useful GWAS require a very large number of biological (human) samples. “This is one of the problems in the case of India. It is a few hundreds to a few thousands in the case of India,” he said. In other countries, by contrast, large epidemiological studies have collected biological samples from hundreds of thousands of people.

“There is a need for large, well-designed epidemiological studies in India. This is important as such big studies may find genetic associations with diseases that are unique to the Indian population. Currently, many of the genetic associations to diseases found in the Caucasian population are also found in the Indian population.”