Researchers from IIT Delhi and IIIT Delhi have developed an algorithm that can find rare cells from a very large pool of cells in a matter of seconds. The new algorithm has superior sensitivity and specificity compared with existing methods. They used the gene expression of each cell to find the rare cells. The team has discovered a new sub-type of pars tuberalis cell lineage from mouse brain cells.
Much like finding a needle in a haystack, identifying rare cells from a dataset comprising millions of cells can be hugely daunting. Now, a new algorithm developed by Delhi-based researchers makes it easy — it can find rare cells from a very large pool of cells in a matter of seconds.
The algorithm — Finder of Rare Entities (FiRE) — assigns a rareness score to each cell that is computed based on the gene expression profile of about twenty thousand genes. Cells having scores above a certain threshold are reported as rare cells. Besides being fast, initial studies show that the new algorithm has superior sensitivity and specificity compared with existing methods.
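The thresholding step described above can be sketched in a few lines of Python. The article does not say how the cut-off is chosen, so the interquartile-range rule used here (third quartile plus 1.5 times the IQR) is an assumption for illustration, and the function name `flag_rare` and the toy scores are hypothetical.

```python
import numpy as np

def flag_rare(scores, k=1.5):
    """Flag scores above an IQR-based cut-off (an assumed rule, not from the article)."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    threshold = q3 + k * (q3 - q1)
    # Boolean mask: True where a cell's rareness score exceeds the cut-off.
    return scores > threshold, threshold

# Toy rareness scores: nine ordinary cells and one clear outlier.
toy_scores = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 0.95, 1.05, 9.0]
is_rare, cutoff = flag_rare(toy_scores)
```

Any other rule for picking the threshold would slot into the same place; only the cells whose score exceeds it are reported.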
Circulating tumour cells, cancer stem cells, antigen-specific T cells and circulating endothelial cells are a few examples of rare cells. Rare cell populations such as circulating tumour cells can shed light on the process of cancer metastasis (the spread of cancer to other parts of the body), thus providing invaluable information for early detection and clinical management of the disease.
While testing the efficacy of the algorithm using mouse brain cells taken from a specific region, the four-member team led by Prof. Jayadeva from the Indian Institute of Technology (IIT) Delhi and Prof. Debarka Sengupta from the Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi) discovered a new sub-type of pars tuberalis cell lineage. The authors have linked this newly found cell type to the development of the pituitary gland. The results are published in the journal Nature Communications.
Existing algorithms use clustering or other statistical techniques that involve rigorous parameter estimations, thus incurring a significant computational cost. “FiRE uses sketching, which is a variant of locality-sensitive hashing, to assign rarity to each cell. The hashing technique tends to put cells with similar properties together,” says Prashant Gupta from IIT Delhi, one of the first authors of the paper.
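To make the idea concrete, here is a minimal Python sketch of hashing-based rarity scoring. It is an illustration of the general principle only, not the authors' implementation: the function name `rarity_scores`, its parameters, the random-threshold hashing scheme and the synthetic data are all assumptions. Each round picks a few genes at random, turns every cell into a short bit string by comparing those genes with random cut-offs, and groups cells with identical bit strings into buckets. Cells that keep landing in sparsely populated buckets receive high scores.

```python
import numpy as np
from collections import Counter

def rarity_scores(X, n_sketches=50, n_dims=8, seed=0):
    """Score each row of X; cells in sparse hash buckets score higher.

    Illustrative only: a simplified stand-in for sketching-based scoring.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    scores = np.zeros(n_cells)
    for _ in range(n_sketches):
        # Pick a random subset of genes and a random cut-off for each one.
        genes = rng.choice(n_genes, size=n_dims, replace=False)
        cuts = rng.uniform(lo[genes], hi[genes])
        # Hash each cell to a bit string: is each chosen gene above its cut-off?
        bits = X[:, genes] > cuts
        keys = [row.tobytes() for row in bits]
        counts = Counter(keys)
        # Fraction of all cells that share this cell's bucket.
        occupancy = np.array([counts[k] for k in keys]) / n_cells
        scores += -np.log(occupancy)
    return scores / n_sketches

# Synthetic data (made up): 500 similar cells plus 5 cells with a distinct profile.
data_rng = np.random.default_rng(1)
common = data_rng.normal(0.0, 0.5, size=(500, 20))
rare = data_rng.normal(5.0, 0.5, size=(5, 20))
X = np.vstack([common, rare])
scores = rarity_scores(X)
top5 = set(np.argsort(scores)[-5:].tolist())
```

Because each round only compares a handful of genes with cut-offs and counts bucket sizes, the work grows roughly linearly with the number of cells, which hints at why a hashing approach can be much cheaper than clustering.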
“Spotting an odd cell using existing tools becomes extremely difficult and complex when the number of cells becomes large. The FiRE algorithm makes searching for rare cells in large-scale single cell messenger RNA datasets tractable,” says Prof. Jayadeva, who works in machine learning. “We used the gene expression of each cell to find the rare cells. Drop-seq, a state-of-the-art technique, allowed us to read out the gene expression profiles of thousands of cells in a fairly short time, and [we] then compared the profiles to find the rare cells.”
Testing and validating the algorithm
The researchers used five data sets to test the algorithm. In the case of peripheral blood containing 0.3% megakaryocytes, the gene expression of about 68,000 different cells was compared, and rare cell populations with different grades of rarity showed up. The cluster with the rarest cells consisted only of megakaryocytes, thus validating the algorithm.
In a simulation experiment to evaluate the performance of the FiRE algorithm, the gene expression profiles of two types of cells were mixed in vitro. By increasing the proportion of one cell type from 0.5% to 5%, the team tested how precisely and sensitively FiRE and other existing algorithms identified the rare cells. The sensitivity of the FiRE algorithm was higher than that of the others even when rare cells comprised just 0.5% of the population. “When they constituted 2.5%, FiRE could identify rare cells with 85% accuracy, far higher than the other algorithms,” says Aashi Jindal from IIT Delhi, the other first author of the paper.
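For readers unfamiliar with the two measures, the short Python sketch below shows how sensitivity and precision are conventionally computed from a list of truly rare cells and a list of cells flagged by a method. The function name and the toy numbers are hypothetical and are not taken from the study.

```python
def sensitivity_and_precision(true_rare, predicted_rare):
    """Return (sensitivity, precision) for sets of cell identifiers."""
    true_rare, predicted_rare = set(true_rare), set(predicted_rare)
    hits = len(true_rare & predicted_rare)  # rare cells correctly flagged
    # Sensitivity: share of truly rare cells that were found.
    sensitivity = hits / len(true_rare) if true_rare else 0.0
    # Precision: share of flagged cells that are truly rare.
    precision = hits / len(predicted_rare) if predicted_rare else 0.0
    return sensitivity, precision

# Toy example (made-up numbers): 25 rare cells in a pool of 1,000, i.e. 2.5%.
true_rare = range(25)
# A hypothetical method flags 30 cells, 20 of which are truly rare.
predicted_rare = list(range(20)) + list(range(500, 510))
sens, prec = sensitivity_and_precision(true_rare, predicted_rare)
```

A method can score well on one measure and poorly on the other, which is why the two are usually reported together.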
Potential of the algorithm
“We are now validating the new cell type [pars tuberalis] discovered using FiRE. Most malignant cancers shed circulating tumour cells. So we are also trying to use our algorithm for early cancer detection by identifying the circulating tumour cells, which are rare in peripheral blood,” says Prof. Sengupta, whose lab pioneered single-cell genomics research in India.