AI against errors

New algorithms make cell data comparable

28 February 2019
Deep Count Autoencoder denoises scRNA-seq data

The deep count autoencoder denoises scRNA-seq data by learning the underlying true zero-noise data manifold using an autoencoder framework.

How do different cell types behave in tissue? Researchers use single-cell analysis to investigate this question. They sequence individual cells to find out which genes are currently being expressed. However, this method is prone to measurement errors.

A team from the Technical University of Munich (TUM), the Helmholtz Center Munich, and the British Wellcome Sanger Institute has developed new algorithms that use artificial intelligence (AI) to predict and correct such sources of error.

Human Cell Atlas: Cell database for doctors

Personalized medicine is the vision of the international Human Cell Atlas. This project aims to create a reference database that allows doctors to better diagnose and treat diseases. To do this, the researchers map all the cells of the human body.

This is made possible by single-cell RNA sequencing. With this method, researchers can determine which genes play a role in producing a cell. This requires extremely fine measurements.

New measure for the batch effect

The devices used for these measurements, the environment, or cell biology itself often cause errors - the so-called batch effect. Several models exist for correcting the batch effect. However, they depend heavily on the actual magnitude of the effect.

To quantify the differences between experiments, Fabian Theis - Professor of Mathematical Modeling of Biological Systems at TUM and Director of the Institute of Computational Biology at Helmholtz Zentrum München - and his team have developed a new measure called kBET. This new method is a powerful tool for comparing batch-effect correction schemes and is extremely important for data integration in initiatives such as the Human Cell Atlas. The results are published in Nature Methods.
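The article does not describe how kBET is computed. As a rough illustration only, the sketch below assumes the commonly described idea behind a k-nearest-neighbour batch-effect test: in well-mixed data, the batch labels among a cell's nearest neighbours should follow the same proportions as in the whole dataset, and a chi-squared test flags neighbourhoods where they do not. The function names `chi2_sf` and `rejection_rate`, the parameters `k` and `alpha`, and the choice to test every cell rather than a subsample are assumptions made for this example, not the published implementation.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (series expansion)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    while term > 1e-12 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    log_p = a * math.log(half_x) - half_x - math.lgamma(a + 1.0) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def rejection_rate(points, batches, k=10, alpha=0.05):
    """Fraction of cells whose neighbourhood batch mix differs from the global mix.

    points  -- list of coordinate tuples (e.g. cells in a reduced expression space)
    batches -- list of batch labels, one per point
    Assumption: every cell is tested; a real tool may test only a subsample.
    """
    labels = sorted(set(batches))
    n = len(points)
    global_freq = {b: batches.count(b) / n for b in labels}
    rejected = 0
    for i, p in enumerate(points):
        # The k nearest neighbours of cell i, excluding the cell itself.
        others = (j for j in range(n) if j != i)
        neighbours = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        # Pearson chi-squared statistic: observed vs. expected batch counts.
        stat = 0.0
        for b in labels:
            observed = sum(1 for j in neighbours if batches[j] == b)
            expected = k * global_freq[b]
            stat += (observed - expected) ** 2 / expected
        if chi2_sf(stat, len(labels) - 1) < alpha:
            rejected += 1
    return rejected / n
```

Under these assumptions, a rejection rate near 0 suggests that batches are well mixed, while a rate near 1 suggests a strong batch effect. Computing the rate before and after a correction gives one way to compare correction schemes.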

Intelligent algorithm detects dropout events

Another challenge is dropout events: "Let’s say we sequence a cell and observe that a particular gene in the cell does not emit any signal at all. The underlying cause of this can be biological or technical in nature: either the gene is not being read by the sequencer because it’s simply not expressed, or it could not be detected for technical reasons", says Fabian Theis.

To identify the cause of a dropout event, Theis' research group has developed a deep learning algorithm: the software is based on a probability model and compares the original and the reconstructed data. "Our new algorithm is one of the first in the area of single-cell genomics to be based on neural networks and is the fastest in this field so far", says Theis.
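The article says only that the software rests on a probability model. As a hedged illustration of how such a model can separate the two causes of a zero, the sketch below assumes a zero-inflated negative binomial: a gene reads zero either because a dropout occurred (with probability `pi`) or because a negative binomial count with mean `mu` and dispersion `theta` happened to be zero. The function names and the parameters `pi`, `mu`, and `theta` are assumptions for this example; in an autoencoder setting they would be estimated by the network for each gene and cell.

```python
def nb_zero_prob(mu, theta):
    """P(count = 0) under a negative binomial with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta


def dropout_posterior(pi, mu, theta):
    """Probability that an observed zero comes from the dropout component.

    Applies Bayes' rule to the two ways of observing a zero:
      dropout:        probability pi
      genuine count:  probability (1 - pi) * NB(0 | mu, theta)
    """
    zero_from_counts = (1.0 - pi) * nb_zero_prob(mu, theta)
    return pi / (pi + zero_from_counts)


# A zero for a gene with a high expected count is most likely a dropout.
high = dropout_posterior(pi=0.3, mu=100.0, theta=1.0)
# A zero for a gene with a low expected count is hardly surprising.
low = dropout_posterior(pi=0.3, mu=0.01, theta=1.0)
```

In this toy model, `high` is much larger than `low`: a zero is attributed to technical dropout mainly when the model expects the gene to be clearly expressed in that cell.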

"Our chief goal is to identify and correct errors. We’re able to share these data, which are as accurate as possible, with our colleagues worldwide and compare our results with theirs", explains Theis. The research results have been published in Nature Communications.


This article is based on the press releases of the TUM "AI finds errors in RNA analysis" and the Helmholtz Center Munich "Using artificial intelligence for error correction in single cell analyses".