AI spots diseased cells

Mapping single-cell data to reference atlases

31 August 2021
Mapping new cell cohorts from healthy individuals and COVID-19 patients onto a reference atlas of healthy cells (light blue: healthy reference cells; blue: new cells from healthy individuals; black: new cells from patients with moderate COVID-19; red: new cells from patients with severe COVID-19).

Pinpointing diseased cells: Researchers at the Technical University of Munich (TUM) and Helmholtz Zentrum München have developed an algorithm that does exactly this. It is based on artificial intelligence (AI) and efficiently compares patients' cells with a reference atlas of healthy cells.

The Human Cell Atlas

The Human Cell Atlas is the world's largest, continuously growing single-cell reference atlas. It contains references for millions of cells across tissues, organs and developmental stages. These references help physicians understand how aging, environment and disease affect a cell, and ultimately diagnose and treat patients better.

Single-cell atlases are now routinely generated and serve as a reference for the analysis of smaller studies. However, using them in personalized medicine is fraught with challenges: single-cell datasets may contain measurement errors (batch effects), computational resources are not equally available worldwide, and the sharing of raw data is often legally restricted.

scArches: deep learning strategy for data query

Mohammad Lotfollahi, a team leader at Helmholtz Zentrum München and doctoral student at TUM, and Fabian Theis, Professor of Mathematical Modeling of Biological Systems at TUM and Director of the Institute of Computational Biology at Helmholtz Zentrum München, developed an algorithm called 'scArches', short for 'Single-Cell Architecture Surgery'. It maps query datasets onto a reference.

The biggest advantage: "Instead of sharing raw data between clinics or research centers, the algorithm uses transfer learning to compare new datasets from single-cell genomics with existing references and thus preserves privacy and anonymity. This also makes annotating and interpreting new datasets very easy and democratizes the usage of single-cell reference atlases dramatically," says Mohammad Lotfollahi.
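
The transfer-learning idea described above can be pictured with a deliberately small toy example. The sketch below is not the scArches implementation: it replaces the deep network with a linear, PCA-style model, and every name in it (simulate, fit_query_offset, recon_error, the simulated data) is a hypothetical stand-in. It assumes that the reference site shares only trained weights, and that the query site keeps those weights frozen and trains a single new batch-specific parameter on its own data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_latent = 20, 2

# Toy "biology": cells vary along a 2-D subspace of gene space.
basis = np.linalg.qr(rng.normal(size=(n_genes, n_latent)))[0].T  # (n_latent, n_genes)

def simulate(n_cells, batch_offset):
    """Simulated expression: shared biology + batch-specific offset + noise."""
    z = rng.normal(size=(n_cells, n_latent))
    noise = 0.05 * rng.normal(size=(n_cells, n_genes))
    return z @ basis + batch_offset + noise

# --- Reference site: train once on the reference data ---
x_ref = simulate(500, rng.normal(size=n_genes))
b_ref = x_ref.mean(axis=0)                       # reference batch offset
_, _, vt = np.linalg.svd(x_ref - b_ref, full_matrices=False)
D = vt[:n_latent]                                # shared weights; only these are handed over
D_frozen = D.copy()                              # kept to verify D is never modified

def recon_error(x, b, D):
    """Mean squared reconstruction error after removing offset b."""
    r = x - b
    resid = r - (r @ D.T) @ D
    return float(np.mean(np.sum(resid ** 2, axis=1)))

def fit_query_offset(x_query, D, lr=0.1, steps=200):
    """Train only the new batch offset by gradient descent; D stays frozen."""
    b = np.zeros(x_query.shape[1])
    for _ in range(steps):
        r = x_query - b
        resid = r - (r @ D.T) @ D
        grad = -2.0 * resid.mean(axis=0)         # d(mean squared residual)/db
        b -= lr * grad
    return b

# --- Query site: sees D, but never x_ref ---
x_query = simulate(200, rng.normal(size=n_genes))
err_before = recon_error(x_query, np.zeros(n_genes), D)
b_query = fit_query_offset(x_query, D)
err_after = recon_error(x_query, b_query, D)
z_query = (x_query - b_query) @ D.T              # query cells in the reference latent space

print("reconstruction error before adapting:", err_before)
print("reconstruction error after adapting: ", err_after)
```

In this toy setting the raw reference matrix x_ref never leaves the reference site; the query site needs only D. The real method works with neural networks and far richer data, so the sketch conveys the division of labor rather than the actual model.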

The researchers present their results in the journal Nature Biotechnology: Mapping single-cell data to reference atlases by transfer learning

Research on COVID-19

The researchers applied scArches to study COVID-19 in several lung bronchial samples. They compared the cells of COVID-19 patients to healthy references using single-cell transcriptomics. The algorithm was able to separate diseased cells from the references and thus enabled the user to pinpoint the cells in need of treatment, for both mild and severe COVID-19 cases. Biological variation between patients did not affect the quality of the mapping process.
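
One simple way to picture how mapped cells could be separated from a healthy reference is to measure, for each query cell, the distance to its nearest reference cell in the shared latent space. The sketch below illustrates that idea on simulated 2-D coordinates. It is an assumption made for illustration, not the procedure used in the study, and the function names, the simulated data and the 99th-percentile threshold are all hypothetical.

```python
import numpy as np

def nearest_reference_distance(query_latent, ref_latent):
    """Euclidean distance from each query cell to its closest reference cell."""
    diffs = query_latent[:, None, :] - ref_latent[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def reference_threshold(ref_latent, percentile=99.0):
    """Distance that typical reference cells have to their nearest neighbor.

    The neighbor search excludes each cell itself (leave-one-out).
    """
    diffs = ref_latent[:, None, :] - ref_latent[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)
    return float(np.percentile(d.min(axis=1), percentile))

rng = np.random.default_rng(1)

# Simulated latent coordinates (stand-ins for mapped cells).
ref_latent = rng.normal(size=(300, 2))                        # healthy reference
healthy_query = rng.normal(size=(50, 2))                      # new healthy cells
diseased_query = rng.normal(size=(50, 2)) + np.array([10.0, 10.0])  # shifted cell state

threshold = reference_threshold(ref_latent)
healthy_flagged = nearest_reference_distance(healthy_query, ref_latent) > threshold
diseased_flagged = nearest_reference_distance(diseased_query, ref_latent) > threshold

print("fraction of healthy query cells flagged: ", healthy_flagged.mean())
print("fraction of diseased query cells flagged:", diseased_flagged.mean())
```

Cells that land inside the reference cloud look like known healthy states, while cells that land far from every reference cell stand out as candidates for a disease-specific state. The simulated shift here is exaggerated so that the two groups are easy to tell apart.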

"Our vision is that in the future we will use cell references as easily as we use genome references today," says Fabian Theis, explaining, "In other words, if you want to bake a cake, you usually do not want to come up with your own recipe; instead, you just look one up in a cookbook. With scArches, we formalize and simplify this lookup process."