Study: COVID-19 lung autopsy samples (106,792 cells)

Single nucleus and single cell transcriptomic analysis of COVID-19 lung samples

COVID-19 Autopsy Study Summary

Identifying what drives pathology in patients with severe SARS-CoV-2 infection is critical for understanding and treating COVID-19. To address this, we surveyed, at single-cell resolution, the gene expression programs induced by SARS-CoV-2 infection in tissues from severe cases. We have generated a tissue bank of approximately 420 specimens, from up to 10 organs, collected from COVID-19 autopsy donors spanning a wide range of comorbidities, to determine the tissue and immune cellular responses to severe COVID-19. The lung atlas presented here is the first in a series of data releases covering different organs profiled from COVID-19 patients.

Lung Study Summary

Here, we share an early release of this SARS-CoV-2 lung atlas to enable the research community to study COVID-19 pathogenesis. We have built an atlas of 106,792 single cells and nuclei collected from the lungs of 16 SARS-CoV-2-infected COVID-19 autopsy donors (23 samples in total), ranging from >30 to >80 years of age. The autopsies were performed at three hospitals in the Boston area.

Sequencing data were demultiplexed into FASTQ files and aligned to a custom-built joint human and SARS-CoV-2 genome using the Cumulus cellranger_workflow, and ambient RNA was removed using CellBender. The resulting individual count matrices underwent quality control using the Cumulus single-cell analysis workflow, and were then pooled and batch corrected using Harmony-PyTorch. Draft cell type assignments were made using both an automatic prediction approach and manual curation of marker genes and gene signatures.

We provide the gene expression matrix, cell clustering, UMAP dimensionality reduction coordinates, draft cell type assignments, and associated metadata on the Single Cell Portal. In parallel, another SARS-CoV-2 lung atlas, generated at Columbia University Medical Center and New York Presbyterian Hospital, is being released at:

Count data for the lung spatial transcriptomics presented in our bioRxiv preprint are available on GEO under accession number GSE162911.
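For readers who download the UMAP coordinates and metadata from the portal, the following is a minimal sketch of joining the two tables and counting cells per draft cell type. It assumes tab-delimited files with a NAME header row followed by a TYPE row, which is our understanding of the portal's cluster and metadata layout. The inline excerpts, the column names `donor_id` and `cell_type`, and the cell labels are hypothetical placeholders, not values taken from this study.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpts standing in for downloaded files; the real column
# names and cell identifiers in this study may differ.
CLUSTER_TSV = """NAME\tX\tY
TYPE\tnumeric\tnumeric
cell_1\t1.5\t-0.3
cell_2\t2.0\t0.8
cell_3\t-4.1\t3.2
"""

METADATA_TSV = """NAME\tdonor_id\tcell_type
TYPE\tgroup\tgroup
cell_1\tD1\tAT2
cell_2\tD1\tAT2
cell_3\tD2\tMacrophage
"""


def read_scp_table(handle):
    """Read a tab-delimited table with a NAME header row and a TYPE row.

    Returns a dict mapping each cell name to a dict of its column values,
    with columns declared 'numeric' converted to float.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # e.g. NAME, X, Y
    types = next(reader)   # e.g. TYPE, numeric, numeric
    rows = {}
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        record = {}
        for column, kind, value in zip(header[1:], types[1:], fields[1:]):
            record[column] = float(value) if kind == "numeric" else value
        rows[fields[0]] = record
    return rows


def join_tables(coords, metadata):
    """Attach metadata to every cell that has UMAP coordinates."""
    return {
        name: {**xy, **metadata[name]}
        for name, xy in coords.items()
        if name in metadata
    }


coords = read_scp_table(io.StringIO(CLUSTER_TSV))
metadata = read_scp_table(io.StringIO(METADATA_TSV))
cells = join_tables(coords, metadata)

# Count cells per (hypothetical) draft cell type label.
cell_type_counts = Counter(cell["cell_type"] for cell in cells.values())
print(cell_type_counts.most_common())  # [('AT2', 2), ('Macrophage', 1)]
```

To use this with real downloads, replace the `io.StringIO` handles with `open(path, newline="")` on the downloaded files and adjust the column names to match the metadata actually provided.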

Data for other tissue types in this study can be found here: 

Heart -

Kidney -

Liver -

This is an early data release from our COVID-19 lung and tissue atlas. The analyses presented herein are preliminary and subject to change. Nevertheless, we are releasing our data at this stage so that the research community can access them as soon as possible and work in parallel to examine COVID-19 pathogenesis. We will continue to update our cell and nucleus annotations and query their biology, and will ultimately publish these data in a peer-reviewed journal. Detailed experimental and computational methods are available in a preprint of our study on bioRxiv. By accessing these data now for discovery efforts, you agree not to submit any manuscripts that contain analyses of these data until our study of these data is published in a peer-reviewed journal.