Spatial Reconstruction of Single-cell Gene Expression Data
Satija R* and Farrell J*, Gennert D, Schier AF* and Regev A*. Spatial reconstruction of single-cell gene expression data. Nature Biotechnology 33(5):495–502 (May 2015). doi: 10.1038/nbt.3192
Contacts: Jeff Farrell (jfarrell@g.harvard.edu) and Rahul Satija (rsatija@nygenome.org)
Spatial localization is a key determinant of cellular fate and behavior, but methods for spatially resolved, transcriptome-wide gene expression profiling across complex tissues are lacking. RNA staining methods assay only a small number of transcripts, whereas single-cell RNA-seq, which measures global gene expression, separates cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns.
This study contains the single-cell transcriptomes of 1,152 zebrafish (Danio rerio) blastomeres from 50% epiboly stage (5.3 hours post-fertilization, just prior to gastrulation) generated during the development of Seurat. We applied Seurat to spatially map 851 of those cells (those of high quality and belonging to the deep layer of the embryo) and generated a transcriptome-wide map of spatial patterning. We confirmed Seurat’s accuracy using several experimental approaches, then used the strategy to identify a set of archetypal expression patterns and spatial markers. Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups.

As input, Seurat takes (1) single-cell RNA-seq data from dissociated cells (e.g., cells A–C), in which information about the original spatial context was lost during dissociation, and (2) in situ hybridization patterns for a series of landmark genes. To generate a binary spatial reference map, the tissue of interest is divided into a discrete set of user-defined bins, and the in situ data are binarized to reflect the detection of gene expression within each bin, as shown for genes X, Y and Z. (3) Seurat uses expression measurements across many correlated genes to ameliorate stochastic noise in individual measurements for landmark genes. As schematized, Seurat learns a model of gene expression for each landmark gene based on other variable genes in the data set, reducing reliance on any single measurement and mitigating the effect of technical errors. (4) Seurat then builds statistical models of gene expression in each bin by relating the bimodal expression patterns of the RNA-seq estimates to the binarized in situ data. Shown are probability distributions for genes X, Y and Z in three different embryonic bins. (5) Finally, Seurat uses these models to infer each cell's original spatial location, assigning a posterior probability of origin (depicted in shades of purple) to each bin. Seurat can map a cell exclusively to one bin (e.g., cell C) or, in some cases, assign probability to multiple bins (e.g., cells A and B).
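Steps (4) and (5) can be illustrated with a minimal sketch. It assumes several things. First, each landmark gene's bimodal expression has already been summarized as two Gaussian components, "off" and "on". Second, genes are independent given the bin. Third, the prior over bins is uniform. The expression values stand in for the denoised estimates from step (3), which is not shown. The names `OnOffModel` and `bin_posteriors` and all toy numbers are hypothetical. This is an illustration of the idea, not Seurat's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class OnOffModel:
    """Hypothetical 'off'/'on' Gaussian components for one landmark gene,
    e.g. taken from a two-component mixture fit to its expression values."""
    off_mean: float
    off_sd: float
    on_mean: float
    on_sd: float


def log_gaussian_pdf(x, mean, sd):
    """Log density of a normal distribution (computed in log space to avoid underflow)."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def bin_posteriors(cell_expr, models, reference):
    """Posterior probability that one cell originated in each bin.

    cell_expr: dict gene -> expression estimate for a single cell
    models:    dict gene -> OnOffModel
    reference: dict bin -> dict gene -> 0/1 (binarized in situ call)
    Assumes conditional independence of genes and a uniform prior over bins.
    """
    log_lik = {}
    for b, calls in reference.items():
        total = 0.0
        for gene, is_on in calls.items():
            m = models[gene]
            # Use the 'on' component where the in situ map detects the gene, else 'off'
            mean, sd = (m.on_mean, m.on_sd) if is_on else (m.off_mean, m.off_sd)
            total += log_gaussian_pdf(cell_expr[gene], mean, sd)
        log_lik[b] = total
    # Normalize over bins with the log-sum-exp trick
    top = max(log_lik.values())
    weights = {b: math.exp(v - top) for b, v in log_lik.items()}
    z = sum(weights.values())
    return {b: w / z for b, w in weights.items()}


# Toy reference map: three bins, three landmark genes (X, Y, Z)
models = {g: OnOffModel(off_mean=0.0, off_sd=1.0, on_mean=5.0, on_sd=1.0) for g in "XYZ"}
reference = {
    "bin1": {"X": 1, "Y": 0, "Z": 0},
    "bin2": {"X": 1, "Y": 1, "Z": 0},
    "bin3": {"X": 0, "Y": 0, "Z": 1},
}

# A cell that clearly matches bin2, and one whose Y value sits between modes
cell_c = {"X": 5.2, "Y": 4.8, "Z": 0.1}
cell_a = {"X": 5.0, "Y": 2.5, "Z": 0.0}

post_c = bin_posteriors(cell_c, models, reference)
post_a = bin_posteriors(cell_a, models, reference)
print(post_c)
print(post_a)
```

With these toy values, `cell_c` places nearly all of its posterior mass on `bin2`. For `cell_a`, the ambiguous Y measurement splits the probability roughly evenly between `bin1` and `bin2`. This mirrors the caption's contrast between an exclusive mapping (cell C) and a mapping spread over several bins (cells A and B). Working in log space keeps the product over many landmark genes numerically stable.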
