HTAPP MBC structured data download
Matched single-cell/nucleus and spatial data are provided here in bundles grouped by sample.
The unregistered data bundle (unregistered_data_bundle.tar.zip) contains the following data types and representations for each sample where available, before spatial registration of the serial sections:
| Data type | Data representations |
|---|---|
| 10x single cell/nucleus | counts.tsv, annot.tsv, anndata.h5ad |
| slide_seq | counts.tsv, annot.tsv, anndata.h5ad |
| exseq | counts.tsv, reads.tsv, annot.tsv, anndata.h5ad, anndata_bin.h5ad |
| merfish | counts.tsv, reads.tsv, annot.tsv, anndata.h5ad, anndata_bin.h5ad |
| codex | counts.tsv, annot.tsv, anndata.h5ad |
| HE images | lowres.jpg, highres.jpg |
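The Python function below is a minimal sketch of loading one sample's counts.tsv and annot.tsv with pandas and aligning them on observation IDs. The file layout is an assumption, not something stated above: tab-separated values, observation IDs in the first column, and gene names in the counts header row. Check a file before relying on it.

```python
import pandas as pd


def load_sample_tables(counts_path, annot_path):
    """Load counts.tsv and annot.tsv and align them on observation IDs.

    Assumed layout (not confirmed by the bundle description): tab-separated,
    first column = observation ID (cell, nucleus or bead), counts header = genes.
    """
    counts = pd.read_csv(counts_path, sep="\t", index_col=0)
    annot = pd.read_csv(annot_path, sep="\t", index_col=0)
    # Keep only observations present in both tables, in the same order.
    shared = counts.index.intersection(annot.index)
    return counts.loc[shared], annot.loc[shared]
```

The aligned pair could then be wrapped with `anndata.AnnData(X=counts.to_numpy(), obs=annot, var=pd.DataFrame(index=counts.columns))`, although the bundled anndata.h5ad files should already hold an equivalent object.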
Explanation of data representations
| Data representation | Description |
|---|---|
| counts.tsv | Observation × gene matrix of raw counts (sc/snRNA-seq, slide_seq, exseq, merfish) or intensities (codex) |
| annot.tsv | Observation annotations, including coordinates |
| reads.tsv | Single-molecule coordinates and type (gene) |
| anndata.h5ad | AnnData object containing counts and annotations, with segmented cells or beads as observations |
| anndata_bin.h5ad | AnnData object containing counts and annotations, with 10 × 10 µm bins as observations |
| lowres.jpg | Low-resolution microscopy image |
| highres.jpg | High-resolution microscopy image |
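The relation between reads.tsv (single molecules) and anndata_bin.h5ad (10 × 10 µm bins) can be illustrated by aggregating molecules into square bins. This sketch assumes hypothetical column names `x`, `y` and `gene` in reads.tsv and coordinates in micrometres; it is not necessarily how the provided bin files were generated (their bin origin, for instance, may differ).

```python
import numpy as np
import pandas as pd


def bin_reads(reads, bin_size=10.0):
    """Aggregate single-molecule reads into a bin x gene count matrix.

    Assumptions: reads has hypothetical columns "x", "y" (micrometres) and
    "gene"; bins are anchored at coordinate 0, so bin_size=10.0 gives
    10 x 10 um squares labelled "<ix>_<iy>".
    """
    ix = np.floor(reads["x"] / bin_size).astype(int).astype(str)
    iy = np.floor(reads["y"] / bin_size).astype(int).astype(str)
    bin_id = ix + "_" + iy
    # Each entry of the result is the number of molecules of one gene in one bin.
    return pd.crosstab(bin_id, reads["gene"], rownames=["bin"], colnames=["gene"])
```

Usage, under the same assumptions: `bins = bin_reads(pd.read_csv("reads.tsv", sep="\t"))`.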
The registered data bundle (registered_data_bundle.tar.zip) contains the following data types and representations for each sample where available, after spatial registration of the serial sections, annotation and processing:
| Data type | Data representations |
|---|---|
| 10x single cell/nucleus | anndata_processed.h5ad |
| slide_seq | anndata_processed.h5ad |
| exseq | anndata_processed.h5ad, anndata_bin_processed.h5ad |
| merfish | anndata_processed.h5ad, anndata_bin_processed.h5ad |
| codex | anndata_processed.h5ad |
| HE images | processed.jpg |
Example AnnData structure for sc/snRNA-seq data:
AnnData object with n_obs × n_vars = 11074 × 5000
obs: 'replicate', 'condition', 'cell_type', 'labels_unif', 'labels_cl_unif', 'labels_cl_unif2_broad', 'compartments', 'cnv_pass_mal', 'Phase', 'n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'n_counts', 'n_genes', 'total_counts_mt', 'leiden'
var: 'gene_ids', 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'mean', 'std'
uns: 'cell_type_colors', 'counts_var', 'hvg', 'leiden', 'neighbors', 'pca', 'umap'
obsm: 'X_pca', 'X_umap', 'counts'
varm: 'PCs'
obsp: 'connectivities', 'distances'
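In this example, X holds 5000 genes, while obsm has a 'counts' entry and uns a 'counts_var' entry. A minimal sketch, assuming from the key names alone that obsm['counts'] stores raw counts and uns['counts_var'] names its columns (either as a list of gene names or as a DataFrame indexed by gene), recovers them as a pandas DataFrame from an object loaded with `anndata.read_h5ad(...)`.

```python
import numpy as np
import pandas as pd


def raw_counts_frame(adata):
    """Return adata.obsm["counts"] as an observation x gene DataFrame.

    Assumption (inferred from key names only): obsm["counts"] holds raw
    counts and uns["counts_var"] names its columns, either as a sequence
    of gene names or as a DataFrame indexed by gene name.
    """
    counts = adata.obsm["counts"]
    if isinstance(counts, pd.DataFrame):
        # obsm entries may already be DataFrames with named columns.
        return counts
    if hasattr(counts, "toarray"):
        # Densify a scipy.sparse matrix; fine for one sample, costly for many.
        counts = counts.toarray()
    names = adata.uns["counts_var"]
    if isinstance(names, pd.DataFrame):
        genes = names.index.astype(str)
    else:
        genes = [str(g) for g in np.asarray(names).ravel()]
    return pd.DataFrame(np.asarray(counts), index=adata.obs_names, columns=genes)
```

The function only touches `obsm`, `uns` and `obs_names`, so it works on any AnnData object whose entries follow the assumed layout.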
Example AnnData structure for spatial data:
AnnData object with n_obs × n_vars = 11641 × 291
obs: 'x_orig', 'y_orig', 'replicate', 'n_counts', 'n_genes', 'x', 'y', 'Fibrosis_1', 'ImmuneCells_1', 'Unidentifiable_1', 'Tumor_1', 'Tumor_2', 'Fibrosis_2', 'ImmuneCells_2', 'Unidentifiable_2', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'RCTD', 'OT', 'OT_max'
var: 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'mean', 'std'
uns: 'counts_var', 'hvg', 'leiden', 'neighbors', 'pca', 'squares_1000', 'umap'
obsm: 'OT', 'OT_compartment', 'OT_max', 'OT_max_compartment', 'RCTD', 'RCTD_compartment', 'X_pca', 'X_umap', 'counts'
varm: 'PCs'
obsp: 'connectivities', 'distances'
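The spatial obs contains both 'x_orig'/'y_orig' and 'x'/'y' coordinates, presumably before and after registration. As a sketch, the function below counts labels within a rectangular window of the x/y coordinates; it assumes obs['RCTD'] holds one label per observation, which is inferred from the column name only.

```python
import pandas as pd


def labels_in_window(obs, x_range, y_range, label_col="RCTD", x_col="x", y_col="y"):
    """Count labels of observations inside a rectangular coordinate window.

    obs is a spatial adata.obs DataFrame. Assumptions: x/y are the
    registered coordinates and obs[label_col] holds one label per row.
    Window bounds are inclusive on both ends.
    """
    inside = obs[x_col].between(*x_range) & obs[y_col].between(*y_range)
    return obs.loc[inside, label_col].value_counts()
```

Passing `x_col="x_orig", y_col="y_orig"` applies the same window to the unregistered coordinates instead.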
The fully processed data are also provided as combined AnnData (.h5ad) files, one per data type, each containing all samples together with annotations, including cell types and registered spatial coordinates:
- scRNAseq.h5ad
- slide_seq.h5ad
- exseq.h5ad
- exseq_bin.h5ad
- merfish.h5ad
- merfish_bin.h5ad
- codex.h5ad
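Because each combined file holds all samples, per-sample summaries reduce to grouping `adata.obs`. The sketch below computes per-sample cell-type fractions with `pandas.crosstab`; the default column names are assumptions ('cell_type' appears in the sc/snRNA-seq example above, but `sample` is a hypothetical name), so check `adata.obs.columns` first.

```python
import pandas as pd


def composition_by_sample(obs, sample_col="sample", label_col="cell_type"):
    """Return a sample x label table of label fractions (rows sum to 1).

    "sample" is a hypothetical column name; "cell_type" is taken from the
    sc/snRNA-seq example and may differ between data types.
    """
    return pd.crosstab(obs[sample_col], obs[label_col], normalize="index")
```

Usage, assuming the anndata package: `composition_by_sample(anndata.read_h5ad("scRNAseq.h5ad").obs)`.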
