HTAPP MBC structured data download

Matched single-cell/nucleus and spatial data are available here as bundles grouped by sample.

 

The unregistered data bundle (unregistered_data_bundle.tar.zip) contains, for each sample and where available, the following data types and representations before spatial registration of the serial sections:

Data type                  Data representations
10x single cell/nucleus    counts.tsv, annot.tsv, anndata.h5ad
slide_seq                  counts.tsv, annot.tsv, anndata.h5ad
exseq                      counts.tsv, reads.tsv, annot.tsv, anndata.h5ad, anndata_bin.h5ad
merfish                    counts.tsv, reads.tsv, annot.tsv, anndata.h5ad, anndata_bin.h5ad
codex                      counts.tsv, annot.tsv, anndata.h5ad
HE images                  lowres.jpg, highres.jpg
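
The bundle name ends in .tar.zip, which suggests a zip archive wrapping a tar archive. The following is a minimal sketch for unpacking such a bundle, assuming that layout (it is not confirmed here); the function name unpack_bundle and the output directory are hypothetical.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_bundle(bundle_path, out_dir):
    """Unpack a .tar.zip bundle and return the sorted list of extracted files.

    Assumption: the zip archive holds one or more tar archives. Zip members
    that are not tar archives are simply left as extracted.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: extract the outer zip archive.
    with zipfile.ZipFile(bundle_path) as zf:
        zf.extractall(out_dir)
        members = zf.namelist()

    # Step 2: extract every tar archive that came out of the zip.
    for name in members:
        inner = out_dir / name
        if inner.is_file() and tarfile.is_tarfile(str(inner)):
            with tarfile.open(str(inner)) as tf:
                tf.extractall(out_dir)

    return sorted(p for p in out_dir.rglob("*") if p.is_file())


# Usage (paths are placeholders):
# files = unpack_bundle("unregistered_data_bundle.tar.zip", "unregistered")
```

Only unpack archives from a source you trust, since extractall writes whatever paths the archive contains.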

Data representations explanation

Data representation    Explanation
counts.tsv             observation × gene matrix containing raw counts (scRNAseq, slide_seq, exseq, merfish) or intensities (codex)
annot.tsv              observation annotations, including coordinates
reads.tsv              single-molecule coordinates and type (gene)
anndata.h5ad           AnnData object containing counts and annotations, with segmented cells or beads as observations
anndata_bin.h5ad       AnnData object containing counts and annotations, with 10 × 10 µm bins as observations
lowres.jpg             low-resolution microscopy image
highres.jpg            high-resolution microscopy image
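
To illustrate how single-molecule reads relate to the binned representation, here is a small sketch that counts molecules per 10 × 10 µm bin and gene. The column names x, y and gene, the micrometer units, and the example rows are assumptions made for illustration; the actual reads.tsv header may differ, and this is not necessarily how anndata_bin.h5ad was produced.

```python
import csv
import io
from collections import Counter

BIN_SIZE_UM = 10.0  # bin edge length, matching the 10 x 10 um bins


def bin_reads(tsv_text, bin_size=BIN_SIZE_UM):
    """Count molecules per (bin_x, bin_y, gene).

    Assumes hypothetical columns 'x', 'y' (micrometers) and 'gene'.
    """
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Integer bin index along each axis (floor division by bin size).
        bin_x = int(float(row["x"]) // bin_size)
        bin_y = int(float(row["y"]) // bin_size)
        counts[(bin_x, bin_y, row["gene"])] += 1
    return counts


# Made-up rows, for illustration only.
example = (
    "x\ty\tgene\n"
    "1.5\t2.0\tEPCAM\n"
    "9.9\t0.1\tEPCAM\n"
    "10.0\t3.0\tEPCAM\n"
    "12.0\t25.0\tCD3E\n"
)
binned = bin_reads(example)
print(binned[(0, 0, "EPCAM")])  # two reads fall into the first bin
```

Each key of the resulting counter corresponds to one bin and gene, so grouping keys by (bin_x, bin_y) gives one observation per bin.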

 

The registered data bundle (registered_data_bundle.tar.zip) contains, for each sample and where available, the following data types and representations after spatial registration of the serial sections, annotation, and processing:

Data type                  Data representations
10x single cell/nucleus    anndata_processed.h5ad
slide_seq                  anndata_processed.h5ad
exseq                      anndata_processed.h5ad, anndata_bin_processed.h5ad
merfish                    anndata_processed.h5ad, anndata_bin_processed.h5ad
codex                      anndata_processed.h5ad
HE images                  processed.jpg

AnnData structure for sc/snRNAseq data (example):

AnnData object with n_obs × n_vars = 11074 × 5000    

obs: 'replicate', 'condition', 'cell_type', 'labels_unif', 'labels_cl_unif', 'labels_cl_unif2_broad', 'compartments', 'cnv_pass_mal', 'Phase', 'n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'n_counts', 'n_genes', 'total_counts_mt', 'leiden'    

var: 'gene_ids', 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'mean', 'std'    

uns: 'cell_type_colors', 'counts_var', 'hvg', 'leiden', 'neighbors', 'pca', 'umap'    

obsm: 'X_pca', 'X_umap', 'counts'    

varm: 'PCs'    

obsp: 'connectivities', 'distances'
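
Because representations are provided only where available, it can help to check that a loaded object has the fields you rely on. The sketch below compares a subset of the field names from the example above against an object's obs, obsm, varm and obsp slots. The helper names missing_fields and check_file are hypothetical, and check_file assumes the third-party anndata package is installed.

```python
# Subset of field names copied from the sc/snRNAseq example above.
EXPECTED = {
    "obs": ["replicate", "cell_type", "compartments", "leiden"],
    "obsm": ["X_pca", "X_umap", "counts"],
    "varm": ["PCs"],
    "obsp": ["connectivities", "distances"],
}


def missing_fields(adata, expected=EXPECTED):
    """Return {slot: [missing keys]} for every slot with missing keys.

    Works with any object whose slots expose .keys(), which includes
    AnnData (obs is a DataFrame; obsm, varm and obsp are mappings).
    """
    report = {}
    for slot, keys in expected.items():
        present = set(getattr(adata, slot).keys())
        missing = [key for key in keys if key not in present]
        if missing:
            report[slot] = missing
    return report


def check_file(path):
    """Load an .h5ad file and report missing fields."""
    import anndata  # third-party; imported lazily so the helper above stays standalone

    return missing_fields(anndata.read_h5ad(path))


# Usage (path is a placeholder):
# print(check_file("anndata_processed.h5ad"))
```

An empty dictionary means every expected field is present; adjust EXPECTED for the spatial data types, whose fields differ.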

 

AnnData structure for spatial data (example):

AnnData object with n_obs × n_vars = 11641 × 291    

obs: 'x_orig', 'y_orig', 'replicate', 'n_counts', 'n_genes', 'x', 'y', 'Fibrosis_1', 'ImmuneCells_1', 'Unidentifiable_1', 'Tumor_1', 'Tumor_2', 'Fibrosis_2', 'ImmuneCells_2', 'Unidentifiable_2', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'RCTD', 'OT', 'OT_max'    

var: 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'mean', 'std'    

uns: 'counts_var', 'hvg', 'leiden', 'neighbors', 'pca', 'squares_1000', 'umap'    

obsm: 'OT', 'OT_compartment', 'OT_max', 'OT_max_compartment', 'RCTD', 'RCTD_compartment', 'X_pca', 'X_umap', 'counts'    

varm: 'PCs'    

obsp: 'connectivities', 'distances'

 

The fully processed data are also provided as AnnData (.h5ad) files that contain all samples for each data type, together with annotations including cell types and registered spatial coordinates:

  • scRNAseq.h5ad
  • slide_seq.h5ad
  • exseq.h5ad
  • exseq_bin.h5ad
  • merfish.h5ad
  • merfish_bin.h5ad
  • codex.h5ad
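
As a rough sketch of working with these combined files, the snippet below checks which of the listed files are present in a local download directory and opens one in backed (on-disk) mode, which can be useful if a file holding all samples is large. The directory name and the helper names available_files and open_backed are hypothetical; open_backed assumes the third-party anndata package is installed.

```python
from pathlib import Path

# File names as listed above, grouped by data type.
COMBINED_FILES = {
    "scRNAseq": ["scRNAseq.h5ad"],
    "slide_seq": ["slide_seq.h5ad"],
    "exseq": ["exseq.h5ad", "exseq_bin.h5ad"],
    "merfish": ["merfish.h5ad", "merfish_bin.h5ad"],
    "codex": ["codex.h5ad"],
}


def available_files(download_dir):
    """Return {data_type: [existing paths]} for the files listed above."""
    download_dir = Path(download_dir)
    found = {}
    for data_type, names in COMBINED_FILES.items():
        paths = [download_dir / name for name in names]
        existing = [path for path in paths if path.is_file()]
        if existing:
            found[data_type] = existing
    return found


def open_backed(path):
    """Open an .h5ad file without loading the full matrix into memory."""
    import anndata  # third-party; imported lazily

    return anndata.read_h5ad(path, backed="r")


# Usage (directory is a placeholder):
# for data_type, paths in available_files("downloads").items():
#     adata = open_backed(paths[0])
#     print(data_type, adata.shape)
```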