Search across Data Portals in SCP
March 22, 2022
Imagine you’re a researcher who’s interested in a specific topic within single-cell genomics, such as HIV. To make your own research more thoughtful and impactful, it’s often useful to survey the existing datasets that involve HIV, to get a sense of what questions people are asking about HIV and what the answers are so far. Or you might be interested in gathering data on a specific cell type or organ. Gathering data of the same type across multiple datasets could help you build a cohort of data: you’ll then be able to run analyses across these datasets, with more statistical power to pick up small effects than if you were to analyze a single dataset. Doing so also allows you to find results that are more robust to the smaller details that set studies apart, such as the specific organisms that were sampled.
While gathering data from multiple studies can be scientifically useful, in practice it’s often tedious and time-consuming. Interesting datasets are often dispersed among several data portals, and searching through each one in turn can be a bit of a headache. To address this challenge of dispersed data, Single Cell Portal (SCP) has just released a new feature: cross-dataset search. With cross-dataset search, users can launch a single search through SCP that runs against both data that have been uploaded to SCP directly and data from outside sources. Right now, external data are drawn from the Human Cell Atlas (HCA) data portal, which houses a rich repository of single-cell data that have passed through standardized quality-control pipelines.
You can try this out yourself by running a search through the SCP homepage. For example, if you’re looking for data on HIV in humans, you can run a search by selecting “HIV Infectious Disease” from the “disease” menu, and “homo sapiens” from the “species” menu:
When you click “apply”, SCP will return studies with human HIV data, based on the studies’ metadata.
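To make the idea of a metadata-based search concrete, here is a minimal sketch in Python. It is not SCP’s actual implementation: the study records, the field names, and the matching rule (any selected value within a menu, all menus together) are assumptions made purely for illustration.

```python
# Hypothetical study records; names and metadata values are invented for illustration.
STUDIES = [
    {"name": "Study A", "disease": {"HIV infectious disease"}, "species": {"Homo sapiens"}},
    {"name": "Study B", "disease": {"HIV infectious disease"}, "species": {"Macaca mulatta"}},
    {"name": "Study C", "disease": {"tuberculosis"}, "species": {"Homo sapiens"}},
]


def facet_search(studies, selected):
    """Return the studies whose metadata match every selected menu.

    Assumed rule: within one menu, matching any selected value is enough (OR);
    across menus, every menu must match (AND).
    """
    return [
        study
        for study in studies
        # A non-empty set intersection means the study shares a value with the selection.
        if all(study.get(facet, set()) & wanted for facet, wanted in selected.items())
    ]


hits = facet_search(
    STUDIES,
    {"disease": {"HIV infectious disease"}, "species": {"Homo sapiens"}},
)
print([study["name"] for study in hits])  # prints ['Study A']
```

Only the study that is tagged with both the selected disease and the selected species comes back, which mirrors what you would expect from combining the two menus in the search above.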
You’ll notice that some studies have a label that says “Human Cell Atlas”. These are studies that came from the HCA data portal. If you click on one of these studies, you’ll go to the HCA data portal’s project overview page, where you can find more information on associated publications, contributing researchers, and the experimental protocols.
The studies that don’t have this HCA label are data that are native to SCP. Click on one of these studies to explore the data through SCP’s interactive visualizations.
It’s also easy to download the results of this search. First, click on the “download” button on the right-hand side of the screen:
Then, choose which studies and files you want to download, and click “next”:
This will generate a curl command. Copy this command into your terminal (if you’re a Windows user, we recommend running Windows PowerShell as an administrator) and the data should download to your computer:
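If the generated command fails with a "command not found" error, curl may not be installed or may not be on your PATH. The short Python sketch below is one way to check before pasting the command; the helper name `find_curl` is ours and is not part of SCP.

```python
import shutil


def find_curl():
    """Return the full path of the curl executable on PATH, or None if it is absent."""
    return shutil.which("curl")


path = find_curl()
if path is None:
    print("curl was not found on PATH; install it before running the download command.")
else:
    print(f"curl found at {path}")
```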
On the horizon, we plan to include data from more external sources beyond HCA, to continue reducing the barriers to aggregating and exploring single-cell data across datasets. Let us know if you have questions or suggestions for this tool by emailing scp-support (at) broadinstitute.zendesk.com.