Better Color Scales for Gene Expression Plots
December 9, 2021
Which color scale do you think that SCP should use by default? Take this 2-minute survey to let us know!
One of the Single Cell Portal’s goals is to build interactive plots that make it easy for scientists to explore the patterns in single-cell data. Given that, we’re always looking for ways to make our plots easier to interpret. As anyone who has made a paper figure can attest, there is a real art to this -- how do you make the data “sing” so that a viewer can instantly see the interesting patterns that it holds, while remaining honest about the complexity and noise in the data? Recently, we made some changes that improve the data’s ability to sing on SCP. Specifically, we updated the color scales that we use to plot gene expression data.
To understand the art behind plotting gene expression data well, it’s useful to understand how these plots fit in with the rest of SCP’s visualizations. When you visit a study on SCP, you often see a plot like this:
In this plot, a high-dimensional matrix – storing the expression measured for many genes in many cells – has been reduced down to two or three dimensions. Each point on this plot (typically one cell or bead) is colored according to some kind of metadata – in this example, the cell type.
As you’re exploring this plot, you might wonder how gene expression varies between the cell-type clusters. To explore this question, you can search for a gene in the search bar on the top-left of the study’s “Explore” tab:
When you do this, you’ll see a new plot appear next to the original cluster plot. This new plot shows the measured expression for the gene you searched for, in every cell (or bead). You can then compare these two plots to learn whether the gene is expressed more in some cell types than in others:
For these gene expression plots, the art of data visualization is strongly tied to the color scale used to color the data. For example, in the plot above, cells with 0 measured expression are plotted in a muted color (grey), and the rest of the data are plotted in a vibrant color (shades of red). Because of this contrast, the data with non-zero values stand out more. This works well if you believe that cells with 0 measured expression are unimportant, or possibly noise, which is often the case with gene expression data. However, if you think that it’s important to see how many cells had 0 measured expression, this kind of color scale may obscure the data’s true pattern. Color scales also affect who can see your data, as people with colorblindness may have difficulty perceiving the variations in some color scales.
In fact, there are many rules of thumb for using color scales, and we ran into another that’s particularly relevant for gene expression data: darker colors tend to wash out lighter colors. Originally, our color scales mapped low expression values to darker colors and high expression values to brighter colors. For example, the Viridis color scale shown below plots low values in dark purple and high values in bright yellow:
This mapping might make sense if the gene expression values were evenly distributed in the data. But in most of SCP’s datasets, this isn’t the case. Instead, the data contain a large number of cells with near-0 measured expression, and a relatively small number with high measured expression. As a result, the many cells with near-0 measured expression were plotted in a dark color, which swamped the few cells with high expression. For example:
As a result, the data don’t sing very loudly -- you can barely see in which cells this gene was highly expressed.
To mitigate this effect, we simply flipped the color scales. Now cells with low measured expression are plotted in bright colors, and cells with high measured expression are plotted in dark colors. The difference is striking -- here’s the same data, plotted with the flipped color scale. You can now easily pick out which clusters contain cells where this gene was expressed:
These color scales are now flipped by default for gene expression plots in SCP, so that low values are always mapped to lighter colors. You can check them out by going to any study’s “Explore” tab, searching for a gene, and selecting a color scale under the “continuous color scale” menu on the right-hand panel.
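To make the flip concrete, here is a minimal sketch in Python. It is not SCP’s actual plotting code: the five hex colors are approximate stops sampled from Viridis, the expression values are made up, and the helper names `flip_scale` and `color_for` are hypothetical. It only illustrates that reversing the order of the stops sends low values to the light end of the scale.

```python
# Approximate stops sampled from Viridis, from its low end (dark purple)
# to its high end (bright yellow). These values are illustrative only.
VIRIDIS_STOPS = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]


def flip_scale(stops):
    """Return the same colors in reverse order, so low values get the light end."""
    return list(reversed(stops))


def color_for(value, vmin, vmax, stops):
    """Pick the nearest stop for a value (no interpolation, to keep the sketch short)."""
    if vmax <= vmin:
        return stops[0]
    fraction = (value - vmin) / (vmax - vmin)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the scale's limits
    index = round(fraction * (len(stops) - 1))
    return stops[index]


# Made-up expression values: mostly near zero, with a few high values.
expression = [0.0, 0.0, 0.1, 0.0, 0.2, 3.8, 0.0, 4.0]

flipped = flip_scale(VIRIDIS_STOPS)
original_colors = [color_for(v, 0.0, 4.0, VIRIDIS_STOPS) for v in expression]
flipped_colors = [color_for(v, 0.0, 4.0, flipped) for v in expression]
```

With the original ordering, the six near-zero cells all get the darkest color. With the flipped ordering, they get the lightest one, and only the two high-expression cells are drawn dark.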
All of this work raises another question: what should the color scale be by default? Let us know what you think in this 2-minute survey! When you do so, keep in mind that there are many ways that color scales can affect how the data are interpreted, beyond those mentioned above. For example, the variations within a color scale have a big impact on how the data are read. Color scales that are perceptually uniform match the steps in the color scale to the steps in our perception of the differences between colors. So, the colors used for “0” and “0.1” look just as different from each other as the colors used for “0.9” and “1” do. Color scales that aren’t perceptually uniform can make certain differences “pop” more than they should. As another example, the values mapped to the color scale’s limits affect how much variation you can see -- a single outlier can stretch the gradient so much that you can barely see any of the variation within the rest of the data. Let us know your thoughts on color scales, so that we can continue to improve SCP’s visualizations!
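The effect of the color scale’s limits can be illustrated with a short Python sketch. The data are made up, the helper names `percentile` and `scale_position` are hypothetical, and capping the upper limit at the 90th percentile is just one possible choice for the example, not a description of what SCP does.

```python
def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def scale_position(value, vmin, vmax):
    """Position of a value along the color scale, clamped to the range 0 to 1."""
    if vmax <= vmin:
        return 0.0
    return min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))


# Made-up expression values; 100.0 is a single outlier.
expression = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 100.0]

# Limits set to the full data range: the outlier stretches the gradient.
full = [scale_position(v, min(expression), max(expression)) for v in expression]

# Upper limit capped at the 90th percentile (an assumption for this sketch).
upper_limit = percentile(expression, 90)
capped = [scale_position(v, min(expression), upper_limit) for v in expression]
```

With the full range, every value except the outlier lands in the lowest 5% of the gradient, so those cells look nearly identical. With the capped limit, the same values spread across the whole gradient, and the outlier simply takes the color at the top of the scale.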