2020-12-01

Show only high-density areas with stat_density_2d in ggplot2

Problem

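The scatterplot below shows two continuous variables, X and Y, with points coloured by a categorical variable, Label, that has two levels (A and B).

When we draw a 2D density plot for each label, the density polygons overlap. We want to control the number of contour bins so that only the high-density areas are shown.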

library(ggplot2)
set.seed(123)
plot_data <-
  data.frame(
    X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
    Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
    Label = c(rep('A', 300), rep('B', 150))
  )

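# Scatterplot coloured by Label
ggplot(plot_data, aes(X, Y, colour = Label)) + geom_point()

# 2D density polygons per Label; after_stat() replaces the deprecated
# ..level.. notation (ggplot2 >= 3.3.0)
ggplot(plot_data, aes(X, Y)) +
  stat_density_2d(geom = "polygon", aes(alpha = after_stat(level), fill = Label))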

Solution

  • Option 1
  • Adding the bins argument (the number of contour bins) to stat_density_2d limits how many density areas are drawn per group. This reduces overplotting and draws attention to a few density areas with a single extra argument.

    ggplot(plot_data, aes(X, Y, group = Label)) +
      stat_density_2d(geom = "polygon",
                      aes(alpha = after_stat(level), fill = Label),
                      bins = 4)
    
  • Option 2
  • Assign the fill colours manually, using NA for the levels we do not want to plot. The main disadvantage is that we need to know in advance how many values scale_fill_manual requires. In our example, the manual scale needs 7 values.

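    ggplot(plot_data, aes(X, Y, group = Label)) +
      stat_density_2d(geom = "polygon", aes(fill = as.factor(after_stat(level)))) +
      # NA hides the three lowest levels; na.value = NA keeps them unfilled
      # (scale_fill_manual otherwise defaults to na.value = "grey50")
      scale_fill_manual(values = c(NA, NA, NA, "#BDD7E7", "#6BAED6", "#3182BD", "#08519C"),
                        na.value = NA)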
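
One possible way around guessing the number of values in Option 2 is to ask ggplot2 how many contour levels it computed and build the vector from that. The sketch below is not part of the original solution. It assumes ggplot2 3.3.0 or later (for after_stat()), and the names n_levels, shown, n_shown and fill_values are illustrative only. Keeping the four highest levels is an arbitrary choice, and na.value = NA is passed on the assumption that the scale would otherwise replace NA entries with its default na.value.

```r
library(ggplot2)

# Same simulated data as in the Problem section
set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

p <- ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(fill = as.factor(after_stat(level))))

# layer_data() returns the data computed for the first layer;
# its level column holds the contour level of each polygon
n_levels <- nlevels(as.factor(layer_data(p)$level))

# Illustrative choice: colour at most the four highest levels, hide the rest
shown <- c("#BDD7E7", "#6BAED6", "#3182BD", "#08519C")
n_shown <- min(length(shown), n_levels)
fill_values <- c(rep(NA_character_, n_levels - n_shown), tail(shown, n_shown))

# na.value = NA leaves the hidden (NA) levels unfilled
p + scale_fill_manual(values = fill_values, na.value = NA)
```

Because fill_values is derived from n_levels, the same code should keep working if the data or the number of contour levels changes.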
    
