Nube de datos: Show only high density areas with stat_density_2d with ggplot2

2020-12-01

Problem

We have the following scatterplot of two continuous variables, X and Y, with the points coloured by a categorical variable, Label.

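When we create a 2D density plot, the density polygons of the two groups overlap. We want to show only the high-density areas, either by controlling the number of contour bins or by hiding the lowest density levels.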

library(ggplot2)
set.seed(123)
plot_data <-
  data.frame(
    X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
    Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
    Label = c(rep('A', 300), rep('B', 150))
  )

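# Scatterplot of the two groups
ggplot(plot_data, aes(X, Y, colour = Label)) + geom_point()
# 2D density polygons; after_stat(level) (ggplot2 >= 3.3.0) replaces the older ..level..
ggplot(plot_data, aes(X, Y)) +
  stat_density_2d(geom = "polygon", aes(alpha = after_stat(level), fill = Label))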

Solution

Option 1

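Limit the number of contour bins with the bins argument of stat_density_2d.

# Fewer bins give fewer, coarser density polygons per group
ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(alpha = after_stat(level), fill = Label),
                  bins = 4)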

Option 2

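Map the density level to a discrete fill and use scale_fill_manual to make the lowest levels transparent.

# One fill value per contour level, lowest first; the first three levels are hidden.
# The vector needs at least as many values as there are levels.
ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon", aes(fill = as.factor(after_stat(level)))) +
  scale_fill_manual(values = c("transparent", "transparent", "transparent",
                               "#BDD7E7", "#6BAED6", "#3182BD", "#08519C"))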
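
The palette in Option 2 is written out by hand, so it has to supply at least as many values as the plot has contour levels. As a minimal sketch, the helper below builds such a vector for any number of levels. The function high_density_palette and its argument names are hypothetical and not part of ggplot2; it uses the colour "transparent" for the hidden levels and interpolates the visible ones between two blues.

```r
# Hypothetical helper (not part of ggplot2): build a manual fill palette
# that hides the `n_hidden` lowest contour levels.
high_density_palette <- function(n_levels, n_hidden,
                                 low = "#BDD7E7", high = "#08519C") {
  stopifnot(n_levels >= 1, n_hidden >= 0, n_hidden < n_levels)
  # Interpolate the visible colours between `low` and `high`.
  visible <- grDevices::colorRampPalette(c(low, high))(n_levels - n_hidden)
  # Hidden levels come first because the lowest density levels sort first.
  c(rep("transparent", n_hidden), visible)
}

# Seven levels with the three lowest hidden, as in Option 2.
pal <- high_density_palette(n_levels = 7, n_hidden = 3)
```

Assuming the plot from Option 2 is stored in a variable p before the scale is added, the number of levels should be available as length(unique(layer_data(p)$level)), and the resulting vector can be passed as scale_fill_manual(values = pal).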

References

stackoverflow
