2021-03-09

How to highlight specific data points in a scatter plot with ggplot2

Problem

We would like to highlight specific data points in a scatter plot created with ggplot2. In our example, we use the mpg dataset (fuel economy data from 1999 and 2008 for 38 popular models of car). We want to draw in red, and with a larger point size, those data points whose displ (engine displacement, in litres) is greater than 5 and whose hwy (highway miles per gallon) is greater than 20.

Solution

  • Option 1
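  • We put the condition for both aesthetics, colour and size, inside geom_point. The condition evaluates to FALSE or TRUE for each point, so we then set the two values of each aesthetic manually with scale_colour_manual and scale_size_manual respectively (the first value applies to FALSE, the second to TRUE). Finally, we remove the legend.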

    library(ggplot2)

    ggplot(data = mpg) +
      geom_point(mapping = aes(x = displ, y = hwy,
                               colour = displ > 5 & hwy > 20,
                               size = displ > 5 & hwy > 20)) +
      scale_colour_manual(values = c("black", "red")) +  # FALSE = black, TRUE = red
      scale_size_manual(values = c(1.5, 3)) +            # FALSE = 1.5, TRUE = 3
      theme(legend.position = "none")
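
In Option 1 the condition is written twice. A possible variation, shown here only as a sketch, computes it once and stores it in a column. The names mpg_flagged and highlight are hypothetical and do not come from the original example; the scale values are named so that the mapping from FALSE/TRUE to colour and size is explicit.

```r
library(ggplot2)

# Copy the dataset and add a hypothetical logical column `highlight`
# that is TRUE for the points we want to emphasise.
mpg_flagged <- mpg
mpg_flagged$highlight <- mpg$displ > 5 & mpg$hwy > 20

p <- ggplot(data = mpg_flagged) +
  geom_point(mapping = aes(x = displ, y = hwy,
                           colour = highlight, size = highlight)) +
  # Named values are matched to the levels "FALSE" and "TRUE" by name.
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red")) +
  scale_size_manual(values = c("FALSE" = 1.5, "TRUE" = 3)) +
  theme(legend.position = "none")

p
```

Changing the threshold then only requires editing one line.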
    
  • Option 2
  • We create two layers of data points using geom_point. The first layer includes all data points in black (black is the default, but here it is set explicitly). The second layer adds the points we would like to highlight, in red and with a larger point size.

    ggplot(data = mpg) +
      geom_point(mapping = aes(x = displ, y = hwy), colour = "black") +
      geom_point(data = subset(mpg, displ > 5 & hwy > 20),
                 mapping = aes(x = displ, y = hwy), colour = "red", size = 3)
     
    In ggplot2, layers are drawn in the order in which they are added. If we swap the two layers, the black points are drawn on top of the red ones and partly hide them:

    ggplot(data = mpg) +
      geom_point(data = subset(mpg, displ > 5 & hwy > 20),
                 mapping = aes(x = displ, y = hwy), colour = "red", size = 3) +
      geom_point(mapping = aes(x = displ, y = hwy), colour = "black")
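
Option 2, with the layers in their original order (highlight drawn last), can also be written with less repetition. The sketch below assumes we are happy to declare the x and y mapping once in ggplot(), from where both layers inherit it, and to keep the subset in a variable; the name highlighted is hypothetical.

```r
library(ggplot2)

# Rows to emphasise, computed once (hypothetical variable name).
highlighted <- subset(mpg, displ > 5 & hwy > 20)

# The mapping given to ggplot() is inherited by every layer.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(colour = "black") +                            # all points
  geom_point(data = highlighted, colour = "red", size = 3)  # drawn on top

p
```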
     
