2019-05-03

Drop unused levels from a factor in R

Problem

If we filter a data frame that contains a factor and then run an operation that reports its levels, such as building a contingency table with table(), R still shows the levels that no longer have any rows. In general, subsetting does not drop unused levels.

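library(dplyr)

# stringsAsFactors = TRUE makes `name` a factor (the default before R 4.0.0)
df <- data.frame(name = c("a", "a", "a", "b", "b", "c", "c", "c", "c"), x = 1:9,
                 stringsAsFactors = TRUE)

# Keep only the groups with fewer than 4 rows ("c" has 4, so it is filtered out)
aa <- df %>%
  group_by(name) %>%
  filter(n() < 4)

table(aa$name)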
In our example, the level c is still included in the results. We'd like to remove it and display only the used levels a and b.

# Result
a b c 
3 2 0
# Desired result
a b 
3 2
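The same behavior shows up without dplyr. Below is a minimal base-R sketch on a plain factor vector (the names `f`, `g` and `h` are illustrative only). It shows that ordinary subsetting keeps the unused level, and that the `drop` argument of the factor subsetting method `[.factor` is one way to avoid it at subsetting time.

```r
# A factor with three levels, built from the same values as `name` above
f <- factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c"))

# Ordinary subsetting: the rows with "c" are gone, but the level is kept
g <- f[f != "c"]
levels(g)   # "a" "b" "c"
table(g)    # "c" is still listed, with a count of 0

# `[.factor` accepts drop = TRUE, which discards levels left without values
h <- f[f != "c", drop = TRUE]
levels(h)   # "a" "b"
```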

Solution

There are two alternatives: the functions droplevels() and factor().

table(droplevels(aa$name))   # drop the levels that have no observations
table(factor(aa$name))       # rebuild the factor from the values actually present
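A small base-R sketch to compare the two functions (the objects `f`, `d1`, `d2` and `d` are hypothetical). On a factor, both give the same levels here. The practical difference is that droplevels() also has a data frame method, which cleans every factor column at once and accepts an `except` argument for columns whose levels should be kept.

```r
# Hypothetical factor: "c" is declared as a level but never occurs
f <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))

d1 <- droplevels(f)  # removes levels with no observations
d2 <- factor(f)      # re-creates the factor from the values present
levels(d1)           # "a" "b"
levels(d2)           # "a" "b"

# On a data frame, droplevels() processes all factor columns;
# `except` takes the indices of columns to leave untouched
d <- data.frame(f = f, g = factor(c("x", "x", "x"), levels = c("x", "y")))
levels(droplevels(d)$g)               # "x"
levels(droplevels(d, except = 2)$g)   # "x" "y"
```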
If we are using dplyr and the pipe operator:

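aa <- df %>%
  group_by(name) %>%
  filter(n() < 4) %>%
  ungroup() %>%     # the grouping is no longer needed after filtering
  droplevels()      # drop unused levels in every factor column
table(aa$name)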

# Even better: build the table inside the pipe
df %>%
  group_by(name) %>%
  filter(n() < 4) %>%
  ungroup() %>%
  droplevels() %>%
  {table(.$name)}
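For readers not using dplyr, here is a base-R sketch of the same filter-then-drop workflow, assuming the same `df` as in the example (recreated so the block runs on its own). It counts rows per level with table(), keeps the levels with fewer than 4 rows, and then calls droplevels() on the filtered data frame.

```r
df <- data.frame(name = c("a", "a", "a", "b", "b", "c", "c", "c", "c"), x = 1:9,
                 stringsAsFactors = TRUE)

counts <- table(df$name)                            # rows per level: a=3, b=2, c=4
keep <- df$name %in% names(counts)[counts < 4]      # rows whose group has < 4 rows
aa <- droplevels(df[keep, ])                        # subset, then drop level "c"

table(aa$name)
```

This should print the desired result, with only the levels a and b.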

Nube de datos