2020-11-11

How to create dummy variables in R

Problem

We want to create dummy variables from certain columns of a data frame in R. In our example, from the columns Sex and Embarked.

  PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked Age.NA
1           1        0      3   male  22     1     0  7.2500        S      0
2           2        1      1 female  38     1     0 71.2833        C      0
3           3        1      3 female  26     0     0  7.9250        S      0
4           4        1      1 female  35     1     0 53.1000        S      0
5           5        0      3   male  35     0     0  8.0500        S      0
6           6        0      3   male  NA     0     0  8.4583        Q      1

The data frame can be reproduced with:

df <- structure(list(PassengerId = 1:6, Survived = c(0L, 1L, 1L, 1L, 
0L, 0L), Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), Sex = structure(c(2L, 
1L, 1L, 1L, 2L, 2L), .Label = c("female", "male"), class = "factor"), 
    Age = c(22L, 38L, 26L, 35L, 35L, NA), SibSp = c(1L, 1L, 0L, 
    1L, 0L, 0L), Parch = c(0L, 0L, 0L, 0L, 0L, 0L), Fare = c(7.25, 
    71.2833, 7.925, 53.1, 8.05, 8.4583), Embarked = structure(c(3L, 
    1L, 3L, 3L, 3L, 2L), .Label = c("C", "Q", "S"), class = "factor"), 
    Age.NA = c(0, 0, 0, 0, 0, 1)), .Names = c("PassengerId", 
"Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", 
"Embarked", "Age.NA"), row.names = c("1", "2", "3", "4", "5", 
"6"), class = "data.frame")

Solution

We use the dummy.data.frame function from the dummies package. By default, it creates dummy variables for every column that is a character vector or a factor.

library(dummies)
dummy.data.frame(df)
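
To see in advance which columns match the default rule described above (character or factor), the column classes can be inspected with base R. This is a minimal sketch on a reduced stand-in for the example data; the name categorical_columns is only illustrative and is not part of the dummies package.

```r
# Reduced stand-in for the example data (assumption: only three columns kept).
df <- data.frame(
  Sex = factor(c("male", "female", "female", "female", "male", "male")),
  Age = c(22L, 38L, 26L, 35L, 35L, NA),
  Embarked = factor(c("S", "C", "S", "S", "S", "Q"))
)

# TRUE for columns that are factors or character vectors.
is_categorical <- vapply(
  df,
  function(col) is.factor(col) || is.character(col),
  logical(1)
)

# Names of the columns that would be expanded into dummy variables.
categorical_columns <- names(df)[is_categorical]
print(categorical_columns)
```

Here Sex and Embarked are selected, while Age is left alone because it is numeric.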

Result

The original Sex and Embarked columns are replaced by the columns Sexfemale, Sexmale, EmbarkedC, EmbarkedQ and EmbarkedS.

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare
1           1        0      3         0       1  22     1     0  7.2500
2           2        1      1         1       0  38     1     0 71.2833
3           3        1      3         1       0  26     0     0  7.9250
4           4        1      1         1       0  35     1     0 53.1000
5           5        0      3         0       1  35     0     0  8.0500
6           6        0      3         0       1  NA     0     0  8.4583
  EmbarkedC EmbarkedQ EmbarkedS Age.NA
1         0         0         1      0
2         1         0         0      0
3         0         0         1      0
4         0         0         1      0
5         0         0         1      0
6         0         1         0      1
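
If the dummies package is not available, a similar result can be sketched with base R's model.matrix. This is a different technique from the one used in this post, shown on a reduced stand-in for the example data; the variable names sex_dummies, embarked_dummies and result are illustrative. Note that model.matrix drops rows with missing values in the variables named in the formula, which is not an issue here because Sex and Embarked are complete.

```r
# Reduced stand-in for the example data (assumption: only three columns kept).
df <- data.frame(
  PassengerId = 1:6,
  Sex = factor(c("male", "female", "female", "female", "male", "male")),
  Embarked = factor(c("S", "C", "S", "S", "S", "Q"))
)

# One indicator column per factor level. The "- 1" removes the intercept,
# so no level is absorbed as the reference category.
sex_dummies <- model.matrix(~ Sex - 1, data = df)
embarked_dummies <- model.matrix(~ Embarked - 1, data = df)

# Replace the original factor columns with the indicator columns.
result <- cbind(
  df[, setdiff(names(df), c("Sex", "Embarked")), drop = FALSE],
  sex_dummies,
  embarked_dummies
)
print(names(result))
```

Unlike dummy.data.frame, this sketch appends the indicator columns at the end instead of keeping them in the position of the original columns.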
