2020-11-11

How to create dummy variables in R

Problem

We want to create dummy variables from existing categorical variables in R. In our example, the dummy variables are based on the columns Sex and Embarked of the following data frame.

  PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked Age.NA
1           1        0      3   male  22     1     0  7.2500        S      0
2           2        1      1 female  38     1     0 71.2833        C      0
3           3        1      3 female  26     0     0  7.9250        S      0
4           4        1      1 female  35     1     0 53.1000        S      0
5           5        0      3   male  35     0     0  8.0500        S      0
6           6        0      3   male  NA     0     0  8.4583        Q      1

# Recreate the example data frame shown above
df <- data.frame(
  PassengerId = 1:6,
  Survived = c(0L, 1L, 1L, 1L, 0L, 0L),
  Pclass = c(3L, 1L, 3L, 1L, 3L, 3L),
  Sex = factor(c("male", "female", "female", "female", "male", "male")),
  Age = c(22L, 38L, 26L, 35L, 35L, NA),
  SibSp = c(1L, 1L, 0L, 1L, 0L, 0L),
  Parch = c(0L, 0L, 0L, 0L, 0L, 0L),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583),
  Embarked = factor(c("S", "C", "S", "S", "S", "Q")),
  Age.NA = c(0, 0, 0, 0, 0, 1)
)

Solution

We use the function dummy.data.frame from the package dummies. By default it expands every character and factor column into one 0/1 indicator column per level and leaves the other columns unchanged.

library(dummies)
dummy.data.frame(df)

Results

The original columns Sex and Embarked are replaced by the dummy variable columns Sexfemale, Sexmale, EmbarkedC, EmbarkedQ and EmbarkedS.

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare
1           1        0      3         0       1  22     1     0  7.2500
2           2        1      1         1       0  38     1     0 71.2833
3           3        1      3         1       0  26     0     0  7.9250
4           4        1      1         1       0  35     1     0 53.1000
5           5        0      3         0       1  35     0     0  8.0500
6           6        0      3         0       1  NA     0     0  8.4583
  EmbarkedC EmbarkedQ EmbarkedS Age.NA
1         0         0         1      0
2         1         0         0      0
3         0         0         1      0
4         0         0         1      0
5         0         0         1      0
6         0         1         0      1
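
If the dummies package is not available, a similar expansion can be sketched with base R's model.matrix. This is an alternative technique, not what dummy.data.frame does internally. The helper make_dummies and the reduced data frame below are hypothetical names introduced only for this illustration, and the dummy columns end up at the end of the result rather than in place of the originals.

```r
# Reduced stand-in for the example data (only the columns needed here)
df <- data.frame(
  PassengerId = 1:6,
  Sex = factor(c("male", "female", "female", "female", "male", "male")),
  Embarked = factor(c("S", "C", "S", "S", "S", "Q"))
)

# Hypothetical helper: one 0/1 column per level of a single factor column.
# intercept = FALSE builds the formula "~ column - 1", so no level is dropped.
# Caveat: rows with NA in that column are dropped by model.matrix by default.
make_dummies <- function(data, column) {
  m <- model.matrix(reformulate(column, intercept = FALSE), data = data)
  as.data.frame(m)
}

factor_cols <- c("Sex", "Embarked")

# Expand each factor separately, then bind the pieces side by side
dummies <- do.call(cbind, lapply(factor_cols, function(col) make_dummies(df, col)))

# Keep the non-factor columns and append the dummy columns
result <- cbind(df[setdiff(names(df), factor_cols)], dummies)
print(result)
```

Each factor is expanded on its own because a single formula such as ~ Sex + Embarked - 1 would keep all levels of the first factor only and drop one level of the second.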
