Problem
We want to draw a random sample of rows from each group of a data frame in R.
Solution
We will look at two examples: grouping by one variable, and grouping by several (two, for simplicity).
One group
We draw 3 rows from each of the species: setosa, versicolor and virginica.
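set.seed(1)  # make the sampling reproducible
# Split iris by species and take 3 random rows from each piece
iris1 <- lapply(split(iris, iris$Species), function(x) x[sample(nrow(x), 3), ])
# Stack the per-species samples back into a single data frame
do.call("rbind", iris1)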
              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa.14              4.3         3.0          1.1         0.1     setosa
setosa.19              5.7         3.8          1.7         0.3     setosa
setosa.28              5.2         3.5          1.5         0.2     setosa
versicolor.96          5.7         3.0          4.2         1.2 versicolor
versicolor.60          5.2         2.7          3.9         1.4 versicolor
versicolor.94          5.0         2.3          3.3         1.0 versicolor
virginica.148          6.5         3.0          5.2         2.0  virginica
virginica.133          6.4         2.8          5.6         2.2  virginica
virginica.131          7.4         2.8          6.1         1.9  virginica
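The rows printed above match the sampler that R used by default before version 3.6.0. R 3.6.0 changed the default sample.kind from "Rounding" to "Rejection", so running set.seed(1) alone in a newer R will generally select different rows. A minimal sketch of how the old results could be reproduced, assuming R 3.6.0 or later (where set.seed() accepts a sample.kind argument):

```r
# Requires R >= 3.6.0, where set.seed() gained the sample.kind argument.
# "Rounding" is the sampler R used by default before 3.6.0;
# R emits a warning that this sampler is non-uniform.
set.seed(1, sample.kind = "Rounding")
iris1 <- lapply(split(iris, iris$Species), function(x) x[sample(nrow(x), 3), ])
iris_old <- do.call("rbind", iris1)
iris_old

# The sampler setting persists for the whole session, so restore the current default
RNGkind(sample.kind = "Rejection")
```

The RNGkind() call at the end matters: otherwise every later call to sample() in the session keeps using the old, non-uniform sampler.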
The same with dplyr:
library(dplyr)
set.seed(1)
iris %>%
  group_by(Species) %>%  # one group per species
  sample_n(3)            # 3 random rows from each group
Source: local data frame [9 x 5]
Groups: Species
  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          4.3         3.0          1.1         0.1     setosa
2          5.7         3.8          1.7         0.3     setosa
3          5.2         3.5          1.5         0.2     setosa
4          5.7         3.0          4.2         1.2 versicolor
5          5.2         2.7          3.9         1.4 versicolor
6          5.0         2.3          3.3         1.0 versicolor
7          6.5         3.0          5.2         2.0  virginica
8          6.4         2.8          5.6         2.2  virginica
9          7.4         2.8          6.1         1.9  virginica
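Since dplyr 1.0.0, sample_n() is superseded by slice_sample(). A sketch of the same one-group sample with it, assuming a recent dplyr; the rows drawn are not guaranteed to match the output above:

```r
library(dplyr)  # slice_sample() needs dplyr >= 1.0.0

set.seed(1)
iris_sample <- iris %>%
  group_by(Species) %>%
  slice_sample(n = 3) %>%  # 3 random rows per species
  ungroup()                # drop the grouping once the sample is drawn

iris_sample
```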
Two groups
For each number of cylinders (4, 6 or 8), we draw two cars with automatic transmission (am = 0) and two with manual transmission (am = 1).
set.seed(1)
# split() crosses both variables; am varies fastest, giving groups 0.4, 1.4, 0.6, ...
mtcars1 <- lapply(split(mtcars, list(mtcars$am, mtcars$cyl)),
                  function(x) x[sample(nrow(x), 2), ])
do.call("rbind", mtcars1)
                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
0.4.Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
0.4.Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
1.4.Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
1.4.Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
0.6.Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
0.6.Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
1.6.Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
1.6.Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
0.8.Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
0.8.Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
1.8.Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
1.8.Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
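The base R code above assumes every group has at least as many rows as requested: sample() stops with an error when asked for more rows than a group holds (without replacement), and an empty combination of the grouping variables would trigger the same error. A minimal sketch of a helper that caps the draw at the group size and skips empty combinations; sample_by_group() is a hypothetical name, not a base R function:

```r
# Hypothetical helper (not part of base R): draw up to n random rows per group.
sample_by_group <- function(df, groups, n) {
  # drop = TRUE discards combinations of the grouping variables that have no rows
  parts <- split(df, groups, drop = TRUE)
  sampled <- lapply(parts, function(x) {
    # sample.int() draws row positions; min() caps the draw at the group size
    x[sample.int(nrow(x), min(n, nrow(x))), , drop = FALSE]
  })
  do.call(rbind, sampled)
}

set.seed(1)
# Ask for 5 cars per am/cyl combination; smaller groups are returned whole
cars5 <- sample_by_group(mtcars, list(mtcars$am, mtcars$cyl), 5)
nrow(cars5)  # 22: e.g. only 2 cars have 8 cylinders and manual transmission
```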
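The same with dplyr:
set.seed(1)
mtcars %>%
  group_by(cyl, am) %>%  # one group per cylinders/transmission combination
  sample_n(2)            # 2 random rows from each group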
Source: local data frame [12 x 11]
Groups: cyl, am
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
2  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
3  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
4  30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
5  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
6  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
7  19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
8  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
9  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
10 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
11 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
12 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
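With slice_sample() (dplyr 1.0.0 or later) the two-group case reads the same way. One practical difference: sample_n() raises an error when the requested size exceeds a group's rows (unless replace = TRUE), whereas slice_sample() silently truncates the sample to the group size. A sketch, assuming a recent dplyr:

```r
library(dplyr)  # slice_sample() needs dplyr >= 1.0.0

set.seed(1)
cars_slice <- mtcars %>%
  group_by(cyl, am) %>%
  slice_sample(n = 5) %>%  # up to 5 random rows per cyl/am combination
  ungroup()

# Rows per combination: groups with fewer than 5 cars are returned whole
count(cars_slice, cyl, am)
```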