Problem
We want to subset a data frame on multiple conditions combined with "OR". In this example, we keep every row where v1 is less than 0.5 or v2 is equal to "g".
Original data frame
v1 v2
1 0.26550866 a
2 0.37212390 b
3 0.57285336 c
4 0.90820779 d
5 0.20168193 e
6 0.89838968 f
7 0.94467527 g
8 0.66079779 h
9 0.62911404 i
10 0.06178627 j
Expected output
v1 v2
1 0.26550866 a
2 0.37212390 b
3 0.20168193 e
4 0.94467527 g
5 0.06178627 j
Code to create the data frame
set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])
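To see what the OR condition selects before any subsetting is done, the following minimal sketch rebuilds the same data frame and evaluates the condition on its own. The name keep is only illustrative and is not part of the original example. | is R's element-wise OR, so the result holds one TRUE or FALSE per row.

```r
set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])

# Element-wise OR: one logical value per row of df
# (keep is an illustrative name, not part of the original example)
keep <- df$v1 < 0.5 | df$v2 == "g"

# Row positions where at least one of the two conditions holds
which(keep)
```

For the data shown above, these are rows 1, 2, 5, 7 and 10.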
Solution
There are multiple options:
- Base functions
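# subset() evaluates the condition using the columns of df
subset(df, v1 < 0.5 | v2 == "g")
# which() converts the logical vector into row positions
df[which(df$v1 < 0.5 | df$v2 == "g"), ]
# columns selected by position (as one-column data frames)
df[df[1] < 0.5 | df[2] == "g", ]
# columns extracted by position (as vectors)
df[df[[1]] < 0.5 | df[[2]] == "g", ]
# columns selected by name
df[df["v1"] < 0.5 | df["v2"] == "g", ]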
Note: df$name is equivalent to df[["name", exact = FALSE]], so $ allows partial matching of column names.
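The partial matching behind $ can be illustrated with a small sketch. The data frame d and its column name value are hypothetical, chosen only for this demonstration.

```r
# Hypothetical data frame, used only to illustrate partial matching
d <- data.frame(value = 1:3)

d$val                      # partial match: returns the value column
d[["val"]]                 # exact matching is the default: returns NULL
d[["val", exact = FALSE]]  # partial matching allowed: same result as d$val
```

A partial match has to be unique. In the example data both columns start with "v", so df$v is ambiguous and returns NULL.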
- dplyr
library(dplyr)
filter(df, v1 < 0.5 | v2 == "g")
- sqldf
library(sqldf)
sqldf("SELECT *
       FROM df
       WHERE v1 < 0.5 OR v2 = 'g'")
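One practical difference between the base options is how they treat missing values. The sketch below assumes a hypothetical data frame d_na with an NA in v1 (the example data itself has none). Indexing with the raw logical vector returns an all-NA row for the missing value, while which() and subset() drop it.

```r
# Hypothetical data with a missing value in v1
d_na <- data.frame(v1 = c(0.2, NA, 0.9), v2 = c("a", "b", "g"))

# The condition is TRUE, NA, TRUE for the three rows
cond <- d_na$v1 < 0.5 | d_na$v2 == "g"

by_logical <- d_na[cond, ]                        # NA in the index yields a row of NAs
by_which   <- d_na[which(cond), ]                 # which() skips NA
by_subset  <- subset(d_na, v1 < 0.5 | v2 == "g")  # subset() treats NA as FALSE

nrow(by_logical)  # 3
nrow(by_which)    # 2
nrow(by_subset)   # 2
```

If the data may contain missing values, the which() or subset() form is usually the safer choice.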