2020-12-07

How to combine multiple conditions to subset a data frame using “OR”?

Problem

We want to subset a data frame so that a row is kept when at least one of several conditions holds (a logical "OR"). In our example, we keep every row where v1 is less than 0.5 or v2 is equal to "g".

Original data frame

           v1 v2
1  0.26550866  a
2  0.37212390  b
3  0.57285336  c
4  0.90820779  d
5  0.20168193  e
6  0.89838968  f
7  0.94467527  g
8  0.66079779  h
9  0.62911404  i
10 0.06178627  j

Expected output

          v1 v2
1 0.26550866  a
2 0.37212390  b
3 0.20168193  e
4 0.94467527  g
5 0.06178627  j
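
With the base R solutions below, the rows keep their original row names (1, 2, 5, 7, 10) instead of being renumbered.

Code to create the data frame

set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])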
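
Before subsetting, it can help to look at the condition on its own. The following is a minimal sketch, assuming the data frame defined above; the variable name keep is only illustrative and not part of the original post.

```r
set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])

# `|` is the element-wise OR: it combines the two logical vectors
# position by position, giving one TRUE/FALSE per row
keep <- df$v1 < 0.5 | df$v2 == "g"

# Row numbers for which at least one condition holds
which(keep)  # 1 2 5 7 10
```

Note that the element-wise operator `|` is used here, not `||`, which is meant for single TRUE/FALSE values.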

Solution

There are multiple options:

  • Base functions
    subset(df, v1 < 0.5 | v2 == "g")
    df[which(df$v1 < 0.5 | df$v2 == "g"), ]
    

  • Operators [ and [[
    df[df[1] < 0.5 | df[2] == "g", ]
    df[df[[1]] < 0.5 | df[[2]] == "g", ]
    df[df["v1"] < 0.5 | df["v2"] == "g", ]
    

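    df$name is equivalent to df[["name", exact = FALSE]], so the conditions can also be written with df$v1 and df$v2. Unlike a plain df[["name"]], the $ form allows partial matching of column names.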

  • dplyr
    library(dplyr)
    filter(df, v1 < 0.5 | v2 == "g")
    

  • sqldf
    library(sqldf)
    # SQL string literals take single quotes, so the R string uses double quotes
    sqldf("SELECT *
           FROM df
           WHERE v1 < 0.5 OR v2 = 'g'")
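
One likely reason to wrap the condition in which(), as in the base functions option above, is missing values. The sketch below is an illustration under an assumption: df_na is a hypothetical copy of the example data with one NA placed in v1, which the original data does not contain.

```r
# Hypothetical variant of the example data with one missing value in v1
set.seed(1)
df_na <- data.frame(v1 = runif(10), v2 = letters[1:10])
df_na$v1[3] <- NA

# Row 3: NA < 0.5 is NA, "c" == "g" is FALSE, and NA | FALSE is NA
cond <- df_na$v1 < 0.5 | df_na$v2 == "g"

plain   <- df_na[cond, ]                        # an NA index yields an extra all-NA row
wrapped <- df_na[which(cond), ]                 # which() keeps only the TRUE positions
subs    <- subset(df_na, v1 < 0.5 | v2 == "g")  # subset() treats NA as FALSE

nrow(plain)        # 6
nrow(wrapped)      # 5
rownames(wrapped)  # "1" "2" "5" "7" "10"
```

With complete data, as in the example of this post, the plain logical index and the which() form return the same rows.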
    

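The relation between $ and [[ mentioned in the operators option can be checked directly. This is a minimal sketch, assuming the example data frame; the column value is hypothetical and added only to demonstrate partial matching.

```r
set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])

# `$` and `[[` both return the column as a plain vector
same_column <- identical(df$v1, df[["v1"]])  # TRUE

# So the condition can be written with `$` as well
res <- df[df$v1 < 0.5 | df$v2 == "g", ]

# Hypothetical column, added only to demonstrate partial matching
df$value <- seq_len(10)

partial <- df$val                     # partial match on "value"
exact   <- df[["val"]]                # NULL: `[[` matches exactly by default
relaxed <- df[["val", exact = FALSE]] # same result as df$val
```

Partial matching is convenient at the console but easy to trip over in scripts, so spelling out the full column name is usually the safer habit.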
Nube de datos