2019-08-25

How to rename suffixes when joining tables in dplyr

Problem

When joining two tables with dplyr, the suffixes .x and .y will be added to the non-joined duplicate variables to disambiguate them. In our example, to the column mpg: mpg.x y mpg.y. How can override these suffixes?

library(dplyr) 
left_join(mtcars, mtcars[, c("mpg", 'cyl')], by = c("cyl")) %>% head()
  • Output
  •   mpg.x cyl disp  hp drat   wt  qsec vs am gear carb mpg.y
    1    21   6  160 110  3.9 2.62 16.46  0  1    4    4  21.0
    2    21   6  160 110  3.9 2.62 16.46  0  1    4    4  21.0
    3    21   6  160 110  3.9 2.62 16.46  0  1    4    4  21.4
    4    21   6  160 110  3.9 2.62 16.46  0  1    4    4  18.1
    5    21   6  160 110  3.9 2.62 16.46  0  1    4    4  19.2
    6    21   6  160 110  3.9 2.62 16.46  0  1    4    4  17.8
    

    Solution

    We can pass the argument suffix, a character vector of length 2 with the desired names. In our example, mpg_original (suffix _original) and mpg_new (suffix_new).

    left_join(mtcars, mtcars[,c("mpg","cyl")], 
                  by = c("cyl"), 
                  suffix = c("_original", "_new")) %>% head()
    
  • Output
  •   mpg_original cyl disp  hp drat   wt  qsec vs am gear carb mpg_new
    1           21   6  160 110  3.9 2.62 16.46  0  1    4    4    21.0
    2           21   6  160 110  3.9 2.62 16.46  0  1    4    4    21.0
    3           21   6  160 110  3.9 2.62 16.46  0  1    4    4    21.4
    4           21   6  160 110  3.9 2.62 16.46  0  1    4    4    18.1
    5           21   6  160 110  3.9 2.62 16.46  0  1    4    4    19.2
    6           21   6  160 110  3.9 2.62 16.46  0  1    4    4    17.8
    

    References

    2019-08-10

    Cómo representar una curva logarítmica en R

    Problema

    Queremos ajustar una curva logarítmica a un diagrama de dispersión.

    library(ggplot2)
    ggplot(mpg, aes(displ, hwy)) +
      geom_point()
    

    Solución

    Paquete base

    plot(hwy ~ displ, mpg)
    logEstimate <- lm(hwy ~ log(displ), data = mpg)
    curve(coef(logEstimate)[1] + coef(logEstimate)[2] * log(x), add = TRUE)
    
    ggplot2

    ggplot(mpg, aes(displ, hwy)) +
      geom_point() +
      stat_smooth(method = "lm", formula = y ~ log(x))
    

    Referencias

    How to plot a log curve in R

    Problem

    We want to fit a logarithmic curve to a scatterplot.

    library(ggplot2)
    ggplot(mpg, aes(displ, hwy)) +
      geom_point()
    

    Solution

    Base package

    plot(hwy ~ displ, mpg)
    logEstimate <- lm(hwy ~ log(displ), data = mpg)
    curve(coef(logEstimate)[1] + coef(logEstimate)[2] * log(x), add = TRUE)
    
    ggplot2

    ggplot(mpg, aes(displ, hwy)) +
      geom_point() +
      stat_smooth(method = "lm", formula = y ~ log(x))
    

    References

    2019-08-04

    How to do cross join in R?

    Problem

    We want to calculate a cross join, the Cartesian product of rows from tables —or elements of a vector— in the join. In other words, we want to combine each row from the first table with each row from the second table.

    Solution

    Two vectors

    • expand.grid
    • expand.grid(1:5, 1:5)
      
    • merge
    • merge(1:5, 1:5)
      
    • Cross join with sqldf
    • library(sqldf)
      df1 <- data.frame(a = 1:5)
      df2 <- df1
      sqldf("SELECT df1.a, df2.a FROM df1 
            CROSS JOIN df2")
      
    • Results
    •    x y
      1  1 1
      2  2 1
      3  3 1
      4  4 1
      5  5 1
      6  1 2
      7  2 2
      8  3 2
      9  4 2
      10 5 2
      11 1 3
      12 2 3
      13 3 3
      14 4 3
      15 5 3
      16 1 4
      17 2 4
      18 3 4
      19 4 4
      20 5 4
      21 1 5
      22 2 5
      23 3 5
      24 4 5
      25 5 5
      
    More than two vectors

    • expand.grid.
    • a <- c('europe', 'asia')
      b <- c('co2', 'temperature')
      c <- c('min', 'max')
      expand.grid(a, b, c)
      
          Var1        Var2 Var3
      1 europe         co2  min
      2   asia         co2  min
      3 europe temperature  min
      4   asia temperature  min
      5 europe         co2  max
      6   asia         co2  max
      7 europe temperature  max
      8   asia temperature  max
      

    References

    Nube de datos