2019-07-16

How to control transparency with geom_rect in ggplot2

Problem

When we try to draw a translucent rectangle using geom_rect and the argument alpha, we get an opaque rectangle.

  • Data
  • df1 <- structure(list(date = structure(c(1335744000, 1380499200, 1464652800, 1356912000, 
        1485820800, 1490918400, 1383177600, 1461888000, 1454025600, 1367280000, 1343692800, 
        1401408000, 1330473600, 1391126400, 1459382400, 1404086400, 1417132800, 1477872000, 
        1469750400, 1443571200, 1419984000, 1438300800, 1346371200, 1369958400, 1483056000, 
        1440979200, 1424995200, 1377820800, 1388448000, 1375228800, 1480464000, 1359590400, 
        1354233600, 1412035200, 1427760000, 1385683200, 1467244800, 1472601600, 1372377600, 
        1475193600, 1333065600, 1435622400, 1409270400, 1396224000, 1488240000, 1364515200, 
        1340928000, 1406764800, 1456704000, 1430352000, 1338422400, 1348790400, 1351641600, 
        1432857600, 1327968000, 1448841600, 1398816000, 1446163200, 1362009600, 1422576000, 
        1451520000, 1414713600, 1393545600), class = c("POSIXct", "POSIXt")), pb = c(3.24284787690623, 
        2.35203304295562, 1.13562266384702, 2.90837861538151, 1.97393507382208, 1.79790256367522, 
        2.50992970378761, 1.2966057820916, 0.892051550643623, 2.56310397446516, 2.53722570614735, 
        2.42427665519818, 3.40294643294178, 2.2456624603825, 1.06554620628802, 2.12883927956712, 
        1.65800890792078, 1.71460655379306, 1.45450176074979, 1.28199154762022, 1.51004082825039, 
        1.59579220438853, 2.48072865275449, 2.52511938910325, 1.77197981412129, 1.4666225767599, 
        1.65482654263216, 2.24097337718875, 2.39207143276774, 2.18796717170196, 1.78667497794161, 
        2.95189774752025, 2.70906851093917, 1.8620615761957, 1.48932926967017, 2.40482981571083, 
        1.3614263004647, 1.5052848414737, 2.02094466655017, 1.67901881433697, 3.13131652724628, 
        1.7081053507639, 2.15479184551088, 2.37902994058881, 1.88440485774789, 2.57891658188723, 
        2.43424745762712, 2.25929464919641, 0.913664833729333, 1.58426153545149, 2.71711735504797, 
        2.59023788287105, 2.68172936708349, 1.5228439100185, 3.47144812971019, 1.07692509768545, 
        2.46172899067256, 1.34932598268774, 2.86559619320822, 1.43158577260698, 1.06755701001995, 
        1.87542832179586, 2.41716851824514), return_index = c(3.33963134143023, 3.53315257934844, 
        2.24983743575813, 3.54713517594007, 3.17031433226149, 2.92415007661754, 3.72288287285945, 
        2.43858382371356, 1.78853472205546, 3.17524563672478, 2.99957813429811, 3.72169243241355, 
        3.39125767791388, 3.614770311344, 1.98808399128776, 3.61004165944114, 3.16597358572951, 
        2.74562414401218, 2.30169956340851, 2.58899122167033, 3.00735830908446, 2.97573979012093, 
        2.9863799905072, 3.38703452069432, 2.98242176129961, 2.83290428019513, 3.44566574584198, 
        3.47136232987663, 3.75521536603366, 3.36372318786495, 2.90514490677928, 3.58331595473466, 
        3.28803779749728, 3.46781820411579, 3.2620117615886, 3.69607811617486, 2.19921609345358, 
        2.40895876306335, 3.04629884791247, 2.66358570431915, 3.25119873581604, 3.04330107396291, 
        3.68720702192003, 3.66737118374507, 2.97502296939418, 3.18097138521817, 2.95437617876303, 
        3.88945888388189, 1.81411076858085, 3.36085071796473, 3.00333046534821, 3.15899275395851, 
        3.27461188875339, 3.32892263614407, 3.47144812971019, 2.10766875686013, 3.79609591229452, 
        2.68219490565046, 3.54425781082805, 2.99623108334085, 2.08090136364801, 3.4771813132669, 
        3.79370144175553)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -63L), 
        .Names = c("date", "pb", "return_index"))
    

  • Original code
  • library(ggplot2)
    min_date <- min(df1$date) # df1 is the data frame defined above
    max_date <- max(df1$date)
    ggplot(df1) +
        geom_rect(aes(xmin = min_date, xmax = max_date, ymin = -Inf, ymax = 3), fill = "palegreen", alpha = 0.2) +
        geom_line(aes(x = date, y = pb, colour = "P/B")) +
        geom_line(aes(x = date, y = return_index, colour = "return"))
    

Solution

geom_rect draws one rectangle for each row of the data, here 63 rectangles for the 63 rows of df1, stacked on top of each other. Each one has alpha = 0.2, but together they add up to an opaque rectangle. There are two alternatives:

  1. Alternative 1: use annotate instead of geom_rect. annotate adds a single rectangle, whatever the number of rows in df1.
     ggplot(df1) +
       annotate("rect", xmin = min_date, xmax = max_date, ymin = -Inf, ymax = 3, fill = "palegreen", alpha = 0.2) +
       geom_line(aes(x = date, y = pb, colour = "P/B")) +
       geom_line(aes(x = date, y = return_index, colour = "return"))
     
  2. Alternative 2: remove the argument data = df1 from ggplot() and pass it only to the layers that need it. geom_rect then draws a single rectangle from its constant values.
     ggplot() +
       geom_rect(aes(xmin = min_date, xmax = max_date, ymin = -Inf, ymax = 3), fill = "palegreen", alpha = 0.2) +
       geom_line(data = df1, aes(x = date, y = pb, colour = "P/B")) +
       geom_line(data = df1, aes(x = date, y = return_index, colour = "return"))
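To see why the rectangle turns opaque, here is a minimal R sketch on a small hypothetical data set (df_demo, not from the original post). It uses ggplot2's layer_data() to count how many rectangles each approach actually draws, together with the usual alpha-compositing rule, assumed here, that n stacked layers of opacity a give an overall opacity of 1 - (1 - a)^n. Recent ggplot2 versions may also warn about the geom_rect layer and suggest annotate().

```r
library(ggplot2)

# Hypothetical 10-row data set, used only to count the rectangles each layer draws
df_demo <- data.frame(x = 1:10, y = seq(0, 1, length.out = 10))

# geom_rect with constant aesthetics: one rectangle per row of df_demo
p_rect <- ggplot(df_demo) +
  geom_rect(aes(xmin = 1, xmax = 10, ymin = -Inf, ymax = 0.5),
            fill = "palegreen", alpha = 0.2)

# annotate(): a single rectangle, whatever the size of df_demo
p_annotate <- ggplot(df_demo) +
  annotate("rect", xmin = 1, xmax = 10, ymin = -Inf, ymax = 0.5,
           fill = "palegreen", alpha = 0.2)

# layer_data() returns the data a layer actually draws: one row per rectangle
n_rect <- nrow(layer_data(p_rect, 1))
n_annotate <- nrow(layer_data(p_annotate, 1))
n_rect
n_annotate

# Overall opacity of n stacked layers, each with alpha a (assumed compositing rule)
stacked_opacity <- function(a, n) 1 - (1 - a)^n
stacked_opacity(0.2, 1)   # a single rectangle
stacked_opacity(0.2, 63)  # 63 rectangles, one per row of df1
```

With alpha = 0.2, one rectangle has an opacity of 0.2, while 63 stacked rectangles reach about 0.999999, which is visually opaque.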
    


2019-07-06

Convert time to a decimal number in Excel

Problem

We'd like to convert a time value to a decimal number in Excel, for instance hours:minutes:seconds (hh:mm:ss) to a decimal number of hours, minutes or seconds.

Solution

Excel stores a time value as a fraction of a day (for example, 12:00:00 is stored as 0.5), so we multiply the time value by the corresponding factor:

  • To hours: multiply the time value by 24, the number of hours in a day.
  • To minutes: multiply the time value by 1,440, the number of minutes in a day (24*60).
  • To seconds: multiply the time value by 86,400, the number of seconds in a day (24*60*60).
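The arithmetic can be checked outside Excel. This R sketch (R being the language used in the rest of this blog) assumes a hypothetical time of 06:30:00 and stores it the way Excel does, as a fraction of a day; multiplying by 24, 1,440 and 86,400 gives 6.5 hours, 390 minutes and 23,400 seconds.

```r
# Hypothetical value 06:30:00, stored as Excel does: as a fraction of one day
time_value <- (6 + 30 / 60 + 0 / 3600) / 24

hours   <- time_value * 24            # 24 hours in a day
minutes <- time_value * 24 * 60       # 1,440 minutes in a day
seconds <- time_value * 24 * 60 * 60  # 86,400 seconds in a day

c(hours = hours, minutes = minutes, seconds = seconds)
```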

Alternative

Use the function CONVERT to convert between time units, with the time value in cell B4:

  • To hours: =CONVERT(B4,"day","hr").
  • To minutes: =CONVERT(B4,"day","mn").
  • To seconds: =CONVERT(B4,"day","sec").

Notes

To format the result cells, select them, press CTRL+1 and choose the format you want, for example General or Number, so that the result is not displayed as a time.

2019-07-03

How to calculate the percent of column total in R

Problem

We want to calculate the percent of column total in R. In our example, each value of the column freq divided by the column total, 397: 7/397, 23/397, etc.

    x freq
1 Jan    7
2 Feb   23
3 Mar   86
4 Apr  281
Data:

df <- read.table(text = "x    freq
                        Jan   7
                        Feb   23
                        Mar   86
                        Apr   281", 
                        header = TRUE)

Solution

We create the percent of column total using the function prop.table.

df$prob <- prop.table(df$freq)
df
    x freq       prob
1 Jan    7 0.01763224
2 Feb   23 0.05793451
3 Mar   86 0.21662469
4 Apr  281 0.70780856
# Percentages with two decimal places, stored in a new column
df$percent <- round(prop.table(df$freq), 4)*100
If we'd like to calculate the proportion of a specific row, February in our example:
prop.table(df$freq)[df$x == "Feb"] 
 [1] 0.05793451

Alternatives

  • Base package
  • df$prob <- df$freq/sum(df$freq)
    
  • dplyr
  • library(dplyr)
    df %>% mutate(prob = prop.table(freq))
    # Or
    df %>% mutate(prob = freq / sum(freq))
    
    A specific row:
    df %>% mutate(prob = freq / sum(freq)) %>% filter(x == "Feb")
    
        x freq       prob
    1 Feb   23 0.05793451
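As a quick check of the base alternatives, this sketch recomputes the proportions for the same data both ways and formats them as percentage labels with sprintf, a formatting choice of this sketch rather than part of the original post.

```r
df <- read.table(text = "x    freq
                        Jan   7
                        Feb   23
                        Mar   86
                        Apr   281",
                 header = TRUE)

p_prop <- prop.table(df$freq)
p_base <- df$freq / sum(df$freq)

# Both approaches give the same proportions, which add up to 1
all.equal(p_prop, p_base)
sum(p_prop)

# Percentage labels with two decimal places: "1.76%" "5.79%" "21.66%" "70.78%"
sprintf("%.2f%%", 100 * p_prop)
```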
    


2019-06-29

Transforming contingency tables into frequency tables in R

Problem

We want to transform a contingency table into a frequency table in R.

# Contingency table
tbl <- table(mtcars[, c("am", "gear")])
tbl
   gear
am   3  4  5
  0 15  4  0
  1  0  8  5

Frequency tables

We transform the contingency table into a data frame.

df <- as.data.frame(tbl)
df
  am gear Freq
1  0    3   15
2  1    3    0
3  0    4    4
4  1    4    8
5  0    5    0
6  1    5    5

Transforming frequency tables into contingency tables

To transform a frequency table back into a contingency table, we use xtabs, here wrapped in ftable to print it flat.

ftable(xtabs(Freq ~ am + gear, data = df)) 
   gear  3  4  5
am              
0       15  4  0
1        0  8  5
It is equivalent to:

ftable(mtcars[, c("am", "gear")])
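A small round-trip sketch: converting the contingency table into a frequency table and back with xtabs recovers the original counts. The object names df and back are only illustrative.

```r
tbl <- table(mtcars[, c("am", "gear")])

# Contingency table -> frequency table (one row per am/gear combination)
df <- as.data.frame(tbl)

# Frequency table -> contingency table again
back <- xtabs(Freq ~ am + gear, data = df)

all(back == tbl)  # same counts in every cell
nrow(df)          # 2 levels of am x 3 levels of gear
sum(df$Freq)      # all 32 cars in mtcars
```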


2019-06-28

Proportion tables in R

Problem

We want to create proportion tables for one or multiple variables.

Solution

  • One variable
  • tabla <- table(mtcars$am)
    prop.table(tabla)
    
          0       1 
    0.59375 0.40625
    
  • Two variables
  • tabla <- table(mtcars[, c("am", "gear")])
    prop.table(tabla)
    
       gear
    am        3       4       5
      0 0.46875 0.12500 0.00000
      1 0.00000 0.25000 0.15625
    
    The prop.table function has two arguments:

    • x, a table, such as one created with the function table.
    • margin, with three possible values:
        NULL (default): x/sum(x), as in the previous example.
        1: proportions calculated by rows.
        2: proportions calculated by columns.

    # By row
    prop.table(tabla, 1)
    
       gear
    am          3         4         5
      0 0.7894737 0.2105263 0.0000000
      1 0.0000000 0.6153846 0.3846154
    
    # By column
    prop.table(tabla, 2)
    
       gear
    am          3         4         5
      0 1.0000000 0.3333333 0.0000000
      1 0.0000000 0.6666667 1.0000000
    
  • Three variables
  • tabla <- table(mtcars[, c("am", "gear", "cyl")])
    prop.table(tabla)
    
    , , cyl = 4
    
       gear
    am        3       4       5
      0 0.03125 0.06250 0.00000
      1 0.00000 0.18750 0.06250
    
    , , cyl = 6
    
       gear
    am        3       4       5
      0 0.06250 0.06250 0.00000
      1 0.00000 0.06250 0.03125
    
    , , cyl = 8
    
       gear
    am        3       4       5
      0 0.37500 0.00000 0.00000
      1 0.00000 0.00000 0.06250
    
  • Flat Contingency Table
  • In the previous example, a more readable approach is to create a flat contingency table.

    tabla <- ftable(mtcars[, c("am", "gear", "cyl")])
    prop.table(tabla)
    
            cyl       4       6       8
    am gear                            
    0  3        0.03125 0.06250 0.37500
       4        0.06250 0.06250 0.00000
       5        0.00000 0.00000 0.00000
    1  3        0.00000 0.00000 0.00000
       4        0.18750 0.06250 0.00000
       5        0.06250 0.03125 0.06250
    
  • Percentage table
  • We multiply by 100 and use the function round.

    round(prop.table(tabla)*100, 2)
    
             cyl     4     6     8
    am gear                      
    0  3         3.12  6.25 37.50
       4         6.25  6.25  0.00
       5         0.00  0.00  0.00
    1  3         0.00  0.00  0.00
       4        18.75  6.25  0.00
       5         6.25  3.12  6.25
    
    round(prop.table(tabla, 1)*100, 2) # By row: am and gear.
    
            cyl     4     6     8
    am gear                      
    0  3         6.67 13.33 80.00
       4        50.00 50.00  0.00
       5          NaN   NaN   NaN
    1  3          NaN   NaN   NaN
       4        75.00 25.00  0.00
       5        40.00 20.00 40.00
    
    round(prop.table(tabla, 2)*100, 2) # By column: cyl.
    
            cyl     4     6     8
    am gear                      
    0  3         9.09 28.57 85.71
       4        18.18 28.57  0.00
       5         0.00  0.00  0.00
    1  3         0.00  0.00  0.00
       4        54.55 28.57  0.00
       5        18.18 14.29 14.29
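To make the margin argument concrete, the following sketch uses the two-variable table from above. It checks that the whole table adds up to 1 with the default margin, that each row adds up to 1 with margin = 1 and each column with margin = 2, and reproduces both results by hand with sweep, dividing each cell by its row or column total.

```r
tabla <- table(mtcars[, c("am", "gear")])

by_all <- prop.table(tabla)     # margin = NULL: x / sum(x)
by_row <- prop.table(tabla, 1)  # margin = 1: rows
by_col <- prop.table(tabla, 2)  # margin = 2: columns

sum(by_all)      # the whole table adds up to 1
rowSums(by_row)  # each row adds up to 1
colSums(by_col)  # each column adds up to 1

# The same results by hand: divide each cell by its row (or column) total
manual_row <- sweep(tabla, 1, rowSums(tabla), "/")
manual_col <- sweep(tabla, 2, colSums(tabla), "/")
all.equal(c(manual_row), c(by_row))
all.equal(c(manual_col), c(by_col))
```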
    


2019-06-21

Contingency tables in R

Problem

We want to create a contingency table for one or multiple variables.

Solution

  • One variable
  • table(mtcars$am)
    
     0  1 
    19 13 
    
  • Two variables
  • table(mtcars$am, mtcars$gear)
    
         3  4  5
      0 15  4  0
      1  0  8  5
    
    If we want to include the names of the variables:

    table(mtcars[, c("am", "gear")])
    # or selecting the columns by position (am is column 9 and gear column 10):
    table(mtcars[, 9:10])
    # or with the argument dnn:
    table(mtcars$am, mtcars$gear, dnn = c("am", "gear"))
    
       gear
    am   3  4  5
      0 15  4  0
      1  0  8  5
    
  • Three variables
  • table(mtcars[, c("am", "gear", "cyl")])
    
    , , cyl = 4
    
       gear
    am   3  4  5
      0  1  2  0
      1  0  6  2
    
    , , cyl = 6
    
       gear
    am   3  4  5
      0  2  2  0
      1  0  2  1
    
    , , cyl = 8
    
       gear
    am   3  4  5
      0 12  0  0
      1  0  0  2
    
  • Flat contingency tables
  • In the previous example, a more readable approach is to create a flat contingency table.

    ftable(mtcars[, c("am", "gear", "cyl")])
    
            cyl  4  6  8
    am gear             
    0  3         1  2 12
       4         2  2  0
       5         0  0  0
    1  3         0  0  0
       4         6  2  0
       5         2  1  2
    
    We use the arguments row.vars and col.vars to give the numbers or names of the variables used for the rows and columns of the flat contingency table. If neither is given, the last variable is used for the columns: cyl in our example.

    ftable(mtcars[, c("am", "gear", "cyl")], col.vars = c(1, 2))
    
         am    0        1      
        gear  3  4  5  3  4  5
    cyl                       
    4         1  2  0  0  6  2
    6         2  2  0  0  2  1
    8        12  0  0  0  0  2
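The same flat table can be requested from the other side, naming the row variable instead of the column variables. This sketch, a minimal check rather than part of the original post, compares both forms.

```r
x <- mtcars[, c("am", "gear", "cyl")]

# cyl in the rows, am and gear in the columns, requested two ways
f_cols <- ftable(x, col.vars = c(1, 2))
f_rows <- ftable(x, row.vars = "cyl")

dim(f_cols)                 # 3 rows (cyl) x 6 columns (am x gear)
all(c(f_cols) == c(f_rows)) # identical counts
unclass(f_cols)[3, 1]       # cyl = 8, am = 0, gear = 3
```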
    

Alternative

The function xtabs creates contingency tables using a formula interface, each variable separated by +.

# One variable
xtabs(~ am, mtcars)
# Two variables
xtabs(~ am + gear, mtcars)
# Three variables
xtabs(~ am + gear + cyl, mtcars)
# Flat contingency table
ftable(xtabs(~ am + gear + cyl, mtcars))
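A short sketch comparing both interfaces: table with dnn and xtabs with a formula produce the same counts and the same dimension names for am and gear.

```r
t_table <- table(mtcars$am, mtcars$gear, dnn = c("am", "gear"))
t_xtabs <- xtabs(~ am + gear, mtcars)

# Same counts in every cell and the same dimension names
all(t_table == t_xtabs)
names(dimnames(t_table))
names(dimnames(t_xtabs))
sum(t_xtabs)  # all 32 cars
```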


2019-06-18

How to draw square cells with geom_tile in ggplot2

Problem

In the following plot, created with geom_tile, the cells are rectangular. How can we draw square cells instead?

set.seed(1)
df <- data.frame(val = rnorm(100), 
                 gene = rep(letters[1:20], 5), 
                 cell = c(sapply(LETTERS[1:5], 
                                 function(l) rep(l, 20))))
library(ggplot2)
ggplot(df, aes(y = gene, x = cell, fill = val)) +
  geom_tile(color = "white")

Solution

We add coord_fixed() or its alias coord_equal():

Its default, ratio = 1, ensures that one unit on the x-axis is the same length as one unit on the y-axis. On these discrete axes each tile is one unit wide and one unit tall, so the cells become square.

ggplot(df, aes(y = gene, x = cell, fill = val)) +
  geom_tile(color = "white") +
  coord_fixed() # or coord_equal()
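The ratio argument can also be changed. In this sketch, which rebuilds the data above using rep(LETTERS[1:5], each = 20) as a shorter equivalent of the original c(sapply(...)) construction, ratio = 0.5 makes each tile half as tall as it is wide, because ratio is expressed as y / x.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(val = rnorm(100),
                 gene = rep(letters[1:20], 5),
                 # same values as c(sapply(LETTERS[1:5], function(l) rep(l, 20)))
                 cell = rep(LETTERS[1:5], each = 20))

# ratio is y / x: with ratio = 0.5 one y unit is half as long as one x unit,
# so every tile is half as tall as it is wide
p_flat <- ggplot(df, aes(y = gene, x = cell, fill = val)) +
  geom_tile(color = "white") +
  coord_fixed(ratio = 0.5)

p_flat
```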
