2019-06-29

Transforming contingency tables into frequency tables in R

Problem

We want to transform a contingency table into a frequency table in R.

# Contingency table
tbl <- table(mtcars[, c("am", "gear")])
tbl
   gear
am   3  4  5
  0 15  4  0
  1  0  8  5

Frequency tables

We transform the contingency table into a data frame.

df <- as.data.frame(tbl)
df
  am gear Freq
1  0    3   15
2  1    3    0
3  0    4    4
4  1    4    8
5  0    5    0
6  1    5    5
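
As a small sketch of how the conversion can be adjusted, as.data.frame accepts the arguments responseName and stringsAsFactors when applied to a table. The column name n and the object name df_n below are only illustrative choices, not part of the original example.

```r
tbl <- table(mtcars[, c("am", "gear")])

# responseName renames the count column (the default is "Freq");
# stringsAsFactors = FALSE keeps am and gear as character, not factor
df_n <- as.data.frame(tbl, responseName = "n", stringsAsFactors = FALSE)
str(df_n)
```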

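Transforming frequency tables into contingency tables

To transform a frequency table back into a contingency table, we pass the Freq column to xtabs on the left-hand side of the formula and flatten the result with ftable.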

ftable(xtabs(Freq ~ am + gear, data = df)) 
   gear  3  4  5
am              
0       15  4  0
1        0  8  5

It is equivalent to:

ftable(mtcars[, c("am", "gear")])
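
One way to convince ourselves that the round trip loses nothing is to compare the rebuilt table with the original cell by cell. This is only a quick sanity check; the object name back is an illustrative choice.

```r
tbl <- table(mtcars[, c("am", "gear")])
df <- as.data.frame(tbl)

# Rebuild the contingency table from the frequency table
back <- xtabs(Freq ~ am + gear, data = df)

# TRUE when every cell count matches the original table
all(back == tbl)
```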

2019-06-28

Proportion tables in R

Problem

We want to create proportion tables for one or multiple variables.

Solution

  • One variable

    tabla <- table(mtcars$am)
    prop.table(tabla)
    
          0       1 
    0.59375 0.40625
    
  • Two variables

    tabla <- table(mtcars[, c("am", "gear")])
    prop.table(tabla)
    
       gear
    am        3       4       5
      0 0.46875 0.12500 0.00000
      1 0.00000 0.25000 0.15625
    
    The prop.table function has two arguments:

    • x, a table created with the function table.
    • margin, the index (or vector of indices) over which the proportions are computed. For a two-way table:
        NULL - x/sum(x), the default, as in the previous example.
        1 - proportions calculated by rows.
        2 - proportions calculated by columns.

    # By row
    prop.table(tabla, 1)
    
       gear
    am          3         4         5
      0 0.7894737 0.2105263 0.0000000
      1 0.0000000 0.6153846 0.3846154
    
    # By column
    prop.table(tabla, 2)
    
       gear
    am          3         4         5
      0 1.0000000 0.3333333 0.0000000
      1 0.0000000 0.6666667 1.0000000
    
  • Three variables

    tabla <- table(mtcars[, c("am", "gear", "cyl")])
    prop.table(tabla)
    
    , , cyl = 4
    
       gear
    am        3       4       5
      0 0.03125 0.06250 0.00000
      1 0.00000 0.18750 0.06250
    
    , , cyl = 6
    
       gear
    am        3       4       5
      0 0.06250 0.06250 0.00000
      1 0.00000 0.06250 0.03125
    
    , , cyl = 8
    
       gear
    am        3       4       5
      0 0.37500 0.00000 0.00000
      1 0.00000 0.00000 0.06250
    
  • Flat contingency table

    In the previous example, a better approach would be to create a flat contingency table.

    tabla <- ftable(mtcars[, c("am", "gear", "cyl")])
    prop.table(tabla)
    
            cyl       4       6       8
    am gear                            
    0  3        0.03125 0.06250 0.37500
       4        0.06250 0.06250 0.00000
       5        0.00000 0.00000 0.00000
    1  3        0.00000 0.00000 0.00000
       4        0.18750 0.06250 0.00000
       5        0.06250 0.03125 0.06250
    
  • Percentage table

    We multiply the proportions by 100 and use the function round to limit the number of decimals.

    round(prop.table(tabla)*100, 2)
    
             cyl     4     6     8
    am gear                      
    0  3         3.12  6.25 37.50
       4         6.25  6.25  0.00
       5         0.00  0.00  0.00
    1  3         0.00  0.00  0.00
       4        18.75  6.25  0.00
       5         6.25  3.12  6.25
    
    round(prop.table(tabla, 1)*100, 2) # By row (each am and gear combination); empty rows give NaN (0/0)
    
            cyl     4     6     8
    am gear                      
    0  3         6.67 13.33 80.00
       4        50.00 50.00  0.00
       5          NaN   NaN   NaN
    1  3          NaN   NaN   NaN
       4        75.00 25.00  0.00
       5        40.00 20.00 40.00
    
    round(prop.table(tabla, 2)*100, 2) # By column, cyl
    
            cyl     4     6     8
    am gear                      
    0  3         9.09 28.57 85.71
       4        18.18 28.57  0.00
       5         0.00  0.00  0.00
    1  3         0.00  0.00  0.00
       4        54.55 28.57  0.00
       5        18.18 14.29 14.29
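
The behaviour of margin can be checked with a short sketch: with margin = 1 every row should sum to 1, and with margin = 2 every column should. The second half assumes we would rather show NA than NaN for am and gear combinations that contain no cars; that replacement is a presentation choice, not something the original post does. The names tab3 and pct are illustrative.

```r
tabla <- table(mtcars[, c("am", "gear")])

# With margin = 1 each row sums to 1; with margin = 2 each column does
rowSums(prop.table(tabla, 1))
colSums(prop.table(tabla, 2))

# Three-way table: margin = c(1, 2) divides each cell by the total of its
# am/gear combination, like prop.table(ftable, 1) in the example above
tab3 <- table(mtcars[, c("am", "gear", "cyl")])
pct <- prop.table(tab3, c(1, 2)) * 100

# Combinations with no cars give 0/0 = NaN; mark them as NA instead
pct[is.nan(pct)] <- NA
ftable(round(pct, 2))
```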
    

2019-06-21

Contingency tables in R

Problem

We want to create a contingency table for one or multiple variables.

Solution

  • One variable

    table(mtcars$am)
    
     0  1 
    19 13 
    
  • Two variables

    table(mtcars$am, mtcars$gear)
    
         3  4  5
      0 15  4  0
      1  0  8  5
    
    If we want to include the names of the variables, we pass a data frame with the selected columns:

    table(mtcars[, c("am", "gear")])
    # the same columns selected by position
    table(mtcars[, 9:10])
    # or we use the argument dnn:
    table(mtcars$am, mtcars$gear, dnn = c("am", "gear"))
    
       gear
    am   3  4  5
      0 15  4  0
      1  0  8  5
    
  • Three variables

    table(mtcars[, c("am", "gear", "cyl")])
    
    , , cyl = 4
    
       gear
    am   3  4  5
      0  1  2  0
      1  0  6  2
    
    , , cyl = 6
    
       gear
    am   3  4  5
      0  2  2  0
      1  0  2  1
    
    , , cyl = 8
    
       gear
    am   3  4  5
      0 12  0  0
      1  0  0  2
    
  • Flat contingency tables

    In the previous example, a better approach would be to create a flat contingency table.

    ftable(mtcars[, c("am", "gear", "cyl")])
    
            cyl  4  6  8
    am gear             
    0  3         1  2 12
       4         2  2  0
       5         0  0  0
    1  3         0  0  0
       4         6  2  0
       5         2  1  2
    
    We use the arguments row.vars and col.vars to provide the numbers or names of the variables to be used for the rows and columns of the flat contingency table. If neither of the two is given, the last variable is used for the columns; in our example, that is the variable cyl.

    ftable(mtcars[, c("am", "gear", "cyl")], col.vars = c(1, 2))
    
         am    0        1      
        gear  3  4  5  3  4  5
    cyl                       
    4         1  2  0  0  6  2
    6         2  2  0  0  2  1
    8        12  0  0  0  0  2
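
Since row.vars and col.vars accept names as well as numbers, the same layout can be requested in several ways. The sketch below compares them; the object names cars3, by_number, by_name and by_rows are illustrative only.

```r
cars3 <- mtcars[, c("am", "gear", "cyl")]

# Columns selected by position and by name
by_number <- ftable(cars3, col.vars = c(1, 2))
by_name <- ftable(cars3, col.vars = c("am", "gear"))

# Both calls should hold the same counts in the same layout
all(unclass(by_number) == unclass(by_name))

# Giving only row.vars leaves the remaining variables for the columns
by_rows <- ftable(cars3, row.vars = "cyl")
by_rows
```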
    

Alternative

The function xtabs creates contingency tables using a formula interface, each variable separated by +.

# One variable
xtabs(~ am, mtcars)
# Two variables
xtabs(~ am + gear, mtcars)
# Three variables
xtabs(~ am + gear + cyl, mtcars)
# Flat contingency table
ftable(xtabs(~ am + gear + cyl, mtcars))

2019-06-18

How to draw square cells with geom_tile in ggplot2

Problem

In the following plot created using geom_tile we have rectangular cells. How can we draw square cells instead?

set.seed(1)
df <- data.frame(val = rnorm(100), 
                 gene = rep(letters[1:20], 5), 
                 cell = rep(LETTERS[1:5], each = 20))
library(ggplot2)
ggplot(df, aes(y = gene, x = cell, fill = val)) +
  geom_tile(color = "white")

Solution

We add coord_fixed() or its alias coord_equal(). The default, ratio = 1, ensures that one unit on the x-axis is the same length as one unit on the y-axis.

ggplot(df, aes(y = gene, x = cell, fill = val)) +
  geom_tile(color = "white") +
  coord_fixed() # or coord_equal()
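
The ratio argument is expressed as y / x, so other values stretch the cells instead of keeping them square. As a sketch, assuming ggplot2 is installed, ratio = 2 should make each tile twice as tall as it is wide. The object name p is illustrative.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(val = rnorm(100),
                 gene = rep(letters[1:20], 5),
                 cell = rep(LETTERS[1:5], each = 20))

# ratio = 2: one unit on the y-axis is twice as long as one unit on the x-axis
p <- ggplot(df, aes(y = gene, x = cell, fill = val)) +
  geom_tile(color = "white") +
  coord_fixed(ratio = 2)
p
```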

2019-06-10

Solving a system of linear equations in R

Problem

We want to solve a system of linear equations in R.

2x + 3y + 3z = 20
  x + 4y + 3z = 15
5x + 3y + 4z = 30

Solution

We use the function solve.

a <- rbind(c(2, 3, 3), 
           c(1, 4, 3), 
           c(5, 3, 4))
b <- c(20, 15, 30)
solve(a, b)
[1] -1.25 -6.25 13.75

If we'd like the results as fractions, we use the function fractions from the package MASS.

library(MASS)
fractions(solve(a, b))
[1]  -5/4 -25/4  55/4
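
A quick way to check the result is to multiply the coefficient matrix by the solution and compare it with the right-hand side. The sketch below uses all.equal because the comparison is subject to floating-point error; the name x is an illustrative choice.

```r
a <- rbind(c(2, 3, 3),
           c(1, 4, 3),
           c(5, 3, 4))
b <- c(20, 15, 30)
x <- solve(a, b)

# a %*% x should reproduce b up to floating-point error
all.equal(as.vector(a %*% x), b)
```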

2019-06-06

Spacing between legend keys in ggplot2

Problem

We want to add some spacing between the elements of the legend in the following plot using ggplot2.

library(tidyverse)
mtcars %>%
  mutate(transmission = ifelse(am, "manual", "automatic")) %>%
  ggplot() +
  aes(x = transmission, fill = transmission) +
  geom_bar() +
  labs(fill = NULL) +
  theme(
    #legend.spacing.x = unit(.5, "char"), # adds spacing to the left too
    legend.position = "top",
    legend.justification = c(0, 0),
    legend.title = element_blank(),
    legend.margin = margin(5, 5, 5, 0)
  )

Solution

  • First option: spacing between the 4 elements

    We use legend.spacing.x to add spacing between the 4 elements of the legend: the two keys and their two text labels.

    mtcars %>%
      mutate(transmission = ifelse(am, "manual", "automatic")) %>%
      ggplot() +
      aes(x = transmission, fill = transmission) +
      geom_bar() +
      labs(fill = NULL) +
      theme(
        legend.spacing.x = unit(.5, "char"),
        legend.position = "top",
        legend.justification = c(0, 0),
        legend.title = element_blank(),
        legend.margin = margin(5, 5, 5, 0))
    
  • Second option: spacing between the 2 groups automatic and manual

    We pass the argument margin to the element_text function to add spacing only between the 2 groups (the two types of transmission, automatic and manual), that is, between the text automatic and the key (blue square) of manual.

    mtcars %>%
      mutate(transmission = ifelse(am, "manual", "automatic")) %>%
      ggplot() +
      aes(x = transmission, fill = transmission) +
      geom_bar() +
      labs(fill = NULL) +
      theme(legend.position = "top",
        legend.justification = c(0, 0),
        legend.title = element_blank(),
        legend.margin = margin(5, 5, 5, 0),
        legend.text = element_text(margin = margin(r = 10, unit = "pt")))
    

2019-06-04

How to calculate elapsed and remaining days between dates in R

Problem

We would like to calculate the number of elapsed and remaining days in a year.

Solution

  • From a starting date in the year

    date <- as.Date("2019-03-29")
    # Elapsed days, counted from January 1 of the date's own year
    date - as.Date(ISOdate(format(date, "%Y"), 1, 1)) + 1
    
    Time difference of 88 days
    
    # Remaining days until December 31 of the same year
    as.Date(ISOdate(format(date, "%Y"), 12, 31)) - date
    
    Time difference of 277 days
    
  • From today (the output depends on the day the code is run)

    # Elapsed days
    Sys.Date() - as.Date(ISOdate(format(Sys.Date(), "%Y"), 1, 1)) + 1 
    
    Time difference of 153 days
    
    # Remaining days
    as.Date(ISOdate(format(Sys.Date(), "%Y"), 12, 31)) - Sys.Date() 
    
    Time difference of 212 days
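
Both cases can be wrapped in one small helper. This is only a sketch: the function name year_progress is hypothetical, and it assumes the year should always be taken from the date itself, so it also works for dates outside the current year and for leap years.

```r
# Hypothetical helper: elapsed and remaining days in the year of `date`
year_progress <- function(date = Sys.Date()) {
  date <- as.Date(date)
  year <- format(date, "%Y")
  first_day <- as.Date(paste0(year, "-01-01"))
  last_day <- as.Date(paste0(year, "-12-31"))
  c(elapsed = as.integer(date - first_day) + 1L,  # counts the date itself
    remaining = as.integer(last_day - date))
}

year_progress("2019-03-29")
# elapsed 88, remaining 277, matching the results above
```

Called without arguments, year_progress() uses today's date.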
    

    Nube de datos