2019-02-08

Calculate and plot day length with R

Problem

We want to calculate and plot the length of the day with R for a given pair of geographic coordinates.

Solution

  • 1. Compute sunrise and sunset times
  • First we calculate the sunrise and sunset times with the function getSunlightTimes from the package suncalc. We pass the desired date interval, the coordinates (latitude and longitude), and the corresponding time zone (tz).

    library(suncalc) 
    library(tidyverse)
    library(scales)
    df <-
      getSunlightTimes(
        date = seq.Date(as.Date("2017-12-01"), as.Date("2018-12-31"), by = 1),
        keep = c("sunrise", "sunriseEnd", "sunset", "sunsetStart"),
        lat = 39.8628,
        lon = -4.0273, # Toledo lies west of Greenwich, so the longitude is negative
        tz = "CET"
      )
    
  • 2. Sunrise and sunset times plot
  • We need to manipulate the original data to calculate the time elapsed between the start of the day and both sunrise and sunset. Then we plot the two new variables with geom_ribbon from ggplot2. Finally we customize the axes and the title.

    # Sunrise/sunset
    df %>%
      mutate(
        date = as.POSIXct(date) - 12 * 60 * 60 ,
        sunrise = sunrise - date,
        sunset =  sunset - date,
      ) %>%
      ggplot() +
      geom_ribbon(aes(x = date, ymin = sunrise, ymax = sunset),
                  fill = "#FDE725FF",
                  alpha = .8) + # "#ffeda0"
      scale_x_datetime(
        breaks = seq(as.POSIXct(min(df$date)), as.POSIXct(max(df$date)), "month"),
        expand = c(0, 0),
        labels = date_format("%b %y"),
        minor_breaks = NULL
      ) +
      scale_y_continuous(
        limits = c(0, 24),
        breaks = seq(0, 24, 2),
        expand = c(0, 0),
        minor_breaks = NULL
      ) +
      labs(
        x = "Date",
        y = "Hours",
        title = sprintf(
          "Sunrise and Sunset for %s\n%s",
          "Toledo (Spain)",
          # paste0() has no `sep` argument; paste() with collapse joins the two dates
          paste(as.Date(range(df$date)), collapse = " to ")
        )
      ) +
      theme(
        panel.background = element_rect(fill = "#180F3EFF"),
        panel.grid = element_line(colour = "grey", linetype = "dashed")
      )
    
  • 3. Daytime duration
  • Very similar to the previous plot. This time we only need to calculate the day length (day_length) and plot the results with geom_area and geom_line.

    df %>%
      mutate(
        date = as.POSIXct(date),
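        # Explicit units keep the result in hours regardless of difftime's automatic choice
        day_length = as.numeric(difftime(sunset, sunrise, units = "hours"))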
      ) %>%
      ggplot(aes(x = date, y = day_length)) +
      geom_area(fill = "#FDE725FF", alpha = .4) +
      geom_line(color = "#525252") +
      scale_x_datetime(
        expand = c(0, 0),
        labels = date_format("%b '%y"),
        breaks =  seq(as.POSIXct(min(df$date)), as.POSIXct(max(df$date)), "month"),
        minor_breaks = NULL
      ) +
      scale_y_continuous(
        limits = c(0, 24),
        breaks = seq(0, 24, 2),
        expand = c(0, 0),
        minor_breaks = NULL
      ) +
      labs(x = "Date", y = "Hours", title = "Toledo (Spain) - Daytime duration") +
      theme_bw()
    


    2019-02-04

    Calculate and plot sunrise and sunset times with R

    Problem

    We would like to calculate and plot the sunrise and sunset times based on any location's latitude and longitude coordinates with R.

    Solution

  • 1. Compute sunrise and sunset times
  • First we calculate the sunrise and sunset times using the function getSunlightTimes from the package suncalc. We pass the desired date interval, the appropriate latitude and longitude coordinates, and the time zone (tz).

    library(suncalc) 
    library(tidyverse)
    library(scales)
    df <-
      getSunlightTimes(
        date = seq.Date(as.Date("2017-12-01"), as.Date("2018-12-31"), by = 1),
        keep = c("sunrise", "sunriseEnd", "sunset", "sunsetStart"),
        lat = 39.8628,
        lon = -4.0273, # Toledo lies west of Greenwich, so the longitude is negative
        tz = "CET"
      )
    
  • 2. Sunrise and sunset times plot
  • We need to manipulate the original data frame to calculate the time elapsed between the start of the day and both the sunrise and sunset times. Then we plot those two new variables using geom_ribbon from ggplot2. Finally we customize the axes and the title.

    # Sunrise/sunset
    df %>%
      mutate(
        date = as.POSIXct(date) - 12 * 60 * 60 ,
        sunrise = sunrise - date,
        sunset =  sunset - date,
      ) %>%
      ggplot() +
      geom_ribbon(aes(x = date, ymin = sunrise, ymax = sunset),
                  fill = "#FDE725FF",
                  alpha = .8) + # "#ffeda0"
      scale_x_datetime(
        breaks = seq(as.POSIXct(min(df$date)), as.POSIXct(max(df$date)), "month"),
        expand = c(0, 0),
        labels = date_format("%b %y"),
        minor_breaks = NULL
      ) +
      scale_y_continuous(
        limits = c(0, 24),
        breaks = seq(0, 24, 2),
        expand = c(0, 0),
        minor_breaks = NULL
      ) +
      labs(
        x = "Date",
        y = "Hours",
        title = sprintf(
          "Sunrise and Sunset for %s\n%s",
          "Toledo (Spain)",
          # paste0() has no `sep` argument; paste() with collapse joins the two dates
          paste(as.Date(range(df$date)), collapse = " to ")
        )
      ) +
      theme(
        panel.background = element_rect(fill = "#180F3EFF"),
        panel.grid = element_line(colour = "grey", linetype = "dashed")
      )
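
The offsets plotted above are simply the hours elapsed since the start of each day. As a minimal base R sketch of that calculation for any POSIXct value: the helper name hours_since_midnight is hypothetical, and it measures from local midnight in the given time zone rather than applying the fixed 12-hour shift used above, so treat it as an alternative under those assumptions.

```r
# Hours elapsed between local midnight and each timestamp in `x`.
# `hours_since_midnight` is a hypothetical helper, not part of suncalc.
hours_since_midnight <- function(x, tz = "CET") {
  # Local calendar date of each timestamp, converted back to local midnight
  midnight <- as.POSIXct(format(x, "%Y-%m-%d", tz = tz), tz = tz)
  as.numeric(difftime(x, midnight, units = "hours"))
}

x <- as.POSIXct("2018-06-21 06:45:00", tz = "CET")
hours_since_midnight(x) # 6.75
```

On the two days per year when daylight saving time changes, elapsed hours and clock hours differ by one, which is worth keeping in mind when reading the plot.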
    
  • 3. Daytime duration
  • Very similar to the preceding plot. This time we only need to calculate the day_length and plot the results using geom_area and geom_line.

    df %>%
      mutate(
        date = as.POSIXct(date),
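        # Explicit units keep the result in hours regardless of difftime's automatic choice
        day_length = as.numeric(difftime(sunset, sunrise, units = "hours"))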
      ) %>%
      ggplot(aes(x = date, y = day_length)) +
      geom_area(fill = "#FDE725FF", alpha = .4) +
      geom_line(color = "#525252") +
      scale_x_datetime(
        expand = c(0, 0),
        labels = date_format("%b '%y"),
        breaks =  seq(as.POSIXct(min(df$date)), as.POSIXct(max(df$date)), "month"),
        minor_breaks = NULL
      ) +
      scale_y_continuous(
        limits = c(0, 24),
        breaks = seq(0, 24, 2),
        expand = c(0, 0),
        minor_breaks = NULL
      ) +
      labs(x = "Date", y = "Hours", title = "Toledo (Spain) - Daytime duration") +
      theme_bw()
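
As a rough cross-check on day_length, the geometric day length can be approximated from latitude and day of year alone. The sketch below is base R and independent of suncalc; the helper name approx_day_length and the simple cosine approximation of the solar declination are assumptions for illustration. The formula ignores atmospheric refraction and the size of the solar disc, so it should come out somewhat shorter than the values derived from sunrise and sunset.

```r
# Approximate day length (hours) from latitude (degrees) and day of year.
# `approx_day_length` is a hypothetical helper for illustration only.
approx_day_length <- function(lat_deg, day_of_year) {
  # Solar declination in degrees (simple cosine approximation)
  decl_deg <- -23.44 * cos(2 * pi / 365 * (day_of_year + 10))
  to_rad <- pi / 180
  # Cosine of the sunrise hour angle, clamped to [-1, 1] to cover polar day and night
  cos_h <- -tan(lat_deg * to_rad) * tan(decl_deg * to_rad)
  cos_h <- pmin(pmax(cos_h, -1), 1)
  24 / pi * acos(cos_h)
}

# March equinox, June solstice and December solstice at the latitude used above
days <- as.integer(format(as.Date(c("2018-03-20", "2018-06-21", "2018-12-21")), "%j"))
round(approx_day_length(39.8628, days), 1)
```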
    


    2019-01-31

    How to control the scatter of points within a violin plot with ggplot2

    Problem

    We want to control the scatter of points within a violin plot with ggplot2.

    library(tidyverse)
    p <- ggplot(mpg, aes(class, hwy))
    p + geom_violin() + geom_jitter()
    

    Solution

    • Option 1
    • We widen the violin plot (width = 1.3) and adjust the transparency (alpha) and the horizontal jitter of geom_jitter (width = .02). This option is not entirely satisfactory: restricting the horizontal jitter of geom_jitter undermines the very purpose of the function, which is to avoid overplotting.

      p + geom_violin(width = 1.3) + geom_jitter(alpha = 0.2, width = .02)
      
    • Option 2
    • We use the function geom_quasirandom from the package ggbeeswarm:

      The quasirandom geom is a convenient means to offset points within categories to reduce overplotting. Uses the vipor package

      library(ggbeeswarm)
      p + geom_violin(width = 1.3) + geom_quasirandom(alpha = 0.2, width = 0.2)
      


    How to restrain scattered jitter points within a violin plot using ggplot2

    Problem

    We would like to restrain the scattered jitter points within a violin plot using ggplot2.

    library(tidyverse)
    p <- ggplot(mpg, aes(class, hwy))
    p + geom_violin() + geom_jitter()
    

    Solution

    • Option 1
    • We can widen the violin plots (width = 1.3), add transparency with alpha, and limit the horizontal jitter (width = .02). This option is not completely satisfactory, because restricting the horizontal jitter defeats its purpose of reducing overplotting.

      p + geom_violin(width = 1.3) + geom_jitter(alpha = 0.2, width = .02)
      
    • Option 2
    • Using the function geom_quasirandom from the package ggbeeswarm:

      The quasirandom geom is a convenient means to offset points within categories to reduce overplotting. Uses the vipor package

      library(ggbeeswarm)
      p + geom_violin(width = 1.3) + geom_quasirandom(alpha = 0.2, width = 0.2)
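
A related detail for Option 1: by default geom_jitter also moves points vertically, which slightly alters the plotted hwy values. The sketch below is one possible variant, not the approach from either option above; it uses geom_point with position_jitter so that the vertical jitter can be switched off and a seed fixed. The width of 0.1 and the seed value are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy))

# height = 0 keeps every point at its true hwy value;
# seed makes the horizontal offsets reproducible between runs
p_fixed <- p +
  geom_violin(width = 1.3) +
  geom_point(
    position = position_jitter(width = 0.1, height = 0, seed = 42),
    alpha = 0.2
  )
p_fixed
```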
      


    2019-01-18

    How to transpose a data frame in R

    Problem

    We want to transpose a data frame.

    df <-
      structure(
        list(
          Country.Name = c("Country1", "Country2", "Country3"),
          `1997` = c(1L, 2L, 4L),
          `1998` = c(1L, 4L, 2L),
          `1999` = c(1L, 7L, 1L),
          `2000` = c(1L, 10L, 5L)
        ),
        .Names = c("Country.Name",
                   "1997", "1998", "1999", "2000"),
        class = "data.frame",
        row.names = c(NA,-3L)
      )
    
      Country.Name 1997 1998 1999 2000
    1     Country1    1    1    1    1
    2     Country2    2    4    7   10
    3     Country3    4    2    1    5
    

    Solution

    We use the function t, which transposes a matrix or data frame.

    # Transpose all but the first column
    df_transpose <- data.frame(t(df[-1]))
    # Add the column names
    colnames(df_transpose) <- df[, 1]
    df_transpose
    
         Country1 Country2 Country3
    1997        1        2        4
    1998        1        4        2
    1999        1        7        1
    2000        1       10        5
    


    2019-01-16

    How do I transpose a data frame in R?

    Problem

    We need to transpose a data frame.

    df <-
      structure(
        list(
          Country.Name = c("Country1", "Country2", "Country3"),
          `1997` = c(1L, 2L, 4L),
          `1998` = c(1L, 4L, 2L),
          `1999` = c(1L, 7L, 1L),
          `2000` = c(1L, 10L, 5L)
        ),
        .Names = c("Country.Name",
                   "1997", "1998", "1999", "2000"),
        class = "data.frame",
        row.names = c(NA,-3L)
      )
    
      Country.Name 1997 1998 1999 2000
    1     Country1    1    1    1    1
    2     Country2    2    4    7   10
    3     Country3    4    2    1    5
    

    Solution

    The t function will return the transpose of a matrix or data frame.

    # Transpose all but the first column
    df_transpose <- data.frame(t(df[-1]))
    # Add column names
    colnames(df_transpose) <- df[, 1]
    df_transpose
    
         Country1 Country2 Country3
    1997        1        2        4
    1998        1        4        2
    1999        1        7        1
    2000        1       10        5
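
The first column is dropped before transposing because t converts the data frame to a matrix, and a matrix holds a single type. A small base R sketch of the difference, using a shortened copy of the example data (the extra year column at the end is an optional addition, not part of the solution above):

```r
df <- data.frame(
  Country.Name = c("Country1", "Country2", "Country3"),
  `1997` = c(1L, 2L, 4L),
  `1998` = c(1L, 4L, 2L),
  check.names = FALSE
)

# Including the character column forces every cell to character
m_all <- t(df)
typeof(m_all) # "character"

# Dropping it first keeps the values as integers
m_num <- t(df[-1])
typeof(m_num) # "integer"

df_transpose <- as.data.frame(m_num)
colnames(df_transpose) <- df$Country.Name
# Optionally move the years from the row names into a regular column
df_transpose$year <- as.integer(rownames(df_transpose))
df_transpose
```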
    


    2019-01-10

    Loops with ggplot2

    Problem

    We want to create a loop and save a plot for each subset of the data using ggplot2. Instead of plotting on the same panel using facet_wrap or facet_grid, we'd like to display and save each plot separately.

    library(tidyverse)
    p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()
    p + facet_wrap(vars(Species), scales = "free")
    

    Solution

    We create an empty list to store all the plots. Then we loop over each unique value of the variable (column) Species. To keep the same title format, we keep the call to facet_wrap.

    # Loop
    plots <- list() # Empty list
    p_list <- unique(iris$Species)
    for (i in seq_along(p_list)) {
      # Plot for each Species
      p <- iris %>% filter(Species == p_list[i]) %>%
        ggplot(aes(Sepal.Length, Sepal.Width)) +
        geom_point() +
        facet_wrap( ~ Species) # Titles
      plots[[i]] <- p
      print(p)
    }
    
    To print the whole plot list or a specific element:

    # Print list
    print(plots)
    # Print an element of the list
    print(plots[[1]])
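
The loop above stores and prints the plots; to also write each one to disk, ggsave can be called inside the loop. The following is a minimal sketch, assuming PNG output into a temporary directory; out_dir, the iris_ file name prefix and the plot dimensions are illustrative choices, not part of the solution above.

```r
library(ggplot2)

out_dir <- tempdir() # assumption: replace with the folder you want to write to
species <- as.character(unique(iris$Species))

# Named list, so plots can be retrieved by species name
plots <- vector("list", length(species))
names(plots) <- species

for (sp in species) {
  p <- ggplot(iris[iris$Species == sp, ], aes(Sepal.Length, Sepal.Width)) +
    geom_point() +
    facet_wrap(~ Species) # keeps the species name as a strip title
  plots[[sp]] <- p
  # One file per species, e.g. iris_setosa.png
  ggsave(file.path(out_dir, paste0("iris_", sp, ".png")),
         plot = p, width = 5, height = 4)
}
```

Naming the list elements lets a single plot be retrieved with plots[["setosa"]] instead of a numeric index.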
    


    Nube de datos