2021-01-29

How to create a population pyramid with ggplot2


Problem

We want to create a population pyramid with ggplot2. Our example is the Spanish population by age and gender in 2020.

Example

# A tibble: 42 x 3
   Age      Gender   Total
   <chr>    <chr>    <int>
 1 "0-4 "   male   1018039
 2 "0-4 "   female  963653
 3 "5-9"    male   1196380
 4 "5-9"    female 1129063
 5 "10-14"  male   1297635
 6 "10-14"  female 1225863
 7 "15-19 " male   1232566
 8 "15-19 " female 1156455
 9 "20-24 " male   1207902
10 "20-24 " female 1152765
# ... with 32 more rows

df <- structure(list(Age = c("0-4 ", "0-4 ", "5-9", "5-9", "10-14", 
"10-14", "15-19 ", "15-19 ", "20-24 ", "20-24 ", "25-29 ", "25-29 ", 
"30-34 ", "30-34 ", "35-39 ", "35-39 ", "40-44 ", "40-44 ", "45-49 ", 
"45-49 ", "50-54 ", "50-54 ", "55-59 ", "55-59 ", "60-64 ", "60-64 ", 
"65-69 ", "65-69 ", "70-74 ", "70-74 ", "75-79 ", "75-79 ", "80-84 ", 
"80-84 ", "85-89 ", "85-89 ", "90-94 ", "90-94 ", "95-99 ", "95-99 ", 
"100+", "100+"), Gender = c("male", "female", "male", "female", 
"male", "female", "male", "female", "male", "female", "male", 
"female", "male", "female", "male", "female", "male", "female", 
"male", "female", "male", "female", "male", "female", "male", 
"female", "male", "female", "male", "female", "male", "female", 
"male", "female", "male", "female", "male", "female", "male", 
"female", "male", "female"), Total = c(1018039L, 963653L, 1196380L, 
1129063L, 1297635L, 1225863L, 1232566L, 1156455L, 1207902L, 1152765L, 
1308197L, 1275776L, 1421558L, 1417845L, 1702135L, 1688865L, 2024303L, 
1971909L, 1968659L, 1926866L, 1828015L, 1840434L, 1652558L, 1712299L, 
1410111L, 1502563L, 1153768L, 1270544L, 1020478L, 1191698L, 773823L, 
974046L, 513692L, 759379L, 361702L, 634714L, 133032L, 302885L, 
27305L, 84007L, 3732L, 13576L)), class = "data.frame", row.names = c(NA, 
-42L))
tibble::as_tibble(df)

Solution

First we create two columns. Population holds the male counts as negative values, so the male bars extend in the opposite direction in the plot. Percent holds each age group's share of its gender's total population, also negated for males.

library(dplyr)
library(ggplot2)

df <- df %>%
  group_by(Gender) %>%
  mutate(
    Population = ifelse(Gender == "female", Total, -Total),
    Percent = ifelse(Gender == "female", 100 * Total / sum(Total), -100 * Total / sum(Total))
  ) %>%
  ungroup()
df$Age <- factor(df$Age, levels = unique(df$Age)) # Keep the original order of the character vector

  • Absolute numbers

    ggplot(df, aes(x = Age, y = Population, fill = Gender)) +
      geom_bar(data = filter(df, Gender == "female"), stat = "identity") +
      geom_bar(data = filter(df, Gender == "male"), stat = "identity") +
      scale_y_continuous(
        breaks = seq(-2000000, 2000000, 500000),
        labels = scales::comma(abs(seq(-2000000, 2000000, 500000)))
      ) +
      coord_flip()
    

  • Percentage

    ggplot(df, aes(x = Age, y = Percent, fill = Gender)) +
      geom_bar(data = filter(df, Gender == "female"), stat = "identity") +
      geom_bar(data = filter(df, Gender == "male"), stat = "identity") +
      scale_y_continuous(
        breaks = seq(-10, 10, 2),
        labels = scales::comma(abs(seq(-10, 10, 2)))
      ) +
      coord_flip()
    


    2021-01-27

    How to create a population pyramid with ggplot2


    Problem

    We want to create a population pyramid with ggplot2. Our example is the Spanish population by age and gender in 2020.

    Example

    # A tibble: 42 x 3
       Age      Gender   Total
       <chr>    <chr>    <int>
     1 "0-4 "   male   1018039
     2 "0-4 "   female  963653
     3 "5-9"    male   1196380
     4 "5-9"    female 1129063
     5 "10-14"  male   1297635
     6 "10-14"  female 1225863
     7 "15-19 " male   1232566
     8 "15-19 " female 1156455
     9 "20-24 " male   1207902
    10 "20-24 " female 1152765
    # ... with 32 more rows
    

    df <- structure(list(Age = c("0-4 ", "0-4 ", "5-9", "5-9", "10-14", 
    "10-14", "15-19 ", "15-19 ", "20-24 ", "20-24 ", "25-29 ", "25-29 ", 
    "30-34 ", "30-34 ", "35-39 ", "35-39 ", "40-44 ", "40-44 ", "45-49 ", 
    "45-49 ", "50-54 ", "50-54 ", "55-59 ", "55-59 ", "60-64 ", "60-64 ", 
    "65-69 ", "65-69 ", "70-74 ", "70-74 ", "75-79 ", "75-79 ", "80-84 ", 
    "80-84 ", "85-89 ", "85-89 ", "90-94 ", "90-94 ", "95-99 ", "95-99 ", 
    "100+", "100+"), Gender = c("male", "female", "male", "female", 
    "male", "female", "male", "female", "male", "female", "male", 
    "female", "male", "female", "male", "female", "male", "female", 
    "male", "female", "male", "female", "male", "female", "male", 
    "female", "male", "female", "male", "female", "male", "female", 
    "male", "female", "male", "female", "male", "female", "male", 
    "female", "male", "female"), Total = c(1018039L, 963653L, 1196380L, 
    1129063L, 1297635L, 1225863L, 1232566L, 1156455L, 1207902L, 1152765L, 
    1308197L, 1275776L, 1421558L, 1417845L, 1702135L, 1688865L, 2024303L, 
    1971909L, 1968659L, 1926866L, 1828015L, 1840434L, 1652558L, 1712299L, 
    1410111L, 1502563L, 1153768L, 1270544L, 1020478L, 1191698L, 773823L, 
    974046L, 513692L, 759379L, 361702L, 634714L, 133032L, 302885L, 
    27305L, 84007L, 3732L, 13576L)), class = "data.frame", row.names = c(NA, 
    -42L))
    tibble::as_tibble(df)
    

    Solution

    First we create two columns. Population holds the male counts as negative values, so the male bars extend in the opposite direction in the plot. Percent holds each age group's share of its gender's total population, also negated for males.

    library(dplyr)
    library(ggplot2)

    df <- df %>%
      group_by(Gender) %>%
      mutate(
        Population = ifelse(Gender == "female", Total, -Total),
        Percent = ifelse(Gender == "female", 100 * Total / sum(Total), -100 * Total / sum(Total))
      ) %>%
      ungroup()
    df$Age <- factor(df$Age, levels = unique(df$Age)) # Keep the original order of the character vector
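Because the data are grouped by Gender before mutate, sum(Total) is the total for each gender, so Percent is each age group's share within its own gender and each side of the pyramid adds up to 100. A minimal base R sketch of that check follows. The toy data frame and its numbers are made up for illustration and are not the Spanish figures.

```r
# Hypothetical toy data (not the Spanish figures): two age groups per gender.
toy <- data.frame(
  Age    = c("0-4", "0-4", "5-9", "5-9"),
  Gender = c("male", "female", "male", "female"),
  Total  = c(30L, 20L, 70L, 80L)
)

# Share of each age group within its own gender. ave() returns the
# per-gender sum aligned with the rows, which mirrors group_by(Gender)
# followed by Total / sum(Total).
share <- 100 * toy$Total / ave(toy$Total, toy$Gender, FUN = sum)

# Males are mirrored onto the negative side of the axis.
toy$Percent <- ifelse(toy$Gender == "female", share, -share)

# In absolute value, each gender should add up to 100.
gender_sums <- tapply(abs(toy$Percent), toy$Gender, sum)
print(gender_sums)
```

If a share of the whole population is wanted instead, the same idea applies with the grand total sum(toy$Total) as the denominator.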
    

  • Absolute numbers

    ggplot(df, aes(x = Age, y = Population, fill = Gender)) +
      geom_bar(data = filter(df, Gender == "female"), stat = "identity") +
      geom_bar(data = filter(df, Gender == "male"), stat = "identity") +
      scale_y_continuous(
        breaks = seq(-2000000, 2000000, 500000),
        labels = scales::comma(abs(seq(-2000000, 2000000, 500000)))
      ) +
      coord_flip()
    

  • Percentage

    ggplot(df, aes(x = Age, y = Percent, fill = Gender)) +
      geom_bar(data = filter(df, Gender == "female"), stat = "identity") +
      geom_bar(data = filter(df, Gender == "male"), stat = "identity") +
      scale_y_continuous(
        breaks = seq(-10, 10, 2),
        labels = scales::comma(abs(seq(-10, 10, 2)))
      ) +
      coord_flip()
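Both plots above write the break vector twice, once for breaks and once for labels. As a possible alternative, the labels argument of a ggplot2 scale also accepts a function, so the absolute-value formatting can be defined once and applied to whatever breaks the scale uses. The sketch below assumes the ggplot2 and scales packages are installed. The toy data and the helper name abs_comma are hypothetical and only serve the illustration.

```r
library(ggplot2)

# Hypothetical toy data with the male side already negated.
toy <- data.frame(
  Age        = factor(c("0-4", "0-4", "5-9", "5-9"), levels = c("0-4", "5-9")),
  Gender     = c("male", "female", "male", "female"),
  Population = c(-1000000, 950000, -1200000, 1100000)
)

# Hypothetical helper: formats any break as its absolute value
# with thousands separators.
abs_comma <- function(x) scales::comma(abs(x))

# geom_col() is shorthand for geom_bar(stat = "identity").
p <- ggplot(toy, aes(x = Age, y = Population, fill = Gender)) +
  geom_col() +
  scale_y_continuous(labels = abs_comma) +
  coord_flip()

print(p)
```

With a label function, changing the breaks later does not require touching the labels.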
    


    2021-01-06

    How to insert a picture in an Excel comment


    Problem

    We want to insert a picture in an Excel comment.

    Solution

    1. Select a cell, right-click, and choose Insert Comment.
    2. Right-click the border of the comment and click Format Comment.
    3. On the Colors and Lines tab, click the drop-down arrow for Color and select Fill Effects.
    4. On the Picture tab, click Select Picture.
    5. Select the picture, click Insert, and then click OK.
    6. Resize the picture by dragging the border of the comment.
    7. Hide the comment if you only want the picture to show when you hover over the cell.


    2021-01-04

    How to insert a picture in a cell comment in Excel


    Problem

    We want to insert a picture in the comment of an Excel cell.

    Solution

    1. Select a cell, right-click, and insert a comment.
    2. Right-click the border of the comment and then click Format Comment.
    3. On the Colors and Lines tab, click the Color drop-down and select Fill Effects.
    4. On the Picture tab, click Select Picture.
    5. Select the picture, click Insert, and then click OK.
    6. Adjust the size of the picture by dragging the borders of the comment.
    7. Hide the comment if you only want it to appear when the cursor hovers over the cell.


    2021-01-01

    How to create a time series with 30-minute intervals


    Problem

    We want to create a time series with 30-minute intervals.

    Example

    [1] "2017-01-01 00:00:00 UTC"
    [2] "2017-01-01 00:30:00 UTC"
    [3] "2017-01-01 01:00:00 UTC"
    [4] "2017-01-01 01:30:00 UTC"
    [5] "2017-01-01 02:00:00 UTC"
    [6] "2017-01-01 02:30:00 UTC"

    Solution

    We use the seq function, specifying the minutes in the by argument and "UTC" as the time zone. Type ?seq.POSIXt for more details on the by argument, which can be given as a character string:

    A character string, containing one of "sec", "min", "hour", "day", "DSTday", "week", "month", "quarter" or "year". This can optionally be preceded by a (positive or negative) integer and a space, or followed by "s".

    seq(as.POSIXct("2017-01-01", tz = "UTC"),
        as.POSIXct("2017-01-02", tz = "UTC"),
        by = "30 min")
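As a small variation on the call above, the by argument of seq.POSIXt can also be given as a number of seconds, and length.out can replace the end time. The sketch below uses only base R. The variable names are hypothetical, and the checks at the end confirm that both forms produce the same half-hour series.

```r
start <- as.POSIXct("2017-01-01", tz = "UTC")
end   <- as.POSIXct("2017-01-02", tz = "UTC")

# Same series, with the step given as a number of seconds (30 minutes).
by_seconds <- seq(start, end, by = 30 * 60)

# Alternatively, fix the number of points instead of the end time:
# 24 hours at two points per hour, plus the closing midnight.
by_length <- seq(start, by = "30 min", length.out = 49)

# Both forms should agree, and every step should be 30 minutes.
same_series <- all(by_seconds == by_length)
step_mins   <- as.numeric(diff(by_seconds), units = "mins")
print(same_series)
print(unique(step_mins))
```

Fixing tz = "UTC" avoids daylight saving transitions, which would otherwise shift local clock times in the series.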
    


    Nube de datos