Nube de datos: list

2020-03-05

Descriptive statistics by group in R

Problem

We'd like to report descriptive statistics in R by a grouping variable and keep only a subset of the output statistics.

Solution

We will use the columns Sepal.Length and Sepal.Width of the iris data frame, grouped by Species. In our example, we want to return the mean, the standard deviation, the skewness, and the kurtosis.

Subset of descriptive statistics by group

library(psych)
# Variables by index
d <- describeBy(iris[1:2], group = iris$Species)
# Two options to subset the statistics:
lapply(d, "[", , c(3, 4, 11, 12))
lapply(d, subset, , c(3, 4, 11, 12))

# Variables by name
i <- match(c("Sepal.Length", "Sepal.Width"), names(iris))
d <- describeBy(iris[i], group = iris$Species)
lapply(d, subset, , c("mean", "sd", "skew", "kurtosis"))

$setosa
             mean   sd skew kurtosis
Sepal.Length 5.01 0.35 0.11    -0.45
Sepal.Width  3.43 0.38 0.04     0.60

$versicolor
             mean   sd  skew kurtosis
Sepal.Length 5.94 0.52  0.10    -0.69
Sepal.Width  2.77 0.31 -0.34    -0.55

$virginica
             mean   sd skew kurtosis
Sepal.Length 6.59 0.64 0.11    -0.20
Sepal.Width  2.97 0.32 0.34     0.38
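
The per-group result above is a list with one table per species. If a single table is easier to report, the pieces can be stacked. The following is a minimal sketch, assuming the psych package is installed; the names vars, keep, by_group and combined are illustrative and not part of the original post.

```r
library(psych)

vars <- c("Sepal.Length", "Sepal.Width")
keep <- c("mean", "sd", "skew", "kurtosis")

# One table of descriptive statistics per species
d <- describeBy(iris[vars], group = iris$Species)

# Keep only the wanted statistics, as plain data frames
by_group <- lapply(d, function(x) as.data.frame(x)[, keep])

# Stack the tables; row names become "<species>.<variable>"
combined <- do.call(rbind, by_group)
print(round(combined, 2))
```

Selecting the statistics by name rather than by position keeps the code working even if the column order of the describe output changes.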

Subset of descriptive statistics without grouping

# Select the desired columns of the data frame
d <- describe(iris[1:2])
# Subsetting output statistics
d[, c(3, 4, 11, 12)]

             mean   sd skew kurtosis
Sepal.Length 5.84 0.83 0.31    -0.61
Sepal.Width  3.06 0.44 0.31     0.14
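
To see what the four selected columns contain, they can be recomputed in base R. This is only a sketch: desc_stats is a hypothetical helper, and it assumes that psych uses its default type = 3 definitions, in which the third and fourth central moments are divided by powers of the sample standard deviation.

```r
# Hypothetical helper: four statistics for one numeric vector.
# Assumption: type-3 skewness and excess kurtosis.
desc_stats <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  s <- sd(x)  # sample standard deviation (n - 1 denominator)
  c(mean = mean(x),
    sd = s,
    skew = mean(dev^3) / s^3,
    kurtosis = mean(dev^4) / s^4 - 3)
}

# One row per variable, one column per statistic
res <- t(sapply(iris[1:2], desc_stats))
print(round(res, 2))
```

If the assumption about the definitions holds, the rounded values should agree with the table above.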

2019-01-10

Loops with ggplot2

Problem

We want to create a loop and save plots for each subset of data using ggplot2. Instead of plotting on the same panel using facet_wrap or facet_grid, we'd like to display and save each plot separately.

library(tidyverse)
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()
p + facet_wrap(vars(Species), scales = "free")

Solution

We create an empty list to store all plots. Then we loop over each unique value of the Species column and build one plot per value. We keep facet_wrap, even though each plot has a single panel, so that every plot keeps the same title format.

# Loop
plots <- list() # Empty list
p_list <- unique(iris$Species)
for (i in seq_along(p_list)) {
  # Plot for each Species
  p <- iris %>% filter(Species == p_list[i]) %>%
    ggplot(aes(Sepal.Length, Sepal.Width)) +
    geom_point() +
    facet_wrap( ~ Species) # Titles
  plots[[i]] <- p
  print(p)
}

To print the whole plot list or a specific element:

# Print list
print(plots)
# Print an element of the list
print(plots[[1]])
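
The loop above displays the plots and stores them in a list, but it does not write them to disk. One possible way to save each plot is to call ggsave inside the loop. This is a sketch only: the output directory (tempdir()), the file name pattern and the figure size are assumptions chosen for illustration.

```r
library(ggplot2)

out_dir <- tempdir()  # assumption: replace with your own folder
species <- unique(iris$Species)
files <- character(length(species))

for (i in seq_along(species)) {
  # Plot for one species
  p <- ggplot(iris[iris$Species == species[i], ],
              aes(Sepal.Length, Sepal.Width)) +
    geom_point() +
    facet_wrap(~ Species)
  # One PNG file per species, e.g. iris_setosa.png
  files[i] <- file.path(out_dir, paste0("iris_", species[i], ".png"))
  ggsave(files[i], plot = p, width = 5, height = 4)
}
```

Passing the plot explicitly through the plot argument avoids relying on the last plot displayed.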

Spanish version: Usar bucles en ggplot2

2019-01-09

Usar bucles en ggplot2 (Using loops in ggplot2)

Problem

We want to create and save plots separately with ggplot2. That is, instead of showing the plots in a single panel with facet_wrap or facet_grid, we want to create independent plots, each in its own panel, and store them in a list.

library(tidyverse)
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()
p + facet_wrap(vars(Species), scales = "free")

Solution

We create an empty list in which to store the plots. We then start a loop that generates one ggplot2 plot for each unique element of the list of species. We keep the facet_wrap function, even though each plot has a single panel, to get the title of each species.

# Loop
plots <- list() # Create an empty list
p_list <- unique(iris$Species)
for (i in seq_along(p_list)) {
  # Plot for each species
  p <- iris %>% filter(Species == p_list[i]) %>%
    ggplot(aes(Sepal.Length, Sepal.Width)) +
    geom_point() +
    facet_wrap( ~ Species) # Titles
  plots[[i]] <- p
  print(p)
}

To print the whole list of plots again, or a specific one:

# Print the whole list
print(plots)
# Print a specific plot
print(plots[[1]])
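
As a variation on the loop, the plots can be stored in a named list so that each one is retrieved by species name instead of by position. The sketch below uses split and lapply and sets the title with labs instead of facet_wrap; both choices are alternatives suggested here, not the approach of the original post, and by_species is an illustrative name.

```r
library(ggplot2)

# Named list with one data frame per species
by_species <- split(iris, iris$Species)

# One plot per species, titled with the species name
plots <- lapply(names(by_species), function(s) {
  ggplot(by_species[[s]], aes(Sepal.Length, Sepal.Width)) +
    geom_point() +
    labs(title = s)
})
names(plots) <- names(by_species)

# Print one plot by name
print(plots[["setosa"]])
```

With labs the title appears as a regular plot title rather than as a facet strip, so the look differs slightly from the facet_wrap version.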

2015-07-21

Subconjunto de estadísticas descriptivas en R (Subset of descriptive statistics in R)

Problem

We want to compute descriptive statistics for some variables of a data frame. This time, we want to select both the statistics and the variables of the table.

Solution

We will use the iris data frame, but only the columns Sepal.Length and Sepal.Width. In our example, we compute the mean, the standard deviation (SD), the skewness, and the kurtosis.

For a single table

library("psych")
# Select the desired columns of the data frame
d <- describe(iris[1:2])
# Select the desired statistics
d[, c(3, 4, 11, 12)]

             mean   sd skew kurtosis
Sepal.Length 5.84 0.83 0.31    -0.61
Sepal.Width  3.06 0.44 0.31     0.14

Statistics by group

# Variables by index
d <- describeBy(iris[1:2], group = iris$Species)
# Two options to select the statistics
# from the list of data frames
lapply(d, "[", , c(3, 4, 11, 12))
lapply(d, subset, , c(3, 4, 11, 12))

# Variables by name
i <- match(c("Sepal.Length", "Sepal.Width"), names(iris))
d <- describeBy(iris[i], group = iris$Species)
lapply(d, subset, , c("mean", "sd", "skew", "kurtosis"))

$setosa
             mean   sd skew kurtosis
Sepal.Length 5.01 0.35 0.11    -0.45
Sepal.Width  3.43 0.38 0.04     0.60

$versicolor
             mean   sd  skew kurtosis
Sepal.Length 5.94 0.52  0.10    -0.69
Sepal.Width  2.77 0.31 -0.34    -0.55

$virginica
             mean   sd skew kurtosis
Sepal.Length 6.59 0.64 0.11    -0.20
Sepal.Width  2.97 0.32 0.34     0.38
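
The group means and standard deviations in the tables above can be cross-checked without psych. A minimal base R sketch using aggregate follows; the names vars, means and sds are illustrative only.

```r
vars <- c("Sepal.Length", "Sepal.Width")

# Mean and standard deviation of each variable within each species
means <- aggregate(iris[vars], by = list(Species = iris$Species), FUN = mean)
sds   <- aggregate(iris[vars], by = list(Species = iris$Species), FUN = sd)

print(means)
print(sds)
```

Each result has one row per species and one column per variable, so the values can be compared directly with the mean and sd columns of the grouped output.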
