2020-03-05

Descriptive statistics by group in R

Problem

We want to report descriptive statistics in R by a grouping variable and keep only a subset of the output statistics.

Solution

We will use the iris data frame, taking the columns Sepal.Length and Sepal.Width and grouping by Species. In our example, we want to return the mean, the standard deviation, the skewness and the kurtosis. The functions describe() and describeBy() come from the psych package.

  • Subset of descriptive statistics by group
  • library(psych)
    # Variables by index
    d <- describeBy(iris[1:2], group = iris$Species)
    # Two options to subset the statistics
    # (columns 3, 4, 11 and 12 are mean, sd, skew and kurtosis):
    lapply(d, "[", , c(3, 4, 11, 12))
    lapply(d, subset, , c(3, 4, 11, 12)) 
    
    # Variables by name
    i <- match(c("Sepal.Length", "Sepal.Width"), names(iris))
    d <- describeBy(iris[i], group = iris$Species)
    lapply(d, subset, , c("mean", "sd", "skew", "kurtosis")) 
    
    $setosa
                 mean   sd skew kurtosis
    Sepal.Length 5.01 0.35 0.11    -0.45
    Sepal.Width  3.43 0.38 0.04     0.60
    
    $versicolor
                 mean   sd  skew kurtosis
    Sepal.Length 5.94 0.52  0.10    -0.69
    Sepal.Width  2.77 0.31 -0.34    -0.55
    
    $virginica
                 mean   sd skew kurtosis
    Sepal.Length 6.59 0.64 0.11    -0.20
    Sepal.Width  2.97 0.32 0.34     0.38
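
    For readers who want to see what is computed, the same four statistics can be sketched in base R without psych. This is only an illustration: skew3(), kurt3() and stats_table() are hypothetical helper names, and the moment formulas assume the "type 3" definitions (sample standard deviation in the denominator), which is what psych is understood to use by default.

```r
# Hypothetical helpers: moment-based skewness and excess kurtosis,
# both scaled by the sample standard deviation (n - 1 denominator).
# Assumption: this corresponds to psych's default "type 3" definitions.
skew3 <- function(x) {
  sum((x - mean(x))^3) / (length(x) * sd(x)^3)
}
kurt3 <- function(x) {
  sum((x - mean(x))^4) / (length(x) * sd(x)^4) - 3
}

# Build a matrix with one row per variable and one column per statistic
stats_table <- function(df) {
  t(sapply(df, function(x) {
    c(mean = mean(x), sd = sd(x), skew = skew3(x), kurtosis = kurt3(x))
  }))
}

# Split the two columns by Species and summarise each piece
vars <- c("Sepal.Length", "Sepal.Width")
by_group <- lapply(split(iris[vars], iris$Species), stats_table)

# Round only for display, keeping full precision in by_group
lapply(by_group, round, 2)
```

    The result is a named list with one matrix per species, so do.call(rbind, by_group) would stack the groups into a single table if that layout is preferred.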
    
  • Subset of descriptive statistics without grouping
  • # Select the desired columns of the table
    d <- describe(iris[1:2])
    # Subsetting output statistics
    d[, c(3, 4, 11, 12)]
    
                 mean   sd skew kurtosis
    Sepal.Length 5.84 0.83 0.31    -0.61
    Sepal.Width  3.06 0.44 0.31     0.14
    
