2020-10-20

    How to use the first row as column names in R

    Problem

    We want to use the first row of a data frame as its column names in R. In our example, we'd like to replace the original column names (V1, V2, etc.) with the values in the first row (col1, col2, etc.).

        V1   V2   V3   V4   V5
    1 col1 col2 col3 col4 col5
    2 row1    2    4    5   56
    3 row2   74   74    3  534
    4 row3  865  768    8    7
    5 row4   68   86   65   87
    
    df <- read.table(text =  "V1    V2  V3  V4  V5
                            col1    col2    col3    col4 col5
                            row1    2   4   5   56
                            row2    74  74  3   534
                            row3    865 768 8   7
                            row4    68  86  65  87", header = TRUE )
    

    Solution

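  • Option 1

    # Assign the first row's values as the column names, then drop that row
    colnames(df) <- df[1, ]
    df <- df[-1, ]
    

  • Option 2

    # Convert each first-row value to character explicitly (also works when the columns are factors)
    names(df) <- lapply(df[1, ], as.character)
    df <- df[-1, ]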
    

    Results

      col1 col2 col3 col4 col5
    2 row1    2    4    5   56
    3 row2   74   74    3  534
    4 row3  865  768    8    7
    5 row4   68   86   65   87
    
    If we want to reset row names:

    rownames(df) <- NULL
    
      col1 col2 col3 col4 col5
    1 row1    2    4    5   56
    2 row2   74   74    3  534
    3 row3  865  768    8    7
    4 row4   68   86   65   87
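
    One thing the steps above leave untouched is the column types: because the first row held text, every column was read as character and stays that way after the header row is dropped. The following is a minimal sketch of one way to re-infer the types, assuming base R's type.convert() with as.is = TRUE is acceptable for your data. The stringsAsFactors = FALSE argument and the unlist() call are choices made for this illustration, not part of the original post.

```r
# Rebuild the example data frame from the post, keeping text columns as character
df <- read.table(text = "V1 V2 V3 V4 V5
col1 col2 col3 col4 col5
row1 2 4 5 56
row2 74 74 3 534
row3 865 768 8 7
row4 68 86 65 87", header = TRUE, stringsAsFactors = FALSE)

# Promote the first row to column names, drop it, and reset the row names
names(df) <- as.character(unlist(df[1, ]))
df <- df[-1, ]
rownames(df) <- NULL

# Every column is still character here; let R re-infer a type for each column
df <- type.convert(df, as.is = TRUE)

# Inspect the resulting column types
str(df)
```

    With as.is = TRUE, columns that cannot be parsed as numbers (col1 here) remain character instead of becoming factors.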
    

    2020-10-07

    Filter rows containing a certain string in R

    Problem

    We want to filter the rows of a data frame that contain a certain string in R. In our example, we keep the rows whose Company column contains the string 'foo'.

    foo <- data.frame(Company = c("company1", "foo", "test", "food"), Metric = rnorm(4, 10))
    

       Company    Metric
    1 company1  7.590178
    2      foo  9.711493
    3     test 10.862799
    4     food  9.337434
    

    Solution

  • dplyr

    Use the grepl function inside filter to keep the rows whose Company value matches the pattern.

    library(dplyr)
    filter(foo, grepl("foo", Company))
    

      Company   Metric
    1     foo 9.711493
    2    food 9.337434
    
  • data.table

    Another option is the %like% operator from data.table, which reads much like SQL's LIKE.

    library(data.table)
    DT <- data.table(foo)
    DT[Company %like% 'foo']
    
       Company   Metric
    1:     foo 9.711493
    2:    food 9.337434
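
    Both solutions match substrings, which is why "food" is returned along with "foo". As a rough sketch of the same idea without extra packages, the snippet below uses grepl in base R and contrasts a substring match with an exact match. The fixed Metric values, the variable names substring_match and exact_match, and the anchored pattern "^foo$" are assumptions made for this illustration, not part of the original post.

```r
# Same example data as above, with fixed Metric values so the result is reproducible
foo <- data.frame(Company = c("company1", "foo", "test", "food"),
                  Metric = c(7.59, 9.71, 10.86, 9.34),
                  stringsAsFactors = FALSE)

# Substring match: keeps every row whose Company contains "foo"
# fixed = TRUE treats the pattern as a literal string, not a regular expression
substring_match <- foo[grepl("foo", foo$Company, fixed = TRUE), ]

# Exact match: the anchors ^ and $ require the whole value to be "foo"
exact_match <- foo[grepl("^foo$", foo$Company), ]

print(substring_match)
print(exact_match)
```

    The anchored pattern is handy when a short search term would otherwise match longer values by accident.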
    

