Nube de datos: noviembre 2020

2020-11-28

How to get the Windows user name with VBA

Problem

We need to get the Windows user name using VBA. For example, in Excel we can display different info depending on the user opening the Excel file.

Solución

Open the Visual Basic Editor: press Alt+F11.
Copy the following subroutine into a module.

Sub user()
MsgBox Environ("UserName")
End Sub

Run the subroutines: press F5.
Excel will display a pop up message showing the Windows user name.

We can also type the following text in the Immediate window and press Enter.

?Environ("Username")

Or we can capture and print the user name in a cell to customize the information displayed in that sheet or workbook.

Sub user_cell()
Sheets("Hoja1").Range("A1") = Environ("UserName")
End Sub

References

Use Environment Variables in VBA

2020-11-27

Error: No Such Column using SQLDF

Title

Problem

When column names include dots, we get the following error: 'No Such Column using SQLDF'.

library(sqldf)
sqldf('SELECT Species, Sepal.Length 
      FROM iris 
      WHERE Species = "virginica" LIMIT 5')

Error: no such column: Sepal.Length

Solution

We only need to write the SQL statement between single quotes, and the column names including dots between double quotes or backticks/backquotes interchangeably.

sqldf('SELECT Species, "Sepal.Length" 
      FROM iris 
      WHERE Species = "virginica" LIMIT 5')
sqldf('SELECT Species, `Sepal.Length` 
      FROM iris 
      WHERE Species = "virginica" LIMIT 5')

    Species Sepal.Length
1 virginica          6.3
2 virginica          5.8
3 virginica          7.1
4 virginica          6.3
5 virginica          6.5

Notes

Previously we had to replace dots for underscores. This is no longer needed:

Staring with RSQLite 1.0.0 and sqldf 0.4-9 dots in column names are no longer translated to underscores.
If you are using an older version of these packages then note that since dot is an SQL operator the RSQLite driver package converts dots to underscores so that SQL statements can reference such columns unquoted.

References

2020-11-26

Cómo ocultar el contenido de una celda mediante formato condicional en Excel

Problema

Queremos ocultar el contenido de una celda en función del contenido de otra celda.

Solución

En A2 aplicamos el formato condicional que oculta el contenido de la celda si es igual a B2.

En Formato condicional creamos la siguiente regla: =$A$2=$B$2. Luego hacemos clic en Formato...

En el menú Formato de celdas, hacemos clic en la pestaña Número.
Debajo de Categoría, clic en Personalizada.
En la celda Tipo, tecleamos ;;; (es decir, tres puntos y coma seguidos), y clic en aceptar

Notas

¿Por qué necesitamos añadir tres puntos y comas en el formato de número personalizado? El formato personalizado está compuesto por 4 secciones separadas de tres puntos y comas, de manera que si solamente tecleamos los separadores (los tres puntos y comas), la celda aparecerá vacía.

Referencias

stackoverflow

2020-11-24

How to select all columns in dplyr

Title

Problem

We want to select all variables in a data frame using dplyr.

Solution

select(iris, everything())
# Using the pipe operator %>%
iris %>% select(everything())

Notes

The function select( ) subsets columns based on their names and other features. We can use the function everything( ) a selection helper to select all variables.

Results

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

References

How to aggregate multiple variables with different functions in R

Title

Problem

We want to group the following data frame by Branch, using different functions to summarize the results.

  Branch Loan_Amount TAT
1      A         100 2.0
2      A         120 4.0
3      A         300 9.0
4      B         150 1.5
5      B         200 2.0

This is the expected outcome. We want to group by Branch, count the Number_of_loans, and sum Loan_Amount and TAT.

  Branch Number_of_loans Loan_Amount  TAT
1      A               3         520 15.0
2      B               2         350  3.5

Data

df <- structure(list(Branch = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), Loan_Amount = c(100L, 120L, 300L, 150L, 
200L), TAT = c(2, 4, 9, 1.5, 2)), .Names = c("Branch", "Loan_Amount", 
"TAT"), class = "data.frame", row.names = c(NA, -5L))

Options

Base package

df1 <- aggregate(.~ Branch, df, FUN = "sum")
df2 <- setNames(aggregate(Loan_Amount~Branch, df, length)[2], c("Number_of_loans"))
cbind(df1, df2)

  Branch Loan_Amount  TAT Number_of_loans
1      A         520 15.0               3
2      B         350  3.5               2

dplyr

library(dplyr)
df %>% 
  group_by(Branch) %>% 
  summarise(Number_of_loans = n(),
            Loan_Amount = sum(Loan_Amount),
            TAT = sum(TAT))

Source: local data frame [2 x 4]

  Branch Number_of_loans Loan_Amount   TAT
  (fctr)           (int)       (int) (dbl)
1      A               3         520  15.0
2      B               2         350   3.5

sqldf

library(sqldf)
sqldf("SELECT Branch, 
              COUNT(Loan_Amount) Number_of_loans, 
              SUM(Loan_Amount) Loan_Amount, 
              SUM(TAT) TAT 
      FROM df 
      GROUP BY Branch")

  Branch Number_of_loans Loan_Amount  TAT
1      A               3         520 15.0
2      B               2         350  3.5

References

stackoverflow

2020-11-21

How to reorder all other columns in a data frame in dplyr

Title

Problem

In dplyr when using select, we want to reorder all other columns at the beginning or end of a data frame without having to type these column names again. In the following example, the data drame flights has 19 columns and we want to order 5 of them at the beginning or end of a data frame.

library(nycflights13)
library(dplyr)
head(flights)

# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
                                     
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
6  2013     1     1      554            558        -4      740            728
# ... with 11 more variables: arr_delay , carrier , flight ,
#   tailnum , origin , dest , air_time , distance ,
#   hour , minute , time_hour

Solution

5 columns at the beginning of the data frame.

col <- c("carrier", "tailnum", "year", "month", "day")
select(flights, one_of(col), everything())

# A tibble: 336,776 × 19
   carrier tailnum  year month   day dep_time sched_dep_time dep_delay arr_time
                                  
1       UA  N14228  2013     1     1      517            515         2      830
2       UA  N24211  2013     1     1      533            529         4      850
3       AA  N619AA  2013     1     1      542            540         2      923
4       B6  N804JB  2013     1     1      544            545        -1     1004
5       DL  N668DN  2013     1     1      554            600        -6      812
6       UA  N39463  2013     1     1      554            558        -4      740
7       B6  N516JB  2013     1     1      555            600        -5      913
8       EV  N829AS  2013     1     1      557            600        -3      709
9       B6  N593JB  2013     1     1      557            600        -3      838
10      AA  N3ALAA  2013     1     1      558            600        -2      753
# ... with 336,766 more rows, and 10 more variables: sched_arr_time ,
#   arr_delay , flight , origin , dest , air_time ,
#   distance , hour , minute , time_hour

5 columns at the end of the data frame.

select(flights, -one_of(col), one_of(col))

# A tibble: 336,776 × 19
   dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay flight
                                          
1       517            515         2      830            819        11   1545
2       533            529         4      850            830        20   1714
3       542            540         2      923            850        33   1141
4       544            545        -1     1004           1022       -18    725
5       554            600        -6      812            837       -25    461
6       554            558        -4      740            728        12   1696
7       555            600        -5      913            854        19    507
8       557            600        -3      709            723       -14   5708
9       557            600        -3      838            846        -8     79
10      558            600        -2      753            745         8    301
# ... with 336,766 more rows, and 12 more variables: origin , dest ,
#   air_time , distance , hour , minute , time_hour ,
#   carrier , tailnum , year , month , day

To add all columns

In this case we want to add all the data frame at the beginning or the end again, resulting in duplicates of our 5 columns.

# 5 columns at the beginning
bind_cols(select(flights, one_of(col)), flights)
# 5 columns at the end
bind_cols(flights, select(flights, one_of(col)))

References

stackoveflow

2020-11-20

How to change the width of error bars in ggplot2

Problem

We want to change the width of the error bars in a boxplot created with ggplot2. We use stat_boxplot(geom ='errorbar') to add error bars.

library(ggplot2)
ggplot(iris, aes(factor(Species), Sepal.Width, fill = Species)) +
    geom_boxplot() + 
    stat_boxplot(geom ='errorbar')

Solution

We use the argument width inside stat_boxplot.

library(ggplot2)
ggplot(iris, aes(factor(Species), Sepal.Width, fill = Species)) +
  geom_boxplot() + 
  stat_boxplot(geom ='errorbar', width = 0.5)

2020-11-11

How to create dummy variables in R

Title

Problem

We want to create dummy variables based on other variables in R. In our example based on variables Sex and Embarked.

  PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked Age.NA
1           1        0      3   male  22     1     0  7.2500        S      0
2           2        1      1 female  38     1     0 71.2833        C      0
3           3        1      3 female  26     0     0  7.9250        S      0
4           4        1      1 female  35     1     0 53.1000        S      0
5           5        0      3   male  35     0     0  8.0500        S      0
6           6        0      3   male  NA     0     0  8.4583        Q      1

df <- structure(list(PassengerId = 1:6, Survived = c(0L, 1L, 1L, 1L, 
0L, 0L), Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), Sex = structure(c(2L, 
1L, 1L, 1L, 2L, 2L), .Label = c("female", "male"), class = "factor"), 
    Age = c(22L, 38L, 26L, 35L, 35L, NA), SibSp = c(1L, 1L, 0L, 
    1L, 0L, 0L), Parch = c(0L, 0L, 0L, 0L, 0L, 0L), Fare = c(7.25, 
    71.2833, 7.925, 53.1, 8.05, 8.4583), Embarked = structure(c(3L, 
    1L, 3L, 3L, 3L, 2L), .Label = c("C", "Q", "S"), class = "factor"), 
    Age.NA = c(0, 0, 0, 0, 0, 1)), .Names = c("PassengerId", 
"Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", 
"Embarked", "Age.NA"), row.names = c("1", "2", "3", "4", "5", 
"6"), class = "data.frame")

Solution

We use the function dummy.data.frame from the package dummies. By default it will expand dummy variables for character and factor classes.

library(dummies)
dummy.data.frame(df)

Results

The original columns Sex and Embarked are replaced by the dummy variable columns Sexfemale, Sexmale, EmbarkedC, EmbarkedQ and EmbarkedS.

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare
1           1        0      3         0       1  22     1     0  7.2500
2           2        1      1         1       0  38     1     0 71.2833
3           3        1      3         1       0  26     0     0  7.9250
4           4        1      1         1       0  35     1     0 53.1000
5           5        0      3         0       1  35     0     0  8.0500
6           6        0      3         0       1  NA     0     0  8.4583
  EmbarkedC EmbarkedQ EmbarkedS Age.NA
1         0         0         1      0
2         1         0         0      0
3         0         0         1      0
4         0         0         1      0
5         0         0         1      0
6         0         1         0      1

References

Package ‘dummies’