Nube de datos: English

Mostrando entradas con la etiqueta English. Mostrar todas las entradas

2021-03-22

Ignore zero values in conditional formatting in Excel

Title

Problem

We want conditional formatting in Excel to ignore cells containing zero values. In the example below, after applying colour scales in conditional formatting, we can see that Excel applies red formatting to cells with zero values.

Solution

To make conditional formatting ignore zero values, we follow these steps:

Create a new conditional formatting rule.

Apply no format to cells with value equal to 0.

Make sure that this rule is the first to be evaluated, and select the Stop If True check box.

Results

Formato condicional

References

Use conditional formatting to highlight information

2021-03-09

How to highlight specific data points in a scatter plot with ggplot2

Problem

We would like to highlight specific data points in a scatter plot created with ggplot2. In our example, we will use the mpg dataset (Fuel economy data from 1999 and 2008 for 38 popular models of car). We want to highlight in red and a larger point size those data points whose displ (engine displacement) are greater than 5 and hwy (highway miles per gallon) greater than 20.

Solution

Option 1

We add the conditions for both attributes, colour and size, inside geom_point. Then we control manually those using scale_colour_manual and scale_size_manual respectively. Finally, we remove the legend.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, colour = displ > 5 & hwy > 20, size = displ > 5 & hwy > 20)) + 
  scale_colour_manual(values = c("black", "red")) + 
  scale_size_manual(values =c(1.5, 3))+
  theme(legend.position = "none")

Option 2

We create two layers of data points using geom_point. The first layer include all data points in black (by default but e). The second layer adds the points we would like to highlight in red with a larger point size.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), colour= "black") +
  geom_point(data = subset(mpg, displ > 5 & hwy > 20), aes(x = displ, y = hwy), colour= "red", size = 3)

In ggplot2 the layers are added sequentially. If we change the order of the layers, we would get the following result.

ggplot(data = mpg) + 
   geom_point(data = subset(mpg, displ > 5 & hwy > 20), aes(x = displ, y = hwy), colour= "red", size = 3) +
   geom_point(mapping = aes(x = displ, y = hwy), colour= "black")

2021-03-04

How to filter multiple values on a column in R

Problem

We want to filter multiple values on a column in R. In our example, we want to subset the rows containing the string Tom or Lynn for the column name.

Example

  days  name
1   88  Lynn
2   11   Tom
3    2 Chris
4    5  Lisa
5   22  Kyla
6    1   Tom
7  222  Lynn
8    2  Lynn

df <-
  data.frame(
    days = c(88, 11, 2, 5, 22, 1, 222, 2),
    name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
  )

Solution

Base package

df[df$name %in% c("Tom", "Lynn"), ] # or
target <- c("Tom", "Lynn")
df[df$name %in% target, ]

  days name
1   88 Lynn
2   11  Tom
6    1  Tom
7  222 Lynn
8    2 Lynn

dplyr

library(dplyr)
filter(df, name %in% c("Tom", "Lynn")) # or
target <- c("Tom", "Lynn")
filter(df, name %in% target)

data.table

library(data.table)
DT <- data.table(df)
DT[name %in% target]

sqldf

library(sqldf)
# Two alternatives:
sqldf('SELECT *
      FROM df 
      WHERE name = "Tom" OR name = "Lynn"')
sqldf('SELECT *
      FROM df 
      WHERE name IN ("Tom", "Lynn")')

Filter rows containing a certain string in R

References

stackoverflow

2021-03-01

How to draw a stratified sample in R

Problem

We want to draw a stratified sample from a data frame in R.

Solution

Let's look at two examples, with one or several groups.

One group

We extract 3 records from each of the species: setosa, versicolor y virginica.

Base package

set.seed(1)
iris1 <- lapply(split(iris, iris$Species), function(x) x[sample(nrow(x), 3), ])
do.call("rbind", iris1)

              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa.14              4.3         3.0          1.1         0.1     setosa
setosa.19              5.7         3.8          1.7         0.3     setosa
setosa.28              5.2         3.5          1.5         0.2     setosa
versicolor.96          5.7         3.0          4.2         1.2 versicolor
versicolor.60          5.2         2.7          3.9         1.4 versicolor
versicolor.94          5.0         2.3          3.3         1.0 versicolor
virginica.148          6.5         3.0          5.2         2.0  virginica
virginica.133          6.4         2.8          5.6         2.2  virginica
virginica.131          7.4         2.8          6.1         1.9  virginica

dplyr

library(dplyr)
set.seed(1)
iris %>%
  group_by(Species) %>%
  sample_n(., 3)

 Source: local data frame [9 x 5]
Groups: Species

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          4.3         3.0          1.1         0.1     setosa
2          5.7         3.8          1.7         0.3     setosa
3          5.2         3.5          1.5         0.2     setosa
4          5.7         3.0          4.2         1.2 versicolor
5          5.2         2.7          3.9         1.4 versicolor
6          5.0         2.3          3.3         1.0 versicolor
7          6.5         3.0          5.2         2.0  virginica
8          6.4         2.8          5.6         2.2  virginica
9          7.4         2.8          6.1         1.9  virginica

Two groups

For each number of cylinders (4, 6 u 8) we will extract 2 records with automatic transmission (am = 0) and 2 with manual transmission (am = 1).

Base package

set.seed(1)
mtcars1 <- lapply(split(mtcars, list(mtcars$cyl, mtcars$am)), function(x) x[sample(nrow(x), 2), ])
do.call("rbind", mtcars1)

                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
0.4.Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
0.4.Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
1.4.Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
1.4.Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
0.6.Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
0.6.Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
1.6.Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
1.6.Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
0.8.Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
0.8.Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
1.8.Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
1.8.Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

dplyr

set.seed(1)
mtcars %>%
  group_by(cyl, am) %>%
  sample_n(., 2)

Source: local data frame [12 x 11]
Groups: cyl, am

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
2  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
3  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
4  30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
5  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
6  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
7  19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
8  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
9  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
10 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
11 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
12 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

How to select a random sample in R

References

stackoverflow

2021-02-09

How to set tick labels to the edges of a continuous ggplot2 legend

Title

Problem

We want to show the minimum and maximum values in the legend of our plot in ggplot2.

library(ggplot2)
d <- subset(diamonds, price >= 257 & price <= 8888)
ggplot(d, aes(depth, carat, colour = price)) +
  geom_point() +
  scale_colour_gradient(limits = c(257, 8888))

The legend shows values from 2.000 al 8.000 but we do not know the minimum and the maximum values.

Solution

Within scale_colour_gradient, we pass the arguments breaks and labels.

ggplot(d, aes(depth, carat, colour = price)) +
  geom_point() +
  scale_colour_gradient(limits = c(257, 8888), 
                        breaks = c(257, 2000, 4000, 6000, 8000, 8888),
                        labels = c(257, 2000, 4000, 6000, 8000, 8888))

Now we have added to the legend the minimum 257 and the maximum 8.888 values.

Notes

Instead of using the default colour scale colours from light blue to dark blue, we can adjust colour choices with the arguments low and high.

ggplot(d, aes(depth, carat, colour = price)) +
  geom_point() +
  scale_colour_gradient(limits = c(257, 8888), 
                        breaks=c(257, 2000, 4000, 6000, 8000, 8888),
                        labels=c(257, 2000, 4000, 6000, 8000, 8888), low = "red")

References

2021-01-06

How to insert a picture in an Excel comment

Title

Problem

We want to insert a picture in an Excel comment

Solution

Select cell, right click, and Insert Comment.

Right click on the border of the comment and clic on Format Comment.

On the Colors and Lines tab, click on the drop-down arrow for Color and select Fill Effects.

On the picture tab, click on Select Picture.

Select the picture, insert, and click OK.
Resize the picture by draggin the border of the comment.
Hide if you only want to show the picture when you hover over the cell.

Nube de datos

2021-03-22

Ignore zero values in conditional formatting in Excel

Problem

Solution

Results

Related posts

References

2021-03-09

How to highlight specific data points in a scatter plot with ggplot2

Problem

Solution

2021-03-04

How to filter multiple values on a column in R

Problem

Example

Solution

Related posts

References

2021-03-01

How to draw a stratified sample in R

Problem

Solution

Related posts

References

2021-02-09

How to set tick labels to the edges of a continuous ggplot2 legend

Problem

Solution

Notes

References

2021-01-06

How to insert a picture in an Excel comment

Problem

Solution

Related Posts