Problem
We need to read compressed files in R.
Solution
We will use the package readr:
Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.
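To see the automatic decompression without downloading anything, here is a minimal sketch. It writes a tiny made-up TSV with base R's gzfile(), bzfile() and xzfile() connections and reads each copy back with read_tsv(). The file names and data are invented for illustration. As described above, the file extension is what tells readr which decompression to apply.

```r
library(readr)

# Made-up sample data in the same layout as title.ratings.tsv
sample_lines <- c("tconst\taverageRating\tnumVotes",
                  "tt0000001\t5.8\t1423",
                  "tt0000002\t6.4\t168")

# One file per compression format; the extension tells readr how to decompress
paths <- c(gz  = file.path(tempdir(), "demo.tsv.gz"),
           bz2 = file.path(tempdir(), "demo.tsv.bz2"),
           xz  = file.path(tempdir(), "demo.tsv.xz"))

# Base R connections write the compressed files (writeLines opens and closes them)
writeLines(sample_lines, gzfile(paths[["gz"]]))
writeLines(sample_lines, bzfile(paths[["bz2"]]))
writeLines(sample_lines, xzfile(paths[["xz"]]))

# read_tsv() decompresses each file transparently;
# col_types = "cdd" means character, double, double
tables <- lapply(paths, read_tsv, col_types = "cdd")

# Every format yields the same two data rows
sapply(tables, nrow)
```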
In our example, we use the file title.ratings.tsv.gz from the IMDb datasets.
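library(readr)
# The .gz file is decompressed automatically. IMDb writes missing values as \N, and
# quote = '' turns off quote handling so quote characters in fields are read literally.
df_ratings <- read_tsv('title.ratings.tsv.gz', na = "\\N", quote = '')
head(df_ratings)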
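The na and quote arguments matter for these files: IMDb's dataset files mark missing fields with \N, and quote = '' tells readr to treat quote characters as ordinary text. The following minimal sketch uses a made-up two-row file (the IDs, titles and runtimes are invented; the column names only mimic IMDb's layout) to show both effects.

```r
library(readr)

# Hypothetical sample: "\N" marks a missing runtime, and one title starts with a quote
path <- file.path(tempdir(), "na_quote_demo.tsv")
writeLines(c("tconst\tprimaryTitle\truntimeMinutes",
             "tt0000001\tSample Title\t1",
             "tt0000009\t\"Lost\" Reel\t\\N"),
           path)

# col_types = "cci": character, character, integer
df <- read_tsv(path, na = "\\N", quote = "", col_types = "cci")

is.na(df$runtimeMinutes)  # the \N entry became NA
df$primaryTitle           # the quotes are kept as part of the title
```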
We can also pass the URL directly; readr downloads the file and decompresses it automatically:
# Passing a URL: readr downloads the file first, then decompresses it
df_ratings <- read_tsv('https://datasets.imdbws.com/title.ratings.tsv.gz', na = "\\N", quote = '')
head(df_ratings)
Results
# A tibble: 6 x 3
  tconst    averageRating numVotes
1 tt0000001           5.8     1423
2 tt0000002           6.4      168
3 tt0000003           6.6     1016
4 tt0000004           6.4      100
5 tt0000005           6.2     1713
6 tt0000006           5.5       88
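When read_tsv() is given a URL, each call will, as far as we know, download the file again. If the same remote file is read repeatedly, one option is to keep a local copy. The helper read_cached_tsv() below is hypothetical (it is not part of readr): a minimal sketch that downloads the file with base R's download.file() only if it is not already on disk, then lets read_tsv() decompress the local copy.

```r
library(readr)

# Hypothetical helper, not part of readr: download once, then reuse the local copy.
# Extra arguments (na, quote, col_types, ...) are passed on to read_tsv().
read_cached_tsv <- function(url, dest = file.path(tempdir(), basename(url)), ...) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed bytes intact on Windows
    download.file(url, dest, mode = "wb")
  }
  read_tsv(dest, ...)
}
```

For example, read_cached_tsv('https://datasets.imdbws.com/title.ratings.tsv.gz', dest = 'title.ratings.tsv.gz', na = "\\N", quote = '') downloads on the first call and reads the saved file afterwards. Choosing a dest outside tempdir() keeps the copy between R sessions.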