Problem
We want to use R to retrieve information about films or TV series through the OMDb API.
Solution
We use the imdbapi package, which retrieves this information for us. The free tier is limited to 1,000 requests per day. The imdbapi package lets us:
- Search by IMDb ID
library(imdbapi)
find_by_id("tt0107692", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
# A tibble: 2 x 25
Title Year Rated Released Runtime Genre Director Writer Actors Plot
1 Ninj~ 1993 NOT ~ 1993-06-05 94 min Anim~ Yoshiak~ Yoshi~ Kôich~ A Jo~
2 Ninj~ 1993 NOT ~ 1993-06-05 94 min Anim~ Yoshiak~ Yoshi~ Kôich~ A Jo~
# ... with 15 more variables: Language <chr>, Country <chr>, Awards <chr>,
#   Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <dbl>,
#   imdbVotes <dbl>, imdbID <chr>, Type <chr>, DVD <date>, BoxOffice <chr>,
#   Production <chr>, Website <chr>, Response <chr>
- Search by title
find_by_title("vertigo", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
# A tibble: 3 x 25
Title Year Rated Released Runtime Genre Director Writer Actors Plot
1 Vert~ 1958 PG 1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
2 Vert~ 1958 PG 1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
3 Vert~ 1958 PG 1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
# ... with 15 more variables: Language <chr>, Country <chr>, Awards <chr>,
#   Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <dbl>,
#   imdbVotes <dbl>, imdbID <chr>, Type <chr>, DVD <date>, BoxOffice <chr>,
#   Production <chr>, Website <chr>, Response <chr>
- Extract the actors, countries, directors, genres, and writers from a result
get_actors(find_by_title("vertigo", api_key = "12345678"))
[1] "James Stewart" "Kim Novak" "Barbara Bel Geddes"
[4] "Tom Helmore"
get_countries(find_by_title("vertigo", api_key = "12345678"))
[1] "USA"
get_directors(find_by_title("vertigo", api_key = "12345678"))
[1] "Alfred Hitchcock"
get_genres(find_by_title("vertigo", api_key = "12345678"))
[1] "Mystery" "Romance" "Thriller"
get_writers(find_by_title("vertigo", api_key = "12345678"))
[1] "Alec Coppel (screenplay by)"
[2] "Samuel A. Taylor (screenplay by)"
[3] "Pierre Boileau (based on the novel \"D'Entre Les Morts\" by)"
[4] "Thomas Narcejac (based on the novel \"D'Entre Les Morts\" by)"
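- Display the poster
library(RCurl)
library(jpeg)  # provides readJPEG()
df <- find_by_title("Batman Ninja", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
# Open an empty plotting area with no axes or labels
plot(0:1, 0:1, type = "n", ann = FALSE, axes = FALSE)
# Download the poster from its URL and decode the JPEG
my_image <- readJPEG(getURLContent(df$Poster[1]))
# Draw the image so that it fills the plotting area
rasterImage(my_image, 0, 0, 1, 1)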
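On the request limit mentioned earlier: the get_* examples above each repeat the same find_by_title("vertigo", ...) call, so every one of them spends a request. A minimal sketch of one way to avoid that is to cache results in memory. The helper make_cached_lookup and the stand-in fake_fetch are hypothetical names introduced for illustration and are not part of imdbapi; in practice the fetcher would presumably wrap find_by_title with your own key.

```r
# Build a lookup function that remembers results by (lower-cased) title,
# so asking for the same title twice triggers only one fetch.
# `fetch` is any function taking a title and returning a result, e.g.
# function(t) imdbapi::find_by_title(t, api_key = my_key)  # assumption
make_cached_lookup <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(title) {
    key <- tolower(title)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fetch(title), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in fetcher that only counts its calls (no network access needed)
calls <- 0
fake_fetch <- function(title) {
  calls <<- calls + 1
  list(Title = title)
}

lookup <- make_cached_lookup(fake_fetch)
first  <- lookup("vertigo")
second <- lookup("Vertigo")  # same key after lower-casing: served from cache
```

With a real fetcher, the cached object could then be passed to get_actors, get_genres, and the other get_* functions without further requests. The cache lives only for the current R session.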
Notes
Inspecting the functions above whose names start with get_ shows that they simply extract a subset of the data from the omdb object returned by a search by IMDb ID or by title. For example, get_actors is defined as:
function (omdb)
{
    if (!inherits(omdb, "omdb")) {
        message("get_actors() expects an omdb object")
        return(NULL)
    }
    if ("Actors" %in% names(omdb)) {
        str_split(omdb$Actors, ",[ ]*")[[1]]
    }
}
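The splitting step in that function can be tried in isolation. The sketch below uses base R's strsplit as a stand-in for str_split (which comes from the stringr package) with the same pattern, a comma followed by any number of spaces. The helper name split_field is hypothetical, and the sample string is simply the Vertigo cast from the output above joined with commas.

```r
# Split a comma-separated OMDb field into a character vector.
# The pattern matches a comma plus any spaces that follow it.
split_field <- function(x) {
  strsplit(x, ",[ ]*")[[1]]
}

# Sample value shaped like the Actors field for Vertigo
actors <- split_field("James Stewart, Kim Novak, Barbara Bel Geddes, Tom Helmore")
print(actors)
```

The same helper applies to any other comma-separated column, such as Genre or Country. Like the original, it only looks at the first element of its input, which is why the duplicated rows do not produce duplicated names.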
Each request with find_by_id or find_by_title returns one row per available rating source (Ratings). For example, Vertigo returns 3 rows, with the ratings from IMDb, Rotten Tomatoes, and Metacritic, whereas Ninja Scroll and Batman Ninja return only two rows each: IMDb and Rotten Tomatoes. The variables returned are:
Classes ‘omdb’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 25 variables:
$ Title : chr "Batman Ninja" "Batman Ninja"
$ Year : chr "2018" "2018"
$ Rated : chr "PG-13" "PG-13"
$ Released : Date, format: "2018-04-24" "2018-04-24"
$ Runtime : chr "85 min" "85 min"
$ Genre : chr "Animation, Action" "Animation, Action"
$ Director : chr "Junpei Mizusaki" "Junpei Mizusaki"
$ Writer : chr "Kazuki Nakashima (screenplay), Leo Chu (English screenplay), Eric Garcia (English screenplay), Bob Kane (charac"| __truncated__ "Kazuki Nakashima (screenplay), Leo Chu (English screenplay), Eric Garcia (English screenplay), Bob Kane (charac"| __truncated__
$ Actors : chr "Kôichi Yamadera, Wataru Takagi, Ai Kakuma, Rie Kugimiya" "Kôichi Yamadera, Wataru Takagi, Ai Kakuma, Rie Kugimiya"
$ Plot : chr "Batman, along with a number of his allies and adversaries, finds himself transplanted from modern Gotham City to feudal Japan." "Batman, along with a number of his allies and adversaries, finds himself transplanted from modern Gotham City to feudal Japan."
$ Language : chr "Japanese, English" "Japanese, English"
$ Country : chr "Japan, USA" "Japan, USA"
$ Awards : chr "N/A" "N/A"
$ Poster : chr "https://m.media-amazon.com/images/M/MV5BYmFhYzZhYzgtZjZiYS00NWEwLWFhYTUtN2UxM2FmYzdhNDUyXkEyXkFqcGdeQXVyNDk2Nzc"| __truncated__ "https://m.media-amazon.com/images/M/MV5BYmFhYzZhYzgtZjZiYS00NWEwLWFhYTUtN2UxM2FmYzdhNDUyXkEyXkFqcGdeQXVyNDk2Nzc"| __truncated__
$ Ratings :List of 2
..$ :List of 2
.. ..$ Source: chr "Internet Movie Database"
.. ..$ Value : chr "5.7/10"
..$ :List of 2
.. ..$ Source: chr "Rotten Tomatoes"
.. ..$ Value : chr "79%"
$ Metascore : chr "N/A" "N/A"
$ imdbRating: num 5.7 5.7
$ imdbVotes : num 9759 9759
$ imdbID : chr "tt7451284" "tt7451284"
$ Type : chr "movie" "movie"
$ DVD : Date, format: "2018-05-08" "2018-05-08"
$ BoxOffice : chr "N/A" "N/A"
$ Production: chr "DC Comics" "DC Comics"
$ Website : chr "N/A" "N/A"
$ Response : chr "True" "True"
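Because every column except Ratings is repeated on each row, it can be convenient to split a result into one row per film plus a small table of ratings. The following is a minimal sketch in base R, run on a hand-built data frame that imitates a few of the columns shown above (it is not a real API response). The object names movie, ratings, and film are illustrative.

```r
# Hand-built stand-in for a two-row result (values copied from the output above)
movie <- data.frame(
  Title   = c("Batman Ninja", "Batman Ninja"),
  imdbID  = c("tt7451284", "tt7451284"),
  Runtime = c("85 min", "85 min"),
  stringsAsFactors = FALSE
)
# Ratings is a list column: one list(Source, Value) per row
movie$Ratings <- list(
  list(Source = "Internet Movie Database", Value = "5.7/10"),
  list(Source = "Rotten Tomatoes", Value = "79%")
)

# One row per rating source
ratings <- data.frame(
  imdbID = movie$imdbID,
  Source = vapply(movie$Ratings, function(r) r$Source, character(1)),
  Value  = vapply(movie$Ratings, function(r) r$Value, character(1)),
  stringsAsFactors = FALSE
)

# One row per film, with the runtime converted to a number of minutes
film <- movie[!duplicated(movie$imdbID), c("Title", "imdbID", "Runtime")]
film$runtime_min <- as.numeric(sub(" min$", "", film$Runtime))
```

The two tables can be joined again on imdbID when needed. A Runtime of "N/A" would become NA (with a warning) in runtime_min.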
Related posts
- Importing data into R: HTML tables
- Importing data into R: HTML tables, ggplot2 charts
- Importing HTML into R: highest-grossing films on IMDb
- How to scrape a film's budget from IMDb with rvest