2018-11-10

Extracting information from the OMDb API with R

Problem

We want to use R to retrieve information about films or TV series from the OMDb API.

Solution

We use the imdbapi package, which queries the OMDb API for us. The free tier is limited to 1,000 requests per day. The imdbapi package supports:

  • Search by IMDb ID
    library(imdbapi)
    find_by_id("tt0107692", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
    
    # A tibble: 2 x 25
      Title Year  Rated Released   Runtime Genre Director Writer Actors Plot 
      <chr> <chr> <chr> <date>     <chr>   <chr> <chr>    <chr>  <chr>  <chr>
    1 Ninj~ 1993  NOT ~ 1993-06-05 94 min  Anim~ Yoshiak~ Yoshi~ Kôich~ A Jo~
    2 Ninj~ 1993  NOT ~ 1993-06-05 94 min  Anim~ Yoshiak~ Yoshi~ Kôich~ A Jo~
    # ... with 15 more variables: Language <chr>, Country <chr>, Awards <chr>,
    #   Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <dbl>,
    #   imdbVotes <dbl>, imdbID <chr>, Type <chr>, DVD <date>, BoxOffice <chr>,
    #   Production <chr>, Website <chr>, Response <chr>
    
  • Search by title
    find_by_title("vertigo", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
    
    # A tibble: 3 x 25
      Title Year  Rated Released   Runtime Genre Director Writer Actors Plot 
      <chr> <chr> <chr> <date>     <chr>   <chr> <chr>    <chr>  <chr>  <chr>
    1 Vert~ 1958  PG    1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
    2 Vert~ 1958  PG    1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
    3 Vert~ 1958  PG    1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
    # ... with 15 more variables: Language <chr>, Country <chr>, Awards <chr>,
    #   Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <dbl>,
    #   imdbVotes <dbl>, imdbID <chr>, Type <chr>, DVD <date>, BoxOffice <chr>,
    #   Production <chr>, Website <chr>, Response <chr>
    
  • The package also provides functions that extract specific fields: actors, countries, directors, genres, or writers.
    get_actors(find_by_title("vertigo", api_key = "12345678"))
    [1] "James Stewart"      "Kim Novak"          "Barbara Bel Geddes"
    [4] "Tom Helmore"  
    get_countries(find_by_title("vertigo", api_key = "12345678"))
    [1] "USA"
    get_directors(find_by_title("vertigo", api_key = "12345678"))
    [1] "Alfred Hitchcock"
    get_genres(find_by_title("vertigo", api_key = "12345678"))
    [1] "Mystery"  "Romance"  "Thriller"
    get_writers(find_by_title("vertigo", api_key = "12345678"))
    [1] "Alec Coppel (screenplay by)"                                  
    [2] "Samuel A. Taylor (screenplay by)"                             
    [3] "Pierre Boileau (based on the novel \"D'Entre Les Morts\" by)" 
    [4] "Thomas Narcejac (based on the novel \"D'Entre Les Morts\" by)"
    
  • Load the poster image
    library(RCurl)  # getURLContent()
    library(jpeg)   # readJPEG()
    df <- find_by_title("Batman Ninja", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
    # Open an empty plot region to draw the poster into
    plot(0:1,
         0:1,
         type = "n",
         ann = FALSE,
         axes = FALSE)
    # Download the poster bytes and decode them as a JPEG
    my_image <- readJPEG(getURLContent(df$Poster[1]))
    rasterImage(my_image, 0, 0, 1, 1)
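
Each get_* call in the list above wraps its own find_by_title() request, so the five calls cost five requests against the daily limit. The simplest fix is to store the search result in a variable and pass it to each getter. For repeated lookups across a session, a small in-memory cache is one option. The sketch below is a minimal illustration and not part of imdbapi: make_cached_lookup() and fake_fetch() are hypothetical names, and the stand-in fetch function replaces the real API call so the example runs offline.

```r
# Sketch of a tiny in-memory cache so repeated lookups of the same title
# cost one request. `fetch` stands in for a real call such as
# function(title) imdbapi::find_by_title(title, api_key = "12345678").
make_cached_lookup <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(title) {
    key <- tolower(title)  # treat "Vertigo" and "vertigo" as the same query
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fetch(title), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Offline demonstration: a hypothetical stand-in that counts how often
# it is called instead of contacting the API.
n_requests <- 0
fake_fetch <- function(title) {
  n_requests <<- n_requests + 1
  data.frame(Title = title, stringsAsFactors = FALSE)
}

lookup <- make_cached_lookup(fake_fetch)
first  <- lookup("Vertigo")  # triggers one call to fake_fetch()
second <- lookup("vertigo")  # served from the cache, no new call
```

With the real package, the cached object could then be passed to get_actors(), get_directors(), and the other getters without further requests.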
    

Notes

Inspecting the functions above whose names start with get shows that they simply extract a subset of the data from the omdb object produced by a search by IMDb ID or by title. For example, get_actors():

    function (omdb) 
    {
      if (!inherits(omdb, "omdb")) {
        message("get_actors() expects an omdb object")
        return(NULL)
      }
      if ("Actors" %in% names(omdb)) {
        str_split(omdb$Actors, ",[ ]*")[[1]]
      }
    }
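
The same pattern extends to any comma-separated column. The following is a sketch under stated assumptions: split_omdb_field() is a hypothetical helper, not a function exported by imdbapi, and it uses base strsplit() in place of str_split(). The mock object is built by hand with values from the Batman Ninja output in this post so that the example runs without an API key.

```r
# Hypothetical helper (not part of imdbapi): same checks as the getter
# printed above, generalized to any comma-separated column.
split_omdb_field <- function(omdb, field) {
  if (!inherits(omdb, "omdb")) {
    message("split_omdb_field() expects an omdb object")
    return(NULL)
  }
  if (field %in% names(omdb)) {
    # Use the first row only and split on commas plus optional spaces.
    strsplit(omdb[[field]][1], ",[ ]*")[[1]]
  }
}

# Offline stand-in for a search result; a real one comes from
# find_by_id() or find_by_title().
mock <- data.frame(
  Language = c("Japanese, English", "Japanese, English"),
  Country  = c("Japan, USA", "Japan, USA"),
  stringsAsFactors = FALSE
)
class(mock) <- c("omdb", class(mock))

languages <- split_omdb_field(mock, "Language")  # "Japanese" "English"
countries <- split_omdb_field(mock, "Country")   # "Japan" "USA"
```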
    
Each request made with find_by_id or find_by_title returns one row per rating source (Ratings) available for the title. For example, Vertigo returns 3 rows, with the ratings from IMDb, Rotten Tomatoes, and Metacritic. In contrast, Ninja Scroll and Batman Ninja each return only two rows: IMDb and Rotten Tomatoes. The variables returned are:

    Classes ‘omdb’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  25 variables:
     $ Title     : chr  "Batman Ninja" "Batman Ninja"
     $ Year      : chr  "2018" "2018"
     $ Rated     : chr  "PG-13" "PG-13"
     $ Released  : Date, format: "2018-04-24" "2018-04-24"
     $ Runtime   : chr  "85 min" "85 min"
     $ Genre     : chr  "Animation, Action" "Animation, Action"
     $ Director  : chr  "Junpei Mizusaki" "Junpei Mizusaki"
     $ Writer    : chr  "Kazuki Nakashima (screenplay), Leo Chu (English screenplay), Eric Garcia (English screenplay), Bob Kane (charac"| __truncated__ "Kazuki Nakashima (screenplay), Leo Chu (English screenplay), Eric Garcia (English screenplay), Bob Kane (charac"| __truncated__
     $ Actors    : chr  "Kôichi Yamadera, Wataru Takagi, Ai Kakuma, Rie Kugimiya" "Kôichi Yamadera, Wataru Takagi, Ai Kakuma, Rie Kugimiya"
     $ Plot      : chr  "Batman, along with a number of his allies and adversaries, finds himself transplanted from modern Gotham City to feudal Japan." "Batman, along with a number of his allies and adversaries, finds himself transplanted from modern Gotham City to feudal Japan."
     $ Language  : chr  "Japanese, English" "Japanese, English"
     $ Country   : chr  "Japan, USA" "Japan, USA"
     $ Awards    : chr  "N/A" "N/A"
     $ Poster    : chr  "https://m.media-amazon.com/images/M/MV5BYmFhYzZhYzgtZjZiYS00NWEwLWFhYTUtN2UxM2FmYzdhNDUyXkEyXkFqcGdeQXVyNDk2Nzc"| __truncated__ "https://m.media-amazon.com/images/M/MV5BYmFhYzZhYzgtZjZiYS00NWEwLWFhYTUtN2UxM2FmYzdhNDUyXkEyXkFqcGdeQXVyNDk2Nzc"| __truncated__
     $ Ratings   :List of 2
      ..$ :List of 2
      .. ..$ Source: chr "Internet Movie Database"
      .. ..$ Value : chr "5.7/10"
      ..$ :List of 2
      .. ..$ Source: chr "Rotten Tomatoes"
      .. ..$ Value : chr "79%"
     $ Metascore : chr  "N/A" "N/A"
     $ imdbRating: num  5.7 5.7
     $ imdbVotes : num  9759 9759
     $ imdbID    : chr  "tt7451284" "tt7451284"
     $ Type      : chr  "movie" "movie"
     $ DVD       : Date, format: "2018-05-08" "2018-05-08"
     $ BoxOffice : chr  "N/A" "N/A"
     $ Production: chr  "DC Comics" "DC Comics"
     $ Website   : chr  "N/A" "N/A"
     $ Response  : chr  "True" "True"
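
Because every row repeats the film's metadata and differs only in its Ratings entry, it can help to separate the two. The sketch below assumes the Ratings column is a list with one Source/Value pair per row, as the listing above suggests. ratings_long() is a hypothetical helper and the mock data frame is a hand-built stand-in for a real search result.

```r
# Offline stand-in with the columns needed for the example; values are
# copied from the Batman Ninja listing above.
mock <- data.frame(
  Title  = c("Batman Ninja", "Batman Ninja"),
  imdbID = c("tt7451284", "tt7451284"),
  stringsAsFactors = FALSE
)
mock$Ratings <- list(
  list(Source = "Internet Movie Database", Value = "5.7/10"),
  list(Source = "Rotten Tomatoes", Value = "79%")
)

# Hypothetical helper: one row per rating source, with plain columns.
ratings_long <- function(omdb) {
  data.frame(
    Title  = omdb$Title,
    imdbID = omdb$imdbID,
    Source = vapply(omdb$Ratings, function(r) r$Source, character(1)),
    Value  = vapply(omdb$Ratings, function(r) r$Value, character(1)),
    stringsAsFactors = FALSE
  )
}

ratings <- ratings_long(mock)

# One metadata row per title: drop the list column, then deduplicate.
movie <- unique(mock[, setdiff(names(mock), "Ratings"), drop = FALSE])
```

Here ratings has two rows (one per source) and movie has a single row for the film.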
    
