2020-04-22

Extract movie info using the OMDb API in R

Problem

We want to extract movie information using the OMDb API in R.

Solution

We use the imdbapi package. The free tier of the OMDb API allows at most 1,000 requests per day, and every request needs an API key, which must be requested from the OMDb API website (omdbapi.com). The package allows us to:

  • Retrieve info by IMDb ID
    library(imdbapi)
    find_by_id("tt0107692", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678") # Ninja Scroll imdb id: tt0107692
    
    # A tibble: 2 x 25
      Title Year  Rated Released   Runtime Genre Director Writer Actors Plot 
      <chr> <chr> <chr> <date>     <chr>   <chr> <chr>    <chr>  <chr>  <chr>
    1 Ninj~ 1993  NOT ~ 1993-06-05 94 min  Anim~ Yoshiak~ Yoshi~ Kôich~ A Jo~
    2 Ninj~ 1993  NOT ~ 1993-06-05 94 min  Anim~ Yoshiak~ Yoshi~ Kôich~ A Jo~
    # ... with 15 more variables: Language <chr>, Country <chr>, Awards <chr>,
    #   Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <dbl>,
    #   imdbVotes <dbl>, imdbID <chr>, Type <chr>, DVD <date>, BoxOffice <chr>,
    #   Production <chr>, Website <chr>, Response <chr>
    
  • Search by title
    find_by_title("vertigo", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
    
    # A tibble: 3 x 25
      Title Year  Rated Released   Runtime Genre Director Writer Actors Plot 
      <chr> <chr> <chr> <date>     <chr>   <chr> <chr>    <chr>  <chr>  <chr>
    1 Vert~ 1958  PG    1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
    2 Vert~ 1958  PG    1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
    3 Vert~ 1958  PG    1958-07-21 128 min Myst~ Alfred ~ "Alec~ James~ "Joh~
    # ... with 15 more variables: Language <chr>, Country <chr>, Awards <chr>,
    #   Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <dbl>,
    #   imdbVotes <dbl>, imdbID <chr>, Type <chr>, DVD <date>, BoxOffice <chr>,
    #   Production <chr>, Website <chr>, Response <chr>
    
  • Additionally, the package includes helper functions that extract the actors, countries, directors, genres and writers from a result.
    get_actors(find_by_title("vertigo", api_key = "12345678"))
    [1] "James Stewart"      "Kim Novak"          "Barbara Bel Geddes"
    [4] "Tom Helmore"  
    get_countries(find_by_title("vertigo", api_key = "12345678"))
    [1] "USA"
    get_directors(find_by_title("vertigo", api_key = "12345678"))
    [1] "Alfred Hitchcock"
    get_genres(find_by_title("vertigo", api_key = "12345678"))
    [1] "Mystery"  "Romance"  "Thriller"
    get_writers(find_by_title("vertigo", api_key = "12345678"))
    [1] "Alec Coppel (screenplay by)"                                  
    [2] "Samuel A. Taylor (screenplay by)"                             
    [3] "Pierre Boileau (based on the novel \"D'Entre Les Morts\" by)" 
    [4] "Thomas Narcejac (based on the novel \"D'Entre Les Morts\" by)"
    
  • Load poster image
    library(RCurl)  # provides getURLContent()
    library(jpeg)   # provides readJPEG()
    df <- find_by_title("Batman Ninja", type = NULL, year_of_release = NULL, plot = "full", include_tomatoes = FALSE, api_key = "12345678")
    plot(0:1,
         0:1,
         type = "n",
         ann = FALSE,
         axes = FALSE)
    my_image <-  readJPEG(getURLContent(df$Poster[1]))
    rasterImage(my_image, 0, 0, 1, 1)
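
The helper calls above each repeat the same find_by_title() request, and every request counts against the daily limit. A minimal sketch of an alternative, assuming the key is stored in an environment variable (the name OMDB_API_KEY is chosen for this example only): fetch the result once and pass it to each helper. The block does nothing unless both the key and the package are available.

```r
# Read the API key from an environment variable instead of hard-coding it.
# The variable name OMDB_API_KEY is an assumption made for this sketch.
api_key <- Sys.getenv("OMDB_API_KEY")

if (nzchar(api_key) && requireNamespace("imdbapi", quietly = TRUE)) {
  library(imdbapi)

  # One request instead of five
  vertigo <- find_by_title("vertigo", api_key = api_key)

  # Reuse the same result for every helper
  actors    <- get_actors(vertigo)
  countries <- get_countries(vertigo)
  directors <- get_directors(vertigo)
  genres    <- get_genres(vertigo)
  writers   <- get_writers(vertigo)
} else {
  message("Set OMDB_API_KEY and install imdbapi to run this example.")
}
```

Keeping the key out of the script also avoids publishing it by accident when the code is shared.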
    

Notes

By inspecting the functions whose names start with get_, we can see that they are wrappers that subset the info returned by find_by_title or find_by_id. For example, printing get_actors shows:

    function (omdb) 
    {
      if (!inherits(omdb, "omdb")) {
        message("get_actors() expects an omdb object")
        return(NULL)
      }
      if ("Actors" %in% names(omdb)) {
        str_split(omdb$Actors, ",[ ]*")[[1]]
      }
    }
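
The splitting step can be tried without an API call. The sketch below is not the package's code: it swaps in base R's strsplit() for stringr's str_split() and applies the same pattern to an Actors value copied from the Batman Ninja result shown in this post.

```r
# Actors field as returned for Batman Ninja (copied from this post)
actors_field <- "Kôichi Yamadera, Wataru Takagi, Ai Kakuma, Rie Kugimiya"

# Split on a comma followed by any number of spaces,
# the same pattern that get_actors() uses
actors <- strsplit(actors_field, ",[ ]*")[[1]]

print(actors)
```

The result is a character vector with one element per actor, which is the shape returned by the get_ helpers.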
    
Every call to find_by_id or find_by_title returns one row per available rating. For instance, Vertigo returns 3 rows (IMDb, Rotten Tomatoes and Metacritic), whereas Ninja Scroll and Batman Ninja return only 2 (IMDb and Rotten Tomatoes). All other columns are repeated on every row. The structure of the result for Batman Ninja is:

    Classes ‘omdb’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  25 variables:
     $ Title     : chr  "Batman Ninja" "Batman Ninja"
     $ Year      : chr  "2018" "2018"
     $ Rated     : chr  "PG-13" "PG-13"
     $ Released  : Date, format: "2018-04-24" "2018-04-24"
     $ Runtime   : chr  "85 min" "85 min"
     $ Genre     : chr  "Animation, Action" "Animation, Action"
     $ Director  : chr  "Junpei Mizusaki" "Junpei Mizusaki"
     $ Writer    : chr  "Kazuki Nakashima (screenplay), Leo Chu (English screenplay), Eric Garcia (English screenplay), Bob Kane (charac"| __truncated__ "Kazuki Nakashima (screenplay), Leo Chu (English screenplay), Eric Garcia (English screenplay), Bob Kane (charac"| __truncated__
     $ Actors    : chr  "Kôichi Yamadera, Wataru Takagi, Ai Kakuma, Rie Kugimiya" "Kôichi Yamadera, Wataru Takagi, Ai Kakuma, Rie Kugimiya"
     $ Plot      : chr  "Batman, along with a number of his allies and adversaries, finds himself transplanted from modern Gotham City to feudal Japan." "Batman, along with a number of his allies and adversaries, finds himself transplanted from modern Gotham City to feudal Japan."
     $ Language  : chr  "Japanese, English" "Japanese, English"
     $ Country   : chr  "Japan, USA" "Japan, USA"
     $ Awards    : chr  "N/A" "N/A"
     $ Poster    : chr  "https://m.media-amazon.com/images/M/MV5BYmFhYzZhYzgtZjZiYS00NWEwLWFhYTUtN2UxM2FmYzdhNDUyXkEyXkFqcGdeQXVyNDk2Nzc"| __truncated__ "https://m.media-amazon.com/images/M/MV5BYmFhYzZhYzgtZjZiYS00NWEwLWFhYTUtN2UxM2FmYzdhNDUyXkEyXkFqcGdeQXVyNDk2Nzc"| __truncated__
     $ Ratings   :List of 2
      ..$ :List of 2
      .. ..$ Source: chr "Internet Movie Database"
      .. ..$ Value : chr "5.7/10"
      ..$ :List of 2
      .. ..$ Source: chr "Rotten Tomatoes"
      .. ..$ Value : chr "79%"
     $ Metascore : chr  "N/A" "N/A"
     $ imdbRating: num  5.7 5.7
     $ imdbVotes : num  9759 9759
     $ imdbID    : chr  "tt7451284" "tt7451284"
     $ Type      : chr  "movie" "movie"
     $ DVD       : Date, format: "2018-05-08" "2018-05-08"
     $ BoxOffice : chr  "N/A" "N/A"
     $ Production: chr  "DC Comics" "DC Comics"
     $ Website   : chr  "N/A" "N/A"
     $ Response  : chr  "True" "True"
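
Because the movie info is repeated on every rating row, it can help to separate the two. The following is a minimal sketch in base R that uses a hand-built stand-in for the result (three columns only, values copied from the structure above) so it runs without an API key. The object names movie, ratings and info are illustrative.

```r
# Hand-built stand-in for a find_by_title("Batman Ninja", ...) result,
# reduced to three columns; values are copied from the str() output above.
movie <- data.frame(
  Title  = c("Batman Ninja", "Batman Ninja"),
  imdbID = c("tt7451284", "tt7451284"),
  stringsAsFactors = FALSE
)
movie$Ratings <- list(
  list(Source = "Internet Movie Database", Value = "5.7/10"),
  list(Source = "Rotten Tomatoes",         Value = "79%")
)

# One row per rating: pull Source and Value out of the list column
ratings <- data.frame(
  imdbID = movie$imdbID,
  Source = vapply(movie$Ratings, function(r) r$Source, character(1)),
  Value  = vapply(movie$Ratings, function(r) r$Value,  character(1)),
  stringsAsFactors = FALSE
)

# One row per movie: drop the list column, then remove the duplicates
info <- unique(movie[, setdiff(names(movie), "Ratings")])

print(ratings)
print(info)
```

Here ratings keeps one row per rating source, while info holds a single row for the movie.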
    
