Web Scraping (IMDB) Using Python


Narendra Kumar Rao, Beebi Naseeba, Nagendra Panini Challa, S. Chakrvarthi

Abstract

Background: Web scraping is the process of obtaining text information from web pages. Most analysis of web scraping focuses on automated web data extraction. In web data extraction, we first create a DOM tree and then extract the relevant data through this tree. The technique used to construct the DOM tree increases the time cost, depending on the design structure of the tree. In this paper we accurately predict the genre that is most popular in a particular year, based on analysis of data in real time. To do so, we use the IMDB website as a source and Python as an intermediary language to pre-process and clean the data, using libraries such as BeautifulSoup, HTTPAdapter and Seaborn, together with Python packages such as NumPy, Pandas and Matplotlib. In addition, we have incorporated an API add-on: an API is the carrier that hands your request over to the provider and then returns the response to the user. With all of this in place, we produce various types of visualizations over different parameters and, where required, add further parameters to support the analysis of the data by providing a score for each genre in each individual year.
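The pipeline described above (extract fields from page markup, then score each genre per year) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: the markup in `SAMPLE_HTML` is hypothetical and does not reflect real IMDB pages; Python's standard-library `html.parser` (an event-based parser, not a full DOM tree) is swapped in for BeautifulSoup so the example runs without extra installs; and the "score" is assumed to be the mean rating per genre per year, which the abstract does not define. The names `MovieParser`, `genre_scores` and `top_genre_by_year` are illustrative only.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical markup: real IMDB pages use different tags and attributes.
SAMPLE_HTML = """
<div class="movie" data-year="2019" data-genre="Drama" data-rating="8.4">Film A</div>
<div class="movie" data-year="2019" data-genre="Drama" data-rating="7.6">Film B</div>
<div class="movie" data-year="2019" data-genre="Comedy" data-rating="6.9">Film C</div>
<div class="movie" data-year="2020" data-genre="Comedy" data-rating="7.8">Film D</div>
"""


class MovieParser(HTMLParser):
    """Collects (year, genre, rating) tuples from <div class="movie"> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "div" and attr.get("class") == "movie":
            self.rows.append(
                (int(attr["data-year"]), attr["data-genre"], float(attr["data-rating"]))
            )


def genre_scores(rows):
    """Assumed scoring rule: mean rating for each (year, genre) pair."""
    ratings = defaultdict(list)
    for year, genre, rating in rows:
        ratings[(year, genre)].append(rating)
    return {key: sum(vals) / len(vals) for key, vals in ratings.items()}


def top_genre_by_year(scores):
    """Picks the highest-scoring genre for each year."""
    best = {}
    for (year, genre), score in scores.items():
        if year not in best or score > best[year][1]:
            best[year] = (genre, score)
    return {year: genre for year, (genre, _) in best.items()}


parser = MovieParser()
parser.feed(SAMPLE_HTML)
scores = genre_scores(parser.rows)
top = top_genre_by_year(scores)
print(top)
```

In a setup closer to the one the abstract describes, the parsing step would presumably be handled by BeautifulSoup and the aggregation by Pandas, with the per-year scores then passed to Matplotlib or Seaborn for plotting.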
