Python Project

Netflix Movies and TV Shows

My Role
Data Analyst
Timeline
April 2023 - May 2023

All the Python scripts for this project can be found here

A Tableau storyboard with the full analysis is available here

An interactive dashboard is available here

Overview

Netflix is one of the world's most popular streaming platforms, offering over 8,000 titles across movies and TV shows to more than 200 million users worldwide.

Goal

The goal of the analysis is to help Netflix determine which factors contribute to the success of a movie or TV show.

Tools

  • Tableau (dashboard and storyboard)
  • Python (Jupyter, Anaconda, pandas, numpy, seaborn, matplotlib, folium, sklearn, pylab)

Data

The dataset for this project is open-source and can be downloaded here. It was gathered from JustWatch in March 2023 and covers titles available in the United States.

Data Limitations

The dataset had many missing values, which made handling the incomplete data fairly challenging. I decided to drop a potentially valuable column (age certification) because it had too many missing values (2,743).
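A minimal sketch of how such a decision could be checked with pandas is shown here. The toy table, the column names, and the 40% threshold are assumptions made for illustration, not the values used in the project.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the titles table; column names are assumptions.
titles = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "age_certification": ["R", np.nan, np.nan, np.nan, "PG"],
    "imdb_score": [7.1, 6.4, np.nan, 8.0, 5.9],
})

def drop_sparse_columns(df, max_missing_share=0.4):
    """Drop columns whose share of missing values exceeds the threshold.

    The 0.4 default is a hypothetical cut-off chosen for this example.
    """
    missing_share = df.isna().mean()  # fraction of NaN values per column
    sparse = missing_share[missing_share > max_missing_share].index
    return df.drop(columns=sparse), missing_share

cleaned, missing_share = drop_sparse_columns(titles)
print(missing_share)
print(cleaned.columns.tolist())
```

Inspecting the share of missing values per column before dropping anything makes the trade-off explicit: a column is removed only when too little of it is usable.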

Skills applied

  • Exploratory analysis through visualizations (scatterplots, correlation heat maps, pair plots and categorical plots)
  • Geospatial analysis using a shapefile
  • Regression analysis
  • Cluster analysis
  • Time-series analysis
  • Data Storytelling with Tableau

Business questions

  • Which countries produce the highest-rated titles on Netflix?
  • What is the correlation between the number of reviews and ratings on Netflix?
  • What are the most popular genres on Netflix?
  • Which production countries have the most content on Netflix?
  • What is the average duration of movies and TV shows on Netflix?

1. Data exploration

First, I created a heatmap to better understand the correlations among all numerical variables. Most pairs show very weak, negative, or no correlation at all. The only clearly positive correlation is between the IMDb score and the TMDb score.
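The correlation matrix behind a heatmap like this can be sketched as follows. The data is synthetic and the column names (`imdb_score`, `tmdb_score`, `runtime`) are assumptions about the dataset, not confirmed names.

```python
import pandas as pd

# Toy data; the column names are assumptions.
scores = pd.DataFrame({
    "imdb_score": [6.0, 7.0, 8.0, 5.0, 9.0],
    "tmdb_score": [6.2, 6.9, 8.1, 5.5, 8.8],
    "runtime": [100, 45, 120, 95, 50],
})

# Pearson correlation between every pair of numeric columns.
corr = scores.select_dtypes(include="number").corr()
print(corr.round(2))
```

Passing `corr` to seaborn, for example `sns.heatmap(corr, annot=True)`, would render the matrix as an annotated heatmap.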

One of the business questions to answer is which genres are the most popular on Netflix.

According to the visualization above, the most popular Netflix genres are:

  • Sci-Fi
  • Fantasy
  • Action
  • Horror
  • Sport
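A ranking like the one above could be computed along these lines. This is a sketch under stated assumptions: that genres are stored as stringified lists, that popularity is measured by a `tmdb_popularity` column, and that "most popular" means the highest mean popularity per genre.

```python
import ast
import pandas as pd

# Toy data; the column names and the stringified-list format are assumptions.
titles = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["['scifi', 'action']", "['drama']", "['scifi']", "['drama', 'action']"],
    "tmdb_popularity": [90.0, 10.0, 70.0, 20.0],
})

# Parse each string into a real list, then give every genre its own row.
titles["genres"] = titles["genres"].apply(ast.literal_eval)
by_genre = titles.explode("genres")

# Mean popularity per genre, highest first.
genre_popularity = (
    by_genre.groupby("genres")["tmdb_popularity"]
    .mean()
    .sort_values(ascending=False)
)
print(genre_popularity)
```

A title with several genres contributes to each of them, so the per-genre means are not independent of one another.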

2. Geospatial Analysis

Two of the business questions are which countries produce the highest-rated Netflix titles and which have the most content.

As the visualization above shows, the countries that produce the highest-rated titles are the United States, Mexico, Colombia, the UK, Norway, Italy, Poland, Russia, Korea, Japan, and New Zealand.

The most popular movies and TV shows, by contrast, are made in the United States and the United Kingdom.

Finally, the countries that produce the most content are:

  • The United States (2122 titles)
  • India (640 titles)
  • UK (304 titles)
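Per-country figures like these can be derived with a single aggregation. The sketch below uses toy data, and the column names (`production_countries`, `imdb_score`) are assumptions about the dataset.

```python
import pandas as pd

# Toy data; each title lists one or more production countries (assumed layout).
titles = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "production_countries": [["US"], ["US", "GB"], ["IN"], ["US"], ["IN"]],
    "imdb_score": [7.0, 8.0, 6.0, 6.0, 7.0],
})

# One row per (title, country) pair, then count titles and average the score.
per_country = (
    titles.explode("production_countries")
    .groupby("production_countries")
    .agg(n_titles=("title", "count"), mean_imdb=("imdb_score", "mean"))
    .sort_values("n_titles", ascending=False)
)
print(per_country)
```

Co-productions are counted once for every country involved, so the per-country counts can add up to more than the number of titles.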

3. Linear regression

I performed linear regression to test my hypotheses:

  • The most well-liked titles also have the greatest number of reviews.
  • The highest-rated titles are also the most popular ones.

To test them, I examined the relationship between TMDb popularity and TMDb score, and between IMDb votes and IMDb score.

In both cases, the correlation value is almost zero, indicating a very weak relationship between the variables.
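To illustrate what a near-zero result looks like, here is a minimal sketch with scikit-learn. The data is synthetic and generated so that votes and scores are unrelated; the variable names are assumptions and the numbers are not the project's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: votes and scores are drawn independently of each other.
imdb_votes = rng.integers(100, 100_000, size=200).reshape(-1, 1)
imdb_score = rng.normal(6.5, 1.0, size=200)

# Fit score as a linear function of the number of votes.
model = LinearRegression().fit(imdb_votes, imdb_score)

# R^2 is the share of score variance explained by votes.
r_squared = model.score(imdb_votes, imdb_score)
pearson_r = np.corrcoef(imdb_votes.ravel(), imdb_score)[0, 1]
print(f"R^2 = {r_squared:.3f}, r = {pearson_r:.3f}")
```

For a regression with a single predictor, R² equals the squared Pearson correlation, so a correlation close to zero means the fitted line explains almost none of the variation.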

4. Cluster analysis

In all the graphs, the pink cluster (zero) outperformed all the others. An interesting pattern emerges: the most popular titles were released most recently (after 2010).

The profile for each cluster is shown below:
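A per-cluster profile of this kind can be produced roughly as follows. This is a sketch on synthetic data: the use of k-means, the two features, and the choice of two clusters are assumptions for illustration, not the project's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic groups: older, less popular titles and recent, more popular ones.
old = pd.DataFrame({
    "release_year": rng.integers(1990, 2005, size=50),
    "tmdb_popularity": rng.normal(10, 2, size=50),
})
new = pd.DataFrame({
    "release_year": rng.integers(2012, 2023, size=50),
    "tmdb_popularity": rng.normal(60, 5, size=50),
})
titles = pd.concat([old, new], ignore_index=True)

# Standardize so that both features contribute equally to the distances.
features = StandardScaler().fit_transform(titles[["release_year", "tmdb_popularity"]])

# n_clusters=2 is a hypothetical choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
titles["cluster"] = kmeans.fit_predict(features)

# Cluster profile: mean of each feature per cluster.
profile = titles.groupby("cluster").mean()
print(profile)
```

Comparing the feature means per cluster is what lets a statement such as "the most popular cluster is also the most recent one" be read directly from the table.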

5. Summary & Recommendations

The linear regression revealed that the relationships between ratings, popularity, and the number of votes are weak, suggesting that the factors responsible for the success of a movie or TV show should be investigated further.

Next Steps

Since the dataset had many missing values, it would be useful to look for a more complete one. Other factors, such as age certification, should then be analyzed as well.

Recommendations

In conclusion, I would recommend that Netflix create new content rather than promote older titles, take into account the most popular genres, such as Sci-Fi, Fantasy, and Action, and support the creation of new content from the countries that produce the most popular titles.