An SQL analysis of movies produced between (1980 - 2020).
How AN ANALYSIS OF MOVIES PRODUCED BETWEEN (1980 - 2020) Made Me better at using SQL.
Introduction
I've decided to analyze movies produced between 1980 and 2020. I've gathered data from Kaggle, cleaned the data using Excel, and imported it into the PostgreSQL workbench.
This project mainly aims to use SQL to analyze the movie dataset and answer a few questions. Click movies to access the dataset. Note: All data querying was done using PostgreSQL workbench.
In this article, I will walk you through how I arrived at certain outcomes but before then grab your popcorn and enjoy the ride 😘.
Objectives
i. Using SQL to answer 15 questions
ii. Creating insights from findings.
Questions
a. What is the total gross generated from all the movies?
I used the SUM function to add the gross generated from all the movies.
Total gross generated = 587,105,546,272
b. What is the total revenue generated from all the movies?
I used the SUM function to aggregate and the IS NULL clause to filter out the cells with null values.
Total revenue generated = 364,569,800,248
c. What are the top 10 movies by total revenue?
I used CTE(common table expression) which is a temporary named result set that you can reference within a SELECT.
output:
d. What are all the genres by total revenue?
Here, the total revenue is used to filter the genres.
output:
e. Classify the genres by total gross
Here, the total gross is used to filter the genres.
output:
f. What are the top 10 scores by revenue?
Here, the total revenue is used to filter the scores of each movie.
output:
g. What are the top 10 movies over the decade(1980 -1990) and over (1990 -2020)?
This question is aimed at finding out the movies with the highest rating scores over the first and the second decade.
1980-1990
output:
1990-2020
output:
h. What are the top 10 highest-ranked budgeted movies between 1980 - 2020?
Here, the aim is to find out the top 10 movies with the most budgets.
output:
i. ** What are the top 10 highest-scored budgeted movies between 1980-2020?**
Similar to the previous question, the difference is that, here, the most budgeted movies are sorted by their scores.
output:
j. What are the top 10 countries by highest gross generated?
output:
k. What are the 10 movies with the lowest gross generation and their respective countries?
output:
l. ** What are the top 10 gross-generating movies and their respective countries?**
output:
m. Who are the top 5 movie directors, and movie stars for the highest-scoring movies, and the movie year?
output:
n. Which 10 companies produced the most movies between 1990-2020?
output:
Insights
i. The action genre of movies generated more revenues for movie producers.
ii. The United States is a good area for movie production.
iii. Action, horror, and comedy movies generate more revenue than romance movies😢.
iv. Marvel's Avengers endgame had a budget of 356m USD and is ranked first on the list of movies with the highest budget between 1980 -2020.
v. "The Shawshank Redemption" of 1994 is the most scored movie with a 9.3 rating between 1980-2020.
I hope you enjoyed reading this. Kindly drop your questions, comments and reactions ✌.