An SQL analysis of movies produced between (1980 - 2020).

How AN ANALYSIS OF MOVIES PRODUCED BETWEEN (1980 - 2020) Made Me better at using SQL.

An SQL analysis of movies produced between (1980 - 2020).

Introduction

I've decided to analyze movies produced between 1980 and 2020. I've gathered data from Kaggle, cleaned the data using excel, and imported it into the PostgreSQL workbench.

This project's main aim is to use SQL to analyze the movie dataset and answer a few questions. Click movies to access the dataset. Note: All data querying was done using PostgreSQL workbench.
In this article, I will walk you through how I arrived at certain outcomes but before then grab your popcorn and enjoy the ride 😘.

Objectives

i. Using SQL to answer 15 questions
ii. Creating insights from findings.

Questions

a. What is the total gross generated from all the movies?
I used the SUM function to add up the gross generated from all the movies.

q1.png Total gross generated = 587,105,546,272

2022-09-24_9-59-48.png

b. What is the total revenue generated from all the movies?
I used the SUM function to aggregate and also the IS NULL clause to filter out the cells that had null values.

2022-09-22_12-18-27.png Total revenue generated = 364,569,800,248

2022-09-22_12-19-33.png

c. What are the top 10 movies by total revenue?
I used CTE(common table expression) which is a temporary named result set that you can reference within a SELECT.

2022-09-24_8-45-11.png

output:

2022-09-24_8-47-14.png d. What are all the genres by total revenue?
Here, the total revenue is used to filter the genres.

genre by rev.png output:

2022-09-24_9-21-50.png e. Classify the genres by total gross
Here, the total gross is used to filter the genres.

2022-09-24_9-46-10.png output:

2022-09-24_9-57-12.png f. What are the top 10 scores by revenue?
Here, the total revenue is used to filter the scores of each movie.

Q6.png output:

Q6a.png g. What are the top 10 movies over the decade(1980 -1990) and over (1990 -2020)?
This question is aimed at finding out the movies with the highest rating scores over the first and the second decade.

1980-1990 q8.png output:

q8a.png 1990-2020

q8b.png output:

q8bb.png h. What are the top 10 highest-ranked budgeted movies between 1980 - 2020?
Here, the aim is to find out the top 10 movies with the most budgets.

q9.png output:

q9a.png i. What are the top 10 highest-scored budgeted movies between 1980-2020?
Similar to the previous question, the difference is that, here, the most budgeted movies are sorted by their scores.

q9b.png output:

q9bb.png j. What are the top 10 countries by highest gross generated?

q10.png output:

q10a.png k. What are the 10 movies with the lowest gross generation and their respective countries?

q11.png output:

q11a.png l. What are the top 10 gross-generating movies and their respective countries?

q12.png output:

q12a.png m. Who are the top 5 movie directors, and movie stars for the highest-scoring movies, and the movie year?

Q13.png output:

q13a.png n. Which 10 companies produced the most movies between 1990-2020?

q14.png output:

q14a.png

Insights

i. The action genre of movies generated more revenues for movie producers.
ii. The United States is a good area for movie production.
iii. Action, horror, and comedy movies generate more revenue than romance movies😢.
iv. Marvel's Avengers endgame had a budget of 356m USD and is ranked first on the list of movies with the highest budget between 1980 -2020.
v. "The Shawshank Redemption" of 1994 is the most scored movie with a 9.3 rating between 1980-2020.

I hope you enjoyed reading this. Kindly drop your questions, comments and reactions ✌.