OK, on with the show. After scraping away at movie websites like Box Office Mojo and Metacritic, I learned a valuable lesson: web scraping without a real problem to solve is like scraping ice off your car when there’s no ice on it. After contemplating topics focused on the Oscars, studio success, or critical acclaim, I decided to focus on something different altogether: sports movies. The premise was this: despite the recent successes of movies like “Creed” and “Concussion”, the sports genre remains an underdog compared with other, more established genres.
What I had already scraped wasn't focused on sports movies, though, so it was back to the drawing board. After some searching, I found a website, Ultimate Moving Rankings, that maintained a list of the Top 400 sports movies of all time. The advantages of this source were that the data was concise and easy to scrape, it was categorized by sport, and its domestic gross numbers were adjusted for inflation. On the downside, the overall feature set ended up being limited, and some of the features were aggregates that included values from other features, making them difficult to use as independent variables. Nevertheless, there was enough to work with to start my regression analysis.
First, I looked at the top movies by specific sport. Not surprisingly, boxing, football, and baseball movies outplayed the rest of the field significantly. The complete rankings are shown in the histogram below.
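For anyone who wants to try a similar by-sport ranking, here is a rough sketch of one way to do it. It assumes the ranking is based on total adjusted domestic gross per sport, which may not be exactly how the chart was built. The rows, the placeholder titles, and the `rank_sports_by_gross` helper are all made up for illustration and are not the scraped data.

```python
from collections import defaultdict

# Hypothetical rows: (title, sport, adjusted domestic gross in $ millions).
# These values are invented for illustration; they are NOT the scraped data.
movies = [
    ("Movie A", "Boxing", 300.0),
    ("Movie B", "Boxing", 120.0),
    ("Movie C", "Football", 250.0),
    ("Movie D", "Football", 100.0),
    ("Movie E", "Baseball", 200.0),
    ("Movie F", "Golf", 40.0),
]


def rank_sports_by_gross(rows):
    """Sum adjusted gross per sport and return (sport, total) pairs, highest first."""
    totals = defaultdict(float)
    for _title, sport, gross in rows:
        totals[sport] += gross
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_sports_by_gross(movies)
for sport, total in ranking:
    print(f"{sport:<10} {total:8.1f}")
```

Swapping the sum for a count or a median would give a different ranking, so the aggregation is worth choosing deliberately.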
Finally, I ran an actual linear regression using critic/audience rating as a predictor for adjusted domestic gross. Despite a low R-squared value, due largely to outliers (e.g., your Rockys and Blind Sides), there was a correlation between the two variables. However, the slope of the regression line is fairly gradual, suggesting that even with critical acclaim, sports movies really don't move the meter that much.