Linear Regression Project

Predicting Vehicle MPG

Vehicle Fuel Efficiency & Carbon Dioxide Emissions

A study of the factors affecting fuel efficiency and CO2 emissions, and of building machine learning models to predict fuel efficiency

Summary

The goal of this project was to demonstrate mastery of data science and machine learning concepts. It began with selecting a problem statement and a dataset. I chose to apply a linear regression model to a dataset on vehicle fuel efficiency from the Environmental Protection Agency (EPA).

Problem Statement

The EPA and other agencies spend large sums of money and time conducting fuel efficiency tests.

Hypothesis: Vehicle MPG Varies by Make/Model.

Vehicle MPG varies by make/model (alternatively, MPG varies by engine displacement or transmission type), and a machine learning model can be built to predict it. Replacing costly and time-consuming tests with a machine learning model could benefit agencies such as the EPA.

Null Hypothesis: Vehicle MPG Does Not Vary by Make/Model.

Vehicle MPG does not vary by make/model (nor by engine displacement or transmission type), or a machine learning model cannot replace laboratory testing for fuel efficiency.

Goals

  • Create a machine learning model capable of predicting fuel efficiency based on certain variables
  • Make the machine learning model accurate enough that it could potentially replace costly and time-consuming laboratory tests for fuel efficiency

Questions

  • First question: Have fuel efficiency standards had an impact on CO2 emissions and overall fuel efficiency over time?
  • Second question: Which factors, based on available data, most affect fuel efficiency?
  • Third question: Can a machine learning model be built to predict fuel efficiency?

First Question: Have fuel efficiency standards had an impact on CO2 emissions and overall fuel efficiency over time?

Figure 1: Fuel Efficiency over Time

Second Question: Which factors, based on available data, most affect fuel efficiency?

BLUF: Not surprisingly, the data show a tight correlation between engine displacement and CO2 emissions. However, the trends over time held some surprises.
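One way to quantify a correlation like this is the Pearson coefficient. The sketch below is only an illustration: the `pearson_r` helper, the variable names, and the six data points are assumptions made up for the example, not values from the EPA dataset or code from the project.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT taken from the EPA dataset.
displacement_l = [1.6, 2.0, 2.5, 3.5, 5.0, 6.2]   # engine displacement, liters
co2_g_per_mile = [260, 300, 340, 420, 520, 600]   # tailpipe CO2, grams per mile

r = pearson_r(displacement_l, co2_g_per_mile)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship; it says nothing on its own about causation or about how the relationship changes over time.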

Figure 2: Relationship Between MPG and Displacement, Cylinders, and CO2 Emissions

Figure 3: Relationship Between Displacement, MPG, and CO2 Emissions


Hidden impacts: Graphs of the Averages of Major Variables over Time

Figure 3 below shows the averages of four major variables over time. Further research into the drastic changes in these averages revealed two hidden impacts: federal law and vehicle sales trends.

Figure 3: Averages of Major Variables over Time
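Per-year averages like those plotted above can be computed by grouping records by model year. The following is a minimal sketch under stated assumptions: the column names (`mpg`, `displacement`, `cylinders`, `co2`), the `yearly_averages` helper, and every record are hypothetical and made up for illustration; the source does not name the four variables or show its own code.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative records, NOT taken from the EPA dataset.
# The column names are hypothetical.
records = [
    {"year": 2000, "mpg": 20.0, "displacement": 3.4, "cylinders": 6, "co2": 450.0},
    {"year": 2000, "mpg": 22.0, "displacement": 3.0, "cylinders": 4, "co2": 410.0},
    {"year": 2010, "mpg": 23.0, "displacement": 3.0, "cylinders": 6, "co2": 390.0},
    {"year": 2010, "mpg": 25.0, "displacement": 2.6, "cylinders": 4, "co2": 360.0},
    {"year": 2020, "mpg": 27.0, "displacement": 2.4, "cylinders": 4, "co2": 330.0},
]
COLUMNS = ["mpg", "displacement", "cylinders", "co2"]


def yearly_averages(rows, columns):
    """Return {year: {column: mean value}} with years in ascending order."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)
    return {
        year: {col: mean(r[col] for r in group) for col in columns}
        for year, group in sorted(by_year.items())
    }


averages = yearly_averages(records, COLUMNS)
for year, values in averages.items():
    print(year, values)
```

Plotting each column of the result against the year gives one line per variable, which is where abrupt shifts in the averages become visible.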


Figure 4: Count of Vehicle Type over Time

Third Question: Can a machine learning model be built to predict fuel efficiency?

BLUF: Yes, a linear regression model shows great promise in predicting a vehicle's fuel economy from a number of variables.

The images below are from a prototype prediction model in a Jupyter Notebook. In the future, this will be deployed as a web app.

Figure 5: Instructions for application use

Figure 6: Prediction results based on user input
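As a rough illustration of the kind of model discussed in this section, the sketch below fits an ordinary least squares linear regression with NumPy and predicts MPG for one new vehicle. It is not the project's notebook code: the choice of features (displacement and cylinder count), the helper names `fit_ols`, `predict`, and `r_squared`, and all data values are assumptions made up for the example.

```python
import numpy as np

# Made-up illustrative values, NOT taken from the EPA dataset.
# Feature columns (hypothetical): engine displacement in liters, cylinder count.
X = np.array(
    [[1.6, 4], [2.0, 4], [2.5, 4], [3.0, 6], [3.5, 6], [5.0, 8], [6.2, 8]],
    dtype=float,
)
y = np.array([34.0, 31.0, 28.0, 24.0, 22.0, 17.0, 15.0])  # combined MPG


def fit_ols(features, target):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_k]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict(coef, features):
    """Predict targets for one row or a 2-D array of feature rows."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    return coef[0] + features @ coef[1:]


def r_squared(target, predicted):
    """Coefficient of determination for a set of predictions."""
    ss_res = np.sum((target - predicted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


coef = fit_ols(X, y)
r2 = r_squared(y, predict(coef, X))
new_mpg = predict(coef, [2.4, 4])[0]  # hypothetical 2.4 L, 4-cylinder vehicle
print(f"R^2 on training data: {r2:.3f}")
print(f"Predicted MPG: {new_mpg:.1f}")
```

The R^2 here is measured on the same rows used for fitting, so it overstates how well the model would generalize; judging whether a model could stand in for laboratory testing would require scoring it on held-out vehicles.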

Considerations

  • This data does not capture all variables that could affect fuel efficiency and CO2 emissions
  • Hybrid and electric vehicles may skew data

The Way Ahead

  • Build a separate machine learning model specifically for hybrid and electric vehicles
  • Build a separate machine learning model specifically for predicting CO2 emissions based on available data
  • Deploy a front-end user model on a webpage for educational purposes
  • Deploy a machine learning model for making predictions over large sets of new data for the EPA, DOE, and other agencies and NGOs