To EV or Not to EV

December 31, 2022

Introduction

I usually do most of my random pondering about life either while driving or in the shower. On a particular day on the I-95, I was whining to myself about the amount I spend on gas on my beautiful old gas guzzler, a 2010 BMW 328xi. She gives me about 21 MPG on the highway and elegantly requests only the finest of fuels from the premium aisle. I love my car but like Tiger Woods off the golf course, I started wondering about life on the other side. Electric cars are easier on the wallet in the long run but by how much?

Methodology

1. Get the Miles Per Gallon data

The EPA website has a downloadable CSV file on almost all cars made since 1984. The dataset includes almost all the relevant information about MPG for ICE (Internal combustion engine) vehicles, kWh for electric vehicles, and a combination of the two for hybrid vehicles explained here. We also get information on the type of fuel a gas-powered or hybrid vehicle uses, i.e. whether the vehicle requires regular, mid-grade, or premium gasoline.

Click to view python code

import pandas as pd
import numpy as np

df = pd.read_csv('data/car-data/vehicles.csv', usecols=['year', 'make', 'model', 'barrels08', 'barrelsA08', 'city08', 'city08U', 'cityA08', 'cityA08U', 'cityCD', 'cityE', 'cityUF',
'co2TailpipeGpm', 'co2TailpipeAGpm', 'comb08', 'comb08U', 'combA08', 'combA08U', 'combE', 'combinedCD', 'combinedUF', 'cylinders',
'displ', 'drive', 'fuelCost08', 'fuelCostA08', 'fuelType', 'fuelType1', 'fuelType2', 'highway08', 'highway08U', 'highwayA08', 'highwayA08U', 'highwayCD',
'highwayE', 'highwayUF', 'phevBlended', 'rangeA', 'rangeCityA', 'rangeHwyA', 'trany', 'VClass', 'phevCity',
'phevHwy', 'phevComb', 'atvType'])
`

Data Legend:

year - model year
make - manufacturer (division)
model - model name (carline)
VClass - EPA vehicle size class
atvType - type of alternative fuel or advanced technology vehicle
cityCD - city gasoline consumption (gallons/100 miles) in charge depleting mode
cityE - city electricity consumption in kw-hrs/100 miles
comb08U - unrounded combined MPG for fuelType1
combA08U - unrounded combined MPG for fuelType2
combE - combined electricity consumption in kw-hrs/100 miles
combinedCD - combined gasoline consumption (gallons/100 miles) in charge depleting mode
highway08U - unrounded highway MPG for fuelType1
highwayA08U - unrounded highway MPG for fuelType2
highwayCD - highway gasoline consumption (gallons/100miles) in charge depleting mode
highwayE - highway electricity consumption in kw-hrs/100 miles
phevBlended - if true, this vehicle operates on a blend of gasoline and electricity in charge depleting mode
rangeA - EPA range for fuelType2
rangeCityA - EPA city range for fuelType2
rangeHwyA - EPA highway range for fuelType2
barrels08 - annual petroleum consumption in barrels for fuelType1
barrelsA08 - annual petroleum consumption in barrels for fuelType2
co2TailpipeAGpm - tailpipe CO2 in grams/mile for fuelType2
co2TailpipeGpm- tailpipe CO2 in grams/mile for fuelType1
fuelType - fuel type with fuelType1 and fuelType2 (if applicable)
fuelType1 - fuel type 1. For single fuel vehicles, this will be the only fuel. For dual fuel vehicles, this will be the conventional fuel.
fuelType2 - fuel type 2. For dual fuel vehicles, this will be the alternative fuel (e.g. E85, Electricity, CNG, LPG). For single fuel vehicles, this field is not used

Source

Let's remove vehicles that run on CNG, Natural gas, or propane.

Why? Firstly because getting these rates is a little harder, and involves a few extra steps. We would probably have to scrape a few different websites to get this data. But the most important reason- most consumers are not going to be looking for these types of vehicles. Vehicles that run on CNG, Natural gas, or propane are mainly used in commercial applications or older vehicles

to_drop = df[(df['fuelType2'] == 'Natural Gas') | (df['fuelType2'] == 'Propane') | (df['fuelType1'] == 'Natural Gas')].index
df.drop(index = to_drop, axis=1, inplace=True)
df.reset_index(drop=True, inplace=True)

### Some vehicles do not have unrounded MPG values. For those vehicles, the unrounded MPG values are set to 0. This section will set those empty values to the rounded figures.

df['city08U'] = np.where(df['city08U'] == 0, df['city08'], df['city08U'])
df['cityA08U'] = np.where(df['cityA08U'] == 0, df['cityA08'], df['cityA08U'])
df['comb08U'] = np.where(df['comb08U'] == 0, df['comb08'], df['comb08U'])
df['combA08U'] = np.where(df['combA08U'] == 0, df['combA08'], df['combA08U'])
df['highway08U'] = np.where(df['highway08U'] == 0, df['highway08'], df['highway08U'])
df['highwayA08U'] = np.where(df['highwayA08U'] == 0, df['highwayA08'], df['highwayA08U'])
`

Saving results to csv

df.to_csv('data/car-data/vehicles-cleaned.csv')

2. Get the MSRP data

Unfortunately, the above dataset does not have information about the price of each vehicle. To get this data, we have to scrape the fueleconomy.gov website. I used the selenium python package for this, as I find that the commonly used requests python package has a lot of issues when it comes to dynamic websites. The selenium package also launches a browser window as it scrapes through webpages, and that helps a lot with debugging.

Click to view python code

import os
import csv
import pandas as pd
import numpy as np
from datetime import date
from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.implicitly_wait(10)
category = []
car_name = []
config = []
per100mls = []
driver_mpg = []
annual_fuel_cost = []
msrp = []
energy_impact_score = []
greenhouse_emissions = []
combined_mpgf1 = []
mpg_unitsf1 = []
city_mpgf1 = []  # class is same, first city then highway
hwy_mpgf1 = []
elegasmiles = []  # gas + electric miles. this contains all ele miles as well
totalrange = []  # total car range for hybrids
phevgascomb = []  # combined mpg (city + hwy) for hybrids
phevgasper100 = []  # combined gal/100 miles (city + hwy) for hybrids
fuel_type = []

Code for scraping the website

def get_mpg_data(cat):
   category = []
   car_name = []
   config = []
   fuel_type = []
   per100mls = []
   driver_mpg = []
   annual_fuel_cost = []
   msrp = []
   energy_impact_score = []
   greenhouse_emissions = []
   combined_mpgf1 = []
   mpg_unitsf1 = []
   city_mpgf1 = []
   hwy_mpgf1 = []
   elegasmiles = []  # gas + electric miles. this contains all ele miles as well
   totalrange = []  # total car range for hybrids
   phevgascomb = []  # combined mpg (city + hwy) for hybrids
   phevgasper100 = []  # combined gal/100 miles (city + hwy) for hybrids
   
   page = driver.find_element(By.CSS_SELECTOR, 'table.cars.display.responsive.stickyHeader')
   soup = BeautifulSoup(page.get_attribute('innerHTML'), 'lxml')
   all_cards = soup.find_all('tr', class_='ymm-row')
   
   count = 0
   
   for card in all_cards:
       count += 1
       if count == 199:
           count = 0
       category.append(cat)  # append car category
       car_name.append(card.find('a').text)  # append car name
       conf = card.find('span', class_='config').text  # get car configuration
       conf = conf.split(',')
       ft = conf.pop(-1)  # last part of conf has fuel type
       fuel_type.append(ft)  # Todo: add this to output
       s = '-'.join(conf)  # creating string from list
       config.append(s)
       card = card.next_sibling.next_sibling
       mpgepa = card.find('td', class_='mpg-epa')
       if 'Not Available' in mpgepa.text or 'Currently unavailable' in mpgepa.text:
           print(f'skipping {car_name[-1]} because it does not have MPG information')
           category = category[:-1]
           car_name = car_name[:-1]
           config = config[:-1]
           fuel_type = fuel_type[:-1]
           continue

       if 'and Electricity' in ft:
           r1 = card.findNext('td', class_='range phev')
           elegasmiles.append(r1.find('div', class_='cdRange').text)
           totalrange.append(r1.find('div', class_='phevRange').text)
           fuel2 = mpgepa.find('div', class_='fuel2')
           phevgascomb.append(fuel2.find('td', class_='mpg-comb').text)
           phevgasper100.append(fuel2.find('td', class_='cons').text)
       elif 'or Electricity' in ft:
           r1 = card.findNext('td', class_='range phev')
           elegasmiles.append(r1.find('div', class_='cdRange').text)
           totalrange.append(r1.find('div', class_='phevRange').text)
           fuel2 = mpgepa.find('div', class_='fuel2')
           phevgascomb.append(fuel2.find('td', class_='mpg-comb').text)
           phevgasper100.append(fuel2.find('td', class_='cons').text)
       else:
           elegasmiles.append(np.nan)
           totalrange.append(np.nan)
           phevgascomb.append(np.nan)
           phevgasper100.append(np.nan)

       combined_mpgf1.append(mpgepa.find('td', class_='mpg-comb').text)
       mpg_unitsf1.append(mpgepa.find('td', class_='unitsLabel').text)
       ctyhwyf1 = mpgepa.find_all('td', class_='ctyhwy')
       city_mpgf1.append(ctyhwyf1[0].text)  # class is same, first city then highway
       hwy_mpgf1.append(ctyhwyf1[1].text)
       per100mls.append(mpgepa.find('td', class_='cons').text)
       driver_mpg.append(card.find('td', class_='mpg-user').text.replace('\t', '').replace('\n', ''))
       annual_fuel_cost.append(card.find('td', class_='ann-fuel-cost').text.replace('\t', '').replace('\n',
                                                                                                      ''))
       msrp.append(card.find('td', class_='msrp').text.replace('\t', '').replace('\n', ''))
       energy_impact_score.append(card.find('td', class_='energy').text.replace('\n', ''))
       greenhouse_emissions.append(card.find('td', class_='ghg').text.replace('\n', ''))

   rules = [len(category) == len(car_name), # checks if all lengths are equal
            len(car_name) == len(config),
            len(config) == len(per100mls),
            len(per100mls) == len(driver_mpg),
            len(driver_mpg) == len(annual_fuel_cost),
            len(annual_fuel_cost) == len(msrp),
            len(msrp) == len(energy_impact_score),
            len(energy_impact_score) == len(greenhouse_emissions),
            len(greenhouse_emissions) == len(combined_mpgf1),
            len(combined_mpgf1) == len(mpg_unitsf1),
            len(mpg_unitsf1) == len(city_mpgf1),
            len(city_mpgf1) == len(hwy_mpgf1),
            len(hwy_mpgf1) == len(elegasmiles),
            len(elegasmiles) == len(totalrange),
            len(totalrange) == len(phevgascomb),
            len(phevgascomb) == len(phevgasper100),
            len(phevgasper100) == len(fuel_type)
            ]

   if not all(rules):
       print('Error, data lengths not equal')

   return category, car_name, config, per100mls, driver_mpg, annual_fuel_cost, msrp, energy_impact_score, \
          greenhouse_emissions, combined_mpgf1, mpg_unitsf1, city_mpgf1, hwy_mpgf1, elegasmiles, totalrange, \
          phevgascomb, phevgasper100, fuel_type

Running the scraper for different categories of vehicles

site_dict = {
   # 'test': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcfamilySedans=Family+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Family+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200'
   'Small Cars': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcsmallCars=Small+Cars&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Small+Cars&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Family Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcfamilySedans=Family+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Family+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Upscale Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcupscaleSedans=Upscale+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Upscale+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Luxury Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcluxurySedans=Luxury+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Luxury+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Large Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmclargeSedans=Large+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Large+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Hatchbacks': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmchatchbacks=Hatchbacks&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Hatchbacks&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Coupes': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcCoupes=Coupes&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Coupes&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Convertibles': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcconvertibles=Convertibles&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Convertibles&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Sports/Sporty Cars': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcSports_Sporty_Cars=Sports%2FSporty+Cars&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Sports%2FSporty+Cars&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Station Wagons': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcstationWagons=Station+Wagons&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Station+Wagons&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Pickup Trucks': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcpickupTrucks=Pickup+Trucks&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Pickup+Trucks&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'SUVs': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcsportUtilityVehicles=Sport+Utility+Vehicles&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Sport+Utility+Vehicles&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
   'Minivans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcminivans=Minivans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Minivans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200'
}

for cat, link in site_dict.items():  # Loops through the different category pages
   print(cat)
   driver.get(link)
   sleep(5)  # Have to add this otherwise selenium breaks once in a while
   i = 0
   while True:
       i += 1
       print(f'Scraping page {i} from {cat}')
       mpg_data = get_mpg_data(cat)  # This function scrapes the data
       category += mpg_data[0]
       car_name += mpg_data[1]
       config += mpg_data[2]
       per100mls += mpg_data[3]
       driver_mpg += mpg_data[4]
       annual_fuel_cost += mpg_data[5]
       msrp += mpg_data[6]
       energy_impact_score += mpg_data[7]
       greenhouse_emissions += mpg_data[8]
       combined_mpgf1 += mpg_data[9]
       mpg_unitsf1 += mpg_data[10]
       city_mpgf1 += mpg_data[11]
       hwy_mpgf1 += mpg_data[12]
       elegasmiles += mpg_data[13]
       # allelemiles += mpg_data[14]
       totalrange += mpg_data[14]
       phevgascomb += mpg_data[15]
       phevgasper100 += mpg_data[16]
       fuel_type += mpg_data[17]

       try:
           driver.find_element(By.CSS_SELECTOR, 'a[aria-label="Next"]').click()
       except ElementClickInterceptedException:  # If selenium does not find the next button, signaling that there is no next page, the loop breaks and
           break

Saving the data

headers = ['category', 'car_name', 'config', 'fuel_type', 'per100mls', 'driver_mpg', 'annual_fuel_cost', 'msrp',
          'energy_impact_score', 'greenhouse_emissions', 'combined_mpgf1', 'mpg_unitsf1', 'city_mpgf1', 'hwy_mpgf1',
          'elegasmiles', 'totalrange', 'phevgascomb', 'phevgasper100']
with open(f'data/car-data/{str(date.today())}.csv', 'w') as file:
   writer = csv.writer(file)
   writer.writerow(headers)
   for i in range(len(car_name)):
       writer.writerow([category[i], car_name[i], config[i], fuel_type[i], per100mls[i],
                        driver_mpg[i], annual_fuel_cost[i], msrp[i],
                        energy_impact_score[i],
                        greenhouse_emissions[i],
                        combined_mpgf1[i], mpg_unitsf1[i], city_mpgf1[i], hwy_mpgf1[i], elegasmiles[i], totalrange[i],
                        phevgascomb[i], phevgasper100[i]])

3. Combine the two datasets

Now that we have information on the MPG of each vehicle and the price of the respective vehicle, we can combine this into one dataset. We then need to standardize the data from both datasets to make the text from both sources comparable.

Next, we can match different attributes of each vehicle like make, model, year, config, etc., and save the resulting data in a csv

After matching, data on about 12,340 vehicles manufactured from 2010 on-wards were gathered. Out of this:

10,843 vehicles are gas powered
98 vehicles are powered by Diesel
347 are EVs
787 are Hybrids, and 265 are Plug-in Hybrids

Click to view python code

import os
import re
import csv
import pandas as pd
import numpy as np

pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)

Prepare the MSRP dataset

msrpdf = pd.read_csv('data/car-data/2022-11-09.csv')
#  Removing formating, '$' signs and ','
msrpdf['msrp'] = msrpdf['msrp'].apply(lambda x: x.replace('\xa0', ' ').replace('$', '').replace(',', '').strip())
msrpdf = msrpdf[msrpdf['msrp'] != ''].reset_index(drop=True)
msrpdf['fuel_type'] = msrpdf['fuel_type'].apply(lambda x: x.strip())
`

def parse_msrp(df):
   '''
   Function to convert msrp from strings to int/float, also computes average. If a range of values are given, average is calculated
   else, single msrp is set into the average column
   '''
   text = df['msrp']
   regex = re.compile(r'\d+')
   matches = regex.findall(text)
   if len(matches) == 2:
       low_end = int(matches[0])
       high_end = int(matches[1])
       average = (high_end + low_end)/2
   elif len(matches) == 1:
       low_end = np.nan
       high_end = np.nan
       average = int(matches[0])
   else:
       raise Exception('Error in parsing MSRP')

   return low_end, high_end, average

def parse_model(df):
   text = df['car_name']
   split_text = text.split()
   txt_modifier = 2
   year = split_text[0]
   make = split_text[1]
   if make in ['Land', 'Aston', 'Alfa', 'Roush', 'McLaren', 'CODA']:
       make += ' ' + split_text[2]
       txt_modifier = 3
       
   model = ''
   for i in range(len(split_text) - txt_modifier):
       model += split_text[i+txt_modifier] + ' '
   model = model.strip()
   return year, make, model

def parse_trany(df):
   text = df['config']
   regex = re.compile(r'(Automatic[a-zA-Z0-9\(\- ]*\)|Automatic[a-zA-Z0-9\(\- ]*-spd|Manual[a-zA-Z0-9\(\- ]*-spd)')
   matches = regex.findall(text)
   if len(matches) == 1:
       trany = matches[0]
   elif len(matches) == 0:
       print(f'Error - could not find trany: {text}')
       trany = np.nan
   elif len(mathces) > 1:
       raise Exception('Error in parsing displacement - too many options')
       
   regex = re.compile(r'\d?\d.\d L')
   matches = regex.findall(text)
   if len(matches) == 1:
       displacement = matches[0].replace(' L', '')
   elif len(matches) == 0:
       displacement = np.nan
   elif len(mathces) > 1:
       raise Exception('Error in parsing displacement - too many options')

   regex = re.compile(r'\d?\d cyl')
   matches = regex.findall(text)
   if len(matches) == 1:
       cylinders = matches[0].replace(' cyl', '')
   elif len(matches) == 0:
       cylinders = np.nan
   elif len(mathces) > 1:
       raise Exception('Error in parsing displacement - too many options')
       
   return trany, displacement, cylinders

msrpdf[['msrp_low', 'msrp_high', 'msrp_average']] = msrpdf.apply(parse_msrp, axis=1, result_type ='expand') 
msrpdf[['year', 'make', 'model']] = msrpdf.apply(parse_model, axis=1, result_type ='expand') 
msrpdf[['trany', 'displacement', 'cylinders']] = msrpdf.apply(parse_trany, axis=1, result_type ='expand')
# Probably could have done the below line at the import, but I wasn't sure which columns I would require until later
msrpdf = msrpdf[['category', 'year', 'make', 'model', 'config', 'trany', 'displacement', 'cylinders', 'msrp_low', 'msrp_high', 'msrp_average', 'fuel_type']].copy()
msrpdf['cylinders'] = msrpdf['cylinders'].astype('string')
msrpdf['year'] = msrpdf['year'].astype('int')
msrpdf['model'] = msrpdf['model'].apply(lambda x: x.replace('  ', ' ').replace(' inch', 'in').replace('inch', 'in').replace('-inch', 'in').replace('21-in', 
                                                                                           '21in').replace('22-in', '22in').replace('wheels', 'Wheels'))

Preparing the MPG data

mpgdf = pd.read_csv('data/car-data/vehicles-cleaned.csv', index_col=0)
mpgdf = mpgdf[['year', 'make', 'model', 'highway08U', 'highwayCD', 'highwayE', 'city08U', 'cityCD', 'cityE', 
              'comb08U', 'combinedCD', 'combE', 'trany', 'displ', 'cylinders', 'atvType', 'fuelType1', 'fuelType2', 'rangeA']].copy()
mpgdf['displ'] = mpgdf['displ'].astype('string') # converting to string since we will need to compare it later, comparing floats may create errors
mpgdf['cylinders'] = mpgdf['cylinders'].astype('string')
mpgdf['model'] = mpgdf['model'].apply(lambda x: x.replace('  ', ' ').replace('-inch', 'in').replace('wheels', 'Wheels').replace('MUSTANG MACH', 'Mustang Mach'))

Matching the common attributes between the two dataframes

def compare_datasets(df1, df2): # Could have made this code more readable, but I spent waaaaay too much time on it
   '''
   first compare datasets for the same year, make, and model
   if multiple results are found, compare datasets for the same transmission
   if multiple results are found, compare datasets for the same engine displacement and number of cylinders
   if multiple reults are found, compare datasets for the same fuel type
   '''
   year = df1.year
   make = df1.make
   model = df1.model
   trany = df1.trany
   displacement = df1.displacement
   cylinders = df1.cylinders
   fuel_type = df1.fuel_type
   matches = df2[(df2['year'] == year) & (df2['make'] == make) & (df2['model'] == model)] 

   if len(matches) == 1:
       category = df1.category
       config = df1.config
       msrp_low = df1.msrp_low
       msrp_high = df1.msrp_high
       msrp_average = df1.msrp_average
       highway_mpg = matches.highway08U.values[0]
       highwayCD = matches.highwayCD.values[0]
       highwayE = matches.highwayE.values[0]
       city_mpg = matches.city08U.values[0]
       cityCD = matches.cityCD.values[0]
       cityE = matches.cityE.values[0]
       comb_mpg = matches.comb08U.values[0]
       combCD = matches.combinedCD.values[0]
       combE = matches.combE.values[0]
       
       atv_type = matches.atvType.values[0]
       fuel_type1 = matches.fuelType1.values[0]
       fuel_type2 = matches.fuelType2.values[0]
       rangeA = matches.rangeA.values[0]
       
       return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
              highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
              trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
               
   elif len(matches) > 1:
       matches2 = matches[matches['trany'] == trany]
       if len(matches2) == 1:         
           category = df1.category
           config = df1.config
           msrp_low = df1.msrp_low
           msrp_high = df1.msrp_high
           msrp_average = df1.msrp_average
           highway_mpg = matches2.highway08U.values[0]
           highwayCD = matches2.highwayCD.values[0]
           highwayE = matches2.highwayE.values[0]
           city_mpg = matches2.city08U.values[0]
           cityCD = matches2.cityCD.values[0]
           cityE = matches2.cityE.values[0]
           comb_mpg = matches2.comb08U.values[0]
           combCD = matches2.combinedCD.values[0]
           combE = matches2.combE.values[0]
           
           atv_type = matches2.atvType.values[0]
           fuel_type1 = matches2.fuelType1.values[0]
           fuel_type2 = matches2.fuelType2.values[0]
           rangeA = matches2.rangeA.values[0]
           
           
           return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
                  highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
                  trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
       
       elif len(matches2) > 1:     
           matches3 = matches2[(matches2['displ'] == displacement) & (matches2['cylinders'] == cylinders)]
           if len(matches3) == 1:
               category = df1.category
               config = df1.config
               msrp_low = df1.msrp_low
               msrp_high = df1.msrp_high
               msrp_average = df1.msrp_average
               highway_mpg = matches3.highway08U.values[0]
               highwayCD = matches3.highwayCD.values[0]
               highwayE = matches3.highwayE.values[0]
               city_mpg = matches3.city08U.values[0]
               cityCD = matches3.cityCD.values[0]
               cityE = matches3.cityE.values[0]
               comb_mpg = matches3.comb08U.values[0]
               combCD = matches3.combinedCD.values[0]
               combE = matches3.combE.values[0]
               
               atv_type = matches3.atvType.values[0]
               fuel_type1 = matches3.fuelType1.values[0]
               fuel_type2 = matches3.fuelType2.values[0]
               rangeA = matches3.rangeA.values[0]
               
               
               return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
                      highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
                      trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
           
           elif len(matches3) > 1:
               matches4 = matches3[matches3['fuelType1'] == fuel_type]
               if len(matches4) == 1:
                   category = df1.category
                   config = df1.config
                   msrp_low = df1.msrp_low
                   msrp_high = df1.msrp_high
                   msrp_average = df1.msrp_average
                   highway_mpg = matches4.highway08U.values[0]
                   highwayCD = matches4.highwayCD.values[0]
                   highwayE = matches4.highwayE.values[0]
                   city_mpg = matches4.city08U.values[0]
                   cityCD = matches4.cityCD.values[0]
                   cityE = matches4.cityE.values[0]
                   comb_mpg = matches4.comb08U.values[0]
                   combCD = matches4.combinedCD.values[0]
                   combE = matches4.combE.values[0]
                   
                   atv_type = matches4.atvType.values[0]
                   fuel_type1 = matches4.fuelType1.values[0]
                   fuel_type2 = matches4.fuelType2.values[0]
                   rangeA = matches4.rangeA.values[0]
                   
                   
                   return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
                          highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
                          trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
               
               elif len(matches4) > 1:
                   with open('data/car-data/msc/multiple-matches.csv', 'a') as f1:
                       f1writer = csv.writer(f1)
                       f1writer.writerow([df1.name, df1.category, year, make, model])
               
               elif len(matches4) == 0:
                   with open('data/car-data/msc/could-not-match.csv', 'a') as f3:
                       f3writer = csv.writer(f3)
                       f3writer.writerow([df1.name, df1.category, year, make, model])
                       
           elif len(matches3) == 0:
               with open('data/car-data/msc/could-not-match.csv', 'a') as f3:
                   f3writer = csv.writer(f3)
                   f3writer.writerow([df1.name, df1.category, year, make, model])
                   
       elif len(matches2) == 0:
           with open('data/car-data/msc/could-not-match.csv', 'a') as f3:
               f3writer = csv.writer(f3)
               f3writer.writerow([df1.name, df1.category, year, make, model])

   elif len(matches) == 0:
       with open('data/car-data/msc/no-matches.csv', 'a') as f2:
           f2writer = csv.writer(f2)
           f2writer.writerow([df1.name, df1.category, year, make, model])

Running the matching algorithm and saving the vehicles that were not able to be matched

path1 = 'data/car-data/msc/multiple-matches.csv'
if os.path.isfile(path1):
   os.remove(path1)
path2 = 'data/car-data/msc/no-matches.csv'
if os.path.isfile(path2):
   os.remove(path2)

with open('data/car-data/msc/multiple-matches.csv', 'w') as f1:
   f1writer = csv.writer(f1)
   f1writer.writerow(['index_in_msepdf', 'category', 'year', 'make', 'model'])
with open('data/car-data/msc/no-matches.csv', 'w') as f2:
   f2writer = csv.writer(f2)
   f2writer.writerow(['index_in_msepdf', 'category', 'year', 'make', 'model'])
   
combdf = pd.DataFrame()
combdf = msrpdf.apply(compare_datasets, axis=1, args=(mpgdf,), result_type='expand')
combdf.columns = ['category', 'year', 'make', 'model', 'config', 'msrp_low', 'msrp_high', 'msrp_average', 'highway_mpg', 'highwayCD', 
                 'highwayE', 'city_mpg', 'cityCD', 'cityE', 'comb_mpg', 'combCD', 'combE', 'trany', 'displacement', 'cylinders', 'atv_type', 'fuel_type1', 'fuel_type2', 'rangeA']
combdf.dropna(how='all', inplace=True)
combdf = combdf.reset_index(drop=True)
combdf['atv_type'] = combdf.atv_type.fillna('Gas')

Saving the final dataset

combdf.to_csv('data/car-data/combined-mpg-msrp.csv')

4. Getting the current gas prices

The AAA website has gas prices for each state and every fuel grade. The prices are updated daily as well.

Click to view python code

from selenium import webdriver
from selenium.webdriver.common.by import By
from datetime import date
import pandas as pd

driver = webdriver.Firefox()
driver.implicitly_wait(10)

state_list = []
regular_current = []
midgrade_current = []
premium_current = []
diesel_current = []
states_dict = {
   'AK': 'Alaska',
   'AL': 'Alabama',
   'AR': 'Arkansas',
   'AZ': 'Arizona',
   'CA': 'California',
   'CO': 'Colorado',
   'CT': 'Connecticut',
   'DC': 'District of Columbia',
   'DE': 'Delaware',
   'FL': 'Florida',
   'GA': 'Georgia',
   'HI': 'Hawaii',
   'IA': 'Iowa',
   'ID': 'Idaho',
   'IL': 'Illinois',
   'IN': 'Indiana',
   'KS': 'Kansas',
   'KY': 'Kentucky',
   'LA': 'Louisiana',
   'MA': 'Massachusetts',
   'MD': 'Maryland',
   'ME': 'Maine',
   'MI': 'Michigan',
   'MN': 'Minnesota',
   'MO': 'Missouri',
   'MS': 'Mississippi',
   'MT': 'Montana',
   'NC': 'North Carolina',
   'ND': 'North Dakota',
   'NE': 'Nebraska',
   'NH': 'New Hampshire',
   'NJ': 'New Jersey',
   'NM': 'New Mexico',
   'NV': 'Nevada',
   'NY': 'New York',
   'OH': 'Ohio',
   'OK': 'Oklahoma',
   'OR': 'Oregon',
   'PA': 'Pennsylvania',
   'PR': 'Puerto Rico',
   'RI': 'Rhode Island',
   'SC': 'South Carolina',
   'SD': 'South Dakota',
   'TN': 'Tennessee',
   'TX': 'Texas',
   'UT': 'Utah',
   'VA': 'Virginia',
   'VT': 'Vermont',
   'WA': 'Washington',
   'WI': 'Wisconsin',
   'WV': 'West Virginia',
   'WY': 'Wyoming'
}

Scraping the AAA website

link = 'https://gasprices.aaa.com/'
driver.get(link)
table = driver.find_element(By.CSS_SELECTOR, 'table.table-mob')
state_list.append('National')
prices = table.find_element(By.CSS_SELECTOR, 'tbody').find_elements(By.CSS_SELECTOR, 'td')
regular_current.append(prices[1].text.replace('$', ''))
midgrade_current.append(prices[2].text.replace('$', ''))
premium_current.append(prices[3].text.replace('$', ''))
diesel_current.append(prices[4].text.replace('$', ''))

for abr, state in states_dict.items():
   site = link + f'?state={abr}'
   driver.get(site)
   state_list.append(state)
   get_table = driver.find_element(By.CSS_SELECTOR, 'table.table-mob')
   tbtable = get_table.find_element(By.CSS_SELECTOR, 'tbody')
   tables = tbtable.find_elements(By.CSS_SELECTOR, 'tr')
   prices = tables[0].find_elements(By.CSS_SELECTOR, 'td')
   regular_current.append(prices[1].text.replace('$', ''))
   midgrade_current.append(prices[2].text.replace('$', ''))
   premium_current.append(prices[3].text.replace('$', ''))
   diesel_current.append(prices[4].text.replace('$', ''))

Create a dataframe and save csv

df = pd.DataFrame({
   'state': state_list,
   'regular': regular_current,   
   'midgrade': midgrade_current,  
   'premium': premium_current,    
   'diesel': diesel_current,
})
df.to_csv(f'data/car-data/gas-prices/gas-prices-{date.today()}.csv', index=False)
driver.close()

5. Getting the current electricity rates

This website has electricity prices for every state, so it shall be scraped!

Click to view python code

from selenium import webdriver
from selenium.webdriver.common.by import By
from datetime import date
import pandas as pd

driver = webdriver.Firefox()
driver.implicitly_wait(10)

state = []
current_rate = []
last_year_rate = []

Scraping the website

link = 'https://www.chooseenergy.com/electricity-rates-by-state/'
driver.get(link)
table = driver.find_element(By.CSS_SELECTOR, 'table') # gets the first table, the one we need
rows = table.find_elements(By.CSS_SELECTOR, 'tr')
rows = rows[1:]
for row in rows:
   column = row.find_elements(By.CSS_SELECTOR, 'td')
   state.append(column[0].text.replace(' ¢/kWh', '').replace('¢/kWh', ''))
   current_rate.append(column[1].text.replace(' ¢/kWh', '').replace('¢/kWh', ''))
   last_year_rate.append(column[2].text.replace(' ¢/kWh', '').replace('¢/kWh', ''))

for i, item in enumerate(state):
   if item == 'U.S.':
       state[i] = 'National'

Creating a dataframe and saving it to a csv file

df = pd.DataFrame({
   'state': state,
   'current_rate': current_rate,   
   'last_year_rate': last_year_rate
})
df.to_csv(f'data/car-data/electricity-prices/ele-prices-{date.today()}.csv', index=False)
driver.close()

6. Process all the data collected

Now that we have all the data we need for our analysis, we can process it and combine everything into one dataset. The price per mile for each vehicle in the dataset is calculated from the prices of the fuel source in each state.

For calculating the annual fuel costs, the following calculations were used:

For Gas and Diesel Vehicles:

From the gas prices dataset, we have the price of the grade of gasoline/diesel (regular, mid-grade, or premium) in dollars per gallon. From the combined vehicle dataset from step 3, we have the MPG of the vehicle and the grade of gasoline/diesel it uses. We can calculate the price per mile for each car in each state in the US using:

We can then multiply this by the number of miles driven in a year to get the annual fuel cost for gas and diesel vehicles.

For Electric Vehicles:

We have the price of electricity in each state in cents per kWh(s), and from the combined vehicle dataset, we know the energy used (kWh) for electric cars per 100 miles (160.93 km). We can then use this to calculate the price per mile for EVs in each state:

We can then multiply the price per mile by miles driven per year to get the annual fuel cost for the vehicle.

For Hybrid vehicles:

A hybrid vehicle can be charged by electricity (plug-in Hybrid) or be charged by the ICE engine (called Hybrid in this project). The calculations change depending on whether the vehicle can be charged by the grid or not:

Plug-in Hybrids:

The first step is to divide the number of miles driven in a year by the number of days in a year (365). This gives the number of miles driven per day. The dataset contains a range column, which gives us the number of miles a plug-in hybrid vehicle can be driven off electricity alone. If the range available is less than the number of miles driven per day, the remaining miles use gasoline. To calculate the annual fuel costs:

Electricity costs:

Gas costs:

For Hybrids:

Hybrid vehicle annual fuel costs are calculated like a gas/diesel vehicle

The price per mile is then multiplied by the number of miles driven in a year to get the annual fuel cost.

The final saved dataset is created in a form suitable for analysis in tableau.

Click to view python code

import os
import csv
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from datetime import datetime, date
pd.set_option('display.max_columns', 100)

Get the latest gas prices from datasets

df = pd.read_csv('data/car-data/combined-mpg-msrp.csv', index_col=0)
gas_dir, _, gas_price_list = next(os.walk('data/car-data/gas-prices/'))
gas_dates = []
for file in gas_price_list:
   gas_dates.append(file.strip('gas-prices-').strip('.csv'))
   
gas_date_latest = sorted(gas_dates, key = lambda d: datetime.strptime(d, '%Y-%m-%d'), reverse=True)[0]

ele_dir, _, ele_price_list = next(os.walk('data/car-data/electricity-prices/'))
ele_dates = []
for file in ele_price_list:
   ele_dates.append(file.strip('ele-prices-').strip('.csv'))
   
ele_date_latest = sorted(ele_dates, key = lambda d: datetime.strptime(d, '%Y-%m-%d'), reverse=True)[0]

Set the number of miles driven each year the EPA estimates that the average miles driven per person per year are 15000

miles_per_year = 15000 # EPA estimate
years_of_ownership = 11 # Number or years of ownership we want to calculate for

The electricity prices dataset has fewer states than the gas prices dataset, we need to find those extra states and remove them

gp = pd.read_csv(gas_dir+'gas-prices-'+gas_date_latest+'.csv', index_col='state')
ep = pd.read_csv(ele_dir+'ele-prices-'+ele_date_latest+'.csv', index_col='state')
b = list(gp.index)
a = list(ep.index)
for item in b:
   if item not in a:
       print(item)

District of Columbia

Puerto Rico

District of Columbia and Puerto Rico are not in the electricity dataset so removing from the gas_prices dataset

gp.drop(index=['District of Columbia', 'Puerto Rico'], inplace=True)
states = list(gp.index)

Looking for outliers

fig = px.scatter(df, x='make', y='msrp_average')
fig.update_layout(title_text='All cars Make vs MSRP', title_x=0.5)
fig.show()

gas_df = df[(df['atv_type'] =='Gas') | (df['atv_type'] =='FFV')].copy()
fig = px.scatter(gas_df, x='make', y='msrp_average')
fig.update_layout(title_text='ICE Cars Make vs MSRP', title_x=0.5)
fig.show()

ele_df = df[(df['atv_type'] =='EV')].copy()
fig = px.scatter(ele_df, x='make', y='msrp_average')
fig.update_layout(title_text='Electric Cars Make vs MSRP', title_x=0.5)
fig.show()

hyb_df = df[(df['atv_type'] =='Hybrid') | (df['atv_type'] =='Plug-in Hybrid')].copy()
fig = px.scatter(hyb_df, x='make', y='msrp_average')
fig.update_layout(title_text='Hybrid Cars Make vs MSRP', title_x=0.5)
fig.show()

Looking at the scatter plots of gas, electric, and hybrid vehicles, we can see that most vehicles are under $200K with some outliers above that. Additionally, a lot of the high-end automotive manufacturers only produce gas vehicles which can skew the results. So let's remove very high-end manufacturers like Ferrari, Lamborghini, Bentley, Rolls-Royce, Maybach, Aston Martin, and McLaren.

Additionally, I would think that if somebody is about to buy a Ferrari, he is not gonna ask what mileage it gets at the dealership

clean_df = df[(df['make'] != 'Ferrari') & (df['make'] != 'Bentley') & (df['make'] != 'Lamborghini') & (df['make'] != 'Rolls-Royce') & (df['make'] != 'Maybach') & 
             (df['make'] != 'Aston Martin') & (df['make'] != 'McLaren Automotive') & (df['make'] != 'Maserati') & (df['msrp_average'] < 200000)]

Final scatterplot for all vehicle types

fig = px.scatter(clean_df, x='make', y='msrp_average')
fig.update_layout(title_text='All cars Make vs MSRP', title_x=0.5)
fig.show()

Creating a new csv file

path1 = f'data/car-data/tableau/cars-{date.today()}.csv'
if os.path.isfile(path1):
   os.remove(path1)
   
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'w') as f1:
   f1writer = csv.writer(f1)
   f1writer.writerow(['year_ownership', 'year', 'make', 'model', 'config', 'atv_type', 'msrp', 'state', 'mpg/e', 'annual_fuel_cost', 'cumm_cost'])

Gas vehicle calculations

Assume we only buy the car on December 31st of the current year and don't spend anything on fuel the first year

def gas_diesel(df, gp, state, miles_per_year, years_of_ownership, f1writer):
   grade = df.fuel_type1.split()[0].lower()
   price = gp[grade][state]
   mpg = df.comb_mpg
   cost = df.msrp_average
   year = int(df.year)
   make = df.make
   model = df.model
   config = df.config
   if df.atv_type == 'FFV': # Considering FFV vehicles as gas vehicles since we are ignoring the alternative fuel
       atv_type = 'Gas'
   else:
       atv_type = df.atv_type
   msrp = df.msrp_average
   annual_fuel_costs = price/mpg*miles_per_year
   mpg_ret = str(mpg)+ ' mpg'
   for i in range(years_of_ownership):
       f1writer.writerow([i, year, make, model, config, atv_type, msrp, state, mpg_ret, annual_fuel_costs, msrp+annual_fuel_costs*i])

Adding the ICE Vehicle calculations to the csv file

gas_diesel_df = df[(df['atv_type'] == 'Gas') | (df['atv_type'] == 'Diesel') | (df['atv_type'] == 'FFV')]
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'a') as f1:
   f1writer = csv.writer(f1)
   for state in states:
       gas_diesel_df.apply(gas_diesel, axis=1, args=(gp, state, miles_per_year, years_of_ownership, f1writer))

Electric vehicle calculations

def electric(df, ep, req_rate, state, miles_per_year, years_of_ownership, f1writer):
   price = ep[req_rate][state] # cents per KWh
   price = price/100 # dollars per KWh
   mpe = df.combE # combined electricity consumption in KWh per 100 miles
   mpe = mpe/100 # KWh per mile
   cost = df.msrp_average
   year = int(df.year)
   make = df.make
   model = df.model
   config = df.config
   atv_type = df.atv_type
   msrp = df.msrp_average
   annual_fuel_costs = price*mpe*miles_per_year
   mpg_ret = str(mpe*100) + ' KWh/100miles'
   for i in range(years_of_ownership):
       f1writer.writerow([i, year, make, model, config, atv_type, msrp, state, mpg_ret, annual_fuel_costs, msrp+annual_fuel_costs*i])

Adding the Electric Vehicle calculations to the csv file

req_rate = 'current_rate' # Take the current rate
ele_df = df[(df['atv_type'] == 'EV')]
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'a') as f1:
   f1writer = csv.writer(f1)
   for state in states:
       ele_df.apply(electric, axis=1, args=(ep, req_rate, state, miles_per_year, years_of_ownership, f1writer))

Hybrid vehicle calculations

For plug-in hybrids, we can use the following methodology:

Using the range of the vehicle, and the kWh per mile, we calculate how much it costs to fully charge the vehicle with electricity through the grid
Next, we know how many miles are traveled per year (we have taken it to be 15000 as specified by the EPA), so we can calculate the number of miles driven per day by dividing by 365 (eg. 15000/365=41.096)
We can subtract the hybrid vehicle range from this number, the balance will be the amount driven by the gasoline engine. The fuel costs for this can then be calculated.

def hybrid(df, gp, ep, req_rate, state, miles_per_year, years_of_ownership, f1writer):
   price_ele = ep[req_rate][state] # cents per KWh
   price_ele = price_ele/100 # dollars per KWh
   grade = df.fuel_type1.split()[0].lower()
   price_gas = gp[grade][state]    
   cost = df.msrp_average
   year = int(df.year)
   make = df.make
   model = df.model
   config = df.config
   atv_type = df.atv_type
   msrp = df.msrp_average
   if df.atv_type == 'Plug-in Hybrid':
       miles_per_day = miles_per_year/365
       rangeA = float(df.rangeA)
       gas_engine_miles = miles_per_day - rangeA
       if gas_engine_miles < 0:
           gas_engine_miles = 0
           rangeA = miles_per_day # take miles traveled in a day instead of car range when calculating daily costs
           
       # cost to charge = range * combined electricity consumption / 100 = KWh for full charge
       mpe = df.combE # combined electricity consumption in KWh per 100 miles
       mpe = mpe/100 # KWh per mile
       full_charge_cost = rangeA*mpe*price_ele

       # cost of running the rest using gas
       mpg = df.comb_mpg
       gpm = 1/mpg
       gas_costs = gpm*gas_engine_miles*price_gas
       annual_fuel_costs = (full_charge_cost + gas_costs)*365 # price per day * 365 days in a year
       mpg_return = str(mpe*100) + ' KWh/100 miles + ' + str(mpg) + ' mpg'
       
   elif df.atv_type == 'Hybrid':
       mpg = df.comb_mpg
       annual_fuel_costs = price_gas/mpg*miles_per_year
       mpg_return = mpg
   else:
       print('error')
   
   for i in range(years_of_ownership):
       f1writer.writerow([i, year, make, model, config, atv_type, msrp, state, mpg_return, annual_fuel_costs, msrp+annual_fuel_costs*i])

Adding the Hybrid Vehicle calculations to the csv file

hbd_df = df[(df['atv_type'] == 'Plug-in Hybrid') | (df['atv_type'] == 'Hybrid')]
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'a') as f1:
   f1writer = csv.writer(f1)
   for state in states:
       hbd_df.apply(hybrid, axis=1, args=(gp, ep, req_rate, state, miles_per_year, years_of_ownership, f1writer))

Some interesting information:

df11 = pd.read_csv(f'data/car-data/tableau/cars-{date.today()}.csv')
df11.groupby('atv_type').median()

atv_type	year_ownership	year	msrp	annual_fuel_cost	cumm_cost
Diesel	5.0	2015.0	40806.25	2353.593750	49292.317335
EV	5.0	2019.0	44450.00	726.015000	48278.370223
Gas	5.0	2016.0	40000.00	2403.246772	52668.665374
Hybrid	5.0	2018.0	41820.00	1662.857581	49046.851910
Plug-in Hybrid	5.0	2019.0	51500.00	1493.719735	57608.160000

Results

Visualizations are done using tableau

1. Years of Ownership vs Fuel type

Plug-in Hybrids have the highest median MSRP followed by EVs, Hybrids, Diesel, and Gas vehicles. After 10 years of ownership, plug-in Hybrids still have the highest cumulative ownership costs followed very closely by Gas, then Diesel, Hybrids, and finally EVs. These results mainly stay the same for each state but you can use the drop-down filter on the bottom left of the visualization to check for different states.

2. Annual Fuel Costs by State

Use the filter on the bottom right to select the fuel type and view the median annual fuel costs for the fuel type by state. The states with the 5 highest and 5 lowest annual fuel costs are shown at the bottom.

3. The Difference in Annual Fuel Costs (Gas - Electricity)

This graph shows the difference between the median annual fuel costs for gas-powered vehicles (considering the fuel grade the vehicle uses), and the median annual fuel costs for electric vehicles. The higher the difference, the better the state to own the electric vehicle. A higher difference indicates that there is a higher cost associated with filling gas at the pump compared to charging at home. The current best state to own an electric car is Washington, due to the high gas prices and very low electricity rates. The current worst state to own an electric car is New Hampshire because of low gas prices and extremely high electricity rates.

Conclusion

This project shows the advantages of owning an electric car or other alternative power trains over the long run. While buying a new electric vehicle is generally more expensive, over the long run the amount we would save on fuel costs would eventually make it cheaper to own. With the current electricity rates and gas prices, electric cars would be cheaper to own in less than 3 years (see graph 1). These figures depend on the state in which you would operate the vehicle. Graph 3 shows that the best state to own an electric vehicle is Washington, where it would be cheaper compared to gas-powered options in under 2 years (with current rates), while it would take about 4 years in New Hampshire. Graph 1 also shows that there isn't much benefit to owning a plug-in hybrid vehicle, and a normal Hybrid which uses a gas engine to charge the battery might be more beneficial over the long run.

Graph 2 is helpful as it considers the efficiency of an electric vehicle, and uses MPG and Fuel grade for gasoline vehicles to determine the annual fuel cost. This makes it more representative of the fuel costs you would be expected to pay every year rather than just having a map of fuel prices and electricity prices in each state.

Limitations

Does not consider maintenance costs – Electric cars are cheaper to maintain as they have fewer moving parts, but they would require a battery replacement at around 70,000–150,000 miles at an average price of $5000 (source).
For plug-in hybrids, this project assumes that the car runs completely on electricity and later on gas when the battery is drained. In actuality, most plug-in hybrids run on 'charge depletion mode' where the battery and engine work in tandem to provide power.