To EV or Not to EV
Introduction
I usually do most of my random pondering about life either while driving or in the shower. On a particular day on the I-95, I was whining to myself about the amount I spend on gas on my beautiful old gas guzzler, a 2010 BMW 328xi. She gives me about 21 MPG on the highway and elegantly requests only the finest of fuels from the premium aisle. I love my car but like Tiger Woods off the golf course, I started wondering about life on the other side. Electric cars are easier on the wallet in the long run but by how much?
Methodology
1. Get the Miles Per Gallon data
The EPA website has a downloadable CSV file on almost all cars made since 1984. The dataset includes almost all the relevant information about MPG for ICE (Internal combustion engine) vehicles, kWh for electric vehicles, and a combination of the two for hybrid vehicles explained here. We also get information on the type of fuel a gas-powered or hybrid vehicle uses, i.e. whether the vehicle requires regular, mid-grade, or premium gasoline.
Click to view python code
import pandas as pd
import numpy as np
df = pd.read_csv('data/car-data/vehicles.csv', usecols=['year', 'make', 'model', 'barrels08', 'barrelsA08', 'city08', 'city08U', 'cityA08', 'cityA08U', 'cityCD', 'cityE', 'cityUF',
'co2TailpipeGpm', 'co2TailpipeAGpm', 'comb08', 'comb08U', 'combA08', 'combA08U', 'combE', 'combinedCD', 'combinedUF', 'cylinders',
'displ', 'drive', 'fuelCost08', 'fuelCostA08', 'fuelType', 'fuelType1', 'fuelType2', 'highway08', 'highway08U', 'highwayA08', 'highwayA08U', 'highwayCD',
'highwayE', 'highwayUF', 'phevBlended', 'rangeA', 'rangeCityA', 'rangeHwyA', 'trany', 'VClass', 'phevCity',
'phevHwy', 'phevComb', 'atvType'])
`
Data Legend:
- year - model year
- make - manufacturer (division)
- model - model name (carline)
- VClass - EPA vehicle size class
- atvType - type of alternative fuel or advanced technology vehicle
- cityCD - city gasoline consumption (gallons/100 miles) in charge depleting mode
- cityE - city electricity consumption in kw-hrs/100 miles
- comb08U - unrounded combined MPG for fuelType1
- combA08U - unrounded combined MPG for fuelType2
- combE - combined electricity consumption in kw-hrs/100 miles
- combinedCD - combined gasoline consumption (gallons/100 miles) in charge depleting mode
- highway08U - unrounded highway MPG for fuelType1
- highwayA08U - unrounded highway MPG for fuelType2
- highwayCD - highway gasoline consumption (gallons/100miles) in charge depleting mode
- highwayE - highway electricity consumption in kw-hrs/100 miles
- phevBlended - if true, this vehicle operates on a blend of gasoline and electricity in charge depleting mode
- rangeA - EPA range for fuelType2
- rangeCityA - EPA city range for fuelType2
- rangeHwyA - EPA highway range for fuelType2
- barrels08 - annual petroleum consumption in barrels for fuelType1
- barrelsA08 - annual petroleum consumption in barrels for fuelType2
- co2TailpipeAGpm - tailpipe CO2 in grams/mile for fuelType2
- co2TailpipeGpm- tailpipe CO2 in grams/mile for fuelType1
- fuelType - fuel type with fuelType1 and fuelType2 (if applicable)
- fuelType1 - fuel type 1. For single fuel vehicles, this will be the only fuel. For dual fuel vehicles, this will be the conventional fuel.
- fuelType2 - fuel type 2. For dual fuel vehicles, this will be the alternative fuel (e.g. E85, Electricity, CNG, LPG). For single fuel vehicles, this field is not used
Let's remove vehicles that run on CNG, Natural gas, or propane.
Why? Firstly because getting these rates is a little harder, and involves a few extra steps. We would probably have to scrape a few different websites to get this data. But the most important reason- most consumers are not going to be looking for these types of vehicles. Vehicles that run on CNG, Natural gas, or propane are mainly used in commercial applications or older vehicles
to_drop = df[(df['fuelType2'] == 'Natural Gas') | (df['fuelType2'] == 'Propane') | (df['fuelType1'] == 'Natural Gas')].index
df.drop(index = to_drop, axis=1, inplace=True)
df.reset_index(drop=True, inplace=True)
### Some vehicles do not have unrounded MPG values. For those vehicles, the unrounded MPG values are set to 0. This section will set those empty values to the rounded figures.
df['city08U'] = np.where(df['city08U'] == 0, df['city08'], df['city08U'])
df['cityA08U'] = np.where(df['cityA08U'] == 0, df['cityA08'], df['cityA08U'])
df['comb08U'] = np.where(df['comb08U'] == 0, df['comb08'], df['comb08U'])
df['combA08U'] = np.where(df['combA08U'] == 0, df['combA08'], df['combA08U'])
df['highway08U'] = np.where(df['highway08U'] == 0, df['highway08'], df['highway08U'])
df['highwayA08U'] = np.where(df['highwayA08U'] == 0, df['highwayA08'], df['highwayA08U'])
`
Saving results to csv
df.to_csv('data/car-data/vehicles-cleaned.csv')
2. Get the MSRP data
Unfortunately, the above dataset does not have information about the price of each vehicle. To get this data, we have to scrape the fueleconomy.gov website. I used the selenium python package for this, as I find that the commonly used requests python package has a lot of issues when it comes to dynamic websites. The selenium package also launches a browser window as it scrapes through webpages, and that helps a lot with debugging.
Click to view python code
import os
import csv
import pandas as pd
import numpy as np
from datetime import date
from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.implicitly_wait(10)
category = []
car_name = []
config = []
per100mls = []
driver_mpg = []
annual_fuel_cost = []
msrp = []
energy_impact_score = []
greenhouse_emissions = []
combined_mpgf1 = []
mpg_unitsf1 = []
city_mpgf1 = [] # class is same, first city then highway
hwy_mpgf1 = []
elegasmiles = [] # gas + electric miles. this contains all ele miles as well
totalrange = [] # total car range for hybrids
phevgascomb = [] # combined mpg (city + hwy) for hybrids
phevgasper100 = [] # combined gal/100 miles (city + hwy) for hybrids
fuel_type = []
Code for scraping the website
def get_mpg_data(cat):
category = []
car_name = []
config = []
fuel_type = []
per100mls = []
driver_mpg = []
annual_fuel_cost = []
msrp = []
energy_impact_score = []
greenhouse_emissions = []
combined_mpgf1 = []
mpg_unitsf1 = []
city_mpgf1 = []
hwy_mpgf1 = []
elegasmiles = [] # gas + electric miles. this contains all ele miles as well
totalrange = [] # total car range for hybrids
phevgascomb = [] # combined mpg (city + hwy) for hybrids
phevgasper100 = [] # combined gal/100 miles (city + hwy) for hybrids
page = driver.find_element(By.CSS_SELECTOR, 'table.cars.display.responsive.stickyHeader')
soup = BeautifulSoup(page.get_attribute('innerHTML'), 'lxml')
all_cards = soup.find_all('tr', class_='ymm-row')
count = 0
for card in all_cards:
count += 1
if count == 199:
count = 0
category.append(cat) # append car category
car_name.append(card.find('a').text) # append car name
conf = card.find('span', class_='config').text # get car configuration
conf = conf.split(',')
ft = conf.pop(-1) # last part of conf has fuel type
fuel_type.append(ft) # Todo: add this to output
s = '-'.join(conf) # creating string from list
config.append(s)
card = card.next_sibling.next_sibling
mpgepa = card.find('td', class_='mpg-epa')
if 'Not Available' in mpgepa.text or 'Currently unavailable' in mpgepa.text:
print(f'skipping {car_name[-1]} because it does not have MPG information')
category = category[:-1]
car_name = car_name[:-1]
config = config[:-1]
fuel_type = fuel_type[:-1]
continue
if 'and Electricity' in ft:
r1 = card.findNext('td', class_='range phev')
elegasmiles.append(r1.find('div', class_='cdRange').text)
totalrange.append(r1.find('div', class_='phevRange').text)
fuel2 = mpgepa.find('div', class_='fuel2')
phevgascomb.append(fuel2.find('td', class_='mpg-comb').text)
phevgasper100.append(fuel2.find('td', class_='cons').text)
elif 'or Electricity' in ft:
r1 = card.findNext('td', class_='range phev')
elegasmiles.append(r1.find('div', class_='cdRange').text)
totalrange.append(r1.find('div', class_='phevRange').text)
fuel2 = mpgepa.find('div', class_='fuel2')
phevgascomb.append(fuel2.find('td', class_='mpg-comb').text)
phevgasper100.append(fuel2.find('td', class_='cons').text)
else:
elegasmiles.append(np.nan)
totalrange.append(np.nan)
phevgascomb.append(np.nan)
phevgasper100.append(np.nan)
combined_mpgf1.append(mpgepa.find('td', class_='mpg-comb').text)
mpg_unitsf1.append(mpgepa.find('td', class_='unitsLabel').text)
ctyhwyf1 = mpgepa.find_all('td', class_='ctyhwy')
city_mpgf1.append(ctyhwyf1[0].text) # class is same, first city then highway
hwy_mpgf1.append(ctyhwyf1[1].text)
per100mls.append(mpgepa.find('td', class_='cons').text)
driver_mpg.append(card.find('td', class_='mpg-user').text.replace('\t', '').replace('\n', ''))
annual_fuel_cost.append(card.find('td', class_='ann-fuel-cost').text.replace('\t', '').replace('\n',
''))
msrp.append(card.find('td', class_='msrp').text.replace('\t', '').replace('\n', ''))
energy_impact_score.append(card.find('td', class_='energy').text.replace('\n', ''))
greenhouse_emissions.append(card.find('td', class_='ghg').text.replace('\n', ''))
rules = [len(category) == len(car_name), # checks if all lengths are equal
len(car_name) == len(config),
len(config) == len(per100mls),
len(per100mls) == len(driver_mpg),
len(driver_mpg) == len(annual_fuel_cost),
len(annual_fuel_cost) == len(msrp),
len(msrp) == len(energy_impact_score),
len(energy_impact_score) == len(greenhouse_emissions),
len(greenhouse_emissions) == len(combined_mpgf1),
len(combined_mpgf1) == len(mpg_unitsf1),
len(mpg_unitsf1) == len(city_mpgf1),
len(city_mpgf1) == len(hwy_mpgf1),
len(hwy_mpgf1) == len(elegasmiles),
len(elegasmiles) == len(totalrange),
len(totalrange) == len(phevgascomb),
len(phevgascomb) == len(phevgasper100),
len(phevgasper100) == len(fuel_type)
]
if not all(rules):
print('Error, data lengths not equal')
return category, car_name, config, per100mls, driver_mpg, annual_fuel_cost, msrp, energy_impact_score, \
greenhouse_emissions, combined_mpgf1, mpg_unitsf1, city_mpgf1, hwy_mpgf1, elegasmiles, totalrange, \
phevgascomb, phevgasper100, fuel_type
Running the scraper for different categories of vehicles
site_dict = {
# 'test': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcfamilySedans=Family+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Family+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200'
'Small Cars': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcsmallCars=Small+Cars&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Small+Cars&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Family Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcfamilySedans=Family+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Family+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Upscale Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcupscaleSedans=Upscale+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Upscale+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Luxury Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcluxurySedans=Luxury+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Luxury+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Large Sedans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmclargeSedans=Large+Sedans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Large+Sedans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Hatchbacks': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmchatchbacks=Hatchbacks&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Hatchbacks&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Coupes': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcCoupes=Coupes&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Coupes&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Convertibles': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcconvertibles=Convertibles&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Convertibles&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Sports/Sporty Cars': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcSports_Sporty_Cars=Sports%2FSporty+Cars&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Sports%2FSporty+Cars&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Station Wagons': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcstationWagons=Station+Wagons&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Station+Wagons&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Pickup Trucks': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcpickupTrucks=Pickup+Trucks&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Pickup+Trucks&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'SUVs': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcsportUtilityVehicles=Sport+Utility+Vehicles&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Sport+Utility+Vehicles&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200',
'Minivans': 'https://www.fueleconomy.gov/feg/PowerSearch.do?action=PowerSearch&year1=2010&year2=2023&cbmcminivans=Minivans&minmsrpsel=0&maxmsrpsel=0&city=0&highway=0&combined=0&cbftreggasoline=Regular+Gasoline&cbftmidgasoline=Midgrade+Gasoline&cbftprmgasoline=Premium+Gasoline&cbftdiesel=Diesel&cbftelectricity=Electricity&YearSel=2010-2023&MakeSel=&MarClassSel=Minivans&FuelTypeSel=Regular+Gasoline%2C+Midgrade+Gasoline%2C+Premium+Gasoline%2C+Diesel%2C+Electricity&VehTypeSel=&TranySel=&DriveTypeSel=&CylindersSel=&MpgSel=000&sortBy=Comb&Units=&url=SearchServlet&opt=new&minmsrp=0&maxmsrp=0&minmpg=0&maxmpg=0&sCharge=&tCharge=&startstop=&cylDeact=&rowLimit=200'
}
for cat, link in site_dict.items(): # Loops through the different category pages
print(cat)
driver.get(link)
sleep(5) # Have to add this otherwise selenium breaks once in a while
i = 0
while True:
i += 1
print(f'Scraping page {i} from {cat}')
mpg_data = get_mpg_data(cat) # This function scrapes the data
category += mpg_data[0]
car_name += mpg_data[1]
config += mpg_data[2]
per100mls += mpg_data[3]
driver_mpg += mpg_data[4]
annual_fuel_cost += mpg_data[5]
msrp += mpg_data[6]
energy_impact_score += mpg_data[7]
greenhouse_emissions += mpg_data[8]
combined_mpgf1 += mpg_data[9]
mpg_unitsf1 += mpg_data[10]
city_mpgf1 += mpg_data[11]
hwy_mpgf1 += mpg_data[12]
elegasmiles += mpg_data[13]
# allelemiles += mpg_data[14]
totalrange += mpg_data[14]
phevgascomb += mpg_data[15]
phevgasper100 += mpg_data[16]
fuel_type += mpg_data[17]
try:
driver.find_element(By.CSS_SELECTOR, 'a[aria-label="Next"]').click()
except ElementClickInterceptedException: # If selenium does not find the next button, signaling that there is no next page, the loop breaks and
break
Saving the data
headers = ['category', 'car_name', 'config', 'fuel_type', 'per100mls', 'driver_mpg', 'annual_fuel_cost', 'msrp',
'energy_impact_score', 'greenhouse_emissions', 'combined_mpgf1', 'mpg_unitsf1', 'city_mpgf1', 'hwy_mpgf1',
'elegasmiles', 'totalrange', 'phevgascomb', 'phevgasper100']
with open(f'data/car-data/{str(date.today())}.csv', 'w') as file:
writer = csv.writer(file)
writer.writerow(headers)
for i in range(len(car_name)):
writer.writerow([category[i], car_name[i], config[i], fuel_type[i], per100mls[i],
driver_mpg[i], annual_fuel_cost[i], msrp[i],
energy_impact_score[i],
greenhouse_emissions[i],
combined_mpgf1[i], mpg_unitsf1[i], city_mpgf1[i], hwy_mpgf1[i], elegasmiles[i], totalrange[i],
phevgascomb[i], phevgasper100[i]])
3. Combine the two datasets
Now that we have information on the MPG of each vehicle and the price of the respective vehicle, we can combine this into one dataset. We then need to standardize the data from both datasets to make the text from both sources comparable.
Next, we can match different attributes of each vehicle like make, model, year, config, etc., and save the resulting data in a csv
After matching, data on about 12,340 vehicles manufactured from 2010 on-wards were gathered. Out of this:
- 10,843 vehicles are gas powered
- 98 vehicles are powered by Diesel
- 347 are EVs
- 787 are Hybrids, and 265 are Plug-in Hybrids
Click to view python code
import os
import re
import csv
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)
Prepare the MSRP dataset
msrpdf = pd.read_csv('data/car-data/2022-11-09.csv')
# Removing formating, '$' signs and ','
msrpdf['msrp'] = msrpdf['msrp'].apply(lambda x: x.replace('\xa0', ' ').replace('$', '').replace(',', '').strip())
msrpdf = msrpdf[msrpdf['msrp'] != ''].reset_index(drop=True)
msrpdf['fuel_type'] = msrpdf['fuel_type'].apply(lambda x: x.strip())
`
def parse_msrp(df):
'''
Function to convert msrp from strings to int/float, also computes average. If a range of values are given, average is calculated
else, single msrp is set into the average column
'''
text = df['msrp']
regex = re.compile(r'\d+')
matches = regex.findall(text)
if len(matches) == 2:
low_end = int(matches[0])
high_end = int(matches[1])
average = (high_end + low_end)/2
elif len(matches) == 1:
low_end = np.nan
high_end = np.nan
average = int(matches[0])
else:
raise Exception('Error in parsing MSRP')
return low_end, high_end, average
def parse_model(df):
text = df['car_name']
split_text = text.split()
txt_modifier = 2
year = split_text[0]
make = split_text[1]
if make in ['Land', 'Aston', 'Alfa', 'Roush', 'McLaren', 'CODA']:
make += ' ' + split_text[2]
txt_modifier = 3
model = ''
for i in range(len(split_text) - txt_modifier):
model += split_text[i+txt_modifier] + ' '
model = model.strip()
return year, make, model
def parse_trany(df):
text = df['config']
regex = re.compile(r'(Automatic[a-zA-Z0-9\(\- ]*\)|Automatic[a-zA-Z0-9\(\- ]*-spd|Manual[a-zA-Z0-9\(\- ]*-spd)')
matches = regex.findall(text)
if len(matches) == 1:
trany = matches[0]
elif len(matches) == 0:
print(f'Error - could not find trany: {text}')
trany = np.nan
elif len(mathces) > 1:
raise Exception('Error in parsing displacement - too many options')
regex = re.compile(r'\d?\d.\d L')
matches = regex.findall(text)
if len(matches) == 1:
displacement = matches[0].replace(' L', '')
elif len(matches) == 0:
displacement = np.nan
elif len(mathces) > 1:
raise Exception('Error in parsing displacement - too many options')
regex = re.compile(r'\d?\d cyl')
matches = regex.findall(text)
if len(matches) == 1:
cylinders = matches[0].replace(' cyl', '')
elif len(matches) == 0:
cylinders = np.nan
elif len(mathces) > 1:
raise Exception('Error in parsing displacement - too many options')
return trany, displacement, cylinders
msrpdf[['msrp_low', 'msrp_high', 'msrp_average']] = msrpdf.apply(parse_msrp, axis=1, result_type ='expand')
msrpdf[['year', 'make', 'model']] = msrpdf.apply(parse_model, axis=1, result_type ='expand')
msrpdf[['trany', 'displacement', 'cylinders']] = msrpdf.apply(parse_trany, axis=1, result_type ='expand')
# Probably could have done the below line at the import, but I wasn't sure which columns I would require until later
msrpdf = msrpdf[['category', 'year', 'make', 'model', 'config', 'trany', 'displacement', 'cylinders', 'msrp_low', 'msrp_high', 'msrp_average', 'fuel_type']].copy()
msrpdf['cylinders'] = msrpdf['cylinders'].astype('string')
msrpdf['year'] = msrpdf['year'].astype('int')
msrpdf['model'] = msrpdf['model'].apply(lambda x: x.replace(' ', ' ').replace(' inch', 'in').replace('inch', 'in').replace('-inch', 'in').replace('21-in',
'21in').replace('22-in', '22in').replace('wheels', 'Wheels'))
Preparing the MPG data
mpgdf = pd.read_csv('data/car-data/vehicles-cleaned.csv', index_col=0)
mpgdf = mpgdf[['year', 'make', 'model', 'highway08U', 'highwayCD', 'highwayE', 'city08U', 'cityCD', 'cityE',
'comb08U', 'combinedCD', 'combE', 'trany', 'displ', 'cylinders', 'atvType', 'fuelType1', 'fuelType2', 'rangeA']].copy()
mpgdf['displ'] = mpgdf['displ'].astype('string') # converting to string since we will need to compare it later, comparing floats may create errors
mpgdf['cylinders'] = mpgdf['cylinders'].astype('string')
mpgdf['model'] = mpgdf['model'].apply(lambda x: x.replace(' ', ' ').replace('-inch', 'in').replace('wheels', 'Wheels').replace('MUSTANG MACH', 'Mustang Mach'))
Matching the common attributes between the two dataframes
def compare_datasets(df1, df2): # Could have made this code more readable, but I spent waaaaay too much time on it
'''
first compare datasets for the same year, make, and model
if multiple results are found, compare datasets for the same transmission
if multiple results are found, compare datasets for the same engine displacement and number of cylinders
if multiple reults are found, compare datasets for the same fuel type
'''
year = df1.year
make = df1.make
model = df1.model
trany = df1.trany
displacement = df1.displacement
cylinders = df1.cylinders
fuel_type = df1.fuel_type
matches = df2[(df2['year'] == year) & (df2['make'] == make) & (df2['model'] == model)]
if len(matches) == 1:
category = df1.category
config = df1.config
msrp_low = df1.msrp_low
msrp_high = df1.msrp_high
msrp_average = df1.msrp_average
highway_mpg = matches.highway08U.values[0]
highwayCD = matches.highwayCD.values[0]
highwayE = matches.highwayE.values[0]
city_mpg = matches.city08U.values[0]
cityCD = matches.cityCD.values[0]
cityE = matches.cityE.values[0]
comb_mpg = matches.comb08U.values[0]
combCD = matches.combinedCD.values[0]
combE = matches.combE.values[0]
atv_type = matches.atvType.values[0]
fuel_type1 = matches.fuelType1.values[0]
fuel_type2 = matches.fuelType2.values[0]
rangeA = matches.rangeA.values[0]
return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
elif len(matches) > 1:
matches2 = matches[matches['trany'] == trany]
if len(matches2) == 1:
category = df1.category
config = df1.config
msrp_low = df1.msrp_low
msrp_high = df1.msrp_high
msrp_average = df1.msrp_average
highway_mpg = matches2.highway08U.values[0]
highwayCD = matches2.highwayCD.values[0]
highwayE = matches2.highwayE.values[0]
city_mpg = matches2.city08U.values[0]
cityCD = matches2.cityCD.values[0]
cityE = matches2.cityE.values[0]
comb_mpg = matches2.comb08U.values[0]
combCD = matches2.combinedCD.values[0]
combE = matches2.combE.values[0]
atv_type = matches2.atvType.values[0]
fuel_type1 = matches2.fuelType1.values[0]
fuel_type2 = matches2.fuelType2.values[0]
rangeA = matches2.rangeA.values[0]
return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
elif len(matches2) > 1:
matches3 = matches2[(matches2['displ'] == displacement) & (matches2['cylinders'] == cylinders)]
if len(matches3) == 1:
category = df1.category
config = df1.config
msrp_low = df1.msrp_low
msrp_high = df1.msrp_high
msrp_average = df1.msrp_average
highway_mpg = matches3.highway08U.values[0]
highwayCD = matches3.highwayCD.values[0]
highwayE = matches3.highwayE.values[0]
city_mpg = matches3.city08U.values[0]
cityCD = matches3.cityCD.values[0]
cityE = matches3.cityE.values[0]
comb_mpg = matches3.comb08U.values[0]
combCD = matches3.combinedCD.values[0]
combE = matches3.combE.values[0]
atv_type = matches3.atvType.values[0]
fuel_type1 = matches3.fuelType1.values[0]
fuel_type2 = matches3.fuelType2.values[0]
rangeA = matches3.rangeA.values[0]
return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
elif len(matches3) > 1:
matches4 = matches3[matches3['fuelType1'] == fuel_type]
if len(matches4) == 1:
category = df1.category
config = df1.config
msrp_low = df1.msrp_low
msrp_high = df1.msrp_high
msrp_average = df1.msrp_average
highway_mpg = matches4.highway08U.values[0]
highwayCD = matches4.highwayCD.values[0]
highwayE = matches4.highwayE.values[0]
city_mpg = matches4.city08U.values[0]
cityCD = matches4.cityCD.values[0]
cityE = matches4.cityE.values[0]
comb_mpg = matches4.comb08U.values[0]
combCD = matches4.combinedCD.values[0]
combE = matches4.combE.values[0]
atv_type = matches4.atvType.values[0]
fuel_type1 = matches4.fuelType1.values[0]
fuel_type2 = matches4.fuelType2.values[0]
rangeA = matches4.rangeA.values[0]
return category, year, make, model, config, msrp_low, msrp_high, msrp_average, highway_mpg, highwayCD, \
highwayE, city_mpg, cityCD, cityE, comb_mpg, combCD, combE, \
trany, displacement, cylinders, atv_type, fuel_type1, fuel_type2, rangeA
elif len(matches4) > 1:
with open('data/car-data/msc/multiple-matches.csv', 'a') as f1:
f1writer = csv.writer(f1)
f1writer.writerow([df1.name, df1.category, year, make, model])
elif len(matches4) == 0:
with open('data/car-data/msc/could-not-match.csv', 'a') as f3:
f3writer = csv.writer(f3)
f3writer.writerow([df1.name, df1.category, year, make, model])
elif len(matches3) == 0:
with open('data/car-data/msc/could-not-match.csv', 'a') as f3:
f3writer = csv.writer(f3)
f3writer.writerow([df1.name, df1.category, year, make, model])
elif len(matches2) == 0:
with open('data/car-data/msc/could-not-match.csv', 'a') as f3:
f3writer = csv.writer(f3)
f3writer.writerow([df1.name, df1.category, year, make, model])
elif len(matches) == 0:
with open('data/car-data/msc/no-matches.csv', 'a') as f2:
f2writer = csv.writer(f2)
f2writer.writerow([df1.name, df1.category, year, make, model])
Running the matching algorithm and saving the vehicles that were not able to be matched
path1 = 'data/car-data/msc/multiple-matches.csv'
if os.path.isfile(path1):
os.remove(path1)
path2 = 'data/car-data/msc/no-matches.csv'
if os.path.isfile(path2):
os.remove(path2)
with open('data/car-data/msc/multiple-matches.csv', 'w') as f1:
f1writer = csv.writer(f1)
f1writer.writerow(['index_in_msepdf', 'category', 'year', 'make', 'model'])
with open('data/car-data/msc/no-matches.csv', 'w') as f2:
f2writer = csv.writer(f2)
f2writer.writerow(['index_in_msepdf', 'category', 'year', 'make', 'model'])
combdf = pd.DataFrame()
combdf = msrpdf.apply(compare_datasets, axis=1, args=(mpgdf,), result_type='expand')
combdf.columns = ['category', 'year', 'make', 'model', 'config', 'msrp_low', 'msrp_high', 'msrp_average', 'highway_mpg', 'highwayCD',
'highwayE', 'city_mpg', 'cityCD', 'cityE', 'comb_mpg', 'combCD', 'combE', 'trany', 'displacement', 'cylinders', 'atv_type', 'fuel_type1', 'fuel_type2', 'rangeA']
combdf.dropna(how='all', inplace=True)
combdf = combdf.reset_index(drop=True)
combdf['atv_type'] = combdf.atv_type.fillna('Gas')
Saving the final dataset
combdf.to_csv('data/car-data/combined-mpg-msrp.csv')
4. Getting the current gas prices
The AAA website has gas prices for each state and every fuel grade. The prices are updated daily as well.
Click to view python code
from selenium import webdriver
from selenium.webdriver.common.by import By
from datetime import date
import pandas as pd
driver = webdriver.Firefox()
driver.implicitly_wait(10)
state_list = []
regular_current = []
midgrade_current = []
premium_current = []
diesel_current = []
states_dict = {
'AK': 'Alaska',
'AL': 'Alabama',
'AR': 'Arkansas',
'AZ': 'Arizona',
'CA': 'California',
'CO': 'Colorado',
'CT': 'Connecticut',
'DC': 'District of Columbia',
'DE': 'Delaware',
'FL': 'Florida',
'GA': 'Georgia',
'HI': 'Hawaii',
'IA': 'Iowa',
'ID': 'Idaho',
'IL': 'Illinois',
'IN': 'Indiana',
'KS': 'Kansas',
'KY': 'Kentucky',
'LA': 'Louisiana',
'MA': 'Massachusetts',
'MD': 'Maryland',
'ME': 'Maine',
'MI': 'Michigan',
'MN': 'Minnesota',
'MO': 'Missouri',
'MS': 'Mississippi',
'MT': 'Montana',
'NC': 'North Carolina',
'ND': 'North Dakota',
'NE': 'Nebraska',
'NH': 'New Hampshire',
'NJ': 'New Jersey',
'NM': 'New Mexico',
'NV': 'Nevada',
'NY': 'New York',
'OH': 'Ohio',
'OK': 'Oklahoma',
'OR': 'Oregon',
'PA': 'Pennsylvania',
'PR': 'Puerto Rico',
'RI': 'Rhode Island',
'SC': 'South Carolina',
'SD': 'South Dakota',
'TN': 'Tennessee',
'TX': 'Texas',
'UT': 'Utah',
'VA': 'Virginia',
'VT': 'Vermont',
'WA': 'Washington',
'WI': 'Wisconsin',
'WV': 'West Virginia',
'WY': 'Wyoming'
}
Scraping the AAA website
link = 'https://gasprices.aaa.com/'
driver.get(link)
table = driver.find_element(By.CSS_SELECTOR, 'table.table-mob')
state_list.append('National')
prices = table.find_element(By.CSS_SELECTOR, 'tbody').find_elements(By.CSS_SELECTOR, 'td')
regular_current.append(prices[1].text.replace('$', ''))
midgrade_current.append(prices[2].text.replace('$', ''))
premium_current.append(prices[3].text.replace('$', ''))
diesel_current.append(prices[4].text.replace('$', ''))
for abr, state in states_dict.items():
site = link + f'?state={abr}'
driver.get(site)
state_list.append(state)
get_table = driver.find_element(By.CSS_SELECTOR, 'table.table-mob')
tbtable = get_table.find_element(By.CSS_SELECTOR, 'tbody')
tables = tbtable.find_elements(By.CSS_SELECTOR, 'tr')
prices = tables[0].find_elements(By.CSS_SELECTOR, 'td')
regular_current.append(prices[1].text.replace('$', ''))
midgrade_current.append(prices[2].text.replace('$', ''))
premium_current.append(prices[3].text.replace('$', ''))
diesel_current.append(prices[4].text.replace('$', ''))
Create a dataframe and save csv
df = pd.DataFrame({
'state': state_list,
'regular': regular_current,
'midgrade': midgrade_current,
'premium': premium_current,
'diesel': diesel_current,
})
df.to_csv(f'data/car-data/gas-prices/gas-prices-{date.today()}.csv', index=False)
driver.close()
5. Getting the current electricity rates
This website has electricity prices for every state, so it shall be scraped!
Click to view python code
from selenium import webdriver
from selenium.webdriver.common.by import By
from datetime import date
import pandas as pd
driver = webdriver.Firefox()
driver.implicitly_wait(10)
state = []
current_rate = []
last_year_rate = []
Scraping the website
link = 'https://www.chooseenergy.com/electricity-rates-by-state/'
driver.get(link)
table = driver.find_element(By.CSS_SELECTOR, 'table') # gets the first table, the one we need
rows = table.find_elements(By.CSS_SELECTOR, 'tr')
rows = rows[1:]
for row in rows:
column = row.find_elements(By.CSS_SELECTOR, 'td')
state.append(column[0].text.replace(' ¢/kWh', '').replace('¢/kWh', ''))
current_rate.append(column[1].text.replace(' ¢/kWh', '').replace('¢/kWh', ''))
last_year_rate.append(column[2].text.replace(' ¢/kWh', '').replace('¢/kWh', ''))
for i, item in enumerate(state):
if item == 'U.S.':
state[i] = 'National'
Creating a dataframe and saving it to a csv file
df = pd.DataFrame({
'state': state,
'current_rate': current_rate,
'last_year_rate': last_year_rate
})
df.to_csv(f'data/car-data/electricity-prices/ele-prices-{date.today()}.csv', index=False)
driver.close()
6. Process all the data collected
Now that we have all the data we need for our analysis, we can process it and combine everything into one dataset. The price per mile for each vehicle in the dataset is calculated from the prices of the fuel source in each state.
For calculating the annual fuel costs, the following calculations were used:
For Gas and Diesel Vehicles:
From the gas prices dataset, we have the price of the grade of gasoline/diesel (regular, mid-grade, or premium) in dollars per gallon. From the combined vehicle dataset from step 3, we have the MPG of the vehicle and the grade of gasoline/diesel it uses. We can calculate the price per mile for each car in each state in the US using:
We can then multiply this by the number of miles driven in a year to get the annual fuel cost for gas and diesel vehicles.
For Electric Vehicles:
We have the price of electricity in each state in cents per kWh(s), and from the combined vehicle dataset, we know the energy used (kWh) for electric cars per 100 miles (160.93 km). We can then use this to calculate the price per mile for EVs in each state:
We can then multiply the price per mile by miles driven per year to get the annual fuel cost for the vehicle.
For Hybrid vehicles:
A hybrid vehicle can be charged by electricity (plug-in Hybrid) or be charged by the ICE engine (called Hybrid in this project). The calculations change depending on whether the vehicle can be charged by the grid or not:
Plug-in Hybrids:
The first step is to divide the number of miles driven in a year by the number of days in a year (365). This gives the number of miles driven per day. The dataset contains a range column, which gives us the number of miles a plug-in hybrid vehicle can be driven off electricity alone. If the range available is less than the number of miles driven per day, the remaining miles use gasoline. To calculate the annual fuel costs:
Electricity costs:
Gas costs:
For Hybrids:
Hybrid vehicle annual fuel costs are calculated like a gas/diesel vehicle
The price per mile is then multiplied by the number of miles driven in a year to get the annual fuel cost.
The final saved dataset is created in a form suitable for analysis in tableau.
Click to view python code
import os
import csv
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from datetime import datetime, date
pd.set_option('display.max_columns', 100)
Get the latest gas prices from datasets
df = pd.read_csv('data/car-data/combined-mpg-msrp.csv', index_col=0)
gas_dir, _, gas_price_list = next(os.walk('data/car-data/gas-prices/'))
gas_dates = []
for file in gas_price_list:
gas_dates.append(file.strip('gas-prices-').strip('.csv'))
gas_date_latest = sorted(gas_dates, key = lambda d: datetime.strptime(d, '%Y-%m-%d'), reverse=True)[0]
ele_dir, _, ele_price_list = next(os.walk('data/car-data/electricity-prices/'))
ele_dates = []
for file in ele_price_list:
ele_dates.append(file.strip('ele-prices-').strip('.csv'))
ele_date_latest = sorted(ele_dates, key = lambda d: datetime.strptime(d, '%Y-%m-%d'), reverse=True)[0]
Set the number of miles driven each year the EPA estimates that the average miles driven per person per year are 15000
miles_per_year = 15000 # EPA estimate
years_of_ownership = 11 # Number or years of ownership we want to calculate for
The electricity prices dataset has fewer states than the gas prices dataset, we need to find those extra states and remove them
gp = pd.read_csv(gas_dir+'gas-prices-'+gas_date_latest+'.csv', index_col='state')
ep = pd.read_csv(ele_dir+'ele-prices-'+ele_date_latest+'.csv', index_col='state')
b = list(gp.index)
a = list(ep.index)
for item in b:
if item not in a:
print(item)
District of Columbia
Puerto Rico
District of Columbia and Puerto Rico are not in the electricity dataset so removing from the gas_prices dataset
gp.drop(index=['District of Columbia', 'Puerto Rico'], inplace=True)
states = list(gp.index)
Looking for outliers
fig = px.scatter(df, x='make', y='msrp_average')
fig.update_layout(title_text='All cars Make vs MSRP', title_x=0.5)
fig.show()
gas_df = df[(df['atv_type'] =='Gas') | (df['atv_type'] =='FFV')].copy()
fig = px.scatter(gas_df, x='make', y='msrp_average')
fig.update_layout(title_text='ICE Cars Make vs MSRP', title_x=0.5)
fig.show()
ele_df = df[(df['atv_type'] =='EV')].copy()
fig = px.scatter(ele_df, x='make', y='msrp_average')
fig.update_layout(title_text='Electric Cars Make vs MSRP', title_x=0.5)
fig.show()
hyb_df = df[(df['atv_type'] =='Hybrid') | (df['atv_type'] =='Plug-in Hybrid')].copy()
fig = px.scatter(hyb_df, x='make', y='msrp_average')
fig.update_layout(title_text='Hybrid Cars Make vs MSRP', title_x=0.5)
fig.show()
Looking at the scatter plots of gas, electric, and hybrid vehicles, we can see that most vehicles are under $200K with some outliers above that. Additionally, a lot of the high-end automotive manufacturers only produce gas vehicles which can skew the results. So let's remove very high-end manufacturers like Ferrari, Lamborghini, Bentley, Rolls-Royce, Maybach, Aston Martin, and McLaren.
Additionally, I would think that if somebody is about to buy a Ferrari, he is not gonna ask what mileage it gets at the dealership
clean_df = df[(df['make'] != 'Ferrari') & (df['make'] != 'Bentley') & (df['make'] != 'Lamborghini') & (df['make'] != 'Rolls-Royce') & (df['make'] != 'Maybach') &
(df['make'] != 'Aston Martin') & (df['make'] != 'McLaren Automotive') & (df['make'] != 'Maserati') & (df['msrp_average'] < 200000)]
Final scatterplot for all vehicle types
fig = px.scatter(clean_df, x='make', y='msrp_average')
fig.update_layout(title_text='All cars Make vs MSRP', title_x=0.5)
fig.show()
Creating a new csv file
path1 = f'data/car-data/tableau/cars-{date.today()}.csv'
if os.path.isfile(path1):
os.remove(path1)
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'w') as f1:
f1writer = csv.writer(f1)
f1writer.writerow(['year_ownership', 'year', 'make', 'model', 'config', 'atv_type', 'msrp', 'state', 'mpg/e', 'annual_fuel_cost', 'cumm_cost'])
Gas vehicle calculations
Assume we only buy the car on December 31st of the current year and don't spend anything on fuel the first year
def gas_diesel(df, gp, state, miles_per_year, years_of_ownership, f1writer):
grade = df.fuel_type1.split()[0].lower()
price = gp[grade][state]
mpg = df.comb_mpg
cost = df.msrp_average
year = int(df.year)
make = df.make
model = df.model
config = df.config
if df.atv_type == 'FFV': # Considering FFV vehicles as gas vehicles since we are ignoring the alternative fuel
atv_type = 'Gas'
else:
atv_type = df.atv_type
msrp = df.msrp_average
annual_fuel_costs = price/mpg*miles_per_year
mpg_ret = str(mpg)+ ' mpg'
for i in range(years_of_ownership):
f1writer.writerow([i, year, make, model, config, atv_type, msrp, state, mpg_ret, annual_fuel_costs, msrp+annual_fuel_costs*i])
Adding the ICE Vehicle calculations to the csv file
gas_diesel_df = df[(df['atv_type'] == 'Gas') | (df['atv_type'] == 'Diesel') | (df['atv_type'] == 'FFV')]
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'a') as f1:
f1writer = csv.writer(f1)
for state in states:
gas_diesel_df.apply(gas_diesel, axis=1, args=(gp, state, miles_per_year, years_of_ownership, f1writer))
Electric vehicle calculations
def electric(df, ep, req_rate, state, miles_per_year, years_of_ownership, f1writer):
price = ep[req_rate][state] # cents per KWh
price = price/100 # dollars per KWh
mpe = df.combE # combined electricity consumption in KWh per 100 miles
mpe = mpe/100 # KWh per mile
cost = df.msrp_average
year = int(df.year)
make = df.make
model = df.model
config = df.config
atv_type = df.atv_type
msrp = df.msrp_average
annual_fuel_costs = price*mpe*miles_per_year
mpg_ret = str(mpe*100) + ' KWh/100miles'
for i in range(years_of_ownership):
f1writer.writerow([i, year, make, model, config, atv_type, msrp, state, mpg_ret, annual_fuel_costs, msrp+annual_fuel_costs*i])
Adding the Electric Vehicle calculations to the csv file
req_rate = 'current_rate' # Take the current rate
ele_df = df[(df['atv_type'] == 'EV')]
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'a') as f1:
f1writer = csv.writer(f1)
for state in states:
ele_df.apply(electric, axis=1, args=(ep, req_rate, state, miles_per_year, years_of_ownership, f1writer))
Hybrid vehicle calculations
For plug-in hybrids, we can use the following methodology:
- Using the range of the vehicle, and the kWh per mile, we calculate how much it costs to fully charge the vehicle with electricity through the grid
- Next, we know how many miles are traveled per year (we have taken it to be 15000 as specified by the EPA), so we can calculate the number of miles driven per day by dividing by 365 (eg. 15000/365=41.096)
- We can subtract the hybrid vehicle range from this number, the balance will be the amount driven by the gasoline engine. The fuel costs for this can then be calculated.
def hybrid(df, gp, ep, req_rate, state, miles_per_year, years_of_ownership, f1writer):
price_ele = ep[req_rate][state] # cents per KWh
price_ele = price_ele/100 # dollars per KWh
grade = df.fuel_type1.split()[0].lower()
price_gas = gp[grade][state]
cost = df.msrp_average
year = int(df.year)
make = df.make
model = df.model
config = df.config
atv_type = df.atv_type
msrp = df.msrp_average
if df.atv_type == 'Plug-in Hybrid':
miles_per_day = miles_per_year/365
rangeA = float(df.rangeA)
gas_engine_miles = miles_per_day - rangeA
if gas_engine_miles < 0:
gas_engine_miles = 0
rangeA = miles_per_day # take miles traveled in a day instead of car range when calculating daily costs
# cost to charge = range * combined electricity consumption / 100 = KWh for full charge
mpe = df.combE # combined electricity consumption in KWh per 100 miles
mpe = mpe/100 # KWh per mile
full_charge_cost = rangeA*mpe*price_ele
# cost of running the rest using gas
mpg = df.comb_mpg
gpm = 1/mpg
gas_costs = gpm*gas_engine_miles*price_gas
annual_fuel_costs = (full_charge_cost + gas_costs)*365 # price per day * 365 days in a year
mpg_return = str(mpe*100) + ' KWh/100 miles + ' + str(mpg) + ' mpg'
elif df.atv_type == 'Hybrid':
mpg = df.comb_mpg
annual_fuel_costs = price_gas/mpg*miles_per_year
mpg_return = mpg
else:
print('error')
for i in range(years_of_ownership):
f1writer.writerow([i, year, make, model, config, atv_type, msrp, state, mpg_return, annual_fuel_costs, msrp+annual_fuel_costs*i])
Adding the Hybrid Vehicle calculations to the csv file
hbd_df = df[(df['atv_type'] == 'Plug-in Hybrid') | (df['atv_type'] == 'Hybrid')]
with open(f'data/car-data/tableau/cars-{date.today()}.csv', 'a') as f1:
f1writer = csv.writer(f1)
for state in states:
hbd_df.apply(hybrid, axis=1, args=(gp, ep, req_rate, state, miles_per_year, years_of_ownership, f1writer))
Some interesting information:
df11 = pd.read_csv(f'data/car-data/tableau/cars-{date.today()}.csv')
df11.groupby('atv_type').median()
atv_type | year_ownership | year | msrp | annual_fuel_cost | cumm_cost |
---|---|---|---|---|---|
Diesel | 5.0 | 2015.0 | 40806.25 | 2353.593750 | 49292.317335 |
EV | 5.0 | 2019.0 | 44450.00 | 726.015000 | 48278.370223 |
Gas | 5.0 | 2016.0 | 40000.00 | 2403.246772 | 52668.665374 |
Hybrid | 5.0 | 2018.0 | 41820.00 | 1662.857581 | 49046.851910 |
Plug-in Hybrid | 5.0 | 2019.0 | 51500.00 | 1493.719735 | 57608.160000 |
Results
Visualizations are done using tableau
1. Years of Ownership vs Fuel type
Plug-in Hybrids have the highest median MSRP followed by EVs, Hybrids, Diesel, and Gas vehicles. After 10 years of ownership, plug-in Hybrids still have the highest cumulative ownership costs followed very closely by Gas, then Diesel, Hybrids, and finally EVs. These results mainly stay the same for each state but you can use the drop-down filter on the bottom left of the visualization to check for different states.
2. Annual Fuel Costs by State
Use the filter on the bottom right to select the fuel type and view the median annual fuel costs for the fuel type by state. The states with the 5 highest and 5 lowest annual fuel costs are shown at the bottom.
3. The Difference in Annual Fuel Costs (Gas - Electricity)
This graph shows the difference between the median annual fuel costs for gas-powered vehicles (considering the fuel grade the vehicle uses), and the median annual fuel costs for electric vehicles. The higher the difference, the better the state to own the electric vehicle. A higher difference indicates that there is a higher cost associated with filling gas at the pump compared to charging at home. The current best state to own an electric car is Washington, due to the high gas prices and very low electricity rates. The current worst state to own an electric car is New Hampshire because of low gas prices and extremely high electricity rates.
Conclusion
This project shows the advantages of owning an electric car or other alternative power trains over the long run. While buying a new electric vehicle is generally more expensive, over the long run the amount we would save on fuel costs would eventually make it cheaper to own. With the current electricity rates and gas prices, electric cars would be cheaper to own in less than 3 years (see graph 1). These figures depend on the state in which you would operate the vehicle. Graph 3 shows that the best state to own an electric vehicle is Washington, where it would be cheaper compared to gas-powered options in under 2 years (with current rates), while it would take about 4 years in New Hampshire. Graph 1 also shows that there isn't much benefit to owning a plug-in hybrid vehicle, and a normal Hybrid which uses a gas engine to charge the battery might be more beneficial over the long run.
Graph 2 is helpful as it considers the efficiency of an electric vehicle, and uses MPG and Fuel grade for gasoline vehicles to determine the annual fuel cost. This makes it more representative of the fuel costs you would be expected to pay every year rather than just having a map of fuel prices and electricity prices in each state.
Limitations
- Does not consider maintenance costs – Electric cars are cheaper to maintain as they have fewer moving parts, but they would require a battery replacement at around 70,000–150,000 miles at an average price of $5000 (source).
- For plug-in hybrids, this project assumes that the car runs completely on electricity and later on gas when the battery is drained. In actuality, most plug-in hybrids run on 'charge depletion mode' where the battery and engine work in tandem to provide power.