Analysis of US Elections from 1976 to 2010 with Pandas

Original article was published by Soner Yıldırım on Artificial Intelligence on Medium


The Ratio of the Votes for Each Winner

Some elections were decided by a narrow margin, while in others the winner was far ahead.

We can calculate each winner's share of the total vote. To do that, we will first add a “winner” column to our dataframe.

This Wikipedia page contains a list of presidents of the US. The tables can easily be read into a pandas dataframe using the read_html function. It converts the tables in the web page to a list of dataframes.

dfs = pd.read_html("https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States")
winners = dfs[1][['Election','President.1']]
winners.head()

The second dataframe contains the list of US presidents. We only need the data for elections from 1976 to 2016.

# Take an explicit copy so later assignments do not act on a slice of dfs[1]
winners = winners.iloc[-12:-1, :].copy()
winners['Election'] = winners['Election'].astype('int64')
winners = winners.rename(columns={'President.1':'winner'})
winners

We need the names in the same format as in the president dataframe. “Jimmy Carter” needs to be formatted as “Carter, Jimmy”. I will use the pandas string operations for this task:

# Split once on the last space: column 0 holds the first names, column 1 the last name
name_parts = winners.winner.str.rsplit(' ', n=1, expand=True)
first_name = name_parts[0]
last_name = name_parts[1]
winners['winner'] = last_name + ', ' + first_name
winners
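
To make the split easier to follow, here is a minimal sketch on a small hand-written series (the three names are illustrative and not read from the Wikipedia table). It shows that splitting on the last space handles two-part names directly, while a name with middle initials still needs a manual touch-up afterwards.

```python
import pandas as pd

# Illustrative names only; the real values come from the Wikipedia table
names = pd.Series(["Jimmy Carter", "Ronald Reagan", "George H. W. Bush"])

# rsplit with n=1 splits once, starting from the right, so the last word
# (the family name) lands in column 1 and everything before it in column 0
parts = names.str.rsplit(" ", n=1, expand=True)
formatted = parts[1] + ", " + parts[0]

print(formatted.tolist())
```

The third name comes out as “Bush, George H. W.”, which still differs from the “Bush, George H.W.” spelling used in the president dataframe.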

We need a few small adjustments so that the names of presidents exactly match.

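# Use .loc to avoid chained assignment, which may silently fail to update winners
winners.loc[73, 'winner'] = 'Bush, George H.W.'
winners.loc[78, 'winner'] = 'Obama, Barack H.'
winners.loc[79, 'winner'] = 'Obama, Barack H.'
winners.loc[80, 'winner'] = 'Trump, Donald J.'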
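
An alternative that does not depend on the row labels is to map the mismatched spellings to the target spellings. The sketch below uses a toy dataframe, and the left-hand spellings in the mapping are assumptions about what the split produces, so they should be checked against the actual output before use.

```python
import pandas as pd

# Toy stand-in for the winners dataframe; values are illustrative
winners = pd.DataFrame({
    "Election": [1988, 2008, 2012],
    "winner": ["Bush, George H. W.", "Obama, Barack", "Obama, Barack"],
})

# Hypothetical mapping from the spelling after the split to the spelling
# used in the president dataframe
fixes = {
    "Bush, George H. W.": "Bush, George H.W.",
    "Obama, Barack": "Obama, Barack H.",
}

# Series.replace with a dict swaps whole values that match a key exactly
winners["winner"] = winners["winner"].replace(fixes)
print(winners)
```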

Earlier, I converted the election years to integers so that they can serve as the key for the merge function in the next step.

We can now merge the “president” and “winners” dataframes based on the election year.

president = pd.merge(president, winners, left_on='year', right_on='Election')
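
As a rough illustration of what this merge does, the sketch below runs it on a made-up miniature of the two dataframes (the vote counts are invented, and only the column names follow the article). Every row of the left dataframe receives the winner of its election year, so the winner's name is repeated on each candidate's row.

```python
import pandas as pd

# Made-up miniature of the president dataframe (one state, two elections)
president = pd.DataFrame({
    "year": [1976, 1976, 1980, 1980],
    "state": ["Alabama"] * 4,
    "candidate": ["Carter, Jimmy", "Ford, Gerald", "Reagan, Ronald", "Carter, Jimmy"],
    "candidatevotes": [600, 400, 550, 450],
    "totalvotes": [1000, 1000, 1000, 1000],
})

winners = pd.DataFrame({
    "Election": [1976, 1980],
    "winner": ["Carter, Jimmy", "Reagan, Ronald"],
})

# The default inner join keeps rows whose year appears in both dataframes
merged = pd.merge(president, winners, left_on="year", right_on="Election")
print(merged[["year", "candidate", "winner"]])
```

Both key columns are integers here. If one of them held strings instead, the keys would not line up, which is why the type conversion matters.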

We will filter the president dataframe to include only votes for the winners.

winner_votes = president[president.candidate == president.winner]
winner_votes.head()

Each row contains the number of votes for the winner and the total votes cast in a particular state for a particular election. Grouping by year and winner and summing gives us the nationwide totals.

total_votes = winner_votes[['year','winner','candidatevotes','totalvotes']]\
.groupby(['year','winner']).sum()
total_votes

We can calculate each winner's share of the vote by dividing the winner's votes by the total votes, and then sort the results.

(total_votes.candidatevotes / total_votes.totalvotes)\
.sort_values(ascending=False)
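
The aggregation and the division can be checked by hand on a toy example. The sketch below uses invented vote counts for two states per election; only the column names follow the article.

```python
import pandas as pd

# Invented per-state rows for two winners (numbers are not real results)
winner_votes = pd.DataFrame({
    "year": [1984, 1984, 1992, 1992],
    "winner": ["Reagan, Ronald"] * 2 + ["Clinton, Bill"] * 2,
    "candidatevotes": [60, 120, 40, 90],
    "totalvotes": [100, 200, 100, 200],
})

# Sum the state-level counts to get one row per election
total_votes = (
    winner_votes
    .groupby(["year", "winner"])[["candidatevotes", "totalvotes"]]
    .sum()
)

# Divide the summed counts, then rank the elections by the winner's share
ratio = (total_votes["candidatevotes"] / total_votes["totalvotes"]).sort_values(ascending=False)
print(ratio)
```

Summing first and dividing afterwards weights each state by its number of voters. Averaging the per-state ratios instead would give small and large states equal weight and generally produce a different number.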

Ronald Reagan's election to his second term tops the list.