In [13]:
import math
import pandas as pd
1. What's the probability when n = 10?
In [ ]:
1-(364/365)**((10*9)/2)
2. What's the probability when n = 15?
In [7]:
1-(364/365)**((15*14)/2)
Out[7]:
0.25028790861398265
3. Implement the birthday_probability function
In [9]:
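def birthday_probability(number_of_people):
    # Number of distinct pairs among n people: n * (n - 1) / 2
    pairs = (number_of_people*(number_of_people-1))/2
    # Approximation: each pair independently has a 364/365 chance of NOT sharing a birthday
    result = 1-(364/365)**pairs
    return result
print(birthday_probability(15))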
0.25028790861398265
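The formula used above is an approximation: it treats each of the n(n-1)/2 pairs as an independent event. As a rough cross-check, the sketch below compares it with the exact product formula, assuming 365 equally likely birthdays and no leap years. The name exact_birthday_probability is hypothetical and not part of the original notebook.

```python
def birthday_probability(number_of_people):
    # Pairwise approximation used in this notebook
    pairs = number_of_people * (number_of_people - 1) / 2
    return 1 - (364 / 365) ** pairs


def exact_birthday_probability(number_of_people, days=365):
    # Hypothetical helper: exact probability that at least two people
    # share a birthday, assuming `days` equally likely birthdays.
    p_all_distinct = 1.0
    for k in range(number_of_people):
        # The (k + 1)-th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct


# Print n, the approximation, and the exact value side by side
for n in (10, 15, 23):
    print(n, round(birthday_probability(n), 4), round(exact_birthday_probability(n), 4))
```

The two values should stay close for small groups, which is why the simpler pairwise formula is a reasonable choice for this exercise.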

NBA Birthday Paradox Analysis

In [14]:
df = pd.read_csv('nba_2017.csv', parse_dates=['Birth Date'])
In [15]:
df.head()
Out[15]:
   Player         Pos  Age   Team                   Birth Date
0  Alex Abrines   SG   23.0  Oklahoma City Thunder  1993-08-01
1  Quincy Acy     PF   26.0  Dallas Mavericks       1990-10-06
2  Quincy Acy     PF   26.0  Brooklyn Nets          1990-10-06
3  Steven Adams   C    23.0  Oklahoma City Thunder  1993-07-20
4  Arron Afflalo  SG   31.0  Sacramento Kings       1985-10-15
4. Create the Birthday column
In [16]:
df['Birth Date'].dt.strftime("%Y-%m-%d").head()
Out[16]:
0    1993-08-01
1    1990-10-06
2    1990-10-06
3    1993-07-20
4    1985-10-15
Name: Birth Date, dtype: object
In [18]:
df["Birthday"] = df['Birth Date'].dt.strftime("%m-%d")
df
Out[18]:
     Player             Pos  Age   Team                   Birth Date  Birthday
0    Alex Abrines       SG   23.0  Oklahoma City Thunder  1993-08-01  08-01
1    Quincy Acy         PF   26.0  Dallas Mavericks       1990-10-06  10-06
2    Quincy Acy         PF   26.0  Brooklyn Nets          1990-10-06  10-06
3    Steven Adams       C    23.0  Oklahoma City Thunder  1993-07-20  07-20
4    Arron Afflalo      SG   31.0  Sacramento Kings       1985-10-15  10-15
...  ...                ...  ...   ...                    ...         ...
546  Cody Zeller        PF   24.0  Charlotte Hornets      1992-10-05  10-05
547  Tyler Zeller       C    27.0  Boston Celtics         1990-01-17  01-17
548  Stephen Zimmerman  C    20.0  Orlando Magic          1996-09-09  09-09
549  Paul Zipser        SF   22.0  Chicago Bulls          1994-02-18  02-18
550  Ivica Zubac        C    19.0  Los Angeles Lakers     1997-03-18  03-18

551 rows × 6 columns

Interlude: Combinatorics

For this project, you're free to use any technique you prefer to answer how many players share a birthday on a given team. One recommendation is to use combinatorics, specifically combinations, via the itertools.combinations function. Here's a quick example. Suppose we have these samples:

Name  Birthday
John  March 5th
Mary  Sept 20th
Rob   March 5th

Using combinations, we can take all the samples in pairs (r=2) to compare them:

Person 1  Person 2
John      Mary
John      Rob
Mary      Rob

Using Python:

In [20]:
from itertools import combinations
In [ ]:
names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]
In [ ]:
# Note: combinations returns a lazy iterator, so we wrap it in a list to display the pairs
list(combinations(names, 2))
In [ ]:
# Note: combinations returns a lazy iterator, so we wrap it in a list to display the pairs
list(combinations(birthdays, 2))

We can see that John and Rob share the same date, March 5th. Using Pandas:

In [ ]:
names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
names_df
In [ ]:
birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
birthdays_df

Combining it:

In [ ]:
df_concat = pd.concat([names_df, birthdays_df], axis=1)
In [ ]:
df_concat
In [ ]:
df_concat['Birthday 1'] == df_concat['Birthday 2']
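As an alternative to building two DataFrames and concatenating them, the same comparison can be sketched in plain Python by pairing each name with its birthday before taking combinations. This is only one possible approach; the variable names people and matching_pairs are illustrative and do not appear in the original notebook.

```python
from itertools import combinations

names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]

# Pair each name with its birthday, then take every 2-person combination
people = list(zip(names, birthdays))
matching_pairs = [
    (name_1, name_2)
    for (name_1, birthday_1), (name_2, birthday_2) in combinations(people, 2)
    if birthday_1 == birthday_2
]
print(matching_pairs)
```

Keeping the name and birthday together in one tuple avoids relying on the two DataFrames having their rows in the same order.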

End of the interlude! Now, it's your turn to answer questions.


Activities

5. How many pairs of players share a birthday for the Atlanta Hawks?
In [53]:
# Players and birthdays for the Atlanta Hawks
names = list(df.loc[df["Team"] == "Atlanta Hawks", "Player"])
birthdays = list(df.loc[df["Team"] == "Atlanta Hawks", "Birthday"])
# Every pair of players, alongside the matching pair of birthdays
names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
birthday_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
df_concat = pd.concat([names_df, birthday_df], axis=1)
# True counts the pairs whose birthdays match
df_pair = (df_concat["Birthday 1"] == df_concat["Birthday 2"]).value_counts()
df_pair
Out[53]:
False    228
True       3
Name: count, dtype: int64
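A count of matching pairs does not say how the matches are distributed: three pairs could come from three separate dates or from a single date shared by three players. One way to cross-check a pair count is to count players per date and sum C(c, 2) over the dates. The sketch below uses a small made-up roster (team_df is hypothetical, not taken from nba_2017.csv).

```python
import math

import pandas as pd

# Hypothetical mini roster, not taken from nba_2017.csv
team_df = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E"],
    "Birthday": ["03-05", "09-20", "03-05", "03-05", "09-20"],
})

# How many players fall on each date
players_per_date = team_df["Birthday"].value_counts()

# A date shared by c players contributes C(c, 2) matching pairs
shared_pairs = sum(math.comb(int(c), 2) for c in players_per_date)
print(shared_pairs)
```

Inspecting players_per_date directly also shows which dates are shared and by how many players.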
6. How many pairs of players share a birthday in the Cleveland Cavaliers?
In [70]:
names = list(df.loc[df["Team"] == "Cleveland Cavaliers", "Player"])
birthdays = list(df.loc[df["Team"] == "Cleveland Cavaliers", "Birthday"])
names_df = pd.DataFrame(combinations(names, 2), columns=["Player 1", "Player 2"])
birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
df_concat = pd.concat([names_df, birthdays_df], axis=1)
result = (df_concat["Birthday 1"] == df_concat["Birthday 2"]).value_counts()
result
Out[70]:
False    230
True       1
Name: count, dtype: int64
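Activities 5 and 6 repeat the same steps with a different team name, so the logic could be wrapped in a small function. The following is a minimal sketch, assuming a DataFrame with "Team" and "Birthday" columns like the one in this notebook; the function name count_shared_birthday_pairs and the sample data are hypothetical.

```python
from itertools import combinations

import pandas as pd


def count_shared_birthday_pairs(players_df, team):
    # Hypothetical helper: assumes "Team" and "Birthday" columns as in this notebook
    birthdays = players_df.loc[players_df["Team"] == team, "Birthday"]
    # Compare every pair of birthdays on the team and count the matches
    return sum(first == second for first, second in combinations(birthdays, 2))


# Hypothetical sample data standing in for the NBA DataFrame
sample_df = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Team": ["Team X", "Team X", "Team X", "Team Y"],
    "Birthday": ["03-05", "09-20", "03-05", "03-05"],
})
print(count_shared_birthday_pairs(sample_df, "Team X"))
```

With the real data, the call would take the notebook's df and a team name such as "Atlanta Hawks".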
7. In the Dallas Mavericks, who shares a birthday with J.J. Barea?
In [118]:
names = list(df.loc[df["Team"] == "Dallas Mavericks", "Player"])
# Drop position 2, presumably J.J. Barea himself, so he is not matched with his own birthday
names.pop(2)

birthdays = list(df.loc[df["Team"] == "Dallas Mavericks", "Birthday"])
birthdays.pop(2)
birthday_df = pd.DataFrame(birthdays, columns=["Birthday"])
names_df = pd.DataFrame(names, columns=["Name"])
df_concat = pd.concat([names_df, birthday_df], axis=1)
# '06-26' is the hardcoded target birthday (June 26), presumably J.J. Barea's
result = df_concat.loc[df_concat["Birthday"] == '06-26']
result
Out[118]:
    Name            Birthday
22  Deron Williams  06-26
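The cell above depends on a list position and a hardcoded date. A less brittle variant might look up the reference player's birthday from the data and exclude that player by name. This is a sketch under the assumption that the DataFrame has "Player", "Team" and "Birthday" columns; birthday_twins and the sample data are hypothetical names introduced for illustration.

```python
import pandas as pd


def birthday_twins(players_df, team, player):
    # Hypothetical helper: assumes "Player", "Team" and "Birthday" columns
    team_df = players_df.loc[players_df["Team"] == team]
    # Look up the reference player's birthday instead of hardcoding it
    # (raises IndexError if the player is not on the team)
    target = team_df.loc[team_df["Player"] == player, "Birthday"].iloc[0]
    same_day = team_df["Birthday"] == target
    not_self = team_df["Player"] != player
    return team_df.loc[same_day & not_self, ["Player", "Birthday"]]


# Hypothetical sample data standing in for the NBA DataFrame
sample_df = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Team": ["Team X", "Team X", "Team X", "Team Y"],
    "Birthday": ["06-26", "09-20", "06-26", "06-26"],
})
print(birthday_twins(sample_df, "Team X", "A"))
```

Because the filter keeps the original index, the row labels in the result refer back to the source DataFrame rather than to a rebuilt one.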

The End!