In [13]:
import math
import pandas as pd
1. What's the probability when n = 10?
In [7]:
1-(364/365)**((15*14)/2)
Out[7]:
0.25028790861398265
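The expression in the cell above can be read as a pairwise approximation. Assuming 365 equally likely birthdays and treating every pair of people as independent, each of the $\binom{n}{2}$ pairs fails to match with probability $364/365$. The exponent $15 \cdot 14 / 2 = 105$ corresponds to $n = 15$:

```latex
P(n) \approx 1 - \left(\frac{364}{365}\right)^{\binom{n}{2}}
     = 1 - \left(\frac{364}{365}\right)^{\frac{n(n-1)}{2}},
\qquad
P(15) \approx 1 - \left(\frac{364}{365}\right)^{105} \approx 0.2503
```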
2. What's the probability when n is 15?
In [ ]:
3. Implement the birthday_probability function
In [9]:
def birthday_probability(number_of_people):
    result = 1-(364/365)**((number_of_people*(number_of_people-1))/2)
    return result
print(birthday_probability(15))
0.25028790861398265
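The function above treats all pairs as independent, which is an approximation. As a rough sketch for comparison, the version below also computes the exact probability under the usual assumptions (365 equally likely birthdays, no leap day). The names `birthday_probability_approx` and `birthday_probability_exact` are hypothetical and not part of the original notebook.

```python
import math

def birthday_probability_approx(n):
    # Same pairwise approximation as birthday_probability above
    return 1 - (364 / 365) ** (n * (n - 1) / 2)

def birthday_probability_exact(n):
    # math.perm(365, n) counts the ways n people can all have distinct
    # birthdays; it is 0 when n > 365, so the probability becomes 1.
    return 1 - math.perm(365, n) / 365 ** n

# Print both values side by side for a few group sizes
for n in (10, 15, 23):
    print(n, birthday_probability_approx(n), birthday_probability_exact(n))
```

For small groups the two values stay close, so the approximation is usually adequate for the team-sized groups analysed in this notebook.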
NBA Birthday Paradox Analysis
In [14]:
df = pd.read_csv('nba_2017.csv', parse_dates=['Birth Date'])
In [15]:
df.head()
Out[15]:
| | Player | Pos | Age | Team | Birth Date |
|---|---|---|---|---|---|
| 0 | Alex Abrines | SG | 23.0 | Oklahoma City Thunder | 1993-08-01 |
| 1 | Quincy Acy | PF | 26.0 | Dallas Mavericks | 1990-10-06 |
| 2 | Quincy Acy | PF | 26.0 | Brooklyn Nets | 1990-10-06 |
| 3 | Steven Adams | C | 23.0 | Oklahoma City Thunder | 1993-07-20 |
| 4 | Arron Afflalo | SG | 31.0 | Sacramento Kings | 1985-10-15 |
4. Create the Birthday column
In [16]:
df['Birth Date'].dt.strftime("%Y-%m-%d").head()
Out[16]:
0    1993-08-01
1    1990-10-06
2    1990-10-06
3    1993-07-20
4    1985-10-15
Name: Birth Date, dtype: object
In [18]:
df["Birthday"] = df['Birth Date'].dt.strftime("%m-%d")
df
Out[18]:
| | Player | Pos | Age | Team | Birth Date | Birthday |
|---|---|---|---|---|---|---|
| 0 | Alex Abrines | SG | 23.0 | Oklahoma City Thunder | 1993-08-01 | 08-01 |
| 1 | Quincy Acy | PF | 26.0 | Dallas Mavericks | 1990-10-06 | 10-06 |
| 2 | Quincy Acy | PF | 26.0 | Brooklyn Nets | 1990-10-06 | 10-06 |
| 3 | Steven Adams | C | 23.0 | Oklahoma City Thunder | 1993-07-20 | 07-20 |
| 4 | Arron Afflalo | SG | 31.0 | Sacramento Kings | 1985-10-15 | 10-15 |
| ... | ... | ... | ... | ... | ... | ... |
| 546 | Cody Zeller | PF | 24.0 | Charlotte Hornets | 1992-10-05 | 10-05 |
| 547 | Tyler Zeller | C | 27.0 | Boston Celtics | 1990-01-17 | 01-17 |
| 548 | Stephen Zimmerman | C | 20.0 | Orlando Magic | 1996-09-09 | 09-09 |
| 549 | Paul Zipser | SF | 22.0 | Chicago Bulls | 1994-02-18 | 02-18 |
| 550 | Ivica Zubac | C | 19.0 | Los Angeles Lakers | 1997-03-18 | 03-18 |
551 rows × 6 columns
Interlude: Combinatorics
For this project, you're free to use any technique you prefer to find how many players share a birthday on a given team. One recommendation is to use combinatorics, specifically combinations, via the itertools.combinations function. Here's a quick example. Suppose we have these samples:
| Name | Birthday |
|---|---|
| John | March 5th |
| Mary | Sept 20th |
| Rob | March 5th |
Using combinations, we can take the samples in pairs (r=2) and compare them:
| Person 1 | Person 2 |
|---|---|
| John | Mary |
| John | Rob |
| Mary | Rob |
Using Python:
In [20]:
from itertools import combinations
In [ ]:
names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]
In [ ]:
# Note: we need to wrap it in a list to force display
list(combinations(names, 2))
In [ ]:
# Note: we need to wrap it in a list to force display
list(combinations(birthdays, 2))
We can see that one pair (John and Rob) has matching dates: both were born on March 5th. Using pandas:
In [ ]:
names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
names_df
In [ ]:
birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
birthdays_df
Combining it:
In [ ]:
df_concat = pd.concat([names_df, birthdays_df], axis=1)
In [ ]:
df_concat
In [ ]:
df_concat['Birthday 1'] == df_concat['Birthday 2']
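The same comparison can be sketched in plain Python, without pandas, by zipping each name with its birthday before forming the pairs. This is only an illustrative alternative; the variable names `people` and `shared` are assumptions introduced here.

```python
from itertools import combinations

names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]

# Pair each name with its birthday so both travel together
people = list(zip(names, birthdays))

# Keep only the pairs whose birthdays are equal
shared = [
    (name_a, name_b, bday_a)
    for (name_a, bday_a), (name_b, bday_b) in combinations(people, 2)
    if bday_a == bday_b
]
print(shared)  # [('John', 'Rob', 'March 5th')]
```

Zipping first keeps names and birthdays aligned by construction, so there is no need to concatenate two separately built tables.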
End of the interlude! Now, it's your turn to answer questions.
Activities
5. How many pairs of players share a birthday for the Atlanta Hawks?
In [53]:
# Names and birthdays of the Atlanta Hawks players
names = list(df.loc[df["Team"] == "Atlanta Hawks", "Player"])
birthdays = list(df.loc[df["Team"] == "Atlanta Hawks", "Birthday"])
# Build every pair of players and the matching pair of birthdays
names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
birthday_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
df_concat = pd.concat([names_df, birthday_df], axis=1)
# True counts the pairs that share a birthday
df_pair = (df_concat["Birthday 1"] == df_concat["Birthday 2"]).value_counts()
df_pair
Out[53]:
False    228
True       3
Name: count, dtype: int64
6. How many pairs of players share a birthday in the Cleveland Cavaliers?
In [70]:
# Names and birthdays of the Cleveland Cavaliers players
names = list(df.loc[df["Team"] == "Cleveland Cavaliers", "Player"])
birthdays = list(df.loc[df["Team"] == "Cleveland Cavaliers", "Birthday"])
# Build every pair of players and the matching pair of birthdays
names_df = pd.DataFrame(combinations(names, 2), columns=["Player 1", "Player 2"])
birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
df_concat = pd.concat([names_df, birthdays_df], axis=1)
# True counts the pairs that share a birthday
result = (df_concat["Birthday 1"] == df_concat["Birthday 2"]).value_counts()
result
Out[70]:
False    230
True       1
Name: count, dtype: int64
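In both results above the counts add up to 231 comparisons, which is C(22, 2), so each team list holds 22 rows. As a possible cross-check that avoids building every pair, one can group equal birthdays and add up C(k, 2) for each group of size k. The helper name `count_shared_pairs` is hypothetical, and the sample list is toy data rather than the NBA file.

```python
from collections import Counter
from math import comb

def count_shared_pairs(birthdays):
    """Count pairs of entries with equal birthdays (hypothetical helper)."""
    # A birthday that appears k times contributes C(k, 2) matching pairs
    counts = Counter(birthdays)
    return sum(comb(k, 2) for k in counts.values())

sample = ["03-05", "09-20", "03-05"]  # toy data, not taken from the CSV
print(count_shared_pairs(sample))
print(comb(22, 2))  # total number of pairs among 22 rows
```

Assuming the `Birthday` column built earlier, passing `df.loc[df["Team"] == "Atlanta Hawks", "Birthday"]` to this helper should reproduce the `True` count, because `Counter` accepts any iterable of values.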
7. In the Dallas Mavericks, who shares a birthday with J.J. Barea?
In [118]:
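# Names and birthdays of the Dallas Mavericks players
names = list(df.loc[df["Team"] == "Dallas Mavericks", "Player"])
# Drop the entry at position 2 (presumably J.J. Barea himself) so only teammates remain
names.pop(2)
birthdays = list(df.loc[df["Team"] == "Dallas Mavericks", "Birthday"])
birthdays.pop(2)
birthday_df = pd.DataFrame(birthdays, columns=["Birthday"])
names_df = pd.DataFrame(names, columns=["Name"])
df_concat = pd.concat([names_df, birthday_df], axis=1)
# Keep the teammates born on 06-26, the date being matched for J.J. Barea
result = df_concat.loc[df_concat["Birthday"] == '06-26']
result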
Out[118]:
| | Name | Birthday |
|---|---|---|
| 22 | Deron Williams | 06-26 |
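The cell above relies on a hard-coded list position and a hard-coded date. A minimal sketch of a lookup that derives both from the player's name is shown below. The helper `birthday_twins` is hypothetical, and the three-row `roster` is a toy subset built from values that appear in this notebook (J.J. Barea's 06-26 is taken from the filter used above).

```python
def birthday_twins(roster, player):
    """Return teammates whose birthday matches `player`'s (hypothetical helper)."""
    # Look up the player's own birthday by name instead of by position
    birthday_of = dict(roster)
    target = birthday_of[player]
    # Keep everyone else with the same month-day string
    return [name for name, bday in roster if bday == target and name != player]

roster = [
    ("Quincy Acy", "10-06"),
    ("J.J. Barea", "06-26"),
    ("Deron Williams", "06-26"),
]
print(birthday_twins(roster, "J.J. Barea"))  # ['Deron Williams']
```

Looking the birthday up by name keeps the code valid if the row order of the CSV changes.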