Indian Premier League (IPL).¶

  1. Importing required libraries
In [1]:
import pandas as pd
import numpy as np
  1. Load the IPL dataset ('matches.csv' and 'deliveries.csv') into your notebook. Once the dataset is loaded, explore it to gain a better understanding of its contents. Look at the variables included in the dataset, their data types, and the number of observations. Identify any missing data, outliers, or errors that may affect the analysis.
In [4]:
matches = pd.read_csv('matches.csv')
deli = pd.read_csv('deliveries.csv')
  1. Did you find any missing value in the deliveries dataset?
In [10]:
mis = deli.isnull().sum()*100/len(deli)
mis
Out[10]:
match_id             0.000000
inning               0.000000
batting_team         0.000000
bowling_team         0.000000
over                 0.000000
ball                 0.000000
batsman              0.000000
non_striker          0.000000
bowler               0.000000
is_super_over        0.000000
wide_runs            0.000000
bye_runs             0.000000
legbye_runs          0.000000
noball_runs          0.000000
penalty_runs         0.000000
batsman_runs         0.000000
extra_runs           0.000000
total_runs           0.000000
player_dismissed    95.066954
dismissal_kind      95.066954
fielder             96.399334
dtype: float64
  1. Compute the average runs scored by each team during IPL 2008 - 2019 for the First Batting
In [26]:
first_bat[first_bat.batting_team == 'Chennai Super Kings']
Out[26]:
match_id inning batting_team bowling_team over ball batsman non_striker bowler is_super_over ... bye_runs legbye_runs noball_runs penalty_runs batsman_runs extra_runs total_runs player_dismissed dismissal_kind fielder
14087 61 1 Chennai Super Kings Kings XI Punjab 1 1 PA Patel ML Hayden B Lee 0 ... 0 0 0 0 0 0 0 NaN NaN NaN
14088 61 1 Chennai Super Kings Kings XI Punjab 1 2 PA Patel ML Hayden B Lee 0 ... 0 0 0 0 0 0 0 NaN NaN NaN
14089 61 1 Chennai Super Kings Kings XI Punjab 1 3 PA Patel ML Hayden B Lee 0 ... 0 0 0 0 1 0 1 NaN NaN NaN
14090 61 1 Chennai Super Kings Kings XI Punjab 1 4 ML Hayden PA Patel B Lee 0 ... 0 0 0 0 0 0 0 NaN NaN NaN
14091 61 1 Chennai Super Kings Kings XI Punjab 1 5 ML Hayden PA Patel B Lee 0 ... 0 0 0 0 4 0 4 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
178221 11412 1 Chennai Super Kings Mumbai Indians 20 3 MS Dhoni AT Rayudu JJ Bumrah 0 ... 0 0 0 0 1 0 1 NaN NaN NaN
178222 11412 1 Chennai Super Kings Mumbai Indians 20 4 AT Rayudu MS Dhoni JJ Bumrah 0 ... 0 0 0 0 1 0 1 NaN NaN NaN
178223 11412 1 Chennai Super Kings Mumbai Indians 20 5 MS Dhoni AT Rayudu JJ Bumrah 0 ... 0 0 0 0 2 0 2 NaN NaN NaN
178224 11412 1 Chennai Super Kings Mumbai Indians 20 6 MS Dhoni AT Rayudu JJ Bumrah 0 ... 0 0 0 0 1 0 1 NaN NaN NaN
178225 11412 1 Chennai Super Kings Mumbai Indians 20 7 AT Rayudu MS Dhoni JJ Bumrah 0 ... 0 0 0 0 1 0 1 NaN NaN NaN

10974 rows × 21 columns

In [23]:
first_bat = deli[deli.inning == 1]
sum_ = first_bat.groupby('batting_team')['total_runs'].sum()
total = first_bat.groupby('batting_team')['match_id'].nunique()
avg = sum_ / total
avg
Out[23]:
batting_team
Chennai Super Kings            166.685393
Deccan Chargers                157.325581
Delhi Capitals                 169.833333
Delhi Daredevils               157.083333
Gujarat Lions                  161.928571
Kings XI Punjab                162.076923
Kochi Tuskers Kerala           144.142857
Kolkata Knight Riders          160.783133
Mumbai Indians                 166.475248
Pune Warriors                  148.650000
Rajasthan Royals               158.582090
Rising Pune Supergiant         163.000000
Rising Pune Supergiants        160.428571
Royal Challengers Bangalore    167.694118
Sunrisers Hyderabad            164.619048
dtype: float64
  1. Compute the mean, median and mode of the variable win_by_runs. win_by_runs columns represents the margin in which a team has won against the opponent, if the team batting first has won. Let's exclude all instances when win_by_runs = 0 then compute the mean, median and mode.
  1. Compute the mean and standard deviation of win_by_wickets. Exclude all instances when win_by_wickets <= 0.
In [33]:
sort = matches[matches.win_by_runs > 0]
mean_ = sort['win_by_runs'].mean()
mean_
Out[33]:
29.798219584569733
In [37]:
win_by_runs_data = matches[matches.win_by_runs > 0].win_by_runs
win_by_runs_data.mean()
Out[37]:
29.798219584569733
  1. Which Venue(Stadium) has hosted the most number of matches?
In [47]:
most = matches.groupby('venue')['id'].count()
most.nlargest()
Out[47]:
venue
Eden Gardens                                 77
M Chinnaswamy Stadium                        73
Wankhede Stadium                             73
Feroz Shah Kotla                             67
Rajiv Gandhi International Stadium, Uppal    56
Name: id, dtype: int64
  1. In which season most number of matches had been played ?
In [49]:
matches.groupby('season')['id'].count()
Out[49]:
season
2008    58
2009    57
2010    60
2011    73
2012    74
2013    76
2014    60
2015    59
2016    60
2017    59
2018    60
2019    60
Name: id, dtype: int64
In [51]:
matches[matches['win_by_wickets']>0].win_by_wickets.mean()
Out[51]:
6.238916256157635
In [ ]: