Indian Premier League (IPL).¶
- Importing required libraries
In [1]:
import pandas as pd
import numpy as np
- Load the IPL dataset ('matches.csv' and 'deliveries.csv') into your notebook. Once the dataset is loaded, explore it to gain a better understanding of its contents. Look at the variables included in the dataset, their data types, and the number of observations. Identify any missing data, outliers, or errors that may affect the analysis.
In [4]:
matches = pd.read_csv('matches.csv')
deli = pd.read_csv('deliveries.csv')
- Did you find any missing value in the deliveries dataset?
In [10]:
mis = deli.isnull().sum()*100/len(deli)
mis
Out[10]:
match_id 0.000000 inning 0.000000 batting_team 0.000000 bowling_team 0.000000 over 0.000000 ball 0.000000 batsman 0.000000 non_striker 0.000000 bowler 0.000000 is_super_over 0.000000 wide_runs 0.000000 bye_runs 0.000000 legbye_runs 0.000000 noball_runs 0.000000 penalty_runs 0.000000 batsman_runs 0.000000 extra_runs 0.000000 total_runs 0.000000 player_dismissed 95.066954 dismissal_kind 95.066954 fielder 96.399334 dtype: float64
- Compute the average runs scored by each team during IPL 2008 - 2019 for the First Batting
In [26]:
first_bat[first_bat.batting_team == 'Chennai Super Kings']
Out[26]:
match_id | inning | batting_team | bowling_team | over | ball | batsman | non_striker | bowler | is_super_over | ... | bye_runs | legbye_runs | noball_runs | penalty_runs | batsman_runs | extra_runs | total_runs | player_dismissed | dismissal_kind | fielder | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14087 | 61 | 1 | Chennai Super Kings | Kings XI Punjab | 1 | 1 | PA Patel | ML Hayden | B Lee | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN |
14088 | 61 | 1 | Chennai Super Kings | Kings XI Punjab | 1 | 2 | PA Patel | ML Hayden | B Lee | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN |
14089 | 61 | 1 | Chennai Super Kings | Kings XI Punjab | 1 | 3 | PA Patel | ML Hayden | B Lee | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN |
14090 | 61 | 1 | Chennai Super Kings | Kings XI Punjab | 1 | 4 | ML Hayden | PA Patel | B Lee | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN |
14091 | 61 | 1 | Chennai Super Kings | Kings XI Punjab | 1 | 5 | ML Hayden | PA Patel | B Lee | 0 | ... | 0 | 0 | 0 | 0 | 4 | 0 | 4 | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
178221 | 11412 | 1 | Chennai Super Kings | Mumbai Indians | 20 | 3 | MS Dhoni | AT Rayudu | JJ Bumrah | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN |
178222 | 11412 | 1 | Chennai Super Kings | Mumbai Indians | 20 | 4 | AT Rayudu | MS Dhoni | JJ Bumrah | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN |
178223 | 11412 | 1 | Chennai Super Kings | Mumbai Indians | 20 | 5 | MS Dhoni | AT Rayudu | JJ Bumrah | 0 | ... | 0 | 0 | 0 | 0 | 2 | 0 | 2 | NaN | NaN | NaN |
178224 | 11412 | 1 | Chennai Super Kings | Mumbai Indians | 20 | 6 | MS Dhoni | AT Rayudu | JJ Bumrah | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN |
178225 | 11412 | 1 | Chennai Super Kings | Mumbai Indians | 20 | 7 | AT Rayudu | MS Dhoni | JJ Bumrah | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN |
10974 rows × 21 columns
In [23]:
first_bat = deli[deli.inning == 1]
sum_ = first_bat.groupby('batting_team')['total_runs'].sum()
total = first_bat.groupby('batting_team')['match_id'].nunique()
avg = sum_ / total
avg
Out[23]:
batting_team Chennai Super Kings 166.685393 Deccan Chargers 157.325581 Delhi Capitals 169.833333 Delhi Daredevils 157.083333 Gujarat Lions 161.928571 Kings XI Punjab 162.076923 Kochi Tuskers Kerala 144.142857 Kolkata Knight Riders 160.783133 Mumbai Indians 166.475248 Pune Warriors 148.650000 Rajasthan Royals 158.582090 Rising Pune Supergiant 163.000000 Rising Pune Supergiants 160.428571 Royal Challengers Bangalore 167.694118 Sunrisers Hyderabad 164.619048 dtype: float64
- Compute the mean, median and mode of the variable
win_by_runs
.win_by_runs
columns represents the margin in which a team has won against the opponent, if the team batting first has won. Let's exclude all instances whenwin_by_runs = 0
then compute the mean, median and mode.
- Compute the mean and standard deviation of
win_by_wickets
. Exclude all instances whenwin_by_wickets <= 0
.
In [33]:
sort = matches[matches.win_by_runs > 0]
mean_ = sort['win_by_runs'].mean()
mean_
Out[33]:
29.798219584569733
In [37]:
win_by_runs_data = matches[matches.win_by_runs > 0].win_by_runs
win_by_runs_data.mean()
Out[37]:
29.798219584569733
- Which Venue(Stadium) has hosted the most number of matches?
In [47]:
most = matches.groupby('venue')['id'].count()
most.nlargest()
Out[47]:
venue Eden Gardens 77 M Chinnaswamy Stadium 73 Wankhede Stadium 73 Feroz Shah Kotla 67 Rajiv Gandhi International Stadium, Uppal 56 Name: id, dtype: int64
- In which season most number of matches had been played ?
In [49]:
matches.groupby('season')['id'].count()
Out[49]:
season 2008 58 2009 57 2010 60 2011 73 2012 74 2013 76 2014 60 2015 59 2016 60 2017 59 2018 60 2019 60 Name: id, dtype: int64
In [51]:
matches[matches['win_by_wickets']>0].win_by_wickets.mean()
Out[51]:
6.238916256157635
In [ ]: