In [2]:
import pandas as pd
In [3]:
df = pd.read_csv('words.csv', index_col='Word')
In [4]:
df.head()
Out[4]:
| Word | Char Count | Value |
|---|---|---|
| aa | 2 | 2 |
| aah | 3 | 10 |
| aahed | 5 | 19 |
| aahing | 6 | 40 |
| aahs | 4 | 29 |
Activities
How many elements does this dataframe have?
In [6]:
df.index.size
Out[6]:
172821
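`df.index.size` counts rows. Depending on what "elements" is taken to mean, `len(df)`, `df.shape`, and `df.size` answer slightly different questions. A minimal sketch on five rows copied from the `head()` output above (the name `sample` is illustrative, and the real `words.csv` is not loaded here):

```python
import pandas as pd

# Five rows copied from the head() output; a stand-in for words.csv.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

n_rows = len(sample)         # number of rows, same as sample.index.size
n_rows_cols = sample.shape   # (rows, columns)
n_cells = sample.size        # rows * columns

print(n_rows, n_rows_cols, n_cells)
```

On the full frame, the row count is the 172821 reported above, while `df.size` would count every cell across both columns.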
What is the value of the word microspectrophotometries?
In [7]:
df.loc["microspectrophotometries"]
Out[7]:
Char Count 24
Value 317
Name: microspectrophotometries, dtype: int64
What is the highest possible value of a word?
In [11]:
df.describe()
Out[11]:
| | Char Count | Value |
|---|---|---|
| count | 172821.000000 | 172821.000000 |
| mean | 9.087628 | 107.754179 |
| std | 2.818285 | 39.317452 |
| min | 2.000000 | 2.000000 |
| 25% | 7.000000 | 80.000000 |
| 50% | 9.000000 | 103.000000 |
| 75% | 11.000000 | 131.000000 |
| max | 28.000000 | 319.000000 |
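`describe()` answers this as a side effect of computing eight statistics per column. When only the maximum is needed, `max()` returns it directly. A minimal sketch on a few rows copied from outputs in this notebook (`sample` is an illustrative stand-in, so the numbers reflect only these rows, not the full file):

```python
import pandas as pd

# Rows copied from outputs in this notebook; not the full words.csv.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 24, 23], "Value": [2, 10, 317, 319]},
    index=pd.Index(
        ["aa", "aah", "microspectrophotometries", "reinstitutionalizations"],
        name="Word",
    ),
)

highest_value = sample["Value"].max()  # maximum of one column
column_maxima = sample.max()           # maximum of every column, as a Series

print(highest_value)
print(column_maxima)
```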
Which of the following words have a Char Count of 15?
In [13]:
df.loc[["pinfish", "glowing", "enfold", "superheterodyne","microbrew"]]
Out[13]:
| Word | Char Count | Value |
|---|---|---|
| pinfish | 7 | 81 |
| glowing | 7 | 87 |
| enfold | 6 | 56 |
| superheterodyne | 15 | 198 |
| microbrew | 9 | 106 |
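Rather than reading the answer off the table, a boolean mask can pick out the matching rows. A small sketch using the five candidate rows exactly as shown above (`candidates` is an illustrative name):

```python
import pandas as pd

# The five candidate words and their counts, copied from the output above.
candidates = pd.DataFrame(
    {"Char Count": [7, 7, 6, 15, 9], "Value": [81, 87, 56, 198, 106]},
    index=pd.Index(
        ["pinfish", "glowing", "enfold", "superheterodyne", "microbrew"],
        name="Word",
    ),
)

# Keep only the rows whose Char Count is exactly 15.
fifteen = candidates[candidates["Char Count"] == 15]
print(fifteen.index.tolist())
```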
What is the highest possible length of a word?
In [17]:
df.describe()
Out[17]:
| | Char Count | Value |
|---|---|---|
| count | 172821.000000 | 172821.000000 |
| mean | 9.087628 | 107.754179 |
| std | 2.818285 | 39.317452 |
| min | 2.000000 | 2.000000 |
| 25% | 7.000000 | 80.000000 |
| 50% | 9.000000 | 103.000000 |
| 75% | 11.000000 | 131.000000 |
| max | 28.000000 | 319.000000 |
What is the word with the value of 319?
In [18]:
df.sort_values(by = "Value", ascending = False)
Out[18]:
| Word | Char Count | Value |
|---|---|---|
| reinstitutionalizations | 23 | 319 |
| microspectrophotometries | 24 | 317 |
| microspectrophotometry | 22 | 309 |
| microspectrophotometers | 23 | 308 |
| immunoelectrophoretically | 25 | 307 |
| ... | ... | ... |
| aba | 3 | 4 |
| baa | 3 | 4 |
| ba | 2 | 3 |
| ab | 2 | 3 |
| aa | 2 | 2 |
172821 rows × 2 columns
In [19]:
df.loc[df["Value"]==319]
Out[19]:
| Word | Char Count | Value |
|---|---|---|
| reinstitutionalizations | 23 | 319 |
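Since 319 is the maximum Value, the word can also be looked up without sorting or hard-coding the number. A sketch with `idxmax`, on rows copied from the sorted output above (`sample` is illustrative; note that `idxmax` reports only the first label if several rows tie for the maximum, whereas the boolean mask returns all of them):

```python
import pandas as pd

# Top rows copied from the sorted output above, plus one low-value row.
sample = pd.DataFrame(
    {"Char Count": [24, 23, 22, 2], "Value": [317, 319, 309, 2]},
    index=pd.Index(
        ["microspectrophotometries", "reinstitutionalizations",
         "microspectrophotometry", "aa"],
        name="Word",
    ),
)

# idxmax returns the index label (here, the word) of the largest Value.
top_word = sample["Value"].idxmax()
print(top_word)
```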
What is the most common value?
In [21]:
df.Value.describe()
Out[21]:
count 172821.000000
mean 107.754179
std 39.317452
min 2.000000
25% 80.000000
50% 103.000000
75% 131.000000
max 319.000000
Name: Value, dtype: float64
In [23]:
df["Value"].value_counts()
Out[23]:
Value
93 1965
100 1921
95 1915
99 1907
92 1902
...
317 1
304 1
300 1
319 1
278 1
Name: count, Length: 303, dtype: int64
In [22]:
df["Value"].mode()
Out[22]:
0 93
Name: Value, dtype: int64
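`mode()` and `value_counts()` agree here, but they return different shapes: `mode()` gives a Series (there may be several tied values), while `value_counts()` gives frequencies sorted from most to least common. A minimal sketch on six Values taken from other outputs in this notebook (the Series `values` is illustrative, so its mode is not the mode of the full data):

```python
import pandas as pd

# Values of six words taken from other outputs in this notebook.
values = pd.Series([80, 100, 70, 80, 80, 70], name="Value")

modes = values.mode()            # Series: every value tied for most frequent
counts = values.value_counts()   # frequencies, most frequent first
top_value = counts.idxmax()      # the most frequent value
top_count = counts.max()         # how many times it occurs

print(modes.tolist(), top_value, top_count)
```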
What is the shortest word with value 274?
In [24]:
df.loc[df["Value"] == 274]
Out[24]:
| Word | Char Count | Value |
|---|---|---|
| countercountermeasure | 21 | 274 |
| overprotectivenesses | 20 | 274 |
| psychophysiologically | 21 | 274 |
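The filter above lists the candidates but leaves the comparison to the reader. One way to finish the step in code, sketched on the three rows shown (`value_274` is an illustrative name), is to keep every row that attains the minimum Char Count, which also surfaces ties if there are any:

```python
import pandas as pd

# The three rows with Value 274, copied from the output above.
value_274 = pd.DataFrame(
    {"Char Count": [21, 20, 21], "Value": [274, 274, 274]},
    index=pd.Index(
        ["countercountermeasure", "overprotectivenesses",
         "psychophysiologically"],
        name="Word",
    ),
)

# Keep every row that attains the minimum Char Count (handles ties).
shortest = value_274[value_274["Char Count"] == value_274["Char Count"].min()]
print(shortest.index.tolist())
```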
Create a column Ratio which represents the 'Value Ratio' of a word
In [26]:
df['Ratio'] = df['Value'] / df['Char Count']
df
Out[26]:
| Word | Char Count | Value | Ratio |
|---|---|---|---|
| aa | 2 | 2 | 1.000000 |
| aah | 3 | 10 | 3.333333 |
| aahed | 5 | 19 | 3.800000 |
| aahing | 6 | 40 | 6.666667 |
| aahs | 4 | 29 | 7.250000 |
| ... | ... | ... | ... |
| zymotic | 7 | 111 | 15.857143 |
| zymurgies | 9 | 143 | 15.888889 |
| zymurgy | 7 | 135 | 19.285714 |
| zyzzyva | 7 | 151 | 21.571429 |
| zyzzyvas | 8 | 170 | 21.250000 |
172821 rows × 3 columns
What is the maximum value of Ratio?
In [29]:
df['Ratio'].max()
Out[29]:
22.5
Which word has the highest Ratio?
In [30]:
df.sort_values(by = "Ratio", ascending= False)
Out[30]:
| Word | Char Count | Value | Ratio |
|---|---|---|---|
| xu | 2 | 45 | 22.500000 |
| muzzy | 5 | 111 | 22.200000 |
| wry | 3 | 66 | 22.000000 |
| xyst | 4 | 88 | 22.000000 |
| tux | 3 | 65 | 21.666667 |
| ... | ... | ... | ... |
| ba | 2 | 3 | 1.500000 |
| baba | 4 | 6 | 1.500000 |
| aba | 3 | 4 | 1.333333 |
| baa | 3 | 4 | 1.333333 |
| aa | 2 | 2 | 1.000000 |
172821 rows × 3 columns
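Sorting the whole frame works, but only the top of the ordering is needed. `nlargest` returns just those rows. A sketch on a handful of rows copied from the output above (`sample` is illustrative, and `Ratio` is recomputed the same way as in the notebook):

```python
import pandas as pd

# Rows copied from the sorted output above; a stand-in for the full frame.
sample = pd.DataFrame(
    {"Char Count": [2, 5, 3, 3, 2], "Value": [2, 111, 66, 65, 45]},
    index=pd.Index(["aa", "muzzy", "wry", "tux", "xu"], name="Word"),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# nlargest returns the n rows with the largest Ratio, already ordered.
top3 = sample.nlargest(3, "Ratio")
print(top3)
```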
How many words have a Ratio of 10?
In [32]:
df.loc[df["Ratio"] == 10]
Out[32]:
| Word | Char Count | Value | Ratio |
|---|---|---|---|
| aardwolf | 8 | 80 | 10.0 |
| abatements | 10 | 100 | 10.0 |
| abducts | 7 | 70 | 10.0 |
| abetment | 8 | 80 | 10.0 |
| abettals | 8 | 80 | 10.0 |
| ... | ... | ... | ... |
| ycleped | 7 | 70 | 10.0 |
| yodeled | 7 | 70 | 10.0 |
| zamia | 5 | 50 | 10.0 |
| zebecs | 6 | 60 | 10.0 |
| zwieback | 8 | 80 | 10.0 |
2604 rows × 3 columns
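The count can be read from the row total, or computed directly by summing the boolean mask. Because `Ratio` is a float column, exact equality is worth a second look in general. The sketch below compares three ways of counting on a few rows copied from this notebook (`sample` is illustrative): exact equality, a tolerant comparison with `numpy.isclose`, and an integer-only test that avoids division altogether.

```python
import numpy as np
import pandas as pd

# Rows copied from outputs in this notebook; a stand-in for the full frame.
sample = pd.DataFrame(
    {"Char Count": [8, 7, 5, 3, 2], "Value": [80, 70, 50, 10, 45]},
    index=pd.Index(["aardwolf", "abducts", "zamia", "aah", "xu"], name="Word"),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# Exact float comparison, as used in the notebook.
count_exact = int((sample["Ratio"] == 10).sum())
# Tolerant float comparison.
count_close = int(np.isclose(sample["Ratio"], 10).sum())
# Integer-only comparison: Value equals ten times Char Count.
count_int = int((sample["Value"] == 10 * sample["Char Count"]).sum())

print(count_exact, count_close, count_int)
```

All three agree on this sample. The integer form is the one that does not depend on floating-point rounding.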
What is the maximum Value of all the words with a Ratio of 10?
In [33]:
df.loc[df["Ratio"] == 10].sort_values(by = "Value", ascending= False)
Out[33]:
| Word | Char Count | Value | Ratio |
|---|---|---|---|
| electrocardiographically | 24 | 240 | 10.0 |
| electroencephalographies | 24 | 240 | 10.0 |
| electroencephalographer | 23 | 230 | 10.0 |
| electrodesiccation | 18 | 180 | 10.0 |
| phonocardiographic | 18 | 180 | 10.0 |
| ... | ... | ... | ... |
| col | 3 | 30 | 10.0 |
| bis | 3 | 30 | 10.0 |
| sib | 3 | 30 | 10.0 |
| as | 2 | 20 | 10.0 |
| oe | 2 | 20 | 10.0 |
2604 rows × 3 columns
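The sorted table shows two words sharing the top Value. A sketch that extracts the maximum as a number and then lists every word that reaches it, using rows copied from this notebook (`sample` is illustrative and includes one row with a different Ratio to show the filter at work):

```python
import pandas as pd

# Rows copied from outputs in this notebook; a stand-in for the full frame.
sample = pd.DataFrame(
    {"Char Count": [24, 24, 23, 3, 5], "Value": [240, 240, 230, 30, 111]},
    index=pd.Index(
        ["electrocardiographically", "electroencephalographies",
         "electroencephalographer", "col", "muzzy"],
        name="Word",
    ),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

ratio_10 = sample[sample["Ratio"] == 10]   # rows with Ratio exactly 10
max_value = ratio_10["Value"].max()        # largest Value among them
# Every word that reaches that maximum (there may be more than one).
words_at_max = ratio_10[ratio_10["Value"] == max_value].index.tolist()

print(max_value, words_at_max)
```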
Of those words with a Value of 260, what is the lowest Char Count found?
In [35]:
df.loc[df["Value"]==260]
Out[35]:
| Word | Char Count | Value | Ratio |
|---|---|---|---|
| countermobilizations | 20 | 260 | 13.000000 |
| hydroxytryptamine | 17 | 260 | 15.294118 |
| neuropsychologists | 18 | 260 | 14.444444 |
| psychophysiologist | 18 | 260 | 14.444444 |
| revolutionarinesses | 19 | 260 | 13.684211 |
| underrepresentations | 20 | 260 | 13.000000 |
Based on the previous task, what word is it?
In [ ]:
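The final cell was left empty. One possible way to answer it, sketched on the six rows shown in the previous output rather than on the full file (`value_260` is an illustrative name), is to take the minimum Char Count and ask `idxmin` for the word that carries it:

```python
import pandas as pd

# The six rows with Value 260, copied from the previous output.
value_260 = pd.DataFrame(
    {"Char Count": [20, 17, 18, 18, 19, 20], "Value": [260] * 6},
    index=pd.Index(
        ["countermobilizations", "hydroxytryptamine", "neuropsychologists",
         "psychophysiologist", "revolutionarinesses", "underrepresentations"],
        name="Word",
    ),
)

lowest_char_count = value_260["Char Count"].min()   # smallest length
shortest_word = value_260["Char Count"].idxmin()    # word with that length

print(lowest_char_count, shortest_word)
```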