Olympics Data Analysis Using Python
Updated on 2024/2/10 18:06:00
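The modern Olympic Games, often simply called the Olympics, are major international sporting events comprising summer and winter competitions, in which thousands of athletes from around the world compete in a wide range of events. With more than 200 nations taking part, the Olympics are regarded as the world's foremost sporting competition. In this article, we will explore Olympic data using Python. Let's get started.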
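Importing the necessary libraries
# Notebook shell commands: install the packages used in this article
!pip install pandas numpy
!pip install seaborn matplotlib
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt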
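Importing and understanding the dataset
We work with two CSV files. One contains information on the athletes who competed in the Games across all years. The other maps each NOC (National Olympic Committee) code to its country or region.
Once you have the CSV data files, load the athlete data as follows.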
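data = pd.read_csv('/content/sample_data/athlete_events.csv')
# Display the first five rows
print(data.head())
# Summary statistics for the numeric columns
print(data.describe())
# info() prints its report itself and returns None, so it is not wrapped in print()
data.info()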
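The sample output in this article contains NaN entries, so a quick look at missing values can complement head() and describe(). The following is a minimal sketch on a small, hypothetical stand-in for the athlete file; the rows are invented for illustration and are not taken from the real CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of the athlete file
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [34.0, 28.0, np.nan, 20.0],
    "Height": [np.nan, 175.0, 180.0, 176.0],
    "Medal": ["Gold", np.nan, np.nan, "Gold"],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Share of rows that carry a medal
medal_share = sample["Medal"].notna().mean()
print(medal_share)
```

On the real data, the same two calls can be applied directly to the loaded frame.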
Merging the two datasets
# CSV file with NOC codes and their regions/countries
regions = pd.read_csv('/content/sample_data/datasets_31029_40943_noc_regions.csv')
print(regions.head())
# Left-merge the athlete data with the regions on the NOC code
merged = pd.merge(data, regions, on='NOC', how='left')
print(merged.head())
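A left merge keeps every athlete row even when its NOC code has no match in the regions file; such rows simply get NaN in the added columns. One way to spot them is the indicator option of pd.merge, sketched here on two small hypothetical frames (the code XXX is made up for the example).

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
athletes = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "NOC": ["DEN", "FIN", "XXX"],  # "XXX" is an invented code with no region
})
noc_regions = pd.DataFrame({
    "NOC": ["DEN", "FIN", "NOR"],
    "region": ["Denmark", "Finland", "Norway"],
})

# indicator=True adds a "_merge" column describing where each row came from
checked = pd.merge(athletes, noc_regions, on="NOC", how="left", indicator=True)

# Rows found only in the left table have no matching region
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["Name", "NOC"]])
```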
The data analysis starts here.
Data analysis of gold medals
Example
# Create a dataframe containing only the gold medal rows
goldMedals = merged[(merged.Medal == 'Gold')]
print(goldMedals.head())
Output
    ID                     Name  Sex   Age  Height  Weight            Team  \
3    4     Edgar Lindenau Aabye    M  34.0     NaN     NaN  Denmark/Sweden
42  17  Paavo Johannes Aaltonen    M  28.0   175.0    64.0         Finland
44  17  Paavo Johannes Aaltonen    M  28.0   175.0    64.0         Finland
48  17  Paavo Johannes Aaltonen    M  28.0   175.0    64.0         Finland
60  20       Kjetil Andr Aamodt    M  20.0   176.0    85.0          Norway

    NOC        Games  Year  Season         City          Sport  \
3   DEN  1900 Summer  1900  Summer        Paris     Tug-Of-War
42  FIN  1948 Summer  1948  Summer       London     Gymnastics
44  FIN  1948 Summer  1948  Summer       London     Gymnastics
48  FIN  1948 Summer  1948  Summer       London     Gymnastics
60  NOR  1992 Winter  1992  Winter  Albertville  Alpine Skiing

                               Event  Medal   region  notes
3        Tug-Of-War Men's Tug-Of-War   Gold  Denmark    NaN
42  Gymnastics Men's Team All-Around   Gold  Finland    NaN
44      Gymnastics Men's Horse Vault   Gold  Finland    NaN
48  Gymnastics Men's Pommelled Horse   Gold  Finland    NaN
60       Alpine Skiing Men's Super G   Gold   Norway    NaN
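Analyzing gold medalists by age
Here we plot the number of gold medals against age. To do this, we draw a count plot in which the athletes' ages are shown on the X axis and the number of medals on the Y axis.
Example
plt.figure(figsize=(20, 10))
plt.title('Distribution of Gold Medals')
# Pass the column by keyword; recent seaborn versions do not accept it positionally
sns.countplot(x='Age', data=goldMedals)
plt.show()
Output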

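The bar heights in a count plot are simply the number of rows per distinct value. As a rough sketch, the same numbers can be obtained as a table with value_counts; the ages below are hypothetical sample values, not results from the real dataset.

```python
import pandas as pd

# Hypothetical ages of a handful of gold medalists (one age is missing)
gold_sample = pd.DataFrame({"Age": [28.0, 28.0, 34.0, 20.0, 28.0, None]})

# Drop missing ages, then count rows per age and order by age
age_counts = (
    gold_sample["Age"]
    .dropna()
    .astype(int)
    .value_counts()
    .sort_index()
)
print(age_counts)

# Age with the most gold medals in this sample
most_common_age = age_counts.idxmax()
print(most_common_age)
```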
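Next, create a new object named "masterDisciplines" that holds the sports of the gold medalists who were older than 50, then use it for the visualization.
Example
# Sports of gold medalists older than 50 (a pandas Series)
masterDisciplines = goldMedals['Sport'][goldMedals['Age'] > 50]
plt.figure(figsize=(20, 10))
sns.countplot(x=masterDisciplines)
plt.title('Gold Medals for Athletes Over 50')
# Adjust the layout after the plot has been drawn
plt.tight_layout()
plt.show()
Output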

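The selection above relies on a boolean mask: the comparison produces True or False per row, and only the True rows are kept. A missing age compares as False, so athletes without a recorded age are silently left out. A minimal sketch with invented rows:

```python
import numpy as np
import pandas as pd

# Hypothetical gold medal rows; one athlete has no recorded age
gold_sample = pd.DataFrame({
    "Sport": ["Archery", "Sailing", "Shooting", "Gymnastics"],
    "Age": [53.0, np.nan, 60.0, 28.0],
})

# True where the age is above 50; NaN > 50 evaluates to False
mask = gold_sample["Age"] > 50

# Keep the sport of every row where the mask is True
over_50 = gold_sample["Sport"][mask]
print(over_50.tolist())
```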
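Analyzing medals won by women
Example
# Women's entries at the Summer Games
# Keep only rows with a medal so the counts are medals, not entries
womenInOlympics = merged[(merged.Sex == 'F') & (merged.Season == 'Summer') & (merged.Medal.notna())]
print(womenInOlympics.head(10))
sns.set(style="darkgrid")
plt.figure(figsize=(20, 10))
sns.countplot(x='Year', data=womenInOlympics)
plt.title('Women medals per edition of the Games')
plt.show()
Output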

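A count plot over all rows counts every entry, whether or not it carries a medal. The following sketch, on a small hypothetical table, shows how the two counts per year differ and how each could be computed with groupby.

```python
import pandas as pd

# Hypothetical entries; None in "Medal" means no medal was won
entries = pd.DataFrame({
    "Sex": ["F", "F", "F", "M", "F"],
    "Season": ["Summer"] * 5,
    "Year": [1996, 1996, 2000, 2000, 2000],
    "Medal": ["Gold", None, "Silver", "Gold", None],
})

women_summer = entries[(entries.Sex == "F") & (entries.Season == "Summer")]

# All entries by women per year
entries_per_year = women_summer.groupby("Year").size()

# Only the entries that won a medal, per year
medals_per_year = women_summer[women_summer.Medal.notna()].groupby("Year").size()

print(entries_per_year)
print(medals_per_year)
```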
Analyzing the top 5 countries by gold medals
Example
# Name the label column explicitly so the code does not depend on the pandas version
totalGoldMedals = goldMedals.region.value_counts().rename_axis('region').reset_index(name='Medal').head(5)
print(totalGoldMedals)
g = sns.catplot(x="region", y="Medal", data=totalGoldMedals, height=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_xlabels("Top 5 countries")
g.set_ylabels("Number of Medals")
plt.title('Medals per Country')
plt.show()
Output

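The column names produced by value_counts().reset_index() depend on the pandas version: before 2.0 the label column is called index, while from 2.0 on it takes the name of the original Series. A sketch of one way to get stable column names, using an invented list of regions:

```python
import pandas as pd

# Hypothetical region of each gold medal row
regions = pd.Series(
    ["USA", "USA", "Russia", "USA", "Germany", "Russia"], name="region"
)

# rename_axis fixes the label column name; reset_index(name=...) fixes the count column
top = (
    regions.value_counts()
    .rename_axis("region")
    .reset_index(name="Medal")
    .head(2)
)
print(top)
```

With the columns named this way, a plotting call can refer to "region" and "Medal" regardless of the installed pandas version.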
Evolution of athletes over time
Example
# Summer Games entries, split by sex
MenOverTime = merged[(merged.Sex == 'M') & (merged.Season == 'Summer')]
WomenOverTime = merged[(merged.Sex == 'F') & (merged.Season == 'Summer')]
# Number of male entries per year
part = MenOverTime.groupby('Year')['Sex'].value_counts()
plt.figure(figsize=(20, 10))
part.loc[:, 'M'].plot()
plt.title('Variation of Male Athletes over time')
plt.show()
Output

Example
# Number of female entries per year
part = WomenOverTime.groupby('Year')['Sex'].value_counts()
plt.figure(figsize=(20, 10))
part.loc[:, 'F'].plot()
plt.title('Variation of Female Athletes over time')
plt.show()
Output

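The two series can also be put side by side in one table, which makes it easy to compute the share of female entries per year. This is a minimal sketch on hypothetical rows, grouping by year and sex in a single step rather than filtering twice.

```python
import pandas as pd

# Hypothetical Summer Games entries
entries = pd.DataFrame({
    "Year": [1996, 1996, 1996, 2000, 2000],
    "Sex": ["M", "F", "M", "M", "F"],
})

# Rows: years, columns: sexes, values: number of entries
by_year_sex = entries.groupby(["Year", "Sex"]).size().unstack(fill_value=0)
print(by_year_sex)

# Share of female entries in each year
female_share = by_year_sex["F"] / by_year_sex.sum(axis=1)
print(female_share)
```

Calling by_year_sex.plot() on such a table would draw both lines on one chart.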
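Conclusion
We have carried out some basic analysis of the data: gold medals by age, medals won by women, the top five countries, and athlete numbers over time. You can take the analysis further and draw more insights.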