Introduction to Sports Analytics with Pandas
Sports analytics is a major subfield of data science. Advances in data collection and analysis techniques have made it more appealing for teams to adopt strategies based on data analytics.
Data analytics provides valuable insight into both team and player performance. Used wisely and systematically, it is likely to put teams ahead of their competitors.
Some clubs have an entire team dedicated to data analytics. Liverpool is a pioneer in using data analytics, which I think is an important part of their success. At the time of writing, they are the reigning Premier League champions, and they won the Champions League in 2019.
In this post, we will use pandas to draw meaningful results from German Bundesliga matches in the 2017–18 season. The datasets can be downloaded from figshare [2]. We will use part of the datasets introduced in the paper “A public data set of spatio-temporal match events in soccer competitions” [1].
The datasets are saved in JSON format, which can easily be read into pandas dataframes.
import numpy as np
import pandas as pd

events = pd.read_json("/content/events_Germany.json")
matches = pd.read_json("/content/matches_Germany.json")
teams = pd.read_json("/content/teams.json")
players = pd.read_json("/content/players.json")

events.head()

(image by author)
The events dataframe contains details of events that occurred in matches. For instance, the first row tells us that player 15231 made a “simple pass” from location (50, 50) to (50, 48) in the third second of match 2516739.
The events dataframe includes player and team IDs but not the player and team names. We will add them from the teams and players dataframes using the merge function.
(image by author)

The IDs are stored in the “wyId” column in the teams and players dataframes.
# merge with teams
events = pd.merge(
    events, teams[['name', 'wyId']], left_on='teamId', right_on='wyId'
)
events.rename(columns={'name': 'teamName'}, inplace=True)
events.drop('wyId', axis=1, inplace=True)

# merge with players
events = pd.merge(
    events, players[['wyId', 'shortName', 'firstName']],
    left_on='playerId', right_on='wyId'
)
events.rename(columns={'shortName': 'playerName', 'firstName': 'playerFName'}, inplace=True)
events.drop('wyId', axis=1, inplace=True)
We merged the dataframes on the columns that contain the IDs and then renamed the new columns. Finally, we dropped the “wyId” column because the IDs are already stored in the events dataframe.
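As a minimal sketch of the same lookup pattern, the toy example below renames the lookup columns before merging, so there is nothing to drop afterwards. The dataframes and their values are made up for illustration; only the column names (teamId, wyId, name, teamName) follow the article. It also passes how="left", which is a design choice not made in the article: the default inner join silently drops events whose ID has no match in the lookup table.

```python
import pandas as pd

# Toy stand-ins for the real dataframes (values are invented for illustration).
events = pd.DataFrame({
    "eventName": ["Pass", "Pass", "Shot"],
    "teamId": [10, 20, 10],
})
teams = pd.DataFrame({
    "wyId": [10, 20],
    "name": ["Team A", "Team B"],
})

# Rename the lookup columns first so the merge key has the same name on both
# sides and no cleanup (rename/drop) is needed after the merge.
team_lookup = teams[["wyId", "name"]].rename(
    columns={"wyId": "teamId", "name": "teamName"}
)

# how="left" keeps every event, even one whose teamId is missing from the lookup.
events_named = events.merge(team_lookup, on="teamId", how="left")
print(events_named)
```

Whether a left or an inner join is the right choice depends on whether events without a matching team or player should stay in the analysis.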
(image by author)

Average Number of Passes per Match
Teams that dominate the game usually make more passes. In general, they are more likely to win the match. There are, of course, some exceptions.
Let’s check the average number of passes per match for each team. We will first create a dataframe that contains the team name, match ID, and the number of passes made in that match.
pass_per_match = (
    events[events.eventName == 'Pass'][['teamName', 'matchId', 'eventName']]
    .groupby(['teamName', 'matchId']).count()
    .reset_index().rename(columns={'eventName': 'numberofPasses'})
)

(image by author)
Augsburg made 471 passes in match 2516745. Here are the top 5 teams in terms of the average number of passes per match.
(
    pass_per_match[['teamName', 'numberofPasses']]
    .groupby('teamName').mean()
    .sort_values(by='numberofPasses', ascending=False).round(1)[:5]
)

(image by author)
It is no surprise that Bayern Munich has the highest number of passes. They have been dominating the Bundesliga in recent years.
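The two-step aggregation used in this section (count passes per team and match, then average over matches) can be checked on a small example. The sketch below uses an invented events table with the same column names as the article; it counts with groupby(...).size(), an alternative to the count() call above that does not need a spare column to count.

```python
import pandas as pd

# Invented toy events; only the column names follow the article.
events = pd.DataFrame({
    "teamName":  ["A", "A", "A", "A", "B", "B", "B"],
    "matchId":   [1, 1, 1, 2, 1, 1, 2],
    "eventName": ["Pass", "Pass", "Shot", "Pass", "Pass", "Foul", "Pass"],
})

passes_only = events[events["eventName"] == "Pass"]

# Step 1: number of passes per team per match.
pass_per_match = (
    passes_only.groupby(["teamName", "matchId"])
    .size()
    .reset_index(name="numberofPasses")
)

# Step 2: average those per-match counts for each team.
avg_passes = (
    pass_per_match.groupby("teamName")["numberofPasses"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_passes)
```

Team A makes 2 passes in match 1 and 1 pass in match 2, so its average is 1.5. Team B makes 1 pass in each match, so its average is 1.0.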
Average Pass Length of Players
A pass can be evaluated based on many things. Some passes are so successful that they make it extremely easy to score.
We will focus on one quantifiable property of a pass: its length. Some players are very good at long passes.
The positions column contains the initial and final location of the ball in terms of x and y coordinates. We can calculate the length based on these coordinates. Let’s first create a dataframe that only contains the passes.
passes = events[events.eventName == 'Pass'].reset_index(drop=True)

We can now calculate the length.
pass_length = []
for i in range(len(passes)):
    # Euclidean distance between the start (index 0) and end (index 1) positions
    length = np.sqrt(
        (passes.positions[i][0]['x'] - passes.positions[i][1]['x'])**2
        + (passes.positions[i][0]['y'] - passes.positions[i][1]['y'])**2
    )
    pass_length.append(length)

passes['pass_length'] = pass_length
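The loop above computes the Euclidean distance sqrt((x0 - x1)^2 + (y0 - y1)^2) one row at a time. As a sketch of a loop-free alternative, the example below unpacks the start and end positions into columns and lets NumPy compute all distances at once. It assumes every row of the positions column holds exactly two dicts with 'x' and 'y' keys, as in the first row shown earlier; the toy rows are invented.

```python
import numpy as np
import pandas as pd

# Toy passes: the first row mirrors the (50, 50) -> (50, 48) example from the
# article, the second is an invented 3-4-5 triangle.
passes = pd.DataFrame({
    "positions": [
        [{"x": 50, "y": 50}, {"x": 50, "y": 48}],
        [{"x": 0, "y": 0}, {"x": 3, "y": 4}],
    ]
})

# .str[i] picks the i-th element of each list; building a DataFrame from the
# resulting dicts gives separate 'x' and 'y' columns.
start = pd.DataFrame(passes["positions"].str[0].tolist())
end = pd.DataFrame(passes["positions"].str[1].tolist())

# np.hypot(dx, dy) is sqrt(dx**2 + dy**2), applied to whole columns at once.
passes["pass_length"] = np.hypot(end["x"] - start["x"], end["y"] - start["y"])
print(passes["pass_length"].tolist())
```

The lengths are expressed in the data set's own coordinate units, not in meters.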
The groupby function can be used to calculate the average pass length for each player.
(
    passes[['playerName', 'pass_length']].groupby('playerName')
    .agg(['mean', 'count'])
    .sort_values(by=('pass_length', 'mean'), ascending=False).round(1)[:5]
)

(image by author)
We have listed the top 5 players in terms of average pass length, along with the number of passes they made. The number of passes is important because an average over only 3 passes does not mean much. Thus, we can filter out players who made fewer than a certain number of passes.
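One way to apply such a filter is sketched below on invented data. The threshold MIN_PASSES is a hypothetical value chosen only to make the toy example work; a real analysis would pick something much larger.

```python
import pandas as pd

# Invented toy data: P2 has a very long average pass, but only one pass.
passes = pd.DataFrame({
    "playerName": ["P1", "P1", "P1", "P2", "P3", "P3"],
    "pass_length": [10.0, 20.0, 30.0, 90.0, 40.0, 50.0],
})

MIN_PASSES = 2  # hypothetical threshold, not taken from the article

# Mean pass length and number of passes per player.
summary = passes.groupby("playerName")["pass_length"].agg(["mean", "count"])

# Keep only players with enough passes, then rank by average length.
filtered = (
    summary[summary["count"] >= MIN_PASSES]
    .sort_values(by="mean", ascending=False)
    .round(1)
)
print(filtered)
```

Selecting the pass_length column before calling agg gives flat column names ("mean", "count"), which makes the filter easier to write than with the two-level columns produced in the article's version.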
Average Number of Passes for Win and Not-Win
Let’s compare the average number of passes in matches that were won and matches that were not won. I will use the matches of B. Leverkusen as an example.
We first need to add the winner of the match from the “matches” dataframe.
events = pd.merge(events, matches[['wyId', 'winner']], left_on='matchId', right_on='wyId')
events.drop('wyId', axis=1, inplace=True)

We can now create a dataframe that only contains events whose team ID is 2446 (the ID of B. Leverkusen).
leverkusen = events[events.teamId == 2446]

The winner is B. Leverkusen if the value in the “winner” column is equal to 2446. To calculate the average number of passes in the matches that B. Leverkusen won, we need to filter the dataframe based on the winner and eventName columns. We will then apply groupby and count to see the number of passes per match.
passes_in_win = leverkusen[
    (leverkusen.winner == 2446) & (leverkusen.eventName == 'Pass')
][['matchId', 'eventName']].groupby('matchId').count()

passes_in_notwin = leverkusen[
    (leverkusen.winner != 2446) & (leverkusen.eventName == 'Pass')
][['matchId', 'eventName']].groupby('matchId').count()

(image by author)

We can easily get the average number of passes by applying the mean function.
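The averaging step can be sketched in a single pass instead of two separate dataframes. The toy data below is invented; the team ID 2446 comes from the article, while the other values in the winner column (0 and 9999) are placeholders that simply mean "not B. Leverkusen".

```python
import numpy as np
import pandas as pd

TEAM_ID = 2446  # B. Leverkusen's ID, as given in the article

# Invented toy events for one team across three matches.
leverkusen = pd.DataFrame({
    "matchId":   [1, 1, 1, 2, 2, 3],
    "eventName": ["Pass", "Pass", "Pass", "Pass", "Shot", "Pass"],
    "winner":    [2446, 2446, 2446, 0, 0, 9999],  # 0 and 9999 are placeholders
})

passes = leverkusen[leverkusen["eventName"] == "Pass"]

# Passes per match, keeping the winner alongside the match ID.
per_match = (
    passes.groupby(["matchId", "winner"])
    .size()
    .reset_index(name="numberofPasses")
)

# Label each match as a win or not, then average the counts per label.
per_match["result"] = np.where(per_match["winner"] == TEAM_ID, "win", "not-win")
avg_by_result = per_match.groupby("result")["numberofPasses"].mean()
print(avg_by_result)
```

With the article's two dataframes, the equivalent would be calling mean() on passes_in_win and on passes_in_notwin.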
(image by author)

Although making more passes does not guarantee a win, it helps you dominate the game and increases your chances of scoring.
The scope of sports analytics extends far beyond what we have done in this post. However, without getting familiar with the basics, it will be harder to grasp the knowledge of more advanced techniques.
Data visualization is also fundamental in sports analytics. How teams and players manage the pitch, the locations of shots and passes, and areas of the pitch that are covered the most provide valuable insight.
I will also write posts about how certain events can be visualized on the pitch. Thank you for reading. Please let me know if you have any feedback.
[1] Pappalardo et al. (2019). A public data set of spatio-temporal match events in soccer competitions. Nature Scientific Data 6:236. https://www.nature.com/articles/s41597-019-0247-7
[2] https://figshare.com/articles/Events/7770599
Translated from: https://towardsdatascience.com/introduction-to-sports-analytics-with-pandas-ad6303db9e11