Unsupervised Learning with Scikit-learn, Spotify API, and Tableau Public
I will also walk through the OSEMN framework in this machine learning example. The acronym OSEMN stands for Obtain, Scrub, Explore, Model, and iNterpret. It is a widely used framework among data scientists working on machine learning problems.
Without further ado, let’s get started.
Obtaining Data Using the Spotify API
First, we use the Spotify API client to obtain our data. If you are not familiar with the Spotify API, you will find CodeEntrepreneur’s 30 Days of Python - Day 19 - The Spotify API - Python TUTORIAL very helpful, especially since the documentation’s examples are written in JavaScript. Spotify’s Web API requires a developer account, and each API call requires a valid access token.
We will use the Get Artist endpoint from Spotify to obtain data for 30 artists originally from Houston, TX. You can download the .xlsx file from my GitHub repository if you would like to use the same artists for the API client.
Make sure you set the client ID and client secret:
client_id = 'your_client_id'
client_secret = 'your_secret_id'
spotify = SpotifyAPI(client_id, client_secret)

In [ ]: spotify.get_artist('35magIA6t9JpNwT0sPEBgM')
Out[ ]: {'external_urls': {'spotify': 'https://open.spotify.com/artist/35magIA6t9JpNwT0sPEBgM'},
'followers': {'href': None, 'total': 383},
'genres': ['houston rap'],
'href': 'https://api.spotify.com/v1/artists/35magIA6t9JpNwT0sPEBgM',
'id': '35magIA6t9JpNwT0sPEBgM',
'images': [{'height': 640,
'url': 'https://i.scdn.co/image/ab67616d0000b27326ac896c0d9a0e266c29ec27',
'width': 640},
{'height': 300,
'url': 'https://i.scdn.co/image/ab67616d00001e0226ac896c0d9a0e266c29ec27',
'width': 300},
{'height': 64,
'url': 'https://i.scdn.co/image/ab67616d0000485126ac896c0d9a0e266c29ec27',
'width': 64}],
'name': 'Yb Puerto Rico',
'popularity': 26,
'type': 'artist',
'uri': 'spotify:artist:35magIA6t9JpNwT0sPEBgM'}
Since the API response is a JSON object, we can parse each artist’s data for the information we need. I used a for loop to pull the data for each artist, appending each result to a list. You can also try the Get Several Artists endpoint to get multiple artists in one response.
Here is my code to load the .csv file and get the artist info from Spotify:
# Load .csv with artists and Spotify IDs
import pandas as pd

csv = "Houston_Artists_SpotifyIDs.csv"
df = pd.read_csv(csv)
X = df['Spotify ID']

# For loop to collect JSON responses for each artist
json_results = []
for i in X:
    json_results.append(spotify.get_artist(f'{i}'))
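The Get Several Artists endpoint mentioned above takes a comma-separated list of IDs in a single request. Here is a minimal sketch of how the IDs could be batched into request URLs. The helper name `build_several_artists_urls` is hypothetical, the limit of 50 IDs per request is an assumption that should be checked against the Web API reference, and the placeholder IDs are not real artists. The sketch only builds the URLs; sending them still requires the access token.

```python
from typing import Iterable, List

# Artists resource on the same host as the 'href' field in the response above.
ARTISTS_ENDPOINT = "https://api.spotify.com/v1/artists"


def build_several_artists_urls(artist_ids: Iterable[str], batch_size: int = 50) -> List[str]:
    """Return one request URL per batch of artist IDs.

    batch_size=50 is an assumed per-request limit; confirm it in the API reference.
    """
    ids = list(artist_ids)
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        urls.append(f"{ARTISTS_ENDPOINT}?ids={','.join(batch)}")
    return urls


# Placeholder IDs and a tiny batch size, only to show how the list is split.
demo_urls = build_several_artists_urls(["id_a", "id_b", "id_c"], batch_size=2)
for url in demo_urls:
    print(url)
```

With 30 artists, a single batch would cover the whole list, replacing 30 separate calls to the Get Artist endpoint.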
Scrubbing the Data from Spotify
Once we have everyone’s information collected, we no longer need the API client. We can use Python to pull the number of followers and the popularity score for each artist stored in our list. We can also check for any duplicates and handle any missing values.
Here is my code to parse the JSON info into a pandas DataFrame:
In [ ]:
names = []
followers = []
popularity = []
genres = []
urls = []

for i in json_results:
    names.append(i['name'])
    followers.append(i['followers']['total'])
    popularity.append(i['popularity'])
    genres.append(i['genres'])
    urls.append(i['external_urls']['spotify'])

df = pd.DataFrame()
df['names'] = names
df['followers'] = followers
df['popularity'] = popularity
df['genre'] = genres
df['url'] = urls
df.head()

Out [ ]:
   names         followers  popularity  genre                                 url
0  AcePer$ona    15         1           []                                    https://open.spotify.com/artist/4f06tvRb3HaDFC...
1  AliefBiggie   26         0           []                                    https://open.spotify.com/artist/1WkWfhdsdSqVYT...
2  Amber Smoke   93         2           []                                    https://open.spotify.com/artist/2JrntJAExmduLd...
3  Beyoncé       23821006   89          [dance pop, pop, post-teen pop, r&b]  https://open.spotify.com/artist/6vWDO969PvNqNY...
4  Chucky Trill  957        25          [houston rap]                         https://open.spotify.com/artist/2mdDdKL0UzOqSq...
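The duplicate and missing-value checks mentioned above are not shown in the article, so here is one way they might look. This is a sketch on a toy frame: the duplicated row and the missing follower count were added on purpose for illustration and are not in the real data.

```python
import pandas as pd

# Toy frame with the same columns as above. The repeated artist and the
# missing follower count are deliberate, illustrative defects.
toy = pd.DataFrame({
    "names": ["Beyoncé", "Chucky Trill", "Chucky Trill", "Amber Smoke"],
    "followers": [23821006, 957, 957, None],
    "popularity": [89, 25, 25, 2],
})

# Count repeated artists (by name) and missing values per column.
n_duplicates = int(toy.duplicated(subset="names").sum())
missing_per_column = toy.isna().sum()

# Keep the first copy of each artist, then drop rows lacking a feature we need.
clean = (
    toy.drop_duplicates(subset="names")
       .dropna(subset=["followers", "popularity"])
       .reset_index(drop=True)
)

print(n_duplicates)
print(missing_per_column)
print(clean)
```

Passing `subset="names"` matters on the real dataframe: the genre column holds Python lists, which are unhashable, so checking for duplicates across all columns would raise an error.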
Exploring the Data from Spotify
With our data now in a pandas DataFrame, we can begin analyzing our sample artists. Sorting the data table by followers reveals two outliers: Beyoncé and Travis Scott.
Image by Jacob Tadesse
We can also see that these two artists are similar but not identical. While Beyoncé has more followers, Travis Scott beats her in Spotify popularity, scoring 96 to Beyoncé’s 89, a full seven points higher than ‘Queen B’.
Image by Jacob Tadesse
We can also see the top artists by followers and popularity and, at the opposite end of the spectrum, the bottom artists by followers and popularity.
By Followers:
Image by Jacob Tadesse
By Popularity:
Image by Jacob Tadesse
While it is helpful to know who is who, and who has what kind of following or popularity, the purpose of this post is to use an unsupervised learning algorithm to compare these artists.
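The top and bottom rankings above were produced as images. A rough pandas equivalent, using only the five rows visible in the df.head() output earlier (so the rankings here cover that sample, not all 30 artists), might look like this:

```python
import pandas as pd

# The five rows shown in the df.head() output earlier in the article.
sample = pd.DataFrame({
    "names": ["AcePer$ona", "AliefBiggie", "Amber Smoke", "Beyoncé", "Chucky Trill"],
    "followers": [15, 26, 93, 23821006, 957],
    "popularity": [1, 0, 2, 89, 25],
})

# Top three artists by followers, bottom three by popularity.
top_by_followers = sample.nlargest(3, "followers")
bottom_by_popularity = sample.nsmallest(3, "popularity")

print(top_by_followers[["names", "followers"]])
print(bottom_by_popularity[["names", "popularity"]])
```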
Modeling the Data with Scikit-learn
“Unsupervised Learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data.” — Mathworks
Since we would like to compare these artists, we will use an unsupervised learning algorithm to group (cluster) our artists. More specifically, we will use Scikit-Learn’s KMeans algorithm for this example. Let’s import the module from sklearn. You can follow the instructions here to install sklearn.
from sklearn.cluster import KMeans
“The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.”
Scikit-Learn.org
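The inertia criterion in the quote above can be checked by hand. The sketch below uses a small synthetic dataset (not the artist data) and compares a manual within-cluster sum of squares with the value scikit-learn reports in `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points (not the artist data): two tight groups far apart.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Inertia by hand: squared distance from each point to its own cluster center,
# summed over all points.
assigned_centers = model.cluster_centers_[model.labels_]
manual_inertia = float(((points - assigned_centers) ** 2).sum())

print(manual_inertia, model.inertia_)
```

The two numbers should agree up to floating-point error, which is all "within-cluster sum-of-squares" means in practice.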
Scaling the Features
It is good practice to scale your features before training a distance-based algorithm such as KMeans, because a feature with a large range (here, followers) would otherwise dominate the distances. It is also good practice to handle your outliers. In this example, I keep the outliers in and use MinMaxScaler to scale the features in our small dataset, but we could also use RobustScaler, which scales features using statistics that are robust to outliers.
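from sklearn.preprocessing import MinMaxScaler

# X held the Spotify ID column earlier; the features to cluster on are followers and popularity
X = df[['followers', 'popularity']]
s = MinMaxScaler()
Scaled_X = s.fit_transform(X)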
Training the Model
Now that we have scaled our features, we can train our KMeans algorithm. We are required to set the number of clusters, and there are several methods to help select the best number, but for our example we will use 8 clusters. We will then use the predict method to assign each artist to a cluster based on their followers and popularity.
kmeans = KMeans(n_clusters=8, random_state=0).fit(Scaled_X)
# Predict on the scaled features the model was trained on, not the raw X
y_kmeans = kmeans.predict(Scaled_X)
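The article fixes the number of clusters at 8 without showing a selection method. Two common options are the elbow method (watching how inertia drops as k grows) and the silhouette score. The sketch below tries both on synthetic blobs that stand in for the scaled features; the data, the range of k, and the variable names are all assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled features: three well-separated 2-D blobs.
features, _ = make_blobs(
    n_samples=90,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5,
    random_state=0,
)

inertias = {}
silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_  # elbow method: look for where this flattens
    silhouettes[k] = silhouette_score(features, model.labels_)

# Pick the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

print(inertias)
print(best_k)
```

On a dataset as small as 30 artists, the silhouette score is only defined when k is at least 2 and smaller than the number of samples, so the search range has to stay within those bounds.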
Viewing the Clusters
To view the clusters, we can pull the labels from our KMeans model with the .labels_ attribute. We can also view the cluster centers with the .cluster_centers_ attribute.
df['labels'] = kmeans.labels_
centers = kmeans.cluster_centers_
Image by Jacob Tadesse
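Because the model is trained on scaled features, the cluster centers are expressed in scaled units. One way to read them as followers and popularity again is the scaler's `inverse_transform`. The sketch below uses the five artists from the df.head() output and only two clusters, since five rows cannot support the eight used in the article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Followers and popularity for the five artists in the df.head() output.
raw = np.array([[15, 1], [26, 0], [93, 2], [23821006, 89], [957, 25]], dtype=float)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(raw)

# Two clusters only because this toy sample has five rows.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

# cluster_centers_ are in scaled space; map them back to the original units.
centers_original_units = scaler.inverse_transform(model.cluster_centers_)

print(centers_original_units)
```

In this sample, Beyoncé ends up alone in one cluster, so that center is simply her own followers and popularity, while the other center is the average of the remaining four artists.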
While we could plot these in a notebook, I’ll use Tableau Public to create a dashboard for public access. Let’s save our dataframe as an Excel file.
Saving the Results
Below, we will filter our dataframe to only include the artist name, followers, popularity, and labels. Then we will sort the data by the label value.
final = df[['names','followers','popularity','labels']].sort_values(by='labels', ascending=False)
Finally, we will use the built-in pandas method, .to_excel, to save the file.
final.to_excel('Houston_Artists_Categories_8-7-2020.xlsx')
Interpreting Clusters from KMeans
I used Tableau Public (it’s free) to create an interactive dashboard of the results. With the artists grouped by cluster, we can see that Beyoncé and Travis Scott each sit in their own cluster, while the other Houston artists are grouped together by similar followers and popularity. Thank you for reading this article. I hope you found it helpful in comparing Houston artists using unsupervised learning!
Image by Jacob Tadesse
Here is the link to the dashboard.
Also, here is a link to the repo.
If you would like to contribute to this project, contact me on LinkedIn.
Translated from: https://towardsdatascience.com/unsupervised-learning-with-scikit-learn-spotify-api-and-tableau-public-50fcecf3bdf5