How to predict likes and shares based on your article’s title using Machine Learning
by Flavio H. Freitas
Choosing a good title for an article is an important step in the writing process. The more interesting the title seems, the higher the chance a reader will interact with the whole thing. Furthermore, showing the user content they prefer (to interact with) increases the user’s satisfaction.
This is how my final project for the Machine Learning Engineer Nanodegree specialization started. I just finished it, and I feel so proud and happy that I wanted to share some insights I’ve had about the whole flow. Also, I promised Quincy Larson this article when I finished the project.
If you want to see the final technical document click here. If you want the implementation of the code, check it out here or fork my project on GitHub. If you just want an overview using layperson’s terms, this is the right place — continue reading this article.
Two of the most used platforms for spreading ideas nowadays are Twitter and Medium (you are here!). On Twitter, articles are normally posted as a tweet containing the title and an external URL. Users can follow the link to the article and show their satisfaction with a like or a retweet of the original post.
Medium shows the full text with tags (to classify the article) and claps (similar to Twitter’s likes) to show how much the users appreciate the content. A correlation between these two platforms can bring us valuable information.
The project
The problem that I defined was a classification task using supervised learning: Predict the number of likes and retweets an article receives based on the title.
Correlating the number of likes and retweets on Twitter with a Medium article is an attempt to separate the effect of how many readers an article reached from the number of Medium claps it received. The more an article is shared on different platforms, the more readers it will reach and the more Medium claps it will (likely) receive.
Using only the Twitter statistics, we’d expect the articles to have initially reached almost the same number of readers (those readers being the followers of the freeCodeCamp account on Twitter). Their performance and interactions, therefore, would be limited to the characteristics of the tweet, for example the title of the article. And that is exactly what we want to measure.
I chose the freeCodeCamp account for this project because the idea was to limit the scope of the subject of the articles and better predict the response on a specific field. The same title can perform well in one category (e.g. Technology), but not necessarily in a different one (e.g. Culinary). Also, this account posts the title of the original article and the URL on Medium as the tweet content.
How does the data look?
The first step of this project was to get the information from Twitter and Medium and then correlate it. The dataset can be found here and has 711 data points. This is what the dataset looks like:
Analyzing and learning from the data
After analyzing the dataset and plotting some charts, I found interesting information in it. For these analyses, the outliers were removed, and I considered only the top 25% of performers for each feature (retweets, likes, and claps).
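As a rough illustration of that filtering step, the sketch below drops outliers from one feature and then keeps the top quarter of what remains. The article does not say how outliers were identified, so the 1.5 × IQR rule, the function name `top_quartile`, and the sample numbers are all assumptions made for this example.

```python
import statistics


def top_quartile(values):
    """Drop outliers (assumed 1.5 * IQR rule), then keep the top 25% of the rest."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v for v in values if low <= v <= high]
    # Third quartile of the remaining values marks the best-performing quarter.
    cutoff = statistics.quantiles(kept, n=4)[2]
    return [v for v in kept if v >= cutoff]


# Made-up like counts, not taken from the real dataset.
likes = [3, 5, 8, 12, 15, 18, 22, 25, 30, 40, 500]
print(top_quartile(likes))
```

The same function could be applied separately to the retweet, like, and clap columns.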
So let’s take a look at what the numbers say for freeCodeCamp articles written on Medium and shared on Twitter.
What is a good title length?
Writing titles that are longer than 50 and shorter than 110 characters helps to increase the chances of a successful article.
What is a good number of words in the title?
The most effective number of words in the title is 9 to 17. To optimize the number of retweets and likes, try something from 9 to 18 words, and for claps from 7 to 17.
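Both measurements are easy to compute for a candidate title. The snippet below is a minimal sketch, assuming words are separated by whitespace (the article does not describe how words were counted); `title_features` is a hypothetical helper name.

```python
def title_features(title):
    """Return the character length and whitespace-separated word count of a title."""
    return {"length": len(title), "words": len(title.split())}


title = "How to predict likes and shares based on your article's title using Machine Learning"
features = title_features(title)
print(features)

# Compare against the ranges reported above.
print(50 < features["length"] < 110, 9 <= features["words"] <= 17)
```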
Which are the best categories to tag?
Programming, Tech, Technology, JavaScript and Web Development are categories you should consider when tagging your next article. They appear as good indicators for all three features.
Which are the best words to use?
In this lexical analysis, you’ll notice that some words get much more attention in the freeCodeCamp community than others. If the intention is to make an article reach more readers, talking about JavaScript, React or CSS will increase how much it’s appreciated. Using the words “learn” or “guide” in the title will also raise the probability of success.
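A lexical analysis of this kind can start from a simple word count over titles. The sketch below counts how many titles each word appears in. The tokenization rule and the sample titles are invented for illustration and are not the author's actual procedure or data.

```python
import re
from collections import Counter


def word_counts(titles):
    """Count how many titles each lowercase word appears in."""
    counts = Counter()
    for title in titles:
        # A set ensures a word repeated within one title is counted once.
        counts.update(set(re.findall(r"[a-z0-9']+", title.lower())))
    return counts


# Hypothetical titles, invented for this example.
titles = [
    "Learn React in 10 minutes",
    "A beginner's guide to CSS Grid",
    "Learn JavaScript closures the easy way",
]
counts = word_counts(titles)
print(counts["learn"], counts["guide"])
```

Running the count only over the top-performing titles and comparing it with the count over all titles would show which words are over-represented among the successful ones.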
Using Machine Learning
OK! After taking a look at the data and extracting some information from it, the goal was to create a Machine Learning model that makes predictions of the number of retweets, likes, and claps based on the title of the article.
Predicting the number of retweets, likes, and claps of an article can be treated as a classification problem, which is a common machine learning (ML) task. For this, the output has to be expressed as discrete values (ranges of numbers). The input is the title of the article with each word as a token (t1, t2, t3, … tn), plus the title length and the number of words in the title.
The ranges for our features are:
- Retweets: 0–10, 10–30, 30+
- Likes: 0–25, 25–60, 60+
- Claps: 0–50, 50–400, 400+
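Turning a raw count into one of these ranges is a small binning step. The sketch below maps a count to class 0, 1 or 2 using the edges listed above. The ranges share their boundary values (10 appears in both 0–10 and 10–30), so assigning a boundary value to the higher range is an assumption here, as are the names `RANGE_EDGES` and `to_class`.

```python
from bisect import bisect_right

# Boundaries between the three ranges of each target, taken from the list above.
RANGE_EDGES = {
    "retweets": [10, 30],
    "likes": [25, 60],
    "claps": [50, 400],
}


def to_class(target, count):
    """Map a raw count to class 0, 1 or 2 (assumption: a boundary value goes to the higher range)."""
    return bisect_right(RANGE_EDGES[target], count)


print(to_class("retweets", 9), to_class("retweets", 10), to_class("claps", 400))
```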
And finally, after preprocessing our dataset and evaluating some models (everything fully described here), we reached the conclusion that the MultinomialNB model performed best for retweets, reaching an accuracy of 60.6%. Logistic regression reached 55.3% for likes and 49% for claps.
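MultinomialNB is the name scikit-learn gives its multinomial naive Bayes classifier. To show the idea behind it without any dependencies, here is a minimal from-scratch sketch. It is not the author's code: the class name `TinyMultinomialNB`, the whitespace tokenization, the add-one smoothing, and the training titles and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Minimal multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(title.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, title):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in title.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data: titles paired with a retweet range label.
train_titles = [
    "learn javascript in one weekend",
    "a guide to react hooks",
    "learn css grid step by step",
    "my thoughts on office chairs",
    "why i changed my desk setup",
    "notes from a long commute",
]
train_labels = ["30+", "30+", "30+", "0-10", "0-10", "0-10"]

model = TinyMultinomialNB().fit(train_titles, train_labels)
print(model.predict("learn react step by step"))
```

With only six invented titles this says nothing about real accuracy. It only shows how word counts per class turn into a predicted range.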
As an experiment for this article, I ran the model on the title of this article, and it predicted that:
It will have 10–30 retweets and 25–60 favorites on Twitter and 400+ claps on Medium.
How is this prediction?
Follow me if you want to read more of my articles. And if you enjoyed this article, be sure to like it and give me a lot of claps. It means the world to the writer.
Flávio H. de Freitas is an Entrepreneur, Engineer, Tech lover, Dreamer and Traveler. He has worked as CTO in Brazil, Silicon Valley and Europe.
Translated from: https://www.freecodecamp.org/news/how-to-predict-likes-and-shares-based-on-your-articles-title-using-machine-learning-47f98f0612ea/