Cloudy With a Chance of Words: Part 1
When working with text data and NLP projects, word frequency is often a useful feature to identify and look into. However, creating good visuals is often difficult because you don’t have many options outside of bar charts. Let’s face it: bar charts get old and boring quickly! This is where word clouds come into play. In this post, learn how to spice up your visualizations using word clouds on your next project.
Up until my most recent project I didn’t know a word cloud library existed in Python, but I assure you it does, and it has some amazing features!
For those interested, the full WordCloud library and its documentation can be found on GitHub at https://github.com/amueller/word_cloud.
TL;DR
Part 1 of this blog will walk you through obtaining the appropriate libraries and the basic parameters and functions of the wordcloud library as well as how to create a generic word cloud. Part 2 will build upon this and walk you through creating custom masks for word clouds and other unique visual options.
Getting Started With WordCloud
Before we can start making visuals, we’ll need to make sure we have the libraries we need to create our word clouds. You’ll need the following libraries:
- numpy
- matplotlib
- PIL (installed via the Pillow package)
- wordcloud
- nltk (only necessary for the purposes of this blog, as a source of sample text to create word clouds from)
If you’re unable to import any of these libraries, all of them can be installed with pip. For my specific project, I used Google Colab, which needed a slightly different approach to install wordcloud. Google Colab users can use the following command:
!pip install git+https://github.com/amueller/word_cloud.git#egg=wordcloud
The trailing #egg=wordcloud part is important for Colab because it identifies and effectively names the library so that it can be properly imported.
Once we have all of our needed libraries installed, we can use the following set of import statements:
We’re now ready to create some word clouds!
Generic Word Clouds
To start with, let’s explore generic word clouds. For those who want to follow along, we’ll use some corpora from the nltk library.
First off, we’ll need to acquire our text. I’ll note here that there are two forms of text that WordCloud can use to generate a visual. The first, and the main one we’ll use, is a string. The second is a dictionary of words and their frequencies as key-value pairs.
If you’re following along, or want to attempt this using other sample text from nltk, you can use the following code to acquire our text samples:
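The original code cell is missing here, so the following is a sketch of one way to list the available texts. The helper name `list_gutenberg_files` is my own; `nltk.download` and `gutenberg.fileids` are standard nltk calls, and the download needs an internet connection the first time it runs.

```python
import nltk
from nltk.corpus import gutenberg


def list_gutenberg_files():
    """Return the file IDs of the sample texts in nltk's gutenberg corpus."""
    # Fetch the corpus if it is not already on disk
    nltk.download("gutenberg", quiet=True)
    return gutenberg.fileids()
```

Calling `list_gutenberg_files()` returns a list of author-title file names that you can pick from.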
Listing the file IDs shows the different authors and texts we have to choose from within nltk’s gutenberg files.
Feel free to attempt creating word clouds from any of those options. The one that we’ll continue with in these examples, however, will be Moby Dick.
To gather our sample text as a single string you can use the following command:
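As the original command is not shown on this page, here is a sketch of how the text can be pulled out as one string. The helper name `load_gutenberg_text` is hypothetical; `gutenberg.raw` is the nltk call that returns a file's contents as a single string, and `melville-moby_dick.txt` is the corpus file ID for Moby Dick.

```python
import nltk
from nltk.corpus import gutenberg


def load_gutenberg_text(file_id="melville-moby_dick.txt"):
    """Return one gutenberg sample text as a single string."""
    # Fetch the corpus if it is not already on disk
    nltk.download("gutenberg", quiet=True)
    return gutenberg.raw(file_id)
```

With this in place, `text = load_gutenberg_text()` gives the full novel as one string, ready to hand to WordCloud.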
Now that we have our text, let’s take a look at how to turn it into a word cloud. The process has three steps: instantiate a WordCloud object, use that object to generate a cloud from the text we pass in, and then show the result without the unnecessary x and y axes.
Look at that! We made a word cloud!
Now personally, I’m not a fan of the black background and it seems a little small, so let’s change that with some simple parameters.
Now we’re talking! Still, there seem to be some strange things showing up in our generic word cloud, don’t there?
Parameters and Language Processing
Looking at the cloud above, we notice a few things. Some words seem to be paired:
- the whale
- the ship
- the sea
- the captain
- White Whale
And so on. Our word cloud is still showing word frequencies, but WordCloud has a ‘collocations’ parameter that defaults to True. When it is enabled, the library also counts pairs of words (bigrams) and their frequencies. In some instances this can definitely be useful, but in this one I think we’ll get better results without it.
Notice the difference?
A keen eye may recognize that the word ‘the’ no longer appears in our word cloud. This is because ‘the’ is recognized as a stop-word and excluded from the cloud even though it appears quite frequently in the text.
You may be wondering where stop-words came into play, and that is one of the really cool features of the wordcloud library. The library comes with its own list of stop-words that it uses by default. In fact, it applies quite a few NLP practices by default, which makes creating clouds that much easier while staying adjustable for the more experienced NLP practitioner. Some of these additional NLP parameters are:
regexp: an optional parameter that, if left blank, uses r"\w[\w']+" by default. A custom regex string can be passed in here.
normalize_plurals: default = True. For words that appear both with and without a trailing ‘s’, the ‘s’ is removed from the plural and its count is added to the singular form.
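To get a feel for what the default pattern picks up, here is a small sketch that applies it with Python's own `re` module rather than through WordCloud. The custom pattern is a hypothetical example of something you might pass as `regexp`.

```python
import re

sample = "Call me Ishmael, I don't mind."

# Default pattern quoted above: one word character followed by
# one or more word characters or apostrophes
default_pattern = r"\w[\w']+"
default_tokens = re.findall(default_pattern, sample)
print(default_tokens)  # ['Call', 'me', 'Ishmael', "don't", 'mind']

# Hypothetical custom pattern: only runs of four or more letters
custom_pattern = r"[A-Za-z]{4,}"
custom_tokens = re.findall(custom_pattern, sample)
print(custom_tokens)  # ['Call', 'Ishmael', 'mind']
```

Notice that the single-letter word "I" is skipped by the default pattern, since it requires at least two characters. A custom pattern would be passed as `WordCloud(regexp=custom_pattern)`.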
In our original import statement we imported STOPWORDS from the wordcloud library. You can print it to see the entire list of words that are excluded by default; at the time of writing it contains 192 of the most common stop-words. You can add to this list if you have additional words you want excluded, or supply your own stop-words if you prefer. Note that the stop-words must be passed in as a set and not a list.
What a difference!
One last thing we’ll talk about before moving on to making fun and unique word clouds is “relative scaling”.
Relative scaling determines how strongly a word’s frequency influences its size. By default, relative scaling is set to 0.5 (the library’s ‘auto’ setting resolves to 0.5 unless repeat is enabled), which blends rank and frequency: a word that occurs twice as often as another is drawn larger, but less than twice as large.
Relative scaling can be set to any number between 0 and 1. At 0, only the rank of each word is considered, so how much more often one word occurs than another has no effect on its size. At 1, a word that occurs twice as often is twice as large. In some cases a higher value is useful for making differences in frequency easier to see. However, this doesn’t always look very good, and it can affect how well a word cloud fits a mask, which we will talk about later.
In this case, using a relative scaling of 1 actually doesn’t look too bad! We’ll soon see how this translates to using it with an image mask.
Saving Your Word Cloud
Once you have your word cloud the way you want it, you’ll probably want to save it. To do so, you can run the following code which will save the current state of your WordCloud object.
Keep in mind that this saves the image to your current working folder. If you have a specific location in mind, you will need to include the appropriate path in the file name.
Other Parameters Worth Playing With
We looked at the key parameters for making word clouds, but there are many more that are worth looking into and toying with. These parameters are fairly self-explanatory and can be used to further tweak your clouds:
prefer_horizontal (float): If set to 1, all words appear horizontal, while lower values increase the share of vertical words. default = 0.9
min_font_size (int): Smallest font size to be used. default = 4
max_words (int): Maximum number of words shown in the cloud. default = 200
min_word_length (int): Minimum number of letters a word needs in order to be included in the cloud. default = 0
include_numbers (bool): Whether numbers are included as words. default = False
repeat (bool): Determines whether words and phrases are repeated until max_words or min_font_size is reached. (Can be used to create a word cloud from a single word.) default = False
Unique and Custom Word Clouds
Because this post turned out much longer than I had initially planned, I’ll save a few topics for Part 2, which will follow soon: using image masks to create custom word clouds, creating your own image masks from any image, and applying an image’s colors to your cloud.
Original article: https://medium.com/swlh/cloudy-with-a-chance-of-words-part-1-d34a29739dba