Data Visualization and Its Importance: Python
Data visualization is an important skill to possess for anyone trying to extract and communicate insights from data. In the field of machine learning, visualization plays a key role throughout the entire process of analysis.
Why do we need to visualize the data?
Let’s say we have a data set of car sales across four continents for the first 11 months of the year.
Car Sales from Jan to Nov
It is pretty cumbersome to analyze each column separately and draw conclusions from the data above. So what we generally do is summarize the data and deduce insights from the summary. To see how sales in each continent compare with the others, we’ll calculate the average Discount and Sales for each continent:
Average of Discount and Sales
It looks like Sales have been nearly equal across the continents for the first 11 months. Let’s inspect the data further and look at the standard deviation of each column:
Standard Deviation across the continents
So, from the data above, we might infer that sales performance has been the same across all the continents. This is exactly where summary statistics tend to mislead.
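The summary step can be sketched in a few lines of standard-library Python. The article's car-sales numbers are not reproduced in the text, so this sketch assumes the original Anscombe's quartet values as a stand-in, treats x as Discount and y as Sales, and uses the hypothetical labels "Continent A" to "Continent D". Under those assumptions, every group rounds to the same mean and standard deviation.

```python
from statistics import mean, stdev

# ASSUMPTION: original Anscombe's quartet values, used as a stand-in for the
# article's modified car-sales data. The continent labels are hypothetical.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "Continent A": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Continent B": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Continent C": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Continent D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean and sample standard deviation of each column, per group.
summary = {}
for continent, (discount, sales) in quartet.items():
    summary[continent] = {
        "mean_discount": round(mean(discount), 2),
        "mean_sales": round(mean(sales), 2),
        "std_discount": round(stdev(discount), 2),
        "std_sales": round(stdev(sales), 2),
    }

for continent, stats in summary.items():
    print(continent, stats)
```

Looking only at this printout, the four groups are indistinguishable, which is the trap the article describes.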
If we plot Sales against the Discount rate from the above data on a scatter plot in Python, we get the following graphs.
Scatter Plot
Each continent employed a different strategy to boost its sales, and both the discount rates and the sales numbers differed considerably between them. It is difficult to understand the pattern or the strategy of each continent from the numbers alone. That’s why it is important to visualize the data instead of drawing conclusions from numbers alone.
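A scatter plot like the one described can be sketched with Matplotlib as follows. This is a minimal sketch, not the article's own code: it again assumes the original Anscombe's quartet values in place of the modified car-sales data, and the continent labels and the `plot_quartet` helper are hypothetical names introduced for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# ASSUMPTION: original Anscombe's quartet values stand in for the article's data.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "Continent A": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Continent B": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Continent C": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Continent D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def plot_quartet(data):
    """Draw one Discount-vs-Sales scatter panel per group on a 2x2 grid."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    for ax, (continent, (discount, sales)) in zip(axes.flat, data.items()):
        ax.scatter(discount, sales)
        ax.set_title(continent)
        ax.set_xlabel("Discount")
        ax.set_ylabel("Sales")
    fig.tight_layout()
    return fig, axes


fig, axes = plot_quartet(quartet)
```

Calling `fig.savefig("sales_vs_discount.png")` writes the figure to disk. The shared axes make the differences between the four panels easy to compare at a glance.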
The above data set is a modified version of Anscombe’s quartet, which was constructed in 1973 by the statistician Francis Anscombe to counter the impression among statisticians that “numerical calculations are exact, but graphs are rough.”
You can find more about Anscombe’s quartet here.
So, now comes the million-dollar question,
Which Python Library should we use for Data Visualization?
Python has some of the most interactive data visualization tools. The most basic plot types are shared between multiple libraries, but others are only available in certain libraries.
The three main data visualization libraries that data scientists commonly use are:
1. Matplotlib
Matplotlib is the most popular data visualization library for Python. It is used to generate simple yet powerful visualizations. For everyone from beginners to seasoned data science professionals, Matplotlib is the most widely used plotting library.
Advantages:
2. Seaborn
Seaborn is a Python data visualization library based on Matplotlib. It provides a variety of visualization patterns and integrates more closely with Pandas dataframes than Matplotlib does. Seaborn is widely used for statistical visualization because it has many statistical plotting routines built in.
Advantages:
3. Seaborn works with the dataset as a whole, whereas Matplotlib deals with individual dataframes and arrays.
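As an illustration of the dataframe-oriented style described above, the following sketch passes a whole long-form Pandas dataframe to Seaborn and lets it split the panels by column name. The data is assumed to be the original Anscombe's quartet, and the column names `continent`, `discount`, and `sales` are hypothetical labels chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for on-screen display
import pandas as pd
import seaborn as sns

# ASSUMPTION: original Anscombe's quartet values stand in for the article's data.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "Continent A": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Continent B": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Continent C": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Continent D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Long-form dataframe: one row per (continent, month) observation.
rows = [
    {"continent": continent, "discount": d, "sales": s}
    for continent, (discount, sales) in quartet.items()
    for d, s in zip(discount, sales)
]
df = pd.DataFrame(rows)

# Seaborn takes the whole dataframe and facets by the "continent" column.
grid = sns.relplot(data=df, x="discount", y="sales",
                   col="continent", col_wrap=2, height=3)
```

Here the faceting is declared by column name rather than by looping over axes, which is the main practical difference from the equivalent Matplotlib code.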
3. Plotly
Plotly provides interactive plots that are easy to read even for an audience without much experience reading plots. Plotly is mostly used for handling geographical, scientific, statistical, and financial data.
Advantages:
3. When we hover the mouse over a Plotly graph, it shows the axis values at that particular point.
There are more data visualization libraries available in Python, such as Bokeh, Altair, and ggplot. However, the ones mentioned above are among the most common and widely used.
Conclusion
In this article, we first learned why it is important to visualize data instead of drawing inferences from datasheets alone. After that, we looked at the main data visualization libraries in Python. Many other data visualization tools are available in Python apart from the ones discussed here. It is important to familiarize yourself with the libraries before committing to a particular approach.
Thank you for reading and Happy Coding!!!
Check out my previous articles about Python here
Pandas: Python
Matplotlib: Python
NumPy: Python
Time Complexity and Its Importance in Python
Python Recursion or Recursive Function in Python
Python Programs to check for Armstrong Number (n digit) and Fenced Matrix
Python: Problems for Basics Reference — Swapping, Factorial, Reverse Digits, Pattern Print
Translated from: https://levelup.gitconnected.com/data-visualization-and-its-importance-python-7599c1092a09