Poisson Distribution: A Formula to Calculate Probability Distribution
Probability Distributions play an important role in our daily lives. We commonly use them when trying to summarise and gain insights from different forms of data.
Because of this, they're quite an important topic in fields such as Mathematics, Computer Science, Statistics, and Data Science.
There are two main types of data: Numerical (for example integers and floats), and Categorical (for example strings of text).
Numerical data can also be in either of two forms:
Discrete: this form of data can only take a countable set of values (like the number of clothes we have). We can infer probability mass functions from discrete data.
Continuous: on the other hand, continuous data is used to describe quantities such as weight or distance, which can take any fractional or real value. From continuous data we can instead infer probability density functions.
Probability mass functions can give us the probability that a variable is equal to a certain value. On the other hand, the values of probability density functions do not represent probabilities on their own, but instead first need to be integrated (within the considered range).
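The difference between the two kinds of function can be illustrated with a minimal sketch. The two distributions used here (a fair six-sided die and a uniform distribution on the interval from 0 to 0.5) are assumptions chosen only for illustration. Note that the density takes a value greater than 1, which a probability never could.

```python
from scipy import stats
from scipy.integrate import quad

# Discrete case: a PMF value is already a probability.
# Assumed example: a fair six-sided die (integers 1 to 6).
die = stats.randint(1, 7)  # upper bound is exclusive
p_three = die.pmf(3)       # P(X = 3)

# Continuous case: a PDF value is a density, not a probability.
# Assumed example: a uniform distribution on [0, 0.5].
uniform = stats.uniform(loc=0.0, scale=0.5)
density = uniform.pdf(0.25)

# Integrating the density over a range gives the probability of that range.
p_range, _ = quad(uniform.pdf, 0.1, 0.3)

print(f"P(die = 3)         = {p_three:.4f}")
print(f"pdf(0.25)          = {density:.4f}")
print(f"P(0.1 <= X <= 0.3) = {p_range:.4f}")
```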
What is a Poisson Distribution?
Poisson Distributions are commonly used for two main purposes:
- Predicting how many times an event will take place within a chosen time period. This technique can be used for different risk analysis applications such as house insurance price estimation.
- Estimating the probability that an event might occur given how often it happened in the past (for example how likely it is that there will be a power-cut in the next two months).
Poisson Distributions let us be confident of the average time between the occurrence of different events. They can't, however, tell us the precise moment an event might take place (since processes usually have stochastic behaviour).
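As a rough sketch of the power-cut example, the snippet below turns a past frequency into a probability for the next two months. The record of 3 power cuts in 12 months is a made-up figure, and the calculation assumes the average rate stays constant over time.

```python
from scipy import stats

# Hypothetical record: 3 power cuts observed over the past 12 months.
past_events = 3
past_months = 12
rate_per_month = past_events / past_months

# Expected number of power cuts in the next two months.
horizon_months = 2
lam = rate_per_month * horizon_months

# P(no power cut) is pmf(0); P(at least one) is its complement.
p_none = stats.poisson.pmf(0, lam)
p_at_least_one = 1.0 - p_none

# Probability of exactly k power cuts, for a few values of k.
for k in range(4):
    print(f"P({k} power cuts) = {stats.poisson.pmf(k, lam):.4f}")
print(f"P(at least one) = {p_at_least_one:.4f}")
```

The result says how likely a power cut is within the two months, not when it would happen.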
Linear vs non-linear systems
Natural systems can, in fact, be divided into two main categories: linear and non-linear (stochastic).
In linear systems, causes always precede their effects, which creates a strong time-precedence relationship.
But this doesn't hold true for non-linear systems, where small changes in the system's initial conditions can lead to unpredictable outcomes.
Considering how complex and chaotic our real world is, most processes are better described using non-linear systems, although linear approximations are sometimes possible.
Poisson Distributions can be modeled using the expression in the figure below, where λ is used to represent the expected number of events which can take place in the considered time-span.
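For reference, the standard form of the Poisson probability mass function is written out here. The symbols X (the random number of events) and k (a particular count) are notation assumed for this sketch; λ is the expected number of events in the time-span, as above.

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

Both the mean and the variance of this distribution are equal to λ.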
The main characteristics which describe Poisson Processes are:
An example of a Poisson Distribution
In the figure below, you can see how varying the expected number of events (λ) which can take place in a period can change a Poisson Distribution. The image below has been simulated, making use of this Python code:
```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# n = number of events, lambd = expected number of events
# which can take place in a period
n = np.arange(0, 9)
for lambd in range(2, 12, 2):
    # Probability of observing each value of n for this λ
    poisson = stats.poisson.pmf(n, lambd)
    plt.plot(n, poisson, '-o', label="λ = {:d}".format(lambd))

plt.xlabel('Number of Events', fontsize=12)
plt.ylabel('Probability', fontsize=12)
plt.title("Poisson Distribution varying λ")
plt.legend()
plt.savefig('name.png')
```
Taking a closer look at this simulation, we can discover the following patterns:
- In each of the different cases, the number assigned to λ corresponds to the peak of the distribution, which then trails off moving further away from the peak.
- The more events that are expected to take place during the simulation, the wider and flatter the distribution becomes (its variance is equal to λ). The probabilities themselves always sum to 1, whatever the value of λ.
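A short numerical check, as a sketch, of where each distribution peaks, what its probabilities sum to, and how spread out it is. The helper name `summarise` and the cut-off of 60 events are choices made for this illustration. For a whole-number λ the values λ - 1 and λ are equally likely, so the reported peak may be either of the two.

```python
import numpy as np
from scipy import stats


def summarise(lambd, max_events=60):
    """Return (peak, total probability, variance) for a Poisson(lambd)."""
    # A wide range of n so that almost all of the probability is covered.
    n = np.arange(0, max_events)
    pmf = stats.poisson.pmf(n, lambd)
    peak = int(n[np.argmax(pmf)])   # most likely number of events
    total = float(pmf.sum())        # sum of all the probabilities
    variance = float(stats.poisson.var(lambd))
    return peak, total, variance


for lambd in range(2, 12, 2):
    peak, total, variance = summarise(lambd)
    print(f"λ = {lambd:2d}: peak at n = {peak}, "
          f"sum = {total:.6f}, variance = {variance:.1f}")
```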
This type of simulation could, for example, be used to try to reduce the queuing time when shopping at a supermarket.
The owner could create a record of how many customers visit the store at different times and on different days of the week in order to then fit this data to a Poisson Distribution.
In this way, it would be much easier to determine how many cashiers should be working at different times of the day/week in order to enhance the customer experience.
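The supermarket idea could be sketched roughly as follows. All of the numbers are hypothetical: the customer counts are invented, and the capacity of 15 customers per cashier per hour is an assumption. For a Poisson model, the maximum-likelihood estimate of λ is simply the average of the recorded counts.

```python
import math

import numpy as np
from scipy import stats

# Hypothetical record: customers arriving between 17:00 and 18:00
# on ten different Fridays.
arrivals = np.array([42, 38, 51, 45, 40, 47, 39, 44, 50, 43])

# Fit the Poisson model: the estimate of λ is the sample mean.
lam_hat = arrivals.mean()

# Plan for a busy hour: the 95th percentile of the fitted distribution.
busy_hour = stats.poisson.ppf(0.95, lam_hat)

# Assumed capacity: one cashier serves about 15 customers per hour.
customers_per_cashier = 15
cashiers_needed = math.ceil(busy_hour / customers_per_cashier)

print(f"Estimated λ: {lam_hat:.1f} customers per hour")
print(f"95% of such hours see at most {busy_hour:.0f} customers")
print(f"Cashiers needed: {cashiers_needed}")
```

Repeating the fit for each time slot and day of the week would give a staffing level per slot. Comparing the sample mean with the sample variance is a quick way to judge whether a Poisson model is a reasonable fit for the recorded data.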
Wrapping up
In case you are interested in learning more about the applications of distributions in stochastic settings, more information is available here.
I hope you enjoyed this article. Thank you for reading!
Contact me
If you want to keep updated with my latest articles and projects, follow me on Medium and subscribe to my mailing list. These are some of my contact details:
Linkedin
Personal Blog
Personal Website
Patreon
Medium Profile
GitHub
Kaggle
Translated from: https://www.freecodecamp.org/news/poisson-distribution-a-formula-to-calculate-probability-distribution/