How to Scrape a Dashboard with Python

Published: 2023/11/29

Dashboard scraping is a useful skill to have when the only way to interact with the data you need is through a dashboard. We’re going to learn how to scrape data from a dashboard using the Selenium and Beautiful Soup packages in Python. The Selenium package allows you to write Python code to automate web browser interaction, and the Beautiful Soup package allows you to easily pull data from the HTML code that produces the webpage you want to scrape.

Our goal is to scrape the Fort Bend County Community Impact Dashboard that visualizes the COVID-19 situation in Fort Bend County in Texas. We will extract the history of total tests performed and the daily case counts reported so that we can estimate the percent of positive cases in Fort Bend County.

Note that all of the code in this tutorial is written in Python version 3.6.2.

Step 1: Import Python Packages, Modules, and Methods

The first step is to import the Python packages, modules, and methods needed for dashboard scraping. The versions of the packages used in this tutorial are listed below.
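
Because package behavior changes between releases, it helps to record which versions are installed before running the rest of the tutorial. The sketch below is one way to do that and is not the author's original version listing. It assumes the packages are installed under the PyPI names `selenium`, `beautifulsoup4`, and `pandas`, and it uses `importlib.metadata`, which requires Python 3.8 or later (newer than the 3.6.2 mentioned above). The helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names of the packages this tutorial relies on.
PACKAGES = ["selenium", "beautifulsoup4", "pandas"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Print one line per package so the versions can be noted alongside results.
for name, found in installed_versions(PACKAGES).items():
    print(f"{name}: {found or 'not installed'}")
```

With the packages in place, the later steps would typically start from `from selenium import webdriver`, `from bs4 import BeautifulSoup`, and `import pandas as pd`.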

Step 2: Scrape HTML Source Code

The next step is to write Python code to automate our interaction with the dashboard. Before writing any code, we must look at the dashboard and inspect its source code to identify the HTML elements that contain the data we need. The dashboard source code refers to the HTML code that tells your browser how to render the dashboard web page. To view the dashboard source code, navigate to the dashboard and use the keyboard shortcut Ctrl+Shift+I. An interactive panel containing the dashboard source code will appear.

Notice that the history of total tests performed and the daily case counts reported are only visible after clicking the “History” tab in the “Total Numbers of Tests Performed at County Sites” panel and the “Daily Case Count” tab in the “Confirmed Cases” panel, respectively. This means that we need to write Python code that automatically clicks on the “History” and “Daily Case Count” tabs so that the history of total tests performed and the daily case counts reported will be visible to Beautiful Soup.

[Figure: Fort Bend County Community Impact Dashboard on July 10th, 2020]

To find the HTML element that contains the “History” tab, use the shortcut Ctrl+Shift+C and then click on the "History" tab. You will see in the source code panel that the "History" tab is in a div element with ID "ember208".

[Figure: History Tab Source Code]

Following the same steps for the “Daily Case Count” tab, you will see that the “Daily Case Count” tab is in a div element with ID “ember238”.

[Figure: Source Code of Daily Case Count Tab]

Now that we have identified the elements we need, we can write code that:

Launches the dashboard in Chrome

Clicks on the “History” tab once the “History” tab finishes loading

Clicks on the “Daily Case Count” tab once the “Daily Case Count” tab finishes loading

Extracts the dashboard HTML source code

Exits Chrome

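
The five steps above can be sketched with Selenium roughly as follows. This is a minimal sketch under several assumptions rather than the author's original code: `DASHBOARD_URL` is a placeholder (the real dashboard address is not given here), the function name `fetch_dashboard_html` and the `settle_seconds` pause are illustrative, and `webdriver.Chrome()` is assumed to be able to locate a Chrome driver on your machine. The tab IDs are the ones found above; IDs of this `ember…` form appear to be generated by the page, so re-check them in the source code panel before relying on them.

```python
import time

# Placeholder address: substitute the real dashboard URL.
DASHBOARD_URL = "https://example.com/dashboard"

# IDs of the "History" and "Daily Case Count" tabs identified above.
TAB_IDS = ["ember208", "ember238"]


def fetch_dashboard_html(url, tab_ids, timeout=30, settle_seconds=2):
    """Open the dashboard in Chrome, click each tab, and return the page HTML."""
    # Imported here so this file can be loaded on a machine without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # launches Chrome
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        for tab_id in tab_ids:
            # Block until the tab has loaded and can be clicked, then click it.
            tab = wait.until(EC.element_to_be_clickable((By.ID, tab_id)))
            tab.click()
            # Assumed pause so the newly revealed plot has time to render.
            time.sleep(settle_seconds)
        return driver.page_source  # HTML after both tabs were opened
    finally:
        driver.quit()  # exit Chrome even if a step above fails
```

Calling `html = fetch_dashboard_html(DASHBOARD_URL, TAB_IDS)` would then give the HTML string that the next step parses. The explicit wait is used instead of a fixed delay before clicking because the tabs may take a variable amount of time to load.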

Step 3: Parse Data from HTML

Now, we need to parse the HTML source code to extract the history of total tests performed and the daily case counts reported. We will begin by looking at the dashboard source code to identify the HTML elements that contain the data.

To find the div element that contains the history of total tests performed, use the Ctrl+Shift+C shortcut and then click in the general area of the "Testing Sites" plot. You will see in the source code that the entire plot is in the div element with ID "ember96".

[Figure: Source Code of Testing Sites Plot]

If you hover over a specific data point, a label containing the date and number of tests performed will appear. Use the Ctrl+Shift+C shortcut and click on a specific data point. You will see that the label text is stored as the aria-label attribute of a g element.

[Figure: Source Code of Testing Sites Data Labels]

Following the same steps for the daily case counts reported, you will see that the plot of daily case counts is in the div element with ID “ember143”.

[Figure: Source Code of Daily Cases based on Report Date Plot]

If you hover over a specific data point, a label containing the date and the number of positive cases reported will appear. Using the Ctrl+Shift+C shortcut, you will notice that the data are also stored in the aria-label attribute of g elements.

[Figure: Source Code of Daily Cases based on Report Date Data Labels]
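
The lookup described above (find the plot's div by ID, then read the aria-label of each g element inside it) can be sketched with Beautiful Soup as follows. The HTML below is a tiny hypothetical stand-in for the real page source, and the label wording (for example "Jul 8, 2020 120") is an assumption made for illustration; check the actual aria-label text in the source code panel. The function name `plot_labels` is also illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily reduced stand-in for the dashboard HTML.
# The label text format is assumed, not copied from the dashboard.
SAMPLE_HTML = """
<div id="ember96">
  <svg>
    <g aria-label="Jul 8, 2020 120"></g>
    <g aria-label="Jul 9, 2020 135"></g>
    <g></g>
  </svg>
</div>
<div id="ember143">
  <svg>
    <g aria-label="Jul 8, 2020 14"></g>
  </svg>
</div>
"""


def plot_labels(html, div_id):
    """Return the aria-label text of every labeled g element in one plot div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=div_id)
    if container is None:
        # The ID was not found, for example because the page changed.
        return []
    # attrs={"aria-label": True} keeps only g elements that have the attribute.
    labeled = container.find_all("g", attrs={"aria-label": True})
    return [g["aria-label"] for g in labeled]


print(plot_labels(SAMPLE_HTML, "ember96"))   # labels from the tests plot
print(plot_labels(SAMPLE_HTML, "ember143"))  # labels from the daily cases plot
```

Searching inside the container div, rather than across the whole page, keeps labels from the two plots separate.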

Once we have the elements that contain the data, we can write code that:

Finds the div element that contains the plot of the total tests performed and pulls the total tests performed data

Finds the div element that contains the plot of the daily case counts and pulls the daily case count data

Combines the data in a pandas dataframe and exports it to a CSV

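
The last item in the list, combining the two series and exporting them, might look like the sketch below. It assumes the aria-label strings have already been pulled from the two plot divs and that each label ends with the count, preceded by a date that pandas can parse (for example "Jul 9, 2020 1,135"). That label format, the column names `tests` and `cases`, and the function names are all assumptions for illustration, not the author's original code.

```python
import pandas as pd


def labels_to_frame(labels, value_name):
    """Turn label strings like 'Jul 9, 2020 1,135' into a date/value table."""
    records = []
    for label in labels:
        # Assumed layout: everything before the last space is the date,
        # and the final token is the count (possibly with thousands commas).
        date_text, _, value_text = label.rpartition(" ")
        records.append({
            "date": pd.to_datetime(date_text),
            value_name: int(value_text.replace(",", "")),
        })
    return pd.DataFrame(records, columns=["date", value_name])


def labels_to_csv(test_labels, case_labels, csv_path):
    """Combine both series on date, write them to csv_path, and return the table."""
    tests = labels_to_frame(test_labels, "tests")
    cases = labels_to_frame(case_labels, "cases")
    # An outer merge keeps dates that appear in only one of the two plots.
    combined = tests.merge(cases, on="date", how="outer")
    combined = combined.sort_values("date").reset_index(drop=True)
    combined.to_csv(csv_path, index=False)
    return combined
```

A call such as `labels_to_csv(test_labels, case_labels, "fort_bend_covid.csv")` (the file name is hypothetical) would write one row per date. Dates missing from one plot are left empty in that column rather than dropped.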

Step 4: Calculate Positivity Rate

Now we can estimate the COVID-19 positivity rate in Fort Bend County. We will divide the cases reported by the tests performed and calculate the 7-day moving averages. It is unclear from the dashboard whether the reported positive cases include cases identified through tests not conducted by the county (e.g., tests conducted at a hospital or clinic). It is also unclear when the tests for the positive cases were conducted, since the dashboard displays only the date on which each case was reported. For these reasons, the positivity rates derived from these data should be treated as rough estimates of the true positivity rate.
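
One reading of this calculation is sketched below: compute the daily ratio of cases to tests, then take a 7-day moving average of that ratio. The sketch assumes a table with `date`, `cases`, and `tests` columns where `tests` holds daily counts; if the dashboard history is cumulative, the column would first need differencing (for example with `.diff()`). The column names, the function name `add_positivity`, and the sample numbers are made up for illustration.

```python
import pandas as pd


def add_positivity(frame, window=7):
    """Return a sorted copy with daily and moving-average positivity columns."""
    out = frame.sort_values("date").reset_index(drop=True)
    # Treat days with zero tests as missing so the ratio is NaN, not infinite.
    tests = out["tests"].where(out["tests"] > 0)
    out["positivity"] = out["cases"] / tests
    # The moving average is NaN until a full window of days is available.
    out["positivity_ma"] = out["positivity"].rolling(window).mean()
    return out


# Made-up numbers, used only to show the shape of the result.
sample = pd.DataFrame({
    "date": pd.date_range("2020-07-01", periods=8),
    "cases": [1, 2, 3, 4, 5, 6, 7, 8],
    "tests": [10] * 8,
})
print(add_positivity(sample)[["date", "positivity", "positivity_ma"]])
```

An alternative is to divide the 7-day sum of cases by the 7-day sum of tests, which weights busy testing days more heavily; the two approaches can give noticeably different values when daily test counts vary a lot.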

Original article: https://towardsdatascience.com/how-to-scrape-a-dashboard-with-python-8b088f6cecf3
