
A Simple Web Scraper Example: Weather

Published: 2023/12/20


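I. Design Task

Goal: design a data-scraping program in Python that meets the following basic requirements:

The scraping task is self-chosen, for example e-commerce transaction data, customer reviews, news, or images.
The scraped data is stored in a data file or in a SQLite database.

The program is suitably commented and comes with complete documentation.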
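
The second requirement allows a SQLite database as the storage target, although the walkthrough in this article writes a text file. A minimal sketch of the SQLite option with Python's built-in sqlite3 module might look like this; the table name forecast, its column names, and the sample rows are assumptions made for illustration and are not part of the original program.

```python
import sqlite3

def save_rows(conn, rows):
    """Store (date, weather, high, low) rows in a hypothetical 'forecast' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forecast ("
        "date TEXT, weather TEXT, high TEXT, low TEXT)"
    )
    # Parameter placeholders keep scraped text from being interpreted as SQL.
    conn.executemany("INSERT INTO forecast VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def load_rows(conn):
    """Read the rows back in insertion order."""
    cursor = conn.execute(
        "SELECT date, weather, high, low FROM forecast ORDER BY rowid"
    )
    return cursor.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    save_rows(conn, [("Day 1", "Sunny", "12", "-3"), ("Day 2", "Cloudy", None, "-5")])
    print(load_rows(conn))
    conn.close()
```

Passing a file path instead of ":memory:" to sqlite3.connect() would keep the data between runs.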

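II. Data Source

All data scraped by this program comes from the city home page of China Weather (weather.com.cn): the 72-hour forecast (date, weather conditions, temperature, and air quality) and the live conditions at a given moment. The address is:

http://www.weather.com.cn/weather1d/101280101.shtml#dingzhi_first

Open the address and look up Gansu > Jiuquan > Jiuquan to reach the page for Jiuquan.

The plan is to scrape, from that page, Jiuquan's 72-hour forecast (date, weather conditions, temperature, and air quality) and the live conditions at a given moment.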

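III. Tools and Environment

Python environment: install Python; version 3.9 is used here.

Libraries used: urllib.request, csv, and BeautifulSoup.

BeautifulSoup has to be installed separately (the package is named beautifulsoup4 on PyPI). It is an HTML parsing library that supports several parsers; the two most common are the parser in Python's standard library and the lxml HTML parser. Both are used the same way: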

from bs4 import BeautifulSoup

# Parser from Python's standard library
BeautifulSoup(html, 'html.parser')

# lxml (requires the lxml package to be installed)
BeautifulSoup(html, 'lxml')
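
Because lxml is a separate package, asking BeautifulSoup for 'lxml' raises an error on machines where it is not installed. One way to handle this, sketched here with a hypothetical helper named pick_parser, is to check whether lxml can be imported and fall back to the standard-library parser otherwise.

```python
import importlib.util

def pick_parser():
    """Return 'lxml' when the lxml package is importable, else 'html.parser'."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"

if __name__ == "__main__":
    print(pick_parser())
```

The result can be passed as the second argument, as in BeautifulSoup(html, pick_parser()).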

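IV. Analysis

1. Inspecting the page source

The head of the page source is shown below. The key step in the analysis is to find the markup that holds the information we want to scrape.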

<!DOCTYPE html>
<html>
<head>
<link rel="dns-prefetch" href="http://i.tq121.com.cn">
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>酒泉天氣預報,酒泉7天天氣預報,酒泉15天天氣預報,酒泉天氣查詢 - 中國天氣網</title>
<meta http-equiv="Content-Language" content="zh-cn">
<meta name="keywords" content="酒泉天氣預報,jqtq,酒泉今日天氣,酒泉周末天氣,酒泉一周天氣預報,酒泉15日天氣預報,酒泉40日天氣預報" />
<meta name="description" content="酒泉天氣預報，及時準確發布中央氣象臺天氣信息，便捷查詢北京今日天氣，酒泉周末天氣，酒泉一周天氣預報，酒泉15日天氣預報，酒泉40日天氣預報，酒泉天氣預報還提供酒泉各區縣的生活指數、健康指數、交通指數、旅游指數，及時發布酒泉氣象預警信號、各類氣象資訊。" />
<!-- 城市對比上線
<link type="text/css" rel="stylesheet" href="http://c.i8tq.com/cityListCmp/cityListCmp.css?20191230" />
<link type="text/css" rel="stylesheet" href="http://c.i8tq.com/cityListCmp/weathers.css?20191230" /> -->
<style>

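The source shows that each forecast entry on this site has three fields: wea, tem, and win. Each is written in its own p tag with no dedicated parent tag, so each can be extracted directly.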

2. Writing the scraper

(1) Importing the packages

import csv
import urllib.request
from bs4 import BeautifulSoup

(2) Fetching the page while imitating a browser

url = "http://www.weather.com.cn/weather/101270101.shtml"  # the numeric code in the URL selects the city
header = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36")  # header to send
opener = urllib.request.build_opener()  # create an opener
opener.addheaders = [header]  # attach the header to the opener
request = urllib.request.Request(url)  # build the request
response = opener.open(request)  # send it through the opener so the User-Agent header is actually used
html = response.read()  # read the response body (bytes)
html = html.decode('utf-8')  # decode the bytes as UTF-8; otherwise the Chinese text comes out garbled
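
The User-Agent header can also be attached to the Request object itself, which avoids a separate opener. The sketch below does that and takes the character set from the response headers when the server declares one. build_request and fetch_html are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
)

def build_request(url):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

def fetch_html(url, timeout=10):
    """Download a page and decode it, defaulting to UTF-8 if no charset is declared."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html(url) with the address used in this step would return the page as a string, ready to hand to BeautifulSoup.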

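(3) Locating the part to scrape

Find the part of the page that holds the needed information: the date, the weather, and the temperature.

# Code for this step:

final = []  # empty list that will hold the final data
bs = BeautifulSoup(html, "html.parser")  # create the BeautifulSoup object
body = bs.body  # get the body
data = body.find('div', {'id': '7d'})  # find the div whose id is 7d

Looking further down, all the needed information sits inside a ul tag, so that is what we look up next.

ul = data.find('ul')  # there is only one ul, so find() is enough; use find_all() when there are several

The information is in the li tags inside the ul, and there is more than one of them, so find_all() is needed.

li = ul.find_all('li')  # get all li tags; the result behaves like a list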
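
Each find() call returns None when nothing matches, so the chain in this step raises AttributeError if the page layout changes. A defensive version could look like the sketch below. The function name forecast_items and the sample HTML are made up for illustration and only imitate the div, ul, li nesting described here.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<html><body>
<div id="7d"><ul>
  <li><h1>Day 1</h1><p class="wea">Sunny</p></li>
  <li><h1>Day 2</h1><p class="wea">Cloudy</p></li>
</ul></div>
</body></html>
"""

def forecast_items(html):
    """Return the li tags under the div with id 7d, or an empty list if any level is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="7d")
    if container is None:
        return []
    ul = container.find("ul")
    if ul is None:
        return []
    return ul.find_all("li")

if __name__ == "__main__":
    for item in forecast_items(SAMPLE_HTML):
        print(item.find("h1").string)
```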

(4) Extracting the data

All the data is collected in a list first and then written to a file.

The date is in the h1 tag of each li.

The weather is in the first p tag of each li.

The temperature is in the second p tag: the high in its span tag and the low in its i tag.

i = 0
for day in li:  # iterate over each li tag
    if i < 7:
        temp = []
        date = day.find('h1').string  # the date
        temp.append(date)
        inf = day.find_all('p')  # all p tags in this li
        temp.append(inf[0].string)  # first p tag: weather conditions
        if inf[1].find('span') is None:
            # The forecast may have no high for the current day (this happens in the evening),
            # so only the low is recorded
            temperature_highest = None
        else:
            temperature_highest = inf[1].find('span').string  # the high
            temperature_highest = temperature_highest.replace('℃', '')  # in the evening the page changes and the high also ends with ℃
        temperature_lowest = inf[1].find('i').string  # the low
        temperature_lowest = temperature_lowest.replace('℃', '')  # strip the trailing ℃
        temp.append(temperature_highest)
        temp.append(temperature_lowest)
        final.append(temp)  # add this day's row to the final list
        i = i + 1
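
The rows built in this step keep temperatures as strings, with None for a missing high. If numeric values are needed later, a small conversion step can be added. The helper below, with the hypothetical name to_celsius, is one way to do it; it assumes the text is an optionally signed integer, optionally followed by ℃.

```python
import re

def to_celsius(text):
    """Convert text such as '12℃' or '-3' to an int; return None if it does not parse."""
    if text is None:
        return None
    match = re.fullmatch(r"\s*(-?\d+)\s*℃?\s*", text)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    for raw in ["12℃", "-3", None, "n/a"]:
        print(repr(raw), "->", to_celsius(raw))
```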

(5) Writing the file

# Append the rows as CSV; an explicit encoding avoids depending on the platform default
with open('天氣.txt', 'a', encoding='utf-8', errors='ignore', newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerows(final)
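
Opening the file in append mode adds a new batch of rows on every run, and the file has no header row. A variant that overwrites the file and writes a header first could look like the following sketch. The function names write_forecast and read_forecast, the header labels, and the file name forecast.csv are assumptions for illustration.

```python
import csv
import os
import tempfile

HEADER = ["date", "weather", "high", "low"]

def write_forecast(path, rows):
    """Write a header plus one CSV row per day; None becomes an empty field."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)

def read_forecast(path):
    """Read the file back as a list of rows (each row is a list of strings)."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "forecast.csv")
    write_forecast(path, [["Day 1", "Sunny", "12", "-3"], ["Day 2", "Cloudy", None, "-5"]])
    print(read_forecast(path))
```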

V. Results

1. Screenshot of the source code

2. Screenshot of a run

3. Screenshot of the stored data file

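VI. Complete Code

The complete program uses the requests library and reads the mobile page at m.tianqi.com, rather than the urllib.request and weather.com.cn combination walked through above.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import time

import requests
from bs4 import BeautifulSoup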

def getINFO(city='jiuquan'):
    url = 'https://m.tianqi.com/{}/'.format(city)
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.64 Safari/537.36'}
    r = requests.get(url, headers=headers, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    html = r.text
    soup = BeautifulSoup(html, 'html.parser')

    # Current location
    getLocation = soup.find('h2').text
    print(getLocation)

    # Update time
    getUpdatetime = soup.find(id='nowHour').text
    getUpdatetime = 'Updated ' + getUpdatetime
    print(getUpdatetime)

    # Current temperature
    getWeather_now = soup.find(class_='now').text
    getWeather_now = 'Current temperature ' + getWeather_now
    print(getWeather_now)

    # Today's weather
    getWeather = soup.find('dd', class_='txt').text
    getWeather = 'Weather today ' + getWeather
    print(getWeather)

    # Today's air quality
    getAir = soup.find(class_='b1').text
    getAir = 'Air quality ' + getAir
    print(getAir)

    # Current humidity
    getWet = soup.find(class_='b2').text
    print(getWet)

    # Current wind
    getWind = soup.find(class_='b3').text
    print(getWind)

    print('\n' + '-'*10 + '\n')

    # Combine the individual readings into one text
    weather_info = getLocation + '\n' + getUpdatetime + '\n' + getWeather_now + '\n' + getWeather + '\n' + getAir + '\n' + getWet + '\n' + getWind

    Temperature = ''

    # Forecast for the coming days
    getTemperature = soup.find(class_='weather_week')

    # Keep the next 3 days, adjust the format, and merge them into one text.
    # The indexes below depend on the line layout of each dl block on the page.
    for day in getTemperature.find_all('dl')[:3]:
        li = day.text.split('\n')
        li[7] = 'Air quality ' + li[7]
        li = li[2:8]
        li = ' '.join(li)
        Temperature = Temperature + li + '\n'

    print(Temperature)

    result_info = weather_info + '\n' + '-'*40 + '\n' + Temperature
    return result_info

# Save to a local file
def saveFile(text):
    with open("./天氣.txt", "w", encoding='utf-8') as f:
        f.write(text)

if __name__ == "__main__":
    while True:
        city = input("Enter the city name in pinyin (for example, jiuquan for Jiuquan): ")
        result_info = getINFO(city)
        saveFile(result_info)
        answer = input("Query another city? y/n ")
        if answer == "y" or answer == "Y":
            continue
        else:
            break

    # One-hour countdown, shown as mm:ss on a single console line
    for second in range(3600, -1, -1):
        time.sleep(1)
        print('Next weather update in: ' + "%02d:%02d" % (second // 60, second % 60), end='\r', flush=True)
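
The countdown at the end of the program only waits; nothing is fetched again when it reaches zero. If a periodic refresh is the goal, the fetch-and-save step can be placed inside a loop, as in this sketch. run_periodically is a hypothetical helper, and the sleep function is passed in as a parameter only so the loop can be exercised without real waiting.

```python
import time

def run_periodically(task, interval_seconds, rounds, sleep=time.sleep):
    """Call task() `rounds` times, sleeping interval_seconds between calls."""
    results = []
    for n in range(rounds):
        results.append(task())
        if n < rounds - 1:  # no need to wait after the final round
            sleep(interval_seconds)
    return results

if __name__ == "__main__":
    # Stand-in task; a real script might fetch and save the weather here.
    print(run_periodically(lambda: "fetched", interval_seconds=0, rounds=3))
```

In the program above, the task could be a small function that calls getINFO(city) and then saveFile() on the result, with interval_seconds set to 3600.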
