UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
When fetching a page with the urllib library and printing it with print(res.read().decode('utf-8')), the following error appears:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Example:
from urllib import request

url = 'https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111110&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E7%8B%97&oq=%E7%8B%97&rsp=-1'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': 'BDqhfp=%E7%8B%97%26%260-10-1undefined%26%260%26%261; BIDUPSID=4B61D634D704A324E3C7E274BF11F280; PSTM=1624157516; BAIDUID=4B61D634D704A324C7EA5BA47BA5886E:FG=1; __yjs_duid=1_f7116f04cddf75093b9236654a2d70931624173362209; indexPageSugList=%5B%22%E7%8B%97%22%2C%22%E7%8C%AB%E5%92%AA%22%2C%22%E5%B0%8F%E9%80%8F%E6%98%8E%22%5D; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BAIDUID_BFESS=5DD3805F1A4CC3C9562CEAC3C22A1408:FG=1; __yjs_st=2_YTMzN2ZlYWQwNjg5NTFlNGY4NTMxMDBhOTc0ZDQxZjYwZWI0NzBiNjU1N2UyOGRiY2MzNWQ4OTM2YjU4MGU4MmNjYTNiZTk4ZDFkMWE1YmU2ODZhNGMwYzQ3OGE1YjcxZjNmZTEzYWY2ZjNiNGYxNjc0NWNlYjY5YmRhMTI3MmI2N2ZjOTkyYWUwYTZlZDUyMzY3NTc3YmU0MWUwNGM3MDk5NWE1ZTRhNzE4NjQwYWJlMjE3OTg5YzdkYjc0NmE4MjBhMjA2MDBkZmIwNDhjMjYzZjYxMTcyOGM2OTZmYjRlOGUwNTc1N2ZhYWI5YzEwZTVkODg0ZjI4OWM2ZjcyZF83XzM0OWQ2ZTJh; H_PS_PSSID=34268_34099_33969_34222_31660_34226_33848_34113_34073_34107_26350_22159; delPer=0; PSINO=6; BA_HECTOR=al21a125ag2l25851j1genv370q; BDRCVFR[X_XKQks0S63]=mk3SLVN4HKm; firstShowTip=1; cleanHistoryStatus=0; BDRCVFR[dG2JNJb_ajR]=mk3SLVN4HKm; BDRCVFR[-pGxjrCMryR]=mk3SLVN4HKm; userFrom=null; ab_sr=1.0.1_NzczYjg1NGJiOWUwOGQwM2E4YTE0MDJkM2E0YjQ4M2E1ZDk0YWQ1MGUyMmNjZTg4NzhjZDNkZDI0YjcwMjU5N2MxYmQxNWIwZmRjMWEwZjVkNmZkYzkwYTNiYTE3NDUwYWFkZDkyZWM3Njg3ZjQ0OGQ5ZWU3YTkxNDk1M2FiZTAxZTY5NmY3ZjA1NDgxODE3ZWE4MWQxOWUwMmIwYmUxZA==',
    'Host': 'image.baidu.com',
    'Referer': 'https://image.baidu.com/',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'
}
req = request.Request(url, headers=headers)
res = request.urlopen(req)
print(res.read().decode('utf-8'))
# Result: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b). Because the request advertises 'Accept-Encoding': 'gzip, deflate, br', the server returned a gzip-compressed body, and urllib does not decompress it automatically. Two fixes are given below:
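Method 1: decompress the gzip body
Import the modules:

import gzip
from io import BytesIO

Decompress the response:

req = request.Request(url, headers=headers)
res = request.urlopen(req)
# Add the imports above and the following lines to the example
buff = BytesIO(res.read())
f = gzip.GzipFile(fileobj=buff)
data = f.read().decode('utf-8')
print(data)

Method 2: drop the Accept-Encoding header
Remove 'Accept-Encoding': 'gzip, deflate, br' from the request headers. Without it, the server should return an uncompressed body that decodes directly.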
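
Both the error and the fix can be checked offline, without a network call. The sketch below is only an illustration: decode_body is a hypothetical helper (not part of urllib), the sample page text is made up, and it assumes the body is either gzip-compressed or not compressed at all (deflate and br are not handled).

```python
import gzip
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str = "utf-8") -> str:
    """Decode an HTTP body, decompressing first only if it is gzip-encoded.

    Hypothetical helper for illustration; not part of urllib.
    """
    encoding = (content_encoding or "").strip().lower()
    # gzip streams always begin with the magic bytes 0x1f 0x8b,
    # so the check also works if the header is missing.
    if encoding == "gzip" or raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Offline stand-in for res.read(): a gzip-compressed UTF-8 page (made-up content).
page = "<title>狗</title>"
compressed = gzip.compress(page.encode("utf-8"))

# Decoding the compressed bytes directly reproduces the error from the article.
try:
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)

# Decompressing first (Method 1) gives the original text back.
print(decode_body(compressed, "gzip"))

# An uncompressed body (what Method 2 should yield) passes through unchanged.
print(decode_body(page.encode("utf-8"), None))
```

With the article's code, the call would look like decode_body(res.read(), res.headers.get('Content-Encoding')). Checking the Content-Encoding response header, rather than always decompressing, keeps the same code working whether or not the server chose to compress the response.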