Python crawler: handling the 521 status code
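Use requests.get(url) to fetch the JS code.
Replace the eval() call in the JS function with a return statement, so the script returns its result (ultimately the cookie value) instead of executing it.
Call execjs to run the JS code and obtain the cookie value.
Convert the cookie value to dictionary format, then call requests.get(url, headers=headers) to get the correct page content.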
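As a minimal sketch of the dictionary conversion in the last step, the helper below splits a `name=value; name=value` cookie string into a dict. The function name `cookie_string_to_dict` is hypothetical and not part of the original code, and the cookie values are placeholders.

```python
def cookie_string_to_dict(cookie_string):
    """Split a 'name=value; name=value' cookie string into a dict."""
    result = {}
    for pair in cookie_string.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip empty pieces, e.g. after a trailing ';'
        # Split on the first '=' only, since cookie values may contain '='.
        name, _, value = pair.partition('=')
        result[name] = value
    return result


# Placeholder values for illustration, not real cookies.
example = cookie_string_to_dict('__jsluid=abc123; __jsl_clearance=1500000000.0|0|xyz')
print(example)
```

A dict in this form can also be passed to `requests.get` through its `cookies=` parameter instead of being merged into the headers.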
The full code is as follows:
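import re
import execjs
import requests

# startUrl and headers are assumed to be defined elsewhere in the script.

def getResponse():
    """
    Fetch the response.
    :return: the requests.Response object
    """
    response = requests.get(startUrl, headers=headers)
    return response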
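
def getJslid(response):
    """
    Build a cookie string from the cookies set by the response.
    :param response: the response returned by getResponse()
    :return: the cookies joined as 'name=value; name=value'
    """
    cook = response.cookies
    return '; '.join(['='.join(item) for item in cook.items()])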
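
def getClearance(response):
    """
    Run the JS from the response and extract the clearance cookie.
    :param response: the response returned by getResponse()
    :return: the first 'name=value' pair of the cookie produced by the JS
    """
    # NOTE: the pattern was empty in the published listing (most likely eaten
    # by HTML stripping). '<script>(.*?)</script>' is an assumption, based on
    # the JS being delivered inside a <script> tag.
    txt = ''.join(re.findall('<script>(.*?)</script>', response.text))
    # First pass: return the generated code instead of eval-ing it.
    func_return = txt.replace('eval', 'return')
    print(func_return)  # debug output
    content = execjs.compile(func_return)
    eval_func = content.call('x')
    # Name of the generated function that sets document.cookie.
    name = re.findall(r'var (.*?)=function.*', eval_func)[0]
    # Second pass: strip the browser-only code and return the cookie string
    # instead of assigning it to document.cookie.
    mode_func = (
        eval_func
        .replace('while(window._phantom||window.__phantomas){};', '')
        .replace('document.cookie=', 'return')
        .replace('if((function(){try{return !!window.addEventListener;}', '')
        .replace("catch(e){return false;}})()){document.addEventListener('DOMContentLoaded',%s,false)}" % name, '')
        .replace("else{document.attachEvent('onreadystatechange',%s)}" % name, '')
        .replace(
            r"setTimeout('location.href=location.pathname+location.search.replace(/[\?|&]captcha-challenge/,\'\')',1500);",
            '')
    )
    content = execjs.compile(mode_func)
    cookies = content.call(name)
    # print(cookies)
    clearance = cookies.split(';')[0]
    return clearance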
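
To see what the two replace passes in getClearance do to the script text, the sketch below applies them to hand-written toy strings without running any JavaScript. The helper names and the toy inputs are hypothetical, and the `<script>` pattern is an assumption about how the page wraps its JS. The real script is obfuscated and much longer.

```python
import re


def rewrite_first_pass(html):
    """Extract the inline script and turn its eval(...) into return(...)."""
    script = ''.join(re.findall(r'<script>(.*?)</script>', html, re.S))
    return script.replace('eval', 'return')


def rewrite_second_pass(generated_js):
    """Find the cookie-setting function and make it return the cookie."""
    name = re.findall(r'var (.*?)=function.*', generated_js)[0]
    return name, generated_js.replace('document.cookie=', 'return')


# Toy inputs written by hand for illustration only.
toy_html = '<script>function x(){var c="...";eval(c);}</script>'
toy_generated = 'var _1a=function(){document.cookie="k=v;Path=/;";};'

print(rewrite_first_pass(toy_html))
print(rewrite_second_pass(toy_generated))
```

After the first pass, calling `x` hands back the generated code as a string. After the second pass, calling the extracted function name hands back the cookie string, which is what `content.call(name)` relies on.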
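
def structureHeaders(cook, clearance):
    """
    Build new headers that carry both cookies.
    :param cook: the cookie string from getJslid()
    :param clearance: the clearance cookie from getClearance()
    :return: a copy of headers with a 'cookie' entry added
    """
    cookie = {
        'cookie': cook + ';' + clearance
    }
    return dict(headers, **cookie)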
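
The original listing does not show how these functions are called, so the following is only a sketch of one plausible way to chain them: request the page, and if the server answers 521, compute the cookies and retry. The name `fetch_with_clearance`, the `get` and `solve_clearance` parameters, and the fake response class are all hypothetical. In real use `get` would be `requests.get` and `solve_clearance` would be `getClearance`.

```python
def fetch_with_clearance(get, url, headers, solve_clearance):
    """Fetch url; if the server answers 521, solve the challenge and retry.

    get             -- callable used as get(url, headers=...), e.g. requests.get
    solve_clearance -- callable that takes the 521 response and returns
                       the clearance cookie as 'name=value'
    """
    first = get(url, headers=headers)
    if first.status_code != 521:
        return first  # no challenge, nothing more to do

    # Cookies set by the 521 response, joined as 'name=value; name=value'.
    cook = '; '.join('='.join(item) for item in first.cookies.items())
    clearance = solve_clearance(first)

    # Same headers plus a cookie entry that carries both cookies.
    new_headers = dict(headers, cookie=cook + '; ' + clearance)
    return get(url, headers=new_headers)


class FakeResponse:
    """Stand-in for requests.Response, just enough for this sketch."""

    def __init__(self, status_code, cookies=None, text=''):
        self.status_code = status_code
        self.cookies = cookies or {}
        self.text = text


def fake_get(url, headers):
    # Answer 521 until the request carries a cookie header.
    if 'cookie' in headers:
        return FakeResponse(200, text='real page')
    return FakeResponse(521, cookies={'id': 'abc'})


page = fetch_with_clearance(fake_get, 'http://example.com',
                            {'User-Agent': 'demo'},
                            lambda response: 'clearance=xyz')
print(page.status_code, page.text)
```

Passing the HTTP call and the solver in as arguments keeps the retry logic separate from the network and from execjs, so it can be exercised with fake objects as shown.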