Analyzing Cookies with Python
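Preface
This article was inspired by:
Main text
Using Douban Movie's cookies as an example, this article walks through obtaining and parsing cookies.
The cookies we see in the browser look roughly like this: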
Cookie:bid=hZdgjLJMNv4; _vwo_uuid_v2=AD40AA237919D79C67460DEFD37AFAA4|65f61f85190c51b2cfa95d3910cc2914; gr_user_id=2d7956ee-7cd2-4fad-8a7d-d0b2265ceeba; ll="118316"; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1489750475%2C%22https%3A%2F%2Fwww.google.com.hk%2F%22%5D; ap=1; _pk_id.100001.4cf6=270eb4959a2a2414.1489750475.1.1489750559.1489750475.; _pk_ses.100001.4cf6=*; __utma=30149280.1851478845.1488968861.1489658025.1489750475.5; __utmb=30149280.0.10.1489750475; __utmc=30149280; __utmz=30149280.1489750475.5.5.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __utma=223695111.721177542.1489750475.1489750475.1489750475.1; __utmb=223695111.0.10.1489750475; __utmc=223695111; __utmz=223695111.1489750475.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
First, as preparation, we load them into a Python dictionary:
In [57]: from http.cookies import SimpleCookie
In [58]: s = SimpleCookie('''bid=hZdgjLJMNv4; _vwo_uuid_v2=AD40AA237919D79C67460DEFD37AFAA4|65f61f85190c51b2
...: cfa95d3910cc2914; gr_user_id=2d7956ee-7cd2-4fad-8a7d-d0b2265ceeba; ll="118316"; _pk_ref.100001.4cf6
...: =%5B%22%22%2C%22%22%2C1489750475%2C%22https%3A%2F%2Fwww.google.com.hk%2F%22%5D; ap=1; _pk_id.100001
...: .4cf6=270eb4959a2a2414.1489750475.1.1489750559.1489750475.; _pk_ses.100001.4cf6=*; __utma=30149280.
...: 1851478845.1488968861.1489658025.1489750475.5; __utmb=30149280.0.10.1489750475; __utmc=30149280; __
...: utmz=30149280.1489750475.5.5.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided);
...: __utma=223695111.721177542.1489750475.1489750475.1489750475.1; __utmb=223695111.0.10.1489750475; _
...: _utmc=223695111; __utmz=223695111.1489750475.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmc
...: tr=(not%20provided)''')
In [59]: cookies = {v.key: v.value for k, v in s.items()}  # keep a name; later cells look values up in `cookies`
In [60]: cookies
{'__utma': '223695111.721177542.1489750475.1489750475.1489750475.1',
'__utmb': '223695111.0.10.1489750475',
'__utmc': '223695111',
'__utmz': '223695111.1489750475.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)',
'_pk_id.100001.4cf6': '270eb4959a2a2414.1489750475.1.1489750559.1489750475.',
'_pk_ref.100001.4cf6': '%5B%22%22%2C%22%22%2C1489750475%2C%22https%3A%2F%2Fwww.google.com.hk%2F%22%5D',
'_pk_ses.100001.4cf6': '*',
'_vwo_uuid_v2': 'AD40AA237919D79C67460DEFD37AFAA4|65f61f85190c51b2cfa95d3910cc2914',
'ap': '1',
'bid': 'hZdgjLJMNv4',
'gr_user_id': '2d7956ee-7cd2-4fad-8a7d-d0b2265ceeba',
'll': '118316'}
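The same step can be written as a small standalone script. This is only a sketch: the helper name cookie_header_to_dict is hypothetical, and the sample string is a shortened excerpt of the header above. It also shows why the dictionary holds only one __utma/__utmb/__utmc/__utmz set even though the header contains two: when a name repeats, the later value replaces the earlier one.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header):
    """Parse the value of a Cookie request header into a plain dict."""
    jar = SimpleCookie()
    jar.load(header)
    # Each entry is a Morsel; .value has surrounding double quotes stripped.
    return {name: morsel.value for name, morsel in jar.items()}


# Shortened excerpt of the header above; __utmc appears twice on purpose.
sample = 'bid=hZdgjLJMNv4; ll="118316"; ap=1; __utmc=30149280; __utmc=223695111'
parsed = cookie_header_to_dict(sample)
print(parsed)
```

If both values of a repeated name matter, the header would have to be split by hand instead of going through a dictionary.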
First, notice the cookies whose names start with __utm; Google Analytics uses them to analyze visitor information:
__utma stores the amount of visits (for each visitor), the time of the first visit, the previous visit, and the current visit
__utma is used to record visit times:
In [67]: from datetime import datetime
In [68]: for ts in cookies['__utma'].split('.'):
    ...:     print(datetime.fromtimestamp(int(ts)))  # printed in local time
    ...:
1977-02-02 09:31:51
1992-11-08 07:05:42
2017-03-17 19:34:35
2017-03-17 19:34:35
2017-03-17 19:34:35
1970-01-01 08:00:01
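Treating every field as a timestamp is what produces the odd 1977, 1992, and 1970 dates above. A minimal sketch that names the fields instead, assuming the classic Google Analytics layout (domain hash, visitor ID, first visit, previous visit, current visit, visit count); the function name and dictionary keys are hypothetical, and times are converted in UTC rather than local time:

```python
from datetime import datetime, timezone


def parse_utma(value):
    """Split a __utma value into named fields (layout is an assumption)."""
    domain_hash, visitor_id, first, previous, current, visits = value.split('.')

    def to_utc(ts):
        # Unix seconds -> timezone-aware datetime in UTC.
        return datetime.fromtimestamp(int(ts), tz=timezone.utc)

    return {
        'domain_hash': domain_hash,
        'visitor_id': visitor_id,
        'first_visit': to_utc(first),
        'previous_visit': to_utc(previous),
        'current_visit': to_utc(current),
        'visit_count': int(visits),
    }


utma = parse_utma('223695111.721177542.1489750475.1489750475.1489750475.1')
for key, val in utma.items():
    print(key, val)
```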
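From the third value onward we get the times of our first visit, previous visit, and current visit. The first two fields and the trailing 1 are not timestamps, which is why they decode to implausible dates.
__utmb and __utmc are used to check approximately how fast people leave: when a visit starts, and approximately ends (c expires quickly).
__utmb and __utmc also carry timing information, used to work out how long you stay on Douban; they are not shown here.
__utmz records whether the visitor came from a search engine (and if so, the search keyword used), a link, or from no previous page (e.g. a bookmark).
__utmz records how you arrived at Douban, whether through a search engine or some other link: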
In [70]: cookies['__utmz']
...:
...:
Out[70]: '223695111.1489750475.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)'
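The value is easier to read once it is split apart. A rough sketch, assuming the layout seen in this sample (four dot-separated leading fields followed by pipe-separated key=value pairs); parse_utmz is a hypothetical helper name:

```python
from urllib.parse import unquote


def parse_utmz(value):
    """Return (leading fields, campaign fields) from a __utmz value."""
    # Split on the first four dots only; the campaign part is kept whole.
    *head, campaign = value.split('.', 4)
    pairs = (item.split('=', 1) for item in campaign.split('|'))
    return head, {key: unquote(val) for key, val in pairs}


head, fields = parse_utmz(
    '223695111.1489750475.1.1.utmcsr=google|utmccn=(organic)'
    '|utmcmd=organic|utmctr=(not%20provided)'
)
print(head)
print(fields)
```

Here utmcsr=google and utmcmd=organic are the parts that point to a search-engine visit.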
It's easy to see that I reached Douban through a Google search.
_pk_id* your ID
_pk_ses* usually contains no data
_pk_ref* similar to the Referer header in HTTP, it records the page you were redirected from:
In [75]: from urllib.parse import unquote
In [79]: unquote(cookies['_pk_ref.100001.4cf6'])
Out[79]: '["","",1489750475,"https://www.google.com.hk/"]'
In [80]: cookies['_pk_ref.100001.4cf6']
Out[80]: '%5B%22%22%2C%22%22%2C1489750475%2C%22https%3A%2F%2Fwww.google.com.hk%2F%22%5D'
In [81]: import json; json.loads(unquote(cookies['_pk_ref.100001.4cf6']))[2]  # json.loads rather than eval: cookie values are untrusted input
Out[81]: 1489750475
In [82]: datetime.fromtimestamp(_)
Out[82]: datetime.datetime(2017, 3, 17, 19, 34, 35)
Decoding the URL-encoded value in the cookie shows that I was redirected from https://www.google.com.hk/, along with a timestamp recording when.
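Because a cookie value can contain anything, it is worth decoding it defensively. A minimal sketch, assuming the four-element list shape seen in this sample, with the timestamp at index 2 and the referrer URL at index 3; parse_pk_ref and its return keys are hypothetical names:

```python
import json
from datetime import datetime, timezone
from urllib.parse import unquote


def parse_pk_ref(raw):
    """Decode a _pk_ref value; return None if it is not the expected shape."""
    try:
        data = json.loads(unquote(raw))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(data, list) or len(data) < 4:
        return None
    if not isinstance(data[2], (int, float)):
        return None
    return {
        'timestamp': datetime.fromtimestamp(data[2], tz=timezone.utc),
        'referrer': data[3],
    }


raw = '%5B%22%22%2C%22%22%2C1489750475%2C%22https%3A%2F%2Fwww.google.com.hk%2F%22%5D'
ref = parse_pk_ref(raw)
print(ref)
```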
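The remaining cookies are set by Douban itself and do not belong to any analytics platform.
The above are the cookies a browser sends to the server when opening Douban. So what cookies does the server set for us? Let's test it: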
In [92]: import requests
In [93]: for i in range(10):
    ...:     r = requests.get('https://movie.douban.com/tag/2016?start=0&type=T')
    ...:     print(r.headers['Set-Cookie'])
    ...:
bid=YE31t9f2CtY; Expires=Sat, 17-Mar-18 13:11:42 GMT; Domain=.douban.com; Path=/
bid=AkV_9uN6CxQ; Expires=Sat, 17-Mar-18 13:11:43 GMT; Domain=.douban.com; Path=/
bid=8EhJ9dCj1pw; Expires=Sat, 17-Mar-18 13:11:44 GMT; Domain=.douban.com; Path=/
bid=G4O0c55MbGU; Expires=Sat, 17-Mar-18 13:11:52 GMT; Domain=.douban.com; Path=/
bid=UtW6FWxzk5E; Expires=Sat, 17-Mar-18 13:11:54 GMT; Domain=.douban.com; Path=/
bid=QzZ_sVbO4Qs; Expires=Sat, 17-Mar-18 13:11:56 GMT; Domain=.douban.com; Path=/
bid=dLPTZc4Kh7Q; Expires=Sat, 17-Mar-18 13:11:58 GMT; Domain=.douban.com; Path=/
bid=imFq99iN5f8; Expires=Sat, 17-Mar-18 13:12:00 GMT; Domain=.douban.com; Path=/
bid=Q-bdkpxA0zM; Expires=Sat, 17-Mar-18 13:12:15 GMT; Domain=.douban.com; Path=/
bid=3rv0SUwSG2c; Expires=Sat, 17-Mar-18 13:12:17 GMT; Domain=.douban.com; Path=/
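We request the Douban Movie page 10 times and see that the server sets only the bid cookie, with a fresh value for each cookie-less request. It is scoped to the whole Douban domain (Domain=.douban.com, Path=/), which means the browser will send this cookie to the server whenever we visit any Douban page. Let's look at the expiry time: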
In [102]: import dateutil.parser
In [103]: dateutil.parser.parse('17-Mar-18 13:11:42 GMT')
Out[103]: datetime.datetime(2018, 3, 17, 13, 11, 42, tzinfo=tzutc())
We can see that this cookie is valid for one year; it presumably serves as our ID for tracking users.
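The attributes of a Set-Cookie header can also be read with the standard library alone. A sketch under stated assumptions: the header string is copied from the first response above, and received_at is an assumed response time (the output does not record when each response arrived), chosen here only to illustrate the one-year arithmetic:

```python
from datetime import datetime, timezone
from http.cookies import SimpleCookie

header = ('bid=YE31t9f2CtY; Expires=Sat, 17-Mar-18 13:11:42 GMT; '
          'Domain=.douban.com; Path=/')

jar = SimpleCookie()
jar.load(header)
bid = jar['bid']  # a Morsel; attributes are read with lowercase keys

# Two-digit-year date format as sent by the server; %a and %b assume
# English day and month names (the default C locale).
expires = datetime.strptime(
    bid['expires'], '%a, %d-%b-%y %H:%M:%S GMT'
).replace(tzinfo=timezone.utc)

# Assumption: the response arrived exactly one year before the expiry.
received_at = datetime(2017, 3, 17, 13, 11, 42, tzinfo=timezone.utc)
lifetime = expires - received_at

print(bid.value, bid['domain'], bid['path'], lifetime.days)
```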
Summary:
This article introduced some ways to analyze cookies with Python and identified the cookie Douban uses to track users. A follow-up will cover how to forge cookies.