A Hands-On Introduction to Python Web Scraping (Part 5: CSDN Forum Model Design)
CSDN Forum: Model Design
- 1. Analyzing the CSDN Forum
- 2. Model and Table Design
1. Analyzing the CSDN Forum
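[Figure: forum topic list page]
[Figure: topic detail page]
[Figure: blogger profile page]
Based on the pages above, we work out which data needs to be scraped and then design the models accordingly.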
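Whatever fields are chosen, the values scraped from these pages arrive as text, while the models store integers, floats and datetimes. The sketch below shows one way to do that conversion. It is only an illustration under stated assumptions: the helper names (`parse_int`, `parse_rate`, `parse_time`), the raw string formats such as `'85.71%'` for the closing rate, and the timestamp format are hypothetical, not taken from CSDN's actual markup.

```python
import re
from datetime import datetime


def parse_int(text, default=0):
    """Return the first run of digits in `text` as an int, e.g. 'Replies: 12' -> 12.

    Signs and thousands separators are not handled; `default` is returned
    when no digits are present.
    """
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else default


def parse_rate(text, default=0.0):
    """Turn a percentage string such as '85.71%' into the float 85.71."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else default


def parse_time(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string; the default format is an assumption."""
    return datetime.strptime(text.strip(), fmt)


# Hypothetical raw strings as they might be scraped from a topic list row.
raw = {
    "answers": "Replies: 12",
    "jtl": "85.71%",
    "score": "20",
    "create_time": "2019-01-01 12:00:00",
}

# Typed values ready for the matching model fields.
topic_fields = {
    "answer_nums": parse_int(raw["answers"]),
    "jtl": parse_rate(raw["jtl"]),
    "score": parse_int(raw["score"]),
    "create_time": parse_time(raw["create_time"]),
}
print(topic_fields)
```

Returning a default instead of raising keeps one malformed cell from aborting a whole crawl; the defaults mirror the `default=0` / `default=0.0` values used by the count fields in the models.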
2. Model and Table Design
```python
from peewee import *

# Connection settings for a local MySQL instance; adjust them to your setup.
db = MySQLDatabase("py_spider", host="localhost", port=3307, user="root", password="root")


class BaseModel(Model):
    class Meta:
        database = db


class Topic(BaseModel):
    title = CharField()  # title
    content = TextField(default="")  # content
    id = IntegerField(primary_key=True)  # id
    author = CharField()  # author
    create_time = DateTimeField()  # creation time
    answer_nums = IntegerField(default=0)  # number of replies
    click_nums = IntegerField(default=0)  # number of views
    parised_nums = IntegerField(default=0)  # number of likes
    jtl = FloatField(default=0.0)  # topic closing rate
    score = IntegerField(default=0)  # reward points
    status = CharField()  # status


class Answer(BaseModel):
    topic_id = IntegerField()  # id of the topic this reply belongs to
    author = CharField()
    content = TextField(default="")
    create_time = DateTimeField()
    parised_nums = IntegerField(default=0)  # number of likes


class Author(BaseModel):
    name = CharField()
    id = IntegerField(primary_key=True)
    click_nums = IntegerField(default=0)  # number of visits
    original_nums = IntegerField(default=0)  # number of original posts
    forward_nums = IntegerField(default=0)  # number of reposts
    rate = IntegerField(default=-1)  # ranking
    answer_nums = IntegerField(default=0)  # number of comments
    parised_nums = IntegerField(default=0)  # likes received
    desc = TextField(null=True)  # personal description / signature
    industry = CharField(null=True)  # industry
    location = CharField(null=True)  # region
    follower_nums = IntegerField(default=0)  # number of followers
    following_nums = IntegerField(default=0)  # number of accounts followed


if __name__ == "__main__":
    db.create_tables([Topic, Answer, Author])
```

[Figure: the tables generated after running the script]