Data Import and Preprocessing, Lab 2: Converting JSON Format Files
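I. Lab Overview
[Objectives]
Gain initial familiarity with data collection methods; gain initial familiarity with using a crawler to scrape web data; learn how to convert between different data formats.
[Environment] (materials, equipment, software used) Linux or Windows operating system, MySQL database, Python or another high-level language
II. Lab Content
Question 1: Scraping web data
[Requirements]
Scrape the song name, singer, and duration of the top 500 songs on the Kugou Music chart (https://www.kugou.com/). Save the scraped data as a JSON file. Then read arbitrary data from the JSON file to check that the file format is correct.
[Procedure] (steps, records, data, programs, etc.)
Provide the steps taken and screenshots as evidence.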
from bs4 import BeautifulSoup
import requests
import time
import re
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
keys = ['songName', 'singer', 'time']
song = []  # one [songName, singer, time] list per scraped song


def get_info(url, encoding):
    # Download one chart page and collect the name, singer, and duration of each song.
    res = requests.get(url, headers=headers)
    res.encoding = encoding
    soup = BeautifulSoup(res.text, 'lxml')
    titles = soup.select('a.pc_temp_songname')
    times = soup.select('span.pc_temp_time')
    for title, duration in zip(titles, times):
        # The link text has the form "singer - song name"; split on the first ' - ' only.
        singer, _, song_name = title.get_text().strip().partition(' - ')
        song.append([song_name.strip(), singer.strip(), duration.get_text().strip()])


def output(file):
    # Write every collected song as {"songInfo": [...]}.
    records = [dict(zip(keys, s)) for s in song]
    json.dump({'songInfo': records}, file, ensure_ascii=False, indent=4, separators=(',', ': '))


def get_website_encoding(url):
    # Use the charset declared in the page's HTML; fall back to the encoding requests detected.
    res = requests.get(url, headers=headers)
    charset = re.search("charset=(.*?)>", res.text)
    if charset is not None:
        blocked = ['\'', ' ', '\"', '/']
        return ''.join(c for c in charset.group(1) if c not in blocked)
    return res.encoding


if __name__ == '__main__':
    encoding = get_website_encoding('http://www.kugou.com')
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(i) for i in range(1, 23)]
    for url in urls:
        get_info(url, encoding)
        time.sleep(1)  # pause between requests
    # Write the file once, after all pages are scraped, as UTF-8 so it can be read back as UTF-8.
    with open('kugou_500.json', 'w', encoding='utf-8') as f:
        output(f)
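The title-splitting step in get_info can be checked without touching the network. The following is a minimal sketch: the helper name split_title and the sample strings are hypothetical, and it assumes, as the scraper does, that each link text has the form "singer - song name".

```python
def split_title(text):
    """Split 'singer - song name' into (singer, song_name).

    Only the first ' - ' is treated as the separator, so a song name that
    itself contains ' - ' stays intact. If there is no separator, the whole
    text is returned as the singer and the song name is empty.
    """
    singer, _, song_name = text.strip().partition(' - ')
    return singer.strip(), song_name.strip()


# Hypothetical sample strings, not real chart data.
print(split_title('Singer A - Song One'))
print(split_title('Singer B - Song Two - Live'))
```

Splitting on the first separator only avoids the unpacking error that a plain split(' - ') raises when a title contains the separator more than once.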
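The script produces the JSON file kugou_500.json.
Open the file with json.load; if the content prints successfully, the file format is correct.
import json

with open("kugou_500.json", 'r', encoding='UTF-8') as f:
    new_dict = json.load(f)
print(new_dict)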
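Loading the file only proves that it is syntactically valid JSON. A slightly stricter check, sketched below, also confirms that every entry carries the three expected keys. The function name check_song_json and the sample string are hypothetical; the key names and the "songInfo" root come from the scraper above.

```python
import json

REQUIRED_KEYS = {'songName', 'singer', 'time'}


def check_song_json(text):
    """Return a list of problems found in the JSON text; an empty list means none."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return ['not valid JSON: {}'.format(exc)]
    songs = data.get('songInfo') if isinstance(data, dict) else None
    if not isinstance(songs, list):
        return ['top-level "songInfo" list is missing']
    problems = []
    for index, entry in enumerate(songs):
        # A non-dict entry is reported as missing every required key.
        missing = REQUIRED_KEYS - set(entry) if isinstance(entry, dict) else REQUIRED_KEYS
        if missing:
            problems.append('entry {} is missing {}'.format(index, sorted(missing)))
    return problems


# Hypothetical one-song sample in the same shape as kugou_500.json.
sample = '{"songInfo": [{"songName": "Song One", "singer": "Singer A", "time": "3:45"}]}'
print(check_song_json(sample))
```

To check the real file, read its text with open("kugou_500.json", encoding="utf-8") and pass it to the function.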
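Question 2: Generating a CSV file programmatically and converting it to JSON
[Requirements]
Write a program that generates a CSV file with the following content (the columns are name, gender, hometown, and department):
姓名,性別,籍貫,系別
張迪,男,重慶,計算機系
蘭博,男,江蘇,通信工程系
黃飛,男,四川,物聯網系
鄧玉春,女,陝西,計算機系
周麗,女,天津,藝術系
李云,女,上海,外語系
Convert this CSV file to JSON, then query the file for the records of all female students.
[Procedure] (steps, records, data, programs, etc.)
Provide the steps taken and screenshots as evidence.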
import csv

rows = [
    ["姓名", "性別", "籍貫", "系別"],
    ["張迪", "男", "重慶", "計算機系"],
    ["蘭博", "男", "江蘇", "通信工程系"],
    ["黃飛", "男", "四川", "物聯網系"],
    ["鄧玉春", "女", "陝西", "計算機系"],
    ["周麗", "女", "天津", "藝術系"],
    ["李云", "女", "上海", "外語系"],
]
# newline="" stops the csv module from writing blank lines between rows on Windows.
with open("question02.csv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)
Convert to JSON format:
import csv
import json

with open("question02.csv", "r", encoding="utf-8", newline="") as csv_file:
    # DictReader uses the header row as the keys of each record.
    records = list(csv.DictReader(csv_file))

# Dump the whole list at once so the output does not depend on a hard-coded row count.
with open("question02.json", "w", encoding="utf-8") as json_file:
    json.dump({"personInfo": records}, json_file, ensure_ascii=False, indent=4)
Query the records of all female students:
import json

with open("question02.json", "r", encoding="utf-8") as f:
    data = json.load(f)
for person in data['personInfo']:
    if person['性別'] == '女':
        print(person)
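The CSV-to-JSON conversion and the query can also be tried entirely in memory, which makes the logic easy to test without creating files. This is a minimal sketch under stated assumptions: the function names csv_to_json and select_by are hypothetical, and the sample text is a shortened version of the table from the requirements.

```python
import csv
import io
import json

# Shortened sample of the table from the requirements.
CSV_TEXT = """姓名,性別,籍貫,系別
張迪,男,重慶,計算機系
周麗,女,天津,藝術系
李云,女,上海,外語系
"""


def csv_to_json(csv_text, root_key='personInfo'):
    """Parse CSV text and return a JSON string of the form {root_key: [records]}."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps({root_key: records}, ensure_ascii=False, indent=4)


def select_by(json_text, field, value, root_key='personInfo'):
    """Return the records whose field equals value."""
    return [r for r in json.loads(json_text)[root_key] if r.get(field) == value]


json_text = csv_to_json(CSV_TEXT)
for person in select_by(json_text, '性別', '女'):
    print(person['姓名'])
```

Because the record list is built first and serialized in one call, the code needs no manual comma handling and works for any number of rows.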
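Question 3: Converting between XML and JSON
[Content and Requirements]
(1) Read an XML file with the following content:
<?xml version="1.0" encoding="gb2312"?>
<圖書>
<書名>紅樓夢</書名>
<作者>曹雪芹</作者>
<主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>
<出版社>人民文學出版社</出版社>
</圖書>
(2) Convert the XML file above to JSON.
[Procedure] (steps, records, data, programs, etc.)
Provide the code and screenshots of the program running.
First create the XML file question_03.xml with the content above.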
import xmltodict
import json

# The file is read as UTF-8 here; use the encoding the file was actually saved in.
with open("question_03.xml", "r", encoding="utf-8") as xml_file:
    xml_str = xml_file.read()

# xmltodict.parse returns a dict that mirrors the XML tree.
data = xmltodict.parse(xml_str)

with open("question03JSON.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(data, ensure_ascii=False, indent=4, separators=(',', ': ')))
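If xmltodict is not installed, a similar result can be obtained with the standard library alone. The sketch below is an alternative technique, not the approach used above: it walks the tree with xml.etree.ElementTree and builds nested dictionaries. The function name element_to_dict is hypothetical, the XML declaration is left out of the in-memory sample, and attributes are ignored for simplicity.

```python
import json
import xml.etree.ElementTree as ET

# The book record from the requirements, without the XML declaration.
XML_TEXT = """<圖書>
  <書名>紅樓夢</書名>
  <作者>曹雪芹</作者>
  <主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>
  <出版社>人民文學出版社</出版社>
</圖書>"""


def element_to_dict(element):
    """Convert an element to nested dicts; a leaf element becomes its text."""
    children = list(element)
    if not children:
        return (element.text or '').strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring(XML_TEXT)
data = {root.tag: element_to_dict(root)}
print(json.dumps(data, ensure_ascii=False, indent=4))
```

For this simple document the resulting dictionary has the same shape as the one xmltodict produces: one root key holding a mapping from each child tag to its text.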