
Data Import and Preprocessing, Lab 2: JSON Format File Conversion

Published: 2024/3/13

This article on Data Import and Preprocessing Lab 2 (JSON format file conversion) was collected and organized by 生活随笔 (Shenghuo Suibi) and is shared here for reference.

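I. Lab Overview
[Objectives]

Gain a basic grasp of data collection methods;

Gain a basic grasp of using a crawler to collect data from the web;

Learn how to convert between different data formats.

[Environment] (materials, equipment, software): Linux or Windows operating system, MySQL database, Python or another high-level language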

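II. Lab Content
Task 1: Crawling Web Data
[Requirements]

Crawl the song title, singer, and duration of each of the top 500 songs on the Kugou Music chart (https://www.kugou.com/).

Save the crawled data as a JSON file.

Read any item from the JSON file to verify that the file format is correct.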
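The core of this task is two small steps. First, each chart title, which the crawler treats as "singer - song name", is split into separate fields. Second, the resulting records are serialised as JSON.

Below is a minimal offline sketch of both steps. It assumes the " - " separator. The helper names `parse_entry` and `to_json` and the sample entries are hypothetical and are not real chart data.

```python
import json


def parse_entry(title, duration):
    """Turn one chart entry into a record; the title is assumed to read 'singer - song name'."""
    # partition splits on the first ' - ' only, so a ' - ' inside the song name is kept
    singer, sep, song_name = title.strip().partition(' - ')
    if not sep:
        # No separator: treat the whole text as the song name
        singer, song_name = '', title.strip()
    return {'songName': song_name, 'singer': singer, 'time': duration.strip()}


def to_json(records):
    # Wrap the records under a 'songInfo' key, the layout the lab's output file uses
    return json.dumps({'songInfo': records}, ensure_ascii=False, indent=4)


# Hypothetical sample entries, not real chart data
sample = [('Singer A - Song One', '3:45'), ('Singer B - Song - Remix', '4:02')]
text = to_json([parse_entry(t, d) for t, d in sample])
print(text)
```

Using `partition` instead of a plain `split(' - ')` means a title with no separator, or with more than one, cannot break the tuple unpacking.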

[Procedure] (steps, records, data, program, etc.)
Please provide the steps you followed, with screenshots as evidence.

from bs4 import BeautifulSoup
import requests
import time
import re
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
keys = ['songName', 'singer', 'time']
song = []  # one dict per song, filled by get_info()


def get_info(url, encoding):
    res = requests.get(url, headers=headers)
    res.encoding = encoding  # decode with the site's own charset to avoid garbled text
    soup = BeautifulSoup(res.text, 'lxml')
    titles = soup.select('a.pc_temp_songname')
    durations = soup.select('span.pc_temp_time')
    for title, duration in zip(titles, durations):
        # Titles have the form "singer - song name"; split on the first " - " only
        singer, _, songName = title.get_text().strip().partition(' - ')
        song.append(dict(zip(keys, [songName, singer, duration.get_text().strip()])))


def output(file):
    # Let json.dump write the whole {"songInfo": [...]} document, brackets and commas included
    json.dump({'songInfo': song}, file, ensure_ascii=False, indent=4, separators=(',', ': '))


def get_website_encoding(url):
    # A site usually uses one encoding for all its pages, so checking the home page once is enough
    res = requests.get(url, headers=headers)
    charset = re.search("charset=(.*?)>", res.text)
    if charset is not None:
        blocked = ['\'', ' ', '\"', '/']
        return ''.join(c for c in charset.group(1) if c not in blocked)
    return res.encoding  # no charset declared: fall back to the encoding requests guessed


if __name__ == '__main__':
    encoding = get_website_encoding('http://www.kugou.com')
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(i) for i in range(1, 23)]
    for url in urls:
        get_info(url, encoding)
        time.sleep(1)  # wait one second between requests so they are not sent too quickly
    # Write UTF-8 so the file can be read back with encoding='UTF-8'
    with open('kugou_500.json', 'w', encoding='utf-8') as f:
        output(f)
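`get_website_encoding` strips quote, space and slash characters from whatever follows `charset=` up to the next `>`. An alternative is a tighter regular expression that captures only the encoding token.

The sketch below is an assumption, not the lab's method. The function name `detect_charset` and the `'utf-8'` default are hypothetical. The `requests` library also exposes `Response.apparent_encoding`, which guesses the encoding from the response body instead.

```python
import re

# Captures the encoding token in either meta-tag form:
#   <meta charset="utf-8">
#   <meta http-equiv="Content-Type" content="text/html; charset=gbk">
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def detect_charset(html, default='utf-8'):
    """Return the charset declared in the HTML, lower-cased, or `default` if none is found."""
    match = CHARSET_RE.search(html)
    return match.group(1).lower() if match else default


print(detect_charset('<meta charset="UTF-8">'))
print(detect_charset('<meta http-equiv="Content-Type" content="text/html; charset=gbk" />'))
```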

Resulting JSON file: kugou_500.json

Open the file with json.load. If its contents print without an error, the file format is correct.

import json

with open("kugou_500.json", 'r', encoding='UTF-8') as f:
    new_dict = json.load(f)  # raises json.JSONDecodeError if the file is not valid JSON
print(new_dict)
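When the check fails, `json.JSONDecodeError` records where parsing stopped. Reporting that position makes a malformed file much easier to fix. A missing or extra comma between records is a typical cause.

The helper `check_json` below is a hypothetical sketch built only on the standard `json` module. For a file, pass it the text returned by `f.read()`.

```python
import json


def check_json(text):
    """Return (True, parsed data) for valid JSON, else (False, where and why parsing failed)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno are 1-based and point at the first offending character
        return False, 'line {}, column {}: {}'.format(err.lineno, err.colno, err.msg)


print(check_json('{"songInfo": []}'))
print(check_json('{"songInfo": [1,]}'))  # trailing comma is not allowed in JSON
```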

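Task 2: Generating a CSV File Programmatically and Converting It to JSON
[Requirements]

Programmatically generate a CSV file with the following content (columns: name, gender, native place, department):
姓名,性別,籍貫,系別
張迪,男,重慶,計算機系
蘭博,男,江蘇,通信工程系
黃飛,男,四川,物聯網系
鄧玉春,女,陜西,計算機系
周麗,女,天津,藝術系
李云,女,上海,外語系

Convert the CSV file above to JSON, then query the file for the records of all female students (性別 = 女).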
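The whole pipeline (write CSV, read it back as dicts, dump JSON, filter by gender) can be checked in memory before any files are involved.

This is a sketch that assumes `io.StringIO` in place of real files. The rows are a subset of the table above, and the function names are hypothetical.

```python
import csv
import io
import json

# A few rows copied from the lab's table; the full table works the same way
ROWS = [
    ['姓名', '性別', '籍貫', '系別'],
    ['張迪', '男', '重慶', '計算機系'],
    ['鄧玉春', '女', '陜西', '計算機系'],
    ['周麗', '女', '天津', '藝術系'],
]


def rows_to_csv(rows):
    # newline='' lets the csv module control line endings itself
    buf = io.StringIO(newline='')
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def csv_to_json(csv_text):
    # DictReader takes its keys from the header row
    reader = csv.DictReader(io.StringIO(csv_text, newline=''))
    return json.dumps({'personInfo': list(reader)}, ensure_ascii=False)


def female_names(json_text):
    people = json.loads(json_text)['personInfo']
    return [p['姓名'] for p in people if p['性別'] == '女']


print(female_names(csv_to_json(rows_to_csv(ROWS))))
```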

[Procedure] (steps, records, data, program, etc.)
Please provide the steps you followed, with screenshots as evidence.

import csv

# Create the file object; newline='' prevents blank lines between rows on Windows
with open("question02.csv", "w", encoding="utf-8", newline="") as f:
    # Build the csv writer
    csv_writer = csv.writer(f)
    # Header row
    csv_writer.writerow(["姓名", "性別", "籍貫", "系別"])
    # Data rows (all six people from the table)
    csv_writer.writerow(["張迪", "男", "重慶", "計算機系"])
    csv_writer.writerow(["蘭博", "男", "江蘇", "通信工程系"])
    csv_writer.writerow(["黃飛", "男", "四川", "物聯網系"])
    csv_writer.writerow(["鄧玉春", "女", "陜西", "計算機系"])
    csv_writer.writerow(["周麗", "女", "天津", "藝術系"])
    csv_writer.writerow(["李云", "女", "上海", "外語系"])

Converting to JSON format:

import csv
import json

# DictReader uses the header row (姓名, 性別, 籍貫, 系別) as the keys of each row dict
with open("question02.csv", "r", encoding="utf-8", newline="") as csvFile:
    rows = list(csv.DictReader(csvFile))
for row in rows:
    print(row)
# Dump all rows at once, so no hard-coded row count is needed to place the commas
with open("question02.json", "w", encoding="utf-8") as jsonFile:
    json.dump({"personInfo": rows}, jsonFile, ensure_ascii=False, indent=4)
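Both conversion scripts pass `ensure_ascii=False`. The short comparison below shows what that flag changes. It uses one record from the lab's table.

Both outputs are valid JSON and decode to the same dict. The only difference is whether Chinese characters are stored as `\uXXXX` escapes or written directly. With `ensure_ascii=False`, the output file needs an encoding that can represent those characters, such as UTF-8.

```python
import json

record = {'姓名': '周麗', '性別': '女'}

escaped = json.dumps(record)                       # default ensure_ascii=True: non-ASCII becomes \uXXXX
readable = json.dumps(record, ensure_ascii=False)  # characters are written as-is

print(escaped)
print(readable)
# Both strings are valid JSON and decode to the same dict
assert json.loads(escaped) == json.loads(readable) == record
```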

import json

with open("question02.json", "r", encoding="utf-8") as f:
    data = json.load(f)
# Print every person whose 性別 (gender) is 女 (female), however many records there are
for person in data['personInfo']:
    if person['性別'] == '女':
        print(person)

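Task 3: Converting Between XML and JSON
[Content and Requirements]
(1) Read an XML file with the following content:
<?xml version="1.0" encoding="gb2312"?>
<圖書>
    <書名>紅樓夢</書名>
    <作者>曹雪芹</作者>
    <主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>
    <出版社>人民文學出版社</出版社>
</圖書>
(A book record with the following fields:
- title: Dream of the Red Chamber
- author: Cao Xueqin
- main content: the love story of Jia Baoyu and Lin Daiyu
- publisher: People's Literature Publishing House)
(2) Convert the XML file above to JSON.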
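For a flat document like this one, the conversion can also be sketched with the standard library alone. The idea is to map the root tag to a dict of child tag and text pairs. For this record, xmltodict should produce the same shape.

The sketch assumes no attributes, no nesting below one level, and no repeated child tags. The helper name `flat_xml_to_dict` is hypothetical. The XML declaration is omitted because the input here is an already-decoded Python string.

```python
import json
import xml.etree.ElementTree as ET

# The lab's book record, minus the XML declaration (a str is already decoded text)
XML_TEXT = """<圖書>
    <書名>紅樓夢</書名>
    <作者>曹雪芹</作者>
    <主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>
    <出版社>人民文學出版社</出版社>
</圖書>"""


def flat_xml_to_dict(xml_text):
    """Convert a one-level XML document to {root_tag: {child_tag: text}}."""
    root = ET.fromstring(xml_text)
    # A repeated child tag would overwrite the earlier one (xmltodict would build a list instead)
    return {root.tag: {child.tag: (child.text or '').strip() for child in root}}


print(json.dumps(flat_xml_to_dict(XML_TEXT), ensure_ascii=False, indent=4))
```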

[Procedure] (steps, records, data, program, etc.)
Please provide the corresponding code and screenshots of the program running.

Create a new XML file, question_03.xml, containing the content above.

import json
import xmltodict

# Read the XML file as text. This assumes it was saved as UTF-8;
# if it was saved as GB2312, open it with encoding="gb2312" instead
with open("question_03.xml", "r", encoding="utf-8") as file:
    xmlStr = file.read()
# xmltodict.parse returns a dict, not a JSON string
xmlDict = xmltodict.parse(xmlStr)
with open("question03JSON.json", "w", encoding="utf-8") as f:
    json.dump(xmlDict, f, ensure_ascii=False, indent=4, separators=(',', ': '))
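If only one field is needed, the standard library's `xml.dom.minidom` can read it directly without converting the whole document. This sketch uses `minidom.parseString` so that no file is required. The helper name `first_text` is hypothetical, and the helper assumes the element contains plain text.

```python
from xml.dom import minidom

XML_TEXT = "<圖書><書名>紅樓夢</書名><作者>曹雪芹</作者></圖書>"


def first_text(xml_text, tag):
    """Return the text of the first <tag> element, or None if there is none."""
    root = minidom.parseString(xml_text).documentElement
    nodes = root.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


print(first_text(XML_TEXT, '書名'))
```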
