參考論文:J. Zico Kolter and Matthew J. Johnson. "REDD: A Public Data Set for Energy Disaggregation Research." In Proceedings of the SustKDD Workshop on Data Mining Applications in Sustainability, 2011.
# One-off conversion step: turn the raw REDD low-frequency CSV data into an
# HDF5 store that nilmtk can load with DataSet(...).
from nilmtk.dataset_converters import convert_redd

convert_redd(r'C:\Users\admin\Anaconda3\nilm_metadata\low_freq',
             r'C:\Users\admin\Anaconda3\nilm_metadata\low_freq\redd_low_new.h5')
from __future__ import print_function, division
import pandas as pd
import numpy as np
from nilmtk.dataset import DataSet
#from nilmtk.metergroup import MeterGroup
#from nilmtk.datastore import HDFDataStore
#from nilmtk.timeframe import TimeFrame
from nilmtk.disaggregate.combinatorial_optimisation import CombinatorialOptimisation
from nilmtk.legacy.disaggregate.fhmm_exact import FHMMtrain = DataSet('C:/Users/admin/PycharmProjects/nilmtktest/low_freq/redd_low.h5') # 讀取數(shù)據(jù)集
test = DataSet('C:/Users/admin/PycharmProjects/nilmtktest/low_freq/redd_low.h5') # 讀取數(shù)據(jù)集
building = 1 ## 選擇家庭house
train.set_window(end="30-4-2011") ## 劃分?jǐn)?shù)據(jù)集,2011年4月20號之前的作為訓(xùn)練集
test.set_window(start="30-4-2011") ## 四月40號之后的作為測試集## elec包含了這個家庭中的所有的電器信息和總功率信息,building=1-6個家庭
train_elec = train.buildings[1].elec
test_elec = test.buildings[1].electop_5_train_elec = train_elec.submeters().select_top_k(k=5) ## 選擇用電量排在前5的來進(jìn)行訓(xùn)練和測試
選取第一個家庭中用電量排名前 5 的電器數(shù)據(jù)進(jìn)行測試。計(jì)算過程如下:
def predict(clf, test_elec, sample_period, timezone):
    """Disaggregate the mains signal of *test_elec* with a trained model.

    Parameters
    ----------
    clf : a trained nilmtk disaggregator (CombinatorialOptimisation or FHMM)
    test_elec : nilmtk MeterGroup of the test building
    sample_period : resampling period in seconds
    timezone : timezone name used to align prediction and ground-truth indexes

    Returns
    -------
    (gt_overall, pred_overall) : pair of DataFrames with one column per
    appliance — measured (ground-truth) power and predicted power, restricted
    to the timestamps present in both.
    """
    pred = {}
    gt = {}

    # Disaggregate the aggregate (mains) load chunk by chunk.
    for i, chunk in enumerate(test_elec.mains().load(sample_period=sample_period)):
        chunk_drop_na = chunk.dropna()  # discard missing samples
        pred[i] = clf.disaggregate_chunk(chunk_drop_na)

        # Ground truth: the measured consumption of each individual submeter.
        gt[i] = {}
        for meter in test_elec.submeters().meters:
            gt[i][meter] = next(meter.load(sample_period=sample_period))
        # Collapse the per-meter series into one DataFrame (meter -> column).
        gt[i] = pd.DataFrame({k: v.squeeze() for k, v in gt[i].items()},
                             index=next(iter(gt[i].values())).index).dropna()

    # Concatenate all chunks; assumes everything fits in memory.
    gt_overall = pd.concat(gt)
    gt_overall.index = gt_overall.index.droplevel()
    pred_overall = pd.concat(pred)
    pred_overall.index = pred_overall.index.droplevel()

    # Use the same column order in both frames.
    gt_overall = gt_overall[pred_overall.columns]

    # Keep only the timestamps common to ground truth and prediction,
    # comparing in UTC to avoid DST ambiguities.
    gt_index_utc = gt_overall.index.tz_convert("UTC")
    pred_index_utc = pred_overall.index.tz_convert("UTC")
    common_index_utc = gt_index_utc.intersection(pred_index_utc)
    common_index_local = common_index_utc.tz_convert(timezone)
    # .loc replaces DataFrame.ix, which was deprecated in pandas 0.20 and
    # removed in pandas 1.0.
    gt_overall = gt_overall.loc[common_index_local]
    pred_overall = pred_overall.loc[common_index_local]

    # Replace ElecMeter column objects with human-readable appliance labels.
    appliance_labels = [m.label() for m in gt_overall.columns.values]
    gt_overall.columns = appliance_labels
    pred_overall.columns = appliance_labels
    return gt_overall, pred_overall


# The two disaggregation algorithms to compare:
# combinatorial optimisation (CO) and the factorial HMM (FHMM).
classifiers = {'CO': CombinatorialOptimisation(), 'FHMM': FHMM()}
predictions = {}
sample_period = 120  # resampling period in seconds (2 minutes)

# Train each algorithm on the top-5 appliances, then disaggregate the test set.
for clf_name, clf in classifiers.items():
    print("*" * 20)
    print(clf_name)
    print("*" * 20)
    clf.train(top_5_train_elec, sample_period=sample_period)  # fit the model
    # Pass the sample_period variable (not a hard-coded 120) so training and
    # prediction always use the same resampling period.
    gt, predictions[clf_name] = predict(clf, test_elec, sample_period,
                                        train.metadata['timezone'])