Entropy Weight Method for Panel Data in Python

Published: 2023/12/20 · python · 36 豆豆

Entropy Method Formulas for Panel Data, Implemented in Python

Updated: November 9, 2022
Update notes: fixed an error that occurred when running the code. [Screenshot of the error not available]

1. Theoretical Basis

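This article uses the panel-data entropy method formulas from Wang Xiaohong et al. (2021) to explain how to apply the entropy method to panel data and how to implement it in Python.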
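The formulas were originally shown as a figure that is not available here. The steps below are reconstructed from the code in the next section, not copied from Wang Xiaohong et al. (2021), so the notation is an assumption: $x_{ij}$ is the value of indicator $j$ for observation $i$, where $i$ runs over all $n$ city-year pairs, and $m$ is the number of indicators.

```latex
\begin{aligned}
&\text{Positive indicator:} & x'_{ij} &= \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \\
&\text{Negative indicator:} & x'_{ij} &= \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \\
&\text{Shift (avoids } \ln 0\text{):} & z_{ij} &= x'_{ij} + 0.01 \\
&\text{Share:} & p_{ij} &= \frac{z_{ij}}{\sum_{i=1}^{n} z_{ij}} \\
&\text{Entropy:} & e_j &= -k \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad k = \frac{1}{\ln n} \\
&\text{Difference coefficient:} & d_j &= 1 - e_j \\
&\text{Weight:} & w_j &= \frac{d_j}{\sum_{j=1}^{m} d_j}
\end{aligned}
```

The minimum and maximum are taken over all city-year rows of an indicator at once, which is what the code does after stacking every year into a single column.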

2. Code Implementation

import os

import numpy as np
import pandas as pd


# Entropy method for panel data
def Entory(path0, forwrd_indicator, inverse_indicator):
    # Read every sheet at once; sheet_name=None returns {sheet name: DataFrame}
    data = pd.read_excel(path0, sheet_name=None, index_col=0)
    Sheet_name = list(data.keys())  # one sheet per indicator

    # Flatten each sheet (rows = cities, columns = years) into a single column.
    # All sheets are assumed to share the same cities and years in the same order.
    df = pd.DataFrame()
    for i in Sheet_name:
        df2 = data[i]
        df3 = pd.DataFrame(df2.to_numpy().reshape(-1, 1, order='F'))  # column-major: year by year
        df = pd.concat([df, df3], axis=1)
    x1 = df2.shape[1]  # number of columns (years) per sheet
    x2 = df2.shape[0]  # number of rows (cities) per sheet
    y = df2.index  # row labels (cities)
    z = list(df2.columns)  # column labels (years)
    df.columns = Sheet_name  # name each column after its indicator
    df.insert(0, "城市", list(y) * x1)  # city column: the city list repeated once per year
    df.insert(1, "年份", np.repeat(z, x2))  # year column: each year repeated once per city
    df = df.set_index(["城市", "年份"])

    # Normalise positive and negative indicators
    df4 = df.copy()
    forwrd_indicator = list(forwrd_indicator)
    inverse_indicator = list(inverse_indicator)
    # Only one list needs to be given; the other becomes every remaining sheet
    if forwrd_indicator:
        inverse_indicator = list(set(forwrd_indicator) ^ set(Sheet_name))
    else:
        forwrd_indicator = list(set(inverse_indicator) ^ set(Sheet_name))
    print("Positive indicators (forwrd_indicator):\n", forwrd_indicator, "\n")
    print("Negative indicators (inverse_indicator):\n", inverse_indicator, "\n")
    if forwrd_indicator:  # skip the step when the list is empty
        col = df4[forwrd_indicator]
        df4[forwrd_indicator] = (col - col.min()) / (col.max() - col.min())
    if inverse_indicator:
        col = df4[inverse_indicator]
        df4[inverse_indicator] = (col.max() - col) / (col.max() - col.min())
    df4 = df4.apply(lambda x: x + 0.01)  # shift the data to avoid ln 0; adjust the constant as needed

    p = df4 / df4.sum()  # share of each observation within its column
    k = 1 / np.log(p.shape[0])  # k = 1 / ln(n), n = number of city-year rows
    P = (p * np.log(p)).sum()  # sum of p * ln(p) for each column
    entory = -k * P  # entropy of each indicator
    D = 1 - entory  # difference coefficient of each indicator
    W = D / D.sum()  # weight of each indicator
    print("Weights:\n", W, "\n")

    # Write the reshaped data and the weights to an Excel workbook next to the input file
    excel_to_path = os.path.join(os.path.split(path0)[0], "熵值法.xlsx")
    with pd.ExcelWriter(excel_to_path) as writer:
        df.to_excel(writer, sheet_name='面板數據')
        W.to_excel(writer, sheet_name='權重')
    print("Results saved to {}".format(excel_to_path))


path0 = r"C:\Users\HP\Desktop\python.xlsx"  # path to the input workbook (set this yourself)
forwrd_indicator = ["用水量", "人均GDP增長率"]  # positive indicators; fill in only one of the two lists, whichever is shorter
inverse_indicator = []  # negative indicators
Entory(path0, forwrd_indicator, inverse_indicator)  # no need to change this line
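The function derives the missing indicator list with a symmetric difference (`^`) against the sheet names. One consequence is that a misspelled indicator name ends up in both lists and only surfaces later as a KeyError. The sketch below shows one way to split the names with an explicit check; `split_indicators` is a hypothetical helper, not part of the original code, and the names in the usage lines are made up.

```python
def split_indicators(all_names, positive=(), negative=()):
    """Return (positive, negative) lists that together cover all_names.

    Mirrors the original rule: if `positive` is given, everything else is
    treated as negative; otherwise `negative` is used and the rest is positive.
    """
    unknown = (set(positive) | set(negative)) - set(all_names)
    if unknown:
        # Fail early with a readable message instead of a later KeyError.
        raise ValueError(f"indicators not found among sheet names: {sorted(unknown)}")
    if positive:
        chosen = set(positive)
        pos = [n for n in all_names if n in chosen]
        neg = [n for n in all_names if n not in chosen]
    else:
        chosen = set(negative)
        neg = [n for n in all_names if n in chosen]
        pos = [n for n in all_names if n not in chosen]
    return pos, neg


# Hypothetical sheet names, for illustration only.
pos, neg = split_indicators(["water_use", "gdp_growth", "emissions"],
                            negative=["emissions"])
```

Iterating over `all_names` keeps the sheet order, whereas converting a set back to a list gives an arbitrary order.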

3. Example

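This example computes the weights of two indicators: water consumption (用水量) and per-capita GDP growth rate (人均GDP增長率).
Data format (the original screenshot is not available): as the code expects, each sheet has one row per city, with the city name in the first column, and one column per year.
Note: each indicator is stored in its own sheet.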
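To make the layout concrete, the following sketch flattens two small indicator tables into one row per city-year pair, in the same order the code produces with `reshape(-1, 1, order='F')`: all cities for the first year, then all cities for the second year, and so on. It uses plain Python lists so it runs without an Excel file; `flatten_sheets`, the indicator names, and every number are made up for illustration.

```python
def flatten_sheets(sheets, cities, years):
    """Turn {indicator: rows} into a list of city-year records.

    rows[c][y] is the value for cities[c] in years[y], i.e. one row per
    city and one column per year, like a sheet in the workbook.
    """
    records = []
    for y, year in enumerate(years):        # year varies slowest (column-major)
        for c, city in enumerate(cities):   # city varies fastest
            record = {"city": city, "year": year}
            for name, rows in sheets.items():
                record[name] = rows[c][y]
            records.append(record)
    return records


cities = ["A", "B"]
years = [2019, 2020, 2021]
sheets = {
    "water_use": [[10, 11, 12], [20, 21, 22]],
    "gdp_growth": [[0.05, 0.06, 0.07], [0.03, 0.02, 0.04]],
}
rows = flatten_sheets(sheets, cities, years)
for r in rows:
    print(r)
```

Each indicator ends up as one column over cities x years rows, which is why the row count, not the number of cities, enters k = 1 / ln(n).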

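Data source: China Statistical Yearbook

Code
See Section 2.

Results

The last item in the output, W, holds the weight of each indicator; the other printed values are explained in the comments of the code above.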

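4. Verifying the Results

To check that the results are correct, the same data was run through SPSSAU (its output was originally shown as a screenshot).

Comparing the two sets of results suggests that the code in this article is reasonable.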
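Independently of an external tool, two properties of the entropy formula can be checked by hand: a column with no variation has entropy 1 (so its difference coefficient is 0), and any other column of positive values has entropy strictly between 0 and 1. The sketch below checks both on made-up numbers; `entropy` is a hypothetical helper written with the standard library only.

```python
import math


def entropy(column):
    """Normalised entropy of one indicator column of strictly positive values."""
    n = len(column)
    total = sum(column)
    k = 1.0 / math.log(n)  # k = 1 / ln(n) scales the maximum entropy to 1
    return -k * sum((v / total) * math.log(v / total) for v in column)


uniform = entropy([1.0, 1.0, 1.0, 1.0])      # no variation across observations
spread = entropy([0.01, 0.01, 0.01, 1.01])   # one observation dominates

print("uniform column:", uniform)
print("spread column: ", spread)
```

A lower entropy means a larger difference coefficient, so the more an indicator varies across city-year rows, the more weight it receives.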

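5. Notes

The data is min-max (range) normalised before the information entropy is computed, so the normalised values lie in [0, 1] and some of them are exactly 0, for which ln 0 is undefined. The literature handles this in two ways: shift the normalised data by a small constant (the approach used in this article), or define p·ln p = 0 when p = 0. To see how much the resulting weights differ, the second approach is applied here to the same data.
Without the shift, np.log(0) returns -inf and 0 × (-inf) is NaN; pandas' sum() skips NaN by default, so those terms contribute 0, which is what the second convention requires (NumPy will emit a divide-by-zero RuntimeWarning). To use the second approach, it is therefore enough to delete the following line: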

df4 = df4.apply(lambda x: x + 0.01)  # shift the data to avoid ln 0; adjust the constant as needed

Comparison of results:

The two methods do give different weights, but the difference is only about 0.01, so either can be chosen according to need.
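The comparison can be reproduced on a small scale without pandas. The sketch below computes the weights for two already-normalised indicator columns, once with the 0.01 shift and once with the convention 0·ln 0 = 0. `entropy_weights` is a hypothetical helper and the numbers are made up; they are not the article's data.

```python
import math


def entropy_weights(columns, shift=0.0):
    """Entropy weights for indicator columns already normalised to [0, 1].

    With shift > 0 every value is moved away from 0 first; with shift = 0,
    zero shares are skipped, i.e. 0 * ln(0) is taken to be 0.
    """
    n = len(columns[0])
    k = 1.0 / math.log(n)
    diffs = []
    for col in columns:
        shifted = [v + shift for v in col]
        total = sum(shifted)
        plogp = sum((v / total) * math.log(v / total) for v in shifted if v > 0)
        diffs.append(1.0 + k * plogp)  # d = 1 - e, with e = -k * sum(p ln p)
    s = sum(diffs)
    return [d / s for d in diffs]


normalised = [
    [0.0, 0.2, 0.5, 1.0],    # hypothetical indicator 1
    [0.0, 0.9, 1.0, 0.95],   # hypothetical indicator 2
]
w_shift = entropy_weights(normalised, shift=0.01)
w_zero = entropy_weights(normalised, shift=0.0)
print("with 0.01 shift:", w_shift)
print("with 0*ln0 = 0: ", w_zero)
```

How far the two results drift apart depends on the size of the shift relative to the column totals, so the gap seen on one dataset should not be assumed for another.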

Note: my ability is limited and errors or omissions in this article are hard to avoid; thank you for your understanding.
