MLAT-Autoencoders: Key Code and Results, Part 3 (Final)
Conditional Autoencoders for Return Forecasting and Trading
This section covers the application of autoencoders to asset pricing. The workflow consists of four steps:

Step 1: Build a new dataset combining stock prices and metadata
Step 2: Compute the predictive asset characteristics
Step 3: Create and train the conditional autoencoder architecture
Step 4: Evaluate the results
Part 1: Data
```python
from pathlib import Path

import numpy as np
import pandas as pd
from statsmodels.regression.rolling import RollingOLS
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

idx = pd.IndexSlice
sns.set_style('whitegrid')

results_path = Path('results', 'asset_pricing')
if not results_path.exists():
    results_path.mkdir(parents=True)
```

Price data
```python
prices = pd.read_hdf(results_path / 'data.h5', 'stocks/prices/adjusted')
prices.info(show_counts=True)
```

(output omitted)
Metadata
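The cell that loads the metadata did not survive extraction; only its output marker remained. A minimal reconstruction, assuming the company metadata lives in the same `data.h5` store (the key `'stocks/info'` is an assumption inferred from the selection code below, which expects `sector`, `marketcap`, and `sharesoutstanding` columns indexed by ticker):

```python
# Hypothetical reconstruction: load company metadata from the same HDF store.
# Adjust the key to match your data.h5 layout.
metadata = pd.read_hdf(results_path / 'data.h5', 'stocks/info').rename(columns=str.lower)
metadata.info()
```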
Selecting stocks using the metadata
```python
# keep only sectors with more than 50 tickers
# (the original `(metadata.sector.value_counts() > 50).index` returned every
#  sector regardless of count, so the intended filter was a no-op)
sectors = metadata.sector.value_counts()[lambda x: x > 50].index
tickers_with_errors = ['FTAI', 'AIRT', 'CYBR', 'KTB']
tickers_with_metadata = metadata[metadata.sector.isin(sectors) &
                                 metadata.marketcap.notnull() &
                                 metadata.sharesoutstanding.notnull() &
                                 (metadata.sharesoutstanding > 0)].index.drop(tickers_with_errors)

metadata = metadata.loc[tickers_with_metadata, ['sector', 'sharesoutstanding', 'marketcap']]
metadata.index.name = 'ticker'

prices = prices.loc[idx[tickers_with_metadata, :], :]
prices.info(show_counts=True)
metadata.info()

close = prices.close.unstack('ticker').sort_index()
close.info()
volume = prices.volume.unstack('ticker').sort_index()
volume.info()
```
Creating weekly returns
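The resampling cell itself is omitted from this excerpt. A minimal sketch, assuming weekly returns are computed from Friday-to-Friday closes (the `'W-FRI'` convention is an assumption):

```python
# Sketch: resample adjusted closes to weekly frequency and compute returns.
weekly_returns = (close
                  .resample('W-FRI').last()  # last close of each trading week
                  .pct_change()              # week-over-week percentage return
                  .dropna(how='all'))
weekly_returns.info()
```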
Factors

```python
MONTH = 21  # trading days per month
```

Price trend
Short-term momentum (one-month cumulative return)
Stock momentum (eleven-month cumulative return ending one month before the most recent month end)
Momentum change
Market momentum
Maximum return

Liquidity metrics and risk measures are omitted here; illustrative computations for the price-trend factors above are sketched below.
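A minimal sketch of how the listed price-trend factors might be computed from the daily `close` frame. The names follow the `characteristics` list used in Part 2, but the exact windows and the skip-month convention are assumptions:

```python
# Illustrative factor computations on the daily close matrix (date x ticker).
mom1m = close.pct_change(MONTH)                    # short-term: 1-month cumulative return
mom12m = (close.pct_change(11 * MONTH)             # momentum: 11-month cumulative return...
          .shift(MONTH))                           # ...ending one month before month end
chmom = (close.pct_change(6 * MONTH)               # momentum change: recent 6-month return
         .sub(close.pct_change(6 * MONTH)
              .shift(6 * MONTH)))                  # minus the prior 6-month return
maxret = (close.pct_change()
          .rolling(MONTH).max())                   # maximum daily return over the past month
```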
Part 2: Modeling

Imports
```python
import warnings
warnings.filterwarnings('ignore')

import sys, os
from time import time
from pathlib import Path
from itertools import product

from tqdm import tqdm
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dot, Reshape, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import TensorBoard

from sklearn.preprocessing import quantile_transform
from scipy.stats import spearmanr

sys.path.insert(1, os.path.join(sys.path[0], '..'))
from utils import MultipleTimeSeriesCV, format_time

idx = pd.IndexSlice
sns.set_style('whitegrid')
np.random.seed(42)

results_path = Path('results', 'asset_pricing')
if not results_path.exists():
    results_path.mkdir(parents=True)

characteristics = ['beta', 'betasq', 'chmom', 'dolvol', 'idiovol', 'ill', 'indmom',
                   'maxret', 'mom12m', 'mom1m', 'mom36m', 'mvel', 'retvol', 'turn', 'turn_std']
```

Data preparation
```python
with pd.HDFStore(results_path / 'autoencoder.h5') as store:
    print(store.info())
```
Weekly returns
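The cell that loads the model inputs is not reproduced in this excerpt. A hypothetical sketch, assuming the weekly returns and characteristics were stored as a single (date, ticker)-indexed panel; the key `'model_data'` is an assumption, so check the `store.info()` output above for the actual keys:

```python
# Hypothetical: load the panel of weekly returns plus characteristics,
# then derive the dimensions the model-building code below relies on.
data = pd.read_hdf(results_path / 'autoencoder.h5', 'model_data')
n_tickers = len(data.index.get_level_values('ticker').unique())
n_characteristics = len(characteristics)
```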
Normalizing the characteristics
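The normalization cell is omitted. A sketch of the likely transformation, given that `quantile_transform` is imported above and the evaluation code below filters out a `-2` sentinel; the per-date grouping and the [-1, 1] scaling are assumptions:

```python
def rank_normalize(x):
    # map one characteristic to [-1, 1] by its cross-sectional quantile rank
    ranks = quantile_transform(x.values.reshape(-1, 1), n_quantiles=len(x), copy=True)
    return ranks.squeeze() * 2 - 1

data.loc[:, characteristics] = (data.groupby(level='date')[characteristics]
                                .transform(rank_normalize))
data = data.fillna(-2)  # sentinel for missing values; dropped during evaluation
```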
Model summary

The network maps each stock's characteristics to factor loadings (the beta side) and the cross-section of returns to factor returns (the factor side); the `Dot` layer multiplies the two to reconstruct returns.
```python
def make_model(hidden_units=8, n_factors=3):
    # beta side: characteristics -> factor loadings
    input_beta = Input((n_tickers, n_characteristics), name='input_beta')
    # factor side: returns -> factor returns
    input_factor = Input((n_tickers,), name='input_factor')

    hidden_layer = Dense(units=hidden_units, activation='relu', name='hidden_layer')(input_beta)
    batch_norm = BatchNormalization(name='batch_norm')(hidden_layer)

    output_beta = Dense(units=n_factors, name='output_beta')(batch_norm)
    output_factor = Dense(units=n_factors, name='output_factor')(input_factor)

    # reconstruct returns as the product of loadings and factor returns
    output = Dot(axes=(2, 1), name='output_layer')([output_beta, output_factor])

    model = Model(inputs=[input_beta, input_factor], outputs=output)
    model.compile(loss='mse', optimizer='adam')
    return model

model = make_model()  # instantiate before calling summary()
model.summary()
```

Training the model
Key code shown below (the console output is too long to include).
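The loop below relies on a cross-validation splitter, a hyperparameter grid, and a reshaping helper defined earlier in the notebook but missing from this excerpt. A minimal sketch under stated assumptions: the split lengths, grid values, `batch_size`, and score column names are all inferred from how the loop uses them, not visible in the source.

```python
YEAR = 52  # weeks per year

cv = MultipleTimeSeriesCV(n_splits=5,
                          train_period_length=5 * YEAR,
                          test_period_length=YEAR,
                          lookahead=1)

param_grid = list(product([8, 16, 32],       # hidden units
                          [2, 3, 4, 5, 6]))  # number of factors
batch_size = 32
cols = ['units', 'n_factors', 'fold', 'epoch',
        'ic_mean', 'ic_daily_mean', 'ic_daily_std', 'ic_daily_median']

def get_train_valid_data(data, train_idx, val_idx):
    """Split the (date, ticker) panel and reshape it into the model's inputs."""
    train, val = data.iloc[train_idx], data.iloc[val_idx]
    # input 1: one (n_tickers, n_characteristics) slab of characteristics per date
    X1_train = train.loc[:, characteristics].values.reshape(-1, n_tickers, n_characteristics)
    X1_val = val.loc[:, characteristics].values.reshape(-1, n_tickers, n_characteristics)
    # input 2 and target: the (date x ticker) return matrix (autoencoder setup)
    X2_train = y_train = train.returns.unstack('ticker')
    X2_val = y_val = val.returns.unstack('ticker')
    return X1_train, X2_train, y_train, X1_val, X2_val, y_val
```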
```python
start = time()
for units, n_factors in param_grid:
    scores = []
    model = make_model(hidden_units=units, n_factors=n_factors)
    for fold, (train_idx, val_idx) in enumerate(cv.split(data)):
        X1_train, X2_train, y_train, X1_val, X2_val, y_val = get_train_valid_data(data,
                                                                                  train_idx,
                                                                                  val_idx)
        for epoch in range(250):
            model.fit([X1_train, X2_train], y_train,
                      batch_size=batch_size,
                      validation_data=([X1_val, X2_val], y_val),
                      epochs=epoch + 1,
                      initial_epoch=epoch,
                      verbose=0, shuffle=True)
            result = (pd.DataFrame({'y_pred': model.predict([X1_val, X2_val]).reshape(-1),
                                    'y_true': y_val.stack().values},
                                   index=y_val.stack().index)
                      .replace(-2, np.nan)
                      .dropna())
            r0 = spearmanr(result.y_true, result.y_pred)[0]
            r1 = result.groupby(level='date').apply(lambda x: spearmanr(x.y_pred,
                                                                        x.y_true)[0])
            scores.append([units, n_factors, fold, epoch, r0,
                           r1.mean(), r1.std(), r1.median()])
            if epoch % 50 == 0:
                print(f'{format_time(time()-start)} | {n_factors} | {units:02} | '
                      f'{fold:02}-{epoch:03} | {r0:6.2%} | '
                      f'{r1.mean():6.2%} | {r1.median():6.2%}')
    scores = pd.DataFrame(scores, columns=cols)
    scores.to_hdf(results_path / 'scores.h5', f'{units}/{n_factors}')
```

Evaluation
```python
scores = []
with pd.HDFStore(results_path / 'scores.h5') as store:
    for key in store.keys():
        scores.append(store[key])
scores = pd.concat(scores)
scores.info()

avg = (scores.groupby(['n_factors', 'units', 'epoch'])
       [['ic_mean', 'ic_daily_mean', 'ic_daily_median']]  # list indexing; tuple form is deprecated
       .mean()
       .reset_index())
avg.nlargest(n=20, columns=['ic_daily_median'])

fig, axes = plt.subplots(ncols=5, nrows=2, figsize=(20, 8), sharey='row', sharex=True)
for n in range(2, 7):
    df = avg[avg.n_factors == n].pivot(index='epoch', columns='units', values='ic_mean')
    df.rolling(10).mean().loc[:200].plot(ax=axes[0][n-2], lw=1, title=f'{n} Factors')
    axes[0][n-2].axhline(0, ls='--', c='k', lw=1)
    axes[0][n-2].get_legend().remove()
    axes[0][n-2].set_ylabel('IC (10-epoch rolling mean)')

    df = avg[avg.n_factors == n].pivot(index='epoch', columns='units', values='ic_daily_median')
    df.rolling(10).mean().loc[:200].plot(ax=axes[1][n-2], lw=1)
    axes[1][n-2].axhline(0, ls='--', c='k', lw=1)
    axes[1][n-2].get_legend().remove()
    axes[1][n-2].set_ylabel('IC, Daily Median (10-epoch rolling mean)')

handles, labels = axes[0][0].get_legend_handles_labels()
fig.legend(handles, labels, loc='center right', title='# Units')
fig.suptitle('Cross-Validation Performance (2015-2019)', fontsize=16)
fig.tight_layout()
fig.subplots_adjust(top=.9)
fig.savefig(results_path / 'cv_performance', dpi=300);
```

Part 3
Generating predictions

Average the forecasts over a range of epochs that appear to deliver good predictions.
```python
n_factors = 4
units = 32
batch_size = 32
first_epoch = 50
last_epoch = 80

predictions = []
for epoch in tqdm(list(range(first_epoch, last_epoch))):
    epoch_preds = []
    for fold, (train_idx, val_idx) in enumerate(cv.split(data)):
        X1_train, X2_train, y_train, X1_val, X2_val, y_val = get_train_valid_data(data,
                                                                                  train_idx,
                                                                                  val_idx)
        model = make_model(n_factors=n_factors, hidden_units=units)
        model.fit([X1_train, X2_train], y_train,
                  batch_size=batch_size,
                  epochs=epoch,
                  verbose=0,
                  shuffle=True)
        epoch_preds.append(pd.Series(model.predict([X1_val, X2_val]).reshape(-1),
                                     index=y_val.stack().index).to_frame(epoch))
    predictions.append(pd.concat(epoch_preds))

predictions_combined = pd.concat(predictions, axis=1).sort_index()
predictions_combined.info()
predictions_combined.to_hdf(results_path / 'predictions.h5', 'predictions')
```
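A possible sanity check on the combined forecasts, mirroring the rank-correlation evaluation used during cross-validation. This step is not in the original excerpt; `data` is the panel prepared above:

```python
# Average the per-epoch forecasts and measure the weekly rank IC.
ensemble = (predictions_combined.mean(axis=1).to_frame('y_pred')
            .join(data.returns.rename('y_true'))
            .replace(-2, np.nan)  # drop the missing-value sentinel
            .dropna())
ic_by_week = ensemble.groupby(level='date').apply(
    lambda x: spearmanr(x.y_pred, x.y_true)[0])
print(f'Median weekly IC: {ic_by_week.median():.2%}')
```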
Summary

This concludes the series: we built a dataset of prices and metadata, engineered predictive characteristics, trained a conditional autoencoder that combines them into latent risk factors, and generated averaged return forecasts for evaluation.