DGL Tutorial [1]: Classification with the Cora Dataset
This tutorial demonstrates how to build a GNN for semi-supervised node classification, using the small Cora dataset: a citation network in which papers are nodes and citation relationships are edges. The task is to predict the category of a paper; each paper carries a word-count vector as its feature.
First, install DGL:

```shell
pip install dgl -i https://pypi.douban.com/simple/
```
Loading the Cora dataset
```python
import dgl.data

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)
```

This automatically downloads the Cora dataset into the C:\Users\vincent\.dgl\cora_v2\ directory and produces the following output:
```
Downloading C:\Users\vincent\.dgl\cora_v2.zip from https://data.dgl.ai/dataset/cora_v2.zip...
Extracting file to C:\Users\vincent\.dgl\cora_v2
Finished data loading and preprocessing.
  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done saving data into cached files.
Number of categories: 7
```

A DGL dataset may contain multiple graphs, but the Cora dataset contains only one:
```python
g = dataset[0]
```

A DGL graph stores node attributes in the dictionary-like `ndata` and edge attributes in `edata`. In the DGL Cora dataset, the graph carries the following node features:
- train_mask: a boolean tensor indicating whether a node belongs to the training set
- val_mask: a boolean tensor indicating whether a node belongs to the validation set
- test_mask: a boolean tensor indicating whether a node belongs to the test set
- label: the node's ground-truth category
- feat: the node's features
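The masks are consumed by indexing into the node tensors, e.g. `labels[train_mask]`. As a toy sketch of the same boolean-mask selection in plain Python (made-up labels, not Cora data):

```python
# Toy boolean-mask selection with made-up data; tensor indexing such as
# labels[train_mask] performs the same selection, just vectorized.
labels = [3, 4, 4, 0, 3]
train_mask = [True, True, False, False, True]
train_labels = [lab for lab, m in zip(labels, train_mask) if m]
print(train_labels)  # [3, 4, 3]
```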
Printing the node features (`g.ndata`) and edge features (`g.edata`) gives:
```
Node features
{'feat': tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 'label': tensor([3, 4, 4, ..., 3, 3, 3]),
 'test_mask': tensor([False, False, False, ..., True, True, True]),
 'train_mask': tensor([ True, True, True, ..., False, False, False]),
 'val_mask': tensor([False, False, False, ..., False, False, False])}
Edge features
{}
```

Defining a GNN
We will build a two-layer GCN, in which each layer computes a node representation by aggregating information from the node's neighbors.
To build such a multi-layer GCN, we can simply stack dgl.nn.GraphConv modules, which inherit from torch.nn.Module.
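Stripped of learned weights and normalization, what one graph convolution does is combine each node's feature with those of its neighbors. A toy sketch in plain Python on a hypothetical 4-node path graph (DGL's GraphConv uses symmetric degree normalization rather than this plain average):

```python
# Hypothetical 4-node path graph: 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feat = [1.0, 2.0, 3.0, 4.0]  # one scalar feature per node

# Average each node's own feature with its neighbours' features --
# a simplified stand-in for a single graph-convolution layer.
agg = [(feat[n] + sum(feat[m] for m in adj[n])) / (1 + len(adj[n]))
       for n in adj]
print(agg)  # [1.5, 2.0, 3.0, 3.5]
```

Stacking two such layers (with a weight matrix and a ReLU in between) is exactly the shape of the GCN defined below.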
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.data
from dgl.nn.pytorch import GraphConv

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)

g = dataset[0]
print('Node features')
print(g.ndata)
print('Edge features')
print(g.edata)

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
print(model)
```

DGL implements many popular neighbor-aggregation modules, each usable with a single line of code.
Training the GCN
Training a GCN with DGL follows the same pattern as training any other PyTorch neural network.
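Two small operations inside the loop are worth isolating: argmax over the class logits gives each node's predicted label, and accuracy is the mean of elementwise matches. A toy sketch in plain Python with made-up numbers:

```python
# Made-up logits for 3 nodes over 4 classes.
logits = [[0.1, 0.7, 0.1, 0.1],
          [0.5, 0.2, 0.2, 0.1],
          [0.0, 0.1, 0.8, 0.1]]
pred = [row.index(max(row)) for row in logits]  # argmax per node
labels = [1, 0, 3]
acc = sum(p == l for p, l in zip(pred, labels)) / len(labels)
print(pred, acc)  # [1, 0, 2] 0.6666666666666666
```

In the training loop below, `logits.argmax(1)` and `(pred[mask] == labels[mask]).float().mean()` are the vectorized forms of these two steps.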
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.data
from dgl.nn.pytorch import GraphConv

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)

g = dataset[0]

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']

    for e in range(100):
        # Forward
        logits = model(g, features)

        # Compute prediction
        pred = logits.argmax(1)

        # Compute loss
        # Note that you should only compute the losses of the nodes in the training set.
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on training/validation/test
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), '
                  'test acc: {:.3f} (best {:.3f})'.format(
                      e, loss, val_acc, best_val_acc, test_acc, best_test_acc))

# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
print(model)
train(g, model)
```

The output:
```
In epoch 0, loss: 1.946, val acc: 0.134 (best 0.134), test acc: 0.138 (best 0.138)
In epoch 5, loss: 1.892, val acc: 0.506 (best 0.522), test acc: 0.499 (best 0.539)
In epoch 10, loss: 1.806, val acc: 0.600 (best 0.612), test acc: 0.633 (best 0.636)
In epoch 15, loss: 1.698, val acc: 0.594 (best 0.612), test acc: 0.626 (best 0.636)
In epoch 20, loss: 1.567, val acc: 0.632 (best 0.632), test acc: 0.653 (best 0.653)
In epoch 25, loss: 1.417, val acc: 0.712 (best 0.712), test acc: 0.700 (best 0.700)
In epoch 30, loss: 1.251, val acc: 0.738 (best 0.738), test acc: 0.737 (best 0.737)
In epoch 35, loss: 1.079, val acc: 0.746 (best 0.746), test acc: 0.751 (best 0.751)
In epoch 40, loss: 0.909, val acc: 0.746 (best 0.748), test acc: 0.758 (best 0.756)
In epoch 45, loss: 0.751, val acc: 0.738 (best 0.748), test acc: 0.766 (best 0.756)
In epoch 50, loss: 0.612, val acc: 0.744 (best 0.748), test acc: 0.767 (best 0.756)
In epoch 55, loss: 0.494, val acc: 0.752 (best 0.752), test acc: 0.773 (best 0.773)
In epoch 60, loss: 0.399, val acc: 0.762 (best 0.762), test acc: 0.776 (best 0.776)
In epoch 65, loss: 0.322, val acc: 0.762 (best 0.766), test acc: 0.776 (best 0.776)
In epoch 70, loss: 0.262, val acc: 0.764 (best 0.768), test acc: 0.778 (best 0.775)
In epoch 75, loss: 0.215, val acc: 0.766 (best 0.768), test acc: 0.778 (best 0.775)
In epoch 80, loss: 0.178, val acc: 0.766 (best 0.768), test acc: 0.779 (best 0.775)
In epoch 85, loss: 0.149, val acc: 0.766 (best 0.768), test acc: 0.780 (best 0.775)
In epoch 90, loss: 0.126, val acc: 0.768 (best 0.768), test acc: 0.779 (best 0.775)
In epoch 95, loss: 0.107, val acc: 0.768 (best 0.768), test acc: 0.776 (best 0.775)
```

Training on the GPU
Training on the GPU requires moving both the model and the data onto the GPU with the to() method:
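Since not every machine has a CUDA device, a common guard before those to() calls (a minimal sketch, assuming PyTorch is installed) is:

```python
import torch

# Fall back to CPU when no CUDA device is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
```

One can then pass `device` to to() instead of the hard-coded 'cuda' literal.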
```python
g = g.to('cuda')
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
train(g, model)
```

Summary
That concludes this tutorial: we loaded the Cora dataset with DGL, defined a two-layer GCN, and trained it for semi-supervised node classification. Hopefully it helps you solve the problems you encounter.