[Paper Reading] A Gentle Introduction to Graph Neural Networks (6)
- GNN playground
- Some empirical GNN design lessons
- References
GNN playground
GNN游樂(lè)場(chǎng)
We’ve described a wide range of GNN components here, but how do they actually differ in practice? This GNN playground allows you to see how these different components and architectures contribute to a GNN’s ability to learn a real task.
我們已經(jīng)了解了各種各樣的GNN組件,但是它們?cè)趯?shí)踐中有什么不同呢?這個(gè)GNN游樂(lè)場(chǎng)允許您了解這些不同的組件和體系結(jié)構(gòu)如何幫助GNN學(xué)習(xí)實(shí)際任務(wù)的能力。
Our playground shows a graph-level prediction task with small molecular graphs. We use the Leffingwell Odor Dataset [23] [24], which is composed of molecules with associated odor percepts (labels). Predicting the relation of a molecular structure (graph) to its smell is a 100 year-old problem straddling chemistry, physics, neuroscience, and machine learning.
我們的游樂(lè)場(chǎng)展示了一個(gè)帶有小分子圖的圖預(yù)測(cè)任務(wù)。我們使用Leffingwell氣味數(shù)據(jù)集[23] [24],它由分子和相關(guān)的氣味感知器(標(biāo)簽)組成。預(yù)測(cè)分子結(jié)構(gòu)(圖)與氣味的關(guān)系是一個(gè)百年難題,橫跨化學(xué)、物理、神經(jīng)科學(xué)和機(jī)器學(xué)習(xí)。
To simplify the problem, we consider only a single binary label per molecule, classifying if a molecular graph smells “pungent” or not, as labeled by a professional perfumer. We say a molecule has a “pungent” scent if it has a strong, striking smell. For example, garlic and mustard, which might contain the molecule allyl alcohol have this quality. The molecule piperitone, often used for peppermint-flavored candy, is also described as having a pungent smell.
為了簡(jiǎn)化這個(gè)問(wèn)題,我們只考慮每個(gè)分子一個(gè)單一的二元標(biāo)簽,如果分子圖聞起來(lái)“刺鼻”或不刺鼻,就按照專(zhuān)業(yè)調(diào)香師的標(biāo)簽進(jìn)行分類(lèi)。我們說(shuō)一個(gè)分子有“刺鼻”氣味,如果它有一種強(qiáng)烈的、刺鼻的氣味。例如,大蒜和芥末中可能含有烯丙醇分子,具有這種特性。胡椒酮分子,通常用于薄荷味的糖果,也被描述為具有刺激性的氣味。
We represent each molecule as a graph, where atoms are nodes containing a one-hot encoding of their atomic identity (Carbon, Nitrogen, Oxygen, Fluorine) and bonds are edges containing a one-hot encoding of their bond type (single, double, triple or aromatic).
我們將每個(gè)分子表示為一個(gè)圖,其中原子是包含一個(gè)one-hot編碼(碳、氮、氧、氟)的節(jié)點(diǎn),而鍵是包含一個(gè)one-hot編碼(單鍵、雙鍵、三鍵或芳香鍵)的鍵類(lèi)型的邊。
Our general modeling template for this problem will be built up using sequential GNN layers, followed by a linear model with a sigmoid activation for classification. The design space for our GNN has many levers that can customize the model:
1. The number of GNN layers, also called the depth.
2. The dimensionality of each attribute when updated. The update function is a 1-layer MLP with a relu activation function and a layer norm for normalization of activations.
3. The aggregation function used in pooling: max, mean or sum.
4. The graph attributes that get updated, or styles of message passing: nodes, edges and global representation. We control these via boolean toggles (on or off). A baseline model would be a graph-independent GNN (all message-passing off) which aggregates all data at the end into a single global attribute. Toggling on all message-passing functions yields a GraphNets architecture.
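To make this design space concrete, below is a minimal NumPy sketch of the modeling template, not the tfjs code that actually powers the playground: a stack of GNN layers whose update function is a 1-layer MLP with ReLU and layer norm, a configurable pooling aggregation, a boolean toggle for node-to-node message passing (edge and global updates would follow the same pattern), and a final linear model with a sigmoid for the binary classification. All function names and the untrained weights are illustrative assumptions.

```python
import numpy as np

def mlp_update(x, w, b):
    """1-layer MLP update: linear -> ReLU -> layer norm (as described above)."""
    h = np.maximum(x @ w + b, 0.0)
    return (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + 1e-5)

def aggregate(values, kind):
    """Pooling aggregation over a stack of vectors: 'sum', 'mean' or 'max'."""
    return {"sum": np.sum, "mean": np.mean, "max": np.max}[kind](values, axis=0)

def gnn_layer(nodes, edge_index, agg, pass_nodes, params):
    """One simplified GNN layer: each node optionally pools its neighbours'
    vectors, then updates with the 1-layer MLP. Edge and global updates are
    omitted for brevity; toggling them on would follow the same pattern."""
    new_nodes = []
    for i in range(len(nodes)):
        h = nodes[i]
        if pass_nodes:
            neighbours = [nodes[j] for (a, j) in edge_index if a == i] + \
                         [nodes[a] for (a, j) in edge_index if j == i]
            if neighbours:
                h = h + aggregate(np.stack(neighbours), agg)
        new_nodes.append(mlp_update(h, *params))
    return np.stack(new_nodes)

def predict_pungent(node_feats, edge_index, depth=3, agg="sum",
                    pass_nodes=True, seed=0):
    """Stack `depth` GNN layers, pool all node vectors into a single global
    attribute, then apply a linear model with a sigmoid for classification."""
    dim = node_feats.shape[1]
    rng = np.random.default_rng(seed)
    params = (rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim))  # untrained weights
    w_out, b_out = rng.normal(size=dim) * 0.1, 0.0
    nodes = node_feats
    for _ in range(depth):
        nodes = gnn_layer(nodes, edge_index, agg, pass_nodes, params)
    graph_embedding = aggregate(nodes, agg)  # final readout into a global vector
    return 1.0 / (1.0 + np.exp(-(graph_embedding @ w_out + b_out)))
```

With the toy `node_features` and `edge_index` from the earlier encoding sketch, `predict_pungent(node_features, edge_index, depth=2, agg="mean")` returns an (untrained) probability of "pungent"; setting `pass_nodes=False` corresponds to the graph-independent baseline that only aggregates everything at the end.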
To better understand how a GNN is learning a task-optimized representation of a graph, we also look at the penultimate layer activations of the GNN. These ‘graph embeddings’ are the outputs of the GNN model right before prediction. Since we are using a generalized linear model for prediction, a linear mapping is enough to allow us to see how we are learning representations around the decision boundary.
Since these are high dimensional vectors, we reduce them to 2D via principal component analysis (PCA). A perfect model would visibly separate labeled data, but since we are reducing dimensionality and also have imperfect models, this boundary might be harder to see.
由于這些是高維向量,我們通過(guò)主成分分析(PCA)將它們簡(jiǎn)化為2D。一個(gè)完美的模型可以看到單獨(dú)的標(biāo)簽數(shù)據(jù),但是由于我們?cè)诮档途S數(shù),所以美中不足的是,這個(gè)邊界可能較難看到。
Play around with different model architectures to build your intuition. For example, see if you can edit the molecule on the left to make the model prediction increase. Do the same edits have the same effects for different model architectures?
This playground is running live on the browser in tfjs.
這個(gè)游樂(lè)場(chǎng)可以在瀏覽器上實(shí)時(shí)運(yùn)行tfjs框架。
Figure 1: Edit the molecule to see how the prediction changes, or change the model params to load a different model. Select a different molecule in the scatter plot.
編輯分子,看看預(yù)測(cè)如何變化,或改變模型參數(shù),以加載不同的模型。在散點(diǎn)圖中選擇一個(gè)不同的分子。
該實(shí)驗(yàn)一共有6個(gè)參數(shù)可以變換,分別是Depth(模型層數(shù)/GNN層數(shù))、Aggregation function(聚合函數(shù))、Node embedding size(節(jié)點(diǎn)embedding大小)、Edge embedding size(邊embedding大小)、Global embedding size(全局embedding大小)。
Aggregation function(聚合函數(shù))類(lèi)似于卷積神經(jīng)網(wǎng)絡(luò)池化層中的聚合函數(shù),只不過(guò)卷積神經(jīng)網(wǎng)絡(luò)池化層中的聚合函數(shù)Sum函數(shù)并不常見(jiàn)。
Node embedding size can be understood as the length of the vector representing each node, and likewise for Edge embedding size and Global embedding size. These three can also be left unset, i.e., the model predicts without that attribute's representation.
右邊的散點(diǎn)圖為數(shù)據(jù)集中每個(gè)分子圖的預(yù)測(cè)結(jié)果與真實(shí)標(biāo)簽的重合程度。其中空心圈為Ground Truth,也就是一個(gè)點(diǎn)的圈邊緣部分表示這個(gè)分子圖的標(biāo)簽值; 實(shí)心圈為Model Prediction,也就是一個(gè)點(diǎn)的實(shí)心部分表示這個(gè)分子圖的預(yù)測(cè)結(jié)果; 紅色代表具有刺激性氣味,藍(lán)色代表不具有刺激性氣味; 那么也就是說(shuō),如果一個(gè)點(diǎn)的圈邊部分和實(shí)心部分如果顏色都是一樣的,說(shuō)明我們的模型對(duì)數(shù)據(jù)集中的該分子圖的是否具有刺激性氣味預(yù)測(cè)正確,反之,則預(yù)測(cè)失敗; MODEL AUC為模型的預(yù)測(cè)準(zhǔn)確率; Pungent為刺激性氣味的程度。
下面為數(shù)據(jù)集中某個(gè)分子圖的預(yù)測(cè)成功和預(yù)測(cè)失敗的例子,可以看到預(yù)測(cè)成功的分子(圖2)的Model Prediction為4% pungent,Ground Truth為not pungent,這說(shuō)明預(yù)測(cè)該分子的刺激程度為4%,這與其標(biāo)簽為非刺激性基本吻合,說(shuō)明預(yù)測(cè)成功; 預(yù)測(cè)失敗的分子(圖3)的Model Prediction為64% pungent,Ground Truth為not pungent,這說(shuō)明預(yù)測(cè)該分子的刺激程度為64%,屬于具有刺激性氣味,這與其標(biāo)簽為非刺激性不吻合,說(shuō)明預(yù)測(cè)失敗; 該模型是在3層GNN,聚合函數(shù)為Sum,節(jié)點(diǎn)embedding大小為50,邊embedding大小為10,全局embedding大小為50的參數(shù)下完成預(yù)測(cè),其預(yù)測(cè)精確度為0.75。
當(dāng)然,你還可以在左邊自定義分子圖結(jié)構(gòu)送入模型來(lái)預(yù)測(cè)該分子是否具有刺激性氣味,圖4為我們輸入了一個(gè)分子圖,其預(yù)測(cè)結(jié)果為9%刺激性氣味,屬于不具有刺激性氣味,由于該圖是我們自定義的,所以其標(biāo)簽為unknown。
Some empirical GNN design lessons
一些GNN設(shè)計(jì)建議
When exploring the architecture choices above, you might have found some models have better performance than others. Are there some clear GNN design choices that will give us better performance? For example, do deeper GNN models perform better than shallower ones? Or is there a clear choice between aggregation functions? The answers are going to depend on the data [25] [26], and even different ways of featurizing and constructing graphs can give different answers.
在探索上面的架構(gòu)選擇時(shí),您可能會(huì)發(fā)現(xiàn)某些模型的準(zhǔn)確率比其他模型更高。是否有一些明確的GNN設(shè)計(jì)架構(gòu),將給我們更好的性能?例如,較深的GNN模型是否比較淺的模型表現(xiàn)更好?或者在幾種聚合函數(shù)之間有一個(gè)最好的選擇?答案取決于數(shù)據(jù),[25] [26],甚至不同的表示和構(gòu)造圖的方法也會(huì)得出不同的結(jié)論。
With the following interactive figure, we explore the space of GNN architectures and the performance of this task across a few major design choices: Style of message passing, the dimensionality of embeddings, number of layers, and aggregation operation type.
Each point in the scatter plot represents a model: the x axis is the number of trainable variables, and the y axis is the performance. Hover over a point to see the GNN architecture parameters.
散點(diǎn)圖中的每個(gè)點(diǎn)代表一個(gè)模型:x軸是可訓(xùn)練變量的數(shù)量,y軸是準(zhǔn)確率。將鼠標(biāo)懸停在一個(gè)點(diǎn)上可以看到GNN架構(gòu)參數(shù)。
The first thing to notice is that, surprisingly, a higher number of parameters does correlate with higher performance. GNNs are a very parameter-efficient model type: for even a small number of parameters (3k) we can already find models with high performance.
Next, we can look at the distributions of performance aggregated based on the dimensionality of the learned representations for different graph attributes.
接下來(lái),我們可以查看基于不同圖屬性的表示學(xué)習(xí)的維數(shù)聚合的性能分布。
We can notice that models with higher dimensionality tend to have better mean and lower bound performance but the same trend is not found for the maximum. Some of the top-performing models can be found for smaller dimensions. Since higher dimensionality is going to also involve a higher number of parameters, these observations go hand in hand with the previous figure.
其實(shí)這段話也就是想告訴我們,節(jié)點(diǎn)、邊、全局向量的維度并不是越多模型的性能就越強(qiáng),模型的性能與許多參數(shù)息息相關(guān)。
Next we can see the breakdown of performance based on the number of GNN layers.
接下來(lái),我們可以看到GNN層數(shù)對(duì)模型性能的影響。
圖8 GNN層數(shù)與模型性能的關(guān)系圖,模型性能與參數(shù)數(shù)量的關(guān)系散點(diǎn)圖。每個(gè)點(diǎn)都根據(jù)層數(shù)著色。將鼠標(biāo)懸停在一個(gè)點(diǎn)上可以看到GNN架構(gòu)參數(shù)。
The box plot shows a similar trend: while the mean performance tends to increase with the number of layers, the best performing models do not have three or four layers, but two. Furthermore, the lower bound for performance decreases with four layers. This effect has been observed before: GNNs with a higher number of layers will broadcast information over a greater distance and can risk having their node representations 'diluted' by many successive iterations [27].
箱形圖也顯示了類(lèi)似的趨勢(shì),雖然平均性能隨著層數(shù)的增加而增加,但性能最好的模型不是三層或四層,而是兩層。此外,性能的最小值隨著層的增加而降低。這種效應(yīng)之前已經(jīng)觀察到,具有較多層數(shù)的GNN將以更遠(yuǎn)的距離傳遞信息,并可能在許多連續(xù)迭代中使其節(jié)點(diǎn)表示被“稀釋”[27]。
Does our dataset have a preferred aggregation operation? Our following figure breaks down performance in terms of aggregation type.
我們的數(shù)據(jù)集有聚合操作的最佳選擇嗎?下圖按照聚合操作的類(lèi)型對(duì)模型性能進(jìn)行了解析。
圖9 聚合類(lèi)型與模型性能的關(guān)系圖,模型性能與參數(shù)個(gè)數(shù)的關(guān)系散點(diǎn)圖。每個(gè)點(diǎn)都根據(jù)聚合類(lèi)型著色。將鼠標(biāo)懸停在一個(gè)點(diǎn)上可以看到GNN架構(gòu)參數(shù)。
Overall it appears that sum has a very slight improvement on the mean performance, but max or mean can give equally good models. This is useful context when considering the discriminatory/expressive capabilities of aggregation operations.
總的來(lái)說(shuō),sum函數(shù)似乎對(duì)平均性能有非常輕微的改善,但max或mean可以給出性能同樣好的模型。選擇聚合操作的類(lèi)型/表達(dá)能力時(shí)對(duì)于將其置于全局上下文中非常有用。
The previous explorations have given mixed messages. We can find mean trends where more complexity gives better performance but we can find clear counterexamples where models with fewer parameters, number of layers, or dimensionality perform better. One trend that is much clearer is about the number of attributes that are passing information to each other.
Here we break down performance based on the style of message passing. On both extremes, we consider models that do not communicate between graph entities (“none”) and models that have messaging passed between nodes, edges, and globals.
這里,我們將根據(jù)信息傳遞的方式對(duì)模型性能進(jìn)行解析。在這兩種極端情況下,我們考慮的模型在圖實(shí)體(“none”)之間不進(jìn)行信息傳遞,而模型在節(jié)點(diǎn)、邊和全局之間進(jìn)行信息傳遞。
Figure 10: Relationship between message-passing style and model performance; scatter plot of performance vs. number of parameters, with each point colored by message-passing style. Hover over a point to see the GNN architecture parameters.
Overall we see that the more graph attributes are communicating, the better the performance of the average model. Our task is centered on global representations, so explicitly learning this attribute also tends to improve performance. Our node representations also seem to be more useful than edge representations, which makes sense since more information is loaded in these attributes.
總的來(lái)說(shuō),我們看到能夠進(jìn)行信息傳遞的圖屬性越多,平均模型的性能就越好。我們的任務(wù)以全局表示為中心,因此明確地學(xué)習(xí)這個(gè)屬性也有助于提高性能。我們的節(jié)點(diǎn)表示似乎也比邊表示更有用,這是有意義的,因?yàn)樵谶@些屬性中加載了更多的信息。
There are many directions you could go from here to get better performance. We wish to highlight two general directions, one related to more sophisticated graph algorithms and another towards the graph itself.
Up until now, our GNN is based on a neighborhood-based pooling operation. There are some graph concepts that are harder to express in this way, for example a linear graph path (a connected chain of nodes). Designing new mechanisms in which graph information can be extracted, executed and propagated in a GNN is one current research area [28] , [29] , [30] , [31] .
One of the frontiers of GNN research is not making new models and architectures, but “how to construct graphs”, to be more precise, imbuing graphs with additional structure or relations that can be leveraged. As we loosely saw, the more graph attributes are communicating the more we tend to have better models. In this particular case, we could consider making molecular graphs more feature rich, by adding additional spatial relationships between nodes, adding edges that are not bonds, or explicit learnable relationships between subgraphs.
GNN研究的前沿領(lǐng)域之一不是制造新的模型和架構(gòu),而是“如何構(gòu)造圖”,更精確地說(shuō),為圖注入可以利用的額外結(jié)構(gòu)或關(guān)系。正如我們所看到的,圖屬性之間的信息傳遞越多,我們的模型性能就越好。在這種特殊情況下,我們可以考慮通過(guò)添加節(jié)點(diǎn)之間的額外空間關(guān)系、添加非鍵的邊或子圖之間的顯式可學(xué)習(xí)關(guān)系,使子圖的特征更加豐富。
See more in Other types of graphs.
更多信息見(jiàn)其他類(lèi)型的圖。
References
[23] Leffingwell Odor Dataset. Sanchez-Lengeling, B., Wei, J.N., Lee, B.K., Gerkin, R.C., Aspuru-Guzik, A. and Wiltschko, A.B., 2020.
[24] Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules. Sanchez-Lengeling, B., Wei, J.N., Lee, B.K., Gerkin, R.C., Aspuru-Guzik, A. and Wiltschko, A.B., 2019.
[25] Benchmarking Graph Neural Networks. Dwivedi, V.P., Joshi, C.K., Laurent, T., Bengio, Y. and Bresson, X., 2020.
[26] Design Space for Graph Neural Networks. You, J., Ying, R. and Leskovec, J., 2020.
[27] Principal Neighbourhood Aggregation for Graph Nets. Corso, G., Cavalleri, L., Beaini, D., Lio, P. and Velickovic, P., 2020.
[28] Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning. Markowitz, E., Balasubramanian, K., Mirtaheri, M., Abu-El-Haija, S., Perozzi, B., Ver Steeg, G. and Galstyan, A., 2021.
[29] Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels. Du, S.S., Hou, K., Poczos, B., Salakhutdinov, R., Wang, R. and Xu, K., 2019.
[30] Representation Learning on Graphs with Jumping Knowledge Networks. Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K. and Jegelka, S., 2018.
[31] Neural Execution of Graph Algorithms. Velickovic, P., Ying, R., Padovano, M., Hadsell, R. and Blundell, C., 2019.