[Paper Reading] A Gentle Introduction to Graph Neural Networks (5)
Graph Neural Networks
Now that the graph’s description is in a matrix format that is permutation invariant, we will describe using graph neural networks (GNNs) to solve graph prediction tasks. A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances). We’re going to build GNNs using the “message passing neural network” framework proposed by Gilmer et al. [18], using the Graph Nets architecture schematics introduced by Battaglia et al. [19]. GNNs adopt a “graph-in, graph-out” architecture, meaning that these model types accept a graph as input, with information loaded into its nodes, edges and global-context, and progressively transform these embeddings, without changing the connectivity of the input graph.
The simplest GNN
With the numerical representation of graphs that we’ve constructed above (with vectors instead of scalars), we are now ready to build a GNN. We will start with the simplest GNN architecture, one where we learn new embeddings for all graph attributes (nodes, edges, global), but where we do not yet use the connectivity of the graph.
For simplicity, the previous diagrams used scalars to represent graph attributes; in practice feature vectors, or embeddings, are much more useful.
You could also call it a GNN block, since it contains multiple operations/layers (like a ResNet block).
This GNN uses a separate multilayer perceptron (MLP) (or your favorite differentiable model) on each component of a graph; we call this a GNN layer. For each node vector, we apply the MLP and get back a learned node-vector. We do the same for each edge, learning a per-edge embedding, and also for the global-context vector, learning a single embedding for the entire graph.
A single layer of a simple GNN. A graph is the input, and each component (V, E, U) gets updated by an MLP to produce a new graph. Each function subscript indicates a separate function for a different graph attribute at the n-th layer of a GNN model.
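A minimal sketch of such a layer in NumPy may help make this concrete. The two-layer MLP, the dimensions, and the helper names (mlp, simple_gnn_layer) are illustrative assumptions rather than a reference implementation; the point is only that nodes, edges, and the global context are each transformed by their own function, and connectivity is not touched.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """A tiny two-layer perceptron applied row-wise (one row per node or edge)."""
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2

def init_mlp(rng, d_in, d_hidden, d_out):
    return (rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)), np.zeros(d_out))

def simple_gnn_layer(V, E, U, params_v, params_e, params_u):
    """One 'simplest GNN' layer: each attribute set is updated by its own MLP.
    The graph's connectivity is not used at all here."""
    return mlp(V, *params_v), mlp(E, *params_e), mlp(U, *params_u)

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 8))   # 5 nodes, 8-dim embeddings
E = rng.normal(size=(7, 8))   # 7 edges, 8-dim embeddings
U = rng.normal(size=(1, 8))   # one global-context vector
params = [init_mlp(rng, 8, 16, 8) for _ in range(3)]
V2, E2, U2 = simple_gnn_layer(V, E, U, *params)
print(V2.shape, E2.shape, U2.shape)  # shapes unchanged: (5, 8) (7, 8) (1, 8)
```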
As is common with neural network modules or layers, we can stack these GNN layers together.
Because a GNN does not update the connectivity of the input graph, we can describe the output graph of a GNN with the same adjacency list and the same number of feature vectors as the input graph. But, the output graph has updated embeddings, since the GNN has updated each of the node, edge and global-context representations.
GNN Predictions by Pooling Information
We have built a simple GNN, but how do we make predictions in any of the tasks we described above?
We will consider the case of binary classification, but this framework can easily be extended to the multi-class or regression case. If the task is to make binary predictions on nodes, and the graph already contains node information, the approach is straightforward: for each node embedding, apply a linear classifier.
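As a sketch of that prediction step (the node embeddings and the classifier weights below are placeholders, not trained values), the per-node binary classifier is just a logistic regression applied row-wise:

```python
import numpy as np

def node_classifier(V, w, b):
    """Binary prediction per node: sigmoid(V @ w + b), thresholded at 0.5."""
    logits = V @ w + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, (probs > 0.5).astype(int)

rng = np.random.default_rng(1)
V = rng.normal(size=(5, 8))        # learned node embeddings from the GNN
w, b = rng.normal(size=8), 0.0     # illustrative (untrained) classifier weights
probs, labels = node_classifier(V, w, b)
print(labels)                      # one 0/1 prediction per node
```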
We could imagine a social network, where we wish to anonymize user data (nodes) by not using them, and only using relational data (edges). One instance of such a scenario is the node task we specified in the Node-level task subsection. In the Karate club example, this would be just using the number of meetings between people to determine the alliance to Mr. Hi or John H.
However, it is not always so simple. For instance, you might have information in the graph stored in edges, but no information in nodes, but still need to make predictions on nodes. We need a way to collect information from edges and give them to nodes for prediction. We can do this by pooling. Pooling proceeds in two steps:
1. For each item to be pooled, gather each of their embeddings and concatenate them into a matrix.
2. The gathered embeddings are then aggregated, usually via a sum operation.
For a more in-depth discussion on aggregation operations go to the Comparing aggregation operations section.
We represent the pooling operation by the letter ρ, and denote that we are gathering information from edges to nodes as ρ_{E_n→V_n}.
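A minimal sketch of ρ_{E_n→V_n} with sum aggregation is shown below. It assumes edges are given as (source, target) index pairs and that an undirected edge contributes to both of its endpoints; those are illustrative conventions, not fixed by the article.

```python
import numpy as np

def pool_edges_to_nodes(edge_index, E, num_nodes):
    """rho_{E_n -> V_n}: sum the embeddings of all edges incident to each node."""
    pooled = np.zeros((num_nodes, E.shape[1]))
    for (src, dst), e in zip(edge_index, E):
        pooled[src] += e   # an undirected edge contributes to both endpoints
        pooled[dst] += e
    return pooled

edge_index = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-node cycle
E = np.eye(4)                                    # one 4-dim embedding per edge
node_feats = pool_edges_to_nodes(edge_index, E, num_nodes=4)
print(node_feats)  # each row is the sum of the two edges touching that node
```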
Hover over a node (black node) to visualize which edges are gathered and aggregated to produce an embedding for that target node.
So if we only have edge-level features, and are trying to predict binary node information, we can use pooling to route (or pass) information to where it needs to go. The model looks like this.
If we only have node-level features, and are trying to predict binary edge-level information, the model looks like this.
One example of such a scenario is the edge task we specified in the Edge-level task subsection. Nodes can be recognized as image entities, and we are trying to predict if the entities share a relationship (binary edges).
If we only have node-level features, and need to predict a binary global property, we need to gather all available node information together and aggregate them. This is similar to Global Average Pooling layers in CNNs. The same can be done for edges.
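A readout in the spirit of global average pooling might look like the following sketch; the mean aggregation and the final linear classifier are illustrative choices.

```python
import numpy as np

def graph_readout(V, w, b):
    """Pool all node embeddings into one graph vector, then classify it."""
    u = V.mean(axis=0)                        # rho_{V_n -> U_n}: mean over nodes
    prob = 1.0 / (1.0 + np.exp(-(u @ w + b)))
    return prob                               # e.g. probability 'toxic'

rng = np.random.default_rng(2)
V = rng.normal(size=(12, 8))                  # 12 atoms, 8-dim embeddings
w, b = rng.normal(size=8), 0.0                # illustrative classifier weights
print(graph_readout(V, w, b))
```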
This is a common scenario for predicting molecular properties. For example, we have atomic information, connectivity and we would like to know the toxicity of a molecule (toxic/not toxic), or if it has a particular odor (rose/not rose).
In our examples, the classification model c can easily be replaced with any differentiable model, or adapted to multi-class classification using a generalized linear model.
Now we’ve demonstrated that we can build a simple GNN model, and make binary predictions by routing information between different parts of the graph. This pooling technique will serve as a building block for constructing more sophisticated GNN models. If we have new graph attributes, we just have to define how to pass information from one attribute to another.
Note that in this simplest GNN formulation, we’re not using the connectivity of the graph at all inside the GNN layer. Each node is processed independently, as is each edge, as well as the global context. We only use connectivity when pooling information for prediction.
Passing messages between parts of the graph
We could make more sophisticated predictions by using pooling within the GNN layer, in order to make our learned embeddings aware of graph connectivity. We can do this using message passing[18], where neighboring nodes or edges exchange information and influence each other’s updated embeddings.
Message passing works in three steps:
1. For each node in the graph, gather all the neighboring node embeddings (or messages), which is the g function described above.
2. Aggregate all messages via an aggregate function (like sum).
3. All pooled messages are passed through an update function, usually a learned neural network.
You could also 1) gather messages, 3) update them and 2) aggregate them and still have a permutation invariant operation.[20]
Just as pooling can be applied to either nodes or edges, message passing can occur between either nodes or edges.
These steps are key for leveraging the connectivity of graphs. We will build more elaborate variants of message passing in GNN layers that yield GNN models of increasing expressiveness and power.
Hover over a node, to highlight adjacent nodes and visualize the adjacent embedding that would be pooled, updated and stored.
This sequence of operations, when applied once, is the simplest type of message-passing GNN layer.
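A minimal sketch of one such layer for node embeddings is given below, using an adjacency list, sum aggregation, and a single dense layer as the update function. Including the node's own embedding in the update is a common but not mandatory choice.

```python
import numpy as np

def message_passing_layer(adj, V, W, b):
    """One round of message passing on node embeddings.
    adj[i] lists the neighbors of node i."""
    V_new = np.zeros_like(V)
    for i, neighbors in enumerate(adj):
        # 1) gather messages from neighbors, 2) aggregate them by summing
        msg = sum((V[j] for j in neighbors), start=np.zeros(V.shape[1]))
        # 3) update: combine the node's own embedding with the aggregated message
        V_new[i] = np.tanh(np.concatenate([V[i], msg]) @ W + b)
    return V_new

rng = np.random.default_rng(3)
adj = [[1, 2], [0], [0, 3], [2]]                  # a small 4-node graph
V = rng.normal(size=(4, 8))
W, b = rng.normal(size=(16, 8)) * 0.1, np.zeros(8)
print(message_passing_layer(adj, V, W, b).shape)  # (4, 8)
```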
This is reminiscent of standard convolution: in essence, message passing and convolution are operations to aggregate and process the information of an element’s neighbors in order to update the element’s value. In graphs, the element is a node, and in images, the element is a pixel. However, the number of neighboring nodes in a graph can be variable, unlike in an image where each pixel has a set number of neighboring elements.
By stacking message passing GNN layers together, a node can eventually incorporate information from across the entire graph: after three layers, a node has information about the nodes three steps away from it.
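A quick numeric check of this claim, on a hypothetical path graph with one-hot node features and a bare sum-of-neighbors aggregation (no learned update, just tracing where information can flow):

```python
import numpy as np

adj = [[1], [0, 2], [1, 3], [2]]      # path graph: 0 - 1 - 2 - 3
V = np.eye(4)                          # node i starts knowing only about itself

for layer in range(3):
    # each round: keep own features and add the neighbors' features
    V = np.stack([V[i] + sum(V[j] for j in adj[i]) for i in range(4)])
    reach = np.nonzero(V[0])[0]
    print(f"after layer {layer + 1}, node 0 has information from nodes {reach}")
# after layer 3, node 0 has information from node 3, three steps away
```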
We can update our architecture diagram to include this new source of information for nodes:
Schematic for a GCN architecture, which updates node representations of a graph by pooling neighboring nodes at a distance of one degree.
Learning edge representations
Our dataset does not always contain all types of information (node, edge, and global context). When we want to make a prediction on nodes, but our dataset only has edge information, we showed above how to use pooling to route information from edges to nodes, but only at the final prediction step of the model. We can share information between nodes and edges within the GNN layer using message passing.
We can incorporate the information from neighboring edges in the same way we used neighboring node information earlier, by first pooling the edge information, transforming it with an update function, and storing it.
However, the node and edge information stored in a graph are not necessarily the same size or shape, so it is not immediately clear how to combine them. One way is to learn a linear mapping from the space of edges to the space of nodes, and vice versa. Alternatively, one may concatenate them together before the update function.
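Both options might look like the sketch below; the dimensions and the projection matrix W_e2v are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
node_emb = rng.normal(size=(5, 8))     # node embeddings live in an 8-dim space
edge_pooled = rng.normal(size=(5, 3))  # per-node pooled edge info, 3-dim space

# Option 1: learn a linear map from edge space to node space, then add.
W_e2v = rng.normal(size=(3, 8)) * 0.1  # learned projection (illustrative)
combined_add = node_emb + edge_pooled @ W_e2v                    # shape (5, 8)

# Option 2: concatenate, and let the update function handle the mixed sizes.
combined_cat = np.concatenate([node_emb, edge_pooled], axis=1)   # shape (5, 11)

print(combined_add.shape, combined_cat.shape)
```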
Architecture schematic for a Message Passing layer. The first step “prepares” a message composed of information from an edge and its connected nodes and then “passes” the message to the node.
Which graph attributes we update and in which order we update them is one design decision when constructing GNNs. We could choose whether to update node embeddings before edge embeddings, or the other way around. This is an open area of research with a variety of solutions – for example, we could update in a ‘weave’ fashion [21], where we have four updated representations that get combined into new node and edge representations: node to node (linear), edge to edge (linear), node to edge (edge layer), edge to node (node layer).
Some of the different ways we might combine edge and node representation in a GNN layer.
Adding global representations
There is one flaw with the networks we have described so far: nodes that are far away from each other in the graph may never be able to efficiently transfer information to one another, even if we apply message passing several times. For one node, if we have k layers, information will propagate at most k steps away. This can be a problem for situations where the prediction task depends on nodes, or groups of nodes, that are far apart. One solution would be to have all nodes be able to pass information to each other. Unfortunately for large graphs, this quickly becomes computationally expensive (although this approach, called ‘virtual edges’, has been used for small graphs such as molecules). [18]
One solution to this problem is by using the global representation of a graph (U) which is sometimes called a master node[19] [18] or context vector. This global context vector is connected to all other nodes and edges in the network, and can act as a bridge between them to pass information, building up a representation for the graph as a whole. This creates a richer and more complex representation of the graph than could have otherwise been learned.
Schematic of a Graph Nets architecture leveraging global representations.
In this view all graph attributes have learned representations, so we can leverage them during pooling by conditioning the information of our attribute of interest with respect to the rest. For example, for one node we can consider information from neighboring nodes, connected edges and the global information. To condition the new node embedding on all these possible sources of information, we can simply concatenate them. Additionally we may also map them to the same space via a linear map and add them, or apply a feature-wise modulation layer [22], which can be considered a type of feature-wise attention mechanism.
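A sketch of the concatenation variant for a single node update (the pooled quantities and the update weights below are placeholders):

```python
import numpy as np

def condition_node(v_i, pooled_neighbors, pooled_edges, u, W, b):
    """Update one node embedding from its own state, pooled adjacent nodes,
    pooled incident edges, and the global-context vector, by concatenation."""
    z = np.concatenate([v_i, pooled_neighbors, pooled_edges, u])
    return np.tanh(z @ W + b)

rng = np.random.default_rng(5)
d = 8
v_i = rng.normal(size=d)                 # the node being updated
pooled_neighbors = rng.normal(size=d)    # rho_{V_n -> V_n} over its neighbors
pooled_edges = rng.normal(size=d)        # rho_{E_n -> V_n} over incident edges
u = rng.normal(size=d)                   # global-context vector U
W, b = rng.normal(size=(4 * d, d)) * 0.1, np.zeros(d)
print(condition_node(v_i, pooled_neighbors, pooled_edges, u, W, b).shape)  # (8,)
```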
Schematic for conditioning the information of one node based on three other embeddings (adjacent nodes, adjacent edges, global). This step corresponds to the node operations in the Graph Nets Layer.
References
[18] Neural Message Passing for Quantum Chemistry Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O. and Dahl, G.E., 2017. Proceedings of the 34th International Conference on Machine Learning, Vol 70, pp. 1263--1272. PMLR.
[19] Relational inductive biases, deep learning, and graph networks Battaglia, P.W., Hamrick, J.B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y. and Pascanu, R., 2018.
[20] Deep Sets Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. and Smola, A., 2017.
[21] Molecular graph convolutions: moving beyond fingerprints Kearnes, S., McCloskey, K., Berndl, M., Pande, V. and Riley, P., 2016. J. Comput. Aided Mol. Des., Vol 30(8), pp. 595--608.
[22] Feature-wise transformations Dumoulin, V., Perez, E., Schucher, N., Strub, F., Vries, H.d., Courville, A. and Bengio, Y., 2018. Distill, Vol 3(7), pp. e11.