Hybrid Computing Using a Neural Network with Dynamic External Memory (DNC)
Preface:
Source: Nature, doi: 10.1038/nature20101
Glossary:
- memory matrix: the external memory, encoded as a matrix; also referred to as the "memory matrix".
- the neural Turing machine: Neural Turing Machine [16], which can be viewed as an early precursor of the DNC.
- differentiable attention mechanisms: attention mechanisms that are differentiable end to end.
- the read vector: the vector returned by a read operation, a weighted sum over memory locations.
1. Abstract
- Artificial neural networks excel at sensory processing, sequence learning and reinforcement learning, but, lacking an external memory, they are limited in their ability to represent variables and data structures and to store data over long timescales.
- Here we introduce a machine learning model called a differentiable neural computer (DNC), consisting of a neural network that can read from and write to an external memory, analogous to the random-access memory of a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures; like a neural network, it can still learn to do so from data.
- When trained with supervised learning, a DNC can successfully answer synthetic questions designed to emulate reasoning and inference in natural language. It can learn tasks such as finding the shortest path between specified points and inferring missing links in randomly generated graphs, and then generalize those abilities to specific graphs such as transport networks and family trees.
- When trained with reinforcement learning, a DNC can complete a moving-blocks puzzle in which the goal can be specified by a sequence of symbols.
- Taken together, our results demonstrate that DNCs can solve complex, structured tasks that are difficult for neural networks without external read–write memory.
2. Introduction
- Modern computers use an architecture that separates computation from memory: processing and input/output are decoupled from storage. This brings conveniences such as a hierarchical memory system that trades off cost against capacity. But reading and creating variables requires the processor to operate on addresses, and the drawback is that an ordinary neural network cannot perform random, dynamic storage operations as its memory demands grow.
- Despite recent breakthroughs in signal processing, sequence learning, reinforcement learning, cognitive science and neuroscience, neural networks remain limited in representing variables and data structures. This paper aims to combine the advantages of neural networks and computational processing by providing an architecture that couples a neural network to an external memory, focusing on minimizing the interface between memoranda (memory contents) and long-term storage. The whole system is differentiable, so it can be trained end to end with stochastic gradient descent, allowing the network to learn how to operate and organize memory in goal-directed behaviour.
3. System overview
- A DNC is a neural network coupled to an external memory matrix. (As long as the memory is not filled to capacity, the network's behaviour is independent of the memory size — presumably because addressing is distributed rather than tied to fixed positions — which is why we regard the memory as "external".) If the memory can be thought of as the DNC's RAM, then the network, called the controller, is a differentiable CPU whose operations are learned directly by gradient descent. The DNC's precursor, the neural Turing machine, had a similar structure but used more restricted methods of memory access.
- The DNC architecture differs from recently proposed neural memory frameworks such as memory networks [14] and pointer networks [15] in that its memory can be selectively written to as well as read from, allowing iterative modification of memory content.
- Whereas a conventional computer uses unique addresses to access memory, a DNC uses differentiable attention mechanisms [2,16–18] to define distributions over the N rows, or "locations", of the N × W memory matrix M. These distributions, which we call weightings, represent the degree to which each location is involved in a read or write operation. The read vector r returned by a read weighting w^r over memory M is the weighted sum over memory locations:
- r = Σ_i M[i,·] w^r[i]
- Similarly, the write operation uses a write weighting w^w first to erase with an erase vector e, then to add a write vector v:
- M[i, j] ← M[i, j] (1 − w^w[i] e[j]) + w^w[i] v[j]
- The functional units that determine and apply the weightings are called read and write heads. The operation of the heads is illustrated in Figure 1.
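The read and write operations above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up sizes and one-hot weightings, not the paper's implementation (in the DNC the weightings are soft and produced by the controller):

```python
import numpy as np

N, W = 8, 4            # number of memory locations x word size (illustrative)
M = np.zeros((N, W))   # memory matrix

def read(M, w_r):
    """Read vector: weighted sum of memory rows, r = M^T w_r."""
    return M.T @ w_r

def write(M, w_w, e, v):
    """Erase then add: M[i,j] <- M[i,j] (1 - w_w[i] e[j]) + w_w[i] v[j]."""
    return M * (1 - np.outer(w_w, e)) + np.outer(w_w, v)

# write the vector v into location 2 with full weight
w_w = np.zeros(N); w_w[2] = 1.0
v = np.array([1.0, 2.0, 3.0, 4.0])
e = np.ones(W)               # fully erase the location before adding
M = write(M, w_w, e, v)

# read back with a weighting focused on location 2
w_r = np.zeros(N); w_r[2] = 1.0
r = read(M, w_r)             # -> [1. 2. 3. 4.]
```

Because both operations are compositions of multiplications and additions, they are differentiable in the weightings, which is what lets gradient descent train the heads.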
Figure 1 | DNC architecture
a, A recurrent controller network receives input from an external data source and produces output. b, c, The controller also outputs vectors that parameterize one write head (green) and multiple read heads (two in this case, blue and pink). (A reduced selection of parameters is shown.) The write head defines a write and an erase vector that are used to edit the N × W memory matrix, whose elements' magnitudes and signs are indicated by box area and shading, respectively. Additionally, a write key is used for content lookup to find previously written locations to edit. The write key can contribute to defining a weighting that selectively focuses the write operation over the rows, or locations, in the memory matrix. The read heads can use gates called read modes to switch between content lookup using a read key ('C') and reading out locations either forwards ('F') or backwards ('B') in the order they were written. d, The usage vector records which locations have been used so far, and a temporal link matrix records the order in which locations were written; here, we represent the order locations were written to using directed arrows.
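The content lookup used by both the write key and the read keys can be sketched as a softmax over cosine similarities, sharpened by a key-strength parameter. This follows the standard content-based addressing formulation [16]; the function name and the example memory are illustrative, not code from the paper:

```python
import numpy as np

def content_weighting(M, key, beta):
    """Content-based addressing: softmax over the cosine similarity
    between the lookup key and each memory row, sharpened by the key
    strength beta (larger beta -> more sharply focused weighting)."""
    eps = 1e-8  # guards against division by zero for all-zero rows
    sims = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + eps)
    exps = np.exp(beta * sims)
    return exps / exps.sum()

# a 4-location, 3-wide memory; row 2 matches the key exactly
M = np.array([[0.1, 0.0, 0.0],
              [0.0, 0.5, 0.0],
              [1.0, 2.0, 3.0],
              [0.0, 0.0, 0.9]])
key = np.array([1.0, 2.0, 3.0])
w = content_weighting(M, key, beta=10.0)
# w peaks at location 2, and the weights sum to 1
```

Because the weighting is a smooth function of the key and of memory, gradients flow through the lookup, which is what makes the addressing "differentiable attention".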
4. Experiments
(The translation stops here out of caution over Nature's copyright; see the original paper for the experimental sections.)
References
1. Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems Vol. 25 (eds Pereira, F. et al.) 1097–1105 (Curran Associates, 2012).
2. Graves, A. Generating sequences with recurrent neural networks. Preprint at http://arxiv.org/abs/1308.0850 (2013).
3. Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems Vol. 27 (eds Ghahramani, Z. et al.) 3104–3112 (Curran Associates, 2014).
4. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
5. Gallistel, C. R. & King, A. P. Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience (John Wiley & Sons, 2011).
6. Marcus, G. F. The Algebraic Mind: Integrating Connectionism and Cognitive Science (MIT Press, 2001).
7. Kriete, T., Noelle, D. C., Cohen, J. D. & O’Reilly, R. C. Indirection and symbol-like processing in the prefrontal cortex and basal ganglia. Proc. Natl Acad. Sci. USA 110, 16390–16395 (2013).
8. Hinton, G. E. Learning distributed representations of concepts. In Proc. Eighth Annual Conference of the Cognitive Science Society Vol. 1, 1–12 (Lawrence Erlbaum Associates, 1986).
9. Bottou, L. From machine learning to machine reasoning. Mach. Learn. 94, 133–149 (2014).
10. Fusi, S., Drew, P. J. & Abbott, L. F. Cascade models of synaptically stored memories. Neuron 45, 599–611 (2005).
11. Ganguli, S., Huh, D. & Sompolinsky, H. Memory traces in dynamical systems. Proc. Natl Acad. Sci. USA 105, 18970–18975 (2008).
12. Kanerva, P. Sparse Distributed Memory (MIT press, 1988).
13. Amari, S.-i. Characteristics of sparsely encoded associative memory. Neural Netw. 2, 451–457 (1989).
14. Weston, J., Chopra, S. & Bordes, A. Memory networks. Preprint at http://arxiv.org/abs/1410.3916 (2014).
15. Vinyals, O., Fortunato, M. & Jaitly, N. Pointer networks. In Advances in Neural Information Processing Systems Vol. 28 (eds Cortes, C. et al.) 2692–2700 (Curran Associates, 2015).
16. Graves, A., Wayne, G. & Danihelka, I. Neural Turing machines. Preprint at http://arxiv.org/abs/1410.5401 (2014).
17. Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at http://arxiv.org/abs/1409.0473 (2014).
18. Gregor, K., Danihelka, I., Graves, A., Rezende, D. J. & Wierstra, D. DRAW: a recurrent neural network for image generation. In Proc. 32nd International Conference on Machine Learning (eds Bach, F. & Blei, D.) 1462–1471 (JMLR, 2015).
19. Hintzman, D. L. MINERVA 2: a simulation model of human memory. Behav. Res. Methods Instrum. Comput. 16, 96–101 (1984).
20. Kumar, A. et al. Ask me anything: dynamic memory networks for natural language processing. Preprint at http://arxiv.org/abs/1506.07285 (2015).
21. Sukhbaatar, S. et al. End-to-end memory networks. In Advances in Neural Information Processing Systems Vol. 28 (eds Cortes, C. et al.) 2431–2439 (Curran Associates, 2015).
22. Magee, J. C. & Johnston, D. A synaptically controlled, associative signal for Hebbian plasticity in hippocampal neurons. Science 275, 209–213 (1997).
23. Johnston, S. T., Shtrahman, M., Parylak, S., Gonçalves, J. T. & Gage, F. H. Paradox of pattern separation and adult neurogenesis: a dual role for new neurons balancing memory resolution and robustness. Neurobiol. Learn. Mem. 129, 60–68 (2016).
24. O’Reilly, R. C. & McClelland, J. L. Hippocampal conjunctive encoding, storage, and recall: avoiding a trade-off. Hippocampus 4, 661–682 (1994).
25. Howard, M. W. & Kahana, M. J. A distributed representation of temporal context. J. Math. Psychol. 46, 269–299 (2002).
26. Weston, J., Bordes, A., Chopra, S. & Mikolov, T. Towards AI-complete question answering: a set of prerequisite toy tasks. Preprint at http://arxiv.org/abs/1502.05698 (2015).
27. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
28. Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In Proc. 26th International Conference on Machine Learning (eds Bottou, L. & Littman, M.) 41–48 (ACM, 2009).
29. Zaremba, W. & Sutskever, I. Learning to execute. Preprint at http://arxiv.org/abs/1410.4615 (2014).
30. Winograd, T. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. Report No. MAC-TR-84 (DTIC, MIT Project MAC, 1971).
31. Epstein, R., Lanza, R. P. & Skinner, B. F. Symbolic communication between two pigeons (Columba livia domestica). Science 207, 543–545 (1980).
32. McClelland, J. L., McNaughton, B. L. & O’Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419–457 (1995).
33. Kumaran, D., Hassabis, D. & McClelland, J. L. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends Cogn. Sci. 20, 512–534 (2016).
34. McClelland, J. L. & Goddard, N. H. Considerations arising from a complementary learning systems perspective on hippocampus and neocortex. Hippocampus 6, 654–665 (1996).
35. Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015).
36. Rezende, D. J., Mohamed, S., Danihelka, I., Gregor, K. & Wierstra, D. One-shot generalization in deep generative models. In Proc. 33rd International Conference on Machine Learning (eds Balcan, M. F. & Weinberger, K. Q.) 1521–1529 (JMLR, 2016).
37. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. Meta-learning with memory-augmented neural networks. In Proc. 33rd International Conference on Machine Learning (eds Balcan, M. F. & Weinberger, K. Q.) 1842–1850 (JMLR, 2016).
38. Oliva, A. & Torralba, A. The role of context in object recognition. Trends Cogn. Sci. 11, 520–527 (2007).
39. Hermann, K. M. et al. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems Vol. 28 (eds Cortes, C. et al.) 1693–1701 (Curran Associates, 2015).
40. O’Keefe, J. & Nadel, L. The Hippocampus as a Cognitive Map (Oxford Univ. Press, 1978).