An Explanation of the Transformer Architecture
Go Forth And Transform
I hope you’ve found this a useful place to start to break the ice with the major concepts of the Transformer. If you want to go deeper, I’d suggest these next steps:
- Read the Attention Is All You Need paper, the Transformer blog post (Transformer: A Novel Neural Network Architecture for Language Understanding), and the Tensor2Tensor announcement.
- Watch Łukasz Kaiser's talk walking through the model and its details.
- Play with the Jupyter Notebook provided as part of the Tensor2Tensor repo.
- Explore the Tensor2Tensor repo.
Follow-up works:
- Depthwise Separable Convolutions for Neural Machine Translation
- One Model To Learn Them All
- Discrete Autoencoders for Sequence Models
- Generating Wikipedia by Summarizing Long Sequences
- Image Transformer
- Training Tips for the Transformer Model
- Self-Attention with Relative Position Representations
- Fast Decoding in Sequence Models using Discrete Latent Variables
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Acknowledgements
Thanks to Illia Polosukhin, Jakob Uszkoreit, Llion Jones, Lukasz Kaiser, Niki Parmar, and Noam Shazeer for providing feedback on earlier versions of this post.
Please hit me up on Twitter for any corrections or feedback.
Reprinted from: http://jalammar.github.io/illustrated-transformer/