11/15/2017

AlphaGo Zero

https://www.inside.com.tw/2017/11/10/aja-alphago-zero
Aja Huang (黃士傑): In just three days, AlphaGo Zero retraced thousands of years of human Go research
2017/11/10  李柏鋒   AlphaGo, DeepMind, artificial intelligence, Go, Aja Huang (黃士傑)


https://www.inside.com.tw/2017/10/20/deepmind-ama-alphago
DeepMind AMA: Candid answers on how the strongest AlphaGo was forged, all in one place!
2017/10/20  [Partner media] 雷鋒網 (Leiphone)   AlphaGo, AMA, DeepMind, artificial intelligence

https://technews.tw/2017/10/19/alphago-zero-learning-from-scratch/
As promised, DeepMind published this paper in Nature, titled "Mastering the game of Go without human knowledge". In it, DeepMind presents a stronger new version of its Go program, AlphaGo Zero, showing that even in a domain as challenging as Go, pure reinforcement learning is enough for a program to improve itself to that level. A long-standing goal of artificial intelligence is an algorithm that learns tabula rasa (literally "blank slate", i.e. acquiring all of its knowledge gradually from perception and experience) to reach superhuman proficiency in a challenging domain.

https://www.nature.com/articles/nature24270
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
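The abstract says the network is trained to predict AlphaGo's own move selections and the winner of its self-play games. A minimal sketch of that training target, per the paper's loss l = (z − v)² − πᵀ log p + c‖θ‖², is below. The function and variable names are illustrative, not DeepMind's code; the network, MCTS, and self-play machinery are stand-ins.

```python
# Sketch (not DeepMind's code) of the AlphaGo Zero training objective:
# the network f_theta(s) outputs move probabilities p and a value v, and is
# trained so that p matches the MCTS search probabilities pi and v predicts
# the self-play outcome z.  Loss from the paper: (z - v)^2 - pi^T log p + c*||theta||^2.
import numpy as np

def zero_loss(p, v, pi, z, theta, c=1e-4):
    """Combined value/policy loss with L2 regularisation.

    p     : predicted move probabilities (1-D array, sums to 1)
    v     : predicted game outcome in [-1, 1]
    pi    : MCTS visit-count distribution over moves (same shape as p)
    z     : actual self-play outcome (+1 win, -1 loss) from this player's view
    theta : flat array of network parameters (stand-in, for the L2 term)
    """
    value_loss = (z - v) ** 2
    policy_loss = -np.sum(pi * np.log(p + 1e-12))  # cross-entropy against the search policy
    l2 = c * np.sum(theta ** 2)
    return value_loss + policy_loss + l2

# Toy example on a 3-move position: the search favoured move 0 and the game was won.
p     = np.array([0.5, 0.3, 0.2])   # network's prior over the 3 legal moves
pi    = np.array([0.8, 0.1, 0.1])   # MCTS visit distribution (the training target)
theta = np.zeros(10)                # placeholder for the real network parameters
print(zero_loss(p, v=0.1, pi=pi, z=+1.0, theta=theta))
```

This is the sense in which "AlphaGo becomes its own teacher": each iteration, MCTS run on top of the current network produces stronger move choices than the raw network, and those search results (π and the game winner z) become the targets for the next round of training.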
