AlphaGo Zero, an updated version of AlphaGo, is a computer program developed by Google DeepMind to play the board game Go using reinforcement learning. In March 2016, AlphaGo became the first computer program to beat a world-champion Go player. AlphaGo combined a tree search with neural networks: value networks evaluated board positions and policy networks selected moves. These neural networks were initially trained through supervised learning on thousands of human amateur and professional games, then refined through reinforcement learning via self-play. AlphaGo Zero is based solely on reinforcement learning; it uses no human data, guidance, or knowledge beyond the game rules.
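The division of labor between evaluating positions and selecting moves can be pictured as a single network with two output heads, which is the design AlphaGo Zero adopts. The sketch below is illustrative only: the names (GoNet, NUM_FILTERS), the tiny four-block residual tower, and the layer sizes are assumptions chosen for readability, whereas the published network is far deeper and wider.

```python
# A minimal two-headed network in the spirit of AlphaGo Zero's design:
# a shared body processes a board position, a policy head scores moves,
# and a value head predicts the winner. Names and sizes here are
# illustrative assumptions, not DeepMind's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD_SIZE = 19
NUM_MOVES = BOARD_SIZE * BOARD_SIZE + 1   # every intersection plus "pass"
NUM_FILTERS = 64                          # the real tower is much larger


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)            # skip connection


class GoNet(nn.Module):
    """Maps a board tensor to (move probabilities, predicted winner)."""

    def __init__(self, in_planes=17, blocks=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, NUM_FILTERS, 3, padding=1),
            nn.BatchNorm2d(NUM_FILTERS),
            nn.ReLU(),
        )
        self.tower = nn.Sequential(*[ResidualBlock(NUM_FILTERS) for _ in range(blocks)])
        # Policy head: a score for every legal move.
        self.policy_head = nn.Sequential(
            nn.Conv2d(NUM_FILTERS, 2, 1), nn.BatchNorm2d(2), nn.ReLU(),
            nn.Flatten(), nn.Linear(2 * BOARD_SIZE * BOARD_SIZE, NUM_MOVES),
        )
        # Value head: a scalar in [-1, 1] predicting the game's winner.
        self.value_head = nn.Sequential(
            nn.Conv2d(NUM_FILTERS, 1, 1), nn.BatchNorm2d(1), nn.ReLU(),
            nn.Flatten(), nn.Linear(BOARD_SIZE * BOARD_SIZE, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, board):
        features = self.tower(self.stem(board))
        return self.policy_head(features), self.value_head(features)


if __name__ == "__main__":
    net = GoNet()
    dummy_position = torch.zeros(1, 17, BOARD_SIZE, BOARD_SIZE)
    policy_logits, value = net(dummy_position)
    print(policy_logits.shape, value.shape)   # (1, 362) and (1, 1)
```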
Starting from completely random play, AlphaGo Zero used a novel form of reinforcement learning in which it became its own teacher. The neural network is trained to predict AlphaGo Zero's own move selections as well as the eventual winner of its self-play games. The improved network then strengthens the tree search, which in turn yields higher-quality move selection and stronger self-play with each iteration.
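Concretely, each training step pushes the network's move probabilities toward the move choices produced during self-play and pushes its value output toward the game's eventual winner. The snippet below is a schematic sketch of one such step, reusing the illustrative GoNet defined above and feeding it random placeholder data in place of recorded self-play positions; it is not DeepMind's training code.

```python
# One schematic update: regress the policy head toward the self-play move
# distribution and the value head toward the observed winner (+1 or -1).
# The batch here is random placeholder data, so the numbers are meaningless.
import torch
import torch.nn.functional as F

net = GoNet()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)  # L2 regularisation

batch = 8
positions = torch.randn(batch, 17, BOARD_SIZE, BOARD_SIZE)              # board inputs
search_policies = torch.softmax(torch.randn(batch, NUM_MOVES), dim=1)   # self-play move targets
outcomes = torch.randint(0, 2, (batch, 1)).float() * 2 - 1              # game winners, +1 or -1

policy_logits, value = net(positions)

# Combined objective: squared value error plus cross-entropy between the
# self-play move distribution and the network's move probabilities.
value_loss = F.mse_loss(value, outcomes)
policy_loss = -(search_policies * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
loss = value_loss + policy_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```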
AlphaGo Zero was announced in a blog post published by Google DeepMind on October 18, 2017. This was followed by a paper published in Nature on October 19, 2017. The paper, titled "Mastering the game of Go without human knowledge", describes the architecture and training of the AlphaGo Zero algorithm in greater detail. Starting tabula rasa ("blank slate"), AlphaGo Zero achieved superhuman performance, winning 100–0 against DeepMind's previous AlphaGo program after only three days of self-play training. After forty days of self-training, it also outperformed the upgraded version of AlphaGo known as "Master."